The index to the articles in this series is found here.
Last time we left our network trainer, it was plugging away in 40GB of virtual memory, looking at a couple of days per epoch of training. It never made it through the first epoch: sometime overnight the memory pressure became so great that my desktop got OOMed, taking the training process with it (I hadn't run it inside nohup).
Well, we had already decided that this run wasn't going to work, so the next thing to try was coarse-graining the data. This is an invasive change, so I set a git tag on the source code called "FirstVersion" and started editing.
I updated my intermediate binary file format to support multiple embedded coarsenesses. This changed the API somewhat, so several files that make use of RpBinReaders had to be modified. The coarsened files contain two data points per cell: the maximum and the average over 2×2, 3×3, or 4×4 (the default) non-overlapping coarsening stencils.
Now, my data points have always been single bytes; I didn't want to go to floats because of the higher cost in disk space and in reading (the files are internally compressed, so there's a CPU cost to reading them). While the maximum of a set of bytes is easily represented in a single byte, the exact average is not so simple. But we don't need exact averages; something close enough is fine. Our input data isn't perfect to begin with; it's binned into 14 levels. Since the average of a set of non-negative numbers necessarily lies between 0 and the maximum value, I reserved a second byte to hold a value VAL2 such that max * VAL2 / 255 is close to the average value in the cell. Inexpensive, and clearly good enough for our purposes.
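To make that concrete, here's a minimal sketch of how one stencil could be reduced to a max byte plus a scale byte. This is not the actual RpBinReader code; the names (coarsenCell, CoarseCell) and the flat row-major image layout are my own assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical coarsening of one non-overlapping stencil (e.g. 4x4).
// Output is two bytes: the maximum, and VAL2 chosen so that
// max * VAL2 / 255 approximates the average over the stencil.
struct CoarseCell {
    uint8_t maxVal;  // maximum pixel value in the stencil
    uint8_t val2;    // average ~= maxVal * val2 / 255
};

CoarseCell coarsenCell(const std::vector<uint8_t>& img, int width,
                       int row, int col, int stencil)
{
    int maxVal = 0, sum = 0;
    for (int r = 0; r < stencil; ++r) {
        for (int c = 0; c < stencil; ++c) {
            int v = img[(row + r) * width + (col + c)];
            maxVal = std::max(maxVal, v);
            sum += v;
        }
    }
    CoarseCell cell{static_cast<uint8_t>(maxVal), 0};
    if (maxVal > 0) {
        double avg = static_cast<double>(sum) / (stencil * stencil);
        // Round to the nearest byte; since avg <= maxVal, this
        // never exceeds 255. The reader recovers the approximate
        // average as maxVal * val2 / 255.
        cell.val2 = static_cast<uint8_t>(avg * 255.0 / maxVal + 0.5);
    }
    return cell;
}
```

The worst-case error from quantizing the ratio into a byte is well under one input level, which is why a second byte is plenty for data that is itself only binned into 14 levels.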
The changes have been pushed to the git tree, and I'm rerunning with a 4×4 coarsening. This is looking to be a lot faster, with only one-eighth the number of input neurons (1/16 as many pixels, but each pixel now carries two values). We're expecting roughly three hours per epoch, and memory consumption is similarly reduced.
I'll watch this to see how it's running. Meanwhile, thinking about how the high neuron count needed to be reduced, and about the coarsening I've been using, has led me to a radically different approach that I'll discuss soon. It's still neural networks on coarsened pixels, but different from what I've shown so far. Anyway, that's coming up.