The index to the articles in this series is found here.
Well, over the course of the network design, we’ve gone from full-precision inputs feeding modular neural networks that lead up to an LSTM, to 4×4 downscaled inputs feeding the same structure, to our current design with 800 total inputs per time step.
The effect has been to go from a system that took 2 days per epoch and overloaded my computer’s memory at a batch size of 512, down to 2 hours per epoch, and then to 90 seconds per epoch. I can also now fit the entire training set comfortably in memory.
Even with the speed improvements I got from preprocessing the data, loading the entire set of data for one epoch took about 90 seconds, which is suspiciously like the 90 seconds per epoch for running the system. Keras pre-loads the next batch of inputs in another thread so that the worker doesn’t have to wait for input, but if the worker is faster than the generator, the worker will still block waiting for data. As we can now fit the entire dataset into memory, I modified the generator to cache the entire input set. This way it doesn’t have to go back to the disk between epochs. Now that training is no longer disk-bound, we can get all the cores on my box in use. I’ve got a 4-core machine.
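Here’s a minimal sketch of the caching idea, not the actual code in rpgenerator2.py. The names `cached_batches` and `load_batch` are made up for illustration, and I’m assuming that whatever reads a batch from disk can be wrapped in a function that takes a batch index.

```python
def cached_batches(load_batch, num_batches):
    """Yield batches forever, reading each one from disk only once.

    load_batch is a hypothetical callable: load_batch(i) returns the
    (inputs, targets) pair for batch i, e.g. by reading a preprocessed file.
    """
    if num_batches < 1:
        raise ValueError("need at least one batch")

    cache = []
    # First epoch: read from disk as usual, remembering every batch.
    for i in range(num_batches):
        batch = load_batch(i)
        cache.append(batch)
        yield batch

    # Later epochs: serve from memory only. Keras-style generators are
    # expected to loop indefinitely, so this never returns.
    while True:
        for batch in cache:
            yield batch
```

The first pass costs the same as before; every later epoch skips the disk entirely. One caveat: a plain Python generator like this can’t be advanced by several threads at once, so the sketch assumes a single thread is pulling batches from it.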
With those further changes, resident memory is about 2.5 GB per thread, and time per epoch is about 8 seconds. I’ll probably tear the generator out soon, now that everything fits in memory. The current design actually keeps three entire copies of the training set in memory, and that’s silly. One copy is the cached data in the generator, one is the data currently training the model, and the last is the data preloaded for the next batch.
The files are fairly minor variations of rpgenerator.py and rptrainer.py. Rather than reproducing them here, I’ll just point you to their entries in the git archive. The files are at the top level of the directory, named rpgenerator2.py and rptrainer2.py.
Well, now that we’ve got a system that trains nicely, it’s time to begin experimenting with settings. The first thing we’re going to want to do is to adjust the training parameters. That’s because I’m getting a loss improvement of about 0.0004 after each batch, regardless of batch size. So, when I train an entire epoch as a single batch, I improve the loss by about 0.0004 per epoch in the early epochs, but if I divide the input set into 12 batches, I improve by about 0.005 per epoch (12 × 0.0004 ≈ 0.005). I’ve got to fix that.
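The arithmetic behind that observation, as a quick sketch. It assumes the gain per batch really is a constant 0.0004, which I’ve only seen hold approximately and only in the early epochs; `epoch_gain` is just a name I picked for illustration.

```python
# Observed loss improvement per batch (per weight update), roughly
# independent of batch size in the early epochs.
PER_BATCH_GAIN = 0.0004


def epoch_gain(num_batches, per_batch_gain=PER_BATCH_GAIN):
    """Estimated loss improvement over one epoch split into num_batches."""
    return num_batches * per_batch_gain


for n in (1, 12):
    print(n, "batches per epoch:", round(epoch_gain(n), 4))
```

With one batch per epoch the estimate is 0.0004, and with 12 batches it’s 0.0048, which matches the roughly 0.005 I measured. In other words, the progress per epoch is being set by how many updates I make rather than by how much data each update sees.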