Building a Rain Predictor. Coarse-scaled improvements.

The index to the articles in this series is found here.

So, with the 4×4 coarse-scaling active, the network now trains in under two hours per epoch. That’s certainly an improvement, but laying out the problem in code and trying it out has clarified some points for me, and I now think I can do much better.

Oh, and for people interested in the code I include in these articles: always check the GitHub archive for changes, since I won't be going back into the postings for every bugfix.

Now, let’s look at the network in more detail. You’ll recall that my motivation over the course of these articles has been to try to reduce the number of nodes in the system. There are two reasons for this. First, a larger number of nodes will take longer to train and consume more memory, so it will be more difficult to get to a working network. Second, a large number of nodes implies a large number of free parameters in the model, and I want to reduce the risk of overfitting.

So, let's think about the network I laid out. We'll use the un-coarsened system. The active pixels form a disc with a radius of 240 pixels, which means we have roughly 180,000 meaningful pixels per time step. Ignoring the tripwire, which is small on this scale, each pixel feeds into one module. The bullseye module covers about 1250 pixels; the rest fall into ring modules. The ring modules have 1000 nodes on their first layer, while the bullseye has 300. The ring module networks are shared, with four input modules feeding each network. This means the first layer has a total of about 45 million weights, which is ridiculously high. The modules have much smaller output node counts, 5 each, for an additional 5000 or 1500 weights, insignificant next to that big first layer. We have about 120 inputs to the LSTM layer, still much less than a million weights through there, and the synth and output layers are also small.
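To make that tally concrete, here's a quick back-of-the-envelope computation in Python. The region sizes and the weight-sharing arrangement are my reading of the description above, not exact values from the repository:

```python
import math

# Rough weight tally for the original module design; all sizes are
# approximations from the description above, not exact repository values.

disc_pixels = math.pi * 240 ** 2   # ~181,000 active pixels per frame
bullseye_pixels = 1250             # central bullseye module
ring_pixels = disc_pixels - bullseye_pixels

n_ring_modules = 32                # 33 dartboard regions minus the bullseye
modules_per_network = 4            # four input modules share one network
n_ring_networks = n_ring_modules // modules_per_network
pixels_per_module = ring_pixels / n_ring_modules

# Distinct first-layer weights: each shared network sees one module's
# pixels, fully connected to its 1000 first-layer nodes.
ring_first = n_ring_networks * pixels_per_module * 1000
bullseye_first = bullseye_pixels * 300

print(f"ring first layers:    {ring_first / 1e6:.1f} million weights")
print(f"bullseye first layer: {bullseye_first / 1e6:.2f} million weights")
# -> roughly 45 million weights in the first layer alone
```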

I think I can do without those 45 million weights. While planning the coarsening code (think of the max and average pooling layers in TensorFlow), I started to wonder what the module layers were actually there to do, and whether they were really necessary.
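As a sketch of what that coarsening looks like (a hypothetical stand-in, not the repository's actual code), here's a 4×4 block reduction in NumPy that keeps both the average and the maximum intensity of each block:

```python
import numpy as np

def coarsen(frame, factor=4):
    """Reduce a 2-D radar frame by `factor` in each direction,
    returning per-block average and maximum intensity (cf. the
    avg/max pooling layers in TensorFlow)."""
    h, w = frame.shape
    # Trim so the dimensions divide evenly by the factor.
    h, w = h - h % factor, w - w % factor
    blocks = frame[:h, :w].reshape(h // factor, factor,
                                   w // factor, factor)
    avg = blocks.mean(axis=(1, 3))
    mx = blocks.max(axis=(1, 3))
    return avg, mx

frame = np.random.rand(480, 480)   # stand-in for one radar time step
avg, mx = coarsen(frame)
print(avg.shape, mx.shape)         # (120, 120) (120, 120)
```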

The module layers are there to do some automated feature selection before the LSTM layer. I thought they might pick up shapes like cold fronts which produce lines of rain, small spots of moderate rainfall intensity over a few dozen pixels, or diffuse uniform regions when we just have a light drizzle over huge areas. Those are the three patterns I see most often on the weather radar. The LSTM layer would then extract the through-time behaviour to figure out how these patterns were moving through the region. The synthesis layers would take this raw through-time data and figure out how to make the output values from it.

Still, 45 million weights. That’s a lot.

Recall the diagram of the modules I put up before: the dartboard with 33 regions on it. What if I made that diagram significantly finer-grained? Maybe 400 regions to start with. Then I skip the entire module step. Each region contributes only two numbers, its average and maximum intensity, and I feed these into the LSTM layer directly, with no lower layers. That throws away the 45 million weights. Instead, we have 800 inputs and, let's say, 500 neurons in the LSTM layer. I'm not yet ready to estimate the computational burden of the LSTM layer, but even if all 500 neurons feed back into the network, the input and recurrent connections come to only (800 + 500) × 500 = 650,000 weights per gate, or about 2.6 million across the four gates of a standard LSTM, and the cost of the time-tracking math isn't that high.
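Here's a minimal Keras sketch of that proposed topology. The time-step count and the sizes of the synthesis and output layers are placeholders of my own choosing; only the 800 inputs and the 500 LSTM units come from the discussion above:

```python
import tensorflow as tf

TIMESTEPS = 6             # placeholder: however many radar frames feed one sample
N_REGIONS = 400           # finer-grained dartboard regions
N_INPUTS = 2 * N_REGIONS  # average and maximum intensity per region

model = tf.keras.Sequential([
    # Region summaries go straight into the LSTM; no per-module lower layers.
    tf.keras.layers.LSTM(500, input_shape=(TIMESTEPS, N_INPUTS)),
    tf.keras.layers.Dense(100, activation='relu'),    # synthesis layer (placeholder size)
    tf.keras.layers.Dense(10, activation='sigmoid'),  # output layer (placeholder size)
])
model.summary()
# The LSTM holds 4 * ((800 + 500) * 500 + 500) = 2,602,000 parameters:
# (800 + 500) * 500 = 650,000 input+recurrent weights per gate, times the
# four gates, plus biases -- still tiny next to the 45 million it replaces.
```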

So, that’s the next thing to try. This means rewriting the data generator and the training code with its network topology. I’m going to make new files for these, rather than editing the old ones, just so that it’s easier to bring them up side by side from a git checkout.

I’m going to code this up next.
