The index to the articles in this series is found here.
What has been happening since the phantom rain work? I’m happy with the training set y-values again, but I was having trouble generating useful networks. I later discovered that a bug introduced while I was working on the phantom rain problem had effectively deleted half of the feature data, so there wasn’t much for the neural network to chew on during training. I wrote a small script to read the features back out of the preprocessed files and generate .gifs of them, so that I could, in effect, see what the network sees.
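A minimal sketch of that kind of visualization script, assuming one feature plane stored as a flat run of bytes, one byte per sector, over a hypothetical 20×20 sector grid. The file name and layout are illustrative assumptions, not the actual preprocessed format.

```python
# A minimal sketch, not the project's actual script. Assumes one feature plane
# stored as a flat run of bytes, one byte per sector, over a 20x20 sector grid.
import numpy as np
from PIL import Image

GRID = 20    # assumed: 400 sectors laid out as a 20x20 grid
SCALE = 16   # enlarge each sector so the .gif is easy to inspect by eye

def feature_plane_to_gif(feature_bytes: bytes, out_path: str) -> None:
    """Render one per-sector feature plane as a grayscale .gif."""
    plane = np.frombuffer(feature_bytes, dtype=np.uint8).reshape(GRID, GRID)
    # Nearest-neighbour upscaling keeps each sector a visible block.
    big = np.kron(plane, np.ones((SCALE, SCALE), dtype=np.uint8))
    Image.fromarray(big, mode="L").save(out_path)

if __name__ == "__main__":
    # Hypothetical usage: dump the first feature plane of one preprocessed record.
    with open("preprocessed.dat", "rb") as f:
        feature_plane_to_gif(f.read(GRID * GRID), "feature0.gif")
```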
My earlier networks were trained on two values per sector: the average rain value over all pixels in the sector (including pixels with no rain at all) and the maximum rain value of any pixel in the sector. That didn’t seem to be working very well.
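For contrast, here is a sketch of that earlier two-value scheme, assuming each sector is handed in as a NumPy array of per-pixel rain intensities (the name and representation are illustrative, not the project’s code):

```python
import numpy as np

def old_sector_features(pixels: np.ndarray) -> tuple[int, int]:
    """Earlier scheme: mean over every pixel in the sector (zero-rain pixels
    included), plus the maximum pixel value in the sector."""
    return (int(round(float(pixels.mean()))), int(pixels.max()))
```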
Recall that, early on, I decided not simply to feed the .gif files into the network and let it work out its own features: the data sets are too big, and the number of neurons needed to handle them would be too high. So I moved to synthesized features instead.
Now I’m just trying different synthesized feature combinations to see what works well. Each iteration takes about two days, which is how long my current machine needs to preprocess the image files into a format suitable for the neural network training code.
The current feature combination I’m trying out consists of three bytes per sector, still over the same 400 sectors. The first byte is the fraction of pixels in the sector that show any rain at all. The second byte is the mean rain intensity over just the rain pixels. The third byte is the root-mean-square rain intensity over just the rain pixels. We shall see, once the preprocessing is complete, whether this produces something more useful for the neural network training.
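As a rough sketch, the three bytes could be computed along these lines, assuming each sector arrives as an array of per-pixel rain intensities already scaled to 0–255 (the function and its scaling choices are illustrative assumptions, not the actual preprocessing code):

```python
import numpy as np

def sector_features(pixels: np.ndarray) -> tuple[int, int, int]:
    """Return (rain-pixel fraction, mean intensity, RMS intensity) as bytes,
    with the mean and RMS taken over the rain pixels only."""
    rain = pixels[pixels > 0].astype(np.float64)
    if rain.size == 0:
        return (0, 0, 0)
    frac = int(round(255.0 * rain.size / pixels.size))  # byte 1: rain coverage
    mean = int(round(rain.mean()))                      # byte 2: mean over rain pixels
    rms = int(round(np.sqrt(np.mean(rain ** 2))))       # byte 3: RMS over rain pixels
    return (min(frac, 255), min(mean, 255), min(rms, 255))
```

Since the RMS is never smaller than the mean, the gap between the second and third bytes also carries some information about how varied the rain intensities are within a sector.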