Tag Archives: keras

Building a Rain Predictor. Index of articles.

So, this is all running a bit longer than I expected when I began the series of postings. Having links to previous articles at the top of each article isn’t very clear or efficient, so here’s an index page I’ll update as new postings are added.

  1. Introduction to the problem. I describe what problem I’m trying to solve, give an example of the data that will be used to train, and lay out a rough sketch of where I’m going with this project.
  2. Preparing baseline data. I show how I produce a single rain-free image that will be used as a baseline subtraction so that I can extract only rain-related pixels, without cluttering my data with local geography.
  3. Domain knowledge. I discuss the form of the data, its known characteristics, and how I think this will influence the design and topology of the neural network. I also discuss the possibility of generating synthetic data by rotations, to handle the rare, but possible cases of rain coming from the East. I explain why those rotations might be useful in the case of Ottawa, but in certain other cities, local geography would make such operations less valuable.
  4. Extracting rain information. Here I provide the code that converts the raw .gif files downloaded from Environment Canada into an intermediate binary representation of my own design, in which only actual rainfall intensities appear.
  5. Building data for single images. Here, I provide code that reads this intermediate binary representation and decides whether a particular image indicates rainfall present in Ottawa at that moment. The code also computes a sequence number, an integer which will differ by one between two radar images that were taken 10 minutes apart.
  6. Locating training candidates. Our neural network is a predictive model, so it has to be trained on continuous runs of images. The ‘true’ values against which we will be computing errors and training the network are, themselves, a function of multiple input files. We sometimes have missing files due to radar downtime, network disconnects on my home machine, or delays in the .gif files being made available on the Environment Canada website. In this article I show how complete continuous runs of images are located and the true values extracted; all this information is then stored in a single record for later use.
  7. Github sources. With all the source files I’ve been supplying, and the patches I’ve started making to them, it became necessary to provide a more convenient way for readers to examine these files. This article describes the github archive, and how to locate it.
  8. Thinning the data. Some of our training candidates are uninteresting. Clear skies to the radar horizon producing no rain is something that we want in the training set, but we don’t need a thousand or more examples of that. Categorical neural networks train best on balanced training sets, where positive and negative outcomes are roughly equal. I go over this here.
  9. Neural network logical design. Here I use diagrams to show how the training data is interpreted in two-dimensional space, and how the neural network is to be laid out to process this.
  10. Terms and definitions. I decided that I was using a lot of technical terms without providing much context, and interested readers might be getting lost in some of the jargon, so I provided a very brief overview of terms specifically related to what I’ve been discussing in this series of postings. This is not an exhaustive overview of the terminology of neural networks.
  11. Basic Keras code. At last, I show you a python script that invokes Keras code. This is just a toy example, there to experiment with syntax, but it encapsulates, in a short and readable snippet, all of the features that are neither obvious nor trivial.
  12. Data discussion. Before training the network, I take an aside to discuss data and our training parameters. I also split the set of candidates into training and validation groups.
  13. The data generator. A short review of the code I’m using to feed training data into Keras. It allows Keras to load batches (subsets of the training set) and adjust the weights after each batch as it walks through the training set.
  14. Training the model. So, now I can feed the data in and train the model. It’s straining the hardware of my machine, so we discuss next steps: how to make the model more compact.
  15. Coarse-scaling the model. The first neural network training attempt failed because our model was simply much too big. Even if the memory had been sufficient to run to completion, the training time was impractical. So, the next thing to try was to coarse-scale the data. This covers that approach.
  16. Observations from the coarse-scaling. I go over the results of the changes to the network, and describe the next approach I plan to take, which will greatly simplify the neural network.
  17. Preprocessing the data. It turns out that we now have a bottleneck in the conversion of raw data to inputs suitable for the neural network in the new topology. So, I’ve changed the preprocessor to compute input vectors and save them in the intermediate binary files.
  18. Performance improvements. I discuss the impact of the changes that I’ve made over the course of this project, and where we go next.
  19. Preparing to tune. I present the current state of the training code, and discuss how we’re going to be tuning the network.
  20. The holdout dataset. I discuss another dataset, and how we’ll be evaluating the usefulness of our model.
  21. Initial tuning of the network. I plot some graphs to show the effect of different optimizers and batch sizes.
  22. Overfitting and the Adagrad optimizer. A short post to show a very clear example of network overfitting.
  23. Results of first experiments. We review the performance of our different optimizers in the first run of experiments. We decide that we need to re-run these experiments, and that we’ll also want to put in regularizers. A data formatting error invalidates these results.
  24. Preparing for regularization. I discuss the objective and methods of regularization that I intend to investigate for this project. We’re preparing to use TensorBoard. A data formatting error invalidates these results.
  25. Analysing the weights. I put up some histograms of the weights and talk about what we can conclude from them. A data formatting error invalidates these results.
  26. Found an error in the inputs. The recent analysis is invalidated due to a software bug. I have to start over.
  27. Repeat runs. I repeat the experiment runs, and get somewhat disappointing accuracy. Then I look at the failed predictions, and realize the training data is wrong, and the network is right.
  28. Success. The network now performs as well or better than I can by looking at the same images. I’ll be tweaking and experimenting with the aim of reducing incorrect predictions.
  29. Followup experiments. I tried to improve the discrimination of the model with a few network tweaks, but nothing really made a difference.
  30. Graphical widget. A little graphical widget now sits on my screen, giving me rain predictions at a glance.
  31. Summary of project. The project is essentially complete. I describe what I’ll be doing in future, and may write followup posts if something interesting comes up.
  32. More data. I’ve downloaded more training data, and discuss some obstacles and future directions.
  33. Results with more data. I did some training with the larger dataset. It revealed problems in the earlier approach, and forced me to revisit some ideas.
  34. A data problem is found. I figure out why the larger training set is behaving so strangely.
  35. New features. I’m still experimenting to find good features for the rain prediction network.
  36. Final thoughts. The project is complete, and working well. Thank you for your attention. This has been a very long series!

Writing a Rain Predictor. Basic Keras code.

The index to the articles in this series is found here.

OK, so we’re about to get to the actual writing of the program that will train our neural network. There are going to be many things going on there, and I wanted to start with a basic view of the network topology, with the accompanying Keras code.

Here’s a sample python program, keras-trial.py. It won’t be part of the final solution, but it demonstrates the way we’ll put pieces together in the rain predictor training code. I omit things like noise layers, pooling layers, and other modifiers that can be inlined trivially into this code.

#! /usr/bin/python3

# Figure out how to make our through-time siamese network with shared
# weights

import keras
from keras.layers import Input, Dense, Concatenate, LSTM
from keras.models import Sequential, Model

import sys
import numpy as np


ring0_pixels = 5
ring1_pixels = 4
timesteps = 2
batch_size = 128
ring0_module_nodes_0 = 4
ring0_module_nodes_1 = 3
ring1_module_nodes_0 = 4
ring1_module_nodes_1 = 3
synth_layer_nodes = 3
num_outputs = 3




# One Input tensor per data partition; each carries shape
# (batch_size, timesteps, pixels in that partition).
ring00 = Input(batch_shape=(batch_size, timesteps, ring0_pixels))
ring01 = Input(batch_shape=(batch_size, timesteps, ring0_pixels))
ring10 = Input(batch_shape=(batch_size, timesteps, ring1_pixels))
ring11 = Input(batch_shape=(batch_size, timesteps, ring1_pixels))

# Two small two-layer dense modules; each will be applied to two
# partitions, so its weights are shared across those partitions.
ring0_model = Sequential()
ring0_model.add(Dense(ring0_module_nodes_0))
ring0_model.add(Dense(ring0_module_nodes_1))

ring1_model = Sequential()
ring1_model.add(Dense(ring1_module_nodes_0))
ring1_model.add(Dense(ring1_module_nodes_1))

# Calling a model on a tensor applies it; reusing the same model on two
# different inputs is what gives the shared-weight (siamese) behaviour.
scanned00 = ring0_model(ring00)
scanned01 = ring0_model(ring01)
scanned10 = ring1_model(ring10)
scanned11 = ring1_model(ring11)


# Join the module outputs along the feature axis at each timestep.
aggregated = Concatenate()([scanned00, scanned01, scanned10, scanned11])


# The recurrent layer analyses the concatenated module outputs through time.
time_layer = LSTM(3, stateful=False, return_sequences=True)(aggregated)

# Synthesis layers: a small dense hidden layer, then the output layer.
synth_layer = Dense(synth_layer_nodes)(time_layer)
output_layer = Dense(num_outputs)(synth_layer)

model = Model(inputs=[ring00, ring01,
                      ring10, ring11],
              outputs=[output_layer])
# model.compile(optimizer='SGD', loss=keras.losses.mean_squared_error)

I wanted to show that I can have multiple neural networks, each operating independently on its own partitions of the input data, with these modules then feeding up into a recurrent network, on to an interpretation layer, and finally to the output.

For simplicity in this demo implementation, I’ve divided my input data into four partitions: ring00, ring01, ring10, and ring11. I create a two-layer dense neural network called ring0_model that acts separately on ring00 and ring01, and a second two-layer dense neural network, ring1_model, that acts on ring10 and ring11. The four output tensors of these two models are then concatenated into a list of values which feeds into the recurrent LSTM layer. This produces a set of outputs that are processed by a dense hidden layer, and then an output layer.

In reality, I’ll have 34 different rings feeding 10 different models. Each of those 10 models will produce some number of outputs that are passed to the LSTM layer for through-time analysis, then on to the synthesis layers for output. There will be 6 time steps passed through LSTM before output is collected for error estimation.

The batch_size is a count of the number of training candidates that will be passed through the network before back-propagation and weight updating occurs. This may be smaller than the total number of training candidates, so there will be multiple weight updates over the course of completing a single pass through the training data (called an epoch).
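
To make the batch and epoch bookkeeping concrete, here’s a minimal sketch of feeding the toy model above some random placeholder data (keras-trial.py already imports numpy as np). The candidate count, optimizer, and loss here are arbitrary stand-ins, not the settings we’ll use in the real training code.

# Continuing from keras-trial.py above; all numbers are placeholders.
num_candidates = 512    # 512 candidates / batch_size of 128 = 4 weight updates per epoch

dummy_inputs = [np.random.random((num_candidates, timesteps, ring0_pixels)),
                np.random.random((num_candidates, timesteps, ring0_pixels)),
                np.random.random((num_candidates, timesteps, ring1_pixels)),
                np.random.random((num_candidates, timesteps, ring1_pixels))]

# Because the LSTM returns sequences, the truth values keep the time dimension.
dummy_truth = np.random.random((num_candidates, timesteps, num_outputs))

model.compile(optimizer='SGD', loss=keras.losses.mean_squared_error)
model.fit(dummy_inputs, dummy_truth, batch_size=batch_size, epochs=2)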

And there we have the bare bones of our Keras implementation. Next, we write the actual code that feeds image data into the network, and start experimenting with settings and parameters.

UPDATE #1 (2019-08-23): Included a link to an index of articles in this series.

Writing a Rain Predictor. Some terms and definitions.

The index to the articles in this series is found here.

I’ve been throwing about a lot of somewhat technical terms, and it occurs to me that I probably should have spent some time explaining them up front.

If your understanding of neural networks is along the lines of saying “a directed graph in layers, with neurons that sum their inputs in a linear combination, add a bias, and feed through a sigmoid function before going to the next layer”, that’s a start, but there’s much more subtlety than that once you get into the details.

Some types of layers:

I’ve mentioned various types of layers before. Note, however, that a neural network can be made without layers at all, though doing so poses its own difficulties of mathematics and implementation.

The simplest layer people usually think of in the context of neural networks is the dense, or fully connected, layer. Such a layer might have N inputs, and consist of M neurons. Each neuron will take, as input, all of the N inputs and produce a single output. The topology is symmetric: from a connection standpoint, all of the neurons look equivalent. Of course, as the layer is trained, weights will change, and the neurons will not behave identically. From an ease of implementation standpoint, one usually treats the bias term as an additional input, with its value set to 1. This means that each neuron has N+1 weights, and there are M neurons, for a total of M*(N+1) weights. In implementation, these will generally be stored in a tensor. Think of a tensor as an array with an arbitrary number of dimensions.
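
As a quick sanity check of that M*(N+1) count, here’s a tiny sketch in Keras; the layer sizes are arbitrary.

from keras.layers import Input, Dense
from keras.models import Model

N, M = 7, 4                          # arbitrary sizes for illustration
inp = Input(shape=(N,))
dense = Dense(M)
model = Model(inputs=inp, outputs=dense(inp))

# Each of the M neurons carries N weights plus one bias: 4 * (7 + 1) = 32.
print(dense.count_params())          # prints 32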

I’ve mentioned convolutional layers a few times in this series. This is a layer that acts on groups of proximate data. The simplest way to think of a convolutional layer is to imagine a system for computer vision. The inputs are separate pixels in the image laid out on a two-dimensional grid, and we can describe certain pixels as being adjacent to certain other pixels. We can construct a neighbour graph that states that the pixel at (10,9) is adjacent to the pixel at (10,8). A convolutional layer, rather than taking the values of the pixels as inputs, performs a convolution operation on the pixels. For each pixel it produces one or more values that are functions of the pixel and its neighbours. If you think of the values on the pixels as representing a function in two dimensions, then the convolution is a function of the function. There are several different convolutions one could apply, including nonlinear ones that involve min() and max(), but a simple linear one would be this: imagine that you want to use only the eight nearest neighbours in a square grid, then you have the centre pixel and the eight pixels that surround it, for a 3×3 grid of pixels with their associated values. The convolution function itself is another 3×3 grid of numbers. The convolution operation is to multiply these two grids together element by element, then add the values of the resulting 3×3 product together to produce a single number. The convolution operation then slides over to the next pixel and the operation repeats. There is a lot more behind convolutional layers, but I’m not going to do a thorough discussion of them here. You might want to imagine a one-dimensional problem, say a noisy signal on a measurement, and an associated kernel that looks like [0.2,0.2,0.2,0.2,0.2], and think about what this convolution would do to the signal.
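
Here’s a small numpy sketch of both of those examples; the patch, signal, and kernels are made up purely for illustration.

import numpy as np

# One step of the 3x3 case: multiply the kernel with the patch element by
# element, then sum the products to get the value for the centre pixel.
patch = np.array([[0., 1., 0.],
                  [2., 3., 2.],
                  [0., 1., 0.]])
kernel = np.full((3, 3), 1.0 / 9.0)      # simple averaging kernel
centre_value = np.sum(patch * kernel)

# The 1-D smoothing example: a noisy signal convolved with [0.2] * 5.
signal = np.array([0.0, 1.0, 0.2, 0.9, 0.1, 1.1, 0.0, 1.0])
smoothed = np.convolve(signal, np.full(5, 0.2), mode='same')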

A sparse layer is the converse of a dense one. It’s a layer in which not all neurons receive all inputs. One typically generates these topologies by hand, due to some specific knowledge of the problem space, though there are automated techniques, such as training a dense network for a while and then forcefully cutting low-weight inputs to zero. The aim here is to reduce computational effort by removing connections that have little bearing on the final answer. There have been reports, however, that these layers yield networks that are prone to convergence problems, so one should bear that possibility in mind when using sparse layers.
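
A crude sketch of that prune-after-training idea, assuming a Keras model; the threshold is an arbitrary placeholder, and, as noted, cutting too aggressively can hurt convergence.

import numpy as np

def prune_small_weights(model, threshold=0.01):
    """Zero out any weight whose magnitude falls below threshold."""
    for layer in model.layers:
        weights = layer.get_weights()
        pruned = [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]
        layer.set_weights(pruned)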

A recurrent layer is one that feeds back into itself. That is, its inputs at time N+1 include one or more values derived from a function of its outputs from one or more earlier timesteps. The details of this function lead to different types of recurrent networks, including simple, GRU, and LSTM. A recurrent neural network is one way in which the network can be designed to track behaviour through time, which is exactly what I’m trying to do in the rain predictor.
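
To illustrate the feedback, here’s a bare-bones sketch of the simple recurrent update in plain numpy; GRU and LSTM layers add gating machinery on top of this basic pattern.

import numpy as np

def simple_recurrent_pass(xs, W, U, b, h0):
    """Run a sequence of input vectors through one simple recurrent layer.

    At each timestep the output depends on the current input and on the
    previous output, which is what lets the layer track behaviour through time.
    """
    h = h0
    outputs = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        outputs.append(h)
    return outputs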

These four cases make a good introduction to the kinds of active layers that are often used in neural network problems. I’m not including layers that Keras and TensorFlow define but which I think of more as topological transformations, such as pooling or merge layers, or simple functional layers where each neuron takes a single input, applies a mathematical operation on that one value (such as computing its square or adding noise) and produces a new value.

Activation functions

Without a nonlinear activation function, the outputs of a neural network are a simple linear combination of inputs (including bias), which means it can be represented by a single matrix, so the layers could be collapsed into a single layer. To get interesting behaviour, you need a nonlinear activation function. This is the function that is applied to the linear combination of inputs to produce the output.

Keras supplies several activation functions, and the choice of function is a bit subtle. Typically one doesn’t use sigmoid or tanh activation functions on the intermediate layers because it’s easy to find oneself in the regime of small but non-zero derivatives, which causes significant slowdowns in the training process, since it depends on gradient descent. Common choices for the intermediate layers are ReLU, leaky ReLU, and hard sigmoid, all of which have bounded derivatives that exclude small non-zero values.

For our output layer, we will be using sigmoid functions, since we’re looking for a 0/1 distinction, and sigmoid is the obvious candidate.
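
Here’s roughly how those choices look in Keras; the layer sizes are placeholders, and note that leaky ReLU is added as its own layer rather than through the activation argument.

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(10,)))   # hidden layer, ReLU
model.add(Dense(16))                                         # hidden layer...
model.add(LeakyReLU(alpha=0.1))                              # ...with leaky ReLU
model.add(Dense(2, activation='sigmoid'))                    # output layer, sigmoid for 0/1 bits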

Regularization

There are some common problems that can arise in neural networks. Regularization can help address at least two big ones: overfitting and unrealistically high weights on certain inputs.

Overfitting is a common concern. It’s one reason I’ve spent so much effort on minimizing neurons in our neural network. With enough parameters in play, you can run a line through any collection of points. A neural network might become obsessively accurate at reproducing the training data, at the expense of generality. This is what is generally referred to as overfitting. We will address this concern statistically in a later posting.

The other common problem is an unrealistically high dependence on a particular input. The neural network might sniff out some coincidental correlation between some subset of the inputs and the desired outputs, and put a lot of weight on that. This is, in effect, another manifestation of overfitting, but stands a bit in its own category.

Regularization is how one typically avoids overfitting. A first obvious technique is referred to as “early stopping”. As you continue to train, your network becomes better at matching the training data. It may, however, start to drift ever further from your validation data (you will have validation data). Early stopping just asserts that once the network’s performance on the validation data starts to degrade, you stop your training. Naturally, it’s not quite that simple: the agreement with the validation data can worsen before ultimately improving further, so various heuristics are used to decide when to stop training.
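
Keras offers this as the EarlyStopping callback. A minimal sketch, with the patience value as a placeholder for whatever heuristic we settle on:

from keras.callbacks import EarlyStopping

# Stop once validation loss has gone 10 epochs without improving, and roll
# back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)

# model.fit(train_x, train_y, validation_data=(val_x, val_y),
#           epochs=500, callbacks=[early_stop])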

Some other regularization techniques rely on the observation that neural networks are, or should be, fairly robust in the face of neuron dropouts or noisy data. You train the network while it is faced with problems like the deletion of a significant fraction of its nodes. Between training passes, the nodes that are deleted change. This prevents any small set of nodes from dominating the behaviour of the system, because if some of those nodes are deleted, the network will suddenly perform poorly, and will train away from that configuration.

One can also introduce noise in the outputs of one layer, as inputs to the next. This helps to force the network to configure itself for a broader volume of the phase space of inputs, thereby becoming more general, and less likely to overfit.

Finally, to avoid the problem of extremely large coefficients in some places in the network, a penalty term can be applied in the error estimation simply due to the presence of large coefficients. This causes the network to push itself away from configurations that have large coefficients.
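
In Keras, these last few techniques show up as the Dropout and GaussianNoise layers and as kernel regularizers attached to ordinary layers. A sketch, with all sizes, rates, and penalty strengths as placeholder values:

from keras.models import Sequential
from keras.layers import Dense, Dropout, GaussianNoise
from keras.regularizers import l2

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(20,),
                kernel_regularizer=l2(0.001)))   # penalize large weights
model.add(Dropout(0.25))                         # drop a quarter of the outputs each training pass
model.add(GaussianNoise(0.1))                    # perturb what the next layer sees
model.add(Dense(3, activation='sigmoid'))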

One regularization technique that is sometimes applied is to delete weights that are small but non-zero, and set them to zero. This is helpful for reducing network complexity, but, as noted above, it is suggested that convergence suffers if this is applied too liberally.

In conclusion

So, that’s a brief overview of some of the terms I’ve been throwing around in this set of articles. One can practically write an entire book around the content of each paragraph above, so there’s a lot more detail to explore.

UPDATE #1 (2019-08-23): Included a link to an index of articles in this series.

Writing a Rain Predictor. Designing our neural network

The index to the articles in this series is found here.

Machine learning is, as they say, a very active field. There are many ways we can go from here, and we’re likely to try a few different approaches as we explore this problem.

Now, as I’ve alluded to in previous articles, my initial thought is to make this a modular neural network. Divide the input space into separate modules that are then processed, rather than producing a network that cares intimately about the details of individual pixels. Rain far away is probably not as interesting as rain close to Ottawa, so we can use bigger buckets of points far away. I described a dartboard-like module layout. Here’s an example of what I’m thinking might work:

Modules for the neural network topology

Here, the radar station is at the red star, and Ottawa is roughly on the green spot. This divides the pictures into 33 clusters.

I’d also like to add another module. I’ll call this a tripwire. It will process a ring of pixels around Ottawa, at a distance of, say, 30 pixels. Our graphs have a resolution of one kilometre per pixel, so this represents a sensitive region extending roughly an hour out of Ottawa, for moderate wind speeds. If rain starts falling in this tripwire, there’s a good chance that rain is imminent in Ottawa, and I want my neural network to pay particular attention to this condition.
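
Here’s a sketch of how such a tripwire ring might be pulled out of a rain image with numpy. The pixel coordinates of Ottawa are hypothetical placeholders; the real values come from the data preparation scripts.

import numpy as np

OTTAWA_ROW, OTTAWA_COL = 240, 260       # hypothetical pixel coordinates
TRIPWIRE_RADIUS = 30                    # pixels, roughly 30 km at 1 km per pixel

def tripwire_mask(shape, radius=TRIPWIRE_RADIUS, width=1.0):
    """Boolean mask selecting the ring of pixels about `radius` from Ottawa."""
    rows, cols = np.indices(shape)
    dist = np.hypot(rows - OTTAWA_ROW, cols - OTTAWA_COL)
    return np.abs(dist - radius) <= width

# Usage: tripwire_values = rain_image[tripwire_mask(rain_image.shape)]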

These modules will do some analysis of the pixels they contain, and then will produce a set of outputs. We haven’t decided yet how many numbers will be output, we’ll experiment a bit later.

So, that’s a logical description of the layout, now to details. I’m expecting each module to involve a dense neural network of two layers. I’m not planning to make them convolutional to begin with, but might try that later, as convolutional layers might be able to distinguish incoming cold fronts, where a line of rain crosses the Ottawa Valley. In an earlier posting I dismissed the idea of using convolutional layers, but on further consideration, I’m not going to rule it out now.

Now, what about independence? Might it be possible to use a single network to process all modules in the outer ring, another for the ring inward of that, and so on? This would reduce the number of weights in our model almost eight-fold, and might therefore make the problem more tractable in terms of training time and memory consumption. This is tempting, but remember the issue with transformations that aren’t part of the symmetry group of the square grid. We have D_2 and D_4 symmetry groups available, which means that we can do rotations of 90 degrees without loss, but 45 degree rotations are a bit tricky: we lose the guarantee of a 1-1 mapping, and we can’t ensure that all modules have exactly the same number of pixels in them. So, as a compromise, we’ll use two independent networks per ring. Two adjacent modules will have independent neural networks, and then rotations of 90 degrees will span the entire ring. We can build a mapping of pixels to modules that guarantees identical geometries and numbers of pixels, preserving neighbour relationships.

Above these modules, I expect to have at least two more dense layers that process the inputs from the modules and compute the outputs. These layers will be relatively light-weight. If each module produces 5 outputs, and we have 34 modules, then we’ve got about 170 inputs to the synthesis layers at the top.

And, what about time dependence? I’m not thinking of using embedding at this time, where we treat the time dimension as just one more spatial dimension and so wind up with six times as many inputs and six times as many weights. I don’t really think that the output at time T+1 depends on subtle calculations of the input at pixel P1 at time T-3 and also pixel P2 at time T-1 and pixel P3 at time T, so having all that data on hand and feeding into the same network simultaneously seems undesirable.

The way I’m planning to handle the time series is with recurrent layers. I’m thinking GRU or LSTM layers, rather than simple recurrent neural network layers. My initial thought was that the recurrence would be in the modules, but I’ve changed my mind. The problem we’re trying to solve isn’t best described by the time evolution within a module (geometric sector), but the time evolution of movement between modules. So, the recurrent layer(s) would be in the synthesis layers above the modules.

Here’s a layout of the flow of data in this topology. The arrows indicate flow of information, not single values. Images are fed into the modules, and some analysis is done. The modules produce some information which is fed into one or more recurrent layers whose job it is to analyse the time-dependent behaviour of the system. This then feeds into one or more layers that roll everything up into our rain predictions.

So, what I’m imagining the network is probably going to do when it has trained is that the modules will produce some sort of proxies for rain intensity, radial movement, and tangential movement of rain structures. The layer(s) above will tie together this information from different geometric sectors through time, along with the tripwire module, to produce some outputs that are related to how rain is moving through the system. The top layer(s) then produce the outputs.

In the next posting, I plan to review some terminology and approaches, for those who are not yet familiar with the details of neural network programming.

UPDATE #1 (2019-08-23): Included a link to an index of articles in this series.

Writing a rain predictor. Domain knowledge.

The index to the articles in this series is found here.

So, I’ve got a baseline image I can use to extract active precipitation data from the government website. Now, can we just throw everything into a neural network?

The temptation is there to let the network do everything, but we have to understand our problem well enough to make sure we are using the network well. After all, neural networks turn into big black boxes; you can’t easily pull out weights one at a time and examine them to see if they look right.

Well, our time series of data for rainfall is going to have clouds moving more or less in a straight line and at a speed in the tens of kilometres per hour. Most of these events will move from roughly WSW towards ENE, but we should expect that rain might blow in from any direction. The problem is that my training set almost certainly doesn’t have any data for rain blowing in from the East. If I train with the data I have, the neural network will decide that rainfall to the East of Ottawa can never cause rainfall in Ottawa, and so input weights for pixels on that side will be zero or small. Then, when rain really does come in from there, the neural network will be surprised, and I will get wet.

It might be nice, then, to synthesize data for rain coming from the East. The most obvious way to do this is to rotate the incoming data. Note, though, that I don’t want the position of Ottawa itself to change on the plot, and the radar dish is not in Ottawa, but some distance to the South-West, in Franktown. Now, by rotating the input data we are assuming that weather patterns that approach from the East are very similar to those that approach from the West: that they display a similar lack of large deflections, have similar speeds of approach, and that the unconsidered causal agents responsible for the weather (such as fronts, temperature trends, moisture loss, etc.) can be treated as symmetrical. This assumption would probably not be true for a place like Hamilton, where lake effect snows mean that precipitation to the East is qualitatively different from that to the West, or Vancouver, with a huge ocean to the West and snow-covered mountains to the East. These places don’t demonstrate the degree of rotational symmetry that we consider important for such a simple data synthesis method. Ottawa, however, far from lakes and in a large plain, with only relatively low hills in the vicinity, doesn’t have any obvious external asymmetries aside from that imposed by the prevailing winds.

If you look at the map in part 1 of this series, you’ll see that this data rotation approach on the Franktown radar station data wouldn’t work very well at all for rain in Montreal, as it is near the East edge of the imaging area, so we don’t actually have much data for rain inbound from East of Montreal. For Ottawa, though, we might be able to manage. Montreal has its own McGill radar station, so we wouldn’t be using Franktown data for Montreal in any case.

Rotations are a bit tricky on square data. We can easily rotate by steps of 90 degrees, because those operations are part of the symmetry group of the square (D_4 for all you group theory enthusiasts out there). Outside of those simple rotations, issues arise with floating-point roundoff, voids, and many-to-one mappings. If we have two adjacent pixels with value ‘3’ in the unrotated input set, and these pixels, after rotation, project partially onto four different pixels, how are we going to assign values to those pixels? The ‘3’ indicates intensity, so we don’t want to smooth it out to 1.5 on each pixel, and we’re using discrete intensities rather than continuous, so it might be awkward to try to sum up contributions on pixels in the rotated dataset. For now, we will just use the three available 90 degree rotations, as they do not distort the data, and we’ll evaluate the system’s behaviour to see if smaller rotations are necessary for good predictions.
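
A minimal sketch of generating those lossless rotations with numpy. It rotates the whole array about its centre and glosses over the fact that Ottawa sits off-centre relative to the radar, something the real pipeline has to account for when recomputing the true outcomes for the rotated data.

import numpy as np

def synthetic_rotations(rain_array):
    """Return the three lossless 90-degree rotations of a rain-intensity array."""
    return [np.rot90(rain_array, k) for k in (1, 2, 3)]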

We think, because our radar image is a disc, and rain travels more or less in straight lines, that radial movement of rain will be important. That is, rain moving toward the radar station will influence the likelihood of rain in Ottawa. Also, we’re probably not too worried about the fine details of rain falling far away, but close to Ottawa we need some finer resolution. So, let’s cut up the space a bit like a dartboard. We’ll cut the disc into a certain number of equal-width wedges, and each wedge will be cut by arcs of fixed radius. We’ll adjust the parameters of these regions to suit our model later.

So, do we centre these regions on Ottawa, or on Franktown? I’m going to centre them on the radar facility itself, because that allows for the possibility that, some time down the road, we decide to set up a system that monitors rain in two different places. We can’t centre our regions of interest on both; to do that would just be to say we’re creating two completely independent neural networks. At least to begin with, then, we’ll centre our division of the disc on the radar facility in Franktown.
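
Here’s a sketch of that dartboard partition, centred on the radar station. The wedge count, ring radii, and the radar’s pixel coordinates are all placeholder values that we’ll tune later.

import numpy as np

NUM_WEDGES = 8
RING_RADII = [30, 60, 120, 240]      # km, equal to pixels at 1 km per pixel
RADAR_ROW, RADAR_COL = 240, 240      # hypothetical centre of the image (Franktown)

def module_index(row, col):
    """Map a pixel to its (ring, wedge) module on the dartboard."""
    dy, dx = row - RADAR_ROW, col - RADAR_COL
    ring = int(np.searchsorted(RING_RADII, np.hypot(dy, dx)))
    angle = np.arctan2(dy, dx) % (2.0 * np.pi)
    wedge = int(angle // (2.0 * np.pi / NUM_WEDGES))
    return ring, wedge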

Why aren’t we using convolutional neural networks? Well, that’s something we’ll keep in mind for later, but they’re the sort of thing you would use to investigate the shape of rainfall patterns within a region, and, at least for now, we think rainfall prediction is not so strongly dependent on the fine-scale features of the rainfall intensities within a region of space. So, my inputs are going to be broken into these disjoint regions, and the first layer of our network will only consider values within these regions, not between them. This is called a modular neural network: a module of neurons looks only at pixels in its area of responsibility, and feeds some outputs up to the next layer for more analysis. The higher layer(s) will be able to work with outputs from multiple modules. For now, I’m not going to say how many outputs each of these first-layer regions will produce. With only one output, the network would probably train itself to something like “fraction of pixels in the region which show rain”, but it’s likely that we need a bit more information than that, so we will keep in mind the possibility that these regions will each be allowed to produce multiple outputs if we decide that’s necessary. Also, the region(s) immediately around Ottawa are special. In there, we probably are interested in the fine-scale details of rainfall amounts, so we’ll either want to make those regions quite small, or we’ll want to expose those pixels separately to the higher layer of the network. It’s something to keep in mind.

OK, so that’s a basic outline of our proposed neural network topology. What about training data? We need some outcomes to validate predictions.

We’ll pick four pixels near the middle of Ottawa, and look at the rainfall in those pixels. If any of those pixels shows any rainfall, our true outcome will have the ‘rain’ bit set. If at least one pixel shows heavy rain, the ‘heavy rain’ bit will be set as well. Since our bits are binned into one-hour predictions and we have 6 radar images per hour, we’ll say that any rain in any one of those six radar observations will result in the corresponding bits being set.
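
Here’s a sketch of how those true bits might be pulled from an hour’s worth of radar frames. The four pixel coordinates and the heavy-rain threshold are hypothetical placeholders standing in for the real values.

OTTAWA_PIXELS = [(240, 260), (240, 261), (241, 260), (241, 261)]  # hypothetical
HEAVY_THRESHOLD = 3                                               # intensity code for "heavy"

def hour_bits(six_frames):
    """Return (rain_bit, heavy_bit) for the six radar frames covering one hour."""
    values = [frame[r, c] for frame in six_frames for (r, c) in OTTAWA_PIXELS]
    rain_bit = int(any(v > 0 for v in values))
    heavy_bit = int(any(v >= HEAVY_THRESHOLD for v in values))
    return rain_bit, heavy_bit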

But, as always, there are complications. Recall in the previous article that there’s a close-range phantom rainfall observed when the skies are clear and it’s humid out. Ottawa lies within this region of phantom rainfall, which means that a naive extraction would conclude that it’s almost always raining over Ottawa on muggy days. Here’s an example of such phantom rainfall:

That rain near the white cross isn’t really there

What rule are we going to use to distinguish this phantom rainfall from real rain? Well, domain knowledge time again. These regions of phantom rainfall only appear as the lightest rainfall rate, and appear as scattered pixels rather than solid regions. In real light rainfall, we always see large areas of light rain, usually with heavier rain inside. So, that’s going to be our rule. If there are only light blue active pixels within, say, 20 pixels of the radar facility, and if the density of such pixels is less than 0.5, then we’ll say that the purported rain in Ottawa is, in fact, not there. We might have to tune this condition later, but this will make a starting point.
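
Here’s that rule sketched as a filter. The radar coordinates, the intensity code for the lightest rain, and the two cutoffs are the tunable placeholders described above.

import numpy as np

RADAR_ROW, RADAR_COL = 240, 240     # hypothetical radar pixel coordinates
LIGHTEST = 1                        # intensity code of the lightest rainfall rate
NEAR_RADIUS = 20                    # pixels
DENSITY_CUTOFF = 0.5

def is_phantom_rain(rain_array):
    """True if the only rain near the radar is sparse, lightest-intensity speckle."""
    rows, cols = np.indices(rain_array.shape)
    near = np.hypot(rows - RADAR_ROW, cols - RADAR_COL) <= NEAR_RADIUS
    near_values = rain_array[near]
    if not (near_values > 0).any():
        return False                        # no rain at all, nothing to discount
    if (near_values > LIGHTEST).any():
        return False                        # heavier pixels present: real rain
    return (near_values > 0).mean() < DENSITY_CUTOFF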

OK, so we’ve laid out a preliminary approach. We might discover after testing that we need to do things differently, but we now have a viable first attempt.

In our next posting, we’ll process the input data to make it suitable for feeding to the neural network.

UPDATE #1 (2019-08-23): Added a link to an index page of articles in this series.