Writing a rain predictor, preparing the data

The index to the articles in this series is found here.

It’s time to get a baseline reference: the radar image as it would appear with no rain anywhere. I picked out three images that were quite clean. This isn’t trivial, as the radar seems to produce false short-range returns on clear, humid days. I assume this is because, in the absence of any precipitation, there’s no strong reflected signal, and the radar analysis interprets some close-range backscatter from the air as slight rainfall. As a result, we often see light blue pixels surrounding the radar station when there isn’t rain elsewhere. Still, I found three images that, voting pixel by pixel, produce a good consensus.

Here’s the code I used to analyse those .gif files and produce a consensus image:

#! /usr/bin/python3

# This script reads in three .gif files and produces a new file in
# which each pixel is set to the majority value from the three inputs.
# If there is no majority value (i.e. all three files have a different
# value at that point), we exit with an error so that a better set of
# inputs can be found.

# We are using this script to analyse machine-generated files in a
# single context.  While the usual programming recommendation is to be
# very permissive in what formats you accept, I'm going to restrict
# myself to verifying consistency and detecting unexpected inputs,
# rather than trying to handle all of the possible cases.

# This is a pre-processing step that will be used by another script
# that reads .gif files.  Therefore it is reasonable to make this
# script's output be a .gif itself.

# The script takes 4 arguments.  The first three are the names of the
# input files.  The fourth is the name of the output file.

# The script will return '1' on error, '0' for success.

import sys
import gif


class SearchFailed(Exception):
    def __init__(self, message):
        self.message = message


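def find_index_of_tuple (list_of_tuples, needle, hint = 0):
    # Try the hinted index first, it is usually correct.
    if ( 0 <= hint < len(list_of_tuples)
         and list_of_tuples[hint] == needle ):
        return hint
    # Otherwise fall back to a linear search of the whole table.
    for i, candidate in enumerate(list_of_tuples):
        if candidate == needle:
            return i
    raise SearchFailed('Tuple {0} not found in list.'.format(needle))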


if len(sys.argv) != 5:
    print ("Require 3 input filenames and 1 output filename.")
    sys.exit(1)

file = [None, None, None]
reader = [None, None, None]

for i in range(3):    
    try:    
        file[i] = open(sys.argv[i+1], 'rb')
    except OSError as ex:
        print ("Failed to open input file: ", sys.argv[i+1])
        print ("Reason: ", ex.strerror)
        sys.exit(1)
    reader[i] = gif.Reader()
    reader[i].feed(file[i].read())
    if ( not reader[i].is_complete()
         or not reader[i].has_screen_descriptor() ):
        print ("Failed to parse", sys.argv[i+1], "as a .gif file")
        sys.exit(1)

# OK, if we get here it means we have successfully loaded three .gif
# files.  The user might have handed us the same one three times, but
# there's not much I can do about that, it's entirely possible that we
# want to look at three identical but distinct files, and filename
# aliases make any more careful examination of the paths platform
# dependent.

# So, we're going to want to verify that the three files have the same
# sizes.

if ( reader[0].width != reader[1].width
     or reader[1].width != reader[2].width
     or reader[0].height != reader[1].height
     or reader[1].height != reader[2].height ):
    print ("The gif logical screen sizes are not identical")
    sys.exit(1)

for i in range(3):
    if ( len(reader[i].blocks) != 2
         or not isinstance(reader[i].blocks[0], gif.Image)
         or not isinstance(reader[i].blocks[1], gif.Trailer)):
        print ("While processing file: ", sys.argv[i+1])
        print ("The code only accepts input files with a single block of "
               "type Image followed by one of type Trailer.  This "
               "constraint has not been met, the code will have to be "
               "changed to handle the more complicated case.")
        sys.exit(1)
    
    
# Time to vote

try:
    writer = gif.Writer (open (sys.argv[4], 'wb'))
except OSError as ex:
    print ("Failed to open output file: ", sys.argv[4])
    print ("Reason: ", ex.strerror)
    sys.exit(1)

output_width = reader[0].width
output_height = reader[0].height
output_colour_depth = 8
output_colour_table = reader[0].color_table
output_pixel_block = []

for ind0, ind1, ind2 in zip(reader[0].blocks[0].get_pixels(),
                            reader[1].blocks[0].get_pixels(),
                            reader[2].blocks[0].get_pixels()):
    tup0 = reader[0].color_table[ind0]
    tup1 = reader[1].color_table[ind1]
    tup2 = reader[2].color_table[ind2]

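    # Voting
    if ( tup0 == tup1 or tup0 == tup2):
        output_pixel_block.append(ind0)
    elif ( tup1 == tup2 ):
        try:
            newind = find_index_of_tuple(output_colour_table,
                                         tup1, ind1)
            output_pixel_block.append(newind)
        except SearchFailed as ex:
            print ('The colour table for file {0} does not hold the '
                   'entry {1} that won the vote.  You may be able '
                   'to fix this problem simply by reordering your '
                   'command-line arguments.'.format(sys.argv[1], tup1))
            sys.exit(1)
    else:
        # All three inputs disagree, so there is no majority value.
        print ('No majority value at a pixel: the three inputs hold '
               '{0}, {1} and {2}.  Try a different set of input '
               'files.'.format(tup0, tup1, tup2))
        sys.exit(1)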

writer.write_header()
writer.write_screen_descriptor(output_width, output_height,
                               True, output_colour_depth)
writer.write_color_table(output_colour_table, output_colour_depth)
writer.write_image(output_width, output_height,
                   output_colour_depth, output_pixel_block)
writer.write_trailer()

So, what does this do? After verifying that it received the correct number of arguments, that it can open the three inputs, and that the input files are all valid .gif files, it checks to make sure they all have the same image dimensions.

Now, it would be a bit more work to support multiple image blocks, though the GIF specification does allow that. So, I verified that these files from the government website do not use multiple image blocks, and coded in a check. This script will exit with an error if it is presented with such files. This way I don’t have to write the extra code unless some future change forces me to accept the more complicated format.

Now, the files I chose did not have identical colour tables, but the tables differed only in the ordering. This might not always be true, but it is at the moment. I use the colour table from the first input .gif as my output colour table. Then, I walk through the pixels in the three files and look up the colour tuple for that pixel in each file’s own colour table. If the first file agrees with either the second or the third on that tuple, then we simply append the first file’s colour table index to the output pixels. If the first disagrees, but the second and third agree, then we have to find the index of this tuple in the output colour table. It’s probably the same, so we hint with the offset into the colour table of the second file, but my function will walk the entire colour table if it has to, to find an index matching that tuple. If it fails to do so, that’s an error, and we exit. If all three files disagree there is no majority, and, as the header comment says, the script should exit with an error so that a better set of inputs can be found.
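To make the voting rule concrete, here is a minimal standalone sketch of the per-pixel decision. It does not use the gif module at all. The function name vote_pixel and the three tiny colour tables are hypothetical, made up for illustration, and it raises ValueError where the script above prints a message and exits.

```python
def vote_pixel(tables, indices):
    """Return the index into tables[0] of the majority colour.

    tables  -- three colour tables, each a list of (r, g, b) tuples
    indices -- the pixel's colour table index in each of the three files
    Raises ValueError if there is no majority, or if the winning colour
    is missing from the first table.
    """
    tup0, tup1, tup2 = (tables[k][indices[k]] for k in range(3))
    if tup0 == tup1 or tup0 == tup2:
        # The first file is in the majority, so its index is usable as-is.
        return indices[0]
    if tup1 == tup2:
        # Files two and three outvote the first.  Try the second file's
        # index as a hint, then search the whole first table.
        hint = indices[1]
        if hint < len(tables[0]) and tables[0][hint] == tup1:
            return hint
        return tables[0].index(tup1)   # raises ValueError if absent
    raise ValueError("no majority among {0}, {1}, {2}".format(tup0, tup1, tup2))


# Hypothetical colour tables that differ only in ordering.
table_a = [(0, 0, 0), (255, 255, 255), (153, 204, 255)]
table_b = [(255, 255, 255), (0, 0, 0), (153, 204, 255)]
table_c = [(153, 204, 255), (255, 255, 255), (0, 0, 0)]
tables = [table_a, table_b, table_c]

# File 0 shows light blue here, while files 1 and 2 both show black.
winner = vote_pixel(tables, (2, 1, 2))
print(winner, table_a[winner])
```

In this example the hint points at the wrong entry of the first table, so the fallback search is what finds black at index 0.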

Finally, we write out the consensus .gif file, and exit normally.

In the next article we’ll have a discussion of how to set up the neural network.

UPDATE #1 (2019-08-23): Included a link to an index of articles in this series.

A machine learning project

The index to the articles in this series is found here.

Well, four years ago I mentioned that I was going on a brief hiatus, and there hasn’t been very much here since then. Turns out that having a baby in the house does eat into the free time a bit. Now, though, I find myself with some more free time, after the parent company closed the entire Ottawa office and laid off the staff here. If anybody’s looking for an experienced mathematical programmer with a doctorate in physics, get in touch.

So, here’s a project I was about to start four years ago. I had collected some training data, but never got the project itself started.

I like to bicycle in the summer time, but I don’t like to ride in the rain. So, when I remember, I check the local weather radar and look for active precipitation moving toward the city. I can decide from that whether to go for a bicycle ride, and whether to ride to work, or find another way to get to the office.

The weather radar website, https://weather.gc.ca/radar/index_e.html?id=XFT, shows an hour of rain/snow detection at 10 minute intervals, played on a loop. You can look at the rain and guess how long it will take to get to the city. This won’t help you if rain forms directly over the city, but most of the time the rain moves into town, rather than beginning here.

The interpretation of these sequences seemed to me to be something I could automate. Maybe have a program that sends a warning or email to my cellphone if rain is imminent, in case I’m out on the bike.

I collected over 11000 .gif files by downloading individual files via a cron job. The images don’t have an embedded copyright message, and are government-collected data, but I’m not confident that this gives me the right to make this dataset available online, so I will satisfy myself with reproducing a single example for illustrative purposes. Here is a typical downloaded image:

The city of Ottawa is located roughly North-East of the white cross, just South of the Ottawa river that runs dominantly West to East. Near the right edge of the active region you can see the island of Montreal.

The very light blue represents light rainfall, something you might barely notice while riding a bicycle. Anything at the bright green or higher would be something I would try to wait out by sheltering under a bridge or similar construction. Weather patterns in this area, as in much of the continent, are dominantly blown from the West to the East, though there are some exceptions, and we will, very occasionally, have storms blow in from the East.

So, here’s the project. I haven’t actually written code yet, so we’ll explore this together. I would like to set up a neural network that can watch the radar website, downloading a new image every 10 minutes, and use this to predict 10 binary states. The first five values will be the network’s confidence (I’m not going to call it probability) that there will be any rain at all in the next 0 to 1 hours, 1 to 2 hours, 2 to 3 hours, and so on out to 5 hours. The next five values will be the confidence of heavy rain, defined as rain at the bright green or higher level, in the same intervals.
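As a rough illustration of what those 10 values could look like as training targets, here is a sketch that turns a run of future observations into the ten binary states. Everything in it is an assumption for illustration only: the function name make_targets, the idea of summarizing each future frame as a single integer rain level over the city, and the HEAVY_THRESHOLD value standing in for "bright green".

```python
FRAMES_PER_HOUR = 6     # one radar image every 10 minutes
HOURS = 5
HEAVY_THRESHOLD = 4     # hypothetical level standing in for "bright green"


def make_targets(future_levels):
    """Build the 10 binary targets from 30 future intensity readings.

    future_levels[k] is the (assumed) peak rain level over the city in
    the k-th frame after the present, with 0 meaning no rain.
    """
    expected = FRAMES_PER_HOUR * HOURS
    if len(future_levels) != expected:
        raise ValueError("expected {0} frames".format(expected))
    any_rain = []
    heavy_rain = []
    for hour in range(HOURS):
        start = hour * FRAMES_PER_HOUR
        window = future_levels[start:start + FRAMES_PER_HOUR]
        any_rain.append(1 if max(window) > 0 else 0)
        heavy_rain.append(1 if max(window) >= HEAVY_THRESHOLD else 0)
    # First five entries: any rain.  Last five: heavy rain.
    return any_rain + heavy_rain


# Dry for the first hour, light rain in hour two, heavy rain in hour three.
levels = [0] * 6 + [0, 1, 1, 2, 1, 0] + [2, 3, 5, 4, 2, 1] + [0] * 12
print(make_targets(levels))
```

The network’s outputs would then be compared against vectors like this one during training.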

Ideally, this network would also update itself continuously, as more data became available.

This isn’t a substitute for the weather forecasts made by the experts at Environment Canada, who use a lot more to inform their forecasts than just the weather radar in the area. Instead, it aims to answer a different question. My project will try to estimate only the confidence of rain specifically in the city of Ottawa, and over a relatively short projection interval of no more than 5 hours. It’s answering a more precise question, and I hope it turns out to give me useful information.

Now, we might be tempted to just throw the raw data at a neural network along with indications of whether a particular image is showing that it is raining in Ottawa, but we don’t have an unlimited data set, and we can probably help the process along quite a bit by doing some preliminary analysis. This isn’t feature selection; our input set is really a bit too simple for any meaningful feature selection, but we can give the algorithm a bit of a head start.

The first thing we’ll want to do is to pull out the background image. The radar image shows precipitation as colours overlaid on a fixed background. If we know what that background is in the absence of any rain, we can call that ‘0’ everywhere in the inputs, and any pixels that differ will be taken as coming from rain, with a value that increases as we climb that scale on the right side of the sample image.
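Here is a small sketch of that idea, working on plain lists of RGB tuples rather than real .gif data. The SCALE colours are placeholders, not the real radar legend, and the name rain_levels is hypothetical. Treating a differing pixel whose colour is not on the scale as background is also just a simplifying assumption.

```python
# Hypothetical colour scale, lightest rain first.  The real radar legend
# has more steps and different RGB values.
SCALE = [(153, 204, 255), (0, 153, 255), (0, 255, 102)]
LEVEL_OF = {colour: level for level, colour in enumerate(SCALE, start=1)}


def rain_levels(baseline, frame):
    """Return a grid of ints: 0 for background, 1..len(SCALE) for rain."""
    out = []
    for base_row, frame_row in zip(baseline, frame):
        row = []
        for base_px, px in zip(base_row, frame_row):
            if px == base_px:
                # Same as the rain-free reference, so no rain here.
                row.append(0)
            else:
                # Colours that are not on the scale are treated as
                # background; that is a simplifying assumption.
                row.append(LEVEL_OF.get(px, 0))
        out.append(row)
    return out


GREY = (128, 128, 128)
baseline = [[GREY, GREY, GREY],
            [GREY, GREY, GREY]]
frame = [[GREY, SCALE[0], GREY],
         [SCALE[2], GREY, GREY]]
print(rain_levels(baseline, frame))
```

The result is a grid of small integers, which is a more convenient input for a network than raw colour table indices.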

I’ll pick out three images that are rain-free to my eye. There might be tiny pockets of precipitation that escape my notice, but by choosing three that appear clean and letting them vote on pixel values, I should have a good base reference.

We’ll be writing this project in Python 3, with Keras interfacing onto TensorFlow.

The next posting will cover the baseline extraction code.

UPDATE #1 (2019-08-20): I’ve made the source files I’m posting in this series available on GitHub. You can download them from https://github.com/ChristopherNeufeld/rain-predictor. I’ll continue to post the source code in these articles, but may not post patches here; I’ll just direct you back to the GitHub tree for history and changes.

UPDATE #2 (2019-08-23): Added a link to an index page.