Writing a rain predictor, preparing the data

The index to the articles in this series is found here.

It’s time to get a baseline reference: the radar image as it would appear with no rain anywhere. I picked out three images that were quite clean. This isn’t trivial, as the radar seems to produce false short-range returns on clear, humid days. I assume this is because, in the absence of any precipitation, there’s no strong reflected signal, and the radar analysis interprets some close-range backscatter from the air as slight rainfall. This means that we often have light blue pixels surrounding the radar station when there isn’t rain elsewhere. Still, I found three images that, put to a pixel-by-pixel vote, produce a good consensus.

Here’s the code I used to analyse those .gif files and produce a consensus image:

#! /usr/bin/python3

# This script reads in three .gif files and produces a new file in
# which each pixel is set to the majority value from the three inputs.
# If there is no majority value (i.e. all three files have a different
# value at that point), we exit with an error so that a better set of
# inputs can be found.

# We are using this script to analyse machine-generated files in a
# single context.  While the usual programming recommendation is to be
# very permissive in what formats you accept, I'm going to restrict
# myself to verifying consistency and detecting unexpected inputs,
# rather than trying to handle all of the possible cases.

# This is a pre-processing step that will be used by another script
# that reads .gif files.  Therefore it is reasonable to make this
# script's output be a .gif itself.

# The script takes 4 arguments.  The first three are the names of the
# input files.  The fourth is the name of the output file.

# The script will return '1' on error, '0' for success.

import sys
import gif


class SearchFailed(Exception):
    def __init__(self, message):
        self.message = message


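def find_index_of_tuple (list_of_tuples, needle, hint = 0):
    # Try the hinted position first, then fall back to a full scan.
    if ( 0 <= hint < len(list_of_tuples)
         and list_of_tuples[hint] == needle ):
        return hint
    for i, candidate in enumerate(list_of_tuples):
        if candidate == needle:
            return i
    raise SearchFailed('Tuple {0} not found in list.'.format(needle))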


if len(sys.argv) != 5:
    print ("Require 3 input filenames and 1 output filename.")
    sys.exit(1)

file = [None, None, None]
reader = [None, None, None]

for i in range(3):    
    try:    
        file[i] = open(sys.argv[i+1], 'rb')
    except OSError as ex:
        print ("Failed to open input file: ", sys.argv[i+1])
        print ("Reason: ", ex.strerror)
        sys.exit(1)
    reader[i] = gif.Reader()
    reader[i].feed(file[i].read())
    if ( not reader[i].is_complete()
         or not reader[i].has_screen_descriptor() ):
        print ("Failed to parse", sys.argv[i+1], "as a .gif file")
        sys.exit(1)

# OK, if we get here it means we have successfully loaded three .gif
# files.  The user might have handed us the same one three times, but
# there's not much I can do about that, it's entirely possible that we
# want to look at three identical but distinct files, and filename
# aliases make any more careful examination of the paths platform
# dependent.

# So, we're going to want to verify that the three files have the same
# sizes.

if ( reader[0].width != reader[1].width
     or reader[1].width != reader[2].width
     or reader[0].height != reader[1].height
     or reader[1].height != reader[2].height ):
    print ("The gif logical screen sizes are not identical")
    sys.exit(1)

for i in range(3):
    if ( len(reader[i].blocks) != 2
         or not isinstance(reader[i].blocks[0], gif.Image)
         or not isinstance(reader[i].blocks[1], gif.Trailer)):
        print ("While processing file: ", sys.argv[i+1])
        print ("The code only accepts input files with a single block of "
               "type Image followed by one of type Trailer.  This "
               "constraint has not been met, the code will have to be "
               "changed to handle the more complicated case.")
        sys.exit(1)
    
    
# Time to vote

try:
    writer = gif.Writer (open (sys.argv[4], 'wb'))
except OSError as ex:
    print ("Failed to open output file: ", sys.argv[4])
    print ("Reason: ", ex.strerror)
    sys.exit(1)

output_width = reader[0].width
output_height = reader[0].height
output_colour_depth = 8
output_colour_table = reader[0].color_table
output_pixel_block = []

for ind0, ind1, ind2 in zip(reader[0].blocks[0].get_pixels(),
                            reader[1].blocks[0].get_pixels(),
                            reader[2].blocks[0].get_pixels()):
    tup0 = reader[0].color_table[ind0]
    tup1 = reader[1].color_table[ind1]
    tup2 = reader[2].color_table[ind2]

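    # Voting
    if ( tup0 == tup1 or tup0 == tup2):
        # The first file is in the majority, so its index is already
        # valid for the output colour table.
        output_pixel_block.append(ind0)
    elif ( tup1 == tup2 ):
        try:
            newind = find_index_of_tuple(output_colour_table,
                                         tup1, ind1)
            output_pixel_block.append(newind)
        except SearchFailed as ex:
            print ('The colour table for file {0} does not hold the '
                   'entry {1} that won the vote.  You may be able '
                   'to fix this problem simply by reordering your '
                   'command-line arguments.'.format(sys.argv[1], tup1))
            sys.exit(1)
    else:
        # All three inputs disagree at this pixel: no majority value.
        print ('The three inputs all disagree on at least one pixel.  '
               'Choose a better set of input files.')
        sys.exit(1)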

writer.write_header()
writer.write_screen_descriptor(output_width, output_height,
                               True, output_colour_depth)
writer.write_color_table(output_colour_table, output_colour_depth)
writer.write_image(output_width, output_height,
                   output_colour_depth, output_pixel_block)
writer.write_trailer()

So, what does this do? After verifying that it received the correct number of arguments, that it can open the three inputs, and that the input files are all valid .gif files, it checks to make sure they all have the same image dimensions.

The GIF specification allows multiple image blocks, but supporting them would be a bit more work. So, I verified that these files from the government website do not use multiple image blocks, and coded in a check. This script will exit with an error if it is presented with such files. This way I don’t have to write the extra code unless some future change forces me to accept the more complicated format.

Now, the files I chose did not have identical colour tables, but the tables differed only in the ordering. This might not always be true, but it is at the moment. I use the colour table from the first input .gif as my output colour table. Then, I walk through the pixels in the three files and, for each pixel, look up the colour tuple in each file’s own colour table. If the first file agrees with either the second or the third, then the first file’s index is already correct for the output colour table, so we simply append it to the output pixel block. If the first disagrees, but the second and third agree, then we have to find the index of this tuple in the output colour table. It’s probably at the same position, so we hint with the second file’s index, but my function will walk the entire colour table if it has to, to find an entry matching that tuple. If it fails to do so, that’s an error, and we exit.
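
To make the voting rule concrete, here is a minimal standalone sketch that works on plain Python lists rather than on .gif files. The function name vote_pixel and the three small colour tables are made up for illustration and are not part of the script above. The sketch returns None on a three-way split rather than exiting, and uses list.index() in place of the hinted search.

```python
def vote_pixel(tables, indices):
    """Return the index into tables[0] of the majority colour.

    tables  -- three colour tables, each a list of (r, g, b) tuples
    indices -- the pixel's index into each of the three tables
    Returns None when all three colours differ.
    """
    colours = [table[idx] for table, idx in zip(tables, indices)]
    if colours[0] == colours[1] or colours[0] == colours[2]:
        # The first file is in the majority; its index is already
        # valid for the output table, which is tables[0].
        return indices[0]
    if colours[1] == colours[2]:
        # Files two and three agree; translate their colour into the
        # first table.  Raises ValueError if the colour is absent.
        return tables[0].index(colours[1])
    return None


# Hypothetical tables holding the same colours in different orders.
table_a = [(0, 0, 0), (153, 204, 255), (255, 255, 255)]
table_b = [(255, 255, 255), (0, 0, 0), (153, 204, 255)]
table_c = list(table_a)
tables = [table_a, table_b, table_c]

print(vote_pixel(tables, [1, 2, 1]))   # all three agree
print(vote_pixel(tables, [0, 2, 1]))   # second and third outvote the first
print(vote_pixel(tables, [0, 0, 1]))   # three-way split
```

The second call is the interesting one: the winning colour sits at index 2 in the second table but at index 1 in the first, which is why the index has to be translated rather than copied.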

Finally, we write out the consensus .gif file, and exit normally.

In the next article we’ll have a discussion of how to set up the neural network.

UPDATE #1 (2019-08-23): Included a link to an index of articles in this series.
