A machine learning project

The index to the articles in this series is found here.

Well, four years ago I mentioned that I was going on a brief hiatus, and there hasn’t been very much here since then. Turns out that having a baby in the house does eat into the free time a bit. Now, though, I find myself with some more free time, after the parent company closed the entire Ottawa office and laid off the staff here. If anybody’s looking for an experienced mathematical programmer with a doctorate in physics, get in touch.

So, here’s a project I was about to start four years ago. I had collected some training data, but never got the project itself started.

I like to bicycle in the summer time, but I don’t like to ride in the rain. So, when I remember, I check the local weather radar and look for active precipitation moving toward the city. I can decide from that whether to go for a bicycle ride, and whether to ride to work, or find another way to get to the office.

The weather radar website, https://weather.gc.ca/radar/index_e.html?id=XFT, shows an hour of rain/snow detection at 10-minute intervals, played on a loop. You can look at the rain and guess how long it will take to reach the city. This won’t help you if rain forms directly over the city, but most of the time the rain moves into town, rather than beginning here.

The interpretation of these sequences seemed to me to be something I could automate. Maybe have a program that sends a warning or email to my cellphone if rain is imminent, in case I’m out on the bike.

I collected over 11000 .gif files by downloading individual files via a cron job. The images don’t have an embedded copyright message, and are government-collected data, but I’m not confident that this gives me the right to make this dataset available online, so I will satisfy myself with reproducing a single example for illustrative purposes. Here is a typical downloaded image:

The city of Ottawa is located roughly North-East of the white cross, just South of the Ottawa river that runs dominantly West to East. Near the right edge of the active region you can see the island of Montreal.

The very light blue represents light rainfall, something you might barely notice while riding a bicycle. Anything at the bright green or higher would be something I would try to wait out by sheltering under a bridge or similar construction. Weather patterns in this area, as in much of the continent, are dominantly blown from the West to the East, though there are some exceptions, and we will, very occasionally, have storms blow in from the East.

So, here’s the project. I haven’t actually written code yet, so we’ll explore this together. I would like to set up a neural network that can watch the radar website, downloading a new image every 10 minutes, and use this to predict 10 binary states. The first five values will be the network’s confidence (I’m not going to call it probability) that there will be any rain at all in the next 0 to 1 hours, 1 to 2 hours, 2 to 3 hours, and so on out to 5 hours. The next five values will be the confidence of heavy rain, defined as rain at the bright green or higher level, in the same intervals.
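To make those 10 outputs concrete, here is a minimal sketch of how the training targets for one radar image might be built. It assumes we already have the rain level seen over the city in each of the 30 images that follow (6 per hour for 5 hours), with 0 meaning dry. The names `make_targets` and `HEAVY_LEVEL`, and the value chosen for the heavy-rain threshold, are placeholders of mine and not anything settled in this project.

```python
from typing import List, Sequence

STEPS_PER_HOUR = 6   # one radar image every 10 minutes
HOURS = 5            # forecast horizon described above
HEAVY_LEVEL = 4      # ASSUMPTION: scale index standing in for "bright green"


def make_targets(future_levels: Sequence[int]) -> List[int]:
    """Build the 10 binary targets from future rain levels over the city.

    future_levels[i] is the rain level (0 = dry) observed over the city
    (i + 1) * 10 minutes after the current image.
    """
    needed = STEPS_PER_HOUR * HOURS
    if len(future_levels) < needed:
        raise ValueError(f"need {needed} future readings, got {len(future_levels)}")

    any_rain = []
    heavy_rain = []
    for hour in range(HOURS):
        # The six readings that fall inside this one-hour window.
        window = future_levels[hour * STEPS_PER_HOUR:(hour + 1) * STEPS_PER_HOUR]
        any_rain.append(int(any(level > 0 for level in window)))
        heavy_rain.append(int(any(level >= HEAVY_LEVEL for level in window)))

    # First five values: any rain per hour. Last five: heavy rain per hour.
    return any_rain + heavy_rain


# Example: light rain in the second hour, heavy rain in the fourth.
levels = [0] * 30
levels[8] = 2
levels[20] = 5
targets = make_targets(levels)
```

The network’s outputs would then be compared against vectors like `targets` during training.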

Ideally, this network would also update itself continuously, as more data became available.

This isn’t a substitute for the weather forecasts made by the experts at Environment Canada, who use a lot more than just the local weather radar to inform their forecasts. It aims to answer a different question. My project will try to estimate only the confidence of rain specifically in the city of Ottawa, and over a relatively short projection interval, no more than 5 hours. It’s answering a more precise question, and I hope it turns out to give me useful information.

Now, we might be tempted to just throw the raw data at a neural network, along with labels indicating whether each image shows rain over Ottawa, but we don’t have an unlimited data set, and we can probably help the process along quite a bit by doing some preliminary analysis. This isn’t feature selection; our input set is really a bit too simple for any meaningful feature selection. We can still give the algorithm a bit of a head start.

The first thing we’ll want to do is to pull out the background image. The radar image shows precipitation as colours overlaid on a fixed background. If we know what that background is in the absence of any rain, we can call that ‘0’ everywhere in the inputs, and any pixels that differ will be taken as coming from rain, with a value that increases as we climb that scale on the right side of the sample image.
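As a rough illustration of that idea, the sketch below turns one radar frame into a grid of rain levels by comparing it, pixel by pixel, against a known rain-free background. Images are plain nested lists of RGB tuples here to keep the example self-contained. The colours in `SCALE` are placeholders I made up, not the real legend values, and treating an unrecognized differing pixel as the weakest level is also an assumption.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue)
Image = List[List[Pixel]]      # rows of pixels

# ASSUMPTION: placeholder colours, weakest to strongest. The real values
# would have to be read off the scale on the right side of the radar image.
SCALE: List[Pixel] = [(153, 204, 255), (0, 153, 255), (0, 255, 102), (255, 255, 0)]
LEVEL_OF: Dict[Pixel, int] = {colour: i + 1 for i, colour in enumerate(SCALE)}


def rain_levels(frame: Image, background: Image) -> List[List[int]]:
    """Return 0 where the frame matches the background, else a rain level."""
    levels = []
    for frame_row, bg_row in zip(frame, background):
        row = []
        for pixel, bg_pixel in zip(frame_row, bg_row):
            if pixel == bg_pixel:
                row.append(0)
            else:
                # ASSUMPTION: a differing pixel whose colour is not on the
                # scale is counted as the weakest rain level.
                row.append(LEVEL_OF.get(pixel, 1))
        levels.append(row)
    return levels


# Tiny 2x2 example: one pixel of level-3 rain, one unrecognized colour.
grey = (128, 128, 128)
background = [[grey, grey], [grey, grey]]
frame = [[grey, SCALE[2]], [(1, 2, 3), grey]]
levels = rain_levels(frame, background)
```

With real images, the nested loops would most likely be replaced by array operations, but the logic would be the same.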

I’ll pick out three images that are rain-free to my eye. There might be tiny pockets of precipitation that escape my notice, but by choosing three that appear clean and letting them vote on pixel values, I should have a good base reference.
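The voting step could look something like the sketch below, which takes the most common value at each pixel position across the chosen images. This is only an illustration under my own assumptions: images are nested lists of RGB tuples, the name `vote_background` is hypothetical, and if all three images disagree at a pixel the value from the first image wins.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue)
Image = List[List[Pixel]]      # rows of pixels


def vote_background(images: Sequence[Image]) -> Image:
    """Build a background by majority vote at every pixel position."""
    if not images:
        raise ValueError("need at least one image")
    height = len(images[0])
    width = len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same dimensions")

    background = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = Counter(img[y][x] for img in images)
            # most_common(1) gives the winning colour; on a complete tie
            # the colour seen first (from the first image) is returned.
            row.append(votes.most_common(1)[0][0])
        background.append(row)
    return background


# Example: a stray rain pixel in one image is outvoted by the other two.
land = (100, 120, 90)
water = (60, 80, 160)
rain = (153, 204, 255)
img_a = [[land, water]]
img_b = [[land, rain]]
img_c = [[rain, water]]
background = vote_background([img_a, img_b, img_c])
```

A stray pocket of precipitation only survives into the background if it covers the same pixel in at least two of the three images.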

We’ll be writing this project in Python3, with Keras interfacing onto TensorFlow.

The next posting will cover the baseline extraction code.

UPDATE #1 (2019-08-20): I’ve made the source files I’m posting in this series available on GitHub. You can download them from https://github.com/ChristopherNeufeld/rain-predictor. I’ll continue to post the source code in these articles, but may not post patches in the articles. Instead, I’ll direct you back to the GitHub tree for history and changes.

UPDATE #2 (2019-08-23): Added a link to an index page.
