Building a Rain Predictor. Training the model.

The index to the articles in this series is found here.

Finally, we’re training the model. I started with the most basic configuration: no noise injection, no special regularization. I just wanted to get training running first and see how things go. It is… not quick.

Here’s the training code, rptrainer.py:

#! /usr/bin/python3

# Here we go.  Training the neural network.

import rpreddtypes
import rpgenerator
import argparse
import random

import tensorflow as tf
# from tensorflow.keras.callbacks import TensorBoard, EarlyStopping

import keras
from keras.layers import Input, Dense, Concatenate, LSTM
from keras.models import Sequential, Model

import sys
import numpy as np


### Main code entry point here


defBatchSize = 512
ring_module_nodes_0 = 1000
ring_module_nodes_1 = 5
bullseye_module_nodes_0 = 1000
bullseye_module_nodes_1 = 5
tripwire_module_nodes_0 = 40
tripwire_module_nodes_1 = 5
synth_layer_nodes = 200
num_outputs = 10


parser = argparse.ArgumentParser(description='Train the rain '
                                 'prediction network.')
parser.add_argument('--continue', dest='Continue',
                    action='store_true',
                    help='Whether to load a previous state and '
                    'continue training')
parser.add_argument('--pathfile', type=str, dest='pathfile',
                    required=True,
                    help='The file that maps sequence numbers to '
                    'the pathnames of the binary files.')
parser.add_argument('--training-set', type=str, dest='trainingset',
                    required=True,
                    help='The file containing the training set '
                    'to use.')
parser.add_argument('--veto-set', type=str, dest='vetoset',
                    help='If supplied, its argument is the name '
                    'of a file containing training set entries that '
                    'are to be skipped.')
parser.add_argument('--test-set', type=str, dest='testset',
                    required=True,
                    help='The test set used to detect overfitting '
                    'during training.')
parser.add_argument('--savefile', type=str, dest='savefile',
                    help='The filename at which to save the '
                    'trained network parameters.  A suffix will be '
                    'applied to the name to avoid data '
                    'incompatibility.')
parser.add_argument('--override-centre', type=list, dest='centre',
                    default=[240,239], help='Set a new location for '
                    'the pixel coordinates of the radar station')
parser.add_argument('--override-sensitive-region', type=list,
                    dest='sensitive',
                    default=[[264,204], [264,205], [265,204], [265,205]],
                    help='Set a new list of sensitive pixels')
parser.add_argument('--heavy-rain-index', type=int, dest='heavy',
                    default=3, help='Lowest index in the colour table '
                    'that indicates heavy rain, where 1 is the '
                    'lightest rain.')
parser.add_argument('--batchsize', type=int, dest='batchsize',
                    default=defBatchSize,
                    help='Override the batch size used for training.')

args = parser.parse_args()


trainGenerator = rpgenerator.RPDataGenerator(args.trainingset,
                                             args.pathfile,
                                             args.vetoset,
                                             args.centre,
                                             args.sensitive,
                                             args.heavy,
                                             args.batchsize)

validationGenerator = rpgenerator.RPDataGenerator(args.testset,
                                                  args.pathfile,
                                                  args.vetoset,
                                                  args.centre,
                                                  args.sensitive,
                                                  args.heavy,
                                                  args.batchsize)

hashval = rpreddtypes.genhash(args.centre, args.sensitive, args.heavy)
modsizes = trainGenerator.getModuleSizes()

if args.savefile:
    args.savefile = args.savefile + str(hashval)


if args.Continue:
    if not args.savefile:
        print('You asked to continue by loading a previous state, '
              'but did not supply the savefile with the previous state.')
        sys.exit(1)

    mymodel = keras.models.load_model(args.savefile)
    
else:

    inputslist = list(map(lambda x:
                          Input(batch_shape=(args.batchsize, 6, x)),
                          modsizes))

    evenmodels = []
    oddmodels = []

    for i in range(4):
        onemod = Sequential()
        onemod.add(Dense(ring_module_nodes_0, activation='relu'))
        onemod.add(Dense(ring_module_nodes_1, activation='relu'))
        evenmodels.append(onemod)
        onemod = Sequential()
        onemod.add(Dense(ring_module_nodes_0, activation='relu'))
        onemod.add(Dense(ring_module_nodes_1, activation='relu'))
        oddmodels.append(onemod)

    bullseyemodel = Sequential()
    bullseyemodel.add(Dense(bullseye_module_nodes_0, activation='relu'))
    bullseyemodel.add(Dense(bullseye_module_nodes_1, activation='relu'))

    tripwiremodel = Sequential()
    tripwiremodel.add(Dense(tripwire_module_nodes_0, activation='relu'))
    tripwiremodel.add(Dense(tripwire_module_nodes_1, activation='relu'))

    scanned = []
    for i in range(32):
        mtype = i % 2
        ringnum = i // 8
        if mtype == 0:
            scanned.append(evenmodels[ringnum](inputslist[i]))
        else:
            scanned.append(oddmodels[ringnum](inputslist[i]))

    scanned.append(bullseyemodel(inputslist[32]))
    scanned.append(tripwiremodel(inputslist[33]))

    aggregated = Concatenate()(scanned)

    time_layer = LSTM(6, stateful = False, activation='relu')(aggregated)

    synth_layer = Dense(synth_layer_nodes, activation='relu')(time_layer)
    output_layer = Dense(num_outputs, activation='sigmoid')(synth_layer)

    mymodel = Model(inputs=inputslist, outputs=[output_layer])

mymodel.compile(loss='binary_crossentropy', optimizer='sgd')
#                metrics=[tf.keras.metrics.FalsePositives(),
#                         tf.keras.metrics.FalseNegatives()])

ie = 0
while True:
    # Train one epoch at a time so we can checkpoint after each one.
    # With initial_epoch set, Keras treats 'epochs' as the final epoch
    # number, so it has to advance along with ie.
    mymodel.fit_generator(generator=trainGenerator,
                          validation_data=validationGenerator,
                          initial_epoch=ie, epochs=ie + 1,
                          shuffle=False)
    ie += 1
    if args.savefile:
        mymodel.save(args.savefile)


This is essentially the same thing I did before in the Keras test code. I create two dense modules per ring (one shared by the even-numbered sectors of that ring, one by the odd-numbered sectors), one for the bullseye, and one more for the tripwire. Put an LSTM layer atop those for time-series tracking, another dense layer above that for synthesis, and an output layer. Compile, and run.
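That sector sharing falls out of how Keras reuses models: calling the same Sequential on more than one input tensor applies a single set of weights to each. A minimal sketch of that behaviour, with toy sizes rather than the ones above:

from keras.layers import Input, Dense
from keras.models import Sequential

# One module, reused: both calls go through the same weights, so
# gradients from either input update the same shared layer.
shared = Sequential()
shared.add(Dense(8, activation='relu'))

in_a = Input(shape=(6, 4))
in_b = Input(shape=(6, 4))
out_a = shared(in_a)   # shared weights
out_b = shared(in_b)   # same weights again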

I tried to run multiprocess (my data generator was deliberately made thread-safe to that end), but ran into an integer overflow error deep in the guts of Keras, so I turned it back to single-process operation. I tried to use the FalsePositives and FalseNegatives metrics from TensorFlow in the compile operation, but this led to an error at runtime after the first epoch was fed through the network. This is, apparently, due to an incompatibility between the versions of TensorFlow and Keras installed on my Kubuntu machine. No problem, I can always write a custom Keras metric later. For now, I commented out the offending metrics and re-ran.
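For reference, a custom metric in Keras can be as simple as a function of y_true and y_pred built from backend ops. A minimal sketch of a false-positive counter, assuming a 0.5 decision threshold (my choice of cutoff, nothing dictated by the model):

from keras import backend as K

def false_positives(y_true, y_pred):
    # Count predictions above the threshold whose true label is 0.
    predicted = K.cast(K.greater(y_pred, 0.5), K.floatx())
    return K.sum(predicted * (1.0 - y_true))

It would be handed to compile() through the metrics argument, exactly as the TensorFlow ones were.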

As of right now, I’m 12.5 hours of CPU time into the first epoch, and the estimated time to completion is showing another 16 hours. Oh, and I put another 40 GB of swap online; my Python process has a virtual address space of 35.9 GB, with 10.5 GB resident and 22 GB of swap in use.

So, we’re going to have to back up a bit and figure out a better plan. Two possibilities are coarse-graining the inputs, and convolving them. Coarse-graining is the simpler option at this moment, so I’m going to be doing that first.

Convolving is a bit more challenging. Typically, one generates a set of a few dozen random convolution kernels, possibly pooling them and convolving a second time, before training on the whole set. The network then picks out the convolutions that have predictive power, essentially an automated feature-selection step. Searching for convolutions that work would greatly increase the size of the model, and we’re already pushing hard against the edge of reasonable. Note, too, that because our ring models are duplicated and rotated, the convolutions would also have to be rotated through the four 90-degree quadrants so that consistent data is fed through to the training. A sketch of what such a front end might look like appears below.
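Purely for illustration, the structure being described would look something like this in Keras; the filter counts, kernel sizes, and input shape are placeholders of mine, not values from this project, and the quadrant rotation is not addressed here:

from keras.layers import Conv2D, MaxPooling2D
from keras.models import Sequential

# Hypothetical convolutional front end for one sector image: a bank of
# convolutions, pooled, then convolved again, before feeding the dense
# modules.  Training would then weight the useful filters.
conv_stack = Sequential()
conv_stack.add(Conv2D(32, (3, 3), activation='relu',
                      input_shape=(60, 60, 1)))
conv_stack.add(MaxPooling2D((2, 2)))
conv_stack.add(Conv2D(32, (3, 3), activation='relu'))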

Right, so coarse-graining it is. I’m going to test 2×2, 3×3, and 4×4 reductions, and each generated pixel will be taken from the maximum value of the pixels in its area of coverage.
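For concreteness, here’s a minimal sketch of that block-maximum reduction in numpy, assuming the image dimensions divide evenly by the reduction factor (real data might need padding or cropping first):

import numpy as np

def coarse_grain(img, factor):
    # Reduce a 2-D array by taking the maximum over each
    # factor x factor block.
    rows, cols = img.shape
    return img.reshape(rows // factor, factor,
                       cols // factor, factor).max(axis=(1, 3))

Each factor of n cuts the pixel count by n², which is where the memory savings come from.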

I’ll be bumping the version number of my intermediate binary file format and putting those coarse-graining options right into the files.
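Stamping those options into the file might look something like the sketch below, using struct; the magic string and field layout are placeholders of mine, not the actual rpreddtypes format:

import struct

def write_header(fileobj, version, coarsening):
    # Placeholder header: 4-byte magic, one-byte format version,
    # one-byte coarse-graining factor.
    fileobj.write(struct.pack('>4sBB', b'RPRD', version, coarsening))

def read_header(fileobj):
    magic, version, coarsening = struct.unpack('>4sBB', fileobj.read(6))
    return version, coarsening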

So, we’ve reached a model that compiles and probably trains. We’ll know in about 14 hours whether it gets through the first epoch successfully. Then the adjusting and tweaking begins.
