Still Yet Another Look at LSTM Time Series Regression

One of my personality weaknesses is that when a technical problem gets stuck in my head, I’m incapable of letting it go. Literally. I can think of several problems that stuck in my brain for several years until I finally solved them.

Time series regression (predicting the next number in a sequence) using an LSTM neural network (a very complex NN that has a memory) is one of these problems. This weekend I made a step forward in fully understanding LSTM time series regression. In particular, I figured out one reason why these problems are so difficult.

My usual example is the Airline Dataset. There are 144 values, which represent the total number of international airline passengers each month, from January 1949 through December 1960. The data looks like:

1.12, 1.18, 1.32, 1.29, 1.21, 1.35, . . . 4.32

The raw values are x100,000, so the first value means 112,000 passengers. In many of my attempts to predict the next number in the sequence, I thought I saw a phenomenon where all my predictions were off by one month. For example, suppose the input is a window of four values. I saw things like:

Input                     Actual  Predicted
===========================================
1.12  1.18  1.32  1.29    1.21    1.28
1.18  1.32  1.29  1.21    1.35    1.20
1.32  1.29  1.21  1.35    1.48    1.34
etc.

In other words, the model was predicting not the next value, but rather the last value of the current input. I figured I had just made some sort of indexing mistake, because by shifting the predicted values up by one position, the predictions become very accurate. But I was wrong about that.
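
To make the rolling-window setup concrete, here's a minimal sketch in plain NumPy (not my CNTK code; the window size of four and the names are just for illustration) of how (input, target) pairs like the ones in the table are built:

import numpy as np

# first values of the airline series (x100,000 passengers)
series = np.array([1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48, 1.48, 1.36, 1.19])

def make_windows(data, window=4):
    # each input is 'window' consecutive values; the target is the value that follows
    X = np.array([data[i:i+window] for i in range(len(data) - window)])
    y = data[window:]
    return X, y

X, y = make_windows(series)
print(X[0], "->", y[0])  # [1.12 1.18 1.32 1.29] -> 1.21
print(X[1], "->", y[1])  # [1.18 1.32 1.29 1.21] -> 1.35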

I now believe this effect is a fundamental problem with LSTM time series regression. An LSTM uses as its input both the new input value and part of the previous output. If the LSTM is ineffective, it can end up relying mostly on the previous output, which, because of the way rolling window data is set up, would give the bad results above. (Note: my explanation here isn't fully correct; a complete explanation would take a couple of pages.)

Put another way, time series regression problems with an LSTM appear to be extremely prone to a form of over-fitting related to rolling window data.
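
One way to see how seductive this trap is: a "model" that simply echoes the last value of each rolling window (a persistence baseline; again just an illustrative sketch, not my CNTK code) is what an ineffective LSTM collapses to, and on a trending series like this one it can look fairly accurate:

import numpy as np

series = np.array([1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48, 1.48, 1.36, 1.19])
window = 4

X = np.array([series[i:i+window] for i in range(len(series) - window)])
y = series[window:]

pred = X[:, -1]  # "predict" by copying the last value of each input window
print(np.mean(np.abs(pred - y)))  # mean absolute error of the copy-the-last-value "model"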

I coded up yet-another-example using the CNTK library. By adding a dropout layer I was able to lessen the effect of this over-fitting. There’s still much more to understand in my search for truth. My current working hypothesis is that LSTMs for time series regression can work well for modeling the structure of a dataset (which can be used for anomaly detection), or for predicting a very short time ahead (perhaps one or two time steps), but not for extrapolating several time steps ahead.
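
For reference, the general shape of the model I'm describing, an LSTM followed by a dropout layer and a single regression output, looks roughly like this in CNTK's Python API (the hidden size of 50 and the dropout rate of 0.2 are just illustrative choices, not anything definitive):

import cntk as C

def create_model(x):
    # LSTM over the input window, then dropout, then a single regression output
    with C.layers.default_options(initial_state=0.1):
        m = C.layers.Recurrence(C.layers.LSTM(50))(x)  # hidden state size 50 (illustrative)
        m = C.sequence.last(m)                         # keep only the final time step
        m = C.layers.Dropout(dropout_rate=0.2)(m)      # dropout to blunt the over-fitting
        return C.layers.Dense(1)(m)                    # predicted next value

x = C.sequence.input_variable(1)  # one passenger-count value per time step
z = create_model(x)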



Apollo 11 flight to the moon in 1969 – a brilliant conclusion to a search for scientific truth and an incredible achievement by men who are true heroes.


2 Responses to Still Yet Another Look at LSTM Time Series Regression

  1. PGT-ART says:

    Some neural networks tackle this problem in a bit different way.
    They use neurons with a delay (waiting one or more cycles).
    And some networks combine that with loops, e.g. layer 3 sends back to the input layer or to layer 2, and layer 4 is the output layer. I once had a very good pdf describing the mechanism, but I'm not sure where I kept it. But I guess you can try it with your code as well.

    Maybe time series make one realize that common neural networks work differently than the neurons in our brain. The brain doesn't have a strict layer architecture, nor do its neurons fire that way. The neural networks we code these days are just a workable approximation of it, but with less detail.

  2. PGT-ART says:

    While I wait on a PHP install, here's a code hint for my earlier comment.
    Basically you could store the delayed values in a list of lists of doubles.
    A neuron will fire zero if its FIFO is not yet full (maybe make it a small random value instead).
    But once the FIFO buffer has reached its size (a different FIFO size can be set per neuron index), neuronMemory_get will return the result from n cycles before.

    // One FIFO buffer per delayed neuron, plus the desired FIFO length for each neuron index.
    static List<List<double>> neuronMemory = new List<List<double>>();
    static List<int> neuroMemoryQueue = new List<int>();

    // Push the current value into the FIFO for neuron 'index' and return the value from n cycles before.
    public double neuronMemory_get(int index, double value)
    {
        if (neuronMemory[index].Count < neuroMemoryQueue[index])
        {
            // FIFO not yet full: store the value and fire zero
            neuronMemory[index].Add(value);
            return 0; // or a small random value
        }
        else
        {
            // Update and retrieve the FIFO for neuron 'index'; its length is set in neuroMemoryQueue
            double M = neuronMemory[index][0];
            neuronMemory[index].RemoveAt(0);
            neuronMemory[index].Add(value);
            return M;
        }
    }

    // Register neuron 'index' as a memory neuron with the given FIFO length.
    public void SetMemoryNeuron(int index, int FifoSize)
    {
        while (neuroMemoryQueue.Count <= index)
        {
            neuroMemoryQueue.Add(0);              // no delay registered yet
            neuronMemory.Add(new List<double>()); // its (empty) FIFO buffer
        }
        neuroMemoryQueue[index] = FifoSize;
    }

    Your coding examples were always very good, but as far as I can recall you didn't use lists, so maybe this helps. Lists have no static size like arrays have, and can grow (and shrink).
    Once a neuron's FIFO buffer has reached its size, neuronMemory_get returns the result from n cycles before, while you put the current value into it.

    Some more logic should be added so you can decide which neurons will make use of this construct.
