*UPDATE: After working on this problem for many weeks, I realized I was doing something fundamentally wrong when I set up the data files. So, basically, scratch this entire blog post. I'm keeping it here as a reminder of how tricky time series regression problems can be.*

*Most of what follows is wrong, wrong, wrong . . . *

Time series regression problems — predicting the next value in a sequence — look simple but are almost always extremely difficult. There are many different techniques you can use to tackle a time series regression problem. One of the newest approaches is to use an LSTM (long short-term memory) deep recurrent neural network.

Over the past year or so, I’ve seen about a dozen different attempts to use an LSTM for time series regression, and I’ve tried LSTMs myself — and almost always failed badly.

My most recent attempt seems promising. I used the well-known airline passenger count dataset (monthly counts from 1949 through 1960). I used the CNTK code library and set up my training data file like this:

```
|prev 1.12 1.18 1.32 1.29 |pct 1.08035714
|prev 1.18 1.32 1.29 1.21 |pct 1.14406780
|prev 1.32 1.29 1.21 1.35 |pct 1.12121212
```

Each sequence of four (scaled) counts is used to predict a percentage change, not the next count directly. The percentage change is relative to the first count in the sequence. For example, the first four actual counts are (1.12, 1.18, 1.32, 1.29) and the actual percentage change is 1.08035714, which means the actual next count is 1.12 * 1.08035714 = 1.21.
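The encoding described above can be sketched in plain Python. This is a hypothetical helper, not the actual script used for the demo; the window size of four and the "ratio to the first count in the window" target are inferred from the example lines:

```python
def make_ctf_lines(counts, window=4):
    """Build CNTK-text-format style lines from a list of scaled counts.

    For each window of `window` consecutive counts, the target is the
    ratio of the NEXT count to the FIRST count in the window (so for
    (1.12, 1.18, 1.32, 1.29) followed by 1.21, the target is
    1.21 / 1.12 = 1.08035714)."""
    lines = []
    for i in range(len(counts) - window):
        seq = counts[i:i + window]
        pct = counts[i + window] / seq[0]      # ratio vs. first count
        prev = " ".join("%0.2f" % c for c in seq)
        lines.append("|prev %s |pct %0.8f" % (prev, pct))
    return lines

# Demo on the first six scaled counts from the airline data:
for line in make_ctf_lines([1.12, 1.18, 1.32, 1.29, 1.21, 1.35]):
    print(line)
```

Running this reproduces the first two records shown above.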

The key code in my demo that creates an LSTM network is:

```python
import cntk as C  # assumed import; this snippet comes from a larger demo

model = None
with C.layers.default_options():
    model = C.layers.Recurrence(C.layers.LSTM(shape=256))(X)  # run LSTM over the sequence
    model = C.sequence.last(model)  # keep only the final time step
    # model = C.layers.Dropout(0.10, seed=1)(model)
    model = C.layers.Dense(output_dim)(model)  # map to the output dimension
```

The 256 is the dimensionality of the LSTM's internal state (its memory). LSTMs are extremely complex and have only been widely used for the past couple of years.

Anyway, after training, I graphed the actual passenger counts against the predicted counts, and the results look decent. Then I used the trained model to extrapolate predicted counts for 1961. The first two or three predicted values look OK, but the model weakens quickly.

An interesting side note to all of this is the way engineers like me can become completely obsessed with a problem. In technology, almost any problem can be solved given enough time and resources. This leads to situations where engineers just cannot rest until a technical problem has been solved. I lost a couple of nights of sleep over this problem, but it's very satisfying when a tough problem is finally resolved.

*I didn’t like Moby Dick or Lolita — the two most famous novels about obsession*

I know that feeling. What if another network were trained to reduce the error even further? So you take your results as a first rough pass, then have a second NN refine them even more?
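For what it's worth, the idea in this comment, a second model trained to correct the first model's output, is essentially residual correction (the same principle behind boosting). Here is a minimal illustrative sketch using ordinary least squares in place of both "networks"; nothing in it comes from the original demo:

```python
import numpy as np

# Synthetic data: the true signal has a quadratic term the first model misses.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(100, 1))
y = 3.0 * x[:, 0] + 0.5 * x[:, 0] ** 2

# Stage 1: a crude model (linear in x only).
stage1 = np.linalg.lstsq(x, y, rcond=None)[0]
pred1 = x @ stage1

# Stage 2: a second model is fit to the stage-1 residuals, using richer
# features, and its output is added to the first model's prediction.
residual = y - pred1
feats = np.hstack([x, x ** 2])
stage2 = np.linalg.lstsq(feats, residual, rcond=None)[0]
pred2 = pred1 + feats @ stage2

print("stage-1 mean abs error:", np.abs(y - pred1).mean())
print("refined mean abs error:", np.abs(y - pred2).mean())
```

The refined prediction can only match or improve on the first stage under least squares, since the second fit includes the first model's feature set.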