Understanding the LSTM Input-Output Process

An LSTM cell (“long short-term memory”) is a software component that can be used to build a neural system that makes predictions on sequences of input values. I’ve been looking very closely at LSTMs using the CNTK code library.

My goal was to completely understand what the following CNTK code does:

# lstm_peek.py
# explore CNTK LSTM I/O

import numpy as np
import cntk as C

input_dim = 2   # context pattern window
output_dim = 1

X = C.ops.sequence.input_variable(input_dim)

model = None
with C.layers.default_options():
  model = C.layers.Recurrence(C.layers.LSTM(shape=4))(X)
  model = C.sequence.last(model)
  model = C.layers.Dense(output_dim)(model) 

inpt_array = np.array([[1.0, 2.0],
                       [3.0, 4.0],
                       [5.0, 6.0]], dtype=np.float32)

result = model.eval({X:inpt_array})
print(result)

Briefly, a single sequence of three input items, each of size 2, is fed to an LSTM cell that has a state-memory of size 4. The (unseen) internal output is three states of size 4 each, one per input item. The last of these size-4 states is fetched and fed to a neural layer that condenses it to a single value, which is then displayed.
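The shape bookkeeping in that walk-through can be sketched in plain NumPy. These are hypothetical stand-in arrays, not actual CNTK objects, just to show how each stage changes the dimensions:

```python
import numpy as np

seq = np.zeros((3, 2), dtype=np.float32)     # one sequence: 3 steps, 2 values each
hidden = np.zeros((3, 4), dtype=np.float32)  # the LSTM emits a size-4 state per step
last = hidden[-1]                            # sequence.last -> shape (4,)

# Dense(1): a 4-to-1 linear layer (weights here are placeholders)
W = np.zeros((4, 1), dtype=np.float32)
b = np.zeros(1, dtype=np.float32)
out = last @ W + b                           # final result -> shape (1,)
print(out.shape)
```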

Whew! The code is short but very deep and it took me several hours to completely understand what was happening.

Unfortunately, I can’t show you the exact calculations because the weights and biases in the LSTM and the last neural layer are initialized to random values.
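Even so, the per-step arithmetic can be illustrated by running the standard LSTM equations with small fixed weights instead of random ones. This is a minimal sketch, not CNTK's actual internal implementation (CNTK's LSTM layer has additional options such as peephole connections), and all weight values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 2, 4  # matches input_dim=2 and LSTM(shape=4) above

# Fixed illustrative weights (CNTK would initialize these randomly)
Wf = Wi = Wo = Wc = np.full((n_hid, n_in), 0.01, dtype=np.float32)
Uf = Ui = Uo = Uc = np.full((n_hid, n_hid), 0.01, dtype=np.float32)
bf = bi = bo = bc = np.zeros(n_hid, dtype=np.float32)

h = np.zeros(n_hid, dtype=np.float32)  # hidden state
c = np.zeros(n_hid, dtype=np.float32)  # cell ("memory") state

seq = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]], dtype=np.float32)

for x in seq:
    f = sigmoid(Wf @ x + Uf @ h + bf)        # forget gate
    i = sigmoid(Wi @ x + Ui @ h + bi)        # input gate
    o = sigmoid(Wo @ x + Uo @ h + bo)        # output gate
    c_tilde = np.tanh(Wc @ x + Uc @ h + bc)  # candidate cell state
    c = f * c + i * c_tilde                  # new cell state
    h = o * np.tanh(c)                       # new hidden state, size 4

# h now plays the role of C.sequence.last(model): the final size-4 state
Wd = np.full((1, n_hid), 0.5, dtype=np.float32)  # Dense(1) weights (made up)
bd = np.zeros(1, dtype=np.float32)
y = Wd @ h + bd                              # single output value
print(y)
```

With fixed weights the run is reproducible, so each gate value can be checked by hand against the equations.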

Moral of the story: When using a code library like CNTK, a lot of black-box components are used. But if you spend some time you can understand their behavior.
