Coding an LSTM Cell from Scratch using Python

I never completely understand a technology concept until I can code it up from scratch. I’ve been exploring LSTM (long short-term memory) cells for several weeks. LSTM cells are complex software components that can be used to create a recurrent neural network: a prediction system that has state (a memory), so you can predict the next item in a sequence.

I finally felt I knew enough to code a demo from scratch and solidify my confidence that I fully understand LSTM cells. There are about half a dozen well-known Web articles on LSTMs, and each explains LSTMs from a different perspective. Somewhat surprisingly to me, the Wikipedia article on LSTMs is very good (many Wikipedia articles on machine learning topics are not good at all), so I used Wikipedia as my primary reference.

Anyway, I used Python and coded an LSTM from scratch, following the Wikipedia article as closely as possible (same variable names, etc.). The process was quick and easy, although I’m not sure how easy things would have been without the hours of background time I spent reading about LSTMs.

My demo coded the LSTM input-output process. I set the weights and biases to fixed, arbitrary constants. A full explanation of the IO process would take pages, so I’ll write the full process up someday when I have a free full day.
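In the meantime, here is a minimal sketch of what one input-output step of an LSTM cell looks like. It is not the demo program itself. It assumes the standard forget-gate equations with the usual variable names (W, U and b for each gate, and f_t, i_t, o_t, c_t, h_t for the gate values and state), and it uses plain Python lists so nothing beyond the standard library is needed. The helper names (`affine`, `lstm_step`, `make_params`), the constant values and the tiny sizes (2 inputs, 3 hidden units) are all made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Compute W*x + U*h + b with plain lists (W and U are lists of rows)."""
    return [
        sum(w * xj for w, xj in zip(w_row, x))
        + sum(u * hj for u, hj in zip(u_row, h))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One input-output step of the cell; returns the new (h_t, c_t)."""
    # Forget, input and output gates: sigmoid of an affine function.
    f_t = [sigmoid(z) for z in affine(p["Wf"], x_t, p["Uf"], h_prev, p["bf"])]
    i_t = [sigmoid(z) for z in affine(p["Wi"], x_t, p["Ui"], h_prev, p["bi"])]
    o_t = [sigmoid(z) for z in affine(p["Wo"], x_t, p["Uo"], h_prev, p["bo"])]
    # Candidate cell values: tanh of an affine function.
    c_tilde = [math.tanh(z)
               for z in affine(p["Wc"], x_t, p["Uc"], h_prev, p["bc"])]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # New output (hidden state): gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def make_params(n_in, n_hidden, w=0.01, u=0.02, b=0.03):
    """Hypothetical helper: fill every weight and bias with one constant."""
    p = {}
    for gate in "fioc":
        p["W" + gate] = [[w] * n_in for _ in range(n_hidden)]
        p["U" + gate] = [[u] * n_hidden for _ in range(n_hidden)]
        p["b" + gate] = [b] * n_hidden
    return p


# Feed a short sequence through the cell, carrying h and c forward.
params = make_params(n_in=2, n_hidden=3)
h, c = [0.0] * 3, [0.0] * 3
for x in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    h, c = lstm_step(x, h, c, params)
    print("h =", [round(v, 4) for v in h], "c =", [round(v, 4) for v in c])
```

The loop at the bottom is where the "memory" lives: the h and c returned for one item are passed back in with the next item. In a real system the weights and biases would be learned by training rather than set to constants.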

Every software developer and researcher I know has a driving curiosity to understand things, and then use that knowledge to create things. Every company I’ve ever worked for has, quite correctly, told employees how important it is to “be continuous learners” or “have a growth mindset” or similar. My colleagues have always kind of smiled internally at these deep words of wisdom — continuous learning is literally wired into our DNA and we find it difficult to understand how anyone can’t be driven by the search for new knowledge and skills.

One Response to Coding an LSTM Cell from Scratch using Python

  1. asdlfj says:

    Hi James,

    Another interesting post, thank you for taking the time to share your thoughts.

    Would you be able to share the full lstm_io.py code? I can only see about half of it from the screen shot. Thanks in advance!
