My Ten Favorite Science Fiction Films of the 1950s

I grew up watching 1950s science fiction movies. To be honest some of them haven’t held up too well over time but many of them are quite good and I have a soft spot in my heart for all of these films. Here is a list of my top 10 science fiction films of the 1950s, meaning, if I was going on a trip for three months and could only take ten sci-fi films from the 50s, these would be the ten.

I first published this list in 2012. I revisited it eight years later in 2020 and I still pretty much agree with my original thoughts.

1. Invaders from Mars (1953) – A young boy thinks he sees a flying saucer land during a storm at night. Soon, people, including his parents, start acting strangely. This movie still scares me. Some of the best music of any science fiction movie ever. Do not waste your time on the horrible 1986 remake.

2. Forbidden Planet (1956) – Fantastic special effects, innovative music, and Robby the Robot highlight a story where Leslie Nielsen captains the C-57D to find out what happened to the colony on Altair IV.

3. Gog (1954) – Richard Egan stars as an investigator sent to a super-secret underground desert laboratory complex to solve a series of bizarre deaths. I love the robot Gog – what scientific research robot is complete without crushing claws and a flamethrower?

4. Quatermass 2 (1957) – A British film sometimes called “Enemy from Space” in the U.S. A somewhat crusty Brian Donlevy plays Dr. Quatermass (not Quartermass) as he investigates reports of strange meteorites. He ends up at a creepy industrial plant. Is this an alien invasion or just paranoia?

5. The War of the Worlds (1953) – A George Pal production with Oscar-winning special effects. Gene Barry desperately tries to find a way to stop an unstoppable Martian invasion. I love Sir Cedric Hardwicke’s introductory narration from the H.G. Wells book. I did not like the 2005 Spielberg remake.

6. Godzilla (1956) – Although later movies featuring Godzilla became cartoonish, the original 1954 Japanese version and the 1956 American-ized version are deadly serious. Raymond Burr watches the destruction of Tokyo from an ill-advised location on top of a tall antenna tower. The early scene on the island, when the scientists are hiking up the steep hill and Godzilla appears, gave me nightmares for years.

7. The Thing from Another World (1951) – Usually just called The Thing, this movie has the classic scenario of a group of people isolated (in this case at a polar research station) and menaced by an alien. Excellent acting and intelligent dialog set this movie apart. I prefer this version to the good 1982 remake.

8. Them! (1954) – The predecessor of all giant bug films has policeman James Whitmore and professor Edmund Gwenn discovering unexpected consequences (giant man-eating ants) of atomic testing in the desert. I like the suspense and the fact that the ants aren’t seen until well into the movie.

9. 20,000 Leagues Under the Sea (1954) – The Disney film is really more adventure than science fiction. I was fascinated by the Nautilus submarine and as a young man loved the 1960s exhibit featuring it and sets from the movie in a display on Main Street of the Anaheim Disneyland (where I ended up working many years later while going to school at UC Irvine).

10. It! The Terror from Beyond Space (1958) – Very tense film in which a crew lands on Mars to investigate the disappearance of all but one member of a previous expedition. On the return to earth they discover that they have a very unfriendly stowaway. This movie was a direct inspiration for the 1979 film “Alien”.

Honorable Mention – There are many films that didn’t quite make it into my top 10 list. The Atomic Submarine (1959) – Great scene in the alien saucer in total darkness, and innovative electronic music effects. The Trollenberg Terror (1958) – Known as The Crawling Eye in the U.S., Forrest Tucker is menaced by, well, giant crawling eyeballs in creepy fog. Attack of the Crab Monsters (1957) – Cheap but effective Roger Corman production has people trapped on an island. Fiend without a Face (1958) – Canadian production with very cool crawling brain creatures. When Worlds Collide (1951) – Earth must be evacuated before it’s too late. The Beast from 20,000 Fathoms (1953) – Nice Ray Harryhausen stop-action effects when the beast meets its end on a roller coaster. It Came from Beneath the Sea (1955) – More Ray Harryhausen effects featuring a giant octopus in San Francisco. This Island Earth (1955) – Earth scientists try to help the planet Metaluna against the Zagons. Earth vs. the Flying Saucers (1956) – The title says it all. Kronos (1957) – Earth is menaced by an enormous energy-collecting machine. The Monolith Monsters (1957) – The monsters are huge crystalline structures. The Man from Planet X (1951) – Some very scary scenes when people approach the spacecraft. Rodan (1956) – I like the early scenes in the mine before the appearance of the two flying dinosaurs.

Posted in Top Ten | Leave a comment

Correlation and Causation – Cities, Race, Crime

Correlation can indicate possible causation but correlation doesn’t prove causation. A common example that often appears in media is the relationship between the percentage of Black residents in a city and the violent crime rate. There’s a very strong statistical correlation between the race and crime variables.

The graph below plots data for 14 large U.S. cities. The Pearson R-squared is 0.82 — very strong. But this doesn’t necessarily mean that the minority-ness of a city causes violent crime. It’s just correlation. The only thing you can say with some confidence is that the violent crime rate in cities with large percentages of minority residents is much higher than in cities with low percentages. Other potential causes of high violent crime rates include percentage of children born out of wedlock, absence of fathers in the family unit, embedded culture, low education level, and so on. Statistics can suggest causes but only controlled experiments can prove causation.

The x-axis is the percentage of Black residents in a city. The y-axis is the violent crime rate per 100,000 residents. The raw data comes from FBI crime statistics.
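The general point, that a strong R-squared between two variables can come entirely from a third variable, can be illustrated with a small simulation. This is only a sketch on synthetic random numbers (it does not use the FBI data, and the names z, x, y and the noise level 0.3 are made up for illustration): a hidden variable z drives both x and y, so x and y are strongly correlated even though neither influences the other. Once z is accounted for, the correlation essentially disappears.

```python
import math
import random

def pearson_r(xs, ys):
    # Pearson correlation coefficient of two equal-length lists
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    # what is left of ys after removing a least-squares line fit on zs
    n = len(ys)
    mz = sum(zs) / n
    my = sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / szz
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]       # hidden common cause
x = [zi + 0.3 * rng.gauss(0, 1) for zi in z]  # depends on z only
y = [zi + 0.3 * rng.gauss(0, 1) for zi in z]  # depends on z only

r2 = pearson_r(x, y) ** 2
r2_controlled = pearson_r(residuals(x, z), residuals(y, z)) ** 2

print("R-squared of x and y:               %0.4f" % r2)
print("R-squared after controlling for z:  %0.4f" % r2_controlled)
```

In real data the hidden variables are usually unknown or unmeasured, which is why a correlation by itself can only suggest where to look.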

In most cases, how statistics are used is up to human interpreters. One of the differences between classical statistics and machine learning is that machine learning is usually more predictive than classical statistics. For example, suppose you want to place a bet on one of two sports teams. Classical statistics might look at R-squared correlations, graphs, and all kinds of tables, and then you could use the data, combined with human intuition, to pick one of the two teams to bet on.

A machine learning approach might consist of a deep neural network that ultimately outputs a team to bet on, perhaps with an estimated probability that the selected team will win.

Of course, the difference between classical statistics and machine learning isn’t clear cut. There’s a lot of overlap between the two, and it’s really a matter of perspective.

Three clever photos using forced perspective.

Posted in Machine Learning | Leave a comment

A Minimal PyTorch Complete Example

I have taught quite a few workshops on the PyTorch neural network library. Learning PyTorch (or any other neural code library) is very difficult and time consuming. If beginners start without knowledge of some fundamental concepts, they’ll be overwhelmed quickly. But if beginners spend too much time on fundamental concepts before ever seeing a working neural network, they’ll get bored and frustrated. Put another way, even an experienced developer shouldn’t start with a PyTorch LSTM network, and on the other hand, he shouldn’t start with four weeks of learning about low-level details of Tensor objects.

To deal with this learning difficulty issue I created what I consider to be a minimal, reasonable, complete PyTorch example.

The idea is to learn in a spiral fashion, getting an example up and running, and then gradually expanding the features and concepts. My minimal example hard-codes the training data, doesn’t use any test data, uses online rather than batch processing, doesn’t explicitly initialize the weights and biases, doesn’t monitor error during training, doesn’t evaluate model accuracy after training, and doesn’t save the trained model. Even so, my minimal example is nearly 100 lines of code.

Some of my colleagues might use the PyTorch Sequential() class rather than the Module() class to define a minimal neural network, but in my opinion Sequential() is far too limited to be of any use, even for simple neural networks.

The training data is just 6 items from the famous Iris Dataset. Each item consists of four predictor values (sepal length and width, petal length and width) and a species to predict (0 = setosa, 1 = versicolor, 2 = virginica).
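Because each species is encoded as an integer class index and the demo network emits three raw output values (logits), the loss for one item is the negative log of the softmax probability assigned to the correct class. That is why forward() in the demo applies no softmax: CrossEntropyLoss() does it internally. The following plain-Python sketch shows the arithmetic only. The logit values are made up, and the function name cross_entropy_from_logits is hypothetical, not part of PyTorch.

```python
import math

def cross_entropy_from_logits(logits, target):
    # log-sum-exp, with the max subtracted first for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    # same value as -log(softmax(logits)[target])
    return log_sum_exp - logits[target]

logits = [0.5, 2.0, -1.0]  # made-up raw outputs for one iris item
species = ["setosa", "versicolor", "virginica"]

losses = []
for target, name in enumerate(species):
    loss = cross_entropy_from_logits(logits, target)
    losses.append(loss)
    print("target = %d (%s): loss = %0.4f" % (target, name, loss))
```

The loss is smallest when the target is the class with the largest logit, which is the behavior training tries to encourage.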

Even though I’ve coded hundreds of neural networks in many different ways, I underestimated how much information is contained in even a minimal neural network. Almost every line of code requires significant explanation — up to a certain point. When I use the minimal example in a workshop, I could easily devote over 8 hours of discussion to it. But that would defeat the purpose of a minimal example.

Weirdly, I think the complexity of neural networks with PyTorch is an appealing factor in some way. It creates an intellectual challenge that appeals to a competitive personality.

Somewhat unfortunately, there’s a lot of work that has to be done in order to set up a PyTorch environment to run a minimal example. Briefly, you have to install a Python distribution (I strongly prefer and recommend Anaconda), and then install PyTorch (and usually TorchVision if you work with image data). It doesn’t seem like much, but there’s a lot that can go wrong.

Interestingly, research has shown that men are much more competitive than women on average, and that men and women compete differently. See the Harvard Business Review article at

Left: The current U.S. record holder in the javelin throw, Breaux Greer (since 2007). Center: The current world chess champion, Magnus Carlsen (since 2013). Right: Personal appearance is a form of competition (since approximately 3000 BC).

# PyTorch 1.5.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import numpy as np
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)
    # (initialize weights)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.oupt(z)  # no softmax. see CrossEntropyLoss() 
    return z

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin minimal PyTorch Iris demo ")
  # 1. set up training data
  print("\nLoading Iris train data ")

  train_x = np.array([
    [5.0, 3.5, 1.3, 0.3],
    [4.5, 2.3, 1.3, 0.3],
    [5.5, 2.6, 4.4, 1.2],
    [6.1, 3.0, 4.6, 1.4],
    [6.7, 3.1, 5.6, 2.4],
    [6.9, 3.1, 5.1, 2.3]], dtype=np.float32) 

  # np.int64 rather than np.long, which newer NumPy versions removed
  train_y = np.array([0, 0, 1, 1, 2, 2], dtype=np.int64)

  print("\nTraining predictors:")
  print(train_x)
  print("\nTraining class labels: ")
  print(train_y)

  train_x = T.tensor(train_x, dtype=T.float32).to(device)
  train_y = T.tensor(train_y, dtype=T.long).to(device)

  # 2. create network
  net = Net().to(device)    # could use Sequential()

  # 3. train model
  max_epochs = 100
  lrn_rate = 0.04
  loss_func = T.nn.CrossEntropyLoss()  # applies softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("\nStarting training ")
  indices = np.arange(6)
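  for epoch in range(0, max_epochs):
    np.random.shuffle(indices)  # visit items in random order
    for i in indices:
      X = train_x[i].reshape(1,4)  # device inherited
      Y = train_y[i].reshape(1,)
      optimizer.zero_grad()        # clear old gradients
      oupt = net(X)
      loss_obj = loss_func(oupt, Y)
      loss_obj.backward()          # compute gradients
      optimizer.step()             # update weights and biases
    # (monitor error)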
  print("Done training ")

  # 4. (evaluate model accuracy)

  # 5. use model to make a prediction
  print("\nPredicting species for [5.8, 2.8, 4.5, 1.3]: ")
  unk = np.array([[5.8, 2.8, 4.5, 1.3]], dtype=np.float32)
  unk = T.tensor(unk, dtype=T.float32).to(device) 
  logits = net(unk).to(device)
  probs = T.softmax(logits, dim=1)
  probs = probs.detach().numpy()  # allows printoptions

  np.set_printoptions(precision=4)
  print(probs)

  # 6. (save model)

  print("\nEnd Iris demo")

if __name__ == "__main__":
  main()

Posted in PyTorch | Leave a comment

Why Virtual Meetings are OK but Virtual Conferences Don’t Work

Online meetings can be successful if they’re 30 minutes or less and there’s a well-defined agenda. But trying to convert an in-person conference to a virtual conference just doesn’t work.

Briefly, an online conference fails because there’s no interaction, there are too many distractions, and there are no side effect benefits.

I often (about once a month or so) speak at tech conferences. Most of them are in Las Vegas. Well, I guess I should say that I used to speak at tech conferences. The covid-19 pandemic has eliminated in-person conferences as I write this blog post. Many of the conferences that were scheduled are attempting to go virtual and make the conference an online event. That’s just not going to work.

The reasons why online virtual conferences fail aren’t really technical. They fail because of human psychology and behavior.

First, in an online scenario, there’s no real interaction between attendees. Much of the value that attendees (and speakers) get from a conference derives from person-to-person interaction. Most of these interactions are unplanned and take place between session talks.

Second, humans just can’t concentrate for more than about 30 minutes when online. After 30 minutes, attention wanders, the email app opens up, the TV or music turns on, the dog starts asking to be walked, and so on. In a person-to-person event, there is constant sensory stimulation — movement in the conference room, voice modulation and gesturing from the speaker, attendees asking unexpected questions, and much more. All these things help attendees stay focused, often for hours at a time.

Third, attending an in-person conference in Las Vegas, or anywhere else, gets attendees out of their normal environment. In an online conference scenario, attendees are in their same old environment. Getting into a different environment has an energizing effect and when attendees return to their workplace, they’re recharged and more creative (there’s some solid research evidence on this, but mostly it’s common sense).

An online technical conference has no advantages over just watching a presentation on YouTube.

I’m already seeing online event overload with my colleagues. More and more of my colleagues are bailing out on virtual events that last longer than just a few minutes. Online virtual conferences just don’t work.

Left: There is a lot of energy at tech conferences. Center: Most conferences have an expo with all kinds of interesting vendors. Right: All the major tech companies are represented at tech conferences.

Posted in Conferences | Leave a comment

Understanding Variational Autoencoders – for Mere Mortals

I contributed to an article titled “Understanding Variational Autoencoders – for Mere Mortals” in the May 2020 edition of the PureAI Web site. See

Variational autoencoders (VAEs) are a type of deep neural network. VAEs are used to generate synthetic data such as synthetic images of people or synthetic music that sounds as though it was written and played by real musicians.

In terms of architecture, VAEs are closely related to regular autoencoders (AEs). However, AEs are used for purposes other than generating synthetic data. AEs are most often used for 1.) image visualization via dimensionality reduction, 2.) data denoising, and 3.) data anomaly detection.

In the article, I describe why regular AEs don’t work well for generating synthetic data, and why VAEs do work well. In the image below, regular AEs on the left leave gaps in the graph but VAEs on the right give more coverage. This results in VAEs having a much better ability to generate synthetic data.

For the article, I created a VAE for the MNIST digit dataset, which contains handwritten digits from ‘0’ to ‘9’. Then I used the VAE to generate 100 synthetic digits. Many of the synthetic digits look quite realistic.

There are other neural architectures that can generate synthetic data. Two other generative deep neural architectures are GANs (Generative Adversarial Networks), usually used for image generation, and BERT (Bidirectional Encoder Representations from Transformers), for natural language processing.

A question that’s not so easy to answer is, “Why is there so much research effort aimed at generative models?” The usual explanation from researchers is that understanding how deep neural systems can generate synthetic data may give insights into cognition — how humans acquire and learn knowledge. This would be a giant step forward towards the goal of creating general AI systems.

Other questions related to the ability to generate realistic synthetic data concern security. I commented that there’s no question that the ability to create realistic synthetic data using deep generative models such as VAEs raises serious security issues. I suspect that these sophisticated generative systems will increase the importance of ways to verify the provenance and authenticity of digital artifacts.

Posted in Machine Learning | Leave a comment

Why Does PyTorch Have Three Different softmax() Functions?

I’ve been using the PyTorch neural code library since version 0.2 in early 2017 and I like PyTorch a lot. Even though PyTorch is slowly but surely stabilizing, there are still quite a few things about PyTorch that let you know it’s a young library. For example, why does PyTorch have three different softmax() functions? There is a torch.softmax() function, a torch.nn.functional.softmax() function, and a torch.nn.Softmax() class.

As an open source library grows and evolves, there are constant changes in the library structure because it’s just not possible to anticipate what new features will be needed, and what features will be dropped. Additionally, as library contributors come and go, each guy can possibly make changes to the library structure.

So, a certain amount of structural chaos is pretty much inevitable in PyTorch. This makes learning PyTorch somewhat more difficult than it could be because there are several ways to do anything, and best practices and standard techniques won’t emerge and gel for a couple of years.
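Whichever form is used, the underlying computation is the same: exponentiate each value and divide by the sum of the exponentials. As a rough plain-Python analogy (not PyTorch's actual implementation; the names softmax and Softmax here are my own stand-ins), the function style and the class style can be sketched like this:

```python
import math

def softmax(values):
    # function style: subtract the max first for numerical stability
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

class Softmax:
    # class style: create the object once, then call it like a function
    def __call__(self, values):
        return softmax(values)

vals = [1.0, 2.0, 3.0]

p1 = softmax(vals)   # direct function call
layer = Softmax()    # callable object, similar in spirit to a layer
p2 = layer(vals)

print([round(p, 4) for p in p1])
print("same result from both styles:", p1 == p2)
```

The class form is convenient when softmax is stored as a component of a larger object, and the function form is convenient for a one-off call.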

Here’s a short demo program that illustrates three ways to compute softmax() on a tensor:

# PyTorch 1.4.0 Anaconda3 5.2.0 (Python 3.6.5)
# CPU, Windows, no dropout

import numpy as np
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

def main():
  print("\nWhy are there 3 softmax() in PyTorch? \n")

  my_arr = np.array([1, 2, 3], dtype=np.float32)
  my_tensor = T.tensor(my_arr, dtype=T.float32).to(device)

  print("source tensor: ")
  print(my_tensor)

  print("\nT.softmax(), T.nn.functional.softmax(), " +
    "T.nn.Softmax(): \n")

  t1 = T.softmax(my_tensor, dim=0).to(device)
  print(t1)

  t2 = T.nn.functional.softmax(my_tensor, dim=0).to(device)
  print(t2)

  C = T.nn.Softmax(dim=0)  # callable object
  t3 = C(my_tensor).to(device)
  print(t3)

if __name__ == "__main__":
  main()

The moral of the story is that if you are starting to learn PyTorch, or TensorFlow with Keras, or any deep neural technologies, don’t underestimate how difficult your learning path will be. But on the other hand, don’t get discouraged.

Just like there are multiple ways to do things in deep neural learning, there are multiple ways to spell at college football games. Left: Ohio State University Buckeyes flag team. Center: University of Notre Dame Fighting Irish cheerleader. Right: University of Tennessee Volunteers fans.

Posted in PyTorch | Leave a comment

K-Means Clustering Using Tournament Selection

Not all of my ideas for machine learning algorithms work out well. I usually do quite a bit of thinking before putting an idea down in code, so my failure rate with new algorithms is quite low. But recently I botched thinking through an idea.

My idea (which wasn’t good as it turns out) was to improve the k-means++ clustering algorithm by using tournament selection instead of roulette wheel selection. I coded up an implementation of my idea, and it works, but it doesn’t improve upon k-means++ with roulette wheel. Because k-means++ with roulette wheel is a de facto standard, any new algorithm must be significantly better to be appealing. My idea of tournament initialization is equal at best.

The k-means++ roulette wheel initialization picks a data item to act as a mean. The technique picks each data item with probability proportional to its squared distance to its closest existing mean. The consequence is that any data item could be picked, but a data item that is far from its closest mean is the most likely to be picked (which is good).

My idea of k-means clustering using tournament initialization works, but it didn’t improve upon standard k-means++ initialization. Darn.

For example, suppose there are only five data items (in a realistic scenario there’d be hundreds or thousands) and the distances to the associated closest mean are (2.0, 5.0, 1.0, 4.0, 3.0). The squared distances are (4, 25, 1, 16, 9). You want to pick item [1] because it has the largest squared distance.

Roulette wheel selection picks an item according to the probabilities (4/55, 25/55, 1/55, 16/55, 9/55), which are approximately (0.07, 0.45, 0.02, 0.29, 0.16). You will probably get a good data item that’s far from its mean, but you could get a poor item that’s close to its mean.
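The roulette wheel arithmetic for the five-item example can be sketched in a few lines of Python. This is an illustration of the selection step only, not a full k-means++ implementation, and the function names roulette_probs and roulette_select are made up for the sketch.

```python
import random

def roulette_probs(dists):
    # probability of each item is its squared distance over the total
    sq = [d * d for d in dists]
    total = sum(sq)
    return [s / total for s in sq]

def roulette_select(dists, rng):
    # walk the cumulative probabilities until a random p is covered
    probs = roulette_probs(dists)
    p = rng.random()
    cum = 0.0
    for i, pr in enumerate(probs):
        cum += pr
        if p < cum:
            return i
    return len(probs) - 1  # guard against floating point round-off

dists = [2.0, 5.0, 1.0, 4.0, 3.0]  # distances from the example
probs = roulette_probs(dists)
print("probabilities:", [round(p, 4) for p in probs])

rng = random.Random(1)  # fixed seed so the run is repeatable
counts = [0] * len(dists)
for _ in range(10000):
    counts[roulette_select(dists, rng)] += 1
print("selection counts:", counts)
```

Item [1] is selected most often, but every item, including the poor item [2], gets selected some of the time.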

Tournament selection with tournPct = 0.80 picks the best item from a randomly selected 80% of the items, based on distance (not squared distance). My incorrect thought was that this would increase the chances of getting a good item compared to roulette wheel selection. My reasoning was that if the five distances-to-closest-means were very close to each other, then roulette wheel selection could easily give you the fifth best data item.

My reasoning was correct but it didn’t go far enough. The second part is that if the distances-to-closest-means are very close then it doesn’t matter if you get the fifth best data item because all five are pretty good.
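For comparison, here is a small Python sketch of tournament selection on the same five-item example. It loosely mirrors the C# TournSelect() function shown later in this post, but the Python name tourn_select and the trial count are my own choices for illustration. With five items and tournPct = 0.80, four candidates are examined, so the farthest item [1] wins whenever it is among the four, and otherwise the second farthest item [3] wins.

```python
import random

def tourn_select(dists, rng, tourn_pct):
    # examine a random subset of items and return the index of the
    # one with the largest distance to its closest mean
    n = len(dists)
    indices = list(range(n))
    rng.shuffle(indices)
    num_cands = max(int(tourn_pct * n), 1)  # at least one candidate
    cands = indices[:num_cands]
    return max(cands, key=lambda i: dists[i])

dists = [2.0, 5.0, 1.0, 4.0, 3.0]  # distances from the example
rng = random.Random(0)             # fixed seed so the run is repeatable
trials = 10000

picks = [tourn_select(dists, rng, 0.80) for _ in range(trials)]
counts = [picks.count(i) for i in range(len(dists))]
print("selection counts:", counts)
print("fraction picking item [1]: %0.3f" % (counts[1] / trials))
```

On this tiny example the tournament never returns one of the three closest items, which matches the first part of my reasoning. The catch described above still applies: picking a slightly better item at initialization does not necessarily produce a better final clustering.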


But I wasn’t really annoyed at all with myself for coding up an idea that didn’t go anywhere. I learned a lot and picked up some new ideas and coding techniques that could be valuable in the future. As one of many similar sayings goes, “Don’t be afraid to try and then fail; be afraid to fail to try.” It doesn’t take too much courage to take risks with machine learning algorithms; you only risk time.

The first moon landing in 1969 is clearly one of the greatest accomplishments in history. There were many heroic men who weren’t afraid to take risks, when failure had serious consequences. Left: Yuri Gagarin, the Soviet cosmonaut who was the first man into space in 1961. Right: The crew of Apollo 11, the first men to reach the surface of the moon. Neil Armstrong, Michael Collins, Edwin Aldrin.

  private static void InitTourn(double[][] data, int[] clustering,
    double[][] means, Random rnd, double tournPct)
  {
    // k-means init using tournament selection
    // clustering[] and means[][] exist
    // (ArgMin() and UpdateClustering() helpers are not shown)
    int N = data.Length;
    int dim = data[0].Length;
    int K = means.Length;

    // select one data item index at random
    int idx = rnd.Next(0, N); // [0, N)
    for (int j = 0; j < dim; ++j)
      means[0][j] = data[idx][j];

    for (int k = 1; k < K; ++k) // find each remaining mean
    {
      double[] distVals = new double[N];  // to nearest mean

      for (int i = 0; i < N; ++i) // each data item
      {
        // compute distances from data[i] to
        // each existing mean (to find closest)
        double[] distances = new double[k]; // currently have k means

        for (int ki = 0; ki < k; ++ki)
          distances[ki] = EucDistance(data[i], means[ki]);

        int mi = ArgMin(distances);  // index of closest mean
        distVals[i] = distances[mi];
      } // i

      // select an item far from its mean using tournament
      // if an item has been used as a mean distance will be 0
      // so it's very unlikely to be selected again
      int newMeanIdx = TournSelect(distVals, rnd, tournPct);
      for (int j = 0; j < dim; ++j)
        means[k][j] = data[newMeanIdx][j];
    } // k remaining means

    UpdateClustering(clustering, data, means);
  } // InitTourn

  static int TournSelect(double[] vals, Random rnd, double pct)
  {
    // find index of a large value in vals
    int n = vals.Length;
    int[] indices = new int[n];
    for (int i = 0; i < n; ++i)
      indices[i] = i;
    Shuffle(indices, rnd);

    int numCands = (int)(pct * n);  // number random candidates
    int maxIdx = indices[0];
    double maxVal = vals[maxIdx];
    for (int i = 0; i < numCands; ++i)
    {
      int idx = indices[i];  // pts into vals
      if (vals[idx] > maxVal)
      {
        maxVal = vals[idx];
        maxIdx = idx;
      }
    }
    return maxIdx;
  }

  private static void Shuffle(int[] indices, Random rnd)
  {
    // Fisher-Yates shuffle, in place
    int n = indices.Length;
    for (int i = 0; i < n; ++i)
    {
      int ri = rnd.Next(i, n);
      int tmp = indices[ri];
      indices[ri] = indices[i];
      indices[i] = tmp;
    }
  }

  private static double EucDistance(double[] item,
    double[] mean)
  {
    // Euclidean distance from item to mean
    double sum = 0.0;
    for (int j = 0; j < item.Length; ++j)
      sum += (item[j] - mean[j]) * (item[j] - mean[j]);
    return Math.Sqrt(sum);
  }

Posted in Machine Learning | Leave a comment