A Sentence Fill-in-The-Blank Example Using Hugging Face

Deep neural transformer architecture (TA) systems have revolutionized the field of natural language processing (NLP). Unfortunately, TA systems are incredibly complex and implementing such a system from scratch can take months.

Enter the Hugging Face code library. Terrible name, excellent code library.

I’ve been wading through the Hugging Face (HF) documentation examples. I take an example and then refactor it completely. Doing so forces me to understand every line of code. Over time, by repeating this process for many examples, I expect to gain a solid grasp of the HF library.

My latest code refactorization was for a fill-in-the-blank example. I started with a sentence from Wikipedia:

“Machine learning (ML) is the study of computer
algorithms that can learn automatically through experience
and by the use of data.”

I erased the word “learn” to see if the demo program could find reasonable words to fill in the blank:

“Machine learning (ML) is the study of computer
algorithms that can (BLANK) automatically through experience
and by the use of data.”

To cut to the chase, the top five predicted words and their associated pseudo-probabilities were:

learn        (0.3484)
evolve       (0.1901)
operate      (0.0978)
work         (0.0247)
communicate  (0.0224)

Quite impressive.
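As an aside, the HF library also has a high-level pipeline API that wraps all of these steps into a single call. Here is a minimal sketch, assuming the same distilbert-base-cased checkpoint (the exact keys of the result dictionaries can vary a bit across library versions):

from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-cased")
results = fill("Machine learning (ML) is the study of computer \
algorithms that can [MASK] automatically through experience \
and by the use of data.")
for r in results:  # by default the top five predictions
  print(r["token_str"], r["score"])

The pipeline version is much shorter, but working at the tokenizer-and-model level, as in the demo below, exposes the key ideas.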

Even though the documentation code was only about 20 lines, it was extremely dense, and it took me several hours of experimentation before I felt I understood most of the key ideas.



Artists have to fill in the blank when the blank is an art canvas. Left: By Andre E. Marty (1882-1974). Center: By Georges Lepape (1887-1971). Right: By Rene Gruau (1909-2004). All three men lived from the beginnings of powered flight to men landing on the moon. Amazing.


Code below.

# fill_blank_test.py
# refactored from a Hugging Face documentation example

import numpy as np
import torch as T
from transformers import AutoModelForMaskedLM, AutoTokenizer

print("\nBegin fill-in-the-blank using TA ")

print("\nLoading (cached) DistilBERT language model into memory ")
toker = \
  AutoTokenizer.from_pretrained("distilbert-base-cased")
model = \
  AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

sentence = "Machine learning (ML) is the study of computer \
algorithms that can (BLANK) automatically through experience \
and by the use of data."

print("\nThe target fill-in-the-blank sentence is: ")
print(sentence)

print("\nThe actual (BLANK) word from Wikipedia is \"learn\" ")

sentence = f"Machine learning (ML) is the study of computer \
algorithms that can {toker.mask_token} automatically through \
experience and by the use of data."
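# note: for distilbert-base-cased, toker.mask_token is the
# literal string "[MASK]", so the sentence now reads
# ". . . algorithms that can [MASK] automatically . . ."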

print("\nConverting sentence to token IDs ")
inpts = toker(sentence, return_tensors="pt")
# inpts["input_ids"]
# tensor([[  101,  7792,  3776,   113,   150,
#           2162,   114,  1110,  1103,  2025,
#           1104,  2775, 14975,  1115,  1169, 
#            103,  7743,  1194,  2541,  1105,
#           1118,  1103,  1329,  1104,  2233,
#            119,   102]])

# for i in range(27):
#   print(inpts["input_ids"][0][i])
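# an equivalent one-liner to inspect the tokens as text:
# print(toker.convert_ids_to_tokens(inpts["input_ids"][0]))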

print("\nComputing output for all 28,996 possibilities ")
blank_id = toker.mask_token_id             # ID of blank = 103
blank_id_idx = T.where(inpts["input_ids"] == blank_id)[1]  # 15
with T.no_grad():
  all_logits = model(**inpts).logits  # 3D: [1, 27, 28996]
pred_logits = all_logits[0, blank_id_idx, :]  # [1, 28996]

print("\nExtracting IDs of top five predicted words: ")
top_ids = T.topk(pred_logits, 5, dim=1).indices[0].tolist()
print(top_ids)

print("\nThe top five predicted token IDs as words: ")
for word_id in top_ids:
  print(toker.decode([word_id]))

print("\nConverting raw logit outputs to probabilities ")
np.set_printoptions(precision=4, suppress=True)
pred_probs = T.softmax(pred_logits, dim=1).numpy()
pred_probs = np.sort(pred_probs[0])[::-1]  # high p to low p
top_probs = pred_probs[0:5]
print("\nThe top five corresponding probabilities: ")
print(top_probs)
# [0.3484 0.1901 0.0978 0.0247 0.0224]

print("\nEnd fill-in-the-blank demo ")
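The np.sort() approach works because the top five logits and the top five probabilities come out in the same order. A slightly more direct alternative, sketched here using the same variables as in the demo, is to index the probability tensor by the predicted token IDs so that each word stays paired with its probability:

pred_probs = T.softmax(pred_logits, dim=1)[0]  # shape [28996]
for word_id in top_ids:
  print(toker.decode([word_id]), pred_probs[word_id].item())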