Deep neural systems based on Transformer Architecture (TA) have revolutionized the field of natural language processing (NLP). Unfortunately, TA systems are insanely complex, meaning that implementing a TA system from scratch is not feasible, and implementing TA using a low-level library like PyTorch or Keras or TensorFlow is only barely feasible.
The Hugging Face library (I hate that name, but I like the library) is a high-level code library that makes writing TA systems simple, with the downside that customizing a TA system built on Hugging Face can be very difficult.
I recently started work on a speculative project that will use a TA system. In our first team meeting, we decided that our initial approach will be to start with a Hugging Face model and then attempt to customize it, rather than try to build the system using PyTorch or Keras.
Even though I’ve been a software developer for many years, I forgot how to tackle the project. I incorrectly started by looking at all the Hugging Face technical documentation. I quickly got overwhelmed. After taking a short break, I remembered how I learn technology topics — from specific to general. In other words, I learn best by looking at many small, concrete examples. Over time, I learn the big picture. This is in sharp contrast to how some people learn — from general to specific. Those people start by learning the big picture and then learn how to construct concrete examples.
So, my plan is to look at one or two concrete examples of Hugging Face code every day or so. I know from previous experience that it’s important to have buffer time between explorations. My brain can only accept so much technical information until the effect of psychological interference starts — new information bounces off and interferes with old information.
My first example was a paraphrase analysis. Briefly, two sentences are paraphrases if they essentially mean the same thing. I set up two sentences:
phrase_0 = "Machine Learning (ML) makes predictions from data"
phrase_1 = "ML uses data to compute a prediction."
Although the concept of paraphrases is somewhat subjective, most people would say the two sentences are in fact paraphrases of each other. The demo program is remarkably short because the Hugging Face library is so high-level. The demo emitted two associated pseudo-probabilities: the probability that the sentences are not paraphrases, and the probability that the sentences are paraphrases. The pseudo-probability values were [0.058, 0.942] so the model strongly believed the two sentences are in fact paraphrases.
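To make the pseudo-probability idea concrete, here is a minimal sketch of the softmax step in plain Python. The logit values are hypothetical, picked only for illustration (they are not output from the demo model), and the helper name softmax and the label strings are my own. The order of the two outputs, not-a-paraphrase then is-a-paraphrase, follows the demo.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) to pseudo-probabilities that sum to 1."""
    biggest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - biggest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for illustration only; not taken from the model
logits = [-1.4, 1.4]
probs = softmax(logits)

# Index 0 = not-a-paraphrase, index 1 = is-a-paraphrase (same order as the demo)
labels = ["not-a-paraphrase", "is-a-paraphrase"]
predicted = labels[probs.index(max(probs))]

print([round(p, 3) for p in probs])
print(predicted)
```

The larger the gap between the two logits, the closer the winning pseudo-probability gets to 1. I call them pseudo-probabilities because they sum to 1 but are not necessarily true probabilities in the statistical sense.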
Next step: another concrete Hugging Face example. And then another, and another until the big picture gels in my head.
# paraphrase_test.py

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

print("\nBegin HugFace paraphrase example ")

toker = AutoTokenizer.from_pretrained \
  ("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained \
  ("bert-base-cased-finetuned-mrpc")

phrase_0 = "Machine Learning (ML) makes predictions from data"
phrase_1 = "ML uses data to compute a prediction."
print("\nFirst phrase: ")
print(phrase_0)
print("\nSecond phrase: ")
print(phrase_1)

phrases = toker(phrase_0, phrase_1, return_tensors="pt")
# print(type(phrases))
# 'transformers.tokenization_utils_base.BatchEncoding'
# derived from a Dictionary

with torch.no_grad():
  result_logits = model(**phrases).logits
  result_probs = torch.softmax(result_logits, dim=1).numpy()

print("\nPseudo-probabilities of not-a-para, is-a-para: ")
print(result_probs)

print("\nEnd HugFace example ")