Getting Word Synonyms using WordNet and C#

Recently, I’ve been working with natural language processing (NLP). It’s a very large field, and one I’m not very familiar with. I’m looking at the problem of comparing two sentences and computing a measure of their similarity, so that I can write a clustering algorithm. It quickly became clear to me that, for my project at least, it would be useful to be able to find synonyms of English words: words that are very close in meaning, such as employment and work, or sum and add_up.

I learned that WordNet, a lexical dataset from Princeton University research, is one of the de facto standard resources for NLP. (The “Net” in “WordNet” has nothing to do with the .NET Framework.) The WordNet dataset consists of several text files, but their structure is very complex. There are some very good WordNet API libraries for languages such as Java and Python, but the few C# libraries I found (C# is my language of choice) were really bad for my purposes: they were poorly documented, quite buggy, had possible licensing issues, and were overkill for what I wanted to do, which was just to find noun and verb synonyms.

So, I decided it would be worth my time to write my own routines from scratch. The hardest part was figuring out the format of the WordNet data files. File index.noun has lines that look like:

employment n 4 4 ! @ ~ + 4 4 13968092 00584367 01217859 00947128

The last four numbers are synset offsets, which are byte positions into a second file, data.noun, that has lines that look like:

00947128 04 n 06 use 0 usage 0 utilization 0 utilisation (etc)
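
To make the relationship between the two files concrete, here is a minimal C# sketch. It is not my MyWordNet code; it assumes the WordNet 3.0 line layouts described in the wndb(5WN) manual page, and the helper names ParseIndexLine, ReadDataLineAt, and ParseSynsetWords are hypothetical. Because each offset is a byte position, a single data line can be fetched with one Seek instead of a scan through the file.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

var entry = ParseIndexLine("employment n 4 4 ! @ ~ + 4 4 13968092 00584367 01217859 00947128");
Console.WriteLine(entry.Lemma + ": " + string.Join(" ", entry.Offsets));
// prints "employment: 13968092 00584367 01217859 00947128"
// With a real dict folder (illustrative path):
// string line = ReadDataLineAt(@"C:\Data\WordNet\3.0\dict\data.noun", long.Parse(entry.Offsets[3]));

// Assumed index layout: lemma pos synset_cnt p_cnt [ptr_symbol...]
// sense_cnt tagsense_cnt synset_offset [synset_offset...]
static (string Lemma, string[] Offsets) ParseIndexLine(string line)
{
    string[] f = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int synsetCount = int.Parse(f[2]);
    int pointerCount = int.Parse(f[3]);
    int first = 4 + pointerCount + 2; // skip pointer symbols, sense_cnt, tagsense_cnt
    return (f[0], f.Skip(first).Take(synsetCount).ToArray());
}

// Jump straight to the byte offset and read one line from the data file.
static string ReadDataLineAt(string dataPath, long offset)
{
    using var stream = new FileStream(dataPath, FileMode.Open, FileAccess.Read);
    stream.Seek(offset, SeekOrigin.Begin);
    using var reader = new StreamReader(stream, Encoding.UTF8);
    return reader.ReadLine() ?? "";
}

// Assumed data layout: synset_offset lex_filenum ss_type w_cnt word lex_id [word lex_id...] ...
// w_cnt is a two-digit hexadecimal count; each word is followed by its lex_id.
static string[] ParseSynsetWords(string dataLine)
{
    string[] f = dataLine.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    int wordCount = Convert.ToInt32(f[3], 16);
    return Enumerable.Range(0, wordCount).Select(i => f[4 + 2 * i]).ToArray();
}
```

Note that multi-word entries such as add_up keep their underscores in both files.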

Anyway, a full explanation would take pages. I’ll just cut to the chase and say I wrote a lightweight C# class named MyWordNet. It took about 8 hours, with most of that time spent trying to figure out exactly how the data and index files are related. Calling my code looks like:

using System;

string wnFilesPath = @"C:\Data\WordNet\3.0\dict";
MyWordNet wn = new MyWordNet(wnFilesPath);  // loads the index and data files

string word = "sum";
string[] nounSynonyms = wn.GetNounSynonyms(word);
if (nounSynonyms == null)
{
  Console.WriteLine("No noun synonyms found");
}
else
{
  Console.WriteLine("Noun synonyms for '" + word + "' :");
  foreach (string s in nounSynonyms)
    Console.WriteLine(s);
}

The constructor reads the files index.noun, index.verb, data.noun, and data.verb into lookup data structures in memory. Getting the noun synonyms of a word takes a single lookup into the noun index structure, followed by one lookup into the noun data structure for each synset offset listed in the index entry.
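
The in-memory approach might look roughly like the following. This is a hypothetical sketch, not the actual MyWordNet class: LoadIndex, LoadData, and GetSynonyms are made-up names. It assumes each data line can be keyed by its leading synset offset (in WordNet that field equals the line's byte position) and that the license header lines at the top of each file begin with a space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// lemma -> synset offsets, built from index.noun (or index.verb) lines.
static Dictionary<string, string[]> LoadIndex(IEnumerable<string> lines)
{
    var index = new Dictionary<string, string[]>();
    foreach (string line in lines)
    {
        if (line.Length == 0 || line[0] == ' ') continue; // skip license header lines
        string[] f = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        int synsetCount = int.Parse(f[2]);
        int first = 4 + int.Parse(f[3]) + 2; // skip pointer symbols, sense_cnt, tagsense_cnt
        index[f[0]] = f.Skip(first).Take(synsetCount).ToArray();
    }
    return index;
}

// synset offset -> words in that synset, built from data.noun (or data.verb) lines.
static Dictionary<string, string[]> LoadData(IEnumerable<string> lines)
{
    var data = new Dictionary<string, string[]>();
    foreach (string line in lines)
    {
        if (line.Length == 0 || line[0] == ' ') continue;
        string[] f = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        int wordCount = Convert.ToInt32(f[3], 16); // w_cnt is hexadecimal
        data[f[0]] = Enumerable.Range(0, wordCount).Select(i => f[4 + 2 * i]).ToArray();
    }
    return data;
}

// One index lookup, then one data lookup per synset offset.
// Returns null when the word has no index entry, matching the null check in the calling code.
static string[]? GetSynonyms(string word,
    Dictionary<string, string[]> index, Dictionary<string, string[]> data)
{
    string key = word.ToLowerInvariant().Replace(' ', '_');
    if (!index.TryGetValue(key, out string[]? offsets)) return null;
    return offsets
        .Where(data.ContainsKey)
        .SelectMany(o => data[o])
        .Select(w => w.ToLowerInvariant())
        .Where(w => w != key) // drop the query word itself
        .Distinct()
        .ToArray();
}
```

With real files you would pass File.ReadLines(Path.Combine(wnFilesPath, "index.noun")) and the matching data.noun lines; the same functions work for the verb files. Loading everything up front trades memory for fast lookups, whereas the Seek approach reads only the lines it needs.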

It was an interesting challenge. I can see that creating a full-featured C# API for WordNet would take many weeks of effort.


This entry was posted in Machine Learning, Software Test Automation.

2 Responses to Getting Word Synonyms using WordNet and C#

  1. Could you please post your MyWordNet class? Otherwise it’s not easy to understand.

    Thank you.

    • I will write up the code for Visual Studio Magazine and publish it there as soon as I get some free time. Whenever I am going to publish something, magazine editors ask that I not publish the code on my blog site, because of copyright issues. I hope to get the code and article written within a couple of weeks.

