Reading a Unicode File One Character at a Time using C#

As part of a recent project, I needed to convert a file that used a custom format into a plain text file. Most of the custom file consisted of sentences stored as Unicode-encoded characters. But there were also some custom bytes mixed in whose purpose I wasn’t sure of, and I wanted to remove them if possible.

So I needed to dissect the custom file one character at a time. I was using the C# language and I found that the information available on the Internet was not much help.

I ended up using a low-level approach: I read the file two bytes at a time with the C# BinaryReader class and decode each pair as one UTF-16 character (Encoding.Unicode in .NET is little-endian UTF-16). The code looks like this:

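using System;
using System.IO;
using System.Text;

class Program
{
  static void Main(string[] args)
  {
    string fn = "..\\..\\WeirdFile.phr";
    FileStream ifs = new FileStream(fn, FileMode.Open);
    using (BinaryReader br = new BinaryReader(ifs, Encoding.Unicode))
    {
      long len = br.BaseStream.Length;
      while (br.BaseStream.Position < len)
      {
        byte[] bytes = br.ReadBytes(2);  // one 2-byte unit (1 byte if the file length is odd)
        char[] c = Encoding.Unicode.GetChars(bytes);
        Console.Write(c);  // inspect or filter each character here
      }
    }
    Console.WriteLine("\nDone \n");
  }
}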

There’s actually quite a lot going on in this short amount of code. My next step is to use this code to examine the file and try to figure out the meaning of the bytes that aren’t part of the text.
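One way to examine the unknown bytes is to print each 2-byte unit with its file offset and hex value next to the decoded character. The following is a minimal sketch of that idea, not the code used in the project. The class name CodeUnitDump and the helper Describe are hypothetical, and it assumes the file is little-endian UTF-16 throughout.

```csharp
using System;
using System.IO;

static class CodeUnitDump
{
  // Hypothetical helper: formats one chunk read from the file as
  // "offset: hex value  character", with "." for unprintable units.
  public static string Describe(long offset, byte[] bytes)
  {
    if (bytes.Length < 2)  // odd-length file: one byte left over at the end
      return string.Format("{0:X8}: {1:X2}    (odd trailing byte)", offset, bytes[0]);

    int unit = bytes[0] | (bytes[1] << 8);  // little-endian: low byte first
    char ch = (char)unit;
    // Control characters and lone surrogate halves are shown as "."
    string shown = (char.IsControl(ch) || char.IsSurrogate(ch)) ? "." : ch.ToString();
    return string.Format("{0:X8}: 0x{1:X4}  {2}", offset, unit, shown);
  }

  static void Main(string[] args)
  {
    // Assumption: path given on the command line, else the file from the post.
    string fn = args.Length > 0 ? args[0] : "..\\..\\WeirdFile.phr";
    using (BinaryReader br = new BinaryReader(File.OpenRead(fn)))
    {
      long len = br.BaseStream.Length;
      while (br.BaseStream.Position < len)
      {
        long offset = br.BaseStream.Position;
        byte[] bytes = br.ReadBytes(2);
        Console.WriteLine(Describe(offset, bytes));
      }
    }
  }
}
```

For the bytes 0x41 0x00 at the start of a file, Describe returns "00000000: 0x0041  A". Runs of lines showing "." or unexpected hex values are good candidates for the custom bytes, and the offsets show where in the file they occur.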
