Okay, so I ran into an issue on a project at work: an internal web service was returning driving directions as one large string of text that had apparently been copied and pasted out of a Word document.  The problem was that non-ASCII Unicode characters (a tiny rectangle standing in for a list-item bullet, for example) were strewn throughout the text.  Just imagine a blurb of about 2000 characters without a single bit of formatting in it.  I knew someone had to have had this problem at some point or another, so I set out to find some code.  I ran into the following, which ended up being exactly what I was looking for:

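The blog post titled "A .NET Unicode Puzzle (Revised)" had the answers I sought.  Below is the method that I ended up using in my solution.  It normalizes the string to form KD, which splits accented characters into a base letter plus combining marks, and then encodes the result as ASCII with an empty replacement fallback, so anything without an ASCII equivalent is simply dropped.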

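// Requires: using System; using System.Text;
public static string RemoveUnicode(string s)
{
    // Nothing to strip from a null or empty string.
    if (string.IsNullOrEmpty(s))
    {
        return s;
    }

    try
    {
        // FormKD splits accented and compatibility characters into a base
        // character plus combining marks (e.g. "é" becomes "e" + U+0301).
        string normalized = s.Normalize(NormalizationForm.FormKD);

        // An empty replacement fallback drops anything with no ASCII equivalent.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        byte[] encodedBytes = ascii.GetBytes(normalized);
        return ascii.GetString(encodedBytes);
    }
    catch (ArgumentException)
    {
        // Normalize throws if s contains invalid Unicode; return it unchanged.
        return s;
    }
}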
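
To see what the method above actually does to a string, here is a small sketch you can drop into a console project.  The class name RemoveUnicodeDemo and the sample strings are made up for illustration, and the method is restated in compact form (without the error handling) only so that the snippet compiles on its own.

```csharp
using System;
using System.Text;

public static class RemoveUnicodeDemo
{
    // Compact restatement of the method above so this demo is self-contained.
    public static string RemoveUnicode(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormKD);
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));
        return ascii.GetString(ascii.GetBytes(normalized));
    }

    public static void Main()
    {
        // U+2022 (bullet) has no ASCII equivalent, so it is dropped entirely.
        Console.WriteLine(RemoveUnicode("\u2022 Turn left"));  // " Turn left"

        // U+00E9 decomposes into "e" plus a combining accent; only the accent is dropped.
        Console.WriteLine(RemoveUnicode("Caf\u00E9"));         // "Cafe"
    }
}
```

Note that the bullet is removed rather than replaced, so the leading space stays behind.  If you want a dash or a line break in its place, you would have to swap those characters out yourself before calling the method.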