If Only I Had Time
Wow, it is finished. What a relief, because I've been working on this PDF extractor and text filter for more than two months. It involved a lot of thinking and searching for the best way or trick to do it. The extracting part was easy, but when it came to text filtering, all hell broke loose. I just found out that .NET strings are Unicode, not ASCII. The two overlap for the basic characters, so those never cause any problem. The problems come when a character's code goes beyond the ASCII range. I experienced this when I tried to remove an apostrophe (’).
input = "home’s";
input = input.Replace("\'", "");
The program failed to identify it, which means that I failed to tell the computer what it should do. At first I blamed the computer and .NET for being unable to recognize the character. After some time (and a lot of cursing), I broke the code down and stepped through it line by line to see what happened inside the process. I was really, really, really surprised to find that the apostrophe's value was 8217 (U+2019, the curly right single quotation mark), not 39, the plain apostrophe I had typed in my Replace call. That is why the program failed to detect it. So what I did was make a variable for that apostrophe:
char postrophy = (char)8217;
string postrophyS = postrophy.ToString();
With this, I rewrote my Replace statement and it was finally able to detect the evil apostrophe. Thanks.
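For anyone who hits the same thing, here is a minimal sketch of how the whole fix could look, not the exact code from my project. The class name ApostropheDemo and the helper RemoveApostrophes are made up for illustration. It strips both the plain apostrophe (code 39) and the curly one (code 8217), and prints the character code so you can see what is really in the string.

```csharp
using System;

public static class ApostropheDemo
{
    // Hypothetical helper (not from the original project): removes both
    // the straight apostrophe (U+0027, decimal 39) and the curly
    // right single quotation mark (U+2019, decimal 8217).
    public static string RemoveApostrophes(string input)
    {
        return input
            .Replace("'", "")
            .Replace("\u2019", "");
    }

    public static void Main()
    {
        // "\u2019" is the curly apostrophe that PDF text often contains.
        string input = "home\u2019s";

        // Inspect the numeric code of the character at index 4.
        Console.WriteLine((int)input[4]);             // prints 8217

        Console.WriteLine(RemoveApostrophes(input));  // prints homes
    }
}
```

Writing the character as "\u2019" does the same job as casting 8217 to char, but it keeps the odd character visible in the source as an escape, so an editor or blog engine cannot silently swap it for a different quote.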