

27th Jun 2008

TheRarestParser has been upgraded to 0.4b

I’ve upgraded the bot for TheRarestWords to 0.4b today. The new version has these improvements:

  • Umlauts and all other national letters are now recognized as letters. The exceptions are Japanese, Chinese, etc.: those languages don’t use spaces to separate words, so the "words" found there are actually whole phrases, and I have no idea how to split them into words. (Ideas, anyone?)
  • Redirects to external domains are recognized and ignored (these are usually either misspellings or a spam-like technique)
  • Redirects within the same domain are recognized (META redirects too)
  • Multiple pages are read instead of just one (if your main page has fewer than 100 words, the bot goes further, up to 10 pages deep, to find some)
  • Frames are now recognized too
  • Improved HTML support (more tolerant of errors)
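
To make two of the items above concrete, here is a rough Python sketch. It is not the bot’s actual code: the function names `split_words` and `is_external_redirect` are made up for illustration, and treating "www.example.com" and "example.com" as the same site is my assumption about what "internal" means. The first function treats any Unicode letter as part of a word, the second compares the host you asked for with the host you ended up on.

```python
import unicodedata
from urllib.parse import urlsplit


def split_words(text):
    """Split text into runs of Unicode letters (hypothetical helper)."""
    # NFC folds "u" + combining diaeresis into a single "ü",
    # so decomposed umlauts still count as one letter.
    text = unicodedata.normalize("NFC", text)
    words, current = [], []
    for ch in text:
        if ch.isalpha():      # True for letters of any script, umlauts included
            current.append(ch)
        elif current:         # any non-letter closes the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def is_external_redirect(requested_url, final_url):
    """True when a redirect lands on a different host (assumed definition)."""
    def host(url):
        name = (urlsplit(url).hostname or "").lower()
        # Assumption: a leading "www." does not make it a different site.
        return name[4:] if name.startswith("www.") else name

    return host(requested_url) != host(final_url)
```

Note that `split_words("日本語のテキスト")` comes back as a single "word", which is exactly the Japanese/Chinese problem described in the first item: without spaces, a letter-based splitter cannot tell where one word ends.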

The new bot also stores a date and time for each word, so trends can be built after a few walks around the web. One walk takes about 55 days :) because this project still doesn’t make any money to cover the expenses and still runs on a single server.
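
As a rough idea of how timestamps turn into trends, here is a small Python sketch. The record layout `(word, seen_at, count)`, the function name `build_trend`, and bucketing by fixed 55-day windows are all assumptions for illustration, not how the bot really stores its data.

```python
from collections import defaultdict
from datetime import datetime

WALK_DAYS = 55  # approximate length of one walk, as mentioned above


def build_trend(observations, first_walk_start):
    """Sum counts per word and per walk (hypothetical record layout)."""
    trend = defaultdict(lambda: defaultdict(int))
    for word, seen_at, count in observations:
        # Which walk does this sighting fall into, counting from 0?
        walk = (seen_at - first_walk_start).days // WALK_DAYS
        trend[word][walk] += count
    # Plain dicts, with walks in chronological order.
    return {word: dict(sorted(walks.items())) for word, walks in trend.items()}


start = datetime(2008, 6, 27)
sightings = [
    ("zeitgeist", datetime(2008, 6, 28), 3),
    ("zeitgeist", datetime(2008, 7, 30), 2),
    ("zeitgeist", datetime(2008, 9, 1), 7),
]
print(build_trend(sightings, start))  # {'zeitgeist': {0: 5, 1: 7}}
```

With only one walk every 55 days or so, each word gets one data point per walk, which is why a trend needs a few walks before it shows anything.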

This entry was posted on Friday, June 27th, 2008 at 1:48 am and is filed under site.
