27th Jun 2008
TheRarestParser has been upgraded to 0.4b
I’ve upgraded the bot for TheRarestWords (about TheRarestWords) to 0.4b today. The new version has these improvements:
- Umlauts are now recognized as letters, and in fact all national letters are recognized, except for Japanese, Chinese, etc. Those languages don’t use spaces to separate words, so the "words" the bot finds there are actually whole phrases, and I have no idea how to split them into words. (Ideas, anyone?)
- External domain redirects are recognized and ignored (these are usually either misspellings or a spam-like technique)
- Internal domain redirects are recognized (META redirects too)
- Multiple pages instead of just one (if your main page has fewer than 100 words, the bot goes further, up to 10 pages deep, to find some)
- Frames are now recognized too
- Improved HTML support (more tolerant of errors)
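
For readers curious how checks like these can be done, here is a minimal Python sketch of three items from the list: Unicode-aware word splitting, telling internal redirects from external ones, and spotting a META refresh. It is not the bot's actual code; the names `split_words`, `is_external_redirect` and `MetaRefreshFinder` are made up for illustration, and treating `www.example.com` and `example.com` as the same site is an assumption.

```python
import re
import unicodedata
from html.parser import HTMLParser
from urllib.parse import urlparse

# For str patterns, \w covers Unicode letters, digits and underscore.
# Excluding \d and _ leaves letters only, so accented words stay whole.
WORD_RE = re.compile(r"[^\W\d_]+")


def split_words(text):
    """Return lowercase letter-only tokens found in text."""
    # NFC joins a base letter and a combining mark into one letter, so a
    # decomposed umlaut does not split the word in two.
    text = unicodedata.normalize("NFC", text)
    return [w.lower() for w in WORD_RE.findall(text)]


def _site(url):
    """Host name of url without a leading 'www.' (assumed equivalence)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_external_redirect(source_url, target_url):
    """True when a redirect leaves the site it started on."""
    return _site(source_url) != _site(target_url)


class MetaRefreshFinder(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = (attributes.get("content") or "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")
```

A pattern like this does nothing for the Japanese and Chinese problem: a run of characters with no spaces still comes back as a single token, so those languages would need a dictionary-based or statistical segmenter instead.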
Also, the new bot stores a datetime component for words, so trends can now be built after a few walks around the web (one walk takes about 55 days, since this project still can’t make any money to cover the expenses and still runs on a single server).
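
As a rough illustration of what a datetime component makes possible, the sketch below keeps one row per word per walk and reads the rows back in time order. The table layout and the function names (`open_store`, `record`, `trend`) are assumptions for this example, not the bot's real storage schema.

```python
import sqlite3
from datetime import datetime


def open_store(path=":memory:"):
    """Open (or create) a small SQLite store for timestamped word counts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sightings "
        "(word TEXT, seen_at TEXT, count INTEGER)"
    )
    return db


def record(db, word, count, seen_at):
    """Save how many times a word was seen during the walk at seen_at."""
    db.execute(
        "INSERT INTO sightings VALUES (?, ?, ?)",
        (word, seen_at.isoformat(), count),
    )


def trend(db, word):
    """Return (datetime, count) pairs for a word, oldest walk first."""
    rows = db.execute(
        "SELECT seen_at, count FROM sightings WHERE word = ? ORDER BY seen_at",
        (word,),
    ).fetchall()
    return [(datetime.fromisoformat(ts), n) for ts, n in rows]
```

With one walk taking about 55 days, each word would get a new data point roughly every two months, so a trend only becomes visible after several walks.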





