

Archive for June, 2008

27th Jun 2008

TheRarestParser has been upgraded to 0.4b

Today I upgraded the bot for TheRarestWords to version 0.4b. The new version has these improvements:

  • Umlauts, and in fact all national letters, are now recognized as letters, except for Japanese, Chinese, etc. Those languages don’t use spaces to separate words, so what the bot sees there are whole phrases, and I’ve no idea how to split them into words. (Ideas, anyone?)
  • External domain redirects are recognized and ignored (these are usually either misspellings or a spam-like technique)
  • Internal domain redirects are recognized (META redirects too)
  • Multiple pages are crawled instead of just one (if your main page has fewer than 100 words, the bot goes further, up to 10 pages deep, to find some)
  • Frames are now recognized too
  • Improved HTML support (more tolerant of errors)
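
The post doesn’t show the bot’s code, so this is only a minimal Python sketch of two of the ideas above, under stated assumptions: a tokenizer that treats letters from any script as letters, and a check that calls a redirect internal when it stays on the same host. The names `extract_words` and `classify_redirect` are hypothetical.

```python
import re
from urllib.parse import urljoin, urlsplit

# [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. any Unicode letter (ä, ß, é, Cyrillic, CJK, ...).
WORD_RE = re.compile(r"[^\W\d_]+")

def extract_words(text: str) -> list[str]:
    """Split text into lowercase words, treating letters from any script as letters."""
    return [w.lower() for w in WORD_RE.findall(text)]

def classify_redirect(page_url: str, location: str) -> str:
    """Return 'internal' if the redirect stays on the same host, else 'external'."""
    # Resolve relative targets such as "/news" against the page that redirected.
    target = urljoin(page_url, location)
    same_host = urlsplit(page_url).hostname == urlsplit(target).hostname
    return "internal" if same_host else "external"

print(extract_words("Über Straße naïve café 42 foo_bar"))
print(classify_redirect("http://example.com/a", "/b"))
```

With this approach a run of Japanese or Chinese text comes back as one long token, which is exactly the splitting problem described in the first bullet. Comparing bare hostnames also treats `www.example.com` and `example.com` as different domains; a real crawler would probably need a looser rule.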

The new bot also stores a timestamp for each word, so trends can be built after a few walks around the web (one walk takes about 55 days :) since this project still doesn’t make any money to cover its expenses and still runs on a single server).
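
Storing a timestamp with every word is what makes trends possible. A minimal sketch, assuming observations are kept as hypothetical `(word, datetime)` pairs and grouped by crawl pass using the roughly 55-day walk length mentioned above (`pass_index` and `word_trend` are illustrative names, not the bot’s API):

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption taken from the post: one walk around the web takes about 55 days.
PASS_LENGTH = timedelta(days=55)

def pass_index(seen_at: datetime, crawl_start: datetime) -> int:
    """Number of the crawl pass (0, 1, 2, ...) during which a word was seen."""
    return (seen_at - crawl_start) // PASS_LENGTH  # timedelta // timedelta -> int

def word_trend(observations, word, crawl_start, passes):
    """Count how often `word` was seen in each of the first `passes` crawl passes."""
    counts = Counter(pass_index(t, crawl_start) for w, t in observations if w == word)
    return [counts.get(i, 0) for i in range(passes)]

start = datetime(2008, 6, 27)
obs = [("fast", datetime(2008, 6, 28)), ("fast", datetime(2008, 9, 1)),
       ("fast", datetime(2008, 9, 2)), ("slow", datetime(2008, 7, 1))]
print(word_trend(obs, "fast", start, 3))
```

A rising or falling list of per-pass counts is the simplest possible "trend"; anything fancier (normalizing by the number of pages crawled, for example) would build on the same stored timestamps.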

Posted in site | No Comments »

22nd Jun 2008

TheRarestNews site is unexpectedly FAST

Today Russia beat Holland in soccer. I had barely left the TV to go out and shout a few slogans (lots of people are shouting and cars are honking right now, because this is our greatest achievement in the last decade or two), but by the time I was back at the computer, The Rarest News already had 4 stories about it in the soccer section. Man, that’s fast! Right now there are more than 10 links to stories about it on the main page.

Meanwhile, Google News doesn’t have a word about it: only politics!

Posted in site | No Comments »

20th Jun 2008

The Rarest News

This post is outdated, as it describes a previous version of RarestNews.
The current version is under heavy development. A preview can be seen here.

Please welcome The Rarest News :) It’s not yet quite what I bragged it would be, but there is only one reason for that: not enough server power :) Hopefully AdSense will help with that.

OK, so what is it? It’s yet another hobby project of mine (The Rarest Words being the first). It started because I couldn’t find anything interesting on Google/Yahoo News. Politics, politics, politics. I don’t care for politics. There’s much more to the world than presidents meeting and Britney’s sister giving birth. Oh yeah, I don’t care for pop either.

So I decided I could write something better, and in fact I tried.

Posted in site | No Comments »

20th Jun 2008

The News project is coming closer

The news project I’ve been talking about is now closer to a public release than ever. It currently uses 9000 news sources, but that number is limited only by how well I can optimize the software, since it’s fully automatic (I don’t even have to point it at the sources; it finds them on its own). I started out by adding news sources manually, but then I realized I’d have to pay attention, so I decided to use the Applauso-meter (The Simpsons) :) OK, just kidding: I decided I didn’t want to do that, so the next few weeks were spent writing an automated news-source adder :)
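
The post doesn’t say how the bot finds news sources on its own. One common technique, shown here as an assumption rather than as the author’s method, is RSS/Atom feed autodiscovery: a site advertises its feed with a `<link rel="alternate" type="application/rss+xml">` tag in its HTML. A minimal sketch with Python’s standard `html.parser` (the names `FeedLinkFinder` and `discover_feeds` are hypothetical):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised by <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # self-closing <link ... /> tags are routed here as well.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs like "/rss.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))

def discover_feeds(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of all RSS/Atom feeds a page advertises."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

Because `html.parser` is forgiving about broken markup, this kind of check can run on every crawled page; any feed it finds becomes a candidate news source.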

The project has been moving very slowly because of a lot of setbacks. For example, when I started it with 100 000 sources, it took the scheduler 28 hours to realize that it was 28 hours late to pick up the news (it was still doing the first pass).

The schedule optimizer was also one of the worst things I’ve had to develop so far. It tries to predict when news will appear on a site, so that the site is visited only as often as needed. The problem is that it takes 5-10 hours to build the schedule, and only then can I tell whether it works or not.
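
The post doesn’t describe how the schedule optimizer makes its predictions. As a much simpler stand-in (not the author’s algorithm), a scheduler could look at when a site’s recent items appeared and come back after roughly the median gap between them, clamped to a minimum and maximum interval. The limits and the `next_visit` name below are hypothetical:

```python
from datetime import datetime, timedelta
from statistics import median

MIN_INTERVAL = timedelta(minutes=15)  # hypothetical: never poll a site more often than this
MAX_INTERVAL = timedelta(hours=24)    # hypothetical: always check a site at least daily

def next_visit(item_times: list[datetime], last_visit: datetime) -> datetime:
    """Schedule the next visit after roughly the median gap between recent items."""
    times = sorted(item_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        interval = MAX_INTERVAL  # no history yet: fall back to the slowest rate
    else:
        # median() works on timedelta values; clamp the result to the allowed range.
        interval = min(max(median(gaps), MIN_INTERVAL), MAX_INTERVAL)
    return last_visit + interval

items = [datetime(2008, 6, 20, 10), datetime(2008, 6, 20, 12),
         datetime(2008, 6, 20, 14), datetime(2008, 6, 20, 18)]
print(next_visit(items, datetime(2008, 6, 20, 18)))
```

The median keeps one unusually long quiet period from slowing polling down too much, and the clamps stop a burst of items from making the bot hammer one site.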

So the project has already been restarted from scratch something like 30-40 times, but it finally seems to be working. It still needs a complete rewrite, but it will do for now.

Anyway, stay tuned: the release will hopefully be within a week or two. And if this project brings in enough money to pay for all the servers it uses, who knows, maybe it could even reach 100 000 news sources this year :)

Posted in site | No Comments »

04th Jun 2008

How people are using the site

I came across an interesting comment by Steven Dowd from http://newton-le-willows.com and thought it should be posted here as well:

I have been log watching, and noticed quite a number of hits to my sites with rarestwords as the referrer. Any hits are a bonus for a personal homepage site such as mine, so I am glad that I found your project and added a little bit into it.

What I have found useful is the ability to look up similar sites, and sites also using the same key words. I have found that over this last week, I have managed to get my sites onto the top of Google for certain searches where I had failed to get the #1 position before. I believe this is purely down to filtering the content I have on the front page and key word usage that I have fine-tuned through the use of TheRarestWords system.

I still think it’s brilliant, though I really do now think that it’s most definitely missing a ’search a word’ input box.

Well, the search box is definitely missing, but I’m still thinking of a way to add it without putting 3 input boxes at the top :) (my site, site search and word search)

So, have you found this site useful? Share your story.

Posted in site | No Comments »

04th Jun 2008

Why am I quiet and what’s next?

Well, as some of you have noticed, not much has happened here in the last few days. There are a few reasons for that. The first is that this project already has almost everything I thought would be useful in this incarnation. It lacks quite a few features, but the average server load is 2-3 times higher than normal, so I can’t even run the crawl; the server would die. So right now it’s a balance between what I want it to be and what it can be.

Moving on. The second reason is that there’s another itch I’ve wanted to scratch for years; I finally came up with an idea for a solution, and I’ve been implementing it.

Posted in site | No Comments »