

20th Jun 2008

The Rarest News

This post is outdated, as it describes a previous version of RarestNews.
The current version is under heavy development. Preview can be seen here.

Please welcome The Rarest News :) It’s not yet quite what I bragged it would be, but there’s only one reason for that - not enough server power :) Hopefully, AdSense will help with that.

OK, so what is it? It’s yet another hobby project of mine (The Rarest Words being the first). It started because I couldn’t find anything interesting on Google/Yahoo News. Politics - politics - politics. I don’t care for politics. There’s much more to the world than presidents meeting and Britney’s sister giving birth. Oh yeah - I don’t care for pop either.

So I decided that I could write something better, and in fact I tried. The first attempt was manual - I added 2000 sources in one day :) But... that’s not for me. I don’t want to sit all day adding local papers/sites, so I decided to write a system that could monitor and find news around the Web from tens of thousands of sources (about 26 thousand now).

That is half of what “The Rarest News” is.

The second part is personalization, although it’s not fully done yet. The ideal is that I log in to the site, enter my own category, like “seo in brazil”, and see news that is about “seo in brazil” or related to it. It’s pretty much done, actually, but the categorizer sits on a very small server (200 MB RAM), so I’ve had to settle for entering about 4 thousand common categories and categorizing against them. In the future, however, when resources allow, each person will even be able to add his/her name and see what interests them :) Yep, you heard it right - your nickname could be showing you news related to your interests :)
How’s it done? (For God’s sake, please, no more “semantic magic craziness”.) There’s no magic - it’s simple. Yahoo exposes their search engine via an API, and if you search for your name you’ll find your own profiles/chat logs/resumes/et cetera… My algo can (and in fact that’s what it does) analyze those search results and categorize news against them. In fact, the categories are nothing more than “searches” that categorize the news.
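
Here is a minimal sketch of the “a category is just a search” idea. It assumes the search results for each category have already been fetched as plain-text snippets (the search API call itself is left out), and it scores news with bag-of-words cosine similarity. That similarity measure, the sample snippets and all function names are assumptions made for illustration; the post doesn’t say what RarestNews actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_profile(snippets):
    """Merge search-result snippets into one word-count profile."""
    profile = Counter()
    for snippet in snippets:
        profile.update(tokenize(snippet))
    return profile


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(news_text, category_profiles):
    """Return (category, score) pairs for one news item, best match first."""
    news_vector = Counter(tokenize(news_text))
    scores = [
        (name, cosine(news_vector, profile))
        for name, profile in category_profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical snippets standing in for real search results.
profiles = {
    "seo in brazil": build_profile([
        "SEO agencies in Brazil optimize search rankings",
        "Brazil search engine marketing and SEO tips",
    ]),
    "chocolate": build_profile([
        "Dark chocolate recipes and cocoa desserts",
        "Chocolate makers raise cocoa prices",
    ]),
}

ranked = categorize("New SEO tools help Brazil sites climb search rankings", profiles)
```

The same `build_profile` step would work for a nickname: the “category” is simply whatever the search on that name returns.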

But for now we have to settle for 4000 categories in the system. Oh, when I was adding categories I realized that I’m too lazy to do that too, so I scraped a couple hundred sites for their categories. As a result, some of them are less than perfect grammatically :) Sorry ’bout that.

Right now the project is more of a prototype than a fully working project, but you can already have some fun with it. How about reading about these:

UFO and the unexplainable news :)
Anti-war news
Real jewelry news - not about thieves who rob jewelry stores (Google News loves to publish such stories as “Jewelry news”, but RarestNews categorizes them as Crime stories)
How about Viral Marketing news or Data Storage news? (That’s why I called it “The RAREST news” :) )

Video games - oh yeah, hundreds and hundreds of news items per day on gaming
Gadget news - can you guess what this is about?
The chocolate news (btw, there are three buttons on each page - they control how related the news should be. The problem is that some categories, like “chocolate”, get too many irrelevant results. Clicking “most related” filters them out, but there are pages that are somehow related, yet not necessarily about “chocolate” - they are hidden under “Less related (but more news)” - explore!)

I do plan to up the number of “news sources” to at least 100 000, ’cause right now the project seems a little boring :)
Right now the news pages show only things that are “averagely related”, but when there are enough news sources I’ll up that to “most related”, so the categorization will be better by default. Right now there’s just not much to read in the “most related” sections.
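
The three relatedness levels can be thought of as cutoffs on a per-category score. Below is a rough sketch under that assumption; the cutoff values, the sample scores and the `filter_news` name are all invented for illustration and are not taken from the site.

```python
# Hypothetical cutoffs; the real values are not given anywhere in this post.
THRESHOLDS = {
    "most related": 0.6,
    "averagely related": 0.3,
    "less related": 0.1,
}


def filter_news(scored_news, level="averagely related"):
    """Keep (title, score) pairs at or above the level's cutoff, best first."""
    cutoff = THRESHOLDS[level]
    kept = [item for item in scored_news if item[1] >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up scores for the "chocolate" category.
scored = [
    ("Chocolate makers raise prices", 0.72),
    ("Cocoa harvest falls short", 0.41),
    ("Easter sales forecast", 0.15),
    ("Local election results", 0.02),
]

most = filter_news(scored, "most related")
default = filter_news(scored)
less = filter_news(scored, "less related")
```

With few sources, a strict cutoff leaves almost nothing to read, which is why a middle level makes sense as the default for now.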

This entry was posted on Friday, June 20th, 2008 at 11:47 pm and is filed under site.
