04th Jun 2008
Why am I quiet and what’s next?
Well, as some of you noticed - not much has happened here in the last few days. There are a few reasons for that. The first is that this project already has almost everything I thought would be useful in this incarnation. It still lacks quite a few features, but… the average server load is 2-3 times higher than normal, so I can’t even run the crawl - the server would die. So right now it’s a balance between what I want it to be and what it could be.
Moving on. The second reason is that I’ve had another itch that I’ve wanted to scratch for years. I finally got an idea for a solution, and I’ve been implementing it.
The idea is that I like Google News and Yahoo News, but they mostly force me to read mainstream opinions. You can’t find a puppy story on the main page of Google News.
Just kidding. But seriously, I don’t care who praises a donation from the Islamic Development Bank or why the Zimbabwe opposition leader is detained by police. It’s just not interesting to me. Why would I read about Zimbabwe if, for example, I like to read about Linux? Why would I read about Apple Computers if I like Linux (most IT sites write about anything IT)?
Sure, you can try to search for something. Ok, let’s do that.
Let’s search for something like ‘jewelry’: “Jewelry Television accused of false advertising” “Steven Zale Featured on Nationally Syndicated Fox Business Network” “Atlanta Jewelry Store Robbery” … what does all this have to do with shiny things?
And that’s just an example. But it
The problem is that Google searches only 4500 sources (or so they say). Ok, there’s Technorati and other RSS search engines that search RSS (obviously). But that technology is… well… a bit trashy: every blog has an RSS feed (even this one) and there’s a lot of spam and nonsensical rants (like this one).
Ok, so what if I build a site that:
1) caters to what I want to read, skips the rants and finds stories for skiers even if they don’t have the word “skiing” in them?
2) has 10 000 or 100 000 sources (yes, no mistake here).
Well, there were obvious problems with this. I’ve finally found solutions to most of them, except for a few. I’ve been developing that system for the last few days and I’m actually testing the prototype as we speak. It has 4 000 sources right now (about as many as Google), but only because my Internet is pretty slow. It has 100 000 potential sources still waiting to be analyzed, and it’s RSS impaired, so it won’t have to read all the blogs in the world to find something useful. It’s actually more of a screen-scraper that reads the sites themselves.
Currently it’s learning to read and differentiate between news and boring stuff. After that’s done, the auto-categorizer will be revamped to better suit the needs (the categories it has right now are sometimes very stupid), and if I manage to optimize it to run at a reasonable speed, the site will be made public.
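To make the "learning to differentiate" part a bit more concrete, here is a minimal sketch of one common way to do it: a naive Bayes text classifier. This is not the code behind the prototype. The technique is swapped in purely for illustration, and the class name, the two labels and the training sentences are all made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add the smoothed log likelihood of each word.
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: two "news" items and two "rant" items.
clf = NaiveBayes()
clf.train("Linux kernel release adds new driver support", "news")
clf.train("Company announces new open source Linux distribution", "news")
clf.train("I hate mondays and my coffee is cold again", "rant")
clf.train("why does nobody read my blog anymore", "rant")

print(clf.classify("New Linux kernel release announced"))
print(clf.classify("my coffee is cold"))
```

With only four training sentences this is a toy, of course. A real system would need far more examples per label and probably better features than raw words.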
I can’t tell how useful it’s going to be, but it’s definitely something I’m going to spend some time developing right now. And as soon as it seems to show actual Linux news for Linux guys, Apple news for Mac guys and Zimbabwe leaders to police guys - it’s going live.
