

18th May 2008

Site’s history: Part III - the HDDs are seeing the light

So there I was, with a site visited by about a thousand people a day (judging by the referrer logs) and some ads. The ads weren't earning even enough to cover a single server ($130/month), and I needed three to keep my hobby of collecting the rarest words alive. On top of that, everybody was now convinced that my site was gaming the system somehow. Every person who stumbled onto my site was building a theory of how I was spamming the Internet. Cute.
Thankfully, that was exactly when my server's HDD gave up on living with my high-performance, made-to-kill queue and died. I won't even bother you with the details of how my datacenter supplied me with another HDD that was already dying. That's three HDDs in one year; just my luck. No backups, no site, only the list of the rarest and most popular words on the Internet, sitting on my local computer. Great! No server, no data, and a bad rep all over the Net.

For a few months I didn't do anything about the project, until May, when I had an interesting idea: what if I compared the data from January with the data from May? There would probably be words that had risen in popularity and some with a downward trend (you know, like Google Trends, but en masse).
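
Here is a minimal sketch of how such a January-vs-May comparison could work, assuming each crawl has been reduced to a word-count table (a Python `Counter`). It is not the site's actual code: the function names `relative_change` and `top_trends`, the `min_total` cutoff, and the sample counts below are all hypothetical. Counts are normalised by crawl size, since two crawls rarely see the same amount of text, and add-one smoothed so a word missing from one crawl doesn't cause a division by zero.

```python
from collections import Counter


def relative_change(old: Counter, new: Counter, min_total: int = 2) -> dict:
    """Ratio of a word's share of the new crawl to its share of the old one.

    Values above 1 mean the word is rising; below 1 means it is falling.
    """
    old_total = sum(old.values())
    new_total = sum(new.values())
    vocab = set(old) | set(new)
    changes = {}
    for word in vocab:
        if old[word] + new[word] < min_total:
            continue  # too rare overall to say anything about a trend
        # Add-one smoothing keeps words absent from one crawl well-defined.
        old_share = (old[word] + 1) / (old_total + len(vocab))
        new_share = (new[word] + 1) / (new_total + len(vocab))
        changes[word] = new_share / old_share
    return changes


def top_trends(old: Counter, new: Counter, n: int = 3):
    """Return (rising, falling): the n words with the largest and smallest ratios."""
    changes = relative_change(old, new)
    ranked = sorted(changes, key=changes.get)  # ascending by ratio
    return ranked[::-1][:n], ranked[:n]


# Illustrative counts only, not real crawl data.
january = Counter({"django": 1, "blog": 50, "myspace": 40})
may = Counter({"django": 30, "blog": 50, "myspace": 5})
rising, falling = top_trends(january, may, n=1)
```

With these made-up counts, "django" comes out as the top riser and "myspace" as the top faller. A ratio of shares is only one possible choice; a difference of shares would favour already-common words instead.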

Okay, so what's the plan? Let's crawl the Internet once again. People had probably forgotten all those things from January anyway. I did everything from scratch again. This time I grouped my sites into packs of 100 and put them into a MySQL database so that the server wouldn't kill the HDD again. That solved the queue problem: now I had only a million entries in the database instead of 100 million.
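
The batching idea can be sketched like this: store one queue row per pack of 100 domains instead of one row per domain, so 100 million domains become about a million rows. The original used MySQL; this sketch swaps in Python's built-in `sqlite3` so that it runs on its own, and the table name `crawl_queue` and the helper names are hypothetical.

```python
import sqlite3
from itertools import islice

PACK_SIZE = 100  # domains per queue row, as in the text


def packs(domains, size=PACK_SIZE):
    """Yield consecutive lists of at most `size` domains."""
    it = iter(domains)
    while chunk := list(islice(it, size)):
        yield chunk


def enqueue(conn, domains):
    """Store each pack as a single row (newline-separated domains)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_queue ("
        "id INTEGER PRIMARY KEY, domains TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO crawl_queue (domains) VALUES (?)",
        (("\n".join(p),) for p in packs(domains)),
    )
    conn.commit()


def next_pack(conn):
    """Take the oldest unfinished pack, mark it done, and return its domains."""
    row = conn.execute(
        "SELECT id, domains FROM crawl_queue WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue is empty
    conn.execute("UPDATE crawl_queue SET done = 1 WHERE id = ?", (row[0],))
    conn.commit()
    return row[1].split("\n")
```

For example, enqueuing 250 domains produces three rows (100, 100 and 50 domains), and each `next_pack` call hands a crawler a whole pack at once. That means far fewer small writes hitting the disk than a per-domain queue would need.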

I had announced that «interesting stuff is due to arrive near May 20-25th», meaning that I planned to have the trends built by then. At least, that's what I've wanted to do ever since I discovered «django»: everybody needs to keep current in their industry, so why not automate it? I could detect which words are gaining popularity.

The second problem I stumbled upon was sponsored ads on domains. Those are unbelievable. I mean, if you go through all 100 million registered domains, 80 million of them are ads (80%)! Full-page, no-content ads! I eventually managed to partly solve it with IP blocking, but they are still there in huge numbers.
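
A rough sketch of what IP blocking can look like: skip any domain whose resolved address falls inside a blocklist of networks that serve these ad pages. The blocklist below is hypothetical and uses reserved documentation ranges (RFC 5737) as placeholders rather than real ad servers. Domains are passed in already resolved, as a domain-to-IP dict, so the sketch needs no network access; a real crawler would resolve them first.

```python
import ipaddress

# Hypothetical blocklist: placeholder documentation ranges, not real ad servers.
PARKING_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_parked(ip: str) -> bool:
    """True if the address lies inside any blocklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PARKING_NETWORKS)


def crawlable(resolved: dict) -> list:
    """Keep only domains whose IP is not on the blocklist (input order preserved)."""
    return [domain for domain, ip in resolved.items() if not is_parked(ip)]


resolved = {
    "example-parked.com": "192.0.2.10",  # inside a blocked range
    "real-site.org": "203.0.113.5",      # not blocked
}
```

Blocking by address catches every domain served by the same ad servers at once, which is why it can remove many of them cheaply. It also only "partly" solves the problem: anything hosted outside the listed ranges still gets through.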

So, the spiders are out, and now it's time for the site. The site is done, but it's still boring. I decided to let people write something about a word they know something about, or something funny. 100 letters about a word: why not?

I also decided that it should be AJAX-based so that people wouldn't have to reload the page, and that's when I got the idea that it could be like a game! I coded all weekend and got a working system: if someone changes the description of a word that is on your page, you see that change! Now I've started waiting for the first entry to appear.
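
One common way to get this "you see the change" effect is AJAX polling. This is an assumption, since the post doesn't say how the system actually works. Every edit bumps a global version number. Each page periodically sends the last version it saw plus the words it shows, and the server answers with whatever changed since then. Below is a minimal server-side sketch with an in-memory store. The `DescriptionBoard` class and its methods are hypothetical, and the HTTP endpoint that would wrap `changes_since` and return JSON is left out.

```python
import threading


class DescriptionBoard:
    """In-memory word descriptions with a global change counter."""

    MAX_LEN = 100  # "100 letters about a word"

    def __init__(self):
        self.version = 0
        self._descriptions = {}  # word -> (version of last edit, text)
        self._lock = threading.Lock()

    def update(self, word: str, text: str) -> int:
        """Store a new description and return the new global version."""
        if len(text) > self.MAX_LEN:
            raise ValueError("descriptions are limited to 100 characters")
        with self._lock:
            self.version += 1
            self._descriptions[word] = (self.version, text)
            return self.version

    def changes_since(self, since: int, words):
        """Return (current version, {word: text}) for the given words edited after `since`."""
        with self._lock:
            changed = {
                w: self._descriptions[w][1]
                for w in words
                if w in self._descriptions and self._descriptions[w][0] > since
            }
            return self.version, changed
```

A page that last saw version 0 and shows "django" would receive the new text after someone edits it. It would then remember the returned version, so the next poll only brings back newer edits.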

This entry was posted on Sunday, May 18th, 2008 at 2:44 am and is filed under site.
