Archive for the 'php' Category

22nd May 2008

How to split glued words in domains into parts in PHP/Python

Well, I’ve finally come up with an algorithm for splitting names like belfastjobs.com into Belfast Jobs.com, i.e. detecting words glued together. As soon as it finishes crunching 70 million domains (which, as you can see, is going to take more than 24 hours), your site will get a proper title and all the links to related sites will finally become readable.

If you’re going to do that yourself, the full post explains how I did it.
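The full method is in the complete post and isn’t reproduced here. As a rough illustration only, here is a minimal Python sketch of one common technique for this kind of splitting: dictionary-based segmentation with dynamic programming that prefers the fewest words. It assumes you already have a set of known lowercase words; `segment`, `domain_title` and the `max_len` cutoff are hypothetical names chosen for this sketch.

```python
def segment(text, words, max_len=20):
    """Split `text` into dictionary words, preferring the fewest pieces.

    best[i] is the best split of text[:i] found so far (a list of words),
    or None if text[:i] cannot be built from `words`.
    """
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []  # the empty prefix needs no words
    for i in range(1, n + 1):
        # only look back max_len characters: longer "words" are never tried
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if best[j] is not None and piece in words:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]


def domain_title(domain, words):
    """Turn e.g. 'belfastjobs.com' into 'Belfast Jobs.com' if the name splits cleanly."""
    name, dot, rest = domain.partition(".")
    parts = segment(name.lower(), words)
    if not parts:  # None (unsplittable) or empty name: leave the domain as-is
        return domain
    return " ".join(p.capitalize() for p in parts) + dot + rest
```

With `words = {"belfast", "jobs", "bel", "fast", "job", "s"}`, `domain_title("belfastjobs.com", words)` gives `"Belfast Jobs.com"`. Preferring the fewest pieces is just one tie-breaking rule; weighting candidates by word frequency is a common refinement.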

Posted in php, python

18th May 2008

Organizing a high-performance queue in PHP/MySQL in 6 steps, or how to walk 70 million domains without repeating yourself

So, the problem was walking through the Internet, hopefully without repeats. The first solution was a plain-text file with a list of domains: we read from a random point (fseek) until a non-empty line was found, then overwrote that domain with newline characters so it would never be picked again. It led to multiple HDD deaths (hundreds of random reads per second are something to avoid).
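To make the access pattern concrete, the first (disk-killing) approach described above can be sketched in Python roughly as follows. `pop_random_domain` and `_claim_first_nonempty` are hypothetical names, and the sketch assumes one domain per line in a plain ASCII/UTF-8 file; it illustrates the idea rather than being something to reuse.

```python
import os
import random


def _claim_first_nonempty(f):
    """Scan forward from the current offset; blank out and return the first non-empty line."""
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:  # reached EOF without finding anything
            return None
        domain = line.strip()
        if domain:
            f.seek(pos)
            f.write(b"\n" * len(line))  # overwrite the whole line with newlines
            return domain.decode("utf-8")


def pop_random_domain(path, rng=random):
    """Return one not-yet-visited domain from `path` (or None) and blank it out in place."""
    size = os.path.getsize(path)
    if size == 0:
        return None
    with open(path, "r+b") as f:
        start = rng.randrange(size)
        f.seek(start)  # random seek: this is what hammers the disk
        if start > 0:
            f.readline()  # skip the partial line we probably landed in
        domain = _claim_first_nonempty(f)
        if domain is None:  # nothing left after the random point: wrap around
            f.seek(0)
            domain = _claim_first_nonempty(f)
        return domain
```

Each call costs at least one random seek plus a write, and as more lines are blanked out the forward scan gets longer too, which is why this does not scale to hundreds of calls per second.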

Posted in php

18th May 2008

How to create a lot of site screenshots with PHP and Opera under Windows

While I was working on filtering domain-sponsored ads, I came up with an idea that might help someone. I had compiled a big list of the most popular hosting IPs and wanted to see screenshots of three sites hosted on each IP to find out which ones show only domain ads (because I’m too lazy to look through hundreds of sites by hand; screenshots will do just fine). The full post explains how to do that.
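Picking three sites per IP is easy to script before any screenshots are taken. Here is a minimal Python sketch, assuming a plain list of domain names as input: `resolve` and `sample_sites_per_ip` are hypothetical helpers, resolution uses the standard library’s `socket.gethostbyname` (IPv4 only), and keeping the first three domains seen per IP is an arbitrary choice. The Opera screenshot step itself is not covered here.

```python
import socket
from collections import defaultdict


def resolve(domain):
    """Return the IPv4 address of `domain`, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except (OSError, UnicodeError):  # gaierror is an OSError; bad labels raise UnicodeError
        return None


def sample_sites_per_ip(domain_to_ip, per_ip=3):
    """Group domains by IP and keep at most `per_ip` of them for each address."""
    groups = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        if ip is not None and len(groups[ip]) < per_ip:
            groups[ip].append(domain)
    return dict(groups)
```

Calling `sample_sites_per_ip({d: resolve(d) for d in domains})` returns a dict mapping each IP to up to three of its domains, ready to hand to whatever screenshot step you use.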

Posted in php