18th May 2008
How to create a lot of site screenshots with PHP and Opera under Windows
While I was working on filtering domain-sponsored ads, I came up with an idea that might help someone. I had compiled a big list of the most popular hosting IPs and wanted to look at screenshots of three sites on each IP to see which IPs host nothing but domain ads (I'm too lazy to click through hundreds of sites by hand; screenshots will do just fine). The new Opera 9.50 browser generates screenshots for its Speed Dial feature. Opera 9.27 could do that too, but only for 9 sites at a time, while 9.50 can generate hundreds of screenshots in parallel (as long as you have enough processor power). Here's how to do it.
(Windows only; I'll explain how to do it on Linux as soon as I have time to test it):
- Create a directory for your PHP script. You need some way to run PHP locally (there are plenty of ready-made local server packages for that; being Russian, I prefer Denwer, but it's not available in English, so you could try it with Google Translate), although if you need screenshots, you probably already know how to run PHP scripts;
- Create a subdirectory profile/ in the directory where your PHP script is;
- Install Opera 9.50 (9.27 won't do the trick!) from opera.com and note where you installed it. To avoid problems, install it into a directory whose path has no spaces, such as d:\testopera\ (that's what I did);
- Call the function as getThumbs(array('http://url.com/', 'http://url2.com/')); it will start Opera and generate the thumbnails, after which you have to close Opera by hand (on Linux you can avoid that with the kill command; on Windows, a short AutoHotkey script could probably take care of it);
- The screenshots are now in screens/$md5.png, where $md5 is md5('http://url.com/').
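The file naming from the last step can be reproduced in your own code, for example to look up the screenshot for a given URL or to re-run only the sites that did not respond. This is a minimal sketch, assuming the screens/ directory and md5-of-URL naming described above; screenshotPath() and missingScreenshots() are hypothetical helpers, not part of the downloadable function.

```php
<?php
// Hypothetical helper (not part of getThumbs()): maps a URL to the file name
// described above, screens/<md5 of the URL>.png.
function screenshotPath($url, $dir = 'screens')
{
    return $dir . '/' . md5($url) . '.png';
}

// Hypothetical helper: returns the URLs whose screenshot file does not exist yet,
// e.g. to retry sites that were not responding during the first run.
function missingScreenshots(array $urls, $dir = 'screens')
{
    $missing = array();
    foreach ($urls as $url) {
        if (!is_file(screenshotPath($url, $dir))) {
            $missing[] = $url;
        }
    }
    return $missing;
}

$urls = array('http://url.com/', 'http://url2.com/');
foreach ($urls as $url) {
    echo $url, ' => ', screenshotPath($url), PHP_EOL;
}
```

Passing the result of missingScreenshots() back to getThumbs() would give a simple way to retry only the failures.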
Download the function here, because WordPress replaces straight quotes with curly ones, which breaks the code.
You need PHP 5; otherwise you have to define file_put_contents($file, $text) yourself (that's not very hard).
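On PHP 4, a stand-in for file_put_contents() can be written in a few lines. This is a minimal sketch that covers only the two-argument form mentioned above (write a string to a file, overwriting it), not the full PHP 5 signature with flags and a stream context.

```php
<?php
// Fallback for PHP 4, where file_put_contents() does not exist (it was added in PHP 5).
// On PHP 5 and later the built-in function exists, so nothing is redefined.
if (!function_exists('file_put_contents')) {
    // Writes $text to $file, overwriting it; returns the number of bytes
    // written, or false if the file could not be opened.
    function file_put_contents($file, $text)
    {
        $handle = fopen($file, 'wb');
        if ($handle === false) {
            return false;
        }
        $bytes = fwrite($handle, $text);
        fclose($handle);
        return $bytes;
    }
}
```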
How fast does it work? That depends on your system and connection. With a (very slow) 50 KB/s connection and a (very fast) Core 2 Duo machine, I was able to take 100 screenshots in about a minute, with most of the waiting caused by non-responsive sites (that's more than one screenshot per second).
There are a few known bugs:
- This is by no means a finished function: it works, but it's far from perfect. Consider it a beta;
- It creates a lot of profile directories (under profile/) that you have to clean up afterwards;
- It works under Windows only.
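Until the function cleans up after itself, the leftover profile directories can be removed with a few lines of PHP. This is a minimal sketch, assuming PHP 5.3 or later (for FilesystemIterator::SKIP_DOTS) and that nothing else you need is stored under profile/; cleanProfiles() is a hypothetical helper, not part of the downloadable function.

```php
<?php
// Hypothetical cleanup helper: deletes everything inside $dir (e.g. profile/)
// but keeps $dir itself, so the next run still finds it.
function cleanProfiles($dir)
{
    if (!is_dir($dir)) {
        return;
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        // Visit the contents of a directory before the directory itself,
        // so each directory is already empty when rmdir() is called.
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
}
```

You would call it as cleanProfiles(dirname(__FILE__) . '/profile'); only after Opera has been closed, since Windows will not delete files that a running Opera still has open.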
If you don't have the patience to wait for the Linux version, here are a few tips for doing it yourself:
- You need a working X Window System on the server and a way to run Opera to take the screenshots (Anton has written a very in-depth tutorial about that, but his method makes only one screenshot at a time and doesn't use Opera's Speed Dial feature).
- Instead of the “/Settings” switch used on Windows, you need the “-personaldir” switch on Linux.
- Remember that permissions and folders work differently than on Windows.
- The same tutorial linked above also shows how to run multiple Opera instances, although running more than one instance with a hundred screenshots each in parallel is not recommended (it gets slooooooow and eats CPU).
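For the Linux route, the main change on the PHP side is the command line that starts Opera. This is a rough sketch, assuming an X server is already running on display :1, the Opera binary is on PATH as opera, and -personaldir takes the profile directory to use; buildOperaCommand() is a hypothetical helper, and the exact options should be checked against your Opera version.

```php
<?php
// Hypothetical helper (not from the original post): builds the shell command
// that starts Opera on Linux with its own profile directory.
// Assumptions: an X server is running on $display, the Opera binary is on PATH
// as $binary, and -personaldir is the Linux counterpart of /Settings on Windows.
function buildOperaCommand($profileDir, $display = ':1', $binary = 'opera')
{
    return 'DISPLAY=' . escapeshellarg($display)
        . ' ' . escapeshellarg($binary)
        . ' -personaldir ' . escapeshellarg($profileDir)
        // Discard output and run in the background so exec() does not block.
        . ' > /dev/null 2>&1 &';
}

// One fresh profile directory per run, kept under profile/ as on Windows.
$profileDir = dirname(__FILE__) . '/profile/' . uniqid('run_');
echo buildOperaCommand($profileDir), PHP_EOL;
```

You would run the resulting string with exec(); because the command is sent to the background with its output discarded, exec() returns immediately, and Opera can later be stopped with kill.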
You can also use Jake's method, but it only generates one screenshot at a time (the original link is gone, but web.archive.org has a cached copy).