21st Aug 2008

TheCraziestIdeas reborn

TheCraziestIdeas has been expanded. It now has much more potential, and an input field :) It’s still a joke project, but with more abilities. The “bikini inspector” query now yields better things, like “blouses exterminator”, “wax adjuster” and “bikini superintendant”.

It can also be used in some serious ways, like when you are searching for a good domain or company name. Let’s say you have a car club, but carclub.com is obviously taken. Spin it off with an exclamation point, like so: “!car club”, and there you go: carcondo.com, carcafe.com, carsmart.com, etc…
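To make the “!car club” spin concrete, here is a minimal sketch in Python, assuming you already have a list of alternative words for the last word of the query. The `alternatives` list and both function names are hypothetical; the site’s real word source isn’t described, and this does not check whether a domain is actually available.

```python
def spin_domains(query, alternatives, tld=".com"):
    """Turn a '!car club' style query into candidate domain names.

    Keeps every word except the last ('car') and swaps the last word
    ('club') for each alternative. `alternatives` is a hypothetical
    stand-in for whatever related-word source the site really uses.
    """
    words = query.lstrip("!").lower().split()
    if not words:
        return []
    head, last = "".join(words[:-1]), words[-1]
    candidates = []
    for alt in alternatives:
        alt = alt.lower().replace(" ", "")
        if alt and alt != last:  # skip the word being replaced
            candidates.append(head + alt + tld)
    return candidates


def export_plain_text(domains):
    """One domain per line, e.g. for a registrar's bulk availability search."""
    return "\n".join(domains)
```

For example, `spin_domains("!car club", ["condo", "cafe", "smart", "club"])` gives `["carcondo.com", "carcafe.com", "carsmart.com"]`.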

For all you domainers, there’s an export to a .com domain list, plus options to export to AdWords lists and plain text.

Enjoy the all-new TheCraziestIdeas.com.

Posted by admin under site | Comments Off

21st Aug 2008

“Rarest Synonyms” or auto-related words magic

I’ve been quiet for a while. Actually, I was on vacation in the middle of nowhere, without TV, radio or Internet. Let me tell you something: those who say “Russia has two problems: idiots and roads” are wrong about the roads. I drove nearly 1000 miles, and the worst problem was extortion by the local road police, not the roads :) Okay, enough chit-chat, let’s get back to business!

Ever thought about words that are close to, or almost synonymous with, “ufos”? :)

Me neither, but still: ufos, alien, aliens, extraterrestrials, sedition, strange, ghosts, foreigners, predator, crop, extra, ufo, robots, colors, strangers, spooks, monsters, animals, weird, dead, … Okay, that wasn’t really tough.

How about synonymous words for “TechCrunch”? Easy! Techcrunch, mashable, techmeme, slashdot, digg, engadget, correct, cnet, com, readwriteweb, gigaom, perezhilton.

Magic? No way, you can do it too now. I’ve added a new feature to TheRarestWords: just go to a word’s page, such as http://therarestwords.com/word/apple, and see for yourself.

If you want the technical details of how it’s done: take 3-grams (like the Wikipedia n-grams I released) and search for patterns such as “red and blue” or “techcrunch or gigaom”. The words on either side of “and” or “or” tend to be related. See? Easy!
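A minimal sketch of that pattern search in Python, assuming the trigram counts are already loaded into a dict keyed by `(word1, word2, word3)` tuples. That format and the function name are hypothetical (the released files may be laid out differently), and this is not the site’s actual code, which used a broader corpus and phrase detection.

```python
from collections import Counter


def related_words(trigram_counts, target, connectors=("and", "or")):
    """Rank words seen on the other side of 'and'/'or' from `target`.

    trigram_counts maps (w1, w2, w3) tuples to frequencies.
    """
    target = target.lower()
    scores = Counter()
    for (w1, w2, w3), count in trigram_counts.items():
        if w2 not in connectors:
            continue  # only "X and Y" / "X or Y" patterns are used
        if w1 == target and w3 != target:
            scores[w3] += count
        elif w3 == target and w1 != target:
            scores[w1] += count
    # Most frequent partners first
    return [word for word, _ in scores.most_common()]
```

With toy counts such as `{("red", "and", "blue"): 10, ("blue", "and", "green"): 4}`, `related_words(counts, "blue")` ranks “red” before “green”.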

No, I didn’t use Wikipedia’s n-grams. I used a much broader sample of pages from all around the Internet, with phrase detection and some other smart stuff. But even if you parse Wikipedia’s n-grams, you’d still get pretty good results (although no synonyms for “TechCrunch” and other rare words).

Posted by admin under site | Comments Off

1st Aug 2008

All Wikipedia’s n-grams are REALLY belong to us

A few days ago I was thinking about Google releasing its n-grams a while back. Damn, that was the second thing I’d wanted to get, after the TLD Zone Access Program. [I did apply to it, but never heard back from them. In our open information age, the most wanted information, even when it seems open, is usually out of reach, as in these two cases.]

So, I’ve decided to do the next best thing: build an n-gram frequency list from Wikipedia. It’s not quite as big as Google’s (built from a trillion or so words and shipped on 5 DVDs), but the licensing terms are much better (“Free” vs Google’s “$180”, and “you can use it” vs Google’s “you can’t use it”).
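For anyone who wants to build a similar list, here is a minimal n-gram counting sketch in Python. It assumes plain text has already been extracted from the Wikipedia dump (stripping wiki markup isn’t shown) and uses a deliberately simple regex tokenizer; the author’s actual pipeline isn’t described in this post.

```python
import re
from collections import Counter

# Simplistic tokenizer: lowercase letters, digits and apostrophes.
# Punctuation is dropped, so n-grams may span sentence boundaries.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def ngram_counts(text, n):
    """Count every n-word sequence in `text`."""
    words = TOKEN_RE.findall(text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def merge_counts(texts, n):
    """Accumulate n-gram counts over many documents (e.g. articles)."""
    total = Counter()
    for text in texts:
        total.update(ngram_counts(text, n))
    return total
```

Sorting the merged `Counter` with `most_common()` gives the frequency list itself.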

Read more and download it here

Posted by admin under downloads, word frequency | Comments Off

30th Jul 2008

Global Web Functions marketplace - a possible machine for making millionaires out of programmers

I’ve been playing with my new toy, which might replace “Craziest Ideas” since it’s a little more useful. It’s actually a kind of “Google Sets”, but using slightly different technology (“Google Sets” specifically looks for lists like <ul><li>red<li>white<li>blue</ul> on the Web).
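A rough Python sketch of the list-mining idea attributed to Google Sets above: collect items that share an HTML list with every seed word. It is illustrative only; it is not how Google Sets actually works internally, nor the “slightly different technology” used here. The class and function names are hypothetical, and the pages would come from a crawler that isn’t shown.

```python
from collections import Counter
from html.parser import HTMLParser


class ListItemSets(HTMLParser):
    """Collect <li> texts grouped by their enclosing <ul>/<ol>.

    Handles unclosed <li> tags such as <ul><li>red<li>white</ul>.
    """

    def __init__(self):
        super().__init__()
        self.sets = []    # finished lists with at least two items
        self._stack = []  # one item list per currently open <ul>/<ol>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._stack[-1].append("")  # start a new item

    def handle_data(self, data):
        # Text after an <li> belongs to the latest item of the innermost list
        if self._stack and self._stack[-1]:
            self._stack[-1][-1] += data

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._stack:
            items = [item.strip().lower() for item in self._stack.pop()]
            items = [item for item in items if item]
            if len(items) >= 2:
                self.sets.append(items)


def expand_set(pages, seeds):
    """Rank words that appear in the same HTML list as every seed word."""
    seeds = {seed.lower() for seed in seeds}
    scores = Counter()
    for html in pages:
        parser = ListItemSets()
        parser.feed(html)
        parser.close()
        for items in parser.sets:
            if seeds <= set(items):
                scores.update(set(items) - seeds)
    return [word for word, _ in scores.most_common()]
```

Seeding with “red” and “blue” over pages containing <ul><li>red<li>white<li>blue</ul> would suggest “white”.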

Okay, so I was playing with the phrase “web framework” when suddenly I got a million-dollar phrase: “web functions” :) Don’t get too excited, though: this million is up for grabs, but it’s not low-hanging fruit. Read the rest of this entry »

Posted by admin under ideas | Comments Off

28th Jul 2008

All hail Cuil, SearchMe, Technorati! The new-age Internet is ripoff-based, and we need to evolve because of it.

Short version: if you are a user, hail Cuil! If you are a developer, designer, or any kind of creative person, possibly fear Cuil!

As you might already know, there’s a new sheriff in town. Well, not quite a sheriff, but rather a bunch of ex-Google guys (or so they say) who have built a new (well, not quite new) search engine: Cuil (unavailable at the time of writing, I guess because of the load).

Actually, I like this engine, mostly because today it matches Google in traffic numbers: the number of people who came to TheRarestWords from Google is currently EQUAL to the number who came from Cuil. If TheRarestWords were making money, today I would be enjoying double profits :) I guess this is only temporary, since everybody is talking about them today anyway. Tomorrow we’ll probably see much less traffic from them.

But along with this great opportunity, there’s also a big evil in Cuil.

Read the rest of this entry »

Posted by admin under site | Comments Off

26th Jul 2008

Suggestan released

So, the “Suggestan” project is released. As usual, I have no clear idea of what it is or where it’s going. It’s a kind of “define a thing” project, where you can find or share knowledge about the subjects, hobbies, professions and ideas you know, in the form of suggestive questions.

Go and see for yourself, and we’ll find out if it’s going anywhere besides the Trash Bin :) Go Suggestan!

Posted by admin under site | Comments Off

26th Jul 2008

Another project coming soon from the land of Suggestan

Within the next 24 hours or so I’ll release my 4th hobby project (in case you’ve just turned on your TV, the first three are The Rarest Words, The Rarest News and The Craziest Ideas). The 4th is called “Suggestan”.

Webster’s defines “Suggestan” as “1. geo. A little country where everyone is suggesting something.” Okay, I’m just kidding :) The project is going to be yet another joke project that has some meaning. Expect something kind of like “The Rarest Words”, but for hobbies, professions and ideas instead of words. Expect something slightly more complex than one textbox under a hobby name. :)

This project is once again going to “tap into crowdsourcing”, and maybe even into “semantics”, as it will define some relations between words, hobbies, ideas, places, etc. It’s definitely not something revolutionary, but if you like “The Rarest Words”, you’re going to enjoy Suggestan too.

Posted by admin under ideas | Comments Off

20th Jul 2008

Testing SQL engines/queries with Django (avg. query time)

I love Django for many reasons, and here’s one of them: testing the average query time, which I did today to compare engines (MySQL vs PostgreSQL), is easy with Django.
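The full method is in the rest of the entry; as a minimal, hedged illustration of the general approach, here is a plain-Python timing helper that runs a query-producing callable several times and averages the wall-clock time. The helper name and the `Article` model in the usage comment are hypothetical.

```python
import statistics
import time


def average_time(run_query, repeat=20):
    """Call run_query() `repeat` times; return (mean, median) seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.median(timings)


# Hypothetical Django usage (Article is an illustrative model name):
#   average_time(lambda: list(Article.objects.filter(title__icontains="rare")[:20]))
# Wrapping the QuerySet in list() forces evaluation; QuerySets are lazy
# and would otherwise never hit the database inside the timed call.
```

Running the same callables once with settings pointing at MySQL and once at PostgreSQL gives comparable per-query averages.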

Read the rest of this entry »

Posted by admin under python, site | No Comments »

20th Jul 2008

I don’t get it: a real web application with PostgreSQL vs MySQL MyISAM vs MySQL InnoDB (with Django’s ORM, 2008)

UPDATE: This has been posted to Reddit; read the comments. The main thing to understand is that these results are for the default settings of both databases, for my case and my priorities. Yours could (and maybe even should) be different.

Last year and this year I’ve heard everywhere that PostgreSQL is the way to go and that using MySQL in 2008 makes people puke… but without any real arguments (besides “Postgres is the way to go”). I don’t usually buy into fashion-driven technology shopping (that’s when someone can’t prove something is better than what I use), and this time was no exception.

Scouring the Internet, I found some comparative tests, mostly in the form of “INSERT 10000 items WITH COMMIT AT THE END”. Okay, how many people have actually inserted 10000 items at once in a real web application (besides when dumping, restoring or moving data)? Some people did, but they were both unavailable for comment :) Just kidding.

Since I’m using Django, moving to Postgres and testing my application (RarestNews) should be a snap, shouldn’t it? Just change the database string in settings.py and install PostgreSQL, right? Wrong! :) But let’s take it step by step.
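For reference, the “just change the database string” step looks roughly like this in a Django 1.0-era settings.py (Django 1.2 and later use a `DATABASES` dict instead). The database name and credentials are hypothetical, and as the paragraph says, the real move takes more than this edit.

```python
# settings.py (Django 1.0-era setting names)
# Before: DATABASE_ENGINE = 'mysql'
DATABASE_ENGINE = 'postgresql_psycopg2'  # requires the psycopg2 driver
DATABASE_NAME = 'rarestnews'             # hypothetical database name
DATABASE_USER = 'rarestnews'             # hypothetical credentials
DATABASE_PASSWORD = 'secret'
DATABASE_HOST = ''                       # empty string means localhost
DATABASE_PORT = ''                       # empty string means the default port
```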

Read the rest of this entry »

Posted by admin under site | No Comments »

20th Jul 2008

Django ORM + threading = memleak (workaround)

After trying hundreds of ways to make Python’s garbage collector work with Django’s ORM and threading (see here, scroll to “Python 2.5 bug”) and using many tools (heapy, valgrind) to find the leaks, I have to conclude that there doesn’t seem to be a workaround that keeps the work in threads. All the tools show 15-30MB in use and no leaks, but in reality the program uses all available memory and starts to swap within minutes.

I’ve tried:

  1. passing only integers instead of Django objects;
  2. adding +'' to strings to make copies of strings rather than references (and copy.copy and copy.deepcopy too);
  3. creating threads inside threads, hoping that would drop references somehow;
  4. del Object; del everything;
  5. weakrefs;
  6. moving all Django code into a function and passing only an integer to it;
  7. disabling Django and leaving only lxml (the parsing library) in the thread, and vice versa: still leaks;
  8. other things too, but I can’t remember them all.

The worst part is that nothing detects where those GIGABYTES are going; every tool I used shows 15-30MB of memory usage.

The workaround I’ve settled on: running separate child processes that connect to a parent “queuer” process via XML-RPC. It takes a lot of memory (each process is 17MB vs a few KB for a thread), and my guess is that XML isn’t the most efficient transport, but at least there are no memory leaks, even when a child runs an infinite loop.
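A minimal sketch of this process-per-worker layout using Python 3’s standard library (in Python 2.5 the equivalent modules were SimpleXMLRPCServer and xmlrpclib). The method names `next_task` and `report`, and the squaring that stands in for the real parsing and database work, are hypothetical placeholders, not the author’s code. The point of the design is that whatever a child leaks is returned to the OS when that child exits.

```python
import subprocess
import sys
import threading
from xmlrpc.server import SimpleXMLRPCServer

# Child program, run in a separate interpreter. Memory leaked by the real
# work (Django ORM, lxml, ...) is freed by the OS when the child exits.
CHILD_CODE = """
import sys
import xmlrpc.client

queuer = xmlrpc.client.ServerProxy(sys.argv[1], allow_none=True)
while True:
    task_id = queuer.next_task()
    if task_id is None:         # queue drained: exit cleanly
        break
    result = task_id * task_id  # placeholder for the real work
    queuer.report(task_id, result)
"""


class Queuer:
    """Parent-side task queue exposed over XML-RPC (hypothetical interface)."""

    def __init__(self, task_ids):
        self._pending = list(task_ids)
        self._lock = threading.Lock()
        self.results = {}

    def next_task(self):
        with self._lock:
            return self._pending.pop(0) if self._pending else None

    def report(self, task_id, result):
        with self._lock:
            self.results[task_id] = result
        return True


def run_jobs(task_ids, n_children=2):
    queuer = Queuer(task_ids)
    # Port 0 asks the OS for a free port; allow_none lets us send None.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True,
                                logRequests=False)
    server.register_instance(queuer)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://{host}:{port}/"
    try:
        children = [subprocess.Popen([sys.executable, "-c", CHILD_CODE, url])
                    for _ in range(n_children)]
        for child in children:
            child.wait()
    finally:
        server.shutdown()
        server.server_close()
    return queuer.results
```

Calling `run_jobs(range(1, 6))` starts two child interpreters that drain the queue and report back to the parent.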

If you have other ideas to try, let me know.

Posted by admin under site | No Comments »