14th Nov 2008
More experiments with semantic kernels and TheRarestWords
More science for today. I’m trying to replace SEwOrdizer, which I had in TheRarestWords but can’t use on this VPS. And well, it was kind of disappointing, maybe even useless. So I’m experimenting with semantic kernels on a more global scale. Well, not 10-million-words global, but 80 000 kernels from 80 000 words (actually uni- and bigrams), i.e. I’m trying to build a semantic kernel for every word that is now in the TheRarestWords database. I’m using the same process that I use for the semantic kernel bot, but on a massive local scale. Well, the first results were a disappointment:
skiing: guests, decide, secure, body, categories, video, list
Somewhat related, maybe, but not by a long shot. And “secure”?.. Tell that to my broken downhill ski.
Well, that’s even more of a disappointment than the ex-SEwOrdizer. OK, but with a few improvements to the algorithm:
skiing: majestic, magnificent, tourist, shops, guests, route, tennis, minute
Yes, you can see majestic and magnificent nature views on a tourist ski route, while guests at a ski resort can take a minute to visit the ski shops. And tennis is also a sport.
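For the curious: the post doesn't spell out the algorithm, so what follows is not the real code, only a minimal sketch of one simple way such a kernel could be built, assuming plain co-occurrence counting. Every uni- and bigram collects the words that show up in the same documents, and the most frequent ones become its kernel. The names `ngrams` and `build_kernels`, the `top_k` and `stopwords` parameters, and whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_kernels(documents, top_k=8, stopwords=frozenset()):
    """Map each uni-/bigram to the top_k words it co-occurs with most often.

    Hypothetical helper: co-occurrence is counted once per document,
    which is an assumption, not the method used for the results above.
    """
    cooc = defaultdict(Counter)
    for text in documents:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = [t for t in text.lower().split() if t not in stopwords]
        vocab = set(tokens)
        for term in set(ngrams(tokens, 1) + ngrams(tokens, 2)):
            # A term never counts its own words as related words.
            for word in vocab - set(term.split()):
                cooc[term][word] += 1
    return {
        term: [word for word, _ in counts.most_common(top_k)]
        for term, counts in cooc.items()
    }
```

Calling `build_kernels(docs)["skiing"]` on a list of document strings would give the (up to) eight words that most often share a document with “skiing”. Raw counts like these tend to favour words that are common everywhere, which may be one reason a first attempt returns things like “list” or “video”.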
And this isn’t the final version yet. This should replace SEwOrdizer for the time being; I think I’ll call it SEmangic (Semantic + Magic). It’s due to be released in a few days (probably earlier, unless I stumble on a memory problem somewhere).
And of course, if TheRarestWords ever returns on a full scale (i.e. on a dedicated server), the SEwOrdizer will make its comeback. Probably.