

Archive for the 'ideas' Category

31st Oct 2008

Semantic kernels can mean better (broader) search: usage #1

I’ve posted a usage example of how semantic kernels can be applied to real life. For those of you who (like me) were wondering “how the heck is the Semantic Web going to be better” - here’s a small part of my view of it.

The example compares a “traditional” Twitter search for Linux with a “semantic kernel search” for the same phrase. It uses only a small random part of the kernel, because I didn’t want to load Twitter’s servers with hundreds of queries.

The demo is here: http://semkernel.com/tweet/linux

The idea is that if you search for all the words that are closely related to the topic, you’re going to find broader results. It’s unnatural for people to write things that are obvious to them. If I say “Visited St.Basil’s Cathedral today - the weather was nasty”, it would be unnatural for me to write “Visited St.Basil’s Cathedral in Moscow, Russia today”, because that’s redundant information.

Another example would be “I’ve seen Obama today” vs “I’ve seen Obama today. Obama is a democrat. Democrats are a political party. A political party is politics.” - you see how dumb a phrase has to be for a “traditional” keyword search to find “I’ve seen Obama today” on a “politics” query.

Okay, if you enter politics into Google, you would probably find Obama’s site, but… that takes a lot of people linking to it with the phrase “politics”. And that leaves very little choice over what you’ll find. Actually mostly no choice - a search engine would never surface the phrase “I’ve seen Obama today”, because nobody would be linking to it. That leaves us with mainstream news sources. That’s exactly why blogging is dying today - single bloggers can’t compete with big corporate bloggers, who can buy media love.

Ok, back to the “St.Basil’s weather” phrase. So, if you were searching for “Russia” or “Moscow, Russia” - you would NOT find my post, because there’s “implied” knowledge, which is hidden inside of a phrase. Traditional search engines can’t see it.

But if a next-gen search engine used “semantic kernels” to expand the search, you’d have found it, because the keyword “basil’s cathedral” is in the semantic kernel of “Moscow” (although not yet in the semantic kernel of “Russia”). This technology is still young and developing, and those semantic kernels are only a part of what can be achieved.
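To make the difference concrete, here is a minimal Python sketch of kernel-based expansion. The kernel contents, the function names and the sample posts are all made up for illustration; this is not how semkernel.com is actually implemented, just one plausible reading of the idea.

```python
# Hypothetical, hand-made kernel fragments; a real kernel would be far larger.
SEMANTIC_KERNELS = {
    "moscow": {"basil's cathedral", "kremlin", "red square"},
    "politics": {"obama", "democrat", "political party"},
}

POSTS = [
    "Visited St.Basil's Cathedral today - the weather was nasty",
    "I've seen Obama today",
    "Installed Linux on my laptop",
]


def normalize(text):
    # Lowercase and straighten curly apostrophes so phrases compare cleanly.
    return text.lower().replace("\u2019", "'")


def expand_query(query):
    # The query itself plus every phrase in its kernel (if it has one).
    q = normalize(query)
    return {q} | SEMANTIC_KERNELS.get(q, set())


def keyword_search(query, posts):
    # "Traditional" search: the literal query must appear in the post.
    q = normalize(query)
    return [p for p in posts if q in normalize(p)]


def kernel_search(query, posts):
    # Kernel search: any phrase from the expanded query may match.
    terms = expand_query(query)
    return [p for p in posts if any(t in normalize(p) for t in terms)]


print(keyword_search("Moscow", POSTS))
print(kernel_search("Moscow", POSTS))
```

With this toy data the keyword search for “Moscow” finds nothing, while the kernel search finds the St.Basil’s post, because “basil's cathedral” sits in the assumed kernel of “moscow”.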

Semantic kernels can mean that search engines would find your page if it has quality information without gazillions of links. Actually linking has nothing to do at all with relevancy search. Well, it SHOULDN’T have anything to do…

There wouldn’t be “googlebombing” if linking wasn’t a major part of Google’s algo. And the way they “solved” the problem now means that you have to write “democrats are political party, which is politics” so that you can be found on “politics” query.

And the search is only one field where semantics can be a major step forward.

That’s one of the uses my semkernel.com is going to try to help people with. The technology is there to try... now.

Posted in ideas | Comments Off

30th Jul 2008

Global Web Functions marketplace - a possible machine for making millionaires out of programmers

Well, I’ve been playing with my new toy, which might replace “Craziest Ideas” as it is a little more useful. It’s actually a kind of “Google Sets”, but using slightly different technology (“Google Sets” specifically looks for <ul><li>red<li>white<li>blue</ul> on the Web).

Okay, so I’ve been playing with the “web framework” phrase when suddenly I got a million dollar phrase: “web functions” :) Well, don’t get too excited - this million is up for grabs, but it’s not low-hanging fruit.

Please note that this is only an IDEA of CONCEPT, not a description of some real framework/library.

The concept is simple. We have a lot of APIs scattered around the Web in RSS, REST, XML, JSON, Atom, etc… Each of them has its own rules, registrations, signing mechanisms, etc. The more popular ones get more attention, so libraries are available in more languages; others are less popular, so you have to roll your own.

Okay, but why doesn’t someone (Amazon, Google, Yahoo, Facebook, I’m looking in your direction) actually create an open platform for remote calls, so that every API could be called with a simple call into one huge database of APIs? (“Open” as in “we welcome all developers and programmers”, because “open sourcing” wouldn’t really be applicable here, given the billing involved…) So, if I want Google Images for “mars”, I go to some site, let’s say globalwebfunctions.com (it’s not an actual site), and search for Google Images. I find the google_images call to be what I need (and also google_images2, google_images_with_descriptions or google_images_by_color - each of those developed by independent developers, some of whom would be doing exactly the same thing, but maybe for a different price). Let’s say I do this in PHP:

$wf=new GlobalWebFunctions('my_login','my_secret_key');
$list_of_images = $wf->call('google_images','q:mars', 'expect:list');

So, now my program connects to the centralized site, which finds out what server is responsible for the google_images function, signs my request, deducts let’s say 0.1 cent from my account and returns the Google images to me.

Here’s the kicker. Independent programmers could write those simple reusable functions, like:
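The lookup, signing and billing steps could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the CentralSite class, the HMAC signing scheme, the fake google_images handler and the choice to count money in mills (1 mill = 0.1 cent) are mine, not part of any real service. In this sketch the client library signs the request and the central site verifies it, which is just one possible way to split the work.

```python
import hashlib
import hmac


def sign(secret, name, params):
    """HMAC-SHA256 over the function name plus sorted parameters (an assumed scheme)."""
    payload = name + "?" + "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()


class CentralSite:
    """Toy in-memory stand-in for the centralized site."""

    def __init__(self):
        self.functions = {}  # function name -> (handler, price in mills)
        self.accounts = {}   # login -> {"secret": str, "balance": int mills}

    def register_function(self, name, handler, price_mills):
        self.functions[name] = (handler, price_mills)

    def add_account(self, login, secret, balance_mills):
        self.accounts[login] = {"secret": secret, "balance": balance_mills}

    def dispatch(self, login, signature, name, params):
        account = self.accounts[login]
        # Verify the signature before spending anyone's money.
        expected = sign(account["secret"], name, params)
        if not hmac.compare_digest(signature, expected):
            raise PermissionError("bad signature")
        handler, price = self.functions[name]
        if account["balance"] < price:
            raise RuntimeError("insufficient balance")
        account["balance"] -= price
        return handler(**params)


class GlobalWebFunctions:
    """Client side: signs each request and forwards it to the central site."""

    def __init__(self, login, secret_key, site):
        self.login, self.secret_key, self.site = login, secret_key, site

    def call(self, name, **params):
        signature = sign(self.secret_key, name, params)
        return self.site.dispatch(self.login, signature, name, params)


# Demo with a fake google_images handler that costs 1 mill (0.1 cent) per call.
site = CentralSite()
site.register_function(
    "google_images",
    lambda q, expect: [f"{q}-1.jpg", f"{q}-2.jpg"],
    price_mills=1,
)
site.add_account("my_login", "my_secret_key", balance_mills=1000)

wf = GlobalWebFunctions("my_login", "my_secret_key", site)
list_of_images = wf.call("google_images", q="mars", expect="list")
print(list_of_images, site.accounts["my_login"]["balance"])
```

Counting in whole mills keeps the balance an integer, so one call takes the demo account from 1000 to 999 without any floating-point rounding.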

submit_to_digg('url:myurl')
digg_get_my_recommendations('login:rarestwords')
define_from_urbandictionary('busted')
weather_in('city:Chicago','when:today', 'in:Celsius')
flickr_creative_commons_image('big sale', 'expect:jpeg')
get_page_obey_robotstxt('url:http://therarestwords.com/','ua:TheRarestParser/0.4b')
geolocate('city:Chicago')
resize_image('jpg:'.$myjpeg, 'w:800', 'h:600')
get_wordfrequency('disobedient')
big_distributed_table_set('n:user_1353_name', 'v:Mr V.')
alexa_grep('q:<li>(.*?)</li>')
convert_xml_to_json('q:<book><title>test</title></book>')

or maybe even:

map_reduce('mapper:global-function:resize','reduce:global-function:group_images_by_color')

Now for the interesting part. Some of those functions could be free, some could be paid (to cover the traffic expenses and machine time), so now anyone could register on that central site and become either a developer or a programmer:

Developer

Develops Global Web Functions - places the code on his own server, using some kind of Global Web Functions API in his language of choice (Java, C, Perl, PHP, Python, Erlang, you name it…). Earns money for each call, or just sets a number of free calls per user per day (per second, etc).

Programmer

Uses those Global Web Functions, pays fractions of a cent for the usage :)
The idea is that this would flatten the learning curve for all those APIs. I’ve never got to the end of most of them. And for the most part the usage patterns are the same. I bet a lot of people use Google Maps only to display their place on Earth, a lot of people rewrite the resize_image function in every possible language, and have you ever tried to read Amazon’s APIs when all you need is an s3_put('bucket','file','key','secret-key','text-text-text') function and a similar s3_read???

Also, two other examples from the smaller world. Some people asked me for an API for my TheRarestWords project, particularly for the current word frequencies. If I developed it, it would overload my server without even a cent of profit. I bet a lot of you have information you could sell, or could write a resize_jpeg function in your language, put a few servers behind it and earn lifetime income :)
There should be a local caching mechanism included in the Global Web Functions API, so that get_all_world_color_names(), for example, could be called just once, not on every load of a furniture store’s order form.
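A local cache like that could look roughly as follows in Python. The CachingClient class, the time-to-live policy and the fake remote call are all assumptions for the sake of the sketch; a real client would also need to decide which functions are safe to cache at all.

```python
import time


class CachingClient:
    """Wraps a remote call and remembers results for ttl_seconds (hypothetical design)."""

    def __init__(self, remote_call, ttl_seconds=3600, clock=time.monotonic):
        self._remote_call = remote_call
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (name, sorted params) -> (timestamp, result)

    def call(self, name, **params):
        # Parameter values must be hashable for this simple cache key to work.
        key = (name, tuple(sorted(params.items())))
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no remote call, no charge
        result = self._remote_call(name, **params)
        self._cache[key] = (now, result)
        return result


# Fake remote function that records how often it is really invoked.
calls = []


def fake_remote_call(name, **params):
    calls.append(name)
    return ["red", "white", "blue"]


client = CachingClient(fake_remote_call, ttl_seconds=60)
first = client.call("get_all_world_color_names")
second = client.call("get_all_world_color_names")
print(first == second, len(calls))
```

The second call is served from the cache, so the paid remote function is hit only once however many times the order form loads within the time-to-live.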

More ideas:

add_comment('id:http://rarestblog.com/2008/07/global-web-functions-how-to-make-web-more-interactive', 'comment:This idea really sucks')
get_comments('id:http://rarestblog.com/2008/07/global-web-functions-how-to-make-web-more-interactive', 'expect:html', '<ul><li>[[comment]]</li></ul>')
$instance_id=provision_virtualized_10_percent_part_of_amazon_ec2('duration:10days');
prolong_ec2('instance:'.$instance_id, 'duration: 20days')

and pay 10% of the price for 7% of the resources of the minimum machine instead of paying for a full one (where some entrepreneur buys a full machine, divides it into 10 parts and oversells, even earning a profit of 30% for doing nothing)

write_blog_post('topic:World War 3', 'http_post_result_when_ready:url(http://myserver.com/accept)');

as a function-based interface for GetAFreelancer, where someone would manually take care of finding an author, making him write, and then returning the article to you
(think Amazon Mechanical Turk)

and even

amazon_mechanical_turk('task:Write an article','http_post:http:.....')

Some other ideas: programmers might request a particular new function by supplying a prepared unit test. You should probably pass “prices:27_jun_2008” to the centralized server, so that any call to a function that changed its price after that date would be blocked. Then you have a choice of either agreeing to the new price or switching to another similar, cheaper function :) Damn, we would have a lot of resize_image_283909230 functions :)
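The price-pinning check could be sketched like this in Python. The price history table, the function names in it and the PriceChangedError exception are invented for illustration; the only thing taken from the idea above is that a call carries a “prices as of” date and is blocked if the price changed after it.

```python
from datetime import date

# Hypothetical price history: function name -> list of (effective date, price in cents).
PRICE_HISTORY = {
    "resize_image": [(date(2008, 1, 10), 0.05)],
    "google_images": [(date(2008, 3, 1), 0.1), (date(2008, 7, 15), 0.3)],
}


class PriceChangedError(Exception):
    """Raised when a function changed its price after the caller's pinned date."""


def price_if_unchanged(name, pinned_on):
    # The most recent entry is the current price (dates are unique per function).
    effective, price = max(PRICE_HISTORY[name])
    if effective > pinned_on:
        raise PriceChangedError(f"{name} changed price on {effective.isoformat()}")
    return price


pinned = date(2008, 6, 27)  # corresponds to 'prices:27_jun_2008'
print(price_if_unchanged("resize_image", pinned))
try:
    price_if_unchanged("google_images", pinned)
except PriceChangedError as err:
    print("blocked:", err)
```

The caller then decides whether to move the pinned date forward (agreeing to the new price) or switch to a cheaper lookalike function.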
Oh, and the Global Web Functions API is just a set of protocols that define how to use all those functions from your language. For Python it could be:

wf=GlobalWebFunctions('my_login','my_secret_key')
list_of_images = wf.call('google_images', q='mars', expect='list', price=0.0005)

The PHP example is at the beginning (Perl would be pretty much the same). Maybe C:

list_of_images=GlobalWebFunctions.call('my_login', 'my_secret_key', 'google_images', 'q:mars', 'expect:list');

Et cetera…

And it should also define the return format. I’d think REST returning JSON (sometimes over https) would be a great idea, except that it really doesn’t define how to send binary data (like images).
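One common workaround for the binary problem is to base64-encode the bytes inside the JSON. Here is a small Python sketch of that; the field names (status, content_type, data_base64) are my own invention, not part of any defined protocol, and base64 does inflate the payload by roughly a third.

```python
import base64
import json


def encode_response(status, payload, content_type):
    """Pack raw bytes into a JSON string, with the bytes base64-encoded."""
    return json.dumps({
        "status": status,
        "content_type": content_type,
        "data_base64": base64.b64encode(payload).decode("ascii"),
    })


def decode_response(text):
    """Inverse of encode_response: returns (status, raw bytes, content type)."""
    message = json.loads(text)
    return (
        message["status"],
        base64.b64decode(message["data_base64"]),
        message["content_type"],
    )


# Round trip with bytes that are not valid text (a JPEG-like header).
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(range(16))
wire = encode_response("OK", fake_jpeg, "image/jpeg")
status, data, content_type = decode_response(wire)
print(status, content_type, data == fake_jpeg)
```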

Well :) I have neither idea, nor finances to create this kind of Behemoth :) Also I have some doubts about profitability of this for small startups, but maybe for guys like Google/Amazon it could be a big marketplace to expand their we-rule-all-of-the-world-knowledge efforts :) And yet another way for ultra-profitable Google to disburse cash :)
But, WARNING. If you are going to do this:

  1. The initialization should be as SIMPLE as
    wf=GlobalWebFunctions('my_login','my_secret_key')
  2. The call must be as SIMPLE as:
    list_of_images=GlobalWebFunctions.call('my_login', 'my_secret_key', 'google_images', 'q:mars', 'expect:list');
  3. The call must return NATIVE data structures (arrays, arrays-of-arrays, hashes (if any) or tuples, strings and integers). I don’t want any JSON or XML to parse.
  4. In languages that don’t support exceptions, every function should probably return array( array( status, exception message ), actual_data ), so that I could check ret[0][0]=='OK' before proceeding. In languages that do support them, return only the result and throw an exception where appropriate.
    array( array('OK'), actual_result)
    array( array('ThrottleException','You are requesting too fast'), '' )
    array( array('InputException','Input values are wrong'), '' )
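For a language that does have exceptions, the client library could translate that status array into a native exception. A minimal Python sketch follows; the exception class names mirror the examples above, while the unwrap helper and the WebFunctionError base class are assumptions of mine.

```python
class WebFunctionError(Exception):
    """Base class for errors reported by a remote function (hypothetical)."""


class ThrottleException(WebFunctionError):
    pass


class InputException(WebFunctionError):
    pass


# Map status names from the wire format to native exception classes.
_EXCEPTIONS = {
    "ThrottleException": ThrottleException,
    "InputException": InputException,
}


def unwrap(ret):
    """Turn ((status, message), data) into either the data or a raised exception."""
    status, data = ret
    if status[0] == "OK":
        return data
    # Unknown status names fall back to the generic base class.
    exc_type = _EXCEPTIONS.get(status[0], WebFunctionError)
    message = status[1] if len(status) > 1 else status[0]
    raise exc_type(message)


print(unwrap((("OK",), ["mars-1.jpg", "mars-2.jpg"])))
try:
    unwrap((("ThrottleException", "You are requesting too fast"), ""))
except ThrottleException as err:
    print("throttled:", err)
```

This keeps the wire format identical for every language, while callers in exception-friendly languages never have to look at ret[0][0] themselves.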

Just in case you want to tell me something - e-mail me at rarestwords@mail.ru.

I don’t think the Open Source community could raise something like this, but I might be wrong. If you believe in it more than I do - well, let’s try. Mail me at rarestwords@mail.ru with ideas or with the resources you could provide. But we would need servers, programmers in different languages, and money to pay for traffic and promotion, without any guarantee it’ll at least pay for itself :)

Posted in ideas | Comments Off

26th Jul 2008

Another project coming soon from the land of Suggestan

Probably within 24 hrs I will release my 4th hobby project (in case you’ve just turned on your TV - the first three are The Rarest Words, The Rarest News and The Craziest Ideas). The 4th is called “Suggestan”.

Webster defines “Suggestan” as “1. geo. A little country where everyone is suggesting something.” Ok, I’m just kidding :) The project is going to be yet another joke project which has some meaning. Expect something kind of like “The Rarest Words”, but for hobbies, professions and ideas instead of words. Expect something slightly more complex than 1 textbox under a hobby name. :)
This project is going to once again “tap into crowdsourcing”, and maybe even into “semantics” as it’s going to define some relations between words, hobbies, ideas, places, etc. It’s definitely not something revolutionary, but if you like “The Rarest Words” - you’re going to enjoy Suggestan too.

Posted in ideas | Comments Off

23rd May 2008

SEO by example words

While waiting for domain dissection to wrap up, I thought that maybe I could build another index - the rarest/rare words corpus by category. Like this:

       Category: Domain Names

words most used: domain, dns, ip, registrar, hosting, registration
  commonly used: register,  whois
    rarely used: here comes a thousand or so other words

(more…)

Posted in ideas, seo | No Comments »