1st Aug 2008
All Wikipedia’s n-grams are REALLY belong to us
A few days ago I was thinking about Google releasing its n-grams a while back. Damn, that was the second thing on my wish list, right after the TLD Zone Access Program. [I did apply to it, but never heard back from them. In our open information age, the most wanted information, even when it seems open, is usually out of reach, as in those two cases.]
So I decided to do the next best thing: build an n-gram frequency list from Wikipedia. It’s not quite as big as Google’s (a trillion or so n-grams on 5 DVDs), but the licensing terms are much better (“Free” vs. Google’s “$180”, and “you can use it” vs. Google’s “you can’t use it” license).
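For readers wondering what an n-gram frequency list actually involves, here is a minimal sketch in Python. It is not the method used to build the Wikipedia list (the post doesn't describe that). The tokenizer and the function names `tokenize`, `ngram_counts` and `frequency_list` are hypothetical, and parsing the Wikipedia dump into plain text is assumed to have happened already.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a simple lowercase word tokenizer; the real list may have
# tokenized Wikipedia text quite differently.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def ngram_counts(texts: Iterable[str], max_n: int = 3) -> Counter:
    """Count every n-gram of length 1..max_n across all texts.

    Keys are tuples of tokens. N-grams never span two documents,
    because each text is tokenized and counted separately.
    """
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            # Slide a window of width n over the token list.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequency_list(counts: Counter, top: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent n-grams as (space-joined string, count) pairs."""
    return [(" ".join(gram), c) for gram, c in counts.most_common(top)]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat ran"]
    counts = ngram_counts(docs, max_n=2)
    print(frequency_list(counts, top=3))
```

An in-memory `Counter` is fine for a toy corpus, but for something the size of Wikipedia you would more likely write n-grams out in chunks and merge sorted counts on disk.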
Read more and download it here.