Saturday, May 3, 2014

Experimental Blog # 178

Quotations and comments from "Uncharted - Big Data as a Lens on Human Culture" by Erez Aiden and Jean-Baptiste Michel

" ..Google has digitized more than 30 million books. That's about one in every four books ever published."

" ..nearly all irregular verbs are very frequent. Although only about 3 percent of verbs are irregular, the ten most frequent verbs are all irregular." The authors explain that very old, but common, verbs were less subject to change when a new linguistic rule was adopted for new verbs. "The use of -ed to signify the past tense emerged in Proto-Germanic, a language spoken between 500 and 250 BCE in Scandinavia."

" ..our idea was to create a shadow dataset containing a single record {n-gram} for every word and phrase that appeared in English books," except for those "that had been written only a handful of times." These measures are meant to avoid copyright infringement and "hacking".
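
As a rough illustration of what such a shadow dataset looks like, here is a minimal Python sketch. It is not the authors' actual pipeline: the function name `ngram_counts`, the toy "books", and the cutoff `min_count=2` are all assumptions made up for this example (the quote only says "a handful of times" and gives no number).

```python
from collections import Counter

def ngram_counts(texts, max_n=2, min_count=2):
    """Count every word and phrase (n-gram) up to max_n words long,
    then drop the ones seen fewer than min_count times.
    Both defaults are hypothetical, chosen only for this toy example."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            # Slide a window of n words across the text.
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # Keep one record per n-gram that clears the cutoff.
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Three tiny stand-in "books".
books = ["the cat sat", "the cat ran", "a dog ran"]
table = ngram_counts(books)
print(table)
```

The table keeps only counts, not the original sentences, so the rare phrases that could be used to piece a book back together never appear in it.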

" ..what we measure with n-grams is not fame itself but a simplification, a fame facsimile."
The authors have compiled a list of people born from 1800 to 1949 whom they call the 150 valedictorians: those whose full names have turned up most often in their database of 500 billion words. However, it seems that the most famous people (not the greatest, and certainly not the most popular) are those who are often referred to by their last names only.

The top 10, each followed in braces by their birth year and that year's "valedictorian", are: #1- Adolf Hitler{1889- Jawaharlal Nehru}, #2- Karl Marx{1818- himself}, #3- Sigmund Freud{1856- Woodrow Wilson}, #4- Ronald Reagan{1911- himself}, #5- Joseph Stalin{1879- Albert Einstein}, #6- Vladimir Lenin{1870- Frank Norris}, #7- Dwight Eisenhower{1890- Ho Chi Minh}, #8- Charles Dickens{1812- himself}, #9- Benito Mussolini{1883- William Carlos Williams}, and #10- Richard Wagner{1813- Henry Ward Beecher}.
