
Imagine a satellite map of Europe, color-coded for language. Add Twitter data. Blogger Frank Jacobs explains how Mike McCandless extracted the open-source CLD (Compact Language Detector) software embedded in Chrome, and how Eric Fischer applied it to Twitter. The result is this graphically attractive map!

Image Source: bigthink.com via Catherine on Pinterest

In his blog, Mike McCandless explains how, after a few tweaks, the detector can be called from Python to identify a text's language, as follows:

    import cld
    # `bytes` is the encoded text to classify (the name shadows Python's built-in)
    topLanguageName = cld.detect(bytes)[0]

“The detect method returns a tuple, including the language name and code (such as RUSSIAN, ru), an isReliable boolean (True if CLD is quite sure of itself), the number of actual text bytes processed, and then details for each of the top languages (up to 3) that were identified.” Mike McCandless thus extracted the Chrome CLD into a standalone library.
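To make that tuple easier to read, here is a minimal sketch that gives each position a name. It assumes the five fields arrive in the order the quote lists them; the `Detection` type, the helper names, and the sample values are made up for illustration and are not part of Mike's library. The sample is hand-written so the snippet runs without `cld` installed.

```python
from typing import NamedTuple, Sequence


class Detection(NamedTuple):
    """Named view of the tuple described in the quote (field order is an assumption)."""
    language_name: str        # e.g. "RUSSIAN"
    language_code: str        # e.g. "ru"
    is_reliable: bool         # True if the detector is quite sure of itself
    text_bytes: int           # number of actual text bytes processed
    details: Sequence[tuple]  # per-language details, up to 3 entries


def unpack_detection(result: tuple) -> Detection:
    """Unpack a five-element detection tuple into named fields."""
    name, code, is_reliable, text_bytes, details = result
    return Detection(name, code, bool(is_reliable), int(text_bytes), list(details))


def describe(detection: Detection) -> str:
    """Build a one-line, human-readable summary of a detection."""
    certainty = "reliable" if detection.is_reliable else "uncertain"
    return (
        f"{detection.language_name} ({detection.language_code}), "
        f"{certainty}, {detection.text_bytes} text bytes"
    )


# Hand-written sample for illustration only, NOT real detector output.
sample = ("RUSSIAN", "ru", True, 42, [("RUSSIAN", "ru")])
print(describe(unpack_detection(sample)))
```

With the real library, the value passed to `unpack_detection` would presumably be whatever `cld.detect(...)` returns; checking `is_reliable` before trusting the language name seems a sensible habit for short texts like tweets.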

Enter Eric Fischer, a brilliant but discreet “geek of maps, failed transportation plans of the past, history of technology, computers, pedestrianism, and misspelled street signs.” (Note to self: MEET THIS GUY!) Eric built upon Mike’s work to produce the Map of Twitter’s Languages now buzzing around the web.

See also Eric’s photostream of “where people post geotagged photos to Flickr from and geotagged tweets to Twitter from”. Thanks to @Ludovic_P_ for first tweeting about this!