Saturday, December 18, 2010

A fan of Ngrams

A new toy! I found it not wrapped under the Christmas tree but nestled within an article in the New York Times: "In 500 Billion Words, a New Window on Culture" (read it here). "With little fanfare," writes Patricia Cohen, "Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities."

Scroll down to the third paragraph and click on the link to the Google Labs Book Ngram Viewer (here), and you can type in a pair of words or phrases and see a graph showing the frequency of your terms in the Google Books corpus since 1800. Want to know when the word "relatable" began its climb into ubiquity? It rarely occurs in the corpus before 1940 but then climbs to peak usage around 1980 before declining again (which makes me wonder why I don't remember hearing it before around 2005). Want to know when "center" overtook "centre" as the common spelling? Looks like around 1910. Want to compare the frequency of the names "Eleanor" and "Michelle"? "Eleanor" shows a sharp spike around 1950 and then declines, while "Michelle" is nearly invisible until the 1970s and then makes a gigantic jump upward around the mid-90s. Want to compare the relative frequency of "sunshine" and "clouds" in books written in English since 1800? Clouds consistently come out on top, although we see a sharp upturn in both terms starting around 2000.

What am I going to do with this fun new tool? Play with it, of course. One of these days I may find a way to employ Ngrams in literary analysis, but for now I'm just having fun. And you can too!

2 comments:

michele said...

What a neat tool - thanks for the heads up on it! So far, I'm just playing with it as well, but I've run across a couple of situations where the changing meaning of a word maps nicely on the chart because its frequency changes with its use.

I compared 'somewhat' to your example of 'relatable' since both are words that often make me cringe in undergraduate writing and was surprised to see 'somewhat' is on the decline in published work. Which perhaps explains why it feels so wrong when students pepper their writing with it...

I can't imagine how I'd use it in literary analysis either right now, but if I come up with something, I've got the tool now!

Dr. Rural said...

There's a hell of a big gap between uses of "man" and "woman," although it has been narrowing over time. I can't say this surprised me.