How the Ngram Viewer Works
An Ngram, also called an N-gram, is a contiguous sequence of n items (n being a number) taken from a sample of text or speech, and Ngram analysis is the statistical study of how often those sequences occur. The items can be all sorts of things, including phonemes, prefixes, words, and letters. In the Ngram Viewer the items are words, so an Ngram is a single word or a phrase. Although Ngrams are little known outside the research community, they're used in a variety of fields and have important implications for developers writing programs that understand and respond to natural spoken language.
In the case of the Google Books Ngram Viewer, the text to be analyzed comes from the vast number of books that Google scanned to populate its Google Books search engine. Google refers to this body of text as the corpus. The Ngram Viewer aggregates the corpus by language, although you can analyze British and American English separately or lump them together. You can also drill down into the data by using tags: if you'd like to search for the verb fish instead of the noun fish, you'd search for fish_VERB. Google provides a complete list of commands and other advanced documentation for the Ngram Viewer on its website.
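To make the definition concrete, here is a minimal Python sketch of n-gram extraction and counting. It assumes the simplest possible tokenization (lowercase the text and split on whitespace); the tokenization Google uses to build its corpus is more involved and isn't reproduced here. The function names `ngrams` and `count_ngrams` are hypothetical, chosen for illustration, and are not part of any Google API.

```python
from collections import Counter


def ngrams(items, n):
    """Return every contiguous run of n items as a tuple (empty if n > len(items))."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def count_ngrams(text, n):
    """Count word n-grams after a naive lowercase-and-split tokenization."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


sample = "The vinegar pie was a pie made with vinegar"
bigrams = count_ngrams(sample, 2)
print(bigrams[("vinegar", "pie")])  # 1
print(len(bigrams))                 # 8 distinct bigrams

# The same function works on letters: treat the word as a list of characters.
print(ngrams(list("pie"), 2))       # [('p', 'i'), ('i', 'e')]
```

Because `ngrams` accepts any sequence, switching the items from words to letters is just a matter of what list you pass in, which mirrors the point that an Ngram's items can be many different kinds of units.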
What Is the Ngram Viewer Showing?
Google Books Ngram Viewer outputs a graph that shows how often a particular word or phrase appears in books over time. If you enter more than one word or phrase, each one gets its own color-coded line so that you can compare the search terms. This is similar to Google Trends, except that the Ngram Viewer measures text in books rather than web searches, and its data covers a much longer period.
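The graph's y-axis isn't a raw count. Google's documentation for the viewer describes each year's value as the percentage of all n-grams of the same length in the corpus that match your search term. The following Python sketch computes that share from per-year counts; the function name `relative_frequency` is hypothetical, and the numbers are invented purely for illustration.

```python
def relative_frequency(phrase_counts, total_counts):
    """Percentage of all same-length n-grams in each year that match the phrase.

    phrase_counts: {year: occurrences of the phrase}
    total_counts:  {year: total n-grams of that length in the corpus}
    Years with no recorded text are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * phrase_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


# Invented numbers: the phrase appears 10 times in both years,
# but the 1901 corpus is twice as large.
totals = {1900: 1_000_000, 1901: 2_000_000}
hits = {1900: 10, 1901: 10}
shares = relative_frequency(hits, totals)
print(shares)  # {1900: 0.001, 1901: 0.0005}
```

The same raw count produces half the share in the larger year, which is why a handful of mentions in a sparsely covered period can look more prominent on the graph than the same number of mentions in a period with many books.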
Case Study
Consider vinegar pies, which are mentioned in Laura Ingalls Wilder's Little House on the Prairie series. A Google web search reveals that they're considered part of American Southern cuisine and are indeed made with vinegar. They hearken back to times when not everyone had access to fresh produce year-round. But is that the whole story? Search Google Ngram Viewer for vinegar pie, and you'll encounter some mentions of the pie in both the early and late 1800s, a lot of mentions in the 1940s, and an increasing number of mentions in recent times. However, with a smoothing level of 3, the mentions in the 1800s appear as a plateau. Because relatively few books from that period are in the corpus, and because smoothing averages each year with its neighbors, the picture is distorted: probably only one book mentioned vinegar pie, and averaging flattened its spike into a plateau. Setting the smoothing to 0 shows that this is precisely the case. The main spike falls in 1869, with further spikes in 1897 and 1900. It's unlikely that nobody talked about vinegar pies the rest of the time. Recipes were probably circulating widely, but people didn't write about them in books, and that's an important limitation of Ngram searches.
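Smoothing is what turns a lone spike into a plateau. Google's Ngram Viewer documentation describes smoothing as a moving average: a smoothing of 1 averages each year with one year on either side, so a smoothing of 3 averages over a seven-year window. The Python sketch below reproduces that behavior on invented data in which every mention falls in 1869. The function name `smooth`, the requirement that every year in the range be present (with zeros where there are no mentions), and the edge handling are assumptions made for the sketch, not the viewer's code.

```python
def smooth(series, smoothing):
    """Centered moving average over `smoothing` years on each side of every year.

    Assumes `series` has an entry for every year in its range (zeros where a
    phrase never appears). Near the ends of the range, only the neighbors that
    exist are averaged.
    """
    years = sorted(series)
    result = {}
    for i, year in enumerate(years):
        lo = max(0, i - smoothing)
        hi = min(len(years), i + smoothing + 1)
        window = [series[y] for y in years[lo:hi]]
        result[year] = sum(window) / len(window)
    return result


# Invented data: every mention in 1860-1878 falls in a single year, 1869.
raw = {year: 0.0 for year in range(1860, 1879)}
raw[1869] = 7.0

plateau = smooth(raw, 3)  # seven-year window
spike = smooth(raw, 0)    # no smoothing: each year stands alone

print([y for y, v in plateau.items() if v > 0])  # [1866, 1867, 1868, 1869, 1870, 1871, 1872]
print([y for y, v in spike.items() if v > 0])    # [1869]
```

With a smoothing of 3, the single 1869 value of 7.0 is spread evenly as 1.0 across 1866 through 1872, which reads on a graph as a flat plateau; with a smoothing of 0, it stays a single spike. The total is unchanged either way, so smoothing hides when the mentions happened, not how many there were.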