Culturomics: a new digital method?
Previously I pondered how the Wikileaks embassy cables might next be subjected to the numerous tools and software now available for interpreting content, such as word pattern analysis. A recent example provides some clues. In December 2010 an article published in Science (‘Quantitative Analysis of Culture Using Millions of Digitized Books’) described how a team of some 12 natural scientists carried out a quantitative analysis of more than five million books culled from Google Books and published between 1800 and 2000. This database covers about 4 percent of books ever published; two-thirds are in English and the rest are in Chinese, French, German, Hebrew, Russian, and Spanish. The authors claim that their method constitutes a new field called ‘culturomics’, defined as ‘the application of high-throughput data collection and analysis to the study of human culture’, which ‘extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.’
The term draws from what some describe as the ‘omic’ revolution in the biosciences. The ESRC Centre for Economic and Social Aspects of Genomics (CESAGen, based at Cardiff and Lancaster), for example, is interrogating this through studies of how the capacity to compile and analyse complete data sets in the biosciences is transforming knowledge production. CESAGen is also extending ‘omics’ to include what it calls ‘sociomics’: the study of the entanglements between the biosciences and the social sciences and humanities that study them.
But back to culturomics. If genomics involves gene-sequencing, then culturomics involves culture-sequencing. You can do your own culturomics using the ‘Books Ngram Viewer’ at the Google Labs site. I entered the term ‘social science methods’ to see its pattern and sequencing since 1800. Unsurprisingly, I could see the term take off in the early 1930s and rise steeply to the end of the century. So far, not too interesting. But then I could also do more. When I clicked on a date range, my query was submitted to Google Books, where the books in the database that use the term are listed, allowing me to drill down for the usual kind of data you get from Google Books: in some cases the full text, but more often selected page images with the references to the term highlighted; a list of related books (though not confined to the time period); a word cloud of common terms and phrases in the book; a map of the world showing places mentioned in the book; a list of references to the book with hyperlinks to the sources; and bibliographic information.
In other words, beyond the nifty culturomics sequencing, the tool opens up numerous possible angles for analysis: terms in context, books in relation, word metrics, and so on.
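To make the ‘sequencing’ idea concrete, here is a minimal sketch in Python of the kind of counting that sits behind such a graph: for each year, how often a phrase occurs relative to all phrases of the same length. It is an illustration only, not the Ngram Viewer's own code. The toy corpus, the function names, the whitespace tokenisation and the choice to normalise by the number of same-length n-grams are all my assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Relative frequency of `phrase` per year.

    Assumption: frequency = occurrences of the phrase divided by the
    total number of n-grams of the same length published that year.
    Tokenisation is a naive lowercase split on whitespace.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years with no n-grams at all to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy corpus of (year, text) pairs, standing in for
# millions of digitised books.
TOY_CORPUS = [
    (1900, "the study of society relies on careful observation"),
    (1950, "social science methods are taught alongside social science theory"),
    (2000, "new social science methods extend older social science methods"),
]

frequencies = yearly_frequency(TOY_CORPUS, "social science methods")
for year, freq in frequencies.items():
    print(year, round(freq, 3))
```

In this toy data the phrase is absent in 1900 and then becomes more frequent in each later year, which is the shape of curve I saw for the real term. Everything of interest (which books, which contexts, which meanings) lies outside what the counting itself can tell you.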
For some commentators this kind of method marks what Chris Anderson of Wired magazine in 2008 called ‘The End of Theory.’ Apparently, with enough data, numbers speak for themselves: ‘Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity.’ On this view, mathematical algorithms drive analyses, find patterns, and reveal truths, and thus context, meaning, and causality are obsolete notions. Google can translate languages without ‘knowing’ them, and so too can all text be analysed. Marketers and advertisers certainly know this and have used it to good advantage.
Linguist Geoffrey Nunberg notes some of the likely objections that will be raised in the humanities against what he calls trivialising quantitative analyses. Yes, the corpus will be used to produce uninformative graphs and to draw insignificant conclusions (like some of the analyses of Wikileaks). No need to worry, though, he says. These methods are a small addition to the repertoire of the field; there will be some good and some not so good analyses, but in the end they will supplement and not replace other methods of literary analysis. But methods are performative and, as John Law commented in relation to Wikileaks, we need to inquire about what methods do to what we know and how we act in the world. Rather than an end to theory, digital methods call for more theory.