Posted on April 4 by Neal Caren

Note: The tutorials assume no prior knowledge of Python or text analysis.

In September, Science magazine printed an article by Cornell sociologists Scott Golder and Michael Macy that examined how trends in positive and negative attitudes varied over the day and the week.

To do this, they collected millions of Tweets produced by more than two million people. They found fascinating daily and weekly trends in attitudes.


While some of this big data is only numbers, much of it also consists of text. Sociologists have long had tools to assist us in coding and analyzing dozens or even hundreds of text documents, but many of these tools are less useful when the number of documents is in the tens of thousands or millions.

Luckily, computer scientists have been working for quite a while on exactly this data problem: how do we collect, categorize, and understand massive text databases?

The major challenges are (1) collecting and managing the data, (2) turning the text into numbers of some sort, and (3) analyzing the numbers. The third step involves techniques familiar to many quantitative researchers. Based on their supplementary file, it appears Golder and Macy used Stata to analyze the data.
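To make the second challenge concrete, here is a minimal sketch of one way to turn text into numbers: counting how many words in each tweet appear in a list of positive or negative words. The word lists, the sample tweets, and the function name count_sentiment_words are all made up for illustration. They are not Golder and Macy's data or method, just a toy version of the general idea.

```python
import string

# Hypothetical, tiny word lists used only for illustration.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"sad", "awful", "hate"}


def count_sentiment_words(text):
    """Return (positive_count, negative_count) for one piece of text."""
    # Lowercase the text and drop punctuation so "awful." matches "awful".
    cleaned = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    words = cleaned.split()
    positive = sum(1 for word in words if word in POSITIVE_WORDS)
    negative = sum(1 for word in words if word in NEGATIVE_WORDS)
    return positive, negative


# Made-up example tweets.
tweets = ["I love this great weather!", "Mondays are awful. I hate them."]
counts = [count_sentiment_words(tweet) for tweet in tweets]
print(counts)
```

Each tweet is now a pair of numbers, which is the kind of data that the third step (analysis) can work with in any statistics package.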

While you can do this sort of analysis using one of several different programs or languages, one commonly used for this sort of quantitative text analysis is Python. It is free, used by millions (so there are lots of resources available), and relatively straightforward to learn.

Or, you can just Google it. But, if I include any mistakes, please leave a comment or email me. And if you just want the code for this sentiment analysis, feel free to download it.

The words before your dollar sign will be different than mine, depending on your current directory and other factors.

On a Windows machine, you are likely to see something like C: You might have Python 2. Before we go any further, you might want to know how to get out of Python.

Just type exit, followed by an open and close parenthesis: exit(). On a Windows machine, you type exit without the parentheses and the command line will go away.

In this case, a simple way to start is with one tweet.

