Suresh Naidu: analyzing the language of political partisanship

October 9, 2012 Cathy O'Neil, mathbabe

I was lucky enough to attend Suresh Naidu’s lecture last night on his recent work analyzing congressional speeches with co-authors Jacob Jensen, Ethan Kaplan, and Laurence Wilse-Samson.

Along with his co-authors, he found popular three-word phrases, measured and ranked their partisanship (by how often a Democrat uttered the phrase versus a Republican), and measured whether those phrases were in use in the public discussion before or after Congress started using them.

Note this means that phrases that were uttered often by both parties were ignored. Only phrases that were uttered more by one party than the other, like “free market system”, were counted. Also, the words were reduced to their stems and small common words were ignored, so the phrase “united states of america” was reduced to “unite.state.america”. So if the parties were talking about the same issue but each insisted on its own phrasing (“death tax”, for example), then the difference would show up. This certainly jibes with my sense of how partisanship is established by politicians, and for the sake of the paper it can be taken to be the definition.
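To make the phrase pipeline concrete, here is a minimal sketch of it in Python. Everything in it is an assumption made for illustration: the stop-word list, the crude `toy_stem` function (a stand-in for a real stemmer), and the scoring formula, which is just one simple way to express "uttered more by one party than the other". It is not the measure used in the paper.

```python
from collections import Counter

# Hypothetical stop-word list; the list actually used in the paper is not given here.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is"}

def toy_stem(word):
    """Crude stand-in for a real stemmer: drop the 'd' of '-ed' or a final 's'."""
    if word.endswith("ed") and len(word) > 3:
        return word[:-1]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def trigrams(text):
    """Stemmed three-word phrases, joined with dots, after dropping stop words.

    Punctuation is not handled; the input is assumed to be plain words.
    """
    words = [toy_stem(w) for w in text.lower().split() if w not in STOP_WORDS]
    return [".".join(words[i:i + 3]) for i in range(len(words) - 2)]

def partisanship(dem_speeches, rep_speeches):
    """Assumed score in [-1, 1]: +1 if only Democrats used a phrase, -1 if only Republicans."""
    dem = Counter(p for s in dem_speeches for p in trigrams(s))
    rep = Counter(p for s in rep_speeches for p in trigrams(s))
    return {p: (dem[p] - rep[p]) / (dem[p] + rep[p]) for p in set(dem) | set(rep)}
```

Under this toy scoring, a phrase used equally often by both parties scores 0, which is the sense in which such phrases get ignored, and ranking phrases by the absolute value of the score gives a "most partisan" list.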

The first data set he used was a digitized version of all of the speeches from the House since the end of the Civil War, which was also the beginning of the “two-party” system as we know it. Third-party politicians were ignored. The proxy for “the public discussion” was taken from Google Books N-grams. It consists of books that were published in English in a given year.

Some of the conclusions that I can remember are as follows:

The three-word phrases themselves are a super interesting data set: their prevalence, how they move from one side of the aisle to the other over time, and what they discuss (so for example, they don’t discuss international issues that much, which doesn’t mean the politicians don’t discuss international issues, but that it’s not a particularly partisan issue or at least their language around this issue is similar).
When the issue is economic and highly partisan, it tends to show up “in the public” via Google Books before it shows up in Congress. Which is to say, there’s been a new book written by some economist, presumably, who introduces language into the public discussion that later gets picked up by Congress.
When the issue is non-economic or only somewhat partisan, it tends to show up in Congress before or at the same time as in the public domain. Members of Congress seem to feel comfortable making up their own phrases and repeating them in such circumstances.

So the cult of the economic expert has been around for a while now.
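The "before or after" comparison in the conclusions above can be pictured with a small sketch. It assumes we have yearly counts of one phrase in each source and simply compares the first year the phrase appears in each; the function names, the threshold, and the numbers in the usage note are all made up for illustration, and the paper's actual timing measure may well be different.

```python
def first_year_used(counts_by_year, threshold=1):
    """Earliest year whose count reaches the threshold, or None if it never does."""
    years = [year for year, count in sorted(counts_by_year.items()) if count >= threshold]
    return years[0] if years else None

def lead_of_books(book_counts, congress_counts, threshold=1):
    """Years by which books preceded Congress (positive) or trailed it (negative)."""
    first_in_books = first_year_used(book_counts, threshold)
    first_in_congress = first_year_used(congress_counts, threshold)
    if first_in_books is None or first_in_congress is None:
        return None  # the phrase never shows up in one of the two sources
    return first_in_congress - first_in_books
```

For example, with invented counts `{1975: 0, 1976: 4, 1977: 9}` for books and `{1976: 0, 1978: 3}` for Congress, `lead_of_books` returns 2: the phrase is in print two years before it is spoken on the floor.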

Suresh and his crew also made an overall measurement of the partisanship of a given 2-year session of Congress. It was interesting to discuss how this changed over time, and how high partisanship, in terms of language, did not necessarily correlate with stalemated Congresses. Indeed if I remember correctly, a moment of particularly high partisanship, as defined above via language, was during the time the New Deal was passed.

As we also discussed (it was a lively audience), language may be a marker of partisan identity without necessarily pointing to underlying ideological differences. For example, the phrase “Martin Luther King” has been ranked high as a partisan Democratic phrase since the civil rights movement, but then again it’s customary (I’ve been told) for Democrats to commemorate MLK’s birthday, but not for Republicans to do so.

Given a politician’s speeches, this analysis did a good job of identifying which party that politician belonged to, but the analysis was not causal in the sense of time: we needed to know the top partisan phrases of that session of Congress to be able to predict the party of a given politician. Indeed the “top phrases” changed so quickly that the predictive power may be mostly lost between sessions.

Not that this is a big deal, since of course we know what party a politician is from, but it would be interesting to use this as a measure of how radical or centered a given politician is or will be.
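Here is a rough sketch of what such a prediction, and such a "how radical or centered" measure, might look like. It assumes each phrase already has a score for the session in question (positive for Democratic, negative for Republican); the scores, phrase names, and function names are hypothetical, and this is not the classifier from the paper.

```python
def partisan_lean(phrases, phrase_scores):
    """Average score of the speaker's scored phrases; 0.0 if none of them are scored."""
    hits = [phrase_scores[p] for p in phrases if p in phrase_scores]
    return sum(hits) / len(hits) if hits else 0.0

def guess_party(phrases, phrase_scores):
    """Guess 'D' or 'R' from the sign of the lean; None when there is no signal."""
    lean = partisan_lean(phrases, phrase_scores)
    if lean > 0:
        return "D"
    if lean < 0:
        return "R"
    return None
```

The sign of the lean gives the party guess and its size is a candidate for the centrist-versus-radical measure. It also shows why the prediction doesn't carry across sessions: if the scores come from a session whose top phrases the speaker never uses, there are no hits and `guess_party` returns `None`.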

Even if you aren’t interested in the above results and discussion, the methodology is very cool. Suresh and his co-authors view text as its own data set and analyze it as such.

And after all, the words historical politicians spoke are what we have on record; we can’t look into their brains and see what they were thinking. It’s of course interesting and important to have historians (domain experts) inform the process as well, e.g. for the “Martin Luther King” phrase above, but barring expert knowledge this is lots better than nothing. One thing it tells us, just in case we didn’t study political history, is that we’ve seen way worse partisanship in the past than we see now, although things have consistently been getting worse since the 1980s.

Here’s a wordcloud from the 2007 session; blue and red are what you think, and bigger means more partisan:

Categories: data science, musing, statistics

Comments (3)

Diddly

October 9, 2012 at 10:00 am

Not the most scientific comment ever, but, by the way, I find a lot of authority appeals and mentions of the economy in red, and many empty humanistic words in blue.
Pretty much what I would’ve expected.

Aaron

October 10, 2012 at 12:19 pm

I don’t understand your MLK example. Are you saying that the fact that democrats commemorate MLK’s birthday and republicans don’t is *not* indicative of any ideological difference? Because that’s definitely not obvious to me.

GD

October 11, 2012 at 12:19 am

Love the word cloud. Any opinion on word clouds and the software that generates them? Word cloud theory? Thanks.
