Traditional text classification is a well-studied topic, especially in the domain of sentiment detection, and many techniques have been built on classic text features. More recently, more sophisticated methods based on Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) networks, have emerged. In a way, these networks mimic how the human brain works: LSTMs can learn long-term dependencies while still “forgetting” less relevant information.
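To make the “forgetting” idea concrete, here is a minimal NumPy sketch of a single step of a standard LSTM cell. It is illustrative only: the stacked weight layout and the gate order (input, forget, candidate, output) are conventions chosen for this sketch, not a description of the model we used.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (illustrative sketch).

    x: input vector, shape (d_in,)
    h_prev, c_prev: previous hidden and cell state, shape (d_h,)
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,)
    Gate order in the stacked weights is an assumption of this sketch:
    input, forget, candidate, output.
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:d_h])             # input gate: how much new information to write
    f = sigmoid(z[d_h:2 * d_h])       # forget gate: how much old memory to keep
    g = np.tanh(z[2 * d_h:3 * d_h])   # candidate values to write
    o = sigmoid(z[3 * d_h:4 * d_h])   # output gate: how much memory to expose
    c = f * c_prev + i * g            # cell state carries long-term information
    h = o * np.tanh(c)                # hidden state passed to the next step/layer
    return h, c
```

When the forget gate saturates near 1, the cell state is carried forward almost unchanged (a long-term dependency); when it saturates near 0, the old memory is effectively discarded. A text classifier would run this step over a sequence of word embeddings and feed the final hidden state to an output layer.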
Over the past decade, political campaigns have increasingly made their way onto social media networks. Twitter and Facebook are the most prominent of these platforms, giving voters a strong outlet for voicing their opinions. With so many conversations taking place on these networks, the ability to predict whether a given text is politically biased, and to identify its particular political leaning, is interesting and, not least, extremely useful.
Recently, we addressed a classification problem: identifying the political leaning of texts about the 2016 presidential election in the United States. Specifically, we classified messages as Democratic or Republican based on the views they expressed. We built the training and test datasets by selecting Twitter users whose political leanings are known to be either Democratic or Republican. We found these users by mining Twitter Lists (manually curated topical lists of users created by other Twitter users) labeled by political affiliation.
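The labeling scheme described above can be sketched as follows. This is a hypothetical illustration, not our actual pipeline: it assumes list memberships have already been fetched into memory, the list names in `LIST_LABELS` are invented, and two design choices are our own for the sketch (users on lists of both parties are dropped as ambiguous, and the train/test split is done by author so that no user's tweets appear in both sets).

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from curated Twitter List names to a party label.
LIST_LABELS: Dict[str, str] = {
    "dem-members": "Democratic",
    "gop-members": "Republican",
}

def label_users(list_members: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each user the party of the labeled lists they appear on.

    Users found on lists of more than one party are dropped as ambiguous.
    """
    parties_by_user: Dict[str, Set[str]] = {}
    for list_name, members in list_members.items():
        party = LIST_LABELS.get(list_name)
        if party is None:
            continue  # skip lists without a known political label
        for user in members:
            parties_by_user.setdefault(user, set()).add(party)
    return {user: next(iter(parties))
            for user, parties in parties_by_user.items() if len(parties) == 1}

def build_datasets(tweets: List[Tuple[str, str]], user_labels: Dict[str, str],
                   test_fraction: float = 0.2, seed: int = 0):
    """Label each (author, text) tweet with its author's party; split by author."""
    users = sorted(user_labels)
    random.Random(seed).shuffle(users)
    test_users = set(users[:int(len(users) * test_fraction)])
    train, test = [], []
    for author, text in tweets:
        if author not in user_labels:
            continue  # tweets from unlabeled authors are ignored
        example = (text, user_labels[author])
        (test if author in test_users else train).append(example)
    return train, test
```

Splitting by author rather than by tweet is a deliberate choice in this sketch: it keeps a classifier from scoring well simply by memorizing the style of individual users it has already seen.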
We found that classification accuracy depends heavily on the input data, especially on the presence of Twitter mentions. A training dataset containing only tweets without mentions underperforms a dataset that includes mentions by almost 20% in accuracy.
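One way to set up this comparison is to partition labeled tweets by whether they contain a mention. The sketch below is an assumption-laden illustration: the regular expression only approximates Twitter handle syntax (an `@` followed by up to 15 ASCII letters, digits, or underscores, not preceded by such a character so that e-mail addresses are skipped), and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Approximate Twitter handle pattern; the lookbehind skips e-mail addresses
# such as "user@example.com".
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")

def mentions(text: str) -> List[str]:
    """Return the (lowercased) handles mentioned in a tweet."""
    return [handle.lower() for handle in MENTION_RE.findall(text)]

def split_by_mentions(examples: List[Tuple[str, str]]):
    """Partition (text, label) pairs into tweets with and without mentions."""
    with_mentions, without_mentions = [], []
    for text, label in examples:
        (with_mentions if mentions(text) else without_mentions).append((text, label))
    return with_mentions, without_mentions
```

Training the same classifier on each partition, and evaluating both on a common held-out set, isolates the contribution of mentions. Extracted handles can also be kept as explicit features, since whom a user addresses may itself be a strong signal of leaning.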
We also found that detecting political leaning is challenging because it is highly context-dependent and time-dependent. The long-term context of a tweet's author may affect the leaning the tweet expresses. For example, a Bernie Sanders supporter may speak negatively of Hillary Clinton while still leaning towards the Democratic Party. The temporal dimension comes from the fact that opinions are voiced differently at different times. Before the primary elections, most of the political “battle” was within a single party (Hillary Clinton vs. Bernie Sanders, i.e., Democrat vs. Democrat), whereas after the primaries it was between Hillary Clinton and Donald J. Trump (i.e., Democrat vs. Republican). As a result, training data from one period of time may not be applicable to a prediction task at a later time.
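Both effects can be probed with simple tooling. The sketch below is hypothetical and not part of our published setup: `temporal_split` trains on tweets before a cutoff date and evaluates on later ones (the cutoff is an assumption you would set, for example, to the end of the primaries), and `author_leaning` averages per-tweet scores for each author as one crude way to bring in the author's long-term context. The score is assumed to be a classifier's probability that a tweet is Democratic.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

def temporal_split(examples: List[Tuple[date, str, str]], cutoff: date):
    """Split (posted_on, text, label) examples: train before cutoff, test on or after."""
    train = [ex for ex in examples if ex[0] < cutoff]
    test = [ex for ex in examples if ex[0] >= cutoff]
    return train, test

def author_leaning(tweet_scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-tweet P(Democratic) scores for each author.

    A single tweet attacking Clinton may score low on its own, but averaging
    over an author's history can pull a Sanders supporter back toward
    Democratic.
    """
    scores_by_author: Dict[str, List[float]] = defaultdict(list)
    for author, p_dem in tweet_scores:
        scores_by_author[author].append(p_dem)
    return {author: mean(scores) for author, scores in scores_by_author.items()}
```

Comparing accuracy under a temporal split with accuracy under a random split of the same data gives a rough measure of how much a model degrades when the shape of the political conversation shifts.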