What makes a blog post popular? Part II: subjectivity and polarity
November 24, 2008 – 4:06 pm
This post continues our series on investigating properties of popular feed items. In our last exploration, we failed to discover any correlation between reading difficulty and the NewsGator attention score. Now, we want to see how popularity is affected by subjectivity and polarity. Subjectivity measures the degree to which the statements in the text are subjective (as opposed to objectively written text). Polarity considers whether the subjective portions of the text express a positive or negative sentiment.
In the analysis below, I used the same feed items as in the previous post. To compute subjectivity and polarity, I used a slightly modified version of the hierarchical polarity classifier in the LingPipe Sentiment Analysis Tutorial. This tutorial demonstrates how to extract subjective sentences from text and estimate the polarity of the subjective portions of text.
Since my feed items are not in the same format as the training data, I had to make some modifications. I used LingPipe’s IndoEuropeanSentenceModel to segment the documents into sentences. To make the text of the items more comparable to the training data, I converted all of the text to lowercase and added spaces around leading and trailing punctuation. While these may seem like small details, it is important that the feed items be as similar in form to the training examples as possible.
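To make the normalization step concrete, here is a minimal Python sketch of one way to lowercase text and separate leading and trailing punctuation from each whitespace-delimited token. It is not the code I ran (that used LingPipe in Java); the function names and the exact punctuation rule are assumptions chosen for illustration.

```python
import re

# Assumption: "punctuation" means any run of characters that are neither
# word characters nor whitespace, attached to the start or end of a token.
_LEADING = re.compile(r"^([^\w\s]+)(?=\w)")
_TRAILING = re.compile(r"(?<=\w)([^\w\s]+)$")


def normalize_token(token):
    """Put a space between a token and its leading/trailing punctuation."""
    token = _LEADING.sub(r"\1 ", token)
    token = _TRAILING.sub(r" \1", token)
    return token


def normalize_text(text):
    """Lowercase the text and space out punctuation on every token."""
    return " ".join(normalize_token(t) for t in text.lower().split())


print(normalize_text('Great movie, but "too long".'))
```

Punctuation inside a token, such as the apostrophe in "don't", is left alone, since only the edges of each token are touched.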
To visualize the relationship between predicted polarity and NewsGator attention score, I created a kernel density plot for the NewsGator scores of items predicted positive or negative:

The two curves are practically on top of each other, so we can conclude that the polarity predictions are not predictive of the NewsGator attention score. I also performed a similar analysis on the strength of the polarity estimate and came to the same conclusion. Those data were harder to visualize because most of the predicted values were near the extremes, so I haven’t included a graph for that analysis.
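For readers unfamiliar with kernel density plots, the following is a rough Python sketch of a Gaussian kernel density estimate evaluated for two groups of scores. The scores are made up for illustration, the fixed bandwidth is an arbitrary assumption, and this is not the tool used to produce the plot in this post.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical attention scores, invented for this example only.
positive_scores = [0.5, 1.0, 1.5, 2.0, 6.0]
negative_scores = [0.4, 1.1, 1.4, 2.2, 5.5]

kde_pos = gaussian_kde(positive_scores, bandwidth=0.5)
kde_neg = gaussian_kde(negative_scores, bandwidth=0.5)

# Evaluate both curves on a shared grid so they can be compared or plotted.
for x in [i * 0.5 for i in range(17)]:
    print(f"{x:4.1f}  {kde_pos(x):.3f}  {kde_neg(x):.3f}")
```

If the two columns of densities track each other closely across the grid, the group label carries little information about the score, which is the pattern described above.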
A more basic question is whether or not the presence of subjectivity, regardless of polarity, is correlated with NewsGator attention scores. The following plot shows the fraction of sentences identified as subjective by the LingPipe classifier vs the NewsGator attention score:

The items with a high NewsGator attention score tend to have a greater percentage of sentences identified as subjective. There’s still pretty wide variance, so subjectivity is a weak predictor at best. This trend also only applies to items with a NewsGator attention score of at least 5, corresponding to the top 3.7% of items in this particular dataset. Nevertheless, we can conclude that there is a tendency for posts that receive a lot of attention to have more subjective sentences than those receiving less attention.
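As an illustration of the quantity being plotted, here is a small Python sketch that computes the fraction of subjective sentences per item and compares the average for items below and at or above an attention score of 5. The data layout (a score paired with a list of per-sentence booleans) and the function names are assumptions for this example, and the sample values are invented.

```python
def subjective_fraction(labels):
    """Fraction of sentences labeled subjective (True) in one item."""
    if not labels:
        return 0.0
    return sum(1 for is_subjective in labels if is_subjective) / len(labels)


def mean_fraction_by_threshold(items, threshold=5):
    """Mean subjective fraction for items below vs. at/above `threshold`.

    `items` is a list of (attention_score, sentence_labels) pairs.
    A group with no items yields None.
    """
    low, high = [], []
    for score, labels in items:
        group = high if score >= threshold else low
        group.append(subjective_fraction(labels))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(low), mean(high)


# Hypothetical items: (attention score, per-sentence subjectivity labels).
items = [
    (1, [True, False]),
    (2, [False, False]),
    (6, [True, True, False, True]),
]
print(mean_fraction_by_threshold(items))
```

A comparison of group means like this hides the wide variance noted above, so it should be read alongside the full scatter of per-item fractions rather than in place of it.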
