What makes a blog post popular? Part I: Comparing popularity and reading difficulty

November 5, 2008 – 4:16 pm

One of the beautiful things about the Internet is the ease with which anyone can become an author and publisher.  Unfortunately, the sheer volume of information out there makes it challenging to get your voice heard.  This post is the first in a series that tries to tease apart which aspects of feed items correlate with their popularity.  I’m a fairly sporadic author, so don’t be surprised if there’s a substantial gap between posts.  The series will be a fairly exploratory data analysis.  In this first post, I want to examine whether well-written feed items are more likely to receive attention than poorly written ones.

To start, we’ll need a bunch of feed items and measures of popularity and writing quality.  For the popularity measure, I will use the feed item’s NewsGator attention score.  Although NewsGator has done some additional work since this post to renormalize the scores to range from 0 to 10, the post should still give you a good idea of what factors into an item’s score.  Put simply, the larger the NewsGator attention score, the more popular the feed item.

I sampled 1,000 feeds collected by FeedHub between Thursday, October 16, 2008 and Thursday, October 23, 2008.  I discarded feeds that did not have an item with a non-zero NewsGator attention score.  I also filtered out feeds where fewer than 75% of the items in the date range were written in English and had at least 1,000 bytes of unformatted text.  I did this to focus the feed selection on full-text feeds written in English.

Measuring whether a feed item is well-written is difficult.  As a crude proxy, I use the php-text-statistics package to compute the Flesch-Kincaid Grade Level.  This measure has been around for years and looks at the number of words per sentence and the number of syllables per word to estimate the number of years of education expected for a reader to understand the text.  I also look at the length in bytes of the posts because that is easy to compute.
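For readers who want to see the arithmetic behind the grade level, here is a minimal Python sketch of the standard Flesch-Kincaid Grade Level formula.  It is not the php-text-statistics implementation: the sentence splitter and the vowel-group syllable counter (`count_syllables`) are rough assumptions made for illustration, so its scores may differ somewhat from the ones used in this post.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count groups of consecutive vowels, discounting a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("made"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    print(flesch_kincaid_grade("The cat sat on the mat."))
```

Short sentences built from one-syllable words can score below zero, which is a reminder that the measure is only a crude proxy for writing quality.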

I use box-and-whisker plots to compare the Flesch-Kincaid Grade Level and feed item length to the NewsGator attention score.  Before we look at those plots, we should understand the distribution of NewsGator attention scores in my dataset.  A simple histogram shows the distribution:

Histogram of NewsGator Attention Scores

The small counts of items with NewsGator attention scores above 7 suggest that we might not want to trust the box-and-whisker plots in that data range.  Comparing the NewsGator attention score to the Flesch-Kincaid Grade Level reveals no correlation between these two measures:

Grade Level Distribution Plotted by NewsGator Attention Score

In the box-and-whisker plot, the box covers the middle 50% of the feed items in the bucket, the bold horizontal line shows the median value, and the circles show outliers.  While there is no correlation between the NewsGator attention score and Flesch-Kincaid Grade Level, it is interesting that the middle 50% of feed items have a grade level ranging from 6.7 to 10.8 with a median grade level of 8.7.
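The quantities behind each box can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name `box_plot_summary` is hypothetical, and the 1.5 × IQR rule for flagging outliers is a common plotting convention that this post does not confirm was used for the figures.

```python
import statistics


def box_plot_summary(values):
    """Summarize one bucket the way a box-and-whisker plot does."""
    # Quartiles: the box spans q1..q3, i.e. the middle 50% of the data.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


if __name__ == "__main__":
    grade_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(box_plot_summary(grade_levels))
```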

We find a similar lack of correlation when comparing length to the NewsGator attention score:

Item Length Plotted by NewsGator Attention Score

I used a log scale on the vertical axis of this plot because the distribution of feed item length is skewed.  I won’t dwell on the length patterns shown here; I almost certainly introduced some bias in these numbers during feed selection.

It is also interesting to ask whether we see different patterns of popularity relative to a particular feed.  For example, are the more difficult posts more or less likely to be popular than other posts from that same feed?  To examine this question, I normalized the three measures into percentiles within each feed.  If an item has a NewsGator attention score percentile of 0.8, then we expect that the item has a score at least as large as 80% of the items in the feed.  This normalization process is a little noisy; many of our feeds only had a small number of items in the one-week period I used to collect the data.  A histogram of the normalized NewsGator attention scores confirms this:

Histogram of Normalized NewsGator Attention Scores
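The per-feed percentile normalization described above can be sketched as follows.  This is a hedged illustration, not the code used for this post: the `feed` and `score` field names are hypothetical, and counting an item itself (and ties) as "at or below" its own score is an assumption about how the percentile is defined.

```python
from collections import defaultdict


def per_feed_percentiles(items, key):
    """For each item, return the fraction of items in the same feed
    whose value for `key` is less than or equal to this item's value."""
    by_feed = defaultdict(list)
    for item in items:
        by_feed[item["feed"]].append(item[key])

    percentiles = []
    for item in items:
        values = by_feed[item["feed"]]
        at_or_below = sum(1 for v in values if v <= item[key])
        percentiles.append(at_or_below / len(values))
    return percentiles


if __name__ == "__main__":
    items = [
        {"feed": "a", "score": 1},
        {"feed": "a", "score": 2},
        {"feed": "a", "score": 3},
        {"feed": "a", "score": 4},
        {"feed": "a", "score": 5},
        {"feed": "b", "score": 7},  # a feed with a single item
    ]
    print(per_feed_percentiles(items, "score"))
```

Under this definition a feed with only one item always lands at 1.0, which is one way that small per-feed samples make the normalized values noisy.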

If we had larger samples from each of the feeds, we’d expect this histogram to be a little more uniformly distributed.  When we compare the NewsGator attention score percentiles to the percentiles for Flesch-Kincaid Grade Level or feed item length, these data look remarkably uncorrelated.  There is little to suggest that feed item length or reading difficulty is predictive of item popularity within a feed.

Normalized Grade Level Distribution Plotted by Normalized NewsGator Attention Score

Normalized Item Length Plotted by Normalized NewsGator Attention Score

When I set out to write this post, I hoped to find some interesting correlations between feed item popularity and other features of the feed items.  I wasn’t naive enough to believe I’d find strong correlations, but I was hoping to confirm some common-sense wisdom.  This post looked into some crude surface features, reading difficulty and post length, in an attempt to understand whether a “well-written” post is more likely to be popular than a poorly written one.  I failed to find any correlations between these features and popularity.  Does that mean that I personally believe that the writing of the post doesn’t matter?  Absolutely not.

-Paul

  1. 24 Responses to “What makes a blog post popular? Part I: Comparing popularity and reading difficulty”

  2. Interesting post.

    The circumstances under which the author posts may provide greater clues to what drives ‘popularity’ if that is indeed where you are heading with this.

    How prolific the author/source has been, the time between posts, changes in emotional valence from the historical norm, the credibility of the author on the subject, whether the author is known for posting time-sensitive info (e.g. ‘breaking’ news), etc.

    The ‘history and circumstance’ variables under which an author composes and disseminates just feel heavier to me than a reading difficulty snapshot… (real scientific I know ;)

    Authors have an audience in mind when composing. An author’s change in ‘reading difficulty’ and its correlation to popularity would be more interesting.

    There are a lot of moving parts here; for example, NewsGator’s Attention Meter appears to be a superset of Popularity (total readers). Are you using Attention and Popularity interchangeably?

    It’s hard to tell where you are heading with the data analysis but you are onto something interesting…

    By Kevin on Nov 6, 2008

  3. very interesting n creative…..liked it!!

    By rajshree on Nov 2, 2009

  4. Eagerly waiting for your Part II of this blog to be posted…Keep up the good work dude !!!!
    The tips for measuring a blog popular are really good…
    Good post

    By Christing presents on Nov 16, 2009

  5. Good study and statistic, The tips for measuring a blog popular are really good…Good post waiting for part II.
    Canvas Printing

    By Canvas Printing on Nov 17, 2009

  6. You can notify me by email when your Part II is posted.
    Thanks

    By Canvas Printing on Nov 17, 2009

  7. I’m looking forward to more of your articles. I’m really impressed with your work. Please keep posting.

    By exterminating bed bugs on Nov 20, 2009

  8. Great!

    By эротика on Nov 24, 2009

  9. I wasn’t defending my blog, I was trying to make the point that both objectives can be reached with one blog . About 15 million active blogs are read by 57 million people, a number that gives bloggers great credibility, power and influence as sources of information for everything from news to corporate reputations to product purchasing.

    By make money online ideas on Nov 25, 2009

  10. ahaha))

    By порно on Dec 9, 2009

  11. Hi The social media chart which is represented , there is an interesting investing on India . I think India turns the social media into new strategies.

    By best metal detector on Dec 16, 2009

  12. Business blogs should share good ideas with other business people; political blogs, for example, are about preaching to the choir and dissing the opposition. Business blogs and bloggers should not be driven by ideology or opinion but by whatever is best for the business or the business principle being discussed.

    By iphone repair on Dec 18, 2009

  13. Interesting analysis. You did a lot of research on this, I think. I think you did a great job. I really appreciated the effort. Thank you for posting, and keep posting.

    By sleep masks on Dec 22, 2009

  14. Hi You are absolutely correct . Every ad server reports impressions and clicks, but our social ad platform captures so much more all of which can be broke down into relevant social insights.

    By Rehoboth restaurant on Dec 22, 2009

  15. This is really good analysis. Your first post, i.e. Part 1, is so good that I can’t stop wondering how your Part 2 will be. Please keep up the good work.

    By Costa Blanca Property on Dec 24, 2009

  16. An interesting article about SEO. Quality content that is original is the best way by far, and I agree with you!

    By GetBacklinks on Dec 29, 2009

  17. best blog!!!

    By dle бесплатно on Jan 6, 2010

  18. I really enjoyed reading it. It’s good to know that there is much more knowledge to gain on this topic. Just keep posting your good work and keep enlightening your readers. Thank you..

    By reg plates on Jan 12, 2010

  19. Nice post. Lot of research is done on this subject. I think the title of the blog post plays a very important role. If the title is catchy, it will do the trick.

    By koi tattoo designs on Jan 18, 2010
