Quantifying the Impact of Personalization: First Case Study - Doubling Number of Active Email Subscribers

October 11, 2009 – 7:45 pm

One of our goals for the second half of this year was to put together a series of case studies that quantitatively demonstrate the value of our personalization solutions. This week we formally released the first public step supporting this goal with our case study demonstrating the impact we’ve had on MedPage Today’s email newsletters.

After personalizing the email & subject line, MedPage Today has seen:

  • Double the number of active email subscribers
  • A lift of 18% to 60% (varying over time) in CME‐test‐taking activity by users who received the personalized email versus users who received nonpersonalized email (from an A/B test)
  • A three‐times increase in the number of previously inactive users who clicked through for the first time on an email, based solely on personalizing the email subject line.
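
As a rough illustration of how a lift figure like the one above is usually computed from an A/B test, here is a minimal sketch. The function name and the sample rates are hypothetical and are not taken from the case study.

```python
def relative_lift(treatment_rate: float, control_rate: float) -> float:
    """Relative lift of treatment over control, as a fraction (0.18 means 18%)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate


if __name__ == "__main__":
    # Hypothetical rates for illustration only (not MedPage Today's data).
    control = 50 / 1000        # share of non-personalized recipients taking a CME test
    personalized = 59 / 1000   # share of personalized recipients taking a CME test
    print(f"lift: {relative_lift(personalized, control):.0%}")
```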

For context on the business situation and further explanation of these benefits, you can read the case study (embedded at the bottom of this post) to see the full details on how the mPower solution was deployed and the impact it had.  You can also read B2B Media Business’ coverage in their article “Med Page Today improves results by automatically personalizing content for readers.”

The thing I’m most proud of is that we optimized for the metrics that matter most to MedPage Today’s business. As we unveil more of these case studies, it should become obvious how our flexible relevancy platform allows each individual customer to enhance their specific metrics. If you’d like to discuss how we can move the needle on your most critical business metrics, please contact us.

mSpoke-MedPage Today Case Study

The Value of Online Display Advertising

August 25, 2009 – 12:06 pm

Yesterday, Jim Spanfeller, the outgoing president & CEO of Forbes.com, published an interesting post on PaidContent titled “Publishers Are Killing Web Advertising’s Potential With Misguided Pricing.”  Jim makes some excellent points and does a great job challenging the online publishing industry not to undervalue its inventory.

However, I disagree with one statement in his conclusion: “We should price online inventory similarly to how we price offline units.”  I certainly agree with the main thesis of his article: reducing dependencies on horizontal ad networks and remnant inventory is crucial for the online ad industry.  But I don’t want to aspire to simply pricing online inventory similarly to offline units!

In the short term, it would be great if we could get online display inventory to be valued similarly to other media channels.  It is embarrassing that average CPMs for display ads are less than half that of print newspapers, and certainly remnant inventory sold through ad networks has a lot to do with this.

However, it’s also worth pointing out that the effective CPM of Search Advertising is almost twice the average for Network TV ads.

cpmbymediatype.gif

John Battelle, in his book “The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture”, puts forth a theory on why search advertising is so valuable.  He says that search engines leverage their ‘Database of Intentions’, which he describes as follows:

“Link by link, click by click, search is building possibly the most lasting, ponderous, and significant cultural artifact in the history of humankind: the Database of Intentions.  The Database of Intentions is simply this: the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result.”

Regardless of whether you like the term “database of intentions” or not, search engines do algorithmically understand the content they are displaying & their audiences’ intention so deeply that they can serve very effective advertisements.  The online display industry needs to be focused on replicating this effectiveness and not simply getting a similar value to offline advertising.

At mSpoke we are building solutions to help online publishers learn more about their content (mSense) & audience (mPower) every day and are seeing results.  In fact, we’re actually in the process of publishing our first case study (of many in the queue) with a client demonstrating exactly how valuable these types of solutions are.

Solving Hard But Valuable Problems

August 7, 2009 – 11:53 am

eMarketer has released a great report, “Online Brand Measurement: Connecting the Dots.”  You can download the report for free and read the source material on the microsite they’ve created.

I actually downloaded the PDF to my Kindle and have been reading it over the last few days.  It has a lot of relevance for the Audience Preference Modules we’re designing right now.  (By the way, if you haven’t yet, please consider taking our 5 minute online survey to make sure we’re designing something that meets your needs.)

A piece of eMarketer’s qualitative research jumped out at me.  They conducted a poll asking “What single word or phrase would you use to describe the current state of online advertising measurement?”  The tag-cloud below is a summary of the answers eMarketer received:
eMarketer Tag Cloud: What single word or phrase would you use to describe the current state of online advertising

eMarketer then drilled into these answers with a number of marketing thought-leaders.  One of them, Michael Mendenhall, Senior VP & Chief Marketing Officer at Hewlett-Packard, commented:

“Marketers monitor the front-end and the back-end so they see clickstreams and commerce. The difficult part is the qualitative part in between, which is the level of engagement. The challenge is to begin to build technological capabilities that allow them to see the complete digital footprint that a consumer leaves when they engage with the brand and then be able to address that consumer in a relevant way—behaviorally, contextually or both.”

This is an interesting way to frame some of the “technology capabilities” we have developed and are now trying to expose via our audience preference modules.  As we’ve built this out we have come to believe, as Michael seems to, that these are key to the future of advertising on the internet.  It’s a difficult problem, but there is nothing we’d rather be focusing on than solving hard but valuable problems at mSpoke.

Help me shape mSpoke’s Audience Preference Modules

July 28, 2009 – 2:17 pm

We’ve come a long way in a few years.  Our solutions are helping some of the best known web properties (call me and I’ll give you names) drive up conversion rates - click throughs, registrations, you name it - for content-rich media sites. But I know we can do more at mSpoke. We can leverage the knowledge that we use to predict which content readers want next to create more value for more parts of our customers’ companies. I know we can do that - I’m just not sure how to go about it.

That’s where you come in.

I’m going to lay out for you what I think we can do, and offer up some examples of what the user interface screens might look like. I’m even going to link you to a very short survey about it all. But I’m not trying to sell you anything - yet. :) Instead, I’m hoping you’ll tell me what I’ve missed, how YOU imagine leveraging the knowledge I’ll describe to create value across your organization, and how your staff really wants to interact with all this on those user interface screens.

To understand the kind of information we’re working with you have to know the basics about how mSpoke works. It’s best understood as a three-step process: Read the rest of this entry »

What makes a blog post popular? Part II: subjectivity and polarity

November 24, 2008 – 4:06 pm

This post continues our series on investigating properties of popular feed items.  In our last exploration, we failed to discover any correlation between reading difficulty and the NewsGator attention score.  Now, we want to see how popularity is affected by subjectivity and polarity.  Subjectivity measures the degree to which the statements in the text are subjective (as opposed to objectively written text).  Polarity considers whether the subjective portions of the text express a positive or negative sentiment.

In the analysis below, I used the same feed items as in the previous post.  To compute subjectivity and polarity, I used a slightly modified version of the hierarchical polarity classifier in the LingPipe Sentiment Analysis Tutorial.  This tutorial demonstrates how to extract subjective sentences from text and estimate the polarity of the subjective portions of text.

Since my feed items are not in the same format as that of the training data, I had to make some modifications.  I used LingPipe’s IndoEuropeanSentenceModel to segment the documents into sentences.  To make the text of the items more comparable to that of the training data, I converted all of the text to lowercase and added a space around leading and trailing punctuation.  While these may seem like small details, it is important that the feed items be as similar in form to the training examples as possible.
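
The text normalization described above can be sketched roughly as follows. This is only an illustration in Python, not the code I actually ran (that used LingPipe, which is a Java library); the regular expressions and function names are assumptions about what “a space around leading and trailing punctuation” means for each whitespace-separated token.

```python
import re

# Punctuation attached to the start or end of a token (assumed interpretation).
_LEADING = re.compile(r"^([^\w\s]+)(?=\w)")
_TRAILING = re.compile(r"(?<=\w)([^\w\s]+)$")


def normalize_token(token: str) -> str:
    """Split leading/trailing punctuation off one token; keep internal marks."""
    token = _LEADING.sub(r"\1 ", token)
    token = _TRAILING.sub(r" \1", token)
    return token


def normalize(text: str) -> str:
    """Lowercase the text and space out punctuation at token boundaries."""
    return " ".join(normalize_token(t) for t in text.lower().split())


if __name__ == "__main__":
    print(normalize("Hello, world!"))
```

Internal punctuation such as the apostrophe in “don’t” is left alone; only marks at the edges of a token are separated.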

To visualize the relationship between predicted polarity and NewsGator attention score, I created a kernel density plot for the NewsGator scores of items predicted positive or negative:

Popularity vs polarity

These lines are practically on top of each other, so we can conclude that the polarity predictions are not predictive of the NewsGator attention score.  I also performed a similar analysis on the strength of the polarity estimate and came to the same conclusion.  Those data were harder to visualize because most of the predicted values were near the extremes, so I haven’t included a graph for that analysis.
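
For readers unfamiliar with kernel density plots, here is a minimal sketch of the estimate behind one curve. It assumes a Gaussian kernel and a hand-picked bandwidth; the scores below are made up for illustration, and plotting tools typically choose the bandwidth automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


if __name__ == "__main__":
    # Hypothetical attention scores for items predicted positive vs. negative.
    positive = gaussian_kde([0.5, 1.0, 1.2, 2.0, 3.5], bandwidth=0.5)
    negative = gaussian_kde([0.4, 1.1, 1.3, 2.2, 3.4], bandwidth=0.5)
    for x in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(f"x={x:.1f}  positive={positive(x):.3f}  negative={negative(x):.3f}")
```

If the two curves track each other closely across the whole range, as they do in the plot, the grouping variable tells us little about the score.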

A more basic question is whether or not the presence of subjectivity, regardless of polarity, is correlated with NewsGator attention scores.  The following plot shows the fraction of sentences identified as subjective by the LingPipe classifier vs the NewsGator attention score:

 Popularity vs subjectivity

The items with a high NewsGator attention score tend to have a greater percentage of sentences identified as subjective.  There’s still pretty wide variance, so subjectivity is a weak predictor at best.  This trend also only applies to items with a NewsGator attention score of at least 5, corresponding to the top 3.7% of items in this particular dataset.  Nevertheless, we can conclude that there is a tendency for posts that receive a lot of attention to have more subjective sentences than those receiving less attention.

What makes a blog post popular? Part I: Comparing popularity and reading difficulty

November 5, 2008 – 4:16 pm

One of the beautiful things about the Internet is the ease with which anyone can become an author and publisher.  Unfortunately, the sheer volume of information out there makes it challenging to get your voice heard.  This post is the first of a series trying to tease apart what aspects of feed items correlate with their popularity.  I’m a fairly sporadic author, so don’t be surprised if there’s a substantial gap between posts.  The nature of this series will be a fairly exploratory data analysis. In this first post, I want to examine whether well-written feed items are more likely to receive attention than poorly-written ones.

To start, we’ll need a bunch of feed items and measures of popularity and writing quality.  For the popularity measure, I will use the feed item’s NewsGator attention score.  While NewsGator has done some additional work to renormalize the scores to range from 0 to 10 since this post, it should give you a good idea about what factors into an item’s score.  Put simply, the larger the NewsGator attention score, the more popular the feed item.

I sampled 1,000 feeds collected by FeedHub between Thursday, October 16, 2008 and Thursday, October 23, 2008.  I discarded feeds that did not have an item with a non-zero NewsGator attention score.  I also filtered out feeds where fewer than 75% of the items in the date range were written in English and had at least 1,000 bytes of unformatted text.  I did this to focus the feed selection on full-text feeds written in English.
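
The feed selection can be sketched as a simple predicate. This reads the filter as keeping a feed only when at least 75% of its items are in English and have at least 1,000 bytes of text, which is an interpretation of the description above. The `Item` type and its fields are hypothetical, and the language flag is assumed to come from some separate detector.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str          # unformatted text of the feed item
    is_english: bool   # assumed to come from a language detector
    attention: float   # NewsGator attention score


def keep_feed(items, min_fraction=0.75, min_bytes=1000):
    """Return True if the feed passes both selection rules."""
    # Rule 1: at least one item with a non-zero attention score.
    if not any(item.attention > 0 for item in items):
        return False
    # Rule 2: enough items are English and long enough to look like full text.
    qualifying = sum(
        1
        for item in items
        if item.is_english and len(item.text.encode("utf-8")) >= min_bytes
    )
    return qualifying / len(items) >= min_fraction
```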

Measuring whether a feed item is well-written is difficult.  As a crude proxy, I use the php-text-statistics package to compute the Flesch-Kincaid Grade Level.  This measure has been around for years and looks at the number of words per sentence and the number of syllables per word to estimate the number of years of education expected for a reader to understand the text.  I also look at the length in bytes of the posts because that is easy to compute.
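
For reference, the Flesch-Kincaid Grade Level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula in Python; the syllable counter is a crude vowel-group heuristic I am assuming for illustration, and it is not how php-text-statistics counts syllables, so the two will not agree exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per run of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))
```

Very simple text can score below zero, which is a known quirk of the formula rather than a bug in the sketch.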

I use box-and-whisker plots to compare the Flesch-Kincaid Grade Level and feed item length to the NewsGator attention score.  Before we look at those plots, we should understand the distribution of NewsGator attention scores in my dataset.  A simple histogram shows the distribution:

Histogram of NewsGator Attention Scores

The small counts of items with NewsGator attention scores above 7 suggest that we might not want to trust the box-and-whisker plots in that data range.  Comparing the NewsGator attention score to the Flesch-Kincaid Grade Level reveals no correlation between these two measures:

Grade Level Distribution Plotted by NewsGator Attention Score

In the box-and-whisker plot, the box covers the middle 50% of the feed items in the bucket, the bold horizontal line shows the median value, and the circles show outliers.  While there is no correlation between the NewsGator attention score and Flesch-Kincaid Grade Level, it is interesting that the middle 50% of feed items have a grade level ranging from 6.7 to 10.8 with a median grade level of 8.7.
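
The numbers behind one box can be sketched with Python’s standard library. The 1.5 × IQR rule for flagging outliers is a common plotting convention that I am assuming here; the exact quartile method used by a given plotting tool may differ slightly from `statistics.quantiles`.

```python
import statistics


def box_summary(values):
    """Quartiles and outliers for one bucket of a box-and-whisker plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond the fences are the circles drawn as outliers.
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical grade levels for the items in one score bucket.
    print(box_summary([6.1, 6.7, 7.9, 8.7, 9.4, 10.8, 11.2, 21.0]))
```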

We find a similar lack of correlation when comparing length to the NewsGator attention score:

Item Length Plotted by NewsGator Attention Score

I used a log-scale on the vertical axis of this plot due to the skewed distribution of feed item length.  I won’t dwell on the length patterns shown here; I almost certainly introduced some bias in these numbers during feed selection.

It is also interesting to ask whether we see different patterns of popularity relative to a particular feed.  For example, are the more difficult posts more or less likely to be popular than other posts from that same feed?  To examine this question, I normalized the three measures into percentiles.  If an item has a NewsGator attention score percentile of 0.8, then we expect that the item has a score at least as large as 80% of the items in the feed.  This normalization process is a little noisy; many of our feeds only had a small number of items in the one-week period I used to collect the data.  A histogram of the normalized NewsGator attention scores confirms this:

Histogram of Normalized NewsGator Attention Scores

If we had larger samples from each of the feeds, we’d expect this histogram to be a little more uniformly distributed.  When we compare the NewsGator attention score percentiles to the percentiles for Flesch-Kincaid Grade Level or feed item length, these data look remarkably uncorrelated.  There is little to suggest that feed item length or reading difficulty is predictive of item popularity within a feed.

Normalized Grade Level Distribution Plotted by Normalized NewsGator Attention Score

Normalized Item Length Plotted by Normalized NewsGator Attention Score
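
The per-feed percentile normalization used for these plots can be sketched as follows, assuming the percentile of an item is the fraction of items in the same feed whose value is less than or equal to its own. The function name is hypothetical.

```python
def percentile_ranks(values):
    """Map each value to the fraction of values in the list that are <= it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


if __name__ == "__main__":
    # A feed with five distinct scores spreads evenly over (0, 1].
    print(percentile_ranks([1, 2, 3, 4, 5]))
    # A small feed with tied scores piles up on a few values.
    print(percentile_ranks([0, 0, 0, 5]))
```

The second call shows where the noise comes from: with only a handful of items per feed, and many tied scores, the percentiles can take just a few distinct values.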

When I set out to write this post, I hoped to find some interesting correlations between feed item popularity and other features of the feed items.  I wasn’t naive enough to believe I’d find strong correlations, but I was hoping to confirm some common sense wisdom.  This post looked into some crude surface features of reading difficulty and post length in an attempt to understand whether a “well-written” post is more likely to be popular than poorly written posts.  I failed to find any correlations between these features and popularity.  Does that mean that I personally believe that the writing of the post doesn’t matter?  Absolutely not.

-Paul

Celebrating Our Best Quarter!

October 9, 2008 – 9:39 am

Last week marked the end of another quarter at mSpoke.  Big deal – it was the end of the quarter for most companies.  But, since this is the  mSpoke corporate blog, we’re gonna tell you about our exciting end of the quarter!

Anyone who’s ever worked at a software company knows that the end of the quarter usually means a sprint to the finish.  Last week was no exception. The good news is that, when the dust settled at 12:01 AM on October 1, it settled on our best quarter ever by many metrics!

It took a herculean effort by the team, and I was really proud of the effort they made.  So we decided it was time for a little celebrating!
Celebration Toast
Step 1: Toast our success, with champagne compliments of one of our board members, Ed Engler.

Step 2: Tech talk by Paul Ogilvie, our principal scientist, on information retrieval techniques.  (Okay, we’re geeky – we thought it was fun.)

Step 3: Pizza and beer, before heading out to Arsenal Bowling Lanes for a little 10 pin action.  We all thought we were bringing similar skill levels to this outing (namely, zero).  But we discovered a ringer in our midst!  Turns out that Sean Colombo is a pretty amazing bowler, in addition to a solid programmer.  Here’s an action shot of Sean racking up another strike on his way to burying the rest of us.
Action Shot

At the end of the day, I’m thinking the rest of the team is with me on keeping the scores confidential.  Even Sean C. is no “Deadeye” on his way to the PBA.  But we had a great time and got to blow off some steam after a crazy quarter.

And now we’re back at it, on our way to an even better quarter!

FeedHub Down for Maintenance

August 27, 2008 – 4:52 pm

5:25 pm EDT - The site is back up.  Please let us know if you have any  problems accessing FeedHub.

4:50 pm EDT - We had to take FeedHub down quickly due to some database issues.  We will update this post as we make progress and know more.

Sorry for any inconvenience…

- the mSpoke Team

A New Product Launched with NewsGator: Related Content Widget

August 15, 2008 – 11:11 am

At mSpoke, we’ve been collaborating lately on a number of initiatives with NewsGator.  It has been fun to work with the talented team at NewsGator, but now it gets really exciting as we start rolling these offerings out.

The first was announced on the NewsGator Widget Blog yesterday - a Related Content Widget.  As the name implies, this recommends related content from a publisher-defined set of sources.  While not a new use-case, the post highlights why we think our approach will recommend better content.

Obviously, if you’re interested in learning more, please don’t hesitate to contact us.

Wikipedia Categories

June 30, 2008 – 6:50 pm

As RSS content flows through mSpoke’s data center, we tag most items with the Wikipedia categories to which they most closely relate. This was originally developed to help the personalization in our FeedHub app, but — not surprisingly — API access to these category assignments has proven itself to be valuable to other applications (for example, Jeff Nolan did a great blog post on how NewsGator uses this data).

One of the perks of doing the job we do is that we get to work with this data every day. Since it’s not private user-data and it’s all neatly arranged in a database, we can take a look at it whenever we want! It’s pretty interesting to see the pulse of the blogosphere fly by every day. If you’re into that kind of thing, here’s a quick sample:

This table shows the number of items that were put into each of the top-level Wikipedia categories during a several-hour snapshot:

top-level category Items
(non-English) 55847
(unclassified) 22580
Agriculture 1201
Applied_sciences 4022
Archaeology 39
Architecture 2484
Arts 521
Biology 2589
Chemistry 529
Computing 25246
Crafts 524
Culture 2821
Earth 1041
Economics 6569
Education 2102
Entertainment 32778
Events 2231
Film 1067
Geography 951
Health 519
History 2264
Humans 5118
Language 773
Law 1508
Literature 5404
Mathematics 363
Medicine 4913
Military 398
Music 3511
Nature 383
People 10774
Philosophy 561
Physics 825
Psychology 681
Radio 1040
Religion 1313
Science 1201
Society 42207
Technology 8439
Visual_arts 1779

The items used here all come from feeds that were uploaded by FeedHub users, so this is a pretty good sample of the material flowing through popular feeds. One caveat is that this sample only includes English items right now, but we’re working on being able to classify other languages as well. If you feel strongly about support for a specific language, please let us know.

The top-level assignments are interesting, but the actual item-by-item assignments are even better (and they change more frequently). This table shows the 20 most common category assignments across FeedHub personalized feeds for the same time period.

Web 2.0 4820
Online social networking 2026
Photo sharing 1691
Politics 1462
Video games 1425
American culture 1257
Murder 1136
System administration 1071
Stock market 935
Occupations 900
Marketing 891
Management 851
Economics 751
Music 624
Laptops 585
Mobile phones 553
Personal development 492
Software 485
Urban issues 458
Futurology 435

Note: Flickr feed content goes into the “Photo sharing” category.
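
A tally like the ones in these tables is easy to reproduce once each item carries its category assignments. The sketch below is only an illustration in Python; the input shape (one list of category names per item) and the function name are assumptions, not a description of our database or API.

```python
from collections import Counter


def top_categories(item_categories, k=20):
    """Count how many items fall into each category and return the top k."""
    counts = Counter()
    for categories in item_categories:
        # Use a set so an item counts at most once per category.
        counts.update(set(categories))
    return counts.most_common(k)


if __name__ == "__main__":
    # Hypothetical items, each tagged with one or more categories.
    sample = [
        ["Web 2.0", "Online social networking"],
        ["Web 2.0"],
        ["Politics"],
        ["Photo sharing", "Web 2.0"],
    ]
    for name, count in top_categories(sample, k=3):
        print(name, count)
```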

This is far from scientific, since our FeedHub users are probably atypical - they are primarily people who read several hundred RSS feeds and tend to be bleeding-edge, high-tech, totally cool, good-looking people with lots of friends. ;)

Even if our sample set isn’t exactly McKinsey-level dependable data, it’s still really interesting to look at. Hopefully we’ll spill more of this type of anonymous data out in the future if other people find it as interesting as I do.
- Sean C