
Wikipedia Categories

June 30, 2008 – 6:50 pm

As RSS content flows through mSpoke’s data center, we tag most items with the Wikipedia categories to which they most closely relate. This was originally developed to help the personalization in our FeedHub app, but — not surprisingly — API access to these category assignments has proven itself to be valuable to other applications (for example, Jeff Nolan did a great blog post on how NewsGator uses this data).

One of the perks of doing the job we do is that we get to work with this data every day. Since it’s not private user-data and it’s all neatly arranged in a database, we can take a look at it whenever we want! It’s pretty interesting to see the pulse of the blogosphere fly by every day. If you’re into that kind of thing, here’s a quick sample:

This table shows the number of items that were put into each of the top-level Wikipedia categories during a several-hour snapshot:

top-level category Items
(non-English) 55847
(unclassified) 22580
Agriculture 1201
Applied_sciences 4022
Archaeology 39
Architecture 2484
Arts 521
Biology 2589
Chemistry 529
Computing 25246
Crafts 524
Culture 2821
Earth 1041
Economics 6569
Education 2102
Entertainment 32778
Events 2231
Film 1067
Geography 951
Health 519
History 2264
Humans 5118
Language 773
Law 1508
Literature 5404
Mathematics 363
Medicine 4913
Military 398
Music 3511
Nature 383
People 10774
Philosophy 561
Physics 825
Psychology 681
Radio 1040
Religion 1313
Science 1201
Society 42207
Technology 8439
Visual_arts 1779

The items used here all come from feeds that were uploaded by FeedHub users, so this is a pretty good sample of the material flowing through popular feeds. One caveat is that we only classify English items right now (non-English items are counted but not categorized), but we’re working on being able to classify other languages as well. If you feel strongly about support for a specific language, please let us know.

The top-level assignments are interesting, but the actual item-by-item assignments are even better (and they change more frequently). This table shows the 20 most common category assignments across FeedHub personalized feeds for the same time period.

category Items
Web 2.0 4820
Online social networking 2026
Photo sharing 1691
Politics 1462
Video games 1425
American culture 1257
Murder 1136
System administration 1071
Stock market 935
Occupations 900
Marketing 891
Management 851
Economics 751
Music 624
Laptops 585
Mobile phones 553
Personal development 492
Software 485
Urban issues 458
Futurology 435

Note: Flickr feed content goes into the “Photo sharing” category.

This is far from scientific, since our FeedHub users are probably atypical - they are primarily people who read several hundred RSS feeds and tend to be bleeding-edge, high-tech, totally cool, good-looking people with lots of friends. ;)

Even if our sample set isn’t exactly McKinsey-level dependable data, it’s still really interesting to look at. Hopefully we’ll spill more of this type of anonymous data out in the future if other people find it as interesting as I do.
- Sean C

Posted in mPower, FeedHub | No Comments »

FeedHub New Version Updates

June 3, 2008 – 6:48 am

Today, 6/3/08, we will be releasing a new version of FeedHub, which includes relevancy and performance improvements.  Check back here for further updates.

6/3/08 2:30pm Update

The latest release of FeedHub is up-and-running.

Posted in FeedHub | 1 Comment »

Extension updated & ready for FireFox 3

May 19, 2008 – 12:07 pm

Over the weekend, FireFox 3 RC1 was released.  That means that the final version of FireFox 3 is right around the corner.

We pushed out an auto-update of the FeedHub Feedback extension which has been tested with FF3 and appears to be working just fine.

When we sent out the new version for FF3RC1 compatibility, we also included an update which added the commonly-requested feature of keyboard shortcuts for Google Reader. Now, if you want to send feedback for the currently-selected item, just type “,” for a thumbs-up or “.” for a thumbs-down. These keys were chosen to work well with j/k navigation.  If you forget these shortcuts and don’t want to dig up this blog entry, you can check out all of Google Reader’s keyboard shortcuts by typing “?” at any time.

In addition, if you want to be more specific about why you liked or disliked an item, there is now a “tell us more” link that shows up after you click the thumbs which allows you to tell us how you felt about the memes that helped select that item for you.

If you’re excited (or at least intrigued) by these changes and don’t want to wait for FireFox to do its auto-update checks, go to the “Tools” menu, select “Add Ons” and then click the “Find Updates” button to get the newest version of the extension.

As always, if something is on your mind - we love to hear your feedback!
- Sean C

Posted in FeedHub | No Comments »

TextMate Freemarker Bundle

March 18, 2008 – 11:01 am

Being both a Mac user and a web developer, I’ve become a big fan of TextMate for just about everything except straight Java (it’s pretty tough to beat IntelliJ IDEA for Java!). A while back we made the decision to move from JSP to Freemarker. Unfortunately, there doesn’t seem to be as much support for Freemarker as there is for JSP in TextMate.

So, I started a TextMate Freemarker bundle. It’s fairly basic at the moment, but does have a decent Language syntax definition that plays well with HTML, and a few snippets for Freemarker tags. It even has a few snippets for Spring macros.

If you use TextMate and Freemarker, head over to Google Code and check it out. It carries an Apache License, so no worries. Comments and suggestions are most certainly welcome!

- Brian

Posted in Uncategorized | No Comments »

FeedHub Feedback extension updated

March 11, 2008 – 1:06 pm

We’ve made some updates to the “FeedHub Feedback” FireFox extension.

I first alluded to its release a couple of months ago and shortly thereafter we announced the extension as part of a significant release.  Since that time, FireFox development has plunged forward and we’ve been releasing updates alongside the new versions of the FireFox betas.

With the release of FireFox 3 beta 4, we’ve released yet another update.  We missed the boat by a little less than a day with beta 4 (purely my fault), and InformationWeek called us on it.

You can rest assured that by the time FireFox 3 is officially released your transition should be completely seamless.

If you ever experience any problems or have any suggestions, please contact us at support@feedhub.com and we’ll make sure your comments find their way to the right people.  We love hearing from you!

Thanks for your time,
- Sean C

Posted in FeedHub | No Comments »

FeedHub New Release Updates

March 4, 2008 – 10:21 pm

Tonight, 3/4/08 at 10:00pm EST, we will be releasing a new version of FeedHub, which includes some relevancy improvements.  Check here for further updates. 

3/4/08 11:14pm Update

System is functional again.  Expect a little lag in the processing of Personalized Feeds as the system returns to normal.

3/5/08 2:24pm Update

Personalized Feed processing is making very good progress due to the upgrade and we expect it to be totally caught-up by sometime this evening.

Posted in FeedHub | No Comments »

FeedHub Update Complete - More on Comment Count Meme

February 9, 2008 – 1:53 pm

Today we released a new version of FeedHub. You can read an overview of the improvements in my earlier blog post and Sean Ammirati’s post on the release.

One of the additions to FeedHub in this release is the comments count meme. This meme sponsors items that have a high number of comments. However, the notion of a “high number” of comments depends on the source feed. We look at the past history of comments for a particular source feed to choose an appropriate threshold for the feed.

This feed-specific thresholding helps ensure that we won’t unfairly bias our meme against sponsoring items from feeds which tend to have lower comment counts on their items. For example, ten comments on an item from the Machine Learning (Theory) blog at hunch.net is a lot for that feed, so that item will be sponsored by the comment count meme. However, ten is a relatively low number of comments for an item in a feed like Engadget, so an item from Engadget with ten comments will not be sponsored by the comment count meme.
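To make the idea of a per-feed threshold concrete, here is a rough sketch. The post does not say how FeedHub actually computes its thresholds, so the rule used here (mean plus a multiple of the standard deviation of past comment counts) is an assumption, as are the function names, the parameters min_items and k, and the sample histories.

```python
from statistics import mean, pstdev


def comment_threshold(history, min_items=5, k=1.5):
    """Return a feed-specific comment threshold, or None if history is too short.

    Assumed rule (not FeedHub's documented method): mean + k * std deviation
    of the comment counts seen on the feed's past items.
    """
    if len(history) < min_items:
        return None
    return mean(history) + k * pstdev(history)


def is_sponsored(comment_count, history):
    """True if the item's comment count is high relative to its own feed."""
    threshold = comment_threshold(history)
    return threshold is not None and comment_count > threshold


# Made-up comment histories for a quiet feed and a very busy feed
quiet_feed = [1, 0, 2, 3, 1, 2, 0, 4]
busy_feed = [120, 95, 210, 150, 80, 175, 300, 130]

print(is_sponsored(10, quiet_feed))  # ten comments is unusual here
print(is_sponsored(10, busy_feed))   # ten comments is low here
```

Under these assumptions, the same ten-comment item is sponsored on the quiet feed but not on the busy one, which mirrors the hunch.net versus Engadget example. Returning None for short histories reflects the point that comment counts are not yet available for every feed.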

We don’t yet gather the comment counts for all of our source feeds, but we will be continually improving our tool that does the comment count extraction in order to allow this meme to work for a greater percentage of our feeds.

–Paul Ogilvie

Posted in FeedHub | No Comments »

FeedHub New Release Updates

February 9, 2008 – 9:26 am

Today, we are releasing a new version of FeedHub.  This release will contain new memes, improved relevancy, and other really slick new features.  During this time, some or all of FeedHub will be unavailable.  We expect FeedHub to be back in a fully operational state at 2:00pm EST.  Check here for further updates as the upgrade progresses.

12:48pm Update

UI is fully operational.  Still performing data migrations.

2:42pm Update

Estimated time for completion of data migrations is 4:00pm

4:00pm Update

FeedHub is fully functional. Upgrade is done.

Posted in FeedHub | No Comments »

FeedHub Relevance Updates

February 4, 2008 – 10:57 am

In September I joined mSpoke as a Principal Scientist to work on improving the relevance of items delivered by FeedHub. Prior to that, in the Language Technologies Institute at Carnegie Mellon University with Professor Jamie Callan, I worked extensively on search engines. I’m very excited to see research influence reality as we continue to improve FeedHub! Except for my last post about a learning update, I’ve been pretty silent on the blog. However, I’d like to be a more visible member of the mSpoke team going forward. The next upgrade of FeedHub includes some great relevancy improvements. Read below for some specifics.

Wikipedia Categories

One of the bigger changes is a revamping of our category infrastructure to use a subset of the Wikipedia categories. We feel that the Wikipedia categories are more current and have better coverage than our previous categories, which were based on the Open Directory Project (DMOZ). Improving our categories should also have a positive effect on our topic memes, because we use that information to assist us in the creation of topics. Don’t worry - we’ll be migrating your beloved topic and category memes into our new topic and category framework. Keep an eye out for an interaction post in your feeds that will explain the conversion.

Speaking Socially

While we firmly believe that understanding the topical content of feed items is crucial for effective personalization, we also believe that it can be important to make use of other predictors of relevance, such as those using “the wisdom of the crowds.” I am pleased that we can now announce a new “wisdom of the crowds” meme. The comment count meme sponsors items that have a noticeably larger number of comments than other items from the same feed. We believe this meme will be very helpful in delivering good items to our users, even when we have not yet had enough observations to learn a perfect set of topic and category memes for a personalized feed.

Give Us More Data

Getting enough observations from a user to accurately represent their interests is an important challenge for content personalization. We have added a much-requested thumbs-up link in our feed chrome to encourage you to provide more information about your interests. Perhaps even more exciting is our new Firefox extension, which allows you to use the thumbs-up and thumbs-down links in your RSS reader without being redirected to another web page. Keep an eye out for some great additions to this extension in the future.

We’ve also improved how we use information from your digital identity (such as your link blog or your del.icio.us account). As a result, you may notice activity on your other digital identities having a more pronounced effect on your memes and your personalized feed than it did before.

Help Us Get Better

I’m delighted to deploy these improvements to help FeedHub deliver relevant content to you. The new Wikipedia categories, improved use of digital identity information, the comment count meme, and the Firefox extension should all help to create a more relevant experience. We are continually finding ways to improve relevance, and your ideas are welcome! Contact us anytime. 

— Paul Ogilvie

 

Posted in FeedHub | 3 Comments »

New Release of FeedHub

February 4, 2008 – 9:59 am

Introduction

At the end of September, we launched the first public version of FeedHub. The reaction was tremendous. Today, we’re announcing the first major update to FeedHub. I’m excited – I think this update is a big step toward our goal of saving you time and keeping you informed. The system will be upgraded over the next week, and I’ll post a message on the blog when it’s complete. I do want to take a moment now to give you a sneak preview of the upcoming goodness.

Note: We don’t anticipate any significant interruptions to the FeedHub system, but there will be a few brief and scheduled outages to complete this upgrade over the next few days.

With that background, there are three categories of improvements incorporated into this release:

  • Streamlining Feedback Interactions
  • Relevancy Improvements
  • Infrastructure / Scalability Enhancements

Before going into more detail, I wanted to thank all of you for the feedback. A lot of these changes are based directly on your input, both from email exchanges and from a set of formal user studies we completed with the help of Lisa Spitz Design.

Streamlining Feedback Interactions

A very consistent request was the ability to provide direct feedback on a post recommended by FeedHub. We have added thumbs up / down buttons on every post to let you rate its relevance. The more feedback you give us, the faster we’ll learn, but don’t worry – we’ll still learn from implicit behavior (e.g., what you click on and what you ignore). For more information on the new rating feature, check out our documentation.

It would be great to automatically process the feedback as an AJAX call without interrupting your reading experience. Unfortunately, most RSS readers (including Google Reader, Bloglines and NewsGator) don’t allow JavaScript, so the system has to load a new page each time you rate a post. Good news for Firefox users – we’ve created an extension that you can install to avoid the new page problem. If you use one of our other supported browsers, we’re actively investigating how to create the same seamless experience for you.

Relevancy Improvements

My colleague, Paul Ogilvie, wrote a post going into great detail on the relevancy improvements in this release. Personally, I’m most excited about leveraging Wikipedia for the taxonomy of our category memes, and a new meme that recommends posts with significantly more comments than are typical for other posts from that source. Obviously, you’re the judge – but, in my internal testing, I have found both of these changes to dramatically improve the quality of the items being recommended to me.

For more information go read Paul’s post.

Infrastructure / Scalability Enhancements

We made several improvements to ensure that we will scale effectively and support growth. These are pretty standard enhancements necessary as we move toward processing the majority of feeds on the web. One noteworthy enhancement is that we upgraded from Dojo 0.4.3 to 1.0. We continue to be really pleased with the Dojo Toolkit and believe this will both improve performance of web pages in the short-term and position us well for front-end enhancements moving forward.

Conclusion

I started this post saying I believe this is a big step forward, and I do. However, we also realize that FeedHub is still a beta product and we’re continuing to work hard to improve it. As you’ve no doubt realized from this post, many of these improvements will be based on your feedback. Therefore, please keep it coming!

Posted in FeedHub | No Comments »
