All you have to do is help Netflix read people’s minds!
The Netflix Prize is a contest that has been going on since October 2006. I didn’t hear about it until today. When you rate movies on the Netflix DVD rentals site, their proprietary Cinematch algorithm will predict which other movies you might like based on ratings that have been made by all Netflix users. It is very similar to Amazon’s “people who bought X also bought Y” feature. The Netflix Prize challenge is to come up with a new prediction system that is 10% more accurate than Cinematch. Whoever does this will get the top prize of $1,000,000. Netflix is also rewarding a periodic “progress prize” of $50,000 to people who can beat the last progress prize winner by 1%. The current progress prize winner, an AT&T Labs team named BellKor, has a technique that yields a prediction improvement of 8.5% over Cinematch. Read about their technique here. Their technique makes my brain hurt — it blends together 107 different results from a large ensemble of data mining models, including neighborhood-based models (k-NN), factorization models (such as Ridge regression), regressions based on Gaussian priors, restricted Boltzmann Machines, asymmetric factor models, and regression models (using principal components analysis for feature selection, and SVD vectors as predictor variables). Basically, they packed a data mining shotgun with as much shot as they could find, and pulled the trigger. That’s a hell of a lot of work for $50,000!
Figure 1: Oh, no! Data mining engineers found out that I like terrible movies!
By comparison, Netflix’s own Cinematch algorithm uses the following techniques, as quoted in their Netflix Prize FAQ:
How does Cinematch do it?
Straightforward statistical linear models with a lot of data conditioning. But a real-world system is much more than an algorithm, and Cinematch does a lot more than just optimize for RMSE. After all, we have a website to support. In production we have to worry about system scaling and performance, and we have additional sources to data we can use to guide our recommendations.
Netflix has begun a very interesting bounty hunt. As of today, 23021 teams from 164 different countries are clamoring for the cash. I love the idea of putting up public bounties for innovation – the X-Prize comes to mind, particularly the Google-sponsored lunar X prize.