You enter a contest. A million dollars is at stake. Forty-one thousand teams from 186 different countries are clamoring for the prize and the glory. You edge into the top five, but there is only one prize, and one winner. Second place is the first loser. What do you do?
Team up with the winners, of course.
The Netflix Prize is a competition that is awarding $1,000,000 to whoever can come up with the best improvement to their movie recommendation engine. Their system looks at massive amounts of movie rental data to try to predict how well users will like other movies. For example, if you like Coraline, you may also like Sweeney Todd. But Netflix’s recommendation engine isn’t great at making predictions, so they decided to offer a bounty to anyone who could come up with a system that delivers a verifiable 10% improvement over Netflix’s prediction accuracy.
At first, it looked like BellKor’s Pragmatic Chaos won. But now it looks like The Ensemble won. Netflix says it will verify and announce the winner in a few weeks.
Who the hell cares? Why is this interesting in the slightest? Ten percent seems so insignificant.
Well, predicting human behavior seems impossible. But this contest has clearly shown that some amount of improvement in prediction of complicated human behavior is indeed possible. And what’s really interesting about the winning teams is that no single machine learning or statistical technique dominates by itself. Each of the winning teams “blends” a lot of different approaches into a single prediction engine.
Each technique has its strengths and weaknesses. Where one predictor fails, another can take up the slack with its own unique take on the problem.
BellKor, in their 2008 paper describing their approach, made the following conclusions about what was important in making predictions:
Movies are selected deliberately by users to be rated. The movies are not randomly selected.
Movies go in and out of popularity over time.
User biases change. For example, a user may rate average movies “4 stars”, but later on decide to rate them “3 stars”.
User preferences change. For example, a user may like thrillers one year, then a year later become a fan of science fiction.
Not all data features are useful. For example, movie descriptions were significant and explained some user behaviors, but did not improve prediction accuracy.
Matrix factorization models were very popular in the contest. Variations of these models were very accurate compared to other models.
Neighborhood models and their variants were also popular.
For this problem, increasing the number of parameters in the models resulted in more accuracy. This is interesting, because usually when you add more parameters, you risk over-fitting the data. For example, a naive algorithm that has “shoe color” as an input parameter might see a bank that was robbed by someone wearing red shoes, and conclude that anyone wearing red shoes is a potential bank robber. For another classic example of over-fitting, see the Hindenburg Omen.
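One workhorse from that list, matrix factorization, is easy to sketch. The toy code below is my own illustration (not BellKor's actual code): it learns a low-rank approximation of a tiny ratings matrix with stochastic gradient descent, skipping unrated cells (zeros) during training, and then uses the learned factors to predict the missing ratings.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Approximate a (user x movie) ratings matrix as P @ Q.T by
    stochastic gradient descent on the observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if ratings[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * pu)  # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])  # gradient step on item factors
    return P, Q

# Toy data: 4 users x 4 movies, 0 means "unrated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
P, Q = factorize(R)
pred = P @ Q.T  # predictions for every cell, including the unrated ones
```

The appeal is that each user and each movie gets squeezed into a handful of latent factors, and a prediction is just a dot product between the two.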
To make a great predictive system, use a few well-selected models. But to win a contest, small incremental improvements are needed, so you need to blend many models to refine the results.
RMSE (error) goes down as the number of blended predictors goes up. But the steepest reduction in error happens with only a handful of predictors — the rest of them only gradually draw down the error rate.
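To get a feel for blending, here is a minimal sketch with made-up numbers (nothing like the scale of the real teams' hundred-model ensembles): fit least-squares weights for a few predictors on a holdout set. Since a single predictor is just the weight vector (1, 0, 0), the fitted blend can never score worse than the best individual predictor on that set.

```python
import numpy as np

# Each row is one (user, movie) pair in a holdout set; each column is one
# predictor's guess at the true rating. All numbers are made up.
preds = np.array([[3.8, 4.2, 3.5],
                  [2.1, 1.7, 2.6],
                  [4.9, 4.4, 4.6],
                  [1.2, 1.9, 1.4],
                  [3.3, 3.6, 2.9]])
truth = np.array([4.0, 2.0, 5.0, 1.0, 3.0])

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Least-squares blending: choose weights minimizing squared error on the
# holdout set, then combine the predictors with those weights.
weights, *_ = np.linalg.lstsq(preds, truth, rcond=None)
blended = preds @ weights
```

Real blends were far fancier, but the shape of the idea is the same: let each model's weight reflect where it helps.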
Yehuda Koren, one of the members of BellKor’s Pragmatic Chaos and a researcher for Yahoo! Israel, went on to publish another paper that goes into more juicy details about their team’s techniques.
I hope to see more contests like this. The KDD Cup is the most similar one that comes to mind. But where is the ginormous cash prize???
This impressive augmented reality demo from GE inserts computer-generated 3D objects into live video. First, watch the short video. Then, try it yourself.
Israeli musician “Kutiman” took a big pile of seemingly random YouTube video clips and used them as instruments in his own musical compositions. I could not stop listening to these. My favorites are tracks 2 and 3. His site is overloaded at the time of this post; for now you can see samples here, here, and here.
Can you be an awesome DJ using nothing but a web browser and your computer’s keyboard? Yes you can.
T-SQL is a database-centric programming language for Microsoft SQL Server. It is used primarily for data processing, not for such geekery as drawing fractals. So, my hat’s off to the creative re-imagining of this previously dull language.
The Atlanta trip was a lot of fun. We stayed in a really nice rental house not far from Turner Field. The McCubbins managed to put up with me for a week and even went so far as to make dinner for me on several occasions. You haven’t had a good meal until you’ve tried their cooking. Mmm, tuna bites, fruit and goat cheese salad, steak, and black bottomed cupcakes. Our friends Jon and Kate are also accomplished cooks – we should host a cookoff between both couples so we can reap the delicious benefits.
Our rental house
Between home-cooked meals we sampled a variety of Atlanta dining options. Chris and I ate at Gladys and Ron’s Chicken and Waffles, a restaurant chain started by Gladys Knight that serves the most delicious fried chicken and waffles I’ve ever had. The waffles are warm and soft on the inside and nice and crispy on the outside. Perfect. And eating there has only amplified my weakness for fried chicken. I also managed to meet up with the Atlanta-based contingent of my LegalEdge coworkers at the Vortex Bar and Grill. While I did not sample the infamous bypass burgers, I did partake of the big-ass mushroom swiss burger, and my brain’s burger receptors were alight with glee. This celebration of manliness was a mere coverup for the earlier, emasculating experience of eating at the American Girl Boutique and Bistro to celebrate Eris’ fifth birthday. Despite the pink/pastel surroundings, bad service, and proliferation of little girls tending to their creepy shark-eyed dolls, we had a great time and the food was really good.
We checked out the Georgia Aquarium and World of Coca-Cola during our stay. The aquarium is fantastic; definitely a must-see. The whale shark tank there is a massive, imposing, wondrous display. The Coca-Cola museum is fun but they really hammer you with Coca-Cola propaganda while you are there. YOU MUST DRINK IT. The tasting room is awesome, though. In it you can try unlimited samples of around 70 different Coca-Cola products sold around the world. We shuffled among dispensers over the sticky floors and tasted all sorts of delicious sugary carbonated water, and occasionally experienced horrid drinks such as Beverly. The tasting room gets you nice and high on sugar and caffeine before you go to the gift shop, which is the only way to exit the museum. A brilliant scheme to coax dollars from your wallet.
According to Nic McPhee’s twitters, GECCO 2008 had 471 attendees from 46 countries. This universal appeal of scientific research is one of the things I liked about last year’s GECCO conference, too. This year’s conference, though, was almost as bad as last year’s when it came to feeding the attendees. Food was given out during the two-hour poster session where I was presenting my poster, but all the food was gone in less than 20 minutes, and it was not replenished. Boo!
Nevertheless, there were plenty of fascinating presentations and workshops. The keynotes were again from experts in the field of biology, source of many of the great ideas in evolutionary computing. Dr. Martin I. Meltzer, senior health economist at the Centers for Disease Control, gave an interesting keynote talk about developing scientific models to study public health policy and how well we’d react to sudden outbreaks of disease. In particular, he used the example of pandemic influenza. Overall it was a fascinating but extremely depressing talk, since much of it dwelled on how unprepared we are for situations such as the 1918 flu pandemic. Well-known biology professor, popular science blogger, fierce skeptic, and “expelled from Expelled” victim PZ Myers gave the second keynote talk. He spoke about the importance of development during evolutionary processes (aka “evo-devo”). For example, many animals share the same exact genes, but the organisms themselves have vast differences. The differences are due to other important developmental processes outside of the animals’ genes. He talked about some interesting specific examples, such as the development of bats and mice. Cretekos et al. (2008) recently chopped a regulatory sequence out of bats and stuck it into mice. The mice then developed longer forelimbs, corresponding to the lengthened forelimbs that bats need for flight. PZ’s blog post explains this in more detail (and with cool pictures!). This example showed that there are many elements at play that create variety in organisms, and many of these elements are strongly influenced by their environment. We are only barely beginning to understand how these things all work together to form life.
PZ’s other interesting example of environmental factors in evolutionary development was Suzuki and Nijhout’s 2006 paper “Evolution of a Polyphenism by Genetic Accommodation”. The researchers were able to evolve environmentally-driven traits in a type of caterpillar. In cooler temperatures, the evolved caterpillars remain black in color. But some of the ones exposed to hotter temperatures turn green in color; all the rest remain black. So the experimenters selected the greenest caterpillars and let them breed with each other. They also selected the black, unchanging caterpillars and bred them with each other. Over a small number of generations, descendants of the green-changing caterpillars more reliably changed colors, while the unchanging caterpillars reliably resisted changing colors in hotter temperatures. Read PZ’s post for more interesting info on how this all works. It reminds me of Ryan Somma’s post about Daisyworld, a thought experiment world in which black and white daisies affect the temperature of their planet as well as their own survival.
Chris and I entered the 2D packing problem contest, where you have to evolve a grid of numbers that yield the highest scores for adjacent pairs of numbers. We didn’t win but we did well enough to give a presentation about our approach. We developed an “embryonic” growth technique to evolve the 2D grid, using block-based growth, and something that ended up looking like mold.
Animation of the “mold” evolution technique. Brighter red spots indicate formation of high-scoring areas of the grid. Click to see the supersized version.
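To give a flavor of the contest problem, here is a hedged sketch of a fitness function and a trivial hill climber for such a grid. The pair-scoring rule below is hypothetical (the real contest had its own scoring table), and this is nowhere near our actual embryonic-growth approach; it just shows what "evolve a grid scored on adjacent pairs" means.

```python
import random

def pair_score(a, b):
    # Hypothetical scoring rule; the real contest used its own score table.
    return a * b

def grid_fitness(grid):
    """Sum pair_score over every horizontally and vertically adjacent pair."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += pair_score(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += pair_score(grid[r][c], grid[r + 1][c])
    return total

def hill_climb(grid, steps=2000, seed=0):
    """Swap two random cells; keep the swap only if fitness doesn't drop."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    best = grid_fitness(grid)
    for _ in range(steps):
        r1, c1 = rng.randrange(rows), rng.randrange(cols)
        r2, c2 = rng.randrange(rows), rng.randrange(cols)
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        score = grid_fitness(grid)
        if score >= best:
            best = score
        else:  # revert the swap
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return grid, best
```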
Other contests included evolving Rubik’s cube solvers on a massively parallel grid computing platform (the winner gets $1000), evolving L-systems to match pre-existing images, and evolving intelligent agents to search a virtual landscape for food as efficiently as possible. Nobody qualified for the Rubik’s cube contest, and nobody entered the L-system contest. Which is too bad, since they are very interesting problems. And the $1000 bounty for the Rubik’s cube problem remains unclaimed; Parabon is keeping the competition open and is upping the ante to $2000 for GECCO 2009.
There were also the so-called “HUMIES” awards, a competition to showcase evolutionary methods that match or outperform human efforts to solve problems. The winning paper used genetic programming to evolve ways of finding special algebraic terms that no human had been able to find.
Below is a list of some of the other presentations and papers that interested me:
Games: Moshe Sipper spoke about evolving artificial players for Robocode, backgammon, and chess. The evolved Robocode player was able to beat all of its competitors, which were all written by humans. The evolved backgammon player beat “pubeval”, a strong hand-written backgammon algorithm, 62% of the time. Sipper claims the evolved player will beat human players “most of the time”. The evolved chess-playing AI was able to win or draw against Crafty, a hand-written chess-playing program that is a top competitor at the World Computer Chess Championships.
Kenneth Stanley discussed “generative and developmental systems”. He explored the question, “how is it possible that 100 trillion connections in the human brain can arise from a mere 30,000 genes in the human genome?” The belief is that biological processes such as embryonic development cause complex things to emerge. If we stole some of these ideas from nature and used them in evolutionary computing, maybe we could solve some really interesting and difficult problems. Dr. Stanley cited the work of Gregory Hornby, who evolved robots and other objects using an L-system grammar. The advantage of using an L-system is you can get complex behaviors out of very compact, simplistic rules instead of describing or defining each part individually.
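To see how compact L-system rules generate complexity, here is Lindenmayer's classic "algae" system, a textbook example (not Hornby's actual encoding): two tiny rewrite rules produce strings whose lengths grow like Fibonacci numbers.

```python
def expand(axiom, rules, iterations):
    """Rewrite every symbol in parallel according to the rule table."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's original "algae" system: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}
print(expand("A", rules, 5))  # -> ABAABABAABAAB
```

Swap the symbols for drawing commands (move forward, turn, branch) and the same parallel-rewriting machinery produces plants, robots, and other structured objects from a handful of rules.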
Dr. Stanley went on to describe a technique to evolve abstract representations for development, based on how evolutionary artwork is made. This was a big surprise to me; I’ve done a little bit of evolutionary art research, so I was happy to discover that there are real uses for the techniques other than just making pretty pictures. Along the way, the technique Dr. Stanley described turned into a side project called Picbreeder where computer-generated art is evolved via an online community. I’ve been wanting to see something like this!
In the defense applications track, we saw a talk about evolving swarm behaviors for unmanned aerial vehicles (UAVs). Some of the swarming, self-organizing, and attacking behaviors are inspired by behaviors of insects such as bees and wasps. Several interesting micro-UAV technologies were mentioned, such as the Black Widow, UAVs with flapping wings (including bat wings), and parasitic (!!!) UAVs such as SilentEyes (it is launched from larger UAVs). Don’t do anything suspicious, or a swarm of these damned things will form a stinging cloud around you.
Black Widow micro-UAV. Watch out for its poisonous bite.
In another talk, an Australian researcher described their technique of evolving collective intelligence for UAVs to detect and report bush fires (that’s Australian for “wildfires”). Their challenge was to evolve swarm behaviors so the UAVs have the best possible coverage of the surveyed area, and also so the UAVs can adapt if there are malfunctions in members of the swarm.
There were many more interesting papers, too numerous to describe, showcasing the widespread and diverse applications of evolutionary computing. Some examples include evolving circuits with high testability, automatic defect classification in electronic wafer manufacturing, quantifying quality degradation on voice-over-IP networks, detection of malware (including zero-day virus attacks) using techniques inspired by biological immune systems, evolving color schemes for people with color blindness, investment portfolio optimization, modeling the Milky Way galaxy using BOINC volunteer computing, developing no-loss strategies for tic-tac-toe, finding deadlocks in large concurrent java programs, radar jamming, evolving functions that can detect computer program code plagiarism by students (beware, cheaters!), automatic route planning that takes traffic into consideration, automatic composition of rock music using genetic algorithms (seriously?), interactive evolution of facial composites of suspects in criminal investigations, detection of moving objects in videos, using a bacterial foraging algorithm to detect circles on images (wait, what?), evolving a World Computer Chess Champion-beating chess program by mimicking the behavior of a mentor, and prediction of whether a company will have financial losses.
The list goes on. But I’ve punished you long enough with this mercilessly long post. Long story short: Trip good. Science good. Computers fun.
Brax Cisco, Wesley Hopper, and Michael Eaton have just beta-released their wonderful “Zodiac Decrypto” program, a fun piece of software designed to attack the Zodiac ciphers as well as other substitution ciphers. It uses letter-level n-gram frequency analysis (no, not Engram analysis!) to estimate the validity of solutions generated by the software’s hillclimbing process. The software runs in Windows.
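The general approach can be sketched like this (my own illustration in Python, not Zodiac Decrypto's actual code, and with a toy reference text standing in for a real corpus): score candidate decryptions by bigram statistics, then hill-climb over the substitution key by swapping letters and keeping swaps that improve the score.

```python
import random
import string
from collections import Counter
from math import log

# A toy reference text stands in for a large English corpus.
SAMPLE = ("the quick brown fox jumps over the lazy dog and then the dog "
          "sleeps in the sun while the fox runs into the forest")
COUNTS = Counter(SAMPLE[i:i + 2] for i in range(len(SAMPLE) - 1))
TOTAL = sum(COUNTS.values())

def score(text):
    """Log-likelihood of text under the bigram model, with add-one smoothing."""
    return sum(log((COUNTS[text[i:i + 2]] + 1) / (TOTAL + 27 * 27))
               for i in range(len(text) - 1))

def decrypt(ciphertext, key):
    """Apply a 26-letter substitution key to the ciphertext."""
    return ciphertext.translate(str.maketrans(string.ascii_lowercase, key))

def hill_climb(ciphertext, steps=2000, seed=1):
    """Start from a random key; keep letter swaps that improve the score."""
    rng = random.Random(seed)
    key = list(string.ascii_lowercase)
    rng.shuffle(key)
    best_key, best = key[:], score(decrypt(ciphertext, "".join(key)))
    for _ in range(steps):
        a, b = rng.sample(range(26), 2)
        key[a], key[b] = key[b], key[a]
        s = score(decrypt(ciphertext, "".join(key)))
        if s > best:
            best_key, best = key[:], s
        else:
            key[a], key[b] = key[b], key[a]
    return "".join(best_key), best
```

A real attack needs n-gram tables from a much larger corpus, random restarts, and (for the Zodiac ciphers) handling of multiple symbols per letter, but the climb-and-score loop is the heart of it.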
All you have to do is help Netflix read people’s minds!
The Netflix Prize is a contest that has been going on since October 2006. I didn’t hear about it until today. When you rate movies on the Netflix DVD rentals site, their proprietary Cinematch algorithm will predict which other movies you might like based on ratings that have been made by all Netflix users. It is very similar to Amazon’s “people who bought X also bought Y” feature. The Netflix Prize challenge is to come up with a new prediction system that is 10% more accurate than Cinematch. Whoever does this will get the top prize of $1,000,000. Netflix is also rewarding a periodic “progress prize” of $50,000 to people who can beat the last progress prize winner by 1%. The current progress prize winner, an AT&T Labs team named BellKor, has a technique that yields a prediction improvement of 8.5% over Cinematch. Read about their technique here. Their technique makes my brain hurt — it blends together 107 different results from a large ensemble of data mining models, including neighborhood-based models (k-NN), factorization models (such as Ridge regression), regressions based on Gaussian priors, restricted Boltzmann Machines, asymmetric factor models, and regression models (using principal components analysis for feature selection, and SVD vectors as predictor variables). Basically, they packed a data mining shotgun with as much shot as they could find, and pulled the trigger. That’s a hell of a lot of work for $50,000!
Figure 1: Oh, no! Data mining engineers found out that I like terrible movies!
By comparison, Netflix’s own Cinematch algorithm uses the following techniques, as quoted in their Netflix Prize FAQ:
How does Cinematch do it?
Straightforward statistical linear models with a lot of data conditioning. But a real-world system is much more than an algorithm, and Cinematch does a lot more than just optimize for RMSE. After all, we have a website to support. In production we have to worry about system scaling and performance, and we have additional sources of data we can use to guide our recommendations.
Netflix has begun a very interesting bounty hunt. As of today, 23021 teams from 164 different countries are clamoring for the cash. I love the idea of putting up public bounties for innovation – the X-Prize comes to mind, particularly the Google-sponsored lunar X prize.
I had a lot of fun making it, and I am glad that folks are getting some use out of it. Thanks for the support! I hope we can sustain the Digg Effect!!
I’ve been getting a lot of good feature suggestions from people, such as adding frequency analysis, allowing multiple letters per symbol, and supporting cipher transpositions. I hope to make some of these improvements to the app in the near future.
Back in 1993, during freshman year of college, my friend Brian McEntire introduced me to the “demoscene”, which is, at its best, a group of highly skilled (and often very young) computer sound/video programmers who specialize in creating dazzling presentations that run in real-time on computers. Demoscene folks spend a lot of time trying to out-program each other, showing off what kind of amazing audio and visual effects they can do with computer hardware. Demos at the time were amazing to watch. When I watch the older demos now, 14 years later, they seem very quaint and primitive.
Second Reality, a demo by Future Crew, one of the most famous demo groups back in the 1990s. This was cutting-edge realtime PC sound and graphics back then.
Some of the best demos have come out of the Assembly demoparty, an annual Finnish gathering of demoscene enthusiasts which features a demo competition. Many people enter their productions into the competition, and the winning entries are usually very high quality. The recent Assembly demoparty was held in August 2007, and I was amazed by the creative and dream-like stylistic quality of the winning demo, LifeForce, by Andromeda Software Development, a Greek demogroup.
Screenshots from LifeForce by Andromeda Software Development. Click for a larger view.
To see this production in glorious motion, download the high-quality 246MB AVI movie file via this link. It is a much better experience than watching the embedded lower-quality YouTube version below.
The pure skill and creative talent needed to generate these real-time productions (the animations are NOT pre-rendered), combined with the fact that the best demo groups consist mostly of teenagers and very young adults simply doing this stuff for fun in their free time, continue to amaze me.