Category Archives: movies

Craft time

Something I had to make during my lunch break.

Because seeing Totoro in Toy Story 3 was awesome.

United we stand

You enter a contest. A million dollars is at stake. Forty-one thousand teams from 186 different countries are clamoring for the prize and the glory. You edge into the top 5 contestants, but there is only one prize, and one winner. Second place is the first loser. What do you do?

Team up with the winners, of course.

The Netflix Prize is a competition that is awarding $1,000,000 to whoever can come up with the best improvement to their movie recommendation engine. Their system looks at the massive amounts of movie rental data to try to predict how well users will like other movies. For example, if you like Coraline, you may also like Sweeney Todd. But Netflix’s recommendation engine isn’t great at making predictions, so they decided to offer a bounty to anyone who could come up with a system that has a verifiable 10% improvement to Netflix’s prediction accuracy.

The contest recently ended with two teams jockeying for the prize. During the two and a half years the contest has been active, several individuals and small groups dominated the contest leaderboard, competing among 41,000 teams from 186 different countries. The competition became fierce, resulting in coalitions forming. The team “BellKor’s Pragmatic Chaos” formed from the separate teams “BellKor” (part of the Statistics Research Group in AT&T Labs), “BigChaos” (a group of folks who specialize in building recommender systems), and “PragmaticTheory” (two Canadian engineers with no formal machine learning or mathematics training). Another conglomerate team, “The Ensemble”, is made up of “Grand Prize Team” (itself a coalition of members combining strategies to win the prize), “Vandelay Industries” (another mish-mash of volunteers), and “Opera Solutions”.


At first, it looked like BellKor’s Pragmatic Chaos won. But now it looks like The Ensemble won. Netflix says it will verify and announce the winner in a few weeks.

Who the hell cares? Why is this interesting in the slightest? Ten percent seems so insignificant.

Well, predicting human behavior seems impossible. But this contest has clearly shown that some amount of improvement in prediction of complicated human behavior is indeed possible. And what’s really interesting about the winning teams is that no single machine learning or statistical technique dominates by itself. Each of the winning teams “blends” a lot of different approaches into a single prediction engine.

Artificial neural networks. Singular value decomposition. Restricted Boltzmann Machines. K-Nearest Neighbor Algorithms. Nonnegative matrix factorization. These are all important algorithms and techniques, but they aren’t best in isolation. Blending is key. Even the teams in the contest were blended together.

United we stand.

Each technique has its strengths and weaknesses. Where one predictor fails, another can take up the slack with its own unique take on the problem.

BellKor, in their 2008 paper describing their approach, drew the following conclusions about what was important in making predictions:

  • Movies are selected deliberately by users to be rated. The movies are not randomly selected.
  • Temporal effects:
    • Movies go in and out of popularity over time.
    • User biases change. For example, a user may rate average movies “4 stars”, but later on decide to rate them “3 stars”.
    • User preferences change. For example, a user may like thrillers one year, then a year later become a fan of science fiction.
  • Not all data features are useful. For example, details about descriptions of movies were significant, and explained some user behaviors, but did not improve prediction accuracy.
  • Matrix factorization models were very popular in the contest. Variations of these models were very accurate compared to other models.
  • Neighborhood models and their variants were also popular.
  • For this problem, increasing the number of parameters in the models resulted in more accuracy. This is interesting, because usually when you add more parameters, you risk over-fitting the data. For example, a naive algorithm that has “shoe color” as an input parameter might see a bank that was robbed by someone wearing red shoes, and conclude that anyone wearing red shoes was a potential bank robber. For another classic example of over-fitting, see the Hindenburg Omen.
  • To make a great predictive system, use a few well-selected models. But to win a contest, small incremental improvements are needed, so you need to blend many models to refine the results.

RMSE (error) goes down as the number of blended predictors goes up. But the steepest reduction in error happens with only a handful of predictors — the rest of them only gradually draw down the error rate.
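
Here is a toy sketch of that effect. It is not any team's actual method: the ratings are made up, the eight "predictors" are hypothetical (each is just the true rating plus its own independent random noise), and plain averaging stands in for the far more elaborate blending the teams used. Under those assumptions, it computes RMSE for a blend of one predictor, then two, and so on.

```python
import math
import random


def rmse(predictions, actual):
    """Root mean squared error between two equal-length lists."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return math.sqrt(sum(squared) / len(squared))


def blend(predictor_outputs):
    """Average the predictions of several predictors, rating by rating."""
    count = len(predictor_outputs)
    return [sum(column) / count for column in zip(*predictor_outputs)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up "true" ratings between 1 and 5 stars (an assumption, not Netflix data).
actual = [rng.uniform(1, 5) for _ in range(2000)]

# Eight hypothetical predictors, each wrong in its own independent way.
predictors = [[a + rng.gauss(0, 1) for a in actual] for _ in range(8)]

# RMSE of the blend as predictors are added one at a time.
errors = [rmse(blend(predictors[:k]), actual) for k in range(1, 9)]

for k, err in enumerate(errors, start=1):
    print(f"{k} predictor(s) blended: RMSE {err:.3f}")
```

With independent noise like this, the first couple of predictors added to the blend should cut the error the most, and each later one should help a little less. Real predictors make correlated mistakes, so the real curve flattens out sooner than this toy one.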

Yehuda Koren, one of the members of BellKor’s Pragmatic Chaos and a researcher for Yahoo! Israel, went on to publish another paper that goes into more juicy details about their team’s techniques.

I hope to see more contests like this. The KDD Cup is the most similar one that comes to mind. But where is the ginormous cash prize???


links for 2009-03-06: Pile o’ toys

This impressive augmented reality demo from GE inserts computer-generated 3D objects into live video. First, watch the short video. Then, try it yourself.
Israeli musician “Kutiman” took a big pile of seemingly random YouTube video clips and used them as instruments in his own musical compositions. I could not stop listening to these. My favorites are tracks 2 and 3. His site is overloaded at the time of this post; for now you can see samples here, here, and here.
Can you be an awesome DJ using nothing but a web browser and your computer’s keyboard? Yes you can.
A curious programmer, inspired by Roger Alsing’s evolution of the Mona Lisa, asks if the technique could be a good way to compress images. Also take a look at the nice online version of the image evolver he wrote, in which you can set your own target image.
Hilarious Livejournal diary done in the style of Rorschach from the Watchmen comic book series.
The Crisis of Credit, Visualized – An extremely well-produced video describing the credit crisis in simple terms.
“Netflix for impatient people”. A remix of the Netflix site that is “about a quadrillion times easier to browse than Netflix’s own site”.
$timator: How much is your web site worth?
Cursebird. A real time feed of people swearing on Twitter. THANK YOU, INTERNET!
Leapfish. An interesting new meta-search engine with a clean interface. “It’s OK, you’re not cheating on Google.”
Twittersheep. “Enter your twitter username to see a tag cloud from the ‘bios’ of your twitter flock.”
PWN! YouTube. This is a great idea. You just type “pwn” in front of “youtube” in the URL, and voila; instant links for downloading and saving the videos.

Do you want to be a millionaire?

All you have to do is help Netflix read people’s minds!

The Netflix Prize is a contest that has been going on since October 2006. I didn’t hear about it until today. When you rate movies on the Netflix DVD rentals site, their proprietary Cinematch algorithm predicts which other movies you might like based on ratings made by all Netflix users. It is very similar to Amazon’s “people who bought X also bought Y” feature. The Netflix Prize challenge is to come up with a new prediction system that is 10% more accurate than Cinematch. Whoever does this will get the top prize of $1,000,000. Netflix is also awarding a periodic “progress prize” of $50,000 to people who can beat the last progress prize winner by 1%.

The current progress prize winner, an AT&T Labs team named BellKor, has a technique that yields a prediction improvement of 8.5% over Cinematch. Read about their technique here. Their technique makes my brain hurt: it blends together 107 different results from a large ensemble of data mining models, including neighborhood-based models (k-NN), factorization models (such as ridge regression), regressions based on Gaussian priors, restricted Boltzmann machines, asymmetric factor models, and regression models (using principal components analysis for feature selection, and SVD vectors as predictor variables). Basically, they packed a data mining shotgun with as much shot as they could find, and pulled the trigger. That’s a hell of a lot of work for $50,000!

Figure 1: Oh, no! Data mining engineers found out that I like terrible movies!

By comparison, Netflix’s own Cinematch algorithm uses the following techniques, as quoted in their Netflix Prize FAQ:

How does Cinematch do it?
Straightforward statistical linear models with a lot of data conditioning. But a real-world system is much more than an algorithm, and Cinematch does a lot more than just optimize for RMSE. After all, we have a website to support. In production we have to worry about system scaling and performance, and we have additional sources to data we can use to guide our recommendations.
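
The FAQ's mention of RMSE hints at how "10% more accurate" gets scored. Here is a minimal sketch, assuming the improvement is the relative drop in RMSE compared to Cinematch. The baseline of 1.0 is a made-up round number, not Cinematch's real score, and both function names are my own.

```python
def improvement_percent(baseline_rmse, new_rmse):
    """Relative reduction in RMSE versus the baseline, as a percentage."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def target_rmse(baseline_rmse, percent):
    """RMSE a system must reach to beat the baseline by `percent` percent."""
    return baseline_rmse * (1 - percent / 100.0)


baseline = 1.0  # hypothetical baseline RMSE, chosen only to keep the math easy

# The score needed for a 10% improvement over the hypothetical baseline.
print(f"target RMSE: {target_rmse(baseline, 10):.3f}")

# A hypothetical entry scoring 0.915 is about 8.5% better than the baseline.
print(f"improvement: {improvement_percent(baseline, 0.915):.1f}%")
```

Lower RMSE is better, so on this reading the gap between an 8.5% improvement and the 10% target is a matter of squeezing out a few more hundredths of a star of average error.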

Netflix has begun a very interesting bounty hunt. As of today, 23,021 teams from 164 different countries are clamoring for the cash. I love the idea of putting up public bounties for innovation – the X Prize comes to mind, particularly the Google-sponsored Lunar X Prize.

(another informative article about the Netflix Prize)

Happy DNA Day!

Today is National DNA Day. In celebration, marvel at the wondrous complexities of molecular biology:

I am boggled with wonder at the endless coils within coils within coils, and the molecular machine in the 2nd half of the clip. Life is amazing!

Spartans vs. Ghost Rider

A band of a few hundred intrepid, brave filmmakers continues to successfully defend their box office dominance as millions of moviegoers relentlessly descend upon them. The movie 300 is raking in huge profits (it has the largest March opening EVER). I managed to catch a matinee showing on opening day. The movie impressed me with its visuals but buried me in enough ponderous cheese to prevent me from fully enjoying the depictions of Sparta’s romanticized struggle against the Persians.

But the best part of the showing was not the cheesiness of the movie itself, but the jankiness of the local movie theater. My first mistake was showing up without cash – I bought the ticket using my debit card, which meant a long struggle with their sole card-swiper, a machine that verifies your card in the time it would take to walk to an ATM and return with cash. Happy to receive my ticket with a few minutes to spare before showtime, I braved another purchase at the concession stand. Of course, they had to use the same slow card-swiper up front where I bought the ticket, so the clerk bounced merrily off to run my card through the torturous, queue-increasing card-swiper. Meanwhile, folks behind me were curious about the disappearance of the clerk. “Where’d she go?” “They only have one card swiper.” “Oh.”

After passing their first test of customer patience, I walked towards the big theater. Above the door was the marquee, the title 300 emblazoned in bright LEDs. I walked into the rapidly filling theater, sat down, and endured the second test of customer patience: Annoying Commercials. Then something odd happened – a preview came on for 300! “Hmm, they are showing me a preview for the movie I’m already watching.” OK. Maybe I stepped into the wrong theater. Wait, no. I know for sure it said 300 outside. I chalked it up to MPAA stupidity (see? they’d rather sue their customers than market to them properly) and settled in for another 300 minutes of trailers.

Then, the third test of customer patience began. The title music for the feature began to swell. I thought to myself, cool, Sparta’s getting ready to kick some ass. Then, a voiceover started talking about a “ghost rider.” Wait a minute. GHOST RIDER??? NOOOOOOOOOOO!!!! My anticipation of Spartan glory died as the title credits for Ghost Rider appeared.

“I should’ve used a soothing gel with my razor.”
“You primitive Spartan screwheads are no match for FIRE and MOTORCYCLES!”

Half the theater got up to walk out. A theater lackey outside spoke up and said, “Go back in! We’ll fix it.” We shuffled back inside and sat down, entertained only by a paused frame of fire from Ghost Rider, and the assortment of audience reactions to the switcheroo. About ten minutes later, the screen went black, and I thought, “wonderful; we’re going to get another half hour of commercials and trailers.” Luckily, the title credits for 300 appeared and all was right with the world.

This comedy of errors was as entertaining to me as the movie.

Ephialtes says, “I don’t know his name, but his face sure does ring a bell.”

Shadows in the campfire

Ted Haggard wants you to watch Jesus Camp THIS MUCH!!

Seeing ol’ Pastor Ted was only one of many disturbing experiences we had watching the documentary Jesus Camp, which gives us a look into the bizarre practices of the Christian Charismatic movement, and how it is wielded to forge new generations of believers. The documentary unfolds like a slow-motion train wreck, as we watch the emotionally manipulative church leaders indoctrinating young children with belief systems that are openly hostile towards good reason.

I felt angry through most of the documentary, and pitied the poor children subjected to these manipulations. I also remain hopeful that the Charismatic movement of Christianity is not a typical sampling of Christian faith. If Jesus saw it, I hope he’d be seriously pissed.

Another disturbing scene in the documentary was a group of kids being made to pledge their allegiance to the Christianized American flag: “I pledge allegiance to the Christian Flag, and to the Savior for whose Kingdom it stands. One Savior, crucified, risen, and coming again with life and liberty to all who believe.” Maybe our poor laws are no match for the untold number of people who desire to restore the long-divorced church and state.

Jesus Camp is one of the best documentaries I have seen in a long time. Go rent it! And post comments on what you think of it.

Figure 2: Pastor Ted demonstrates the preacher power move that has come to be known as the Meth Magnet.

Cthulhu fhtagn, and he approves of this message.

The unspeakable dread is mounting on this Election Day as I traverse the glittering touchscreen prompts on the voting machines to select our next round of overlords. But in his house at R'lyeh, dead Cthulhu waits dreaming. And he has a campaign. Chris, friend of the Cthulhu for Senate effort, wore this shirt to the polls this morning:


Get the word out with another fine example of Cthulhu campaign paraphernalia:


More info on Cthulhu. Keep his name in mind if your desperation at the ballot leads you to stare at the write-in box, wondering what name to scrawl there. Is “a pulpy, tentacled head surmounted by a grotesque and scaly body with rudimentary wings” really any worse than many of the people we’ve already elected???

We’ll have to wait a little while longer for the Cthulhu for President campaign to heat up.


The Napoleon Dynamite shirt the Cthulhu shirt is based on:


See also: Icethulhu and Foodthulhu.

Spartans! Tonight… we dine… ON GYROS!

I am completely overtaken with anticipation for this movie.

The high definition version of the trailer is a thing of beauty.

My friend James gave me the graphic novel a few years back, and it’s very cool to see it turned into a movie.

He nice, the Borat.

I had no idea Borat was in town! He came to a rodeo in Salem last year to piss off the crowd, and the scene has made it into the new Borat movie. You can see parts of the scene in the new theatrical trailer for the movie.

Rodeo in Salem gets unexpected song rendition

“I hope you kill every man, woman and child in Iraq, down to the lizards,” he said, according to Brett Sharp of Star Country WSLC, who was also on stage that night as a media sponsor of the rodeo.

An uneasy murmur ran through the crowd.

“And may George W. Bush drink the blood of every man, woman and child in Iraq,” he continued, according to Robynn Jaymes, who co-hosts a morning radio show with Sharp and was also among the stunned observers.

The crowd’s reaction was loud enough for John Saunders, the civic center’s assistant director, to hear from the front office. “It was a restless kind of booing,” Saunders said.

Full story