
Why am I jealous of poster board?

Because my new poster gets to go to Istanbul, Turkey, for next week’s EvoStar Evolutionary Computation conference without me. I just shipped it off this morning.

Combining puzzle-making with biologically inspired computer algorithms. Because life isn’t already crazy enough.

A large version of the poster is available, along with the original PDF. And you can download the entire 10-page paper if you really want to bore yourself.

We decided not to travel to Istanbul for the conference because of the expense and difficulty in getting there. Alas, I will not be able to experience the surreal juxtaposition of the geeky Mario A.I. competition amidst the majestic ruins of the ancient Roman, Byzantine, and Ottoman empires.

My other posters: Evolutionary art (GECCO 2007), Cracking substitution ciphers (GECCO 2008)

There is no Koopa.

Want to make $500?

All you have to do is make Mario come to life with an artificially intelligent computer program. And he has to survive as many randomly-generated game levels as possible from the Infinite Mario Bros project.

Piece of cake!

A contest entry. From the entrant: “Here’s my attempt at an AI for the Mario AI Competition. You can see the path it plans to go as a red line, which updates when it detects new obstacles at the right screen border. It uses only information visible on screen. I’ve included a slow-motion part in the middle where it gets hairy for Mario. :-)”

Here is the same intelligent agent playing a longer level.

These agents will be unstoppable once they replicate in the Matrix.

United we stand

You enter a contest. A million dollars is at stake. Forty-one thousand teams from 186 different countries are clamoring for the prize and the glory. You edge into the top 5 contestants, but there is only one prize, and one winner. Second place is the first loser. What do you do?

Team up with the winners, of course.

The Netflix Prize is a competition that is awarding $1,000,000 to whoever can come up with the best improvement to Netflix’s movie recommendation engine. Netflix’s system looks at massive amounts of movie rental data to try to predict how much users will like other movies. For example, if you like Coraline, you may also like Sweeney Todd. But Netflix’s recommendation engine isn’t great at making predictions, so they decided to offer a bounty to anyone who could come up with a system that has a verifiable 10% improvement to Netflix’s prediction accuracy.
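For the curious, here is what “10% improvement” meant in practice. The contest scored predictions by root mean squared error (RMSE) over a set $T$ of held-out (user, movie) ratings, where $r_{ui}$ is the rating user $u$ actually gave movie $i$ and $\hat{r}_{ui}$ is the predicted rating. To win, a system’s RMSE had to be at least 10% lower than that of Netflix’s own engine, Cinematch:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left(\hat{r}_{ui} - r_{ui}\right)^2}
\qquad
\frac{\mathrm{RMSE}_{\text{Cinematch}} - \mathrm{RMSE}_{\text{new}}}{\mathrm{RMSE}_{\text{Cinematch}}} \ge 0.10
```

So if a baseline scored an RMSE of 1.00 (a made-up round number), a winning entry would need 0.90 or lower.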

The contest recently ended with two teams jockeying for the prize. During the two and a half years the contest was active, several individuals and small groups dominated the leaderboard, with competition among 41,000 teams from 186 different countries. The competition became fierce, and coalitions formed. The team “BellKor’s Pragmatic Chaos” formed from the separate teams “BellKor” (part of the Statistics Research Group at AT&T Labs), “BigChaos” (a group of folks who specialize in building recommender systems), and “PragmaticTheory” (two Canadian engineers with no formal machine learning or mathematics training). Another conglomerate team, “The Ensemble”, is made up of “Grand Prize Team” (itself a coalition of members combining strategies to win the prize), “Vandelay Industries” (another mish-mash of volunteers), and “Opera Solutions”.


At first, it looked like BellKor’s Pragmatic Chaos won. But now it looks like The Ensemble won. Netflix says it will verify and announce the winner in a few weeks.

Who the hell cares? Why is this interesting in the slightest? Ten percent seems so insignificant.

Well, predicting human behavior seems impossible. But this contest has clearly shown that some amount of improvement in prediction of complicated human behavior is indeed possible. And what’s really interesting about the winning teams is that no single machine learning or statistical technique dominates by itself. Each of the winning teams “blends” a lot of different approaches into a single prediction engine.

Artificial neural networks. Singular value decomposition. Restricted Boltzmann machines. K-nearest neighbor algorithms. Nonnegative matrix factorization. These are all important algorithms and techniques, but they aren’t best in isolation. Blending is key. Even the teams in the contest were blended together.

United we stand.

Each technique has its strengths and weaknesses. Where one predictor fails, another can take up the slack with its own unique take on the problem.
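Here’s a minimal sketch of the blending idea, assuming the simplest possible blender: a linear combination of each predictor’s output (plus an intercept), with the weights fit by least squares. The winning teams used far more elaborate blends than this. The ratings and predictor outputs below are made-up toy numbers, and the helper names (`fit_blend`, `blend`, `rmse`, `solve`) are my own, not anything from the contest papers.

```python
import math

def rmse(predictions, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual))

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_blend(predictor_outputs, actual, ridge=1e-6):
    """Least-squares weights [intercept, w_1, ..., w_k] for a linear blend of k predictors.

    A tiny ridge penalty on the predictor weights (not the intercept) keeps the
    normal equations solvable even if two predictors are nearly identical.
    """
    rows = [[1.0] + [p[i] for p in predictor_outputs] for i in range(len(actual))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, actual)) for i in range(k)]
    return solve(xtx, xty)

def blend(weights, predictor_outputs):
    """Apply blend weights to each example's predictor outputs."""
    n = len(predictor_outputs[0])
    return [weights[0] + sum(w * p[i] for w, p in zip(weights[1:], predictor_outputs))
            for i in range(n)]

# Hypothetical toy data: true ratings and two imperfect predictors.
actual = [4, 3, 5, 2, 4, 1]
pred_a = [4.5, 3.5, 5.0, 3.0, 4.5, 2.0]  # tends to run high
pred_b = [3.5, 3.0, 4.0, 1.5, 3.5, 1.5]  # makes different mistakes
weights = fit_blend([pred_a, pred_b], actual)
blended = blend(weights, [pred_a, pred_b])
print("A:", rmse(pred_a, actual), "B:", rmse(pred_b, actual), "blend:", rmse(blended, actual))
```

Since each single predictor is a special case of the blend (weight 1 on it, 0 on everything else), least squares essentially can’t do worse than the best single predictor on the data it was fit to. In a real contest you’d fit the weights on a separate held-out set, not on the ratings you report.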

BellKor, in their 2008 paper describing their approach, drew the following conclusions about what was important in making predictions:

  • Movies are selected deliberately by users to be rated. The movies are not randomly selected.
  • Temporal effects:
    • Movies go in and out of popularity over time.
    • User biases change. For example, a user may rate average movies “4 stars”, but later on decide to rate them “3 stars”.
    • User preferences change. For example, a user may like thrillers one year, then a year later become a fan of science fiction.
  • Not all data features are useful. For example, movie descriptions were significant and explained some user behavior, but did not improve prediction accuracy.
  • Matrix factorization models were very popular in the contest. Variations of these models were very accurate compared to other models.
  • Neighborhood models and their variants were also popular.
  • For this problem, increasing the number of parameters in the models resulted in more accuracy. This is interesting, because usually when you add more parameters, you risk over-fitting the data. For example, a naive algorithm that has “shoe color” as an input parameter might see a bank that was robbed by someone wearing red shoes, and conclude that anyone wearing red shoes was a potential bank robber. For another classic example of over-fitting, see the Hindenburg Omen.
  • To make a great predictive system, use a few well-selected models. But to win a contest, small incremental improvements are needed, so you need to blend many models to refine the results.
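Here’s a rough sketch of the kind of matrix factorization model the list refers to, assuming the common “biased” formulation: a rating is predicted as the global mean, plus a user bias, plus a movie bias, plus the dot product of a user factor vector and a movie factor vector, all trained by stochastic gradient descent. This is a generic textbook version, not BellKor’s actual model (it ignores temporal effects entirely), and the tiny ratings list and hyperparameters (`k`, `lr`, `reg`, `epochs`) are hypothetical.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by stochastic gradient descent.

    ratings: list of (user_index, item_index, rating) triples.
    Returns a predict(u, i) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # user bias: some people rate everything high
    bi = [0.0] * n_items  # movie bias: some movies are just better liked
    p = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # movie factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Gradient steps with L2 regularization to discourage over-fitting.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pf, qf = p[u][f], q[i][f]  # use old values for both updates
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return predict

# Hypothetical toy data: (user, movie, stars) triples for 3 users and 4 movies.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1),
           (1, 0, 4), (1, 1, 5), (1, 3, 2),
           (2, 0, 1), (2, 2, 5), (2, 3, 4)]
predict = train_mf(ratings, n_users=3, n_items=4)

mean = sum(r for _, _, r in ratings) / len(ratings)
baseline_rmse = math.sqrt(sum((r - mean) ** 2 for _, _, r in ratings) / len(ratings))
train_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))
print(f"global-mean RMSE {baseline_rmse:.3f}, factorized RMSE {train_rmse:.3f}")
print(f"predicted stars for user 0 on unseen movie 3: {predict(0, 3):.2f}")
```

The regularization term (`reg`) is the usual guard against the over-fitting problem described above; adding more factors (`k`) adds parameters, which is where that tradeoff starts to bite.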

RMSE (root mean squared error) goes down as the number of blended predictors goes up. But the steepest reduction in error happens with only a handful of predictors; the rest only gradually draw down the error rate.

Yehuda Koren, one of the members of BellKor’s Pragmatic Chaos and a researcher for Yahoo! Israel, went on to publish another paper that goes into more of the juicy details about their team’s techniques.

I hope to see more contests like this. The KDD Cup is the most similar one that comes to mind. But where is the ginormous cash prize???


Optimism that only a programmer can appreciate

“A programmer is a person who passes as an exacting expert on the basis of being able to turn out, after innumerable punching, an infinite series of incomprehensive answers calculated with micrometric precisions from vague assumptions based on debatable figures taken from inconclusive documents and carried out on instruments of problematical accuracy by persons of dubious reliability and questionable mentality for the avowed purpose of annoying and confounding a hopelessly defenseless department that was unfortunate enough to ask for the information in the first place.”

— IEEE Grid news magazine


You’re sitting at your computer, writing your next awesome computer program. You think, “I want to run my new program. But the computer I have is too slow and too boring to run it on.”

You glance over at the petri dish in your biology lab. “What if I could deploy my program as DNA, and the outcome of my program gets expressed as proteins and genes in a real cell?”

Sounds kind of crazy. But Microsoft is researching this.

An Escherichia coli predator-prey system implemented with a synthetic biology programming language developed by Microsoft researchers.

In their paper Towards programming languages for genetic engineering of living cells, Microsoft UK researchers Michael Pedersen and Andrew Phillips have developed a programming language that translates logical concepts into models of biological reactions in simulators. Reactions that have favorable results have the potential to be synthesized into DNA for insertion into real cells, achieving some level of cyborgian awesomeness that we can only just begin to imagine. (Insert obligatory Blue Screen of Death joke here.)

More info here. And be sure to check out the full paper here.


XSLT is a language for transforming XML. I came to hate XSLT long ago, at the tail end of a fading honeymoon period in which I dwelt in the empty promises of XML.

Somebody came up with a way to plot the Mandelbrot Set using only an XML file combined with a particularly evil XSLT file. This is a disturbing, evil way to go about drawing fractals. Please don’t do this.

It really works. Click here to try it. Your browser will thank you for the pointless exercise.
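For comparison, here is the same picture drawn the non-evil way: a minimal escape-time sketch in ordinary code. I haven’t picked apart that XSLT file, so this isn’t a translation of it; it’s just the standard algorithm (iterate z → z² + c starting from z = 0, and call c “inside” the set if |z| never exceeds 2 within a fixed iteration budget). The viewport, grid size, and iteration limit are arbitrary choices.

```python
def escape_time(c, max_iter=50):
    """Iterations of z -> z*z + c (from z = 0) before |z| exceeds 2; max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

def render(width=60, height=24, max_iter=50):
    """ASCII-art Mandelbrot set: '#' for points that never escaped, ' ' otherwise."""
    rows = []
    for y in range(height):
        im = 1.2 - 2.4 * y / (height - 1)        # imaginary axis, top to bottom
        row = []
        for x in range(width):
            re = -2.1 + 2.7 * x / (width - 1)    # real axis, left to right
            inside = escape_time(complex(re, im), max_iter) == max_iter
            row.append("#" if inside else " ")
        rows.append("".join(row))
    return "\n".join(rows)

print(render())
```

No XML was harmed in the making of this fractal.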


Game Day at Luna City Arcade

The multi-talented programmer, vintage arcade enthusiast, artist, and iPhone game developer Peter Hirschberg created a one-of-a-kind private vintage arcade collection that he periodically opens to the public. His Luna City Arcade is an acclaimed collection of still-functioning vintage arcade games from the 1970s and 1980s that he keeps and maintains in a 2,400-square-foot building next to his house.

Luna City Arcade. Image by Peter Hirschberg.

This past Saturday, Iris and I hopped in the car and drove up to Luna City Arcade. We met up with the McCubbins and enjoyed a day of beeping and chippy nostalgia.


Peter and his wife were gracious hosts and we had a great time. Check out Peter’s newest iPhone game, Vector Tanks, a throwback to the days of Battlezone, and Adventure Revisited, his Windows and Mac clone of the classic Atari game Adventure. And take a look at his software portfolio for an interesting collection of simulators, emulators, and novelties that he’s created.


TGAW’s series of “Season Compares” posts inspired me to code up another creation, the Swapper, to add to my ever-expanding collection of web toys of questionable value.

Example swappification.

Its purpose is to let you specify two images. The Swapper will load your two images and display the first one for you. When you move your mouse cursor over the first image, the second image will appear. When you move your mouse cursor away from the second image, the first image will appear. This gives you a way to quickly flip between two images to compare them. Give it a try! There are some sample images there to get you started.

Programmer’s lament

“Since I’ve violated the Golden Rule of Helping Friends with their PC Problems and attempted to help a friend with his PC problem, expectedly wiping out his hard drive in vain, I had many opportunities to explain the Programmer Paradox: how can a programmer fail to make a computer do as he wishes? While the difficulty of debugging a program without the source proved hard to explain to laymen, I think I’ve found a metaphor that does a good job. A programmer is to the blue screen of death what Mikhail Kalashnikov is to a loaded AK-47: just as helpless a victim as any other mortal, except for having a profound understanding of the mechanisms of his execution.”

Yossi Kreinin

New site launched today:

I launched a new site this morning:

I saw this xkcd comic recently, and it made me want to be able to see Wikipedia articles side-by-side with their “simple” counterparts.

“Simple English Wikipedia is a version of the Wikipedia encyclopedia, written in Simple English and started in 2004. The encyclopedia is supposed to be used by children, who might not understand the complicated articles in the English Wikipedia, and other people who are still learning English.”

The new site is a quick hack I put together that lets you view the articles side by side. To do this, go to the site and type an article name in the search box (for example, War, or Peace, or Chocolate). Or, paste the article’s URL directly from Wikipedia. Then click the “Again, but slower” button. The site will try to load the original article and the simplified article side by side. If it doesn’t find the simple version, try a different article, because not all of Wikipedia’s articles have been translated into simple versions.
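The core trick is just mapping one article title onto two wikis. Here’s a hypothetical sketch of that step (not the site’s actual code): accept either a bare article name or an English Wikipedia article URL, and build matching `en.wikipedia.org` and `simple.wikipedia.org` URLs. The function names are made up for illustration, and the sketch assumes the Simple English article uses the same title, which won’t always hold.

```python
from urllib.parse import quote, unquote, urlparse

def article_title(query):
    """Accept a bare article name or a Wikipedia article URL; return the title with spaces."""
    query = query.strip()
    parsed = urlparse(query)
    if parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/"):
        # e.g. /wiki/Ice_cream -> "Ice cream"
        return unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    return query

def side_by_side_urls(query):
    """Return (English URL, Simple English URL) for the same article title."""
    slug = quote(article_title(query).replace(" ", "_"))
    return (f"https://en.wikipedia.org/wiki/{slug}",
            f"https://simple.wikipedia.org/wiki/{slug}")

print(side_by_side_urls("Chocolate"))
```

Note that this only builds the URLs; it doesn’t check whether the simple version exists, which is exactly the case where you have to try a different article.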

You can also try some examples by choosing one from the pulldown list on the page. Or, try your luck with a random article by clicking the Random button. If you check the full formatting checkbox, the original formatting of the Wikipedia articles will be displayed (the site displays the printable stripped-down format by default).