Category Archives: tech

Computers is smart

This fortune I read today seems like a natural follow-up to my previous post:

This is the first numerical problem I ever did. It demonstrates the
power of computers:

Enter lots of data on calorie & nutritive content of foods. Instruct
the thing to maximize a function describing nutritive content, with a
minimum level of each component, for fixed caloric content. The
results are that one should eat each day:

1/2 chicken
1 egg
1 glass of skim milk
27 heads of lettuce.

— Rev. Adrian Melott

Do you want to be a millionaire?

All you have to do is help Netflix read people’s minds!

The Netflix Prize is a contest that has been going on since October 2006. I didn’t hear about it until today. When you rate movies on the Netflix DVD rentals site, their proprietary Cinematch algorithm will predict which other movies you might like based on ratings that have been made by all Netflix users. It is very similar to Amazon’s “people who bought X also bought Y” feature. The Netflix Prize challenge is to come up with a new prediction system that is 10% more accurate than Cinematch. Whoever does this will get the top prize of $1,000,000. Netflix is also rewarding a periodic “progress prize” of $50,000 to people who can beat the last progress prize winner by 1%. The current progress prize winner, an AT&T Labs team named BellKor, has a technique that yields a prediction improvement of 8.5% over Cinematch. Read about their technique here. Their technique makes my brain hurt — it blends together 107 different results from a large ensemble of data mining models, including neighborhood-based models (k-NN), factorization models (such as Ridge regression), regressions based on Gaussian priors, restricted Boltzmann Machines, asymmetric factor models, and regression models (using principal components analysis for feature selection, and SVD vectors as predictor variables). Basically, they packed a data mining shotgun with as much shot as they could find, and pulled the trigger. That’s a hell of a lot of work for $50,000!


Figure 1: Oh, no! Data mining engineers found out that I like terrible movies!

By comparison, Netflix’s own Cinematch algorithm uses the following techniques, as quoted in their Netflix Prize FAQ:

How does Cinematch do it?
Straightforward statistical linear models with a lot of data conditioning. But a real-world system is much more than an algorithm, and Cinematch does a lot more than just optimize for RMSE. After all, we have a website to support. In production we have to worry about system scaling and performance, and we have additional sources to data we can use to guide our recommendations.

Netflix has begun a very interesting bounty hunt. As of today, 23021 teams from 164 different countries are clamoring for the cash. I love the idea of putting up public bounties for innovation – the X-Prize comes to mind, particularly the Google-sponsored lunar X prize.

(another informative article about the Netflix Prize)

Can evolution reveal a killer’s mind?

Computer science professor Dr. Ryan Garlick of University of North Texas has a very interesting setup for his symbolic processing course this semester: Each student’s objective is to contribute towards cracking the unsolved 340-character Zodiac cipher.

From a UNT news article:

Cracking the cipher is a difficult task for more than one reason, which is why Garlick, along with his students, are currently developing several computer software techniques that will hopefully make the process far more feasible.
“There are just too many possible keys to look at them all,” Garlick said. “There are 63 different symbols, and each can represent 26 possible letters (we think), which is just too many possible combinations to evaluate them all.”
This is where the computer techniques they are fashioning will hopefully come in handy. Corey Rosemurgy, an Austin senior and computer science major in Garlick’s class, is currently developing a genetic algorithm to solve the cipher.
“The genetic algorithm that I am developing models itself after the inherent properties of biological evolution and the theory of survival of the fittest where only the strong survive,” Rosemurgy said.

I was very interested to discover this course, since I have been working on a similar approach since around March of this year. In my free time (what little there is), I’ve been running experiments using ECJ, a Java-based evolutionary computing framework. So far, my focus has been on trying to get the algorithm to solve Zodiac’s 408-character cipher, which has a known solution. Using a dictionary-oriented approach, the algorithm was able to find the correct solution using a limited 400-word dictionary. Now I am trying to improve this by adding more words to the dictionary used by the algorithm. The basic idea is to get this test case working well before attempting to have it solve the really difficult 340-character cipher. This has proven to be very difficult, because the search space (number of possible solutions) is extremely large, and there is exactly one correct solution. The needle is tiny, and the haystack is vast. Evolutionary computing methods tend to be better-suited for finding really good solutions, rather than the one best solution, so this approach is quite challenging (if not flawed).

I’m glad that many people are still working on this problem; it would be nice to finally find a solution. Still, there is still a strong possibility that there is no solution, and the cipher is just gibberish designed to keep people unnecessarily busy. If so, then the Zodiac killer succeeded beyond his wildest dreams.

More info:
article | course page (of interest here are the syllabus and powerpoint presentation) | google code page and code repository for zodiac decoder software (this is the repository of software used and developed by students in the course)

Count the Start buttons

Another evil incarnation of “turduckenology” arose yesterday at work. From my virtual Windows XP instance running inside Parallels Desktop on my Mac, I needed to make a remote desktop connection to my Windows XP desktop, which is running a Virtual PC 2007 instance of a virtual machine that itself had a remote desktop connection over a VPN to a client machine on their network:

That’s me diving through the rabbit hole of *four* instances of Windows via my Mac, so I can install our software on a client’s machine. The horror!

What really bends my noodle during this is figuring out how to copy and paste all the way up and down the chain of Windows instances. And it’s really easy to lose track of which Windows XP you are in when you are clicking around – it’s easy to run the wrong program in the wrong place!

These are the kinds of memories that will preoccupy my demented dreams when I am an old man in a nursing home.

The cold case that just won’t die

I was surprised tonight to discover that somebody posted my little Zodiac cipher webtoy on Digg recently, and it has been getting some significant traction there:

http://digg.com/playable_web_games/Can_You_Crack_The_Zodiac_Killer_s_Code

I had a lot of fun making it, and I am glad that folks are getting some use out of it. Thanks for the support! I hope we can sustain the Digg Effect!!

I’ve been getting a lot of good feature suggestions from people, such as adding frequency analysis, allowing multiple letters per symbol, and supporting cipher transpositions. I hope to make some of these improvements to the app in the near future.

Dry academia

Despite its rigorous, almost unapproachable mathematical foundations, Doug Zongker’s groundbreaking academic research paper remains one of the most important scientific studies you will ever read.

Doug Zongker himself presenting his paper:


Awe-inspiring.

Source.

I get no respect.

My online Data Mining course has a web-based forum that students can use to post questions. The course instructor asked how everybody was doing on the first homework assignment. I replied, and in my reply, I asked the instructor if I was on the right track for one of the homework problems. The subject of the reply was quite surprising:



Man, that’s cold!

Wow. Harsh. I know that these instructors are very stressed graduate students; but this can’t be good for bumping up the enrollment numbers.

Sadly, the source of the subject wasn’t as dramatic as true malice, because the subjects are simply generated from the first parts of the responses. Here is the real reason the subject was so insulting:



Oh. Not REALLY fighting words.

Technology is the culprit. One day it will enslave us all! (Wait… I think this has already happened…)

Creative computing: Pushing eyecandy around

Back in 1993, during freshman year of college, my friend Brian McEntire introduced me to the “demoscene“, which is, at its best, a group of extremely highly skilled (and often very young) computer sound/video programmers who specialize in creating dazzling presentations that run in real-time on computers. Demoscene folks spend a lot of time trying to out-program each other, showing off what kind of amazing audio and visual effects they can do with computer hardware. Demos at the time were amazing to watch. When I watch the older demos now, 14 years later, they seem very quaint and primitive.


Second Reality, a demo by Future Crew, one of the most famous demo groups back in the 1990s. This was cutting-edge realtime PC sound and graphics back then.

Some of the best demos have come out of the Assembly demoparty, an annual Finnish gathering of demoscene enthusiasts which features a demo competition. Many people enter their productions into the competition, and the winning entries are usually very high quality. The recent Assembly demoparty was held in August 2007, and I was amazed by the creative and dream-like stylistic quality of the winning demo, LifeForce, by Andromeda Software Development, a Greek demogroup.


Screenshots from LifeForce by Andromedia Software Development. Click for a larger view.

To see this production in glorious motion, download the high-quality 246MB AVI movie file via this link. It is a much better experience than watching the embedded lower-quality YouTube version below.


LifeForce demo. Youtube does not do it justice. Get the high-resolution version!!

The pure skill and creative talent needed to generate these real-time productions (the animations are NOT pre-rendered), combined with the fact that the best demo groups consist mostly of teenagers and very young adults simply doing this stuff for fun in their free time, continue to amaze me.

There are always reasons to hate computers.

So, why post another?

Well, this error I recently experienced in Eclipse was amusing:



“An internal error occurred while showing an internal error.” Eclipse’s way of saying, “Senator, I do not recollect those events.”

Real life analogy: Barb at BKA once told us about a former coworker of hers who accidentally cut off his finger on some industrial equipment at the WestVaco paper mill in Covington, Virginia. A manager wanted to know what happened, so the coworker went to the machine and demonstrated the chain of events. And lost another finger.

GECCO 2007 – quick updates

The audio from the GECCO 2007 debate between Richard Dawkins, Lewis Wolpert, and Steven Jones is now available here. Also, my little project got a brief mention in this article in New Scientist (a few paragraphs from the end). I was stoked!