by krokkrok » Wed Oct 13, 2010 3:16 am
One of the problems with most of the games people use to measure AIs is that the outcome of a contest is subjective. The relation that determines the winner is not transitive, so the ordering is not stable when new participants are added, and the order in which games are played affects the outcome.
Consider the simple case where A beats B, B beats C, but C beats A, all the time. If not all possible games are played, the sequence A plays B, then B plays C will rate C the loser simply because it played last. A statistical sample would most likely declare a winner and a loser where no such relationship actually exists.
Next, think about what happens even in the case where all possible outcomes of player vs. player on all possible maps are actually counted. The way we determine the winners is to sort the players by number of wins. If this were a good objective metric, we would expect that adding one more player to the mix would slot that player in at his level of ability, and the relative positions of the other players would stay the same or shift down by at most one. But this is not the case with head-to-head games like Galcon: consider a bot that can beat the top-rated player 100% of the time and every other player 0% of the time. When this bot is inserted into the list, the top-rated player suddenly has losses equal to the number of games played against the new bot, while the other bots pick up wins against it, and the order is unstable. This metric is relative to the exact set of players entered; it is not an objective relationship.
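Here is a minimal sketch of that effect in Python (the players, outcomes, and game counts are all made up): a round-robin where every pairing has a deterministic winner, and a "spoiler" bot beats only the current champion.

[code]
# A sketch with hypothetical players: rank by total wins in a
# round-robin, before and after a spoiler bot joins. Assumes every
# pair of players has exactly one winning direction listed in BEATS.
from itertools import combinations

BEATS = {('champ', 'p2'), ('champ', 'p3'), ('p2', 'p3')}

def rank(players, beats, games_per_pair=10):
    wins = {p: 0 for p in players}
    for a, b in combinations(players, 2):
        winner = a if (a, b) in beats else b
        wins[winner] += games_per_pair
    return sorted(wins.items(), key=lambda kv: -kv[1])

print(rank(['champ', 'p2', 'p3'], BEATS))
# [('champ', 20), ('p2', 10), ('p3', 0)]

# The spoiler beats champ and loses to everyone else.
BEATS |= {('spoiler', 'champ'), ('p2', 'spoiler'), ('p3', 'spoiler')}
print(rank(['champ', 'p2', 'p3', 'spoiler'], BEATS))
# [('champ', 20), ('p2', 20), ('p3', 10), ('spoiler', 10)]
[/code]

champ falls from sole first place into a tie even though nothing about champ's play changed; only the field did.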
TLDR: ranking players in a non-transitive game is like using a faulty compare function in a sort routine.
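To make the analogy concrete, here is a small Python sketch (player names invented) that hands a rock-paper-scissors-style comparator to a real sort routine. The "ranking" that comes back can depend entirely on the order in which the players happen to be compared.

[code]
# Sorting with a non-transitive comparator: A beats B, B beats C,
# C beats A, as in the cycle above.
from functools import cmp_to_key

BEATS = {('A', 'B'), ('B', 'C'), ('C', 'A')}

def compare(x, y):
    # -1 means x ranks ahead of y. This violates the transitivity
    # that every sort algorithm assumes of its compare function.
    if (x, y) in BEATS:
        return -1
    if (y, x) in BEATS:
        return 1
    return 0

for order in (['A', 'B', 'C'], ['B', 'C', 'A'], ['C', 'A', 'B']):
    print(order, '->', sorted(order, key=cmp_to_key(compare)))
# Different input orders can come back as different "rankings",
# because no consistent total order exists for the sort to find.
[/code]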
An example of an objective measurement is a foot race. Even though it seems that the runners are competing against one another, they are actually competing against a fixed metric: shortest time. If you run the race again with an additional entrant, that entrant lands in his proper place and slower runners each move down by only one rank: the sorting is stable.
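A sketch of the foot-race case (the times are invented): each entrant is scored against the fixed yardstick, so a new runner only pushes down the runners he is actually faster than.

[code]
# Ranking against a fixed metric (finish time in seconds) instead of
# head-to-head results.
times = {'ann': 9.8, 'bob': 10.1, 'carol': 10.4}
print(sorted(times, key=times.get))   # ['ann', 'bob', 'carol']

times['dave'] = 10.2                  # a new entrant joins
print(sorted(times, key=times.get))   # ['ann', 'bob', 'dave', 'carol']
# The relative order of the original runners is unchanged, because
# each score is independent of who else is in the field.
[/code]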
Indeed, many familiar competitions like chess and the NBA fall into the unstable category. But head-to-head play has been the standard way to compete; it's a tradition, it's the way it's always been done.
But this is monkey logic. Literally monkey logic. Monkeys are about dominance, even when no such relationship exists. Someone has to be on top, even if it's just a random someone.
It is instructive, I think, to examine the kinds of cheating that are possible in the unstable system: players can gain a significant advantage by colluding with one another. B deliberately loses to A to give A an artificial lead in the standings. That this is even possible should be a red flag; unethical behavior can be confused with intelligence.
Thus, I propose that future AI challenges require the outcome of the game to be objective and transitive.
edit: It looks like the only game in your list that meets this criterion is the maze solver.