AI Challenge Forums • View topic - Rankings and cutoff based on mu


Rankings and cutoff based on mu

Ideas for the Future

Rankings and cutoff based on mu

Postby fourmidable » Wed Dec 21, 2011 1:11 am

I believe the fairest way to proceed with the ranking and the cutoff of the bots is to use mu instead of skill.

Let me explain why this makes sense. First, I'd like to briefly recap the TrueSkill algorithm for those who are unfamiliar with it; I got a pretty good look at how it works as part of Kaggle's Deloitte/FIDE Chess Rating Challenge. TrueSkill is essentially a Bayesian estimator of a player's strength. It is superior to simpler rating systems like Elo in two ways: first, it supports teams and multiplayer games, and second, it converges faster than Elo. A large part of TrueSkill's advantage is that it tracks both the strength (mu) and the uncertainty (sigma) of each participant: a player's strength is modeled as a Gaussian (normal) distribution with mean mu and standard deviation sigma.
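To make the Gaussian model concrete: if two players' strengths are independent Gaussians, the chance that A is really stronger than B follows from the distribution of their difference. Below is a minimal Python sketch assuming independence; the bot numbers are hypothetical, and the result is a probability about underlying strength, not about winning a single game (TrueSkill adds per-game performance noise on top of that).

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_stronger(mu_a: float, sigma_a: float, mu_b: float, sigma_b: float) -> float:
    """P(strength_A > strength_B) for independent Gaussian strengths.

    strength_A - strength_B is Gaussian with mean mu_a - mu_b and
    variance sigma_a**2 + sigma_b**2.
    """
    return normal_cdf((mu_a - mu_b) / math.sqrt(sigma_a**2 + sigma_b**2))


# Hypothetical bots: equal means give a coin flip regardless of sigma.
print(prob_stronger(60.0, 1.3, 60.0, 5.0))
# A slightly higher mean with a larger sigma is only moderately convincing.
print(prob_stronger(62.0, 4.0, 60.0, 1.3))
```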

In Ants, mu is initialized to 50 and sigma to 50/3 ≈ 16.67. From a Bayesian perspective, this is the prior distribution. After each game, TrueSkill compares the outcome of the game with the expected result and computes an updated mu and sigma (the posterior) for each player. The research paper reports that ratings typically converge well within 20 games.
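A minimal sketch of the two-player TrueSkill update without draws, in Python. Assumptions: only two players (Ants games could have more), no draws, no dynamics term, and beta = sigma0/2, which is the usual TrueSkill default convention rather than a confirmed Ants setting. `update_win` is a hypothetical helper name.

```python
import math

# Prior from the post; BETA follows the usual TrueSkill convention
# beta = sigma0 / 2 and is an assumption, not a confirmed Ants setting.
MU0 = 50.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2


def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cdf(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_win(winner, loser, beta=BETA):
    """Hypothetical helper: one two-player TrueSkill update, no draws, no tau.

    winner and loser are (mu, sigma) tuples; returns the updated tuples.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * beta**2 + s_w**2 + s_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)  # mean shift: larger when the result is a surprise
    w = v * (v + t)      # variance shrink factor, between 0 and 1
    new_winner = (mu_w + s_w**2 / c * v,
                  s_w * math.sqrt(1 - s_w**2 / c**2 * w))
    new_loser = (mu_l - s_l**2 / c * v,
                 s_l * math.sqrt(1 - s_l**2 / c**2 * w))
    return new_winner, new_loser


# Two fresh submissions play once; the first one wins.
a, b = update_win((MU0, SIGMA0), (MU0, SIGMA0))
print(a, b)
```

With equal priors the winner gains exactly what the loser gives up and both sigmas shrink; an upset (a low mu beating a high mu) moves the means further than an expected result does.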

Now, the Ants leaderboard is not sorted by the program strength estimate (mu), but by a metric called 'skill', defined as mu - 3*sigma. Notice that a new submission gets a skill of zero (50 - 3 × 50/3 = 0). As games are played, the uncertainty in strength decreases, reducing sigma, so 'skill' typically increases (unless mu drops precipitously). Skill is nice for maintaining a leaderboard: rough initial mu estimates don't suddenly appear at the top, and a program has to rise slowly through the ranks over time. For the finals, however, this property is undesirable.
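A quick sketch of how sorting by skill and sorting by mu can disagree. The two bots and their numbers are made up for illustration.

```python
def skill(mu: float, sigma: float) -> float:
    """Leaderboard 'skill' as described above: mu - 3 * sigma."""
    return mu - 3 * sigma


# Hypothetical bots: "veteran" has played many games (low sigma),
# "newcomer" has a higher mean estimate but fewer games (higher sigma).
bots = {"veteran": (60.0, 1.3), "newcomer": (62.0, 4.0)}

by_skill = sorted(bots, key=lambda name: skill(*bots[name]), reverse=True)
by_mu = sorted(bots, key=lambda name: bots[name][0], reverse=True)

print(skill(50.0, 50.0 / 3))  # a fresh submission: about 0
print(by_skill)  # ['veteran', 'newcomer']
print(by_mu)     # ['newcomer', 'veteran']
```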

The pairing algorithm favors adding players with low sigma to other players' games. So as more games are played, sigma becomes lower and even more games are played, which improves 'skill' but not 'mu' as much. Comparing 'skill' between players who have played different numbers of games is unfair; comparing 'mu' is the better comparison. Remember that 'mu' is really the statistically most probable estimate of the player's strength, and it is much less sensitive to the number of games played. The priors still introduce a small bias, negative for top players and positive for bottom players, but this effect should be small once 10 games have been played. Skill is an artificial measure designed to hold back new submissions; with its one-sided 3-sigma bound, it essentially says there is about a 99.87% chance that the player's true strength is above this value.
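The 3-sigma figure can be checked against the standard normal CDF: skill = mu - 3*sigma sits three standard deviations below the mean, so the one-sided probability mass above it is Phi(3). A small Python check, also showing k = 1 and k = 2 for comparison:

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# Probability that the true strength lies above mu - k * sigma (one-sided).
for k in (1, 2, 3):
    print(k, f"{normal_cdf(k):.5f}")

p_above = normal_cdf(3.0)
```

For k = 3 this gives about 0.99865, i.e. roughly 99.87%.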

For these reasons, I suggest switching to a mu-based ranking for the finals.
fourmidable
Cadet
 
Posts: 8
Joined: Sun Oct 30, 2011 11:55 am

Re: Rankings and cutoff based on mu

Postby BenJackson » Wed Dec 21, 2011 4:43 am

I agree. I was/am hoping the sigmas will converge in the final-finals to the point where it won't matter. However when the "cutoff" starts any bot that gets ejected past the cutoff will have a hard time getting back in because outside the cutoff your sigma is frozen.
BenJackson
Colonel
 
Posts: 94
Joined: Sat Oct 29, 2011 4:16 am

Re: Rankings and cutoff based on mu

Postby tictac » Wed Dec 21, 2011 7:36 am

I agree too.
I am worried about convergence if the finals stop on Thursday as advertised: the top 100 is at least starting to contain the good bots, but it's far, far from converged IMHO. Quite a few bots' ranks are probably off by more than 10 (mine, for example, is above its true rank, probably because it has a lot of games and thus a low sigma), and the number of games differs a lot across bots. So to get a fair sigma you want to equalize the number of games, but even that is not totally fair: a bot that played many of its matches early probably faced a much more random set of opponents than bots that will play their missing games later (unless the sigma reduction takes the other players' sigmas into account?). In addition, luck plays a bigger role in Ants than in games like chess, which inherently increases the sigma of all bots.
tictac
Lieutenant
 
Posts: 18
Joined: Wed Nov 30, 2011 7:47 pm

Re: Rankings and cutoff based on mu

Postby tmc » Wed Dec 21, 2011 7:47 am

Completely agree; I wanted to make exactly the same argument, so thanks for sparing me the trouble :)
tmc
Brigadier-General
 
Posts: 101
Joined: Fri Oct 28, 2011 8:42 am

Re: Rankings and cutoff based on mu

Postby McLeopold » Wed Dec 21, 2011 6:07 pm

McLeopold
Contest Organizer
 
Posts: 262
Joined: Sun Sep 19, 2010 3:31 am

Re: Rankings and cutoff based on mu

Postby amstan » Wed Dec 21, 2011 6:18 pm

From what I discussed with janzert:
Sigma is not exactly the same as game count. Game count does matter, but we hope we can get the counts to converge. Sigma offers a more important metric, which is consistency. If two players are racing for a rank and have the same mu, you would want the more consistent bot to take the lead over one that might lose once in a while.

Related to convergence, this is the range of sigmas that we have so far:
http://pastebin.com/hmEaVUtq
Alexandru M. Stan
Contest Organizer
amstan
Contest Organizer
 
Posts: 691
Joined: Sun Jan 31, 2010 4:02 am
Location: Stoney Creek, Ontario

Re: Rankings and cutoff based on mu

Postby Parasprites » Wed Dec 21, 2011 6:50 pm

Are the finals really going to end in two days? Because I doubt they will have converged completely by then. Admittedly, if all you want to know is 1st place, you could stop it now, but the rankings of everyone below Xathis are still highly variable.
Parasprites
Major-General
 
Posts: 224
Joined: Mon Oct 24, 2011 3:08 pm

Re: Rankings and cutoff based on mu

Postby analyst74 » Thu Dec 22, 2011 4:29 am

If my super-unscientific observation holds, don't all players' sigmas converge to 1.30 at some point after 50-100 games?
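One plausible mechanism for sigma leveling off instead of shrinking toward zero is TrueSkill's dynamics term tau, which adds tau² to each player's variance before every game. A rough Python sketch, with every parameter value assumed (the actual Ants server settings are not given in this thread) and every game assumed to be against an evenly matched opponent. It shows sigma reaching a positive floor; it is not meant to reproduce the 1.30 figure.

```python
import math

# All parameter values below are assumptions for illustration only.
SIGMA0 = 50.0 / 3
BETA = SIGMA0 / 2      # performance noise (TrueSkill convention beta = sigma0 / 2)
TAU = SIGMA0 / 100     # dynamics: variance TAU**2 added before every game
SIGMA_OPP = 1.3        # hypothetical: opponents are already well established


def next_sigma(sigma: float) -> float:
    """Sigma after one game against an evenly matched opponent (t = 0).

    At t = 0 the TrueSkill variance factor is w = v**2 = 2 / pi,
    regardless of who wins.
    """
    s2 = sigma**2 + TAU**2                  # dynamics inflate the variance first
    c2 = 2 * BETA**2 + s2 + SIGMA_OPP**2
    w = 2 / math.pi
    return math.sqrt(s2 * (1 - s2 / c2 * w))


history = [SIGMA0]
for _ in range(2000):
    history.append(next_sigma(history[-1]))

for games in (10, 50, 100, 2000):
    print(games, round(history[games], 3))
```

Without the tau term the same loop would keep shrinking sigma toward zero; with it, each game's information gain is eventually balanced by the added variance.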
analyst74
Major
 
Posts: 39
Joined: Wed Feb 17, 2010 7:45 pm

Re: Rankings and cutoff based on mu

Postby amstan » Thu Dec 22, 2011 4:41 am

Alexandru M. Stan
Contest Organizer
amstan
Contest Organizer
 
Posts: 691
Joined: Sun Jan 31, 2010 4:02 am
Location: Stoney Creek, Ontario

