
I miss the collaborative nature of the Tron contest where everyone basically revealed their strategy in the forum and generated better ideas. Everyone's been much more tight-lipped since then. So I'm going to reveal mine here and now.

This is what I do for combat. It's stupid, but it got me into the top 10 (together with some pretty good exploration), and it's very easy to implement. I don't know whether it corresponds directly to any existing technique, but it's loosely based on Gibbs sampling in graphical models, and on Ising models. The central assumptions are: a) the state space is so huge that you can't evaluate all possibilities, but you *can* sample them; and b) because this is a simultaneous game and we don't get to see our opponent's move before we move, the minimax strategy would be too conservative.

I'm honestly not sure whether those assumptions are true, because you can *sort of* evaluate all possible enemy moves simultaneously to some extent by defining a frontier of enemy attack strength, but that's not very accurate. And the top bots seem to be very patient and avoid making sacrifices, which might be a better strategy overall, whereas my bot is very aggressive and will often win on points even after it's been eliminated.

First (very importantly), you need to be able to provisionally move your ants and the opponent's ants anywhere, see who lives or dies, and add up some score for how good or bad you consider the result to be. This can be tricky: say player A has two ants and player B has one in a local area, and the current move proposal looks like this: A..B.A. The A on the left lives, and the other two ants die. If the first A ant moves to the right, the B ant still dies but the A on the right is resurrected. I had a bug preventing that from working, which is why I mention it. But anyway, you need a score.
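To make the A..B.A example concrete, here is a toy sketch of that kind of combat resolution. It's a hypothetical one-dimensional simplification (the attack range, positions, and the `resolve` helper are all invented for this illustration, not the actual game's 2-D rule): an ant dies if some enemy within range is itself engaged by at most as many enemies.

```python
# Toy 1-D combat resolver (a hypothetical simplification of the real
# 2-D attack-radius rule).  An ant dies if some enemy within range is
# itself engaged by at most as many enemies as that ant is.
RANGE = 2

def resolve(ants):
    """ants: list of (position, owner) pairs.
    Returns the set of indices of ants that survive combat."""
    def enemies(i):
        pos, owner = ants[i]
        return [j for j, (q, o) in enumerate(ants)
                if o != owner and abs(q - pos) <= RANGE]
    dead = {i for i in range(len(ants))
            if any(len(enemies(j)) <= len(enemies(i)) for j in enemies(i))}
    return set(range(len(ants))) - dead

# "A..B.A": the left A is out of range and lives; B and the right A
# each face one enemy, so both die.
assert resolve([(0, "A"), (3, "B"), (5, "A")]) == {0}
# Move the left A one step right: B now faces two enemies and dies,
# while both A ants survive -- the right A is "resurrected".
assert resolve([(1, "A"), (3, "B"), (5, "A")]) == {0, 2}
```

The point is just that survivorship has to be recomputed from the whole local configuration every time any ant's provisional move changes.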

The third assumption of this method is that the magnitude of your heuristic score function doesn't actually mean anything, just that moves with greater scores are better moves.

For each ant, you keep a Dirichlet distribution over its possible moves: it represents the probability that the ant moving in that direction is the ant's best move. The Dirichlet distribution (which is the multidimensional generalization of the Beta distribution) is a bunch of fancy math around a simple idea, and so if you look it up on Wikipedia, don't worry about that. Technically we're going to do something called collapsed sampling (where we simultaneously consider all possible multinomial distributions generated by the Dirichlet distribution), and all the fancy math integrates away into something totally obvious. In fact you can forget I even mentioned the D word.

Anyway, each ant starts out with Dirichlet parameters which are just the counts (1,1,1,1,1): the 1 there is the "alpha" parameter, and you can tweak it if you want. Each count corresponds to a move in one of the possible directions: no move, N, S, E, W. The higher you make alpha, the more random exploration it does. alpha=0 would be a completely greedy strategy (which might be worth trying!).
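A minimal sketch of that initialization, with hypothetical names (`DIRS`, `ALPHA`, and `new_counts` are mine, not the original bot's):

```python
DIRS = ("stay", "N", "S", "E", "W")  # one count slot per possible move
ALPHA = 1   # prior count: higher means more exploration, 0 is greedy

def new_counts():
    """Each ant starts with a uniform prior over its five moves."""
    return [ALPHA] * len(DIRS)
```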

Pick a random ant (either yours or an enemy ant). Move it in each direction (if you can, meaning no water, food, or other ant is in the way), and compute the scores for each potential move. If this is an enemy ant, negate those scores so that the enemy ant is trying to minimize your score by maximizing -score. For every move that has the greatest possible score (you could have two different moves with an equal score), increment the corresponding Dirichlet parameter. E.g. if going north is the best move for your first ant, then its parameters become (1,2,1,1,1).
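The update step above could be sketched like this. The `score_of_move` callback is an assumption standing in for "provisionally make the move and evaluate the resulting position":

```python
def update_counts(counts, legal, score_of_move, is_enemy=False):
    """One Dirichlet update for a single ant.

    counts:        per-direction counts, e.g. [1, 1, 1, 1, 1]
    legal:         indices of the moves this ant can make right now
    score_of_move: callable i -> score of the position after the ant
                   provisionally makes move i (from our point of view)
    is_enemy:      enemy ants maximize the negated score
    """
    sign = -1 if is_enemy else 1
    scores = {i: sign * score_of_move(i) for i in legal}
    best = max(scores.values())
    for i, s in scores.items():
        if s == best:
            counts[i] += 1   # every move tied for best gets a bump
    return counts
```

So with move scores (0, 5, 1, 5, 2), both tied-for-best moves are incremented and the counts become [1, 2, 1, 2, 1].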

Then, sample a move from the new distribution. To sample a single move, add up the counts for all the moves the ant can possibly make and call it sum (6 for our example ant, if it can go in any direction). Pick a random number r between 0 and sum-1 inclusive. Iterate i through the moves the ant can make, subtracting dirichlet[i] from r until r goes negative. Once it does, i is your move, so move the ant that way now and update the combat resolution.
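That sampling step, as a sketch (again with made-up names):

```python
import random

def sample_move(counts, legal):
    """Draw a move index with probability proportional to its count."""
    total = sum(counts[i] for i in legal)
    r = random.randrange(total)        # uniform integer in [0, total-1]
    for i in legal:
        r -= counts[i]
        if r < 0:
            return i                   # r went negative: i is the move
```

With counts (1,2,1,1,1) and all five moves legal, N (index 1) is drawn with probability 2/6 and each other move with probability 1/6.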

Then keep doing that for all ants randomly until you're almost out of time (or your sampling converges). The ants explore the huge state space and converge towards favorable moves on both sides. At the end of sampling, go through all your ants and pick the move with the highest total best-move count (splitting ties randomly).
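The outer loop and final selection could be sketched like this; the `resample` callback, standing in for one update-plus-sample pass on a single ant as described above, is an assumption:

```python
import random
import time

def sampling_loop(ants, deadline, resample):
    """Keep resampling random ants (ours and the enemy's) until we're
    nearly out of time; resample(ant) performs one Dirichlet update
    plus one sampled move for that ant."""
    while time.time() < deadline:
        resample(random.choice(ants))

def best_move(counts, legal):
    """Final selection: highest best-move count, ties broken randomly."""
    top = max(counts[i] for i in legal)
    return random.choice([i for i in legal if counts[i] == top])
```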

This actually works pretty well. Your ants largely do the right thing, but if you get into a big clump in front of a cautious enemy's wall of ants, they will often commit suicide accidentally. I use this, by the way, for my entire move strategy without splitting ants up into ants that are in combat and ants that aren't.

The method is also relatively insensitive to your scoring function -- some of my moves are 1e-6 better than others, some are 100 better, but "better" is always given equal weight. This could be a drawback, as it doesn't push strongly away from disastrous moves. If your scores happen to be log probability ratios, then you can just implement Gibbs sampling directly and skip all the Dirichlet stuff, but I think it's harder to come up with a good scoring function (a machine learning approach would probably work well here). That was going to be my next experiment, but I won't have time for it.

There are a few possible extensions, some of which I have implemented. One is tracking move success rates conditional on other ants' moves. Another is detecting ants which converge early and ending sampling for those ants. I've also tried tracking the best minimum score (basically implementing sampled minimax) but that doesn't really work as well.

- a1k0n
- Colonel
**Posts:** 90 **Joined:** Fri Feb 12, 2010 3:51 am

Fascinating - thanks very much for sharing it!

- Scryer
- Colonel
**Posts:** 72 **Joined:** Wed Nov 09, 2011 5:40 pm

- Darhuuk
- Colonel
**Posts:** 71 **Joined:** Wed Nov 16, 2011 12:58 pm

Intriguing, I have a second codebase I dabbled with using , which is a simplified idea of what you have done - but I never managed to get it to behave properly and kind of gave up on it. Perhaps I should revisit it in the copious remaining time!

- Zaph
- Colonel
**Posts:** 78 **Joined:** Sun Sep 05, 2010 9:00 pm **Location:** Melbourne, Australia

Thank you a1k0n for posting this. I understand maybe 50%, but if it got you into the top ten, I'd venture to say it's quite a good algorithm.

One thing that comes to mind, you program in C++. How would this method perform with interpreted languages like Python, which are, let's say, 30 times slower?

Also, I'm not sure about updating the distribution for each ant - does it happen once, and then you only sample? Or do you check the best moves at each stage - but then, where's the sampling?

- agent_smith
- Colonel
**Posts:** 54 **Joined:** Mon Nov 28, 2011 2:28 pm

- a1k0n
- Colonel
**Posts:** 90 **Joined:** Fri Feb 12, 2010 3:51 am

- Darhuuk
- Colonel
**Posts:** 71 **Joined:** Wed Nov 16, 2011 12:58 pm

Many thanks, a1k0n! It didn't take me more than 2 hours to implement your algorithm, and now my ants finally make beautiful, coordinated attacks.

A few days ago I had actually planned to implement a "2-pass scoring" algorithm, where during the first scoring pass each ant would assume all other ants wouldn't move, then score all possible movements for this ant, thus yielding the "desired location" for each ant. During the second pass, all ants' movements would be scored again, but this time taking the "desired locations" into account. I realize now that this is basically what your algorithm does, except yours works better and is easier to understand. In my ignorance, I didn't even consider scoring the enemies' ants' moves, nor did the concept of random sampling/monte carlo enter my mind...

What surprised me is the speed of your algorithm: even my preliminary, completely unoptimized prototype already manages several thousand samples per turn (in C#).

Again, many thanks - you've definitely saved me lots of headaches and sleepless nights.

- gnu264
- Captain
**Posts:** 23 **Joined:** Tue Nov 08, 2011 5:27 pm

- Bolukan
- Cadet
**Posts:** 4 **Joined:** Fri Nov 04, 2011 11:55 pm
