An MLP is nothing more than a function approximator, not a magical solution to any random problem. You need to give some more thought to what function you actually want to approximate. Simply mapping the grid as input to a north/south/east/west/stay output is not going to be very fruitful. The raw grid alone doesn't carry enough relevant information to produce a good move; you need to provide other state information as well.
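As a rough illustration of what "other state information" could mean, here is a sketch of an input vector that combines a local grid view with a few hand-picked features. The feature names (`my_health`, `enemy_distance`, `ammo_fraction`) are hypothetical, just to show the idea:

```python
# Hypothetical sketch: building a richer MLP input vector.
# The extra features below are made up for illustration only.

def build_input(grid_window, my_health, enemy_distance, ammo_fraction):
    """Flatten a local grid view and append extra state features."""
    features = [float(cell) for row in grid_window for cell in row]  # local view
    features += [my_health, enemy_distance, ammo_fraction]           # extra state
    return features

# 3x3 window around the bot plus three scalar features -> 12 inputs
x = build_input([[0, 1, 0], [0, 0, 1], [1, 0, 0]], 0.8, 0.25, 0.5)
print(len(x))
```

The point is only that the network sees a fixed-length vector whose components are relevant to the decision, not the raw board dumped in wholesale.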
Also, I don't believe it's such a good idea to use the top-level bots for training data. They are simply not good enough.

A better idea would be either to generate sample battles via brute-force exhaustive search or, even more fun, to run your bot against itself in an evolutionary manner. Both methods will give you more training data than you can handle.

Even if backprop is quite slow, it's still my favorite. Each time I try something else (rprop, PSO, ...) I always revert back to backprop in the end. It might be slow, but I believe it's quite stable compared to many of the faster algorithms. It's not too complicated to implement a massively parallel backprop using CUDA if you really want to, and then you don't have to worry about learning speed any more (100-1000x the speed of running on a CPU).
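For reference, plain backprop on a single hidden layer fits in a few lines of NumPy. This is a sketch, not a tuned implementation: XOR stands in for your real training data, and the layer sizes, learning rate, and iteration count are arbitrary choices:

```python
# Minimal single-hidden-layer MLP trained with plain batch backprop.
# Sketch only: XOR is a stand-in dataset, hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training set: 2 inputs -> 1 output (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 1.0

for _ in range(15000):
    # forward pass
    H = sigmoid(X @ W1 + b1)
    O = sigmoid(H @ W2 + b2)
    # backward pass (squared-error loss, sigmoid derivative o*(1-o))
    dO = (O - Y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dO; b2 -= lr * dO.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(O).ravel())  # predictions after training
```

The CUDA version is essentially the same math with the matrix products moved to the GPU, which is where the 100-1000x comes from: the forward and backward passes are almost entirely matrix multiplications.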
Experiment with different net sizes and numbers of hidden layers. Even if, in theory, any function can be implemented using a single hidden layer, the economics are different: you may be able to reduce the total number of weights by using several smaller hidden layers instead of one large one. And you don't have to worry about overtraining here; if you find that's a problem, your training set is too small. I would say your problem is the opposite: it will be hard to get the MLP to converge at all, especially if the function you want to approximate is suboptimal in the first place.
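To make the weight-count argument concrete, here is a quick back-of-the-envelope comparison. The layer sizes are made up for illustration:

```python
# Illustrative weight-count comparison for fully connected nets.
# Layer sizes are arbitrary examples, not recommendations.

def weight_count(layers):
    """Total parameters (weights + biases) for a fully connected net."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))

wide = weight_count([50, 200, 5])      # one large hidden layer of 200
deep = weight_count([50, 40, 30, 5])   # two smaller hidden layers

print(wide, deep)
```

With these sizes the single wide hidden layer costs 11205 parameters while the two smaller layers cost 3425, roughly a third, for comparable expressive power in many practical cases.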