I have a round based game played on a grid map with multiple units that I would like to control in some fashion using neural network (NN). All of the units are moved at once. Each unit can move in any of the grid map direction: $up$, $down$, $left$ and $right$.
So if we have $n$ units then output policy vector of NN sh...