@jonbaer Nice work with that organization - it looks like it has forked a lot of relevant repos! TensorForce and OpenAI Baselines would definitely be useful for trying out some of the SOTA reinforcement learning algorithms in 0 AD.
The size of the state representation certainly comes up when exploring things like imitation learning. I actually have another fork of 0 AD that I use simply to log the game state, and the replays can get decently large (between 1 and 10 MB, iirc). Of course, converting them into a specific state representation would likely make them much more compact (depending on the representation).
As far as a 0 AD gym goes, there are actually two in the link I posted above (https://github.com/brollb/simple-0ad-example). In that repo, I created two different 0 AD gym environments for learning to kite.

The first uses a very simple state space: the game state is represented simply as the distance to the nearest enemy unit. As a result, the RL agent is able to learn to play it fairly quickly (although this representation would be insufficient if the enemy units used more sophisticated tactics than a simple deathball).

The second uses a richer state representation: the state is a simplified minimap centered on the player's units. This requires the RL agent to essentially "learn" to compute the distance, as it isn't already preprocessed. Although this will be harder to learn, the representation could actually capture concepts like the enemy flanking the player, the edge of the map (or other impassable regions), etc. This type of state representation also makes it possible to give the agent a more fine-grained action space. (In the example, the RL agent can pick between 2 actions: attack the nearest enemy unit or retreat. With a minimap, the action space could actually include directional movement.)
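To make the first environment concrete, here's a rough sketch of what it looks like as a gym environment. The real code in the repo talks to a running 0 AD instance; the dynamics, reward values, and class/attribute names below are simplified stand-ins just to show the shape of the distance-based state space and the 2-action space:

```python
import numpy as np
import gym
from gym import spaces

class SimpleKiteEnv(gym.Env):
    """Sketch of the distance-based kiting environment.

    The actual environment queries a running 0 AD instance; here the
    game is mocked with trivial 1-D dynamics so the sketch is
    self-contained and runnable.
    """

    ATTACK, RETREAT = 0, 1

    def __init__(self, max_steps=200):
        super().__init__()
        # State space: distance to the nearest enemy unit (a single scalar).
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(1,), dtype=np.float32)
        # Action space: attack the nearest enemy unit, or retreat.
        self.action_space = spaces.Discrete(2)
        self.max_steps = max_steps

    def reset(self):
        self.distance = np.random.uniform(20.0, 40.0)  # enemy starts at range
        self.steps = 0
        return np.array([self.distance], dtype=np.float32)

    def step(self, action):
        # Mock dynamics: the enemy always closes distance (deathball);
        # retreating opens the gap, attacking holds position.
        if action == self.RETREAT:
            self.distance += 2.0
        self.distance = max(self.distance - 1.5, 0.0)  # enemy advance

        # Mock reward: damage dealt when attacking in range,
        # penalty for letting melee units reach you.
        in_range = 5.0 <= self.distance <= 30.0
        reward = 1.0 if (action == self.ATTACK and in_range) else 0.0
        if self.distance < 5.0:
            reward -= 1.0

        self.steps += 1
        done = self.steps >= self.max_steps
        return np.array([self.distance], dtype=np.float32), reward, done, {}
```

The minimap version would swap the Box observation for an image-like array (e.g. a 2-D grid of unit occupancy) and let the agent discover spatial relationships itself, which is exactly why it is slower to learn but more expressive.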
That said, I am not convinced that there is a single state/action space representation that makes sense for all of 0 AD, given the customizability of maps, players, civs, goals, etc, and the trade-offs between learnability and representational power. Since I don't think such a representation exists, I prefer providing a generic Python API for playing the game, from which OpenAI gym environments can easily be created by specifying the desired state/action space for the given scenario.
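To illustrate what I mean, here's a rough sketch of how that layering could look. The client class and its method names below are placeholders, not the actual API; the point is just that the gym environment becomes a thin, scenario-specific layer (state encoding, action decoding, reward) on top of a generic game-playing client:

```python
import numpy as np
import gym
from gym import spaces

class GameClient:
    """Placeholder for a generic 0 AD Python API. A real client would
    send commands to a running game instance and return the raw game
    state; these method names are illustrative only."""

    def reset_scenario(self, scenario):
        return {"units": [], "enemies": []}  # raw game state (stubbed)

    def apply(self, commands):
        return {"units": [], "enemies": []}  # raw game state (stubbed)

def encode_state(raw_state):
    """Scenario-specific: reduce the raw game state to the chosen
    observation (distance to nearest enemy, a minimap, etc.)."""
    return np.zeros(1, dtype=np.float32)

def decode_action(action):
    """Scenario-specific: map a discrete action index to game commands."""
    return [{"type": "attack-nearest"}] if action == 0 else [{"type": "retreat"}]

class ScenarioEnv(gym.Env):
    """A gym environment is just the generic client plus a
    scenario-specific choice of state/action spaces and reward."""

    def __init__(self, client, scenario="kite"):
        self.client = client
        self.scenario = scenario
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self):
        raw = self.client.reset_scenario(self.scenario)
        return encode_state(raw)

    def step(self, action):
        raw = self.client.apply(decode_action(action))
        obs = encode_state(raw)
        reward, done = 0.0, False  # scenario-specific reward logic goes here
        return obs, reward, done, {}
```

With this split, supporting a new scenario (or a richer state/action space for an existing one) only means writing a new encode/decode pair and reward function; the game-playing API itself stays unchanged.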