irishninja

Machine Learning in 0 AD


Hello everyone,

I have been interested in making it possible to explore applications of machine learning in 0 AD (as some of you may have gathered from https://trac.wildfiregames.com/ticket/5548 :) ). I realized that I haven't explained my interest and motivation very thoroughly, so I figured I would do so here and see what everyone thinks!

tl;dr - At a high level, I think that adding an OpenAI gym-like interface* could be a cool addition to 0 AD that would benefit both 0 AD (technically and in terms of publicity) and the machine learning and AI research community. I go into the specifics below and discuss other potential avenues for integrating/leveraging machine learning:

Potential Machine Learning Problems/Applications

  1. Intelligent unit control (micromanagement)
    1. I have an example where an AI learns to kite with cavalry archers when fighting infantry at https://github.com/brollb/simple-0ad-example. This is probably one of the easiest problems to explore as it can be done progressively, starting with small, clearly defined scenarios using the functionality added in the aforementioned ticket. That said, some of the standard machine learning challenges are still present, such as ensuring that the AI has been trained on sufficiently diverse scenarios so that it doesn't encounter something new and behave incorrectly.
    2. As far as potential impact on the game, automatic micromanagement could be interesting either as a component in an otherwise scripted AI such as Petra or as a way to make units more intelligent as they gain experience. That is, I could imagine that as units gain more experience, they could automatically start exhibiting improved tactical behavior, such as kiting.
  2. Enemy AI Trained Entirely with Reinforcement Learning
    1. This is actually very difficult, although it has recently been done in StarCraft 2 (https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii). Although I think this could be fun for people to try, I wouldn't have high expectations on this front for a while because it is a very hard problem for ML to solve - especially given the large number of different civilizations, maps, resource types, etc.
  3. Enemy AI with Scripting and Learned Components
    1. This is a generic version of what I mentioned under "intelligent unit control". Essentially, there are a lot of opportunities to incorporate learned components into an otherwise scripted AI. From a technical perspective, this makes the machine learning problem much more tractable while still enabling more intelligent behavior from the built-in AI.
    2. There are many different examples of intelligent components that could be incorporated. For example, it could try to predict the outcome of a battle (to determine if we should retreat) or try to imitate various high-level human strategies (such as predicting what a human might target for an attack).
  4. Quantitative Game Balancing
    1. This is a very interesting problem and I find 0 AD to be a unique opportunity for exploring it. Essentially, the idea is that a game has many different parameters (such as attack damage for each unit) which are quite difficult to tune without making the game imbalanced and one of the civilizations/strategies overpowered. (I don't think I need an example for this community, but I enjoyed watching https://www.gdcvault.com/play/1012211/Design-in-Detail-Changing-the.) This problem is nontrivial since detecting overpowered strategies really requires an understanding of the ways various aspects of the game can be exploited.
    2. Although this is a nontrivial problem, I find it to be an exciting opportunity for 0 AD to gain publicity and for researchers to have a sandbox in which they can explore this research question in an actual game (rather than a trivial, toy environment). That is, many of the other environments used in reinforcement learning research are either open source toy environments (eg, CartPole) or proprietary games which cannot be modified (eg, StarCraft 2). There has been a bit of related research on detecting imbalance in complex games like StarCraft 2, as well as on balancing simpler games, but since proprietary games will not expose the parameters used for units (and other aspects of the game), automatic game balancing approaches are limited. Being an open source game that people actually play, 0 AD provides a really exciting opportunity for research in this direction: the parameters of the game are not proprietary and can be modified programmatically, enabling researchers to explore this rather complex problem. For the 0 AD community, enabling researchers to conduct this type of research in the game itself should make it much easier to incorporate the results of such research back into the game, making 0 AD an even better game!
  5. Imitation Learning
    1. Training the AI to imitate humans is worth mentioning, although the impact on the game is likely to be in one of the aforementioned ways. Imitation learning, unlike reinforcement learning, trains the AI using expert demonstrations of gameplay. It is often used as a method for essentially initializing the AI to something reasonable before training it further with reinforcement learning (ie, training the AI using a reward rather than examples). Imitation learning can arguably be more valuable for game development given that it can more directly instill various human-like behaviors (hopefully making the gameplay more engaging and interesting) rather than simply trying to maximize some reward or score in the game.
  6. Techniques to Train and Understand AI Agents
    1. This is more of a general research direction that I find interesting (and is similar to research that I have done in the past). Essentially, this is exploring the means by which the game developer can use the various methods of instilling behavior into an AI (programming, reinforcement learning, imitation learning) to create the desired behavior (and game experience). This is a bit of both a human-computer interaction (HCI) and machine learning question (also related to machine teaching). To give a more concrete example, this would include exploring the behavior of a trained RL agent in the game, correcting these behaviors, and perhaps trying to detect potentially incorrect behaviors to raise to the user automatically. 0 AD is well suited for this type of research for the same reasons that it is well suited for exploring game balance - most games used in research are either proprietary or not something people would actually play.
  7. Optimizing Existing Game Parameters (Relatively Easy)
    1. There are some existing machine learning tricks that could be used to make other sorts of improvements to the game rather than explore research questions. A while back, I was playing around with CMA-ES (a machine learning technique to optimize a set of parameters given a "fitness function") to improve some of the magic numbers used within Petra, such as "popPhase2" and "armyMergeSize". Essentially, this made it possible to find values for these parameters which improve the AI's ability to win when playing against the standard Petra agent (on the hardest difficulty). Although I don't find this as interesting as the other areas, it is a useful tool that could be nice to apply to other aspects of the game.
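
To make item 7 concrete: CMA-ES itself is available off the shelf via the `cma` Python package, but the shape of the loop can be shown with a bare-bones (1+1) evolution strategy using only the standard library. The fitness function below is a stand-in (in practice it would be something like the win rate of the tuned Petra against stock Petra over several matches), and the "optimal" values are made up for illustration:

```python
import random

def fitness(params):
    # Stand-in for "win rate vs. stock Petra over N matches".
    # Here we pretend the best values are popPhase2=80, armyMergeSize=15.
    return -((params["popPhase2"] - 80) ** 2
             + (params["armyMergeSize"] - 15) ** 2)

def one_plus_one_es(params, sigma=5.0, iterations=200, seed=0):
    """Minimal (1+1) evolution strategy: mutate the current best
    candidate with Gaussian noise and keep the mutant if it is better."""
    rng = random.Random(seed)
    best, best_fit = dict(params), fitness(params)
    for _ in range(iterations):
        cand = {k: v + rng.gauss(0, sigma) for k, v in best.items()}
        f = fitness(cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best

tuned = one_plus_one_es({"popPhase2": 60.0, "armyMergeSize": 10.0})
```

The same loop applies to item 4 (game balance): swap the fitness function for a measure of imbalance, such as the win-rate gap between two civilizations under self-play.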

Overall, I think it would be really exciting to be able to explore some of these research questions in 0 AD, as it could be beneficial to researchers and would also make it easier to incorporate the results of this research into 0 AD (making it an even better game!). Of course, this is only true if the functionality that would need to be added to 0 AD is easy to maintain and doesn't add overhead that takes away from the development of core game features and functionality. I am also hopeful that incorporating some of these machine learning capabilities could be beneficial to the community and raise awareness of 0 AD!

As far as technical requirements, I made an RPC interface for controlling the AI from Python (because the majority of machine learning tools are in Python). This makes it possible to explore 1, 2, and 3, and provides necessary functionality for 4, 5, and 6. As mentioned above, I have an example of #1 on GitHub and I think this could make for really interesting undergraduate projects (as well as potentially interesting integrations into the game). However, I think 0 AD presents a unique opportunity for exploration of 4 and 6. Game balancing (#4) still requires the ability to programmatically edit the unit parameters, which I have explored a little bit but haven't added to the game. If this is something that others find interesting (and wouldn't mind me asking a few questions :) ), I would be open to adding this as well.

Anyway, I find these machine learning problems and applications quite exciting both for 0 AD and for AI/ML research, but I want to know what the rest of the community thinks! Let me know what you think or if you have any questions/comments! :)

* I say *OpenAI gym-like* because a gym environment requires an observation space (a numerical representation of the world for the AI), an action space (a numerical representation of the actions the AI can perform), and a reward function to be defined. It isn't clear what the most appropriate choices for these would be (and they could vary based on the specific scenario), so I would prefer making more of a "meta-gym": basically an OpenAI gym that requires the user to specify these values.
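
As a sketch of that "meta-gym" idea (the class and the game-backend interface below are hypothetical, not the actual zero_ad API), the environment would accept the observation, action, and reward mappings as parameters:

```python
class MetaGymEnv:
    """Gym-style environment whose observation, action, and reward
    mappings are supplied by the user rather than fixed up front.
    (Sketch only -- a real version would wrap the zero_ad RPC client
    and subclass gym.Env.)"""

    def __init__(self, game, observe_fn, apply_action_fn, reward_fn):
        self.game = game                     # exposes reset() / step(cmds)
        self.observe = observe_fn            # raw game state -> observation
        self.apply_action = apply_action_fn  # (action, prev state) -> commands
        self.reward = reward_fn              # (prev state, state) -> float

    def reset(self):
        state = self.game.reset()
        self.prev_state = state
        return self.observe(state)

    def step(self, action):
        state = self.game.step(self.apply_action(action, self.prev_state))
        r = self.reward(self.prev_state, state)
        self.prev_state = state
        return self.observe(state), r, state.get("done", False), {}
```

A real implementation would also declare `observation_space`/`action_space`, but the parameterization is the point: the same wrapper serves a kiting scenario, a build-order scenario, etc., with different functions plugged in.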



I actually started a small repo area a while back (after I found the Hannibal 0AD bot): https://github.com/0ad4ai ... but I think the problem he had was that releases were moving so quickly it was hard to nail down a solid externalized interface. I think the minigames are a great place to start, but I tend to think my ideas are broken up too narrowly (market production, for example); many of these can obviously be related to techniques already used in StarCraft play, like resource management. I did not bail on these ideas, but I found a "simpler" version easier to work with, so I have a built copy which is smaller. Even then, I moved on to MicroRTS for quicker implementation of ideas: https://github.com/santiontanon/microrts ... there is an OpenAI gym for it, but likewise the issue always seems to be what format to write data out to (binary vs. JSON), especially when your state is quite large.

Either way I would love to see an 0AD-gym be available @ some point. It's just hard to say if it justifies pulling down and using the entire game, or just, say, 1 map + 2/3 civs, etc.


@jonbaer Nice work with that organization - it looks like it has forked a lot of relevant repos! TensorForce and OpenAI Baselines would definitely be directly applicable to trying some of the SOTA reinforcement learning algorithms in 0 AD. 

The size of the state representation certainly comes up when exploring things like imitation learning. I actually have another fork of 0 AD that I use to simply log the game state, and the replays can get decently large (between 1 and 10 MB iirc). Of course, if you convert them into some specific state representation, they will likely be much more compact (depending on the representation).

As far as an 0 AD-gym goes, technically there are actually two in the link I posted above :) (https://github.com/brollb/simple-0ad-example). In the repo, I created two different 0 AD-gym environments for learning to kite. The first used a very simple state space; the game state is represented simply as the distance to the nearest enemy units. As a result, the RL agent is able to learn to play it fairly quickly (although it would be insufficient if the enemy units used more sophisticated tactics than a simple deathball). The second 0 AD-gym uses a richer state representation - the state is a simplified minimap centered on the player's units. This requires the RL agent to essentially "learn" to compute the distance, as it isn't already preprocessed. Although this will be harder to learn, the representation could actually capture concepts like the enemy flanking the player, the edge of the map (or other impassable regions), etc. This type of state representation will also make it possible to have a more fine-grained action space for the agent. (In the example, the RL agent can pick 2 actions: attack the nearest enemy unit or retreat. With a minimap, the action space could actually include directional movement.)

That said, I am not convinced that there is a single state/action space representation for 0 AD given the customizability of maps, players, civs, goals, etc, and the trade-offs between the learnability and representational power. Since I don't think such a representation exists, I prefer providing a generic Python API for playing the game from which OpenAI gym environments can be easily created by specifying the desired state/action space for the given scenario.
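For concreteness, the first (distance-based) state space described above boils down to something like the following. Positions are given as (x, y) pairs; the real code in the repo reads them from the zero_ad game state, so this standalone function is just illustrative:

```python
import math

def nearest_enemy_distance(my_units, enemy_units):
    """Simple kiting observation: distance from the centroid of the
    player's units to the closest enemy unit. Returns None when either
    side has no units (e.g. the episode is over)."""
    if not my_units or not enemy_units:
        return None
    cx = sum(x for x, _ in my_units) / len(my_units)
    cy = sum(y for _, y in my_units) / len(my_units)
    return min(math.hypot(ex - cx, ey - cy) for ex, ey in enemy_units)
```

A reward function for kiting can then be built on the same quantity, e.g. penalizing the agent whenever this distance drops below the enemy's attack range.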


I think I really missed it, but what is the status of https://code.wildfiregames.com/D2199? ... I don't seem to be able to locate newoption { trigger = "with-rlinterface", description = "Enable RPC interface for reinforcement learning" } in the git copy I am building from.

35 minutes ago, jonbaer said:

I think I really missed it, but what is the status of https://code.wildfiregames.com/D2199? ... I don't seem to be able to locate newoption { trigger = "with-rlinterface", description = "Enable RPC interface for reinforcement learning" } in the git copy I am building from.

You need to pass it to update-workspaces.sh like so:

$ ./update-workspaces.sh -j4 --with-rlinterface

 


Hmm ... I still get ...

Premake args:  --with-rlinterface --atlas
Error: invalid option 'with-rlinterface'
ERROR: Premake failed

I don't see it in premake5.lua @ all ... https://github.com/0ad/0ad/blob/master/build/premake/premake5.lua

There is only a master branch there right?

Edit: Sorry, just to be clear: I should apply the D2199 diff if I want that option? I meant I just did not see it anywhere in my latest git pull.


Did you apply the patch at D2199 ? You either need to use Arcanist (see wiki:Phabricator) or download and apply the patch using Download Raw diff from the differential revision page.


Thank you, I was able to build this fork and have it running now. I am currently on OSX w/ no GPU @ the moment (usually anything I require a GPU for, I use Colab or Gradient) ... I wasn't able to run PPO_CavalryVsInfantry because of what looked like Ray problems, but I will figure it out.


Sounds good! If there is anything I can do to help - let me know.

If you are more comfortable with OpenAI baselines or Intel AI's coach, feel free to use them instead; I was just using RLlib for convenience and am not using any of their fancier features (like distributed training). Since that repo has a couple 0 AD gym environments, they should drop into these other RL frameworks pretty easily :) 


There are still some minor issues but I got it running. I had to directly install rllib (pip3 install ray[rllib]) + obviously forgot to install the map(s) the first time around :-\

It looks like I may not be running w/ the correct version of TF inside of Ray though since I get this from the logger ... AttributeError: 'SummaryWriter' object has no attribute 'flush' ...

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 496, in _process_trial
    result, terminate=(decision == TrialScheduler.STOP))
  File "/usr/local/lib/python3.7/site-packages/ray/tune/trial.py", line 434, in update_last_result
    self.result_logger.on_result(self.last_result)
  File "/usr/local/lib/python3.7/site-packages/ray/tune/logger.py", line 295, in on_result
    _logger.on_result(result)
  File "/usr/local/lib/python3.7/site-packages/ray/tune/logger.py", line 214, in on_result
    full_attr, value, global_step=step)
  File "/usr/local/lib/python3.7/site-packages/tensorboardX/writer.py", line 395, in add_histogram
    self.file_writer.add_summary(histogram(tag, values, bins), global_step, walltime)
  File "/usr/local/lib/python3.7/site-packages/tensorboardX/summary.py", line 142, in histogram
    hist = make_histogram(values.astype(float), bins)
AttributeError: 'NoneType' object has no attribute 'astype'

Is there something inside of ~/ray_results which would confirm a successful run? (I have not used it much yet but will read over the docs this week).


Yeah, I would guess it is related to an issue with the tensorflow version. 

You can find the results in `~/ray_results/default/PPO_CavalryVsInfantry_SOME_UUID/result.json`. I sometimes plot the results as follows (after adding the attached python script to my PATH and installing jq):

cat result.json | jq '.episode_reward_mean' | scatterplot -t 'Training Reward' -x 'Training Iteration' -y 'Episode Reward Mean' -c
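If you'd rather skip jq and the plotting script, the same numbers can be pulled out with a few lines of Python (Ray writes result.json as newline-delimited JSON, one record per training iteration):

```python
import json

def reward_curve(path):
    """Extract episode_reward_mean per training iteration from
    Ray/RLlib's newline-delimited result.json."""
    rewards = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                rewards.append(json.loads(line)["episode_reward_mean"])
    return rewards
```

Feeding the returned list into matplotlib's `plt.plot` gives the same training-reward view as the scatterplot command.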

 

(attached: scatterplot plotting script)


I was able to fix my errors with pip3 install --upgrade tensorboardX. Playing around with the zero_ad py client has been fun.

One thing I'd like to figure out (this might already exist, or I just don't know if it can be accomplished) would be for the python client to inject a way to override some of the JS prototypes. For example, in Petra the tradeManager links back to HQ for something like this:

m.HQ.prototype.findMarketLocation = function(gameState, template)

To me this is a decision-making ability I feel RL would be pretty well suited for (I could be wrong), but having a way to optimize your market in game (with, say, a Wonder game mode) on a random map would be a great RL accomplishment ... especially since you would get bonus points for working around enemies + allies. Sorry, I have always been fascinated w/ that function of the game; kudos to whoever wrote it. There are occasions where this AI makes some serious mistakes by not identifying chokepoints/narrow water routes, etc. But to me @ least it's an important part of the game where pre-simulating, or making that part of the AI smarter, would be key.

Also, will D2199 make it into the master copy? I can't seem to locate where that decision was(n't) made ... thanks.

5 hours ago, jonbaer said:

Also, will D2199 make it into the master copy? I can't seem to locate where that decision was(n't) made ... thanks.

I'd like for it to do so. I haven't been able to get someone to review it yet. I believe it will be disabled by default though.

For my part I can't even test it as I haven't been able to make it work on Windows.

Thank you for your interest though, it might help @irishninja to improve his patch.

I believe one of the current issues with this patch is that, as the interface duplicates part of the code, it needs to be maintained by someone, otherwise it will break pretty quickly. That is why, for instance, @agentx stopped working on the Hannibal AI: it became too hard to keep up with engine changes. (Feel free to correct me if I'm wrong.)

 


@jonbaer - Currently there isn't any way to inject code into the builtin AI logic. I was mostly focused on making it possible to train AI via an OpenAI Gym interface (hence adding the building blocks to do so) but being able to use it to hybridize one of the builtin AIs would be cool. That said, it isn't clear to me exactly how that would work but I am open to ideas!

I would really like to get this merged into the master copy but, as @Stan` said, it is currently waiting on a reviewer. I have been excited about the potential of both machine learning for 0 AD and 0 AD for machine learning and was hoping that I could gauge/raise interest with this discussion (and potentially find a reviewer in the process ;) ).

5 hours ago, Stan` said:

I believe one of the current issues with this patch is that, as the interface duplicates part of the code, it needs to be maintained by someone, otherwise it will break pretty quickly.

Yeah, the issue is specifically with the programmatic creation of the scenario config (source/tools/client/python/zero_ad/scenario_config.py). I am planning on removing it and simply passing the scenario config along (reading it from a file rather than creating it programmatically), which should hopefully minimize the amount of maintenance after the merge :)
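
As a sketch of that plan (the file layout and helper name here are hypothetical, not the actual zero_ad code), the client-side change amounts to reading the config verbatim from disk instead of constructing it:

```python
import json

def load_scenario_config(path):
    """Load a scenario config verbatim from a JSON file rather than
    building it programmatically, so the python client does not need
    to track the engine's scenario schema as it evolves."""
    with open(path) as f:
        return json.load(f)
```

The engine then remains the single source of truth for what a scenario config looks like, which is the maintenance win being described.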


I will try. I think the major issue here is probably versioning the protobuf builds at some point, and I don't know how that works. Maybe zero_ad becomes a pip-installed library with a version in line with the subversion id or the Alpha 23 version, etc. Everything else (beyond what is inside main.cpp) can just be built on as a default. In other words, just let python have the ability to talk to pyrogenesis and push everything else to another repo. I think there should be (or probably is) already some way to detect a client/server mismatch. I don't think @irishninja needs the main build to include the clients; I could be wrong, or this has probably been discussed, but I didn't see it yet.


It cannot be built by default for multiple reasons, but mainly because it doesn't work on Windows.


I have only used it + compiled it on OSX and Linux. When you say "it", does that mean it won't compile, or that the python client won't work/do anything on Windows? Is there an error trace somewhere? This is tough because the game itself is (without doubt) rendered beautifully on Windows/GPUs/gaming rigs, but I think this bit (RL/ML/AI) is really being done more on Linux. I come in peace and hope the two camps can work together :-). I guess leaving this as a compile-time option is the only way forward. I was hoping for something through the mod lobby too.


For now I think it will print errors and then the build will fail. Basically, all the sh scripts won't work, and the tools are not well supported or easy to install.

I do believe, though, that one can totally play with it enabled while other clients do not. Pushing this a little, I assume one could bench the AI against lobby players. It would be fun to let it play against players all day.

19 hours ago, jonbaer said:

I will try. I think the major issue here is probably versioning the protobuf builds at some point, and I don't know how that works. Maybe zero_ad becomes a pip-installed library with a version in line with the subversion id or the Alpha 23 version, etc. Everything else (beyond what is inside main.cpp) can just be built on as a default. In other words, just let python have the ability to talk to pyrogenesis and push everything else to another repo. I think there should be (or probably is) already some way to detect a client/server mismatch. I don't think @irishninja needs the main build to include the clients; I could be wrong, or this has probably been discussed, but I didn't see it yet.

I would be happy to move the python client to a separate repo. The only challenge is having access to the protobuf files during the code generation of the python files. This is trivial when they are in the same repo but takes a little more thought if they are going to be in separate repos (certainly still possible though).

As far as versioning goes, protobuf tries to be forgiving across different versions of a protocol (https://developers.google.com/protocol-buffers/docs/overview#a-bit-of-history). I haven't seen versions specified in protobuf files, but I would be open to ideas such as adding a version field.

18 hours ago, jonbaer said:

I have only used it + compiled it on OSX and Linux. When you say "it", does that mean it won't compile, or that the python client won't work/do anything on Windows? Is there an error trace somewhere?

There are two main issues with Windows support currently:

The good news is that both of these are definitely possible, but I unfortunately do not have any Windows machines. That said, if anyone is interested in adding Windows support to the revision, I would be happy to assist however I can :)

