irishninja
Community Member (Posts: 22, Days Won: 2)

Everything posted by irishninja

  1. Not sure how I missed this on the original revision since I am also using Arch Linux and it was building for me when I was testing it... Regardless, I am able to see the same issue when building from the git mirror. I just accepted D2928, which adds the missing include, and I have addressed the unused includes that @Nescio found in another revision (D2930). Nice catch, @Nescio!
  2. @asterix - thanks for the ping! I am a bit biased, but I think D2199 has a bit of promise. I have an example repo where I have been training the AI to micro (mostly focused on kiting) here. Essentially, this enables you to create OpenAI Gym environments (a few have been created in the earlier link) and use RL to try to solve the task. I also made a bigger post on the topic a little while back here. Alternatively, you could start with a simpler form of learning and simply perform derivative-free optimization of some of the hyperparameters of the AI (#7 from the earlier post). I played around with this a bit in the past and was able to improve a few of the parameters (though I didn't do a good job of contacting the community or anything :/). Essentially, I updated the code so I could pass parameters to the AI from the command line (I copied Petra into "petra-param"). Then I was able to use CMAES to learn the optimal values for these parameters, where the fitness function was determined by the win/loss of the agent when playing against the original Petra agent on the hardest difficulty; a rough sketch of that setup is included below. Hmm... I thought I had made a separate branch with just the aforementioned changes, but it looks like it is filled with a bunch of other explorations into writing AI in Python (before the creation of D2199). Anyway, you can browse the code with just the ability to pass command line params to the AI here: https://github.com/brollb/0ad/tree/a15250e34db352ddbeec7134952e1bfec23a2597 Let me know if you have any questions about either of these that I could help with!
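     For illustration, here is a minimal sketch of that kind of derivative-free optimization loop using the `cma` package. The `run_match` function is a hypothetical stand-in for whatever launches a headless 0 AD match with the candidate parameters and reports a win/loss, and the starting values are assumptions, not Petra's actual defaults.

```python
# Minimal sketch of tuning AI parameters with CMA-ES (pip install cma).
import cma

PARAM_NAMES = ["popPhase2", "armyMergeSize"]  # example parameters mentioned for Petra
INITIAL_GUESS = [75.0, 14.0]                  # assumed starting values, not the real defaults

def run_match(params):
    # Placeholder: launch a headless 0 AD match with `params` passed to the
    # modified AI ("petra-param") and return 1.0 on a win, 0.0 on a loss.
    raise NotImplementedError

def fitness(params):
    # CMA-ES minimizes, so return the negative average win rate over a few matches.
    wins = sum(run_match(params) for _ in range(5))
    return -wins / 5.0

es = cma.CMAEvolutionStrategy(INITIAL_GUESS, 5.0)
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [fitness(c) for c in candidates])
es.result_pretty()  # prints the best parameter vector found
```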
  3. I would be happy to move the python client to a separate repo. The only challenge is having access to the protobuf files during the code generation of the python files. This is trivial when they are in the same repo but takes a little more thought if they are going to be in separate repos (certainly still possible, though); a rough sketch of the code generation step is below. As far as versioning in protobuf, protobuf tries to be forgiving across different versions of a protocol (https://developers.google.com/protocol-buffers/docs/overview#a-bit-of-history) and I haven't seen versions specified in protobuf files, but I would be open to ideas such as adding a version field. There are two main issues with Windows support currently: (1) code generation from the protobuf files and (2) linking additional libraries (boost-fiber, boost-context). The good news is that both of these are definitely possible, but I unfortunately do not have any Windows machines. That said, if anyone is interested in adding Windows support to the revision, I would be happy to assist however I can.
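     For reference, a minimal sketch of how the Python stubs could be generated from shared .proto files using grpcio-tools; the directory layout and file name below are assumptions for illustration, not the actual layout of the revision.

```python
# Sketch: generate Python protobuf/gRPC stubs from the shared .proto files.
from grpc_tools import protoc  # pip install grpcio-tools

protoc.main([
    "grpc_tools.protoc",
    "--proto_path=proto",        # directory containing the shared .proto files (assumed)
    "--python_out=zero_ad",      # generated message classes
    "--grpc_python_out=zero_ad", # generated gRPC client stubs
    "proto/rl_api.proto",        # hypothetical file name
])
```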
  4. @jonbaer - Currently there isn't any way to inject code into the builtin AI logic. I was mostly focused on making it possible to train AI via an OpenAI Gym interface (hence adding the building blocks to do so), but being able to use it to hybridize one of the builtin AIs would be cool. That said, it isn't clear to me exactly how that would work, but I am open to ideas! I would really like to get this merged into the master copy but, as @Stan` said, it is currently waiting on a reviewer. I have been excited about the potential of both machine learning for 0 AD and 0 AD for machine learning and was hoping that I could gauge/raise interest with this discussion (and potentially find a reviewer in the process). Yeah, the issue is specifically with the programmatic creation of the scenario config (source/tools/client/python/zero_ad/scenario_config.py). I am planning on removing it and simply passing the scenario config (reading it from a file rather than creating it programmatically), which should hopefully minimize the amount of maintenance after the merge; a small sketch of that approach is below.
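     A small sketch of the file-based approach described above; the class and method names (`zero_ad.ZeroAD`, `create_game`) and the file path are illustrative and may not match the actual client API.

```python
# Sketch: load a scenario config from a file instead of building it in Python.
import zero_ad

with open("scenarios/cavalry_vs_infantry.json") as f:  # hypothetical path
    config = f.read()

game = zero_ad.ZeroAD("http://127.0.0.1:6000")  # assumed local RPC address
state = game.create_game(config)                # illustrative method name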
  5. Yeah, I would guess it is related to an issue with the tensorflow version. You can find the results in `~/ray_results/default/PPO_CavalryVsInfantry_SOME_UUID/result.json`. I sometimes plot the results as follows (after adding the attached python script to my PATH and installing jq); a matplotlib alternative is sketched below:
     `cat result.json | jq '.episode_reward_mean' | scatterplot -t 'Training Reward' -x 'Training Iteration' -y 'Episode Reward Mean' -c scatterplot`
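     An equivalent pure-Python option, assuming the usual RLlib result.json layout of one JSON object per training iteration:

```python
# Sketch: plot episode_reward_mean from an RLlib result.json
# (newline-delimited JSON, one object per training iteration).
import json
import matplotlib.pyplot as plt

rewards = []
with open("result.json") as f:  # e.g. the file under ~/ray_results/...
    for line in f:
        rewards.append(json.loads(line)["episode_reward_mean"])

plt.plot(rewards)
plt.title("Training Reward")
plt.xlabel("Training Iteration")
plt.ylabel("Episode Reward Mean")
plt.show()
```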
  6. Sounds good! If there is anything I can do to help, let me know. If you are more comfortable with OpenAI Baselines or Intel AI's Coach, feel free to use them instead; I was just using RLlib for convenience and am not using any of its fancier features (like distributed training). Since that repo has a couple of 0 AD gym environments, they should drop into these other RL frameworks pretty easily (a rough RLlib training sketch is below for reference).
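     For reference, a rough sketch of training one of those environments with RLlib's PPO; the import path and environment class name are assumptions based on the example repo and may need adjusting.

```python
# Sketch: train one of the example 0 AD gym environments with RLlib PPO.
import ray
from ray import tune
from ray.tune.registry import register_env

from cavalry_vs_infantry import CavalryVsInfantry  # hypothetical import

# Register the environment under a name RLlib can look up.
register_env("CavalryVsInfantry", lambda env_config: CavalryVsInfantry())

ray.init()
tune.run(
    "PPO",
    stop={"training_iteration": 100},
    config={"env": "CavalryVsInfantry", "num_workers": 0},
)
```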
  7. I have also pushed the changes of D2199 to a branch on my fork of 0 AD if you prefer: https://github.com/brollb/0ad/tree/arcpatch-D2199
  8. @jonbaer Nice work with that organization - it looks like it has forked a lot of relevant repos! TensorForce and OpenAI Baselines would definitely be directly applicable to trying some of the SOTA reinforcement learning algorithms in 0 AD. The size of the state representation certainly comes up when exploring things like imitation learning. I actually have another fork of 0 AD that I use to simply log the game state, and the replays can get decently large (between 1 and 10 MB, iirc). Of course, if you convert them into some specific state representation, they will likely be much more compact (depending on the representation). As far as a 0 AD gym goes, there are actually two in the link I posted above (https://github.com/brollb/simple-0ad-example). In that repo, I created two different 0 AD gym environments for learning to kite. The first uses a very simple state space: the game state is represented simply as the distance to the nearest enemy units. As a result, the RL agent is able to learn to play it fairly quickly (although it would be insufficient if the enemy units used more sophisticated tactics than simply deathballing); a sketch of this simpler environment is included below. The second 0 AD gym uses a richer state representation: the state is a simplified minimap centered on the player's units. This requires the RL agent to essentially "learn" to compute the distance, as it isn't already preprocessed. Although this will be harder to learn, the representation could capture concepts like the enemy flanking the player, the edge of the map (or other impassable regions), etc. This type of state representation will also make it possible to have a more fine-grained action space for the agent. (In the example, the RL agent can pick between 2 actions: attack the nearest enemy unit or retreat. With a minimap, the action space could actually include directional movement.) That said, I am not convinced that there is a single state/action space representation for 0 AD, given the customizability of maps, players, civs, goals, etc., and the trade-offs between learnability and representational power. Since I don't think such a representation exists, I prefer providing a generic Python API for playing the game, from which OpenAI gym environments can be easily created by specifying the desired state/action space for the given scenario.
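     A sketch of the first (simpler) kiting environment described above: the observation is the distance to the nearest enemy unit and the actions are attack/retreat. The calls on the `game` object are placeholders; the real client API in the example repo may differ.

```python
# Sketch of a minimal kiting environment with a 1-D observation and 2 actions.
import gym
import numpy as np
from gym import spaces


class SimpleKitingEnv(gym.Env):
    ATTACK, RETREAT = 0, 1

    def __init__(self, game):
        self.game = game  # hypothetical handle to the 0 AD RPC client
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self):
        state = self.game.reset_scenario()      # placeholder call
        return self._observe(state)

    def step(self, action):
        if action == self.ATTACK:
            state = self.game.attack_nearest()  # placeholder call
        else:
            state = self.game.retreat()         # placeholder call
        reward = self.game.reward(state)        # e.g. damage dealt minus damage taken
        done = self.game.episode_over(state)    # placeholder call
        return self._observe(state), reward, done, {}

    def _observe(self, state):
        return np.array([self.game.distance_to_nearest_enemy(state)], dtype=np.float32)
```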
  9. Hello everyone, I have been interested in making it possible to explore applications of machine learning in 0 AD (as some of you may have gathered from https://trac.wildfiregames.com/ticket/5548). I realized that I haven't really explained my interest and motivation very thoroughly, so I figured I would do so here and see what everyone thinks!

     tl;dr - At a high level, I think that adding an OpenAI gym-like interface* could be a cool addition to 0 AD that would benefit both 0 AD (technically and in terms of publicity) and the research community in machine learning and AI. I go into the specifics below and also discuss other potential avenues for integrating/leveraging machine learning.

     Potential Machine Learning Problems/Applications

     1. Intelligent unit control (micromanagement)
     I have an example where an AI learns to kite with cavalry archers when fighting infantry at https://github.com/brollb/simple-0ad-example. This is probably one of the easiest problems to explore, as it can be done progressively, starting with small, clearly defined scenarios using the functionality added in the aforementioned ticket. That said, the standard machine learning challenges still apply: the AI needs to be trained on sufficiently diverse scenarios so that it doesn't encounter something new and behave incorrectly. As far as potential impact on the game, automatic micromanagement could be interesting either as a component in an otherwise scripted AI such as Petra or as a way to make units more intelligent as they gain experience. That is, I could imagine that as units gain more experience, they could automatically start showing improved tactical behavior, such as kiting.

     2. Enemy AI trained entirely with reinforcement learning
     This is actually very difficult, although it has recently been done in StarCraft 2 (https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii). Although I think this could be fun for people to try, I wouldn't have high expectations on this front for a while because it is a very hard problem for ML to solve - especially given the large number of different civilizations, maps, resource types, etc.

     3. Enemy AI with scripting and learned components
     This is a generic version of what I mentioned under "intelligent unit control". Essentially, there are a lot of opportunities to incorporate learned components into an otherwise scripted AI. From a technical perspective, this makes the machine learning problem much more tractable while still enabling more intelligent behavior from the built-in AI. There are many examples of intelligent components that could be incorporated; for instance, the AI could try to predict the outcome of a battle (to determine whether to retreat) or try to imitate various high-level human strategies (such as predicting what a human might target in an attack).

     4. Quantitative game balancing
     This is a very interesting problem and I find 0 AD to be a particularly good opportunity for exploring it. Essentially, a game has many parameters (such as attack damage for each unit) which are quite difficult to tune without making the game imbalanced and one of the civilizations/strategies OP. (I don't think I need an example for this community, but I enjoyed watching https://www.gdcvault.com/play/1012211/Design-in-Detail-Changing-the.) This problem is nontrivial since detecting overpowered strategies really requires an understanding of the ways various aspects of the game can be exploited. At the same time, I find it an exciting opportunity for 0 AD to gain publicity and for researchers to have a sandbox in which they can explore this question in an actual game rather than a trivial toy environment. Most other environments used in reinforcement learning research are either open source toy environments (e.g., CartPole) or proprietary games which cannot be modified (e.g., StarCraft 2). There has been a bit of related research on detecting imbalance in complex games like StarCraft 2 as well as on balancing simpler games, but since proprietary games will not expose the parameters used for the units (and other aspects of the game), automatic game balancing approaches are limited. Being an open source game that people actually play, 0 AD provides a really exciting opportunity for research in this direction: the parameters of the game are not proprietary and could be modified programmatically, enabling researchers to explore this rather complex problem. For the 0 AD community, enabling researchers to conduct this type of research in the game itself should make it much easier to incorporate any results of such research, making 0 AD more fun and an even better game!

     5. Imitation learning
     Training the AI to imitate humans is worth mentioning, although the impact on the game would likely come through one of the aforementioned applications. Imitation learning, unlike reinforcement learning, trains the AI using expert demonstrations of gameplay. It is often used as a way to initialize the AI to something reasonable before training it further with reinforcement learning (i.e., training the AI using a reward rather than examples). Imitation learning can arguably be more valuable for game development given that it can more directly instill various human-like behaviors (hopefully making the gameplay more engaging and interesting) rather than simply maximizing some reward or score in the game.

     6. Techniques to train and understand AI agents
     This is more of a general research direction that I find interesting (and is similar to research I have done in the past). Essentially, it explores the means by which a game developer can use the various methods of instilling behavior into an AI (programming, reinforcement learning, imitation learning) to create the desired behavior (and game experience). This is a bit of both a human-computer interaction (HCI) and a machine learning question (also related to machine teaching). To give a more concrete example, this would include exploring the behavior of a trained RL agent in the game, correcting these behaviors, and perhaps trying to detect potentially incorrect behaviors to raise to the user automatically. 0 AD is well suited for this type of research for the same reasons that it is well suited for exploring game balance - most games used in research are either proprietary or not something people would actually play.

     7. Optimizing existing game parameters (relatively easy)
     There are some existing machine learning tricks that could be used to make other sorts of improvements to the game rather than explore research questions. A while back, I was playing around with CMAES (a machine learning technique to optimize a set of parameters given a "fitness function") to improve some of the magic numbers used within Petra such as "popPhase2" and "armyMergeSize". Essentially, this made it possible to find values for these parameters which would improve the AI's ability to win when playing against the standard Petra agent (on the hardest difficulty). Although I don't find this as interesting as the other areas, it is a useful tool that could be applied to other aspects of the game.

     Overall, I think it would be really exciting to be able to explore some of these research questions in 0 AD. It could be beneficial to researchers and would also make it easier to incorporate the results of this research into 0 AD (making it an even better game!). Of course, this is only true if the functionality that needs to be added to 0 AD is easy to maintain and doesn't add overhead that takes away from the development of the core game features and functionality. I am also hopeful that incorporating some of these machine learning capabilities could be beneficial to the community and raise awareness of 0 AD!

     As far as technical requirements, I made an RPC interface for controlling the AI from Python (because the majority of machine learning tools are in Python). This makes it possible to explore 1, 2, and 3 and provides necessary functionality for 4, 5, and 6. As mentioned above, I have an example of #1 on GitHub and I think this could make for really interesting undergraduate projects (as well as potentially interesting integrations into the game). However, I think 0 AD is a particularly unique opportunity for exploring 4 and 6. Game balancing (#4) still requires the ability to programmatically edit the unit parameters, which I have explored a little bit but haven't added to the game. If this is something that others find interesting (and wouldn't mind me asking a few questions), I would be open to adding this as well.

     Anyway, I find these machine learning problems and applications quite exciting both for 0 AD and for AI/ML research, but I want to know what the rest of the community thinks! Let me know what you think or if you have any questions/comments!

     * I say *OpenAI gym-like* because a gym environment requires an observation space (numerical representation of the world for the AI), action space (numerical representation of the actions the AI can perform), and reward function to be defined. It isn't clear what the most appropriate choices for these would be (and they could vary based on the specific scenario), so I would prefer making more of a "meta-gym": basically an OpenAI gym that needs the user to specify these values. (A rough sketch of that idea is below.)
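     For what it's worth, a rough sketch of the "meta-gym" idea from the footnote, where the user supplies the observation space, action space, and reward function for a given scenario. The `game` object and its `start_scenario` call are placeholders for the Python RPC client, not the actual API.

```python
# Sketch of a "meta-gym": a gym environment parameterized by user-supplied
# observation/action/reward definitions for a particular scenario.
import gym


class ZeroADMetaEnv(gym.Env):
    def __init__(self, game, scenario, observation_space, action_space,
                 observe_fn, act_fn, reward_fn, done_fn):
        self.game = game
        self.scenario = scenario
        self.observation_space = observation_space
        self.action_space = action_space
        self._observe = observe_fn  # raw game state -> observation
        self._act = act_fn          # (game, action index) -> next raw game state
        self._reward = reward_fn    # (previous state, current state) -> float
        self._done = done_fn        # raw game state -> bool

    def reset(self):
        self.state = self.game.start_scenario(self.scenario)  # placeholder call
        return self._observe(self.state)

    def step(self, action):
        prev_state, self.state = self.state, self._act(self.game, action)
        reward = self._reward(prev_state, self.state)
        return self._observe(self.state), reward, self._done(self.state), {}
```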
  10. I realize I am pretty late to comment on this, but if anyone is still interested, this revision adds support for using 0 AD like an OpenAI gym environment. There are a couple of simple examples using this API at https://github.com/brollb/simple-0ad-example where the agent learns how to kite with cavalry archers when fighting infantry!
  11. Especially when using machine learning in 0 AD, it is really valuable to be able to record game states, as this enables things like off-policy reinforcement learning and imitation learning, and facilitates quantitative analysis of gameplay. To that end, it would be great if we could "expand" a 0 AD replay into a log of state-action pairs which could then be used for the aforementioned tasks. I am imagining a command line flag for logging state-action pairs when running an existing replay. Currently, I have implemented a primitive approach to this here: https://github.com/brollb/0ad/pull/1 which simply prints states to stdout. Although this is not the ideal interface, it has enabled me to wrap the game with a script to reconstruct the states with the actions from the commands.txt file (a rough sketch of this reconstruction is below). If anyone is interested in trying it out, there is also a docker image for the branch on GitHub which I have used to "extract" the rich game states from a replay. I am interested in cleaning it up and adding this capability back to 0 AD but thought I would start here to get feedback and input from the community!
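     A rough sketch of that reconstruction step: pairing per-turn logged game states with the commands from a replay. The commands.txt layout assumed here ("turn N <ms>" lines followed by "cmd <player> <json>" lines and "end") is an assumption and may differ across 0 AD versions, as may the turn-numbering offset.

```python
# Sketch: pair logged game states (one JSON object per line, in turn order)
# with the per-turn commands parsed from a replay's commands.txt.
import json


def parse_commands(path):
    turns, current = {}, None
    with open(path) as f:
        for line in f:
            if line.startswith("turn "):
                current = int(line.split()[1])
                turns[current] = []
            elif line.startswith("cmd ") and current is not None:
                _, player, cmd_json = line.split(" ", 2)
                turns[current].append({"player": int(player), "cmd": json.loads(cmd_json)})
    return turns


def load_states(path):
    with open(path) as f:
        return [json.loads(line) for line in f]


def state_action_pairs(states_path, commands_path):
    states = load_states(states_path)
    commands = parse_commands(commands_path)
    # Adjust the start value if the replay's turn numbering begins at 1.
    return [(state, commands.get(turn, [])) for turn, state in enumerate(states)]
```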
  12. In case anyone is following this thread, there is currently a ticket opened (and patch submitted with the update!): https://code.wildfiregames.com/D2197 and https://trac.wildfiregames.com/ticket/5565
  13. Thanks for the quick replies!
     @stanislas69 I have been digging into it a little bit more and it seems that the main call that fails is `getReplayMetadata` (which generates the metadata.json file saved for the replays). It seems that without the metadata.json, the replay doesn't show up in the replay list, so I figured it was a file that I needed. Currently, that file will not be generated when running in nonvisual mode, and overriding this manually results in an error since it seems the `getReplayMetadata` command can only be called from the script interface used by g_GUI (not available when running nonvisually). Does that make sense?
     I am actually a fan of the way the AI is treated during replays :). Although it is different from the humans, expecting the AI to be deterministic (and controlling the random seed) is cool since we could hypothetically resume a replay at any point (if it is player vs bot) and the bot would adapt accordingly. If it were simply replaying fixed actions (and not recomputing the AI's behavior), this wouldn't be possible. This feature isn't available now (afaik) but would be cool when exploring things like imitation learning.
     Awesome to hear that you are open to contributions! I am interested in using 0 AD as an RL sandbox but would like to keep any fork as small as possible since maintaining forks is no fun. If possible, it would be really great if all RL experimentation related features could be added to 0 AD itself (and toggled with cli options so as not to change any of the current behavior, of course).
     @bb_ and @stanislas69 I have actually been working on my own fork where I am exposing a GRPC interface to the game for controlling the player (actions are then logged to the replay - at least when there is a GUI); a purely illustrative sketch of what a client call might look like is below. It is still a work in progress; I can currently control units and issue player commands but still need to add support for some other features like loading a scenario (maybe defined dynamically), saving replays when nonvisual, and smooth GUI movements when watching an RPC-controlled game.
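     Purely for illustration, issuing a command from Python over such a gRPC interface might look roughly like the following. The service, message, and method names are hypothetical; the actual .proto definitions in the fork will differ.

```python
# Entirely illustrative sketch of a gRPC client call; names are hypothetical.
import grpc

import rl_api_pb2 as pb          # hypothetical generated module
import rl_api_pb2_grpc as pb_grpc

channel = grpc.insecure_channel("127.0.0.1:50051")  # assumed port
stub = pb_grpc.RLAPIStub(channel)                   # hypothetical service stub

# e.g. order a group of player 1's units to walk to a position
stub.IssueCommand(pb.Command(player=1, json='{"type": "walk", "x": 512, "z": 512}'))
```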
  14. I am currently looking at experimenting with different AIs and would like to be able to run different experiments headlessly and then watch/demo selected replays. I have noticed that 0 AD currently does not support saving replays when nonvisual, and it seems largely due to a reliance on GetSimulationState (from the GuiInterface). I haven't yet dug deeply enough to see how difficult it would be to refactor the code so GetSimulationState does not rely on the GUI (I am a bit concerned about some of the calls to `QueryInterface`, which I haven't yet dug into). Does this seem feasible? If so, I would appreciate any pointers or suggestions. Also, if I am able to make this change, would 0 AD be willing to accept it upstream or would it only live in my fork?
  15. @jonbaer Nice collection of RL work related to RTS games! Did you get a chance to start applying any of these approaches to 0 AD? I have seen that there have been a few posts about using Python to implement an AI as well as discussion about OpenAI Gym integration. I think integration with Gym would be pretty cool. @Feldfeld Good point. I think I will add an edit to the main post asking for any replays people would like to donate; replays on small Mainland (human vs Petra) are an added bonus but by no means required. Hopefully this way we can encourage a specific standard without being restricted to replays of that format!
  16. How does everyone feel about changing the scenario to a 1v1 (human vs petra) on a small mainland map? Any objections?
  17. @elexis - Thanks for the link! I was checking those out earlier and they seem like a useful resource (I counted 46 posted there + 1 with no metadata.json file). I thought I would go ahead and make this post anyway so we could: (1) make sure we have a dataset where we know people are fine with it being used for research, publication, general merriment, etc., and (2) try to collect a dataset with a standard configuration. It becomes a harder problem if we want to generalize across a bunch of different scenarios (particularly the team-based scenarios - multi-agent reinforcement learning gets a bit tricky and would require a lot of examples). If we have a dataset with a large amount of data using the same configuration, then hopefully people could tackle a simplified version of the problem before tackling generalization to more complicated scenarios. Another cool thing about facing the built-in AI is that there could be some capabilities that could be leveraged while training. After digging into the replay files a bit, I realized that only human players' actions are recorded, whereas the AI is just initialized with the same random seed and then behaves deterministically. This is pretty cool since it should be possible to do things like run the replay to a specific point and then replace the human player with an experimental AI and see how it would perform (since the opposing AI could still react to it as it would if it had been playing the experimental AI the whole time). That being said, any and all data people would like to donate would be great and I think could be really useful for exploring AI/ML agents in 0 AD!
  18. That makes sense. What size map would be best for this 1v1 scenario? Medium or small? How do people feel about controlling the seed to the PRNG so we get the same map? My concern is with the number of replays required before being able to learn something meaningful. Although it would be great to generalize, this would likely increase the data requirements a bit, whereas a fixed map might be an easier starting point. However, the downside is that this might be annoying and could limit the number of replays... Thoughts?
  19. Yeah, we could make it a more popular map since we haven't collected any replays yet - we certainly wouldn't want to change the map after replays have been posted. However, I am concerned about Mainland since it appears to include some procedural generation (making the learning problem harder, as the agent would need to generalize to the different variations produced by the procedural generation). It would be nice to start with a fixed (skirmish) map to make the learning problem easier. Is there a more popular (2 player) skirmish map that might be a better fit?
  20. I think it would be really cool if we could start creating a public dataset of replays for a standard scenario (especially in light of recent things like AlphaStar). Although I am most interested in creating the dataset for training ML agents (such as imitation learning - maybe to initialize a reinforcement learning agent), it could also be cool for comparing player strategies using a common baseline or for potentially finding shortcomings in the current AI. I think the dataset should be public domain to keep it open for anyone to use for fun, publication, etc. One natural scenario to start with would be the Acropolis Bay 2 map where both the human player and the AI are Spartans. The AI difficulty can be set to whatever people want, but a stronger AI is more impressive, of course. If you have a replay you would like to donate to the dataset (using the specific scenario mentioned above), feel free to post it here! By donating the replay, you are giving consent for the data to be made public domain and, therefore, able to be freely used for research, fun, and the like!
     Update: Rather than Acropolis Bay, the Mainland map is preferred (using the small map size), with the same civilization and players listed above. However, any and all replays are welcome and appreciated!