mod LocalRatings mod - evaluate players' skills based on previous games

sarcoma · February 7, 2025

LR error when opening leaderboard?
functions utility 88 44

Mentula · February 8, 2025

On 07/02/2025 at 7:58 AM, sarcoma said:

LR error when opening leaderboard?
functions utility 88 44

Thanks for reporting. The error can be ignored; it does not affect any game functionality. There is a fix for it.

**Stan`** · February 8, 2025

I'm still missing the last chart btw, but I don't know how to implement it. PRs to do that and to implement the latest LocalRatings features are welcome.

ffm2 · June 8, 2025

I modified the mod so it calculates a Glicko-2 rating. Glicko-2 should converge on a correct rating a bit faster than Elo (which I tried before). The old local rating is preserved and still shown in the graphs. I only display the Glicko-2 rating in the lobby and game setup. Glicko-2 starts at 1500. The other number next to it is volatility. Volatility starts at 200 and goes lower as the rating becomes more accurate and stabilizes. Games are only evaluated if the victory state of every player is either won or defeated.

The result is not yet satisfying for me. I have played/spectated a lot of games this alpha, more than one could demand of another mod-user. I release it anyway for testing, or in case anyone else wants to play around with it.

It could be that this works better if:

Team pairings are consistently made by this Glicko-2 rating. This way, if underrated players get teamed up occasionally, they could gain rating. This feedback loop could benefit the rating. On the other hand, if e.g. one player is very strong and known for it, he might get paired with many very bad players and lose rating.
Maybe it should not be done locally, as the rating pool would then get bigger and better connected. Atm. good players only play with good players, so their rating doesn't get elevated as high.

Atm. I'm a bit disappointed and may not follow up on this, but I wanted to drop it here. Still, if you doubt a team game pairing, this could be used as a guide; and if it is wrong, rating gets redistributed, so it should work better next time. But until one trusts the algorithm, I wouldn't put in the work to include the graphs.

localratings_glicko2.zip

Seleucids · June 8, 2025

How do you account for RD increasing over time? Do you factor in the replay date?

ffm2 · June 8, 2025

There's a template for the decay, but it's not actually used atm. LR gets the time from the replay folder name. But replays downloaded from replay-pallas don't have the date in their name like the ones from the game. And with a name like "2025-05-21_0001", another shared replay could also be game 0001 of that day.
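
A minimal sketch of the date lookup described above, assuming that replay folders created by the game start with a YYYY-MM-DD prefix (as in "2025-05-21_0001") and that any other name, such as a downloaded replay, carries no usable date. The function name is hypothetical and not part of LocalRatings.

```python
import re
from datetime import date
from typing import Optional

# Assumption: game-created replay folders start with "YYYY-MM-DD_".
_DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})_")


def replay_date_from_folder(name: str) -> Optional[date]:
    """Return the date encoded in a replay folder name, or None if absent."""
    match = _DATE_PREFIX.match(name)
    if match is None:
        return None  # e.g. a downloaded replay with an arbitrary folder name
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # digits are present but do not form a valid date
```

A caller that gets None back would need another source for the date before applying any time-based decay.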

ffm2 · June 8, 2025

LR checks metadata.json, which doesn't store the date. commands.txt does, but reading it would be another step costing time. I also append a string to my replay names so I can mix replays from different PCs, but I can change that to cater to LR too.

Mentula · February 22

Hi folks!

LocalRatings was updated to version 0.28.1, compatible with 0 A.D. 28.

Before making the mod available via the official downloader (Settings > Mod Selection > Download Mods) I would wait a few more days to have it tested, as some code re-work was needed to port the mod.

Since I am not active in the lobby (sigh!) some of you may want to help testing it on R28; I appreciate any bug report. To test the mod, you can download the zip file and extract it into the mod folder (see here to locate the mod folder on your system); then activate the mod via Settings > Mod Selection.

dinuruian · February 22

up.

dinuruian · February 22

Goes well, that is what "up" means. Tried 1v1 and TG; goes well.

Frederick_1 · March 8

On 22/02/2026 at 10:43 AM, Mentula said:

Hi folks!

LocalRatings was updated to version 0.28.1, compatible with 0 A.D. 28.

Before making the mod available via the official downloader (Settings > Mod Selection > Download Mods) I would wait a few more days to have it tested, as some code re-work was needed to port the mod.

Just downloaded and activated. Have specced some matches in 0.28 and it works, at least at first glance. Hovihovi got +125.6 points for one game and is ranked best of 78 players.

Mentula · March 21

LocalRatings 0.28.1 is now available for installation via the game menu:

Settings > Mod Selection > Download Mods

Enjoy!

AlexHerbert · March 21

On 4/6/2022 at 3:46 PM, Yekaterina said:

@Mentula could you explain to me exactly how the ratings are calculated from the weights? I would like a mathematical understanding of your algorithm as it currently spits out some rather unexpected results.

Setting everything to 0 except the number of units killed, which is set to 1: (the top killer)

It would seem that azeem1121 is the top killer; however, he is nowhere near as effective as vinme or Palin in game. In the match I played with him, he had a very high kill/death ratio because he surprise-rushed a few inexperienced players in nomad mode. The total number of kills was less than 100, although he lost very few. The game ended in a crash instead of a proper finish, so I think there is something to be fixed here. If you can explain to me how your algorithm works, perhaps I can propose a better mathematical model.

The top resource gatherers:

This is much closer to reality based on what I have seen from these players.

Yep... I found it is not accurate.

Also I want to propose a name for this mod
Maybe a good name could be "Your past condemns you" :loool:

Mentula · March 21

5 minutes ago, AlexHerbert said:

Yep... I found it is not accurate.

and also

On 19/12/2024 at 12:06 PM, ffm2 said:

currently it's not accurate.

Accuracy depends on the parameter one intends to measure.

I assume the comments above refer to "chances of victory" as the parameter that LR is supposed to measure. And I agree, LR is not accurate for that parameter, for the reasons below.

The optimal predictor for victory is, trivially, victory/defeat ratio. Take this number, and you will have the highest possible accuracy.

But LR does not even include the outcome of the game as a parameter, so it cannot accurately measure how good a player is at winning. Instead, the parameters that LR can measure (and therefore: the parameters LR can be accurate for) are the weights (see "Score Weights" in the LocalRatings Options menu) and any combination of those weights.

26 minutes ago, AlexHerbert said:

Maybe a good name could be "Your past condemns you"

or "Fortunately I can only see this"

ffm2 · March 22

The problem is that a lot of game-relevant "skills" are not covered by the scores, though.

I think a score for "enemy presence", a bit like the exploration score would help a bit. This is more directed to the main game though.

A player should get a score from enemy units near him based on time and distance. E.g. player 1 has 3 units, player 2 has 15 units. Player 1 can just run around and make the 15 units chase him. Player 1 should then receive more "enemy presence" points.
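
One way to read this proposal, as a rough sketch only: for each time slice, every enemy unit within some radius of the player adds points, with nearer units and longer slices adding more. The linear falloff, the radius value, and all names here are assumptions made for illustration; neither the game nor LocalRatings computes such a score.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def enemy_presence(own_center: Point,
                   enemy_units: Iterable[Point],
                   seconds: float,
                   radius: float = 100.0) -> float:
    """Score one time slice: nearer enemy units and longer time score more."""
    score = 0.0
    for ex, ey in enemy_units:
        distance = math.hypot(ex - own_center[0], ey - own_center[1])
        if distance < radius:
            # Linear falloff: weight 1.0 at distance 0, 0.0 at the radius.
            score += (1.0 - distance / radius) * seconds
    return score
```

In the example from the post, player 1 being chased by 15 enemy units would collect presence points from all 15 for as long as they stay within the radius, even though his own 3 units produce no eco or military score.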

Currently all of this would count as idle time, earning neither eco nor military score. This would also favor someone who gets attacked 2v1.

LR doesn't make the scores though; it only weights what's there.

ffm2 · May 26

LocalRatings Team Balancer

A mod that extends LocalRatings by Mentula with two new buttons on the game-setup screen. One balances teams automatically by rating, the other evaluates the current setup and adds two outcome-based rating systems, Glicko-2 and OpenSkill, alongside the original LocalRatings score.

Why outcome-based ratings?

The LocalRatings ranking list already shows the problem: even with plenty of replay data, some known-strong players end up rated poorly and some known-weak ones look stronger than they are. Raw in-game statistics don't track skill cleanly. In a 4v4, if three players focus-harass one strong opponent, that player's stats look weak, though they're being targeted because they're strong.

A useful comparison: modern chess engines evaluate positions with neural networks far beyond human understanding, yet FIDE still uses Elo, chess.com uses Glicko-1, and lichess uses Glicko-2. Even with perfect game analysis available, the established outcome-based systems remain the standard for tracking player skill over time. Win/loss results across many games carry a signal that in-game metrics miss. 0 A.D. has no chess-engine equivalent, so the case for outcome-based rating is even stronger here.

Rating systems

Three systems are available, selectable in the LocalRatings settings:

Local Ratings (original) - rates players by in-game statistics relative to others in the same match. Works for all game types but inherits the limitations above.
Glicko-2 - tracks each player with a rating, a rating deviation (confidence), and a volatility. New players start at 1500 ± 350. The conservative rating used for balancing is rating − 2·deviation, so a fresh player is treated as significantly weaker than someone who has won even one game.
OpenSkill - an open-source Bayesian system based on the Bradley-Terry full model. Each player has a mean skill mu and an uncertainty sigma. New players start at mu 1500, sigma 500. The conservative rating is mu - 3·sigma.
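
The two conservative ratings above are simple arithmetic. A small sketch using the starting values from the list (the function names are hypothetical):

```python
def glicko2_conservative(rating: float, deviation: float) -> float:
    """Conservative Glicko-2 rating: rating minus two rating deviations."""
    return rating - 2.0 * deviation


def openskill_conservative(mu: float, sigma: float) -> float:
    """Conservative OpenSkill rating: mu minus three sigma."""
    return mu - 3.0 * sigma


# A fresh player under each system, using the defaults quoted above.
print(glicko2_conservative(1500, 350))    # 800.0
print(openskill_conservative(1500, 500))  # 0.0
```

So a brand-new player is balanced as an 800 under Glicko-2 and as a 0 under OpenSkill until recorded games shrink the deviation or sigma.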

Glicko-2 and OpenSkill only count locked-teams games with exactly two teams; free-for-all and unlocked-team games are excluded. All other LocalRatings filters (minimum duration, population cap, cheat games, etc.) still apply and can be configured in the LocalRatings settings. Win/loss counts are tracked separately and shown alongside ratings.

Balance button

When the host presses balance, the mod reads all currently assigned players, looks up their ratings, and finds the partition into two teams that minimizes the difference in conservative rating sums. It then reorders the player slots. The result is posted in chat with rating sums, the rating difference, and a predicted win probability. Note that the win probability is calculated with the raw rating, while the pairing is done with the conservative rating.
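
The partition search can be illustrated with a brute-force sketch. It assumes an even number of players and equal team sizes, which the post does not state; the mod's actual search may differ, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def balance_two_teams(conservative: Dict[str, float]) -> Tuple[List[str], List[str], float]:
    """Split players into two equal teams with the smallest rating-sum gap."""
    players = sorted(conservative)
    if len(players) < 2 or len(players) % 2 != 0:
        # Assumption of this sketch: equal-sized teams only.
        raise ValueError("need an even number of players (at least two)")
    total = sum(conservative.values())
    team_size = len(players) // 2
    # Keep the first player on team A so mirrored splits are not checked twice.
    first, rest = players[0], players[1:]
    best = None
    for mates in combinations(rest, team_size - 1):
        team_a = [first, *mates]
        sum_a = sum(conservative[p] for p in team_a)
        diff = abs(total - 2 * sum_a)  # |sum_a - sum_b|
        if best is None or diff < best[2]:
            team_b = [p for p in players if p not in team_a]
            best = (team_a, team_b, diff)
    return best
```

For a 4v4 this checks only 35 splits, so exhaustive search is cheap at lobby sizes.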

Non-host players with the mod see a suggest button instead. Pressing it posts the same proposal to chat without changing any settings, so the host can decide whether to apply it.
Both buttons have spam protection: pressing them again with the same player constellation does nothing. Slot shuffles or team swaps don't count as a new constellation.
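
A sketch of how such spam protection could work, assuming the "constellation" is simply the set of assigned player names, so that slot order and team assignment do not matter. The class is hypothetical and not taken from the mod.

```python
from typing import FrozenSet, Iterable, Optional


class SpamGuard:
    """Remember the last player constellation a button was pressed for."""

    def __init__(self) -> None:
        self._last: Optional[FrozenSet[str]] = None

    def should_run(self, player_names: Iterable[str]) -> bool:
        # A frozenset ignores ordering, so shuffled slots give the same key.
        key = frozenset(player_names)
        if key == self._last:
            return False
        self._last = key
        return True
```

Under this assumption the button runs again only when a player joins, leaves, or is replaced.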

Evaluate button

Reports on the current team assignment without changing anything: rating sum and conservative sum per team, the rating difference, which team is stronger, predicted win probability, and a full player ranking by rating with win/loss counts. Observers in the lobby appear in the ranking as well.

End-of-game rating updates

When a game finishes, the rating database updates on every mod user's machine. To avoid multiple mod users posting the same numbers to chat, only one client announces the changes; the others update silently.

There's a known chat-line spacing bug that pushes this announcement far below the regular chat area. It can be fixed with this patch

Including older replays

The engine only exposes replays from the currently installed version, so by default ratings are built from 0.28.0 replays only. To include 0.27.1 replays, copy them into the 0.28.0 folder. They count toward all three rating systems.

Linux: copy or move everything from ~/.local/share/0ad/replays/0.27.1/ into ~/.local/share/0ad/replays/0.28.0/
Windows: copy everything from %APPDATA%\0ad\replays\0.27.1\ into %APPDATA%\0ad\replays\0.28.0\
macOS: copy everything from ~/Library/Application Support/0ad/replays/0.27.1/ into ~/Library/Application Support/0ad/replays/0.28.0/
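
The copy step can also be scripted. This is a minimal sketch that assumes each replay is a sub-folder of the version folder; it skips any folder name that already exists in the target rather than overwriting it. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def merge_replays(old_dir: Path, new_dir: Path) -> int:
    """Copy replay folders from old_dir into new_dir, skipping existing names."""
    new_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for entry in sorted(old_dir.iterdir()):
        target = new_dir / entry.name
        if not entry.is_dir() or target.exists():
            continue  # skip stray files and never overwrite an existing replay
        shutil.copytree(entry, target)
        copied += 1
    return copied
```

On Linux, for example, it could be called as merge_replays(base / "0.27.1", base / "0.28.0") with base = Path.home() / ".local/share/0ad/replays"; use the matching path from the list above on other systems.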

After copying, open the LocalRatings page and press Rebuild list to re-process all replays in date order. This step is needed even when rebuild isn't normally required, because the imported replays are older than your existing ones and the ratings have to be recalculated from the beginning. The imported replays won't be playable as visual replays in 0.28.0, but their metadata is read correctly for rating purposes.

Limitations

Team rating is inherently harder than 1v1. Individual contribution isn't fully separable from team performance, and the rating systems see each game as a single team win or loss. This is a fundamental constraint of every team rating system, not something specific to this mod.
Cold-start: ratings come from the host's local replays only. Since the database is built from whatever the host has played or observed, players who appear in the lobby with no recorded games start unrated, and their conservative rating sits well below average until they accumulate results. A genuinely strong player with only a few recorded losses will likewise look weaker than they are. The automatic team pairing works as a corrective mechanism: a strong player with a low rating will be paired with highly rated players, resulting in a very strong team and a highly likely win. This strong player will then accumulate wins. These unbalanced games might be frustrating.

Three things you can do meanwhile:

Accept the balance and play. The fastest fix is more games with the automatic pairing.
Adjust manually. Use /rate username 1600 to set a player's rating before the match starts. Useful when a rating is obviously off.
Seed the database. Download games with known outcomes from replay-pallas and drop them in your replays folder, then rebuild. Pick a balanced selection - similar wins and losses for each player - so you don't accidentally bias the rating in either direction. A player can help fix their own rating by uploading a few decisive games to replay-pallas for hosts to download.

Non-decisive games. Most 0 A.D. team games don't end with the entire losing side eliminated, and Glicko-2 / OpenSkill can only update on games where a winner can be determined. The LocalRatings Team Balancer ships with an Auto-Classifier that infers a winner from in-game data (surviving populations, final scores and defeated-count difference) for games that ended without a clean engine verdict. The quality of your ratings therefore depends partly on how well the Auto-Classifier performs on your replay set. Per-game rating changes under your current settings are visible in the replay section.
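
The post does not say how the Auto-Classifier combines its signals. Purely as an illustration, the sketch below lets each of the three named signals cast one vote and declares a winner only on a majority; this voting rule and every name in it are assumptions, not the mod's logic.

```python
from typing import Optional, Tuple

Pair = Tuple[float, float]  # (value for team 0, value for team 1)


def classify_winner(population: Pair, score: Pair, defeated: Pair) -> Optional[int]:
    """Return 0 or 1 for the inferred winning team, or None if undecided."""
    votes = [0, 0]
    signals = (
        (population, True),   # more surviving population is better
        (score, True),        # higher final score is better
        (defeated, False),    # fewer defeated players is better
    )
    for (a, b), higher_is_better in signals:
        if a == b:
            continue  # this signal is tied and casts no vote
        team0_ahead = (a > b) == higher_is_better
        votes[0 if team0_ahead else 1] += 1
    if votes[0] == votes[1]:
        return None
    return 0 if votes[0] > votes[1] else 1
```

A game where the signals disagree evenly stays unclassified and would simply not be rated under this rule.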

Locked teams. The most sensible option would usually be to evaluate only games with locked teams. The problem is that when one switches the map, e.g. from mainland to balanced-mainland, the settings default to unlocked teams. In most cases this is harmless, as the players just don't change their diplomacy states. Therefore the option Rate unlocked-team games exists.

LocalRatings_Team_Balancer.zip

Atrik · May 26

Pretty important info you forgot to mention is how to config.

For example, instead of the local rating weights I wanted to use OpenSkill, so I had to change this in the config:

localratings.general.ratingsystem = "openSkill"

ffm2 · May 26

Yeah, that can be changed from a dropdown menu in the options. There's a lot. If you're curious about the rating of a certain player, you can trace individual games and the progression of this player in the replay section by filtering for his name and checking the bottom right.

jwrona · May 27

ffm bot has been super refreshing as of late. Will download and try it out! Thanks!

AlexHerbert · June 15

On 26/5/2026 at 12:37 PM, ffm2 said:

LocalRatings Team Balancer


This is very good

Edited June 15 by AlexHerbert
