
LocalRatings mod - evaluate players' skills based on previous games



Here is an idea actually:

Score is a weird topic, as shown in a couple of discussions. Sometimes a large eco score can boost one's score even if they fight horribly. I have mentioned the possibility of a new score metric in other discussions, including suggestions for A27:

On 17/04/2022 at 11:00 AM, real_tabasco_sauce said:

This was in another discussion, but it should go here too:

Economy score rework:

Economy score = resources spent (instead of resources gathered)

 

separate statistic in summary screen:

Value ratio = military score / economy score

(shows player skill, if some units are super OP like merc cav, player unit composition, overall effectiveness)

 

Also, the latter value would show how impactful a rush is in the early game with the same weight (since an early game ratio and a late game ratio are still each ratios)

Thoughts on this change?

I wonder if using "value ratio" would be suitable for this mod? This could also give a more appropriate score to rushing players, since I see Aslan. and H. Herle are poorly rated in my lists. What do you think @Mentula?

In practice, I guess this would look like adding resources spent to the available weights customization window, although the ratio would also be nice.

Edited by real_tabasco_sauce
  • Like 1
Link to comment
Share on other sites

1 hour ago, real_tabasco_sauce said:

However: If I host a TG and balance based on how well people play versus me, am I likely to get a balanced game? I guess this comes down to the certainty of the score (how many matches I have played)

so the recipe for a balanced TG is to balance my (the host's) local ratings on both sides, correct?

Short answer: I would personally balance a TG by balancing the total ratings of the two teams.

However, let me be prudent in giving a definitive answer. One big fact to take into account is that the rating depends heavily on the weights you choose. Different weights can give rise to very different ratings. The weights are supposed to change the "meaning" you give to the rating. But once you have decided which weights to use, what you say is true: the more games you have played, the more reliable the ratings are.

56 minutes ago, real_tabasco_sauce said:

I wonder if using "value ratio" would be suitable for this mod? This could also give a more appropriate score to rushing players, since I see Aslan. and H. Herle are poorly rated in my lists. What do you think @Mentula?

In practice, I guess this would look like adding resources spent to the available weights customization window, although the ratio would also be nice.

Yes, the amount of spent resources could be a weight to add, thanks for suggesting it. Regarding the ratio, ratios can't be used as weights for the following reason: the rating of a player is determined by comparing the player's parameters with the average game parameters, so at some point a division occurs in the calculation. Ratios can sometimes be close to 0 (or to infinity), and we all (well... many of us) know what happens if a number close to 0 (or to infinity) is in the denominator. Actually, during the early stage of the mod's development I considered including ratios (like k/d ratio, resources sold/bought, tributes sent/received, ...) and the results were odd, to say the least.
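To make the division problem concrete, here is a small Python sketch. It assumes the per-game rating has the form (player value - game average) / game average, which is how the comparison with the game average is described in this thread; the function name and all numbers are made up for illustration and are not taken from the mod.

```python
def relative_rating(player_value, game_average):
    """Rating of one player relative to the game average (illustrative form)."""
    return (player_value - game_average) / game_average


# An ordinary parameter such as resources gathered: the average is large,
# so the rating stays in a sensible range.
ordinary = relative_rating(12000, 10000)

# A ratio-like parameter such as resources sold/bought: the game average can
# be tiny, so a single game produces an enormous rating.
ratio_like = relative_rating(0.05, 0.001)

print(ordinary, ratio_like)  # roughly 0.2 versus roughly 49
```

One such game would dominate the average over all replays, which is consistent with the "odd" results mentioned above.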

51 minutes ago, real_tabasco_sauce said:

It also looks like the default weight for map exploration is 10 (version 0.26.5). I bet this is a typo, right?

The number 10 is the correct one. The "Exploration Score" (the same one that you can see in the Summary at the end of a game), is obtained by multiplying the percentage of explored map by 10.

  • Like 2
Link to comment
Share on other sites

2 hours ago, Mentula said:

Yes, the amount of spent resources could be a weight to add, thanks for suggesting it. Regarding the ratio, ratios can't be used as weights for the following reason: the rating of a player is determined by comparing the player's parameters with the average game parameters, so at some point a division occurs in the calculation. Ratios can sometimes be close to 0 (or to infinity), and we all (well... many of us) know what happens if a number close to 0 (or to infinity) is in the denominator. Actually, during the early stage of the mod's development I considered including ratios (like k/d ratio, resources sold/bought, tributes sent/received, ...) and the results were odd, to say the least.

K/D ratio would be problematic, but fortunately value ratio is divided by res spent, so the denominator will always be greater than the numerator, except for scenarios where you start with units. I see what you mean about weights tho.

 

2 hours ago, Mentula said:

The number 10 is the correct one. The "Exploration Score" (the same one that you can see in the Summary at the end of a game), is obtained by multiplying the percentage of explored map by 10.

I see. I thought you meant 0 as default. IMO, exploration is a skill that gives you (in theory) the upper hand in a fight, so there should be no need to score exploration in addition to units value killed, etc. When I set it to 0 I find a much more accurate list.

  • Like 1
Link to comment
Share on other sites

21 hours ago, real_tabasco_sauce said:

 

However: If I host a TG and balance based on how well people play versus me, am I likely to get a balanced game?

I doubt it. Local rating is not only skill-dependent, but also win-rate-dependent and rushers tend to have lower ratings while boomers have higher.

  • Like 1
Link to comment
Share on other sites

6 hours ago, Player of 0AD said:

I doubt it. Local rating is not only skill-dependent, but also win-rate-dependent and rushers tend to have lower ratings while boomers have higher.

The mod accounts for scores averaged across game time, so rushes have more of an impact than they do just by looking at end-of-game scores, but it's not great.

Acero and I discussed how to account for rushes. Thoughts on this?

 

effectiveness = military score * (game pop cap / avg game pop) / resources spent

[ex. rush at ~3 mins, total game pop is 160/1600 = 10 percent -> military score of involved players (rusher and rush defender) receives a 90% boost]
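A minimal Python sketch of the proposed statistic, taking the formula literally as written. The function name, the argument names, and the zero guard are assumptions made for this illustration; none of this exists in the mod or in 0 A.D.

```python
def effectiveness(military_score, resources_spent, pop_cap, avg_pop):
    """Proposed 'normalized value ratio' (hypothetical helper, not part of the mod)."""
    if resources_spent <= 0 or avg_pop <= 0:
        return 0.0  # assumption: treat an undefined ratio as zero
    return military_score * (pop_cap / avg_pop) / resources_spent


# Same military score per resource spent, at two different game stages.
early = effectiveness(military_score=500, resources_spent=2000,
                      pop_cap=1600, avg_pop=160)
late = effectiveness(military_score=5000, resources_spent=20000,
                     pop_cap=1600, avg_pop=1600)
print(early, late)  # 2.5 0.25
```

Read this way, the population factor multiplies the early-game ratio by pop cap / avg pop (10 in this example), so how strongly rushes are boosted depends entirely on how that factor is defined.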

I could honestly envision this to replace the current score breakdown:

eco score: res spent (or gathered+trade)       military score          effectiveness (as above)

 

alternatively, you could tie the boost directly to military score, but I like it as above.

I imagine from these three stats, you could compute a pretty equitable player score, even using @Mentula's mod!

^per @alre's suggestion, effectiveness could also be called "population-weighted military/economy ratio", "normalized military/economy ratio" or "normalized value ratio".

Edited by real_tabasco_sauce
Link to comment
Share on other sites

I love this :) Keep working on it. It would be great if Wildfire Games kept a global database of all games of all time, and this could be a new start for a global rating system for 0ad. It will really need some discussion and testing to find something that everyone finds reasonably fair. There are so many factors to take into consideration.

I love that I can see how many games I played with each player. This should be standard in game.

Another thing is that maybe there should be some weighting between TGs and 1v1s. They are much the same but still very different. Some play mostly 1v1s, some play mostly TGs.

Bonus points for winning by points or conquest.

Would be great to have one score for war skills and one rating for eco skills in the game in addition to total rating. Then one can sort by the three categories.

Another interesting question is what more relevant statistics can be found in the replay files? I guess there is a plethora of different things in addition to what is seen in the summary. 

 

Keep up the good work :D 

  • Like 2
Link to comment
Share on other sites

I think some features would be very helpful.

  •  
  • a time limiter option that uses replays from past month or year
  • finding a recommended/default parameter valuation that best represents skill in 0ad. If players discuss skill levels, it would be helpful to have a default system to measure against. Right now rushes are undervalued, so I would recommend adding time-value to the parameters rather than using the values at the end of the game.

I appreciate the work done for the mod and I am talking to other players to get them to check it out. I think with some refinement and some accuracy improvement (for default/recommended values) we could see this become an in-game feature in a future alpha.

Edited by BreakfastBurrito_007
I realized that bullet 1 would require more communication between players or players to have each other's replays
  • Like 2
Link to comment
Share on other sites

Can we start sharing weights?

Spoiler

image.png.27f6eb54e87460549f4f4a5300231809.png

What's the difference between some categories (value/number)?

I had to filter players with 5+ games to avoid some randos but this is still not right.

Spoiler

image.png.d03b88cdd1323ca669b937692b925408.png

 

  • Like 2
Link to comment
Share on other sites

7 hours ago, sarcoma said:

What's the difference between some categories (value/number)?

@sarcoma let us consider a specific example: "Enemy units killed (value)" vs "Enemy units killed (number)".

If you set a weight of 1 to "Enemy units killed (value)", the contribution of this parameter to the rating corresponds to the total value of enemy units killed, that is, the cost to produce them. For example, with a weight of 1, a player gets 50 points for killing an enemy citizen woman, 100 for a base infantry unit, 150 for a base cavalry unit and so on... On the other hand, if you set a weight of 1 to "Enemy units killed (number)", the system will assign one point for each enemy unit killed, disregarding the type of unit and the cost to produce it. For example, with a weight of 1, a player gets 1 point for killing an enemy citizen woman, 1 for a base infantry unit, 1 for a base cavalry unit...

To continue with this example, if you set a weight of 1 to both "Enemy units killed (value)" and "Enemy units killed (number)", you will get 51 points for killing an enemy citizen woman, 101 for a base infantry unit, 151 for a base cavalry unit etc... So you might want to consider only one of the two parameters and set the other to 0, but this is up to your rating design.

The mod's default weight for "Enemy units killed (value)" is 0.1 because in 0 A.D. each enemy unit killed increases a player's score by 10% of the unit's value. For the same reason, the default weight for "Enemy units killed (number)" is 0 because in 0 A.D. a player's score is not affected by the number of enemy units killed (but only by their value). The LocalRatings default weights are the ones that 0 A.D. uses to calculate the total score of a player.
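The difference between the two parameters can be sketched in a few lines of Python. The unit values are the ones quoted in this post; the dictionary, the function name, and the data layout are assumptions made for the example and do not mirror the mod's code.

```python
# Unit costs quoted in the post above (value = cost to produce the unit).
UNIT_VALUE = {"citizen woman": 50, "base infantry": 100, "base cavalry": 150}


def kill_points(kills, weight_value, weight_number):
    """Points from kills; `kills` maps unit type -> number killed (hypothetical helper)."""
    total_value = sum(UNIT_VALUE[unit] * count for unit, count in kills.items())
    total_number = sum(kills.values())
    return weight_value * total_value + weight_number * total_number


one_of_each = {"citizen woman": 1, "base infantry": 1, "base cavalry": 1}
print(kill_points(one_of_each, 1, 0))  # 300: value only
print(kill_points(one_of_each, 0, 1))  # 3: number only
print(kill_points(one_of_each, 1, 1))  # 303: both, i.e. 51 + 101 + 151
```

With the default weights (0.1 for value, 0 for number), the same three kills contribute 10% of their total value.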

  • Thanks 1
Link to comment
Share on other sites

@Mentula could you explain to me exactly how the ratings are calculated from the weights? I would like a mathematical understanding of your algorithm, as it currently spits out some rather unexpected results.

Setting everything to 0 except the number of units killed, which is set to 1:  (the top killer)

image.thumb.png.9251a8aac683fb0832b58fbb367d6960.png

It would seem that azeem1121 is the top killer; however, he is nowhere near as effective as vinme or Palin in game. In the match I played with him, he had a very high kill/death ratio because he surprise-rushed a few inexperienced players in nomad mode. His total number of kills was less than 100, although he lost very few units. The game ended in a crash instead of a proper finish, so I think there is something to be fixed here. If you can explain to me how your algorithm works, perhaps I can propose a better mathematical model.

 

The top resource gatherers:

image.thumb.png.d1b1f8b3d15ef08932d31c14b8806a1d.png

This is much closer to reality based on what I have seen from these players. 

  • Like 1
Link to comment
Share on other sites

Sure @Sevda. First of all, I can't see from your pictures the number of matches you've played with the players in your list, but I can imagine that the number of matches is small for those players whose ratings are far from your expectations. When the number of matches is small, statistics are unreliable; you need many games to get significant values. Notice that the mod (v0.25.6, the latest version at the moment I am writing) allows you to filter out players whose number of games is small (from the Options > Player Filters menu).

That being said, I'll do my best to explain the algorithm here; you can find more info at the repository page and, if you want to look at the part of the code that handles the rating computation, you can look at this file.

Spoiler

As an example, I assume we are computing the rating of a player whose username is Mentula.

  1. First, we scan all replays having Mentula as an active player and we ignore the others (this would be extremely inefficient from an algorithmic perspective; this is not how the mod works, but let's assume the mod works this way for simplicity).
  2. For each replay, we look at the statistics of Mentula at all instants of the replay (and not only at game's end). In fact, 0 A.D. stores data at given moments of the game, so you can imagine the timeline as a discrete timeline consisting of n instants t_1, ..., t_n.
  3. For each instant, we compute Mentula's score according to the weights you set: for example, with "Enemy units killed (number)" set to 1 and all other weights set to 0 ("top killer", as you say) we multiply the weight (1 in our case) by the number of units killed by Mentula up to that instant of the game, so as to get the score of Mentula at that instant. Thus, we obtain n scores s_1, ..., s_n, one for each instant t_1, ..., t_n.
  4. We then compute Mentula's average score over all instants. In formulas: (s_1 + ... + s_n) / n. This is Mentula's average score relative to the particular replay we are considering.
  5. We also compute the average score of the game, namely the sum of all the average scores of all players (as in step 4), divided by the number of players. This represents the average score of all players as if they were one single player.
  6. We compute the ratio (Mentula's score - average score) / average score. This is Mentula's rating relative to the particular replay we are considering. For example, if Mentula's rating turns out to be 0.1, then Mentula has an average score during the game 10% higher than the average score of all players combined.
  7. We produce the final rating of a player by computing the average rating over all replays (namely, we sum the ratings obtained at step 6 over all replays and we divide by the number of replays).
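The steps above can be sketched in Python as follows. This is only an illustration of the description, not the mod's code: the data layout (each replay as a dict mapping a player name to a list of per-instant statistics), the parameter name `units_killed_number`, and the handling of a zero game average are all assumptions.

```python
def average(values):
    return sum(values) / len(values)


def replay_rating(replay, player, weights):
    """Steps 2-6: rating of `player` in one replay.

    `replay` maps player name -> list of per-instant stat dicts (hypothetical layout).
    """
    def avg_score(name):
        # Steps 3-4: weighted score at each instant, then the mean over instants.
        scores = [sum(weights.get(key, 0) * value for key, value in stats.items())
                  for stats in replay[name]]
        return average(scores)

    game_avg = average([avg_score(name) for name in replay])  # step 5
    if game_avg == 0:
        return 0.0  # assumption: skip the division when nobody scored
    return (avg_score(player) - game_avg) / game_avg  # step 6


def final_rating(replays, player, weights):
    """Steps 1 and 7: mean of the per-replay ratings over the player's replays."""
    ratings = [replay_rating(r, player, weights) for r in replays if player in r]
    return average(ratings) if ratings else None


# "Top killer" weights: only the number of units killed counts.
weights = {"units_killed_number": 1}
replay = {
    "Mentula": [{"units_killed_number": 0}, {"units_killed_number": 10},
                {"units_killed_number": 20}],
    "Other": [{"units_killed_number": 0}, {"units_killed_number": 5},
              {"units_killed_number": 10}],
}
print(final_rating([replay], "Mentula", weights))  # about 0.33
```

In this toy replay Mentula averages 10 kills over the three instants against a game average of 7.5, which gives a rating of about 33% above average.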

Sorry for the long answer, I hope it's clear enough.

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

1 hour ago, Mentula said:

We produce the final rating of a player by computing the average rating over all replays (namely, we sum the ratings obtained at step 6 over all replays and we divide by the number of replays).

Thank you very much for the explanation, it makes total sense to me.

I think the problem might be alleviated if we take into consideration the average rating of the players present in each match. I am suggesting running the algorithm twice instead of just once; the first run generates a rough rating for everyone, which will contain anomalies like azeem. Then, we run a second pass, this time taking into account the average rating of the players participating in the game, and weight that game accordingly. The total score at the end of the second pass will be a weighted average instead of just an average. If a player participated a lot in OP TGs, even though they perform just 5% above average, their total will still be much higher than a player who dominates the newcomers a few times. High average player level -> more weight. 
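A rough Python sketch of this two-pass idea. The input layout and the choice of `math.exp` to turn the average level of a game into a positive weight are assumptions made purely for illustration; any positive, increasing function could be substituted.

```python
import math


def two_pass_ratings(games):
    """`games` is a list of dicts mapping player name -> per-game rating."""
    # Pass 1: plain average per player (the current behaviour).
    totals, counts = {}, {}
    for game in games:
        for name, rating in game.items():
            totals[name] = totals.get(name, 0.0) + rating
            counts[name] = counts.get(name, 0) + 1
    first_pass = {name: totals[name] / counts[name] for name in totals}

    # Pass 2: weight each game by the mean first-pass rating of its players.
    weighted, weight_sum = {}, {}
    for game in games:
        level = sum(first_pass[name] for name in game) / len(game)
        w = math.exp(level)  # assumption: higher average level -> more weight
        for name, rating in game.items():
            weighted[name] = weighted.get(name, 0.0) + w * rating
            weight_sum[name] = weight_sum.get(name, 0.0) + w
    return {name: weighted[name] / weight_sum[name] for name in weighted}


games = [{"A": 0.5, "B": -0.5}, {"A": 0.1, "C": -0.1}]
print(two_pass_ratings(games))
```

Here player A dominated the weaker game and did modestly in the stronger one, so the second pass pulls A's rating below the plain average of 0.3.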

Furthermore, instead of just comparing to the average in one game, we can change the rating +/- threshold depending on the players present: in a game surrounded by experts, even if you have done 10% below average, you still did a good job, being able to hurt those experts somehow. So we should give the player positive credit if they perform anything better than 10% below average. On the other hand, in a noob game, you must perform 150% better than them to show that you are not a noob like the others.

Finally, I propose we build a replay bank using a service like OneDrive or Google Drive, where everyone dumps their replays into the repository. Then we can query the repository with Mentula's algorithm for players' ratings. Players like me delete replays often to free up disk space, which results in the loss of many records and good games. I believe a repository will also benefit @mysticjim's videos.

 

  • Like 2
Link to comment
Share on other sites

Would it be smart to put more weight on, let's say, the last 20 games, less weight on the 20 before that, and very little weight on games before that?

To me that makes sense, as people get better, but at different tempos. What matters is how you have performed lately, right?

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Thanks @woodpecker and @sarcoma. The idea of filtering/weighting replays according to the date has been brought to attention multiple times (for example by @seeh and @BreakfastBurrito_007) and seems to be a must-have. I guarantee this will be implemented in a future version of the LocalRatings mod.

Besides a new match filter that takes into consideration the replay date, I am imagining a customizable "weighting function" that assigns a weight of 1 to the most recent replay and slowly decreases going back in time, according to user-given parameters. Thinking out loud, such a function could be linear, logarithmic, or a step function, or we could even let the user choose among multiple possibilities. If any of you has thoughts / references / expertise, feel free to comment.
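As a sketch of what such a weighting function might look like, here are a linear and a step variant in Python. Everything here is hypothetical: "age" is counted in games back from the most recent one rather than in days, and the block size and level values are arbitrary placeholders for the user-given parameters.

```python
def linear_weight(age, horizon):
    """1 for the most recent replay (age 0), falling linearly to 0 at `horizon`."""
    return max(0.0, 1.0 - age / horizon)


def step_weight(age, block=20, levels=(1.0, 0.5, 0.1)):
    """Full weight for the latest `block` games, less for each older block."""
    return levels[min(age // block, len(levels) - 1)]


def weighted_rating(ratings, weight_fn):
    """`ratings` is ordered from most recent to oldest; age is the list index."""
    weights = [weight_fn(age) for age in range(len(ratings))]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * r for w, r in zip(weights, ratings)) / total


# A player who did well in the two latest games and averagely before that.
recent_first = [0.4, 0.4, 0.0, 0.0]
print(weighted_rating(recent_first, lambda age: linear_weight(age, horizon=4)))
print(weighted_rating(recent_first, step_weight))
```

With the linear weights the recent games count more, giving about 0.28; the step function treats all four games equally here (they fall in the same block of 20), giving the plain average of about 0.2.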

PS: thanks to all of you who are proposing new ideas, @real_tabasco_sauce, @Sevda and @rossenburg among others. I am happy to see that the LocalRatings mod has raised interest in many players. Although I am not responding to all comments, I am taking all suggestions and their feasibility into consideration.

  • Like 3
Link to comment
Share on other sites

  • 3 weeks later...
On 10/06/2022 at 8:11 PM, Sevda said:

 you can temporarily cut and paste the old replays into another folder

yeahh sure i know. this should really only be a temporary solution (remove and backup some folder parts).

Link to comment
Share on other sites

  • 2 weeks later...

Hi everyone,

I have updated the LocalRatings mod, including a new match filter: the date of validity. This means that the mod now lets you exclude games played before a certain date when computing the ratings. See picture below.

This is probably the last update including new features before A26 is officially released. There have been many ideas on how to improve the mod and I wish to collect more feedback from users before committing to new changes.

Spoiler

recentDateUpdate.thumb.png.2fa77f35c106fc3d1084695f4dd3cb05.png

Download: as usual, you can download the new release (v0.25.7) of the mod from the zip file attached to this post or from the zip file attached to the first post of this thread or from the official page.

LocalRatings-v0.25.7.zip

  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...

Great. It would be even nicer if you could set the number of matches used to calculate ratings individually for each player. Let's say I played 10 games with a player a long time ago and have not played with them lately. I'd still like to see his or her rating and the fact that we actually played. Just setting a date excludes so much valuable info about players. Of course it helps with your own recent rating, but with the limitations mentioned.
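A small Python sketch of this suggestion: instead of one global cutoff date, keep the most recent N games for each player. The input layout (a list of `(date, players)` tuples with ISO date strings) and the function name are assumptions for the example, not how the mod stores its data.

```python
def last_n_games_per_player(games, n):
    """Keep, for every player, only the `n` most recent games they appear in.

    `games` is a list of (date, players) tuples, where `date` is any sortable
    value such as an ISO date string (hypothetical layout).
    """
    kept = {}
    for date, players in sorted(games, key=lambda g: g[0], reverse=True):
        for name in players:
            bucket = kept.setdefault(name, [])
            if len(bucket) < n:
                bucket.append(date)
    return kept


games = [
    ("2021-03-01", ["me", "old_friend"]),
    ("2022-06-01", ["me", "rival"]),
    ("2022-06-20", ["me", "rival"]),
    ("2022-07-02", ["me", "rival"]),
]
print(last_n_games_per_player(games, 2))
```

With `n = 2`, the old game with `old_friend` is still used for that player's rating, while only the two latest games count for `me` and `rival`. A date filter set to 2022 would have dropped `old_friend` entirely.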

  • Like 1
Link to comment
Share on other sites


CHARTS!

Hello 0 A.D. friends,

I am happy to announce a new release (v0.25.8) of the LocalRatings mod with new amazing features!

Are we getting better over time? Is our archenemy becoming stronger every day? Charts will tell! Explanatory picture below.

Charts.thumb.png.c0383bae114ba29bbd49198c72e6e091.png

And if you don't like the default chart colors you can always change them (see picture below).

Spoiler

ChartsColor.png.c94188cbf5fce0c163dc10566d8ea678.png

Further, two new Score Weights have been added: the amount of resources used and the amount of resources sold at the market.

Finally, other minor issues have been addressed.

Download: you can download the new release (v0.25.8) of the mod from the zip file attached to this post or from the zip file attached to the first post of this thread or from the official page.

 

LocalRatings-v0.25.8.2.zip

Edited by Mentula
added file with .zip extension
  • Like 6
Link to comment
Share on other sites
