LocalRatings mod - evaluate players' skills based on previous games


Mentula

This is pretty cool. FYI: After installing and starting it up, I had to click "Rebuild List" before I was able to get actual ratings. Once I clicked that, it worked fine.

 

If we're posting our own ratings for comparison, my rating here with this mod was 3.43. My lobby rating is 1286.

Edited by thephilosopher

17 minutes ago, seeh said:

It would be great if you could limit the time. E.g. the last x months. Or the last 50 games. Then the games where you played really badly would no longer be included

The main problem with restricting the sample of replays to a limited number is statistics. The more replays you have, the more accurate a player's rating is, so it is inadvisable to rely on data extracted from a small (though recent) sample.
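To make the trade-off concrete, here is a minimal sketch. It assumes each replay is a simple record with a date and a per-game score; `Replay`, `last_n`, `since`, and `expected_noise` are hypothetical names for illustration and are not part of the mod.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Replay:
    played_on: date  # hypothetical field: when the match was played
    score: float     # hypothetical field: the player's per-game score


def last_n(replays, n):
    """Keep only the n most recent replays."""
    return sorted(replays, key=lambda r: r.played_on)[-n:]


def since(replays, cutoff):
    """Keep only replays played on or after the cutoff date."""
    return [r for r in replays if r.played_on >= cutoff]


def expected_noise(per_game_spread, n_games):
    """Rough noise of an average over n_games: spread / sqrt(n)."""
    return per_game_spread / sqrt(n_games)
```

Under this simplified model, going from 400 replays to the last 50 makes the average roughly 2.8 times noisier (the square root of 8), which is the cost of excluding old games.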


@Mentula I would recommend adding an uncertainty label after the main rating, calculated based on how many games they have played with you. This will prevent players with just 1 fluked match getting ridiculously high ratings. 

For example, leGrosRobert is at the first place although he is arguably not the best player in your list. Furthermore, 2 of Yekaterina's smurf accounts are way better than you and weirdJokes, although I doubt whether she is actually that talented.
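One possible shape for such an uncertainty label is the standard error of the per-game ratings. This is only a sketch of the suggestion, not something the mod is known to do; the function names and the label format are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def rating_with_uncertainty(per_game_ratings):
    """Return (average, standard error); the error is None for a single game."""
    n = len(per_game_ratings)
    if n == 0:
        raise ValueError("at least one game is required")
    avg = mean(per_game_ratings)
    # With one game there is no spread to measure, so the error is unknown.
    err = stdev(per_game_ratings) / sqrt(n) if n > 1 else None
    return avg, err


def label(per_game_ratings):
    """Format a rating as 'average ± error (games)'."""
    avg, err = rating_with_uncertainty(per_game_ratings)
    n = len(per_game_ratings)
    if err is None:
        return f"{avg:.2f} ± ? ({n} game)"
    return f"{avg:.2f} ± {err:.2f} ({n} games)"
```

A player with one fluked match would then show up as something like `9.00 ± ? (1 game)`, which signals that the high rating should not be trusted yet.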


6 minutes ago, Sevda said:

@Mentula using your default settings, this is the leaderboard for me:

image.thumb.png.60124a9593857c56eb2b4a7eafa97939.png

And I find myself with no rating :(

image.thumb.png.163356f17b74386c2d4e26eb3bcd4ac2.png

 

Why is it that so many players have a score of 0?

Hum that is a bit strange, it could be a bug. It's actually the first time I see it and at the moment it's hard to find the cause. @Sevda I'll send you a private message to investigate (maybe tomorrow, now it's late in my timezone). Thanks for reporting!


2 hours ago, Player of 0AD said:

If someone plays only 1v1s against much weaker players, they can easily get a very high "rating". So you can think of it as more like a win rate than a rating. Just saying

I personally disagree with this interpretation of the rating. First of all, it's true that someone who mostly plays against weaker players gets a higher rating, no doubt. However, I disagree on the interpretation of the rating as a win rate (in any context: 1v1s, TGs, ...). The rating assigned by the LocalRatings mod (the default one, as well as any user-customized rating using different weights) is very distant from representing a win rate. A win rate of 20% means that the player wins 1 out of 5 games, which is not good; on the other hand, a rating (as in the LocalRatings mod) of 20% means that the player's graph in the Summary chart is 20% better than the average graph, which in other words means that the player performs very well (and therefore presumably has a high win rate).

In my experience with this mod, a player with a rating of 20% is a strong player; I wouldn't say the same of a player with a 20% win rate.
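The difference between the two quantities can be sketched as follows. This is a simplified reading of the description above (performance relative to the average), not the mod's actual computation; both function names are hypothetical.

```python
def relative_rating(player_score, average_score):
    """Fractional performance over the average: 0.20 means 20% above average."""
    if average_score == 0:
        raise ValueError("average score must be non-zero")
    return (player_score - average_score) / average_score


def win_rate(results):
    """Fraction of games won; results is a list of booleans."""
    if not results:
        raise ValueError("at least one game is required")
    return sum(results) / len(results)


# A player scoring 1200 where the average is 1000 is 20% above average ...
strong = relative_rating(1200, 1000)
# ... while winning 1 game out of 5 is a 20% win rate.
weak = win_rate([True, False, False, False, False])
```

Both values come out as 0.2, but the first describes an above-average performer and the second a player who loses most games.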

However, let me clarify one thing again: this mod is based on statistics. In statistics any of us can imagine a "limit case" (it could be, for example, a player who only plays with much weaker players). On the bright side... this mod is based on statistics! This means that, generally, a player who plays with different types of players will experience more reliability in the ratings data.

56 minutes ago, Sevda said:

Furthermore, you can farm ratings easily using virtual machines (although I haven't been able to work out the maths behind the rating system), so I am inclined to trust Mentula's ratings mod more than Vanilla Lobby's ratings evaluation. 

This comment actually gives me the chance to clarify one thing. Assigning a rating to a player can be very arbitrary and will never make all of us agree. So, I tend to see the rating of a player as the player's performance over the average, or, if you prefer, their contribution to the game. But this is just my interpretation of it. The lobby's rating evaluation is a system that only takes into account the win rate, whereas the LocalRatings mod takes into account scores. In this sense, they don't conflict, they just represent two different things.


1 hour ago, Mentula said:

Hum that is a bit strange, it could be a bug. It's actually the first time I see it and at the moment it's hard to find the cause. @Sevda I'll send you a private message to investigate (maybe tomorrow, now it's late in my timezone). Thanks for reporting!

Sevda's issue is the same issue I had at first. I solved it by clicking "Rebuild List," and then all the scores populated.


1273841976_Bildschirmfotovon2022-05-1512-33-38.thumb.png.9c14dcb752e0e5f8ec6e1ff15d294524.png

My ratings, based on resources (creating a resource gives a point, as does sending resources to allies; destroying gives a point; capturing resources which are buildings gives almost 2 points (not exactly 2, because you often can't keep the building); basic loot is considered (it's hard to account for the loot which results in units carrying resources))

 


Dear 0 A.D. friends,

I am happy to announce a new release of the LocalRatings mod! Most notably, the new release includes the following new features.

  1. The rating of a player and the number of games played appear next to the player's name in the Match Setup page. This will hopefully make balancing games an easier task. See image below.
  2. A new Player Filter has been added to the mod: now it is possible to filter out players depending on their rating or the number of games they played.

lobbyS.png.6fdf5a2e9fa475b24fbb79d301764ccb.png
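As a rough illustration of what a player filter of this kind does, here is a minimal sketch. The data layout (name mapped to rating and games played) and the parameter names are assumptions for the example, not the mod's internals.

```python
def filter_players(players, min_rating=None, max_rating=None, min_games=0):
    """players maps name -> (rating, games_played); returns the names that pass."""
    kept = []
    for name, (rating, games) in players.items():
        if games < min_games:
            continue  # too few games played
        if min_rating is not None and rating < min_rating:
            continue  # rating below the lower bound
        if max_rating is not None and rating > max_rating:
            continue  # rating above the upper bound
        kept.append(name)
    return kept


players = {"alice": (12.5, 40), "bob": (-3.0, 2), "carol": (4.0, 15)}
experienced = filter_players(players, min_games=10)
```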

Download and install: you can download the new release (v0.25.5) of the mod from the zip file attached to this post or from the zip file attached to the first post of this thread or from the official page.

I wish to thank all the forum users who gave feedback and suggestions. A special thanks goes to sanafur, who suggested the above (and other possibly future) features.

Have fun!

LocalRatings-v0.25.5.zip


Hi everyone,

with this new update (v0.25.6) of the LocalRatings mod, one can filter out matches with a given number of players, number of teams, or matches with uneven team composition. For example, one can exclude all 1v1 matches from the rating computation; or, conversely, one can consider 1v1 matches only! See image below.


teamcompositionfilter.thumb.png.0e60bb261d3c221529d910b79010e9ad.png

Thanks to sanafur who suggested the update!
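The logic of such a match filter might look like the sketch below. It assumes a match is described only by its team sizes (for example `[4, 4]` for a 4v4); the function and parameter names are hypothetical and do not reflect the mod's code.

```python
def keep_match(team_sizes, excluded_player_counts=(), excluded_team_counts=(),
               exclude_uneven=False):
    """Decide whether a match, given as a list of team sizes, enters the rating."""
    total_players = sum(team_sizes)
    if total_players in excluded_player_counts:
        return False  # e.g. drop every 2-player match (1v1)
    if len(team_sizes) in excluded_team_counts:
        return False  # e.g. drop matches with more than two teams
    if exclude_uneven and len(set(team_sizes)) > 1:
        return False  # e.g. drop 4v3
    return True


matches = [[1, 1], [4, 4], [4, 3], [2, 2, 2]]
# Exclude all 1v1 matches and all uneven teams.
team_games = [m for m in matches
              if keep_match(m, excluded_player_counts={2}, exclude_uneven=True)]
```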

Download: as usual, you can download the new release (v0.25.6) of the mod from the zip file attached to this post or from the zip file attached to the first post of this thread or from the official page.

LocalRatings-v0.25.6.zip


I like this a lot, it seems valuable for balancing.

However: If I host a TG and balance based on how well people play versus me, am I likely to get a balanced game? I guess this comes down to the certainty of the score (how many matches I have played)

so the recipe for a balanced TG is to balance my (the host's) local ratings on both sides, correct?

Also one problem: Since the Local Rating is only visible in match setup in the absence of 0ad's ratings, it is rare to see others' local ratings. This limits the potential for the mod's use as a balancing tool.

^Never mind, this is not the case. It was just a coincidence that I have never played the two below me.

image.thumb.png.67546be909bb1e49d7c923c119dc0656.png

Edited by real_tabasco_sauce

Here is an idea actually:

Score is a weird topic, as shown in a couple of discussions. Sometimes a large eco score can boost one's score even if they fight horribly. I have mentioned the possibility of a new score metric in other discussions, including suggestions for A27:

On 17/04/2022 at 11:00 AM, real_tabasco_sauce said:

This was in another discussion, but it should go here too:

Economy score rework:

Economy score = resources spent (instead of resources gathered)

 

separate statistic in summary screen:

Value ratio = military score / economy score

(shows player skill, if some units are super OP like merc cav, player unit composition, overall effectiveness)

 

Also, the latter value would show how impactful a rush is in the early game with the same weight (since an early game ratio and a late game ratio are still each ratios)

Thoughts on this change?

I wonder if using "value ratio" would be suitable for this mod? This could also give a more appropriate score to rushing players, since I see Aslan. and H. Herle are poorly rated in my lists. What do you think @Mentula?

In practice, I guess this would look like adding resources spent to the available weights customization window, although the ratio would also be nice.
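A minimal sketch of the proposed statistic, assuming the economy score is taken to be resources spent as suggested in the quoted post. The function name and the choice to return `None` when nothing has been spent are assumptions for illustration.

```python
def value_ratio(military_score, resources_spent):
    """Military score per resource spent; None when nothing has been spent yet."""
    if resources_spent <= 0:
        return None  # ratio is undefined without any spending
    return military_score / resources_spent


# An early rush and a late-game army can have the same ratio
# even though the absolute numbers differ by an order of magnitude.
early = value_ratio(300, 600)
late = value_ratio(3000, 6000)
```

Here `early` and `late` are both 0.5, which is the sense in which an early-game and a late-game ratio carry the same weight.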

Edited by real_tabasco_sauce

1 hour ago, real_tabasco_sauce said:

However: If I host a TG and balance based on how well people play versus me, am I likely to get a balanced game? I guess this comes down to the certainty of the score (how many matches I have played)

so the recipe for a balanced TG is to balance my (the host's) local ratings on both sides, correct?

Short answer: I would personally balance a TG by balancing the total ratings of the two teams.

However, let me be prudent in giving a definitive answer. One big fact to take into account is that the rating heavily depends on the weights you choose. Different weights can give rise to very different ratings. The weights are supposed to change the "meaning" you give to the rating. But once you have decided upon the weights to choose, what you say is true: the more games you have played, the more reliable the ratings are.
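Balancing the total ratings of two teams can be sketched as a brute-force search over all equal-size splits. This is an illustration only (the mod does not do this for you, as far as this thread says), and it assumes an even number of players; `balance_teams` is a hypothetical name.

```python
from itertools import combinations


def balance_teams(ratings):
    """ratings maps name -> rating. Returns (team_a, team_b, difference)
    for the equal-size split with the smallest gap in total rating."""
    names = sorted(ratings)
    if len(names) < 2 or len(names) % 2:
        raise ValueError("an even number of players (at least 2) is required")
    total = sum(ratings.values())
    first, rest = names[0], names[1:]
    best = None
    # Fixing the first player on team A avoids checking mirrored splits twice.
    for others in combinations(rest, len(names) // 2 - 1):
        team_a = (first,) + others
        sum_a = sum(ratings[n] for n in team_a)
        diff = abs(total - 2 * sum_a)  # |sum_a - sum_b|
        if best is None or diff < best[2]:
            team_b = tuple(n for n in names if n not in team_a)
            best = (team_a, team_b, diff)
    return best


# Ratings given as percentages over the average.
teams = balance_teams({"a": 30, "b": 10, "c": -5, "d": 15})
```

For a typical 4v4 this checks only 35 splits, so brute force is fast enough.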

56 minutes ago, real_tabasco_sauce said:

I wonder if using "value ratio" would be suitable for this mod? This could also give a more appropriate score to rushing players, since I see Aslan. and H. Herle are poorly rated in my lists. What do you think @Mentula?

In practice, I guess this would look like adding resources spent to the available weights customization window, although the ratio would also be nice.

Yes, the amount of spent resources could be a weight to add, thanks for suggesting. Regarding the ratio, ratios can't be used as weights for the following reason: the rating of a player is determined by comparing the player's parameters with the average game parameters, so at some point a division occurs in the calculation. Ratios can sometimes be close to 0 (or to infinity), and we all (well... many of us) know what happens if a number close to 0 (or to infinity) is in the denominator. Actually, during the early stage of the mod's development I considered including ratios (like k/d ratio, resources sold/bought, tributes sent/received, ...) and the results were odd, to say the least.
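One way to see the kind of oddity described above is to compare each player with the group average, first on a plain parameter and then on a ratio. The numbers and the `relative_to_average` helper are made up for illustration; the mod's real calculation is not shown in this thread.

```python
from statistics import mean


def relative_to_average(values):
    """Each value expressed as a fraction above or below the group average."""
    avg = mean(values)
    return [v / avg - 1 for v in values]


kills = [50, 55, 60, 45]
deaths = [40, 50, 55, 1]  # the last player barely lost any units

# Plain parameter: everyone lands within about 15% of the average.
by_kills = relative_to_average(kills)

# Ratio parameter: one tiny denominator inflates the average itself,
# so three similar players look far below average and one looks absurdly high.
kd = [k / d for k, d in zip(kills, deaths)]
by_kd = relative_to_average(kd)
```

A player with zero deaths would be worse still, since the ratio could not be computed at all.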

51 minutes ago, real_tabasco_sauce said:

It also looks like the default weight for map exploration is 10 (version 0.26.5). I bet this is a typo, right?

The number 10 is the correct one. The "Exploration Score" (the same one that you can see in the Summary at the end of a game), is obtained by multiplying the percentage of explored map by 10.


2 hours ago, Mentula said:

Yes, the amount of spent resources could be a weight to add, thanks for suggesting. Regarding the ratio, ratios can't be used as weights for the following reason: the rating of a player is determined by comparing the player's parameters with the average game parameters, so at some point a division occurs in the calculation. Ratios can sometimes be close to 0 (or to infinity), and we all (well... many of us) know what happens if a number close to 0 (or to infinity) is in the denominator. Actually, during the early stage of the mod's development I considered including ratios (like k/d ratio, resources sold/bought, tributes sent/received, ...) and the results were odd, to say the least.

K/D ratio would be problematic, but fortunately value ratio is divided by res spent, so the denominator will always be greater than the numerator, except for scenarios where you start with units. I see what you mean about weights tho.

 

2 hours ago, Mentula said:

The number 10 is the correct one. The "Exploration Score" (the same one that you can see in the Summary at the end of a game), is obtained by multiplying the percentage of explored map by 10.

I see. I thought you meant 0 as default. IMO, exploration is a skill that gives you (in theory) the upper hand in a fight, so there should be no need to score exploration in addition to units value killed, etc. When I set it to 0 I find a much more accurate list.
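The effect of the exploration weight can be sketched with a plain weighted sum. This assumes a score built as weight times parameter, which is a simplification for illustration; the parameter names and the other weights are hypothetical, and only the factor of 10 on the explored percentage comes from the explanation above.

```python
def weighted_total(params, weights):
    """Sum of weight * value over all parameters; missing weights count as 0."""
    return sum(weights.get(name, 0) * value for name, value in params.items())


params = {"explored_percent": 40, "military": 1000}

# Exploration counts as percentage explored times 10.
with_exploration = weighted_total(params, {"explored_percent": 10, "military": 1})

# Setting the exploration weight to 0 removes it from the total.
without_exploration = weighted_total(params, {"explored_percent": 0, "military": 1})
```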


21 hours ago, real_tabasco_sauce said:

 

However: If I host a TG and balance based on how well people play versus me, am I likely to get a balanced game?

I doubt it. Local rating is not only skill-dependent, but also win-rate-dependent and rushers tend to have lower ratings while boomers have higher.


6 hours ago, Player of 0AD said:

I doubt it. Local rating is not only skill-dependent, but also win-rate-dependent and rushers tend to have lower ratings while boomers have higher.

The mod accounts for scores averaged across game time, so rushes have more of an impact than they do when looking only at end-of-game scores, but it's not great.
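The difference between the two views can be sketched as follows, assuming each player's cumulative score is sampled at regular intervals during the game. The series are invented and the exact averaging used by the mod is not specified in this thread.

```python
from statistics import mean


def end_of_game(series):
    """Score as shown in the final summary: the last sample only."""
    return series[-1]


def time_averaged(series):
    """Score averaged over all samples taken during the game."""
    return mean(series)


# Cumulative military score, sampled at equal intervals (made-up numbers).
rusher = [300, 600, 700, 700, 700, 700]   # scores early, then stalls
boomer = [0, 0, 0, 0, 500, 1000]          # scores only in the late game
```

By the end-of-game number the boomer is ahead (1000 against 700), while the time average favours the rusher, whose early points count for the whole game.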

Acero and I discussed how to account for rushes. Thoughts on this?

 

effectiveness = military score*(game pop cap/ avg game pop)/ resources spent

[ex. rush at ~3 mins, total game pop is 160/1600 = 20 percent -> military score of involved players (rusher and rush defender) receive 80% boost]
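Taken literally, the formula above can be written as the sketch below. The argument names are assumptions; the sketch follows the formula as stated and does not try to reproduce the percentage boost in the bracketed example.

```python
def effectiveness(military_score, resources_spent, game_pop_cap, avg_game_pop):
    """military score * (game pop cap / avg game pop) / resources spent."""
    if resources_spent <= 0 or avg_game_pop <= 0:
        raise ValueError("resources spent and average population must be positive")
    pop_factor = game_pop_cap / avg_game_pop  # larger when the game was small
    return military_score * pop_factor / resources_spent


# Same military score and spending, fought at different population levels.
early_fight = effectiveness(500, 2000, game_pop_cap=1600, avg_game_pop=160)
late_fight = effectiveness(500, 2000, game_pop_cap=1600, avg_game_pop=1600)
```

With these inputs the early fight scores ten times higher than the late one, which shows how strongly the population factor rewards early aggression in this form.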

I could honestly envision this to replace the current score breakdown:

  • eco score: res spent (or gathered+trade)
  • military score
  • effectiveness (as above)

 

alternatively, you could tie the boost directly to military score, but I like it as above.

I imagine from these three stats, you could compute a pretty equitable player score, even using @Mentula's mod!

^per @alre's suggestion, effectiveness could also be called "population-weighted military/economy ratio", "normalized military/economy ratio" or "normalized value ratio".

Edited by real_tabasco_sauce

I love this :) Keep working on it. It would be great if Wildfire Games kept a global database of all games of all times, and this could be a new start for a global rating system for 0ad. It will really need some discussion and testing to find something that everyone will find quite fair. There are so many factors to take into consideration.

I love that I can see how many games I played with each player. This should be standard in game.

Another thing is that maybe there should be some weighting between TGs and 1v1s. Much the same but still very different. Some play mostly 1v1 some play mostly TGs.

Bonus points for winning by points or conquest.

Would be great to have one score for war skills and one rating for eco skills in the game in addition to total rating. Then one can sort by the three categories.

Another interesting question is what more relevant statistics can be found in the replay files? I guess there is a plethora of different things in addition to what is seen in the summary. 

 

Keep up the good work :D 


I think some features would be very helpful.

  • a time limiter option that uses replays from past month or year
  • finding a recommended/default parameter valuation that best represents skill in 0ad. If players discuss skill levels, it would be helpful to have a default system to measure against. Right now rushes are undervalued, so I would recommend adding time-value to the parameters rather than using the values at the end of the game.

I appreciate the work done for the mod and I am talking to other players to get them to check it out. I think with some refinement and some accuracy improvement (for default/recommended values) we could see this become an in-game feature in a future alpha.

Edited by BreakfastBurrito_007
I realized that bullet 1 would require more communication between players or players to have each other's replays
