Multiplayer rankings


causative

I made some multiplayer rankings.  This is a help when setting up games, to decide if teams are fair.

First, these rankings depend on replays (specifically, metadata.json files).  If you'd like to improve or expand these ratings, zip up all your a21 replays and send them to me!  Check https://trac.wildfiregames.com/wiki/GameDataPaths to find where they are on your computer.

 

I have revised these rankings several times as I've gathered more data and improved my methods.  See the spoiler below for the rest of the original post, with the original rankings.

  [spoiler: original post and original rankings omitted]

And here's the updated list with all the changes and extra data so far.  This updated list was generated by the following process:

  1. Start with replays:  mine, Feldfeld's, mapkoc's, Hannibal_Barca's, and temple's.
  2. Filter out single-player games, games with AI players, pizza games, survival of the fittest games, games with a matchID that was already processed, and games under 200 seconds long.
  3. Map each player name to a canonical player name, according to a list of known aliases.  For example, "Please" is mapped to "JC".
  4. Score players in each multiplayer game according to 5*(enemy units killed value) + (resources harvested) + 2 * (trade income)
  5. Each team on a multiplayer game is converted into a set of 1v1 matches between players on the same team, the winner of the match being the player with the higher custom score.  However, if two players on the same team have custom scores within 20% of each other, each player is recorded as beating the other.
  6. 1v1 games are also included.
  7. Feed these matches into a WHR (Whole History Rating) algorithm, with rating variance set to 10 per day.  This gives every player a rating.
  8. Determine player "Strengths" such that the sum of strengths on a team predicts which team wins.  It is assumed/required that higher rated players have higher strengths.  A set of strengths that minimizes prediction errors is found through random search, and then normalized to a 0-5 scale.  Note that these strengths are not very strong predictors; just rough estimates.
  9. Strength categories are displayed as headings, e.g. those under the ===== 4 - 5 ===== heading have a strength between 4 and 5.
  10. Players are listed in order of decreasing rating.
  11. Players are indented a number of spaces equal to 10 minus the number of multiplayer games they played.  If a player is indented, distrust their ranking.
  12. Players with 3 or fewer games are followed by a list of players they have scored higher than, so you can see whether their rank is reasonable (often not).
  13. A bar is printed at the left margin to prevent line wraps from confusing the rankings.
  14. Players who have never outscored anyone are excluded from the list, to save space.
  15. Known aliases are printed at the end.
  [spoiler: updated rankings omitted]
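Steps 4 and 5 above can be sketched in a few lines of Python; the field names (`killed_value`, etc.) are illustrative, not the actual metadata.json keys:

```python
# Sketch of steps 4-5: score each player, then turn each team into
# intra-team 1v1 "matches".  Field names are illustrative, not the
# actual metadata.json keys.
from itertools import combinations

def custom_score(p):
    # Step 4: 5*(enemy units killed value) + resources harvested + 2*(trade income)
    return 5 * p["killed_value"] + p["resources_harvested"] + 2 * p["trade_income"]

def team_matches(team, tie_margin=0.2):
    # Step 5: each pair on a team yields a win for the higher scorer;
    # scores within 20% of each other count as a tie (recorded both ways).
    matches = []
    for a, b in combinations(team, 2):
        sa, sb = custom_score(a), custom_score(b)
        if abs(sa - sb) <= tie_margin * max(sa, sb):
            matches.append((a["name"], b["name"]))
            matches.append((b["name"], a["name"]))
        elif sa > sb:
            matches.append((a["name"], b["name"]))  # winner listed first
        else:
            matches.append((b["name"], a["name"]))
    return matches
```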

 

Edited by causative

Thank you!  With Feldfeld's replays, here are the updated rankings.  I also tweaked it so that each player is counted as having 2 extra losses against their best win, instead of 1 extra loss; this reduces the rank of certain players with a 100% win rate but few games.

  [spoiler: rankings omitted]

 

Edited by causative

This could be good, but the game is full of smurfs and banned people, and it's also a bit unrealistic overall. For example, trade shouldn't really be counted; some are great simmers at that (or great abusers).

(For more data, see the last few posts on the a21 replays page for the link to my replays.)

Also, somehow I appear 3x in the list (not after arrows) :P

Edited by Hannibal_Barca

Full of smurfs and banned people, yes, but the best-ranked ones are known, I think. There are also smurfs that aren't kicked from the 1v1 rankings, and furthermore the 1v1 ranking is unrealistic in some ways too, since people who play more 1v1s tend to have better ratings regardless of their actual level. A lot of players deserve to be ranked higher in 1v1s but haven't played enough, so the top 100 is not only full of inactive people (hence unrealistic, since their current level is unknown), but also of active players who remain low-ranked for lack of games.

This multiplayer ranking is indeed unrealistic too, because stats sometimes aren't enough to assess a player's contribution to the team, and because teams are already supposed to be balanced before the game, meaning better players are seeded with weaker ones. One consequence is that some players are expected to be at the top of their team from the beginning and won't face the best players of the opposing team. So out of all the games I played, I was almost never on borg's team and hence almost never faced him (only when I started 0 A.D., and one recent game, I think).

But I still think that in general this ranking tends to reflect reality well, and it will be better if we remove known smurfs.


smurfs in this list:

Trapast = pesem

Please = smirno = Burger_III = Burger_II = BurgerExpress = JeanClaude = JC_TRUMP_World_&_Partners

ragnarlothbrok = noobie

LeRoiScorpion = TheLegendary

defenderbenny = WW_Butcher

zztop = Please

moe__ = mo

spolia_opima = PhyZic

GlenRunciter = meiyo

equilibrium = equilizer

elexis2 = elexis3 = elexis4

(red = banned)


The name "Hannibal_Barca" appears twice in the list (not to the right of an arrow).  The second time, your name is preceded by the Unicode character U+200E, a LEFT-TO-RIGHT MARK, which is an invisible formatting character.  No other player has that symbol in their name, so it's not a systematic bug in my code.  Perhaps someone was cleverly impersonating you?
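For anyone who wants to check their own replays, a small sketch like this (the function name is mine) will surface invisible "format"-category characters such as U+200E in a player name:

```python
# Sketch: list the invisible "format"-category (Cf) characters in a player
# name, e.g. U+200E LEFT-TO-RIGHT MARK.  Function name is illustrative.
import unicodedata

def hidden_chars(name):
    return [f"U+{ord(c):04X}" for c in name if unicodedata.category(c) == "Cf"]
```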

Thanks for the list of aliases.  I see you have zztop = Please = JC.  Is zztop = JC?  Also you have TheLegendary = LeRoiScorpion = Please, so that would merge those three lines together, i.e.

Please = smirno = Burger_III = Burger_II = BurgerExpress = JeanClaude = JC_TRUMP_World_&_Partners = LeRoiScorpion = TheLegendary = zztop

Is that right?  How certain are you of all these?  zztop in particular has been around for quite a while.
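For what it's worth, merging alias chains like these is a classic union-find problem; the sketch below (names taken from this thread) shows how the separate lines collapse into one identity if zztop = Please and LeRoiScorpion = Please both hold:

```python
# Union-find over alias pairs: each "a = b" line becomes a union(a, b),
# and find(name) returns the canonical identity for any alias.
def make_merger():
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    return find, union

find, union = make_merger()
for a, b in [("zztop", "Please"), ("Please", "JC"),
             ("TheLegendary", "LeRoiScorpion"), ("LeRoiScorpion", "Please")]:
    union(a, b)

# All five names now share one canonical root:
assert len({find(n) for n in ["zztop", "Please", "JC",
                              "TheLegendary", "LeRoiScorpion"]}) == 1
```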

Thanks for the replays too.  In the following list, I incorporated 4 changes:  I added your replays, used a slightly abridged version of your smurf list, excluded pizza games, and added player strengths.

The purpose of player strengths is that you can add up the strengths on each team and predict that the team with the larger sum will win.

To find player strengths, I started with the ranked list of players and randomly adjusted player strengths, with the requirement that higher-ranked players have to be stronger than lower-ranked ones.  If a random adjustment improves the accuracy of using strength to predict game outcomes, I keep it; otherwise I throw it away.  At the end I scale the strengths to a 0-5 scale.  Using player strengths, I classified 224 out of 297 games (75%) correctly.  So it's not super accurate, but perhaps it can serve as a rough guide for setting up teams.
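The random search described above can be sketched roughly as follows (a toy version, not the exact code; the game encoding and perturbation sizes are made up):

```python
# Toy version of the strength search: keep strengths non-increasing down
# the ranked list, randomly perturb one player, and accept the change only
# if prediction accuracy does not drop.  `games` holds
# (team_a_indices, team_b_indices, a_won) tuples; details are illustrative.
import random

def accuracy(strengths, games):
    correct = 0
    for team_a, team_b, a_won in games:
        pred_a_wins = sum(strengths[i] for i in team_a) > sum(strengths[i] for i in team_b)
        correct += (pred_a_wins == a_won)
    return correct / len(games)

def fit_strengths(n_players, games, iters=5000, seed=0):
    rng = random.Random(seed)
    s = [1.0] * n_players          # players assumed sorted best-first by rating
    best = accuracy(s, games)
    for _ in range(iters):
        cand = s[:]
        cand[rng.randrange(n_players)] += rng.uniform(-0.1, 0.1)
        # reject if a lower-ranked player would become stronger, or strengths go negative
        if cand[-1] < 0 or any(cand[j] < cand[j + 1] for j in range(n_players - 1)):
            continue
        acc = accuracy(cand, games)
        if acc >= best:
            s, best = cand, acc
    top = max(s) or 1.0
    return [5 * x / top for x in s], best   # normalized to a 0-5 scale
```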

Note that betam is above borg- in these rankings, which is wrong, but he has a good record:  he outscored Hannibal_Barca, causative, maxticatrix, Hannibal_Barca, causative, Eurakles, Feldfeld, cb3001, META-BARONS, Hannibal_Barca, and maxticatrix, while losing only to Feldfeld and caesar_salad.

  [spoiler: rankings omitted]

I also tried your suggestion of leaving out trade from the score.  I think trade is important in long games; the team with more trade usually wins in the end.  But why not.  Here's the no-trade list:

  [spoiler: no-trade rankings omitted]

 

Edited by causative

I have many things to say to this but for now I'll just mention a few points:

  1. Whatever you mean by "Good" - as in good player - is unlikely to be a one-dimensional property of a player (if it is a property at all) and thus can't be expressed with a number. What can be expressed with a number is how frequently a player won games in the past. But that doesn't tell you much about a player's likelihood of winning a game in the future! That's because games - their settings and the participating players - differ from game to game and thus don't produce independent and identically distributed random variables. You also need a massive amount of data points (1000 is sometimes considered the lower limit) to come to useful conclusions.
  2. Persons react to ratings. This means that even if we assume the outcomes of games were randomly distributed over all games (which might be OK) and we have enough data, you will break a precondition of the statistics by generating them and using them for the purpose stated here - helping people to host fair games (guessing here that "fair" means all teams have an equal chance of winning, according to the analyzed data).
  3. Since one doesn't want the likelihood of any player winning the game (which should be 0.5 if nothing is broken ;p) but of a specific player, one can only use the data of past games he participated in. Since this holds for all players participating in the particular game, you can only use data from games with the exact same players. That makes the "masses of data" requirement quite hard to fulfill.
  4. As far as I'm concerned, "troll" and "smurf" are insults (definitely out of the scope of statistics, BTW ;p). So please be careful with those!

EDIT: Some rating systems (like ELO) are designed to give a relative score to each player of a (stable - over many games) group of players assuming that games are played one player vs. one player (simplifying things - but that's not true for 0 A.D.) and players always want to win (likely true for competitive communities like a chess league - not so much for mixed communities like that of 0 A.D.). And so on and so forth...


  On 08/05/2017 at 11:15 AM, FeXoR said:

I have many things to say to this but for now I'll just mention a few points: [full post quoted above]

Even though I don't have an especially good level in statistics or mathematics, I'll try to answer this considering causative's idea...

The first thing I want to clarify is that (if I understood correctly) a "win" or a "loss" is not the actual outcome of the multiplayer game, but a comparison of arbitrarily selected statistics from the game (with a heavy weight on killed units). This alone means that the goal was, from the beginning, to give an estimation, not to predict a probability of winning a multiplayer game. So I think causative explained his system better than I could with my English...

  On 06/05/2017 at 5:57 AM, causative said:

Each player was considered to have "beaten" other players on their own team, if their score in that game is higher.  Players were not compared to players on the opposing team, because being on the winning/losing team artificially inflates/decreases your score due to the action of your teammates, and I didn't want that to influence the results.  If you were within 20% of another player's score, then both you and the other player were considered to be tied (each recorded as beating the other)


So what's measured is actually a series of 1v1s within a multiplayer game. If you play a 4v4 game, you actually play 3 games, against your own allies. As this was not criticized, I wanted to make sure it's clarified...

  1. So, as I said, I am not good at statistics, but as I see it: supposing the arbitrarily selected game score used to compare players is good (though we know that's not the case), then this "multiplayer" rating is similar to the game's current 1v1 rating: if statistical laws apply to the multiplayer rating, they also apply to the game's 1v1 rating system, which would mean the official 1v1 system is as bad as this multiplayer system. Knowing that, the issue with this multiplayer system is how well the score describes a multiplayer game, which, I think, we all acknowledge is not 100% accurate; for example, the score doesn't measure how good a strategy is unless it results in more kills, etc...

  2. I already talked about this in another post in the thread, even if I don't express it well, but as I see it: indeed, games are supposed to be balanced, but since it's not the outcome (win/loss) of the entire multiplayer game that is measured, I think we should draw other conclusions from this. We have players of different strengths on a team, and teams are supposed to be balanced at the start: that means players already use rough relative estimations of each player's strength, such that better players play with weaker players. If the game's score agrees with the human estimation of a player's strength, that human balancing simply helps the multiplayer rating estimation.

  3. I still think another conclusion should be drawn. You emphasized the fact that players in multiplayer games may play under different conditions in each game (and I think it is mostly about fighting a player on the opposing team who is stronger than the one your ally fights), but I would say that with more and more games this effect tends to disappear.

  4. Well, for the "smurf" term I guess everyone has their own meaning. As for the "troll" one: the guy Hannibal called a troll was actually the one who caused a huge disruption and was the cause of closing registrations. I don't think this term will be used again anytime soon.

EDIT : by the way Feldfeld = Attila2

Well, I play chess, and in chess too we see upsets: I have already beaten a player rated 400 points above me and drawn with one rated 450 above me. It doesn't need 1000 games to be accurate. In 0 A.D., if a player doesn't want to win, I think it will show in the score. But generally, as far as I'm concerned, players in multiplayer games try to win. And yes, it seems the Elo system doesn't work in 0 A.D. as-is, since causative noticed that the FIDE rating doesn't work here, for reasons described in the first post.
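For scale, the standard Elo expected-score formula says a 400-point underdog is still expected to score roughly 9% of the time, so a single upset is not surprising:

```python
# Standard Elo expected score: E = 1 / (1 + 10 ** ((R_opp - R_you) / 400)).
def elo_expected(r_player, r_opponent):
    return 1 / (1 + 10 ** ((r_opponent - r_player) / 400))

# A 400-point underdog: elo_expected(1600, 2000) == 1/11, about 0.09.
```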

Edited by Feldfeld

  On 08/05/2017 at 7:32 PM, Feldfeld said:

This sole fact means that anyway the goal was from beggining to give an estimation and not predict a probability of winning a multiplayer game.


I guess it's about estimating the outcome (win/lose) of the hosted game? If so, how do you want to estimate it if not with a probability?

 

I read:

  On 06/05/2017 at 5:57 AM, causative said:

This is a help when setting up games, to decide if teams are fair.


and assumed that is what was meant by "fair" in my previous post.

 

Correct me if I misunderstood anything, please.


  On 09/05/2017 at 1:39 AM, FeXoR said:

I guess it's about estimating the outcome (win/lose) of the game hosted? If so, how do you want to estimate if not with a probability? [rest of post quoted above]

You're right, I expressed it badly...

What I mean is that currently it is a ranking, not a rating (and I have made that mistake many times already in the thread). Currently, teams are balanced according to players' relative strengths, so probabilities aren't used. I guess one way to try to balance a game with this ranking is to put on one team the best-ranked player together with the 4th, the 5th, and the 8th (following the logic of "the 1st beats the 2nd, but the 3rd beats the 4th", etc.), but of course players can be of equal strength, and that messes up the balance a bit. I'm simply not sure we can use probabilities with only a ranking and not a rating. Though causative, in one post, tried to rate players from 0 to 5 based on their ranking, so I don't really know.


You are absolutely correct that a list of players sorted by strength doesn't give you a probability for the victory of one side. But that's exactly what you want to have so you additionally have to assume something like

  On 09/05/2017 at 5:47 AM, Feldfeld said:

the first beat the 2nd, but the 3rd beat the 4th etc


adding to the uncertainty (which you don't even attempt to calculate).

So basically it ends up the same way as the probabilistic attempt: not enough information.


  On 09/05/2017 at 11:36 AM, FeXoR said:

You are absolutely correct that a list of players sorted by strength doesn't give you a probability for the victory of one side. [rest of post quoted above]

Yes, not enough information, but I still trust it, following the "better than nothing" logic:

Sometimes I see a game setup where a player says the current teams are not balanced; the host says "suggest changes", but none are suggested. The ranking system could at least suggest a balancing.

Sometimes the host and the players don't all know each other. Maybe, by checking the rating system, we can see who is the closest known player to the unknown one and get an estimation I would call somewhat decent; the issue there would be how the ranking is made. It would give a relative strength value, which I think is what most people use to balance. I would call that better than nothing.

So here is the sentence I don't agree with:

  On 09/05/2017 at 11:36 AM, FeXoR said:

But that's exactly what you want to have


Sometimes a help, even if it still doesn't give a perfect balance, is helpful and worth taking, or at least that's what I think, from my experience with the game.


FeXoR, Feldfeld explained it pretty well.  Rankings here are not based on whether your team wins or loses the game.  They are based on whether you have a higher (custom) score than other players on your own team.

 

I've experimented with some other methods of ranking the players.  The method I described in my first post, where I divide the weighted sum of wins by the weighted sum of losses, is pretty ad hoc.  I have settled on something called WHR (Whole-History Rating), which has a sounder statistical basis.  I've also incorporated Hannibal_Barca's additional smurf list, I'm no longer displaying players with 0 wins in my records, and I've added a vertical bar ┃ before each ranked player, so that if a line wraps you know it's not a new entry.  Here is the updated list:
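WHR builds on the Bradley-Terry model, adding ratings that drift over time; a simplified static sketch of that Bradley-Terry core (not the actual WHR math) shows where per-player ratings come from, fit here by plain gradient ascent:

```python
# Static Bradley-Terry sketch: P(i beats j) = 1 / (1 + exp(r_j - r_i)),
# fit by gradient ascent on the log-likelihood.  WHR additionally lets
# each rating vary over time; this toy version does not.
import math

def fit_bradley_terry(n_players, wins, steps=2000, lr=0.1):
    """wins: list of (winner_index, loser_index) pairs."""
    r = [0.0] * n_players
    for _ in range(steps):
        grad = [0.0] * n_players
        for w, l in wins:
            p = 1 / (1 + math.exp(r[l] - r[w]))  # model's P(w beats l)
            grad[w] += 1 - p
            grad[l] -= 1 - p
        r = [ri + lr * g for ri, g in zip(r, grad)]
        mean = sum(r) / n_players     # ratings are only relative;
        r = [ri - mean for ri in r]   # pin the mean at zero
    return r
```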

  [spoiler: rankings omitted]

 

Edited by causative

Since there are only 8 players max, it's pretty quick to just try every combination of teams and see which is best.
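With 8 players there are only C(8,4)/2 = 35 distinct 4v4 splits, so exhaustive search is trivial. A sketch (the strength values below are made up, not the real list):

```python
# Brute-force the most even 4v4 split: minimize the gap between the two
# teams' summed strengths.  The names/values are illustrative only.
from itertools import combinations

def best_split(strengths):
    players = list(strengths)
    total = sum(strengths.values())
    best = None
    for team_a in combinations(players, len(players) // 2):
        if players[0] not in team_a:   # fix one player on team A to skip mirror splits
            continue
        gap = abs(2 * sum(strengths[p] for p in team_a) - total)
        if best is None or gap < best[0]:
            best = (gap, team_a)
    return best

strengths = {"borg-": 5.0, "Feldfeld": 4.4, "Hannibal_Barca": 4.1,
             "causative": 3.8, "mapkoc": 3.2, "temple": 3.0,
             "JC": 2.5, "mo": 1.8}
gap, team_a = best_split(strengths)
```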

 

Here's the latest ratings list.  Thanks to mapkoc for sending his replays as well.  Other changes:  I have included single player games, and excluded survival of the fittest.  (Pizza was already excluded in previous rankings).  I have also taken into account when each game was played, to allow for players getting better over the duration of a21.  I've updated the first post in this thread with a full explanation of the process.

  [spoiler: rankings omitted]

 

Edited by causative

borg = borg- = borgsvn

Cesar_SVN = Cesar

palank = palank_svn

fatherbushido = fatherbushido2 = fatherbushido_svn

ffffffff = fpre

Hannibal_Baraq = Hannibal_Barca

JC = TurboBurger

franksy07 = Franksy

mo = moe__  (mo is the account in use, not moe__)

 

Also, should banned accounts be left in the list?

Edited by Hannibal_Barca

Thanks for the expanded alias lists.  Also, thanks to temple for sending his replays, bringing the database to 496 games (after filtering and deduping).  Here are the updated rankings, with temple's replays and an updated smurf list:

  [spoiler: rankings omitted]

 

Edited by causative

All of that assumes that players' "skills" follow a one-dimensional distribution and that a higher-"skilled" player is more likely to win against a lower-"skilled" player, whether you use "chance to win" or not.

(And again: I strongly doubt "skill" is a number, or sortable in the first place, even worse in team games.)

Also, you have the goal of giving hosts a hint on how to arrange the teams. Players are humans, though, and what they are going to use a ranking for is not necessarily what the author meant it to be used for.

So while I wish you fun playing around with ratings, I ask you to think about the consequences of such ratings.

And I don't see a simple solution for this dilemma, which can have (and in other gaming communities already has had) consequences like:

  1. Scores without automated balancing: All players try to swap into the team with the highest average score (and games will fill up very slowly, because most players want a decent chance of winning). If the host distributes the players into teams, the host will hardly ever wind up in the team with the lower average score (strange, isn't it ;) ).
  2. Scores with automated balancing: Players just enter a game and then watch TV or something to get a lower score, so the next game they actually participate in is more likely to be won, because they get underrated (and friends clamor about not always being on the same team).

Whether disconnecting counts as a win or is handled differently will also influence behavior, though not really in a "good" way (where by "good" I mean helping to provide better means for, or chances of, a fair yet competitive game, from the beginning of the game until the victory/defeat conditions are met for all participating players).
