Even though I don't have an especially good level in statistics or mathematics, I'll try to answer this with causative's idea in mind ...
The first thing I want to clarify is that (if I understood correctly) a "win" or a "loss" is not the actual outcome of the multiplayer game, but a comparison of arbitrarily selected statistics from the game (with a heavy weight on units killed). This fact alone means that the goal from the beginning was to give an estimation, not to predict the probability of winning a multiplayer game. So I think causative explained his system better than I could have with my English ...
So what's measured is actually a series of 1v1s within a multiplayer game. If you play a 4v4 game, you actually play 3 games, one against each of your allies. As this was not criticized, I wanted to make sure it's clear...
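To make the "3 games in a 4v4" idea concrete, here is a minimal sketch of how I understand the comparison between allies. The player names, the scores and the way ties are handled are all made up for illustration; this is not causative's actual code.

```python
from itertools import combinations

def pairwise_results(team_scores):
    """Compare every pair of allies by score (hypothetical helper).

    Returns a list of (player_a, player_b, result), where result is
    the outcome from player_a's point of view.
    """
    results = []
    for (a, score_a), (b, score_b) in combinations(team_scores.items(), 2):
        if score_a > score_b:
            results.append((a, b, "win"))
        elif score_a < score_b:
            results.append((a, b, "loss"))
        else:
            # Assumption: equal scores count as a draw.
            results.append((a, b, "draw"))
    return results

# Made-up end-of-game scores for one team of a 4v4.
team = {"p1": 5200, "p2": 4100, "p3": 4100, "p4": 2500}
results = pairwise_results(team)

# Count how many 1v1 "games" each player takes part in.
games_per_player = {
    name: sum(name in (a, b) for a, b, _ in results) for name in team
}
print(results)
print(games_per_player)
```

With 4 allies there are 6 pairs in total, and each player appears in 3 of them, which is where the "3 games" comes from.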
1. As I said, I am not good at statistics, but here is how I see it: supposing the arbitrarily selected score used to compare players is good (though we know that's not the case), this "multiplayer" rating is similar to the game's current 1v1 rating. If the laws of statistics apply to the multiplayer rating, they also apply to the 1v1 rating system, so that would mean the official 1v1 system is as bad as this multiplayer system. Knowing that, the real issue with this multiplayer system is how well the score describes a multiplayer game, which I think we all acknowledge is not 100% accurate: for example, the score doesn't measure how good a strategy is unless that strategy produces more kills, etc...
2. I already talked about this in an earlier post in the thread, even if I don't express myself well, but here is how I see it: indeed, games are supposed to be balanced, but since it's not the outcome (win/loss) of the entire multiplayer game that is measured, I think we should draw other conclusions from this. We have players of different strengths in a team. The game is supposed to be balanced at the start: that means players already use rough relative estimations of each player's strength, so that better players are teamed with weaker players. If the game's score agrees with the human estimation of a player's strength, then the human estimation simply helps the multiplayer rating estimation.
3. I still think a different conclusion should be drawn. You emphasized that players in a multiplayer game may play under different conditions each game (and I think it's mostly about facing an opponent who is stronger than the one your ally faces), but I would say that over more and more games this effect tends to disappear.
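A rough way to picture point 3 is a small simulation. It rests on a big assumption of mine: the difference in strength between the opponent you face and the one your ally faces is random each game, with no systematic bias (here uniform between -200 and +200, numbers picked arbitrarily). If matchups were biased, the effect would not average out.

```python
import random

def average_opponent_gap(num_games, rng):
    """Average 'opponent strength gap' over num_games (hypothetical model).

    Each game's gap is drawn uniformly from [-200, 200]: positive means
    you faced a stronger opponent than your ally did.
    """
    gaps = [rng.uniform(-200, 200) for _ in range(num_games)]
    return sum(gaps) / num_games

rng = random.Random(0)  # fixed seed so the run is repeatable
few = average_opponent_gap(5, rng)
many = average_opponent_gap(20000, rng)
print(f"average gap over 5 games:     {few:.1f}")
print(f"average gap over 20000 games: {many:.1f}")
```

Under this assumption, the average over many games ends up close to zero, while the average over a handful of games can still be far from it.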
4. Well, for the term "smurf" I guess everyone has their own meaning, but as for "troll": the guy hannibal called a troll was actually the guy who caused a huge disruption, which led to registrations being closed. I don't think this term will be used again anytime soon.
EDIT: by the way, Feldfeld = Attila2
Well, I play chess, and in chess too we see upsets: I have already beaten a player rated 400 points above me and drawn one rated 450 above me. A rating doesn't need 1000 games to be accurate. In 0 A.D., if a player doesn't want to win, I think it will show in the score. But generally, as far as I can tell, players in multiplayer games try to win. And yes, it seems the Elo system doesn't work in 0 A.D., since causative noticed that the FIDE rating doesn't work here for the reasons described in the first post.
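For reference, the chess upsets I mention can be put in numbers with the standard logistic Elo expected-score formula. The ratings below (1600, 2000, 2050) are made-up values chosen only to get the 400 and 450 point gaps. The expected score counts a win as 1 and a draw as 0.5.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))

# Hypothetical ratings giving gaps of 400 and 450 points.
e400 = expected_score(1600, 2000)
e450 = expected_score(1600, 2050)
print(f"expected score vs +400: {e400:.3f}")
print(f"expected score vs +450: {e450:.3f}")
```

So the lower-rated player is expected to score roughly 0.09 points per game against someone 400 points higher, and roughly 0.07 against someone 450 higher: upsets like these are rare but already accounted for by the rating.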