Progress reports on funded work


A while ago we asked for donations via Pledgie to support a developer doing full-time work on the game for a month, and had a great response. I'm currently able to devote that time to the project (I've just finished a PhD, and haven't sorted out long-term employment yet and can wait a while) so I'm going to start doing that now. This isn't meant to detract from the open source development of the game - instead it's an added opportunity to tackle some of the larger problems that can benefit from more concentrated work than usual.

I'll use this forum topic to record my progress, so people can see what their donations are funding and hopefully get a vaguely interesting view into the development process. Detailed technical discussions should probably go in separate topics to keep things organised, but comments are generally welcome here :)

My current (but quite flexible) plan of tasks to focus on is:

  • Saved game support - particularly to allow players to reconnect to a multiplayer game after a crash or network failure, since large gameplay testing sessions are often annoyingly interrupted by dropped players. Saving single-player matches is also a frequently requested feature.
  • Better tools for measuring the game's performance. We want to optimise various aspects of performance - overall framerate, consistency of framerate (i.e. avoiding occasional pauses that make the game feel jerky), better utilisation of multi-core CPUs, etc - and I think it's worth investing in new ways of recording and analysing performance so we can improve it more effectively.
  • Better engine support for AI scripts - mainly to increase performance in various ways, and to fix some other areas that cause pain for AI scripters.
  • More efficient pathfinder - one of the current algorithms can perform particularly badly with large numbers of units in a small area (e.g. melee battles), and needs to be redesigned to provide much better performance in those cases.

This should be enough to get started with :)

Godspeed, and I hope you can complete this work, and hopefully more, in about a month. Best of luck.

Day 1

Worked on improving the engine support for saved games.

As a basic introduction: The problem is that the game has a load of complex data structures storing its representation of the world in memory - data about units like their positions, hitpoints, the paths they're moving along, and hundreds of other things, along with each player's resource counts and the explored regions of the map and so on. All that data needs to be flattened into a string of bytes that can be saved into a file, and it needs to be portable between different computers. The saving process also needs to be comprehensive: if we forget to save one minor bit of data (perhaps we fail to store the wing positions of a butterfly), the game will play out differently when you load it back (perhaps a tornado will no longer form), causing unpredictable buggy behaviour. (Not that we actually have tornadoes in the game. Or butterflies.)
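To make the "flattened into a string of bytes" idea concrete, here is a minimal sketch, not the engine's actual code. The `UnitState` struct, its fixed-point position format and the helper names are all assumptions made for illustration. Each field is written byte by byte in little-endian order, so the output does not depend on the byte order of the machine that saved it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, much-reduced unit state; the real game stores far more.
struct UnitState {
    std::uint32_t id;
    std::int32_t x;           // position as fixed-point, 1/65536 units (assumed format)
    std::int32_t z;
    std::uint16_t hitpoints;
};

// Append the low `bytes` bytes of v in little-endian order,
// independent of the host CPU's own byte order.
inline void putLE(std::vector<std::uint8_t>& out, std::uint32_t v, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}

// Read `bytes` little-endian bytes starting at pos, advancing pos.
inline std::uint32_t getLE(const std::vector<std::uint8_t>& in, std::size_t& pos, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    std::uint32_t v = 0;
    for (int i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(in.at(pos++)) << (8 * i);
    return v;
}

inline std::vector<std::uint8_t> serializeUnit(const UnitState& u) {
    std::vector<std::uint8_t> out;
    putLE(out, u.id, 4);
    putLE(out, static_cast<std::uint32_t>(u.x), 4);
    putLE(out, static_cast<std::uint32_t>(u.z), 4);
    putLE(out, u.hitpoints, 2);
    return out;
}

inline UnitState deserializeUnit(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    UnitState u{};
    u.id = getLE(in, pos, 4);
    u.x = static_cast<std::int32_t>(getLE(in, pos, 4));
    u.z = static_cast<std::int32_t>(getLE(in, pos, 4));
    u.hitpoints = static_cast<std::uint16_t>(getLE(in, pos, 2));
    return u;
}
```

Every field has to appear in both functions, in the same order. Leaving one out is exactly the "forgotten butterfly" kind of bug described above.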

As an example of the data we need to save, this is a female citizen:

- id: 1134
  Decay:
  Footprint:
  Minimap:
    r: 80
    g: 80
    b: 200
    active: true
    x: 708.81771
    z: 801.58425
  Obstruction:
    active: true
    moving: true
    control group: 1134
    tag: 1426
    flags: 5
  OverlayRenderer:
  Ownership:
    owner: 1
  Position:
    in world: true
    x: 708.81771
    z: 801.58425
    last x: 707.65525
    last z: 801.00235
    rot x: 0
    rot y: 1.0468
    rot z: 0
    altitude: 0
    relative: true
    anchor: "upright"
    floating: false
  Selectable:
  UnitMotion:
    radius: 0.8
    state: 3
    path state: 3
    ticket: 0
    target entity: 0
    target pos x: 0
    target pos y: 0
    target offset x: 0
    target offset y: 0
    target min range: 0
    target max range: 0
    speed: 6.5
    length: 0
    length: 1
    waypoint x: 717.50824
    waypoint z: 805.93458
    type: 0
    goal x: 717.50824
    goal z: 805.93458
    goal u x: 0
    goal u z: 0
    goal v x: 0
    goal v z: 0
    goal hw: 0
    goal hh: 0
  Vision:
  VisualActor:
    actor: "units/carthaginians/female_citizen.xml"
    r: 1
    g: 1
    b: 1
  AIProxy:
    object: {
      "changes": {
        "position": [
          708.8177032470703,
          801.5842437744141
        ]
      },
      "needsFullGet": false
    }
  Armour:
  Attack:
  Auras:
  Builder:
  Cost:
  Health:
    object: {
      "hitpoints": 75
    }
  Identity:
  Loot:
  Looter:
    object: {}
  ResourceGatherer:
    object: {
      "carrying": {}
    }
  Sound:
  Stamina:
    object: {}
  StatusBars:
  UnitAI:
    object: {
      "orderQueue": [
        {
          "type": "Walk",
          "data": {
            "x": 717.5082397460938,
            "z": 805.9345703125
          }
        },
        {
          "type": "Gather",
          "data": {
            "target": 469,
            "type": {
              "generic": "wood",
              "specific": "tree"
            },
            "lastPos": {
              "x": 695.7979736328125,
              "y": 22.38250732421875,
              "z": 710.696044921875
            }
          }
        }
      ],
      "order": {
        "type": "Walk",
        "data": {
          "x": 717.5082397460938,
          "z": 805.9345703125
        }
      },
      "formationController": 0,
      "isGarrisoned": false,
      "isIdle": false,
      "lastFormationName": "Line Closed",
      "stance": "passive",
      "fsmStateName": "INDIVIDUAL.WALKING",
      "losRangeQuery": 153,
      "heldPosition": {
        "x": 717.5082397460938,
        "z": 805.9345703125
      }
    }

The engine already supports most of the functionality needed for this, but it's hard to verify that it doesn't have a few bugs hiding in it. I've now added some code that developers can enable to test the correctness of the saving/loading code. It'll automatically save the game after every single state update (about 5 times per second), load it again into a second copy of the game, run the gameplay update code on both copies, and verify that the result is the same in both cases. If we ever forget to save some data that will affect the behaviour of the simulation, the two copies should get out of sync and we can detect and debug the problem. Running in this mode is noticeably slower than normal but still quite playable.
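The checking mode can be pictured with a toy model. This is only a sketch under stated assumptions: `Sim`, its two fields and the deliberately buggy `forgetVelocity` switch are invented for illustration and are not the engine's classes. On every turn the state is saved, loaded into a second copy, both copies are updated, and the results are compared.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy simulation state (not the engine's real classes).
struct Sim {
    std::int32_t pos = 0;
    std::int32_t velocity = 1;   // influences future updates, so it must be saved

    void update() {
        pos += velocity;
        if (pos % 5 == 0) velocity += 1;
    }

    // forgetVelocity simulates a bug where one field is left out of the save.
    std::vector<std::int32_t> save(bool forgetVelocity) const {
        if (forgetVelocity) return {pos};
        return {pos, velocity};
    }

    static Sim load(const std::vector<std::int32_t>& data) {
        assert(!data.empty());
        Sim s;
        s.pos = data.at(0);
        if (data.size() > 1) s.velocity = data.at(1);  // otherwise stays at the default
        return s;
    }

    bool operator==(const Sim& o) const { return pos == o.pos && velocity == o.velocity; }
};

// Save, reload into a second copy, update both, compare - once per turn.
// Returns the first turn on which the copies differ, or -1 if they never do.
inline int firstDesyncTurn(int turns, bool forgetVelocity) {
    Sim primary;
    for (int turn = 0; turn < turns; ++turn) {
        Sim secondary = Sim::load(primary.save(forgetVelocity));
        primary.update();
        secondary.update();
        if (!(primary == secondary)) return turn;
    }
    return -1;
}
```

In this model the missing field goes unnoticed while `velocity` still has its default value, and is only caught on the first turn after it changes. That is why the check has to run continuously rather than once.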

I've also added a quicksave/quickload feature (press shift+F5 to save, shift+F8 to load) which provides another way to test this code. One known problem is that animation state is not saved - if a unit is walking when you save, it'll slide across the ground while standing perfectly upright after you reload. Also there's no support for saving AI yet, and it'll explode catastrophically if you try it in multiplayer, but the rest of it should work.

Now I'm working on extending the networking code to support transferring large files between players, with the aim of letting players reconnect to a multiplayer game if they crash or temporarily lose their network connection instead of having to abandon the match. One of the already-connected players will automatically save their game and transfer it to the new player, who will load it and can then continue with the match. This file-transfer code should be reusable for some future features too, like automatically downloading maps if not all players have them installed. It's half working now, so I'll try to finish it off tomorrow.

To be honest, I don't think it's worth putting much effort into reviving disrupted multiplayer games; if it's any more than 2 players, it becomes very difficult to get everyone ready to continue the game.

This is mainly for when a single player has lost their connection to the server - the other players carry on without them, and that player can rejoin just by connecting to the server again, so it doesn't need any complex coordination between players. That's been a fairly common situation when I've been playtesting (with WFG developers and other people on IRC) and it's a major pain to lose half an hour of progress just before you were about to launch an attack on your enemy, so it seems worthwhile spending maybe a day making it possible to recover from that situation :)

This is mainly for when a single player has lost their connection to the server - the other players carry on without them, and that player can rejoin just by connecting to the server again, so it doesn't need any complex coordination between players. That's been a fairly common situation when I've been playtesting (with WFG developers and other people on IRC) and it's a major pain to lose half an hour of progress just before you were about to launch an attack on your enemy, so it seems worthwhile spending maybe a day making it possible to recover from that situation :)

That would be an excellent feature for multiplayer. Not sure I've ever seen an RTS with such an outstanding feature.

My current plan is to do as little GUI work as possible, so I won't change how disconnections are handled - the player will get sent back to the menu screen, and the other players will carry on as normal. (We might want to change it later so e.g. the host can voluntarily pause the game if they're waiting for the player to return, and so the player gets some informative automatic reconnection screen). Then the first player can exit and restart their game if they want (it doesn't store any state on the client), and join the server's IP as normal, and if they have the same name as a previously-disconnected player then they'll be given control of that player slot. (If we had some persistent user account identifier, we should use that instead of name). Then (at least this is the theory and I think it probably should be easy enough to implement) the host will save the state at the current turn N; transmit it; the client will load the map and deserialise the state; the host will transmit the commands from other players between turn N and the new current turn M (it carries on running while transmitting); the player fast-forwards their simulation up to turn M; maybe iterate the last two steps until the player catches up completely; then the server starts accepting commands from the new player and they've been spliced back into the game.
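The catch-up loop in that plan can be sketched as a toy model. Everything here is an assumption for illustration: `World`, `Host`, the arbitrary state-mixing formula, and the idea that each catch-up round takes half as long as the previous one. It is not the engine's networking code, only the shape of "snapshot at turn N, replay commands up to turn M, repeat until caught up".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical deterministic simulation: same state + same commands => same result.
struct World {
    std::uint32_t turn = 0;
    std::uint64_t state = 1;

    void applyTurn(std::uint32_t command) {
        state = state * 6364136223846793005ULL + command + 1;  // arbitrary deterministic mix
        ++turn;
    }
};

struct Host {
    World world;
    std::vector<std::uint32_t> commandLog;  // commandLog[t] = command applied on turn t

    void runTurn(std::uint32_t command) {
        commandLog.push_back(command);
        world.applyTurn(command);
    }

    World snapshot() const { return world; }

    // Commands for turns [from, current turn).
    std::vector<std::uint32_t> commandsSince(std::uint32_t from) const {
        assert(from <= commandLog.size());
        return std::vector<std::uint32_t>(
            commandLog.begin() + static_cast<std::ptrdiff_t>(from), commandLog.end());
    }
};

// The client loads the host's state from turn N, then repeatedly replays
// whatever turns the host ran in the meantime until it has caught up.
// turnsDuringTransfer models the host carrying on while the state is sent.
inline World rejoin(Host& host, int turnsDuringTransfer) {
    World client = host.snapshot();   // state at turn N
    int pending = turnsDuringTransfer;
    while (pending > 0) {
        for (int i = 0; i < pending; ++i) host.runTurn(7);  // host keeps running
        for (std::uint32_t c : host.commandsSince(client.turn)) client.applyTurn(c);
        pending /= 2;   // assumed: each catch-up round is shorter than the last
    }
    return client;      // same turn and state as the host
}
```

The loop only terminates because the client replays turns faster than the host produces them, which is the "fast-forward" assumption in the plan above.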

If other games don't do this, I don't know why, because it seems obvious when you have saved games and deterministic replay :)

My two cents worth: Don't use just usernames to determine who to match up. Short of implementing a login system, for now, use the same ID used in profile reporting as the re-entry key. Would that work? Any big issues doing that?

I think if a player leaves, they go to the main menu. If they disconnect, the game should pause for them, the screen should darken (like most games' pause screens), and the text "Disconnected. Attempting reconnect..." should appear. After 10 seconds, if it doesn't reconnect, then throw them to the main menu with "You were disconnected from the game and could not be reconnected. Please contact the game host.".

As for the transmit file plan, sounds good. It will need some type of GUI: a progress bar for the % of the file size, with status updates such as "Downloading map...", "Downloading turns...", "Finalizing..." (like the map loading screen, but it doesn't need to have the pictures/tips).

If other games don't do this, I don't know why, because it seems obvious when you have saved games and deterministic replay :)

I guess the only "danger" is if the file transfer is really, really slow, it might not be feasible for the reconnecting player to receive the complete state and fast-forward before the game ends. In general though I think that is the solution we should strive for. AOK would pause the game when a player disconnected, and it had really buggy disconnect handling anyway (due to reliance on UDP?), so we should aim higher :)

After 10 seconds, if it doesn't reconnect, then throw them to the main menu with "You were disconnected from the game and could not be reconnected. Please contact the game host."

Hmm, that message is bad, because it implies they can't try connecting again, but I think what Philip was saying is that they can try to reconnect.

I guess the only "danger" is if the file transfer is really, really slow, it might not be feasible for the reconnecting player to receive the complete state and fast-forward before the game ends.
What about implementing auto save, and then when a host gets a request from a player:

// Send whichever is smaller: the turns the player missed, or the full saved state.
if (game->savedState().size() > game->turnsSince(request->lastTurn()).toFile().size()) {
    sendTurnsFrom(request->lastTurn());
} else {
    sendFullMap();
}

So on a huge map with hundreds of units, if the player was only disconnected for 2-3 seconds, send the turns rather than the full map. If the turns are larger than the map, though, send the whole map (to avoid as many issues as possible).

Would something like that be feasible?

Hmm, that message is bad, because it implies they can't try connecting again, but I think what Philip was saying is that they can try to reconnect.

True, perhaps something like this: "You were disconnected from the game. Please try to reconnect. If problems persist, contact the game host."

My two cents worth: Don't use just usernames to determine who to match up. Short of implementing a login system, for now, use the same ID used in profile reporting as the re-entry key. Would that work? Any big issues doing that?

Wouldn't that interfere with the purpose of those IDs? (Namely to make the profiles reasonably anonymous while still allowing stats from the same user to be registered as just that.)

My two cents worth: Don't use just usernames to determine who to match up. Short of implementing a login system, for now, use the same ID used in profile reporting as the re-entry key. Would that work? Any big issues doing that?

What Erik said - those IDs are designed to have some limited security properties (like anonymity (or at least pseudonymity)), and reusing them in other contexts would risk weakening those properties. It'd be better to have a separate persistent ID that's designed for multiplayer usage (so e.g. you can't impersonate another player's ID, and so it could be integrated with the lobby server and rankings and global bans and whatever) so we can avoid unexpected side-effects. If we want to stop people cheating in multiplayer (e.g. connecting as other players) there's a lot more we'd need to do anyway (there's basically no security in the current implementation), so I think that should be approached as a separate task.

I guess the only "danger" is if the file transfer is really, really slow, it might not be feasible for the reconnecting player to receive the complete state and fast-forward before the game ends.

Yeah, it might not be ideal if you have a dialup network connection and an extremely slow CPU. But it's probably still better to make one player wait for 5 minutes when reconnecting than to make every other player pause for 4 minutes while the first is downloading the state. (And the (compressed) state should be pretty small (maybe 100KB but I haven't measured accurately yet) so it shouldn't be too bad anyway.)

What about implementing auto save and then when a host gets a request from a player:

[...]

So on a huge map with hundreds of units, if the player was only disconnected for 2-3 seconds, send the turns rather than the full map. If the turns are larger than the map, though, send the whole map (to avoid as many issues as possible).

That should work but adds a bit more complexity (e.g. making sure the player doesn't load the autosave from a different match) - it'll probably be worthwhile if disconnections are common and the simpler full resynchronisation approach is too slow in practice.

That should work but adds a bit more complexity (e.g. making sure the player doesn't load the autosave from a different match) - it'll probably be worthwhile if disconnections are common and the simpler full resynchronisation approach is too slow in practice.
If the game were to try and auto connect for up to 10 seconds after a disconnection, I think this approach (sending turns instead of the map) makes the most sense.

My most common use case is my internet connection cutting out for just 2 seconds (thankfully not often, but it does). It comes back online fairly quickly, and the game should be able to recover from that.

So forgetting file size then (to avoid desync issues), how's this:

* Player disconnects.
* Game tries to reconnect right afterwards, with the screen saying so ("Reconnecting..."). It stops trying either after, say, 30 seconds or when the user hits a cancel button.

Either:

* Game successfully reconnects, using the same map it was just playing, downloads the turns since its last turn and applies them, and the user is back in.

Or:

* Game fails to reconnect. User gets the "You were disconnected from the game. Please try to reconnect. If problems persist, contact the game host." message. They try to rejoin the game, the entire map downloads, then the turns; all are applied and the user is back in.

That sounds like it'd work, avoid any sync issues, and avoid needless user interaction for small connection hiccups.

Day 2

Worked more on reconnection support. It's a bit more painful to implement than I hoped (the saving/loading is fine but the coordination between server and clients is already complex and this extension makes it worse). But the above shows a fundamentally working version, which is nice :). Should just need to clean up the code now, then test it more carefully and fix any remaining problems.

With this example (Oasis II at the start of a match (i.e. with very few units, except for trees and animals)) the entire saved game state is only about 80KB, so it should download pretty quickly to a reconnecting player.

Day 3

Cleaned up the code for handling reconnection, and tested it over the internet with some people. The reconnection seemed to work fine - the only bugs found were in the saving/loading code.

The trickiest bugs are in the code that handles line-of-sight calculations (i.e. revealing the map and pushing back the fog-of-war when in the vision range of a unit). The basic idea is that the code has a 2D grid counting how many units can see the corner of each tile:

pr-los-1.png

The two blue units (with vision range represented as the large circles) are close together, so some tiles are visible to both units and some to only one. If a tile is in range of at least one unit, then it should be visible to the player, else it'll be fog-of-war or shroud-of-darkness.
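A from-scratch version of that grid might look like the sketch below. The `Unit` struct, the function names and the use of whole-tile integer coordinates are assumptions for illustration, not the engine's actual data layout. It compares squared distances, so no square root is needed for the circle test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position and vision radius in tile units (illustrative only).
struct Unit { int x, z, range; };

// counts[z * size + x] = number of units that can see grid vertex (x, z).
inline std::vector<int> computeVisibilityCounts(const std::vector<Unit>& units, int size) {
    assert(size > 0);
    std::vector<int> counts(static_cast<std::size_t>(size) * size, 0);
    for (const Unit& u : units)
        for (int z = 0; z < size; ++z)
            for (int x = 0; x < size; ++x) {
                const int dx = x - u.x, dz = z - u.z;
                // Inside the vision circle if squared distance <= squared range.
                if (dx * dx + dz * dz <= u.range * u.range)
                    ++counts[static_cast<std::size_t>(z) * size + x];
            }
    return counts;
}

// A vertex is visible to the player if at least one unit can see it.
inline bool isVisible(const std::vector<int>& counts, int size, int x, int z) {
    return counts[static_cast<std::size_t>(z) * size + x] > 0;
}
```

This loops over every vertex for every unit, which is the slow full computation; it is only meant as the reference result.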

In the real game, the vision ranges might have a radius of perhaps 16 tiles, and there might be a thousand units on the map. Computing the counts for every visible tile for every unit would be pretty slow - there's just too much data to process. To keep the game running fast, we only compute the whole visibility-count grid at the start of the game, and then update it incrementally as units move around the map.

For example, consider a unit moving from the red position to the green position:

pr-los-2.png

We're interested in the changes to the values in the visibility-count grid, rather than the absolute values. We just need to subtract 1 from the count for the tiles in the unit's old vision range (red), and then add 1 for the tiles in the new vision range (green), and we should end up with the same result as if we recomputed the entire grid from scratch with the unit in its new position.

Since units usually move very short distances, most tiles in the old vision range will still be in the new vision range. We'd be subtracting 1 then immediately adding 1, so we actually don't need to update them at all - only the tiles inside one range but outside the other need updating. With that optimisation, we rarely need to update more than a few dozen entries in the visibility-count grid every time a unit moves, so it's pretty fast.
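The incremental update can be sketched like this, again with hypothetical names and integer tile coordinates rather than the engine's real code. For brevity it scans the whole grid to find the affected vertices, which a real implementation would avoid by walking only the edges of the two circles. The point is that only vertices inside exactly one of the two circles are modified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Circle test using squared distances (no square root).
inline bool inRange(int x, int z, int cx, int cz, int r) {
    const int dx = x - cx, dz = z - cz;
    return dx * dx + dz * dz <= r * r;
}

// Move one unit's vision circle from (ox, oz) to (nx, nz), modifying only
// vertices that are inside exactly one of the two circles.
// Returns how many grid entries were modified.
inline int moveUnit(std::vector<int>& counts, int size,
                    int ox, int oz, int nx, int nz, int r) {
    assert(size > 0);
    int changed = 0;
    for (int z = 0; z < size; ++z)
        for (int x = 0; x < size; ++x) {
            const bool wasSeen = inRange(x, z, ox, oz, r);
            const bool isSeen = inRange(x, z, nx, nz, r);
            if (wasSeen == isSeen) continue;  // in both (-1 then +1 cancels) or in neither
            counts[static_cast<std::size_t>(z) * size + x] += isSeen ? 1 : -1;
            ++changed;
        }
    return changed;
}

// Reference result: counts for a single unit computed from scratch.
inline std::vector<int> countsForSingleUnit(int size, int cx, int cz, int r) {
    assert(size > 0);
    std::vector<int> counts(static_cast<std::size_t>(size) * size, 0);
    for (int z = 0; z < size; ++z)
        for (int x = 0; x < size; ++x)
            if (inRange(x, z, cx, cz, r))
                counts[static_cast<std::size_t>(z) * size + x] = 1;
    return counts;
}
```

Comparing the incrementally updated grid against `countsForSingleUnit` at the new position is a simple way to check that the two approaches agree.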

The problem is that in some rare cases, the result is not the same as recomputing from scratch, due to bugs in the implementation of this algorithm. (The implementation tries to avoid the obvious approach of using square-root functions to compute the edges of circles, because square-roots are slow, so it does something a bit trickier and more error-prone.) That doesn't matter much in normal games - the bugs are largely unnoticeable - but it matters when saving/loading games for multiplayer reconnection. We don't store the visibility-count grid in the saved game file (to minimise the file size), and instead we recompute it from scratch when loading. But that means one player has the newly recomputed version, while the older players have the incrementally-computed version, and they differ. The game eventually detects that inconsistency and reports an out-of-sync error. If the error wasn't detected, perhaps one player's copy of the game would think a unit had lost its target into fog-of-war while another player thought the target was still visible, so the rest of the game could play out differently and the players would get very confused.

So I need to fix that problem now. Also the GUI needs some improvements and fixes when reconnecting, but hopefully that'll be about enough.
