
Collecting data from users



There are various cases where I think it'd be quite useful to automatically collect data from the game's users. E.g.:

* If we knew what platforms (OSes, OS versions, Linux distros, etc) were most commonly used, we could focus on testing and improving compatibility and packaging for those, and not worry so much about the obscurest platforms.

* If we knew what graphics hardware/drivers people used, we could decide how much to worry about known compatibility problems (broken S3TC on open source ATI drivers, crashes on specific NVIDIA drivers, etc).

* If we knew what graphics hardware/drivers people used, and their typical framerates, and their graphics options (fancywater, shadows, etc), we could build up a database of sensible default graphics settings to get decent performance, and work out where to focus on optimisations.

* If we knew about crashes or assertion failures, we could try to fix them.

* If we had data on how often people use various GUI buttons and hotkeys and gameplay features, we could work out which could be removed or should be advertised more clearly.

(The last point is a bit hypothetical now - we need to finish more of the game and add a tutorial etc before we could have a clearer idea of usability problems - but I think the others would be useful during our current stage of development since they will help us deal with problems that are already occurring.)

Also it would be useful to send some data back to users (without requiring them to download a new release):

* If people are not using the latest release, we can tell them to download it and tell them what's new.

* The game could download updates to the graphics driver database, so we can quickly push compatibility/performance fixes.

What I'm imagining is that the game will include an HTTP client library. That will be used to download release news and compatibility database updates on startup (asynchronously so it doesn't slow loading, and with caching so it doesn't use much bandwidth, and anonymously) from some site that we control. For uploading, the engine will occasionally produce some data (system configuration on startup, average framerate every 5 minutes while playing, etc) and send it over HTTP to a service on our site. (Actually it'd probably save into a local log file first, and then periodically try to upload until it succeeds, to cope with offline users and server downtime without data loss.)

The uploaded data would aim to be anonymous (don't include usernames or chat messages or directory names or IP addresses etc) and would use a randomly-generated-on-first-run user ID to correlate data. The user would be told clearly what data we're going to collect and asked to opt in or out before anything gets uploaded (though we'd strongly encourage them to opt in since it'll help improve the game). The raw data would be saved on the server, and I'd like to make the data publicly available so anyone can analyse it (we shouldn't be proprietary about it) if that's not too risky. We'd then develop some tools to sort through the data and try to extract useful information from it.

An initial implementation of this shouldn't be particularly complex - primarily it involves adding libcurl as a dependency, writing the asynchronous download/upload system, writing the server side (probably trivial in a standard web framework), and worrying about anonymity (carefully check all the uploaded data and anything that can be derived from it) and security (probably should do everything over HTTPS and hard-code the server's public key etc).

Does this seem potentially useful and worthwhile or pointless and problematic?
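As an illustration of the "save into a local log file first, then periodically try to upload until it succeeds" idea, plus the randomly-generated-on-first-run user ID, here is a minimal Python sketch. It assumes a hypothetical spool directory holding one JSON file per record, and an injected `send` callable standing in for the real HTTP upload (libcurl in the engine). The names `load_or_create_user_id` and `Spool` and the file layout are illustrative only, not part of the game.

```python
import json
import os
import time
import uuid
from typing import Callable


def load_or_create_user_id(path: str) -> str:
    """Return the pseudonymous user ID, generating a random one on first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    user_id = uuid.uuid4().hex  # random, not derived from any personal data
    with open(path, "w", encoding="utf-8") as f:
        f.write(user_id)
    return user_id


class Spool:
    """Hypothetical on-disk queue: records are written locally first, uploaded later."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self._counter = 0

    def enqueue(self, record: dict) -> str:
        # Timestamp plus a per-process counter keeps file names unique and ordered.
        self._counter += 1
        name = f"{time.time_ns():020d}-{self._counter:06d}.json"
        path = os.path.join(self.directory, name)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(record, f)
        os.replace(tmp, path)  # atomic rename: a crash never leaves a half-written record
        return path

    def flush(self, send: Callable[[bytes], bool]) -> int:
        """Upload queued records oldest first; keep anything not yet sent for next time."""
        sent = 0
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(".json"):
                continue  # skip leftover .tmp files
            path = os.path.join(self.directory, name)
            with open(path, "rb") as f:
                payload = f.read()
            if not send(payload):
                break  # offline or server down: stop and retry on the next flush
            os.remove(path)
            sent += 1
        return sent
```

Calling `flush` on startup and then every few minutes would cover offline users and server downtime without data loss, since a record is only deleted after `send` reports success.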


  • 2 weeks later...

Very much like this idea! At one point, there was work on an online error reporting system, but the cart apparently got stuck in the mud. I still think even that would be helpful, and the other information we could collect/tell users even more so.

Libcurl and simply sending all previously existing logfiles whenever the executable is started sounds good. (When our process has crashed, it's better to minimize activity, and transmitting immediately is a bit riskier than writing to an (ideally previously created) file and transmitting next time.)


  • 3 weeks later...

I was thinking in a bit more detail about how this could be implemented. For simplicity and flexibility, I think the basic idea should be that the client sends the server an HTTP request containing a JSON document and optionally some binary files, with a pseudonymous user ID and timestamp. The binary files are needed for error reports with crash logs, or other relatively large pieces of opaque textual data (e.g. simulation command logs). The JSON document is an unconstrained structure, and depends on what type of data is being transmitted (error reports, hardware settings, various types of gameplay stats, etc) and on what game version the user is running (we'll still accept data from users on old versions and SVN versions). The server will blindly store all this data.
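One possible wire format for "an HTTP request containing a JSON document and optionally some binary files" is a multipart/form-data POST. The sketch below builds such a body with the Python standard library only; the part names (`report`, `file`) and the top-level keys `user_id`, `timestamp` and `payload` are assumptions for illustration, not an agreed protocol.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def build_report_request(user_id: str, payload: dict,
                         files: Optional[Dict[str, bytes]] = None) -> Tuple[str, bytes]:
    """Build a multipart/form-data body: one JSON part plus optional binary parts.

    Returns (content_type_header, body). Part and key names are illustrative only.
    """
    # A random 128-bit boundary is very unlikely to occur inside a crash log.
    boundary = uuid.uuid4().hex
    document = {
        "user_id": user_id,  # pseudonymous ID generated on first run
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "payload": payload,  # unconstrained JSON; structure depends on the data type
    }
    json_header = ('Content-Disposition: form-data; name="report"\r\n'
                   "Content-Type: application/json\r\n\r\n")
    parts = [json_header.encode("utf-8") + json.dumps(document).encode("utf-8")]
    for name, content in (files or {}).items():
        file_header = (f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
                       "Content-Type: application/octet-stream\r\n\r\n")
        parts.append(file_header.encode("utf-8") + content)
    delimiter = f"--{boundary}\r\n".encode("ascii")
    body = b"".join(delimiter + part + b"\r\n" for part in parts)
    body += f"--{boundary}--\r\n".encode("ascii")  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", body
```

The result could then be POSTed with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})`, and the server can store the parts blindly without interpreting the JSON.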

About scalability: Currently we get something on the order of 10K downloads per month. Assume they all successfully install and run the game, and use it long enough to send us 10 pieces of data (saying what maps they've played etc). In total that's about 2 pieces of data per minute, and about a million over a year. If each is maybe 1KB then that's 1GB per year. That all seems fairly easy to cope with, and even if we become 10x more popular it shouldn't be much of a worry if we have a sensible storage architecture.
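The load estimate can be checked with a few lines of Python, assuming a 30-day month and treating 1KB as 1,000 bytes. It gives roughly 2.3 reports per minute and 1.2 million reports (about 1.2 GB) per year; multiplying the download figure by 10 scales both linearly, to about 23 reports per minute and 12 GB per year.

```python
# Back-of-envelope load estimate using the figures from the post above.
DOWNLOADS_PER_MONTH = 10_000
REPORTS_PER_USER = 10
BYTES_PER_REPORT = 1_000  # "maybe 1KB"

reports_per_month = DOWNLOADS_PER_MONTH * REPORTS_PER_USER  # 100,000
minutes_per_month = 30 * 24 * 60                             # 43,200
reports_per_minute = reports_per_month / minutes_per_month   # about 2.3
reports_per_year = reports_per_month * 12                    # 1,200,000
bytes_per_year = reports_per_year * BYTES_PER_REPORT         # about 1.2 GB

print(f"{reports_per_minute:.1f} reports/minute, {reports_per_year:,} reports/year, "
      f"{bytes_per_year / 1e9:.1f} GB/year")
```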

Then we need to analyse the data, which is probably the hard part. I don't know exactly what reports we'll need - they'll probably be relatively arbitrary queries over the JSON data, e.g. counting the number of users over the past month who had <10FPS on the main menu grouped by their hardware report's GPU, or whatever, to let us search for patterns of problems. It doesn't matter if the reports lag behind newly reported data by a few hours.

So I think it'd make sense to store the incoming data in a simple non-queryable database (e.g. SQLite with the JSON in a text field, with binary files on the filesystem), then batch convert it into a queryable database (extract the records of a certain type for a certain time period, then parse the JSON and push the interesting fields into a new SQLite/etc database with indexed columns), so we can easily throw away the queryable database and redesign it without disturbing the data collection. Stick on a simple web front-end (probably using Django, because Python is less objectionable than most other languages) with some graphs and it should be alright. For users' privacy I currently think we shouldn't expose the raw data (particularly crash logs which may contain random RAM content) to public users, but aggregated data should be public as far as possible.

Seems like it should be reasonably straightforward...
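The two-stage storage idea (raw JSON kept verbatim in SQLite, then batch-converted into a disposable, indexed table) and the example query about <10FPS users grouped by GPU can be sketched with Python's built-in `sqlite3` module. The `perf` data type, its `menu_fps` field and the table/column names are hypothetical; `gl_renderer` comes from the hwreport data discussed in this thread, and user IDs are used to correlate the two report types.

```python
import json
import sqlite3


def create_raw_store(conn: sqlite3.Connection) -> None:
    # Raw store: the "data" JSON is kept verbatim as text and never interpreted here.
    conn.execute("""CREATE TABLE IF NOT EXISTS raw_reports (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        received TEXT NOT NULL,   -- ISO 8601 UTC timestamp
        data_type TEXT NOT NULL,
        body TEXT NOT NULL)""")


def build_fps_table(raw: sqlite3.Connection, out: sqlite3.Connection,
                    start: str, end: str) -> None:
    """Batch step: rebuild a queryable table for one time range from the raw store."""
    out.execute("DROP TABLE IF EXISTS menu_fps")  # safe to throw away and redesign
    out.execute("CREATE TABLE menu_fps (user_id TEXT, gpu TEXT, fps REAL)")
    # Latest hardware report per user: later rows overwrite earlier ones.
    gpu_of = {}
    for user_id, body in raw.execute(
            "SELECT user_id, body FROM raw_reports "
            "WHERE data_type = 'hwreport' ORDER BY received"):
        gpu_of[user_id] = json.loads(body).get("gl_renderer")
    for user_id, body in raw.execute(
            "SELECT user_id, body FROM raw_reports "
            "WHERE data_type = 'perf' AND received >= ? AND received < ?", (start, end)):
        out.execute("INSERT INTO menu_fps VALUES (?, ?, ?)",
                    (user_id, gpu_of.get(user_id), json.loads(body).get("menu_fps")))
    out.execute("CREATE INDEX menu_fps_gpu ON menu_fps (gpu)")
    out.commit()


def slow_users_by_gpu(out: sqlite3.Connection, threshold: float = 10.0):
    # Number of distinct users below the FPS threshold on the main menu, per GPU.
    return out.execute(
        "SELECT gpu, COUNT(DISTINCT user_id) FROM menu_fps "
        "WHERE fps < ? GROUP BY gpu ORDER BY 2 DESC, gpu", (threshold,)).fetchall()
```

Because the queryable table is rebuilt from scratch on each batch run, its schema can change freely without touching data collection.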


Hmm, sounds good, except my vote would be to go with a RESTful web service written in Ruby on Rails, backed with MySQL. A request hits /report on the app, which stores the plain JSON plus binaries in a raw_reports table. A background job is then fired off to analyze the data outside the request and record the information it finds in a processed data table. A further background job is then fired off to look for patterns between similar data sets and store them in an aggregate data table.

This is something I could quite easily handle, as my day job deals with such technologies and data processing techniques (albeit, I handle bank data, not crash reports, but the methods of storage would be similar).

A defined JSON format would be handy. It would contain the game version and date at the top level, then nested hardware attributes, along with what other sorts of data? Here is a taste of what I mean:

{
    "version": "Alpha 3 (Cerberus)",
    "date": "2010-12-12T16:14:32+13:00",
    "hardware": {
        "os": {
            "major": "Windows",
            "minor": "7",
            "version": "7.123.4567"
        },
        "graphics": {
            "model": "GeForce",
            "make": "Nvidia",
            "version": "R4z3r",
            "revision": "4.56.124",
            "memory": "512"
        }
    }
}

Feel free to copy and add to it, so we can get a final version agreed upon (because the worst thing in a developer's life is changing specs! :-( ).

Edited by k776

I'd vote against Ruby, primarily because I don't know it and don't want to bother learning it and I'm terrible at relinquishing control of things :P. Also I expect I'll end up hosting and sysadmining it myself, so I'd want to understand it regardless of who writes it. Everyone but me hates Perl, and JS doesn't have any mature web frameworks, and PHP is a mess with no redeeming features that I've ever heard of, so that leaves Python by process of elimination :). (Also, Python is a nice language with good libraries.)

SQLite is worse than I expected for storage. There's necessarily a transaction each time a piece of user-provided data is saved, and SQLite flushes to disk on each commit, so I can only process about 5 POST requests per second on my local machine (and some of them fail, complaining the database is locked). With flushes disabled I can get about 100/sec, but then the database will probably be corrupted whenever the server crashes, and there are no recovery tools. Also, any slow read query (e.g. backing up the database, or extracting the data for further processing, or even just looking at the data in an admin interface) will block any writes, which is not good when the aim was to save data quickly.

MySQL with InnoDB with innodb_flush_log_at_trx_commit=2 gets around 100/sec, and should recover from crashes; and it can seemingly execute queries concurrently with inserts so new data shouldn't get held up. So that's probably better. (I imagine Postgres would work similarly, but I already run a MySQL server so it's easier to reuse that.)

I'm thinking data would be like

{
    "version": "8832-release", // or "custom build" if it's SVN since we can't tell the revision
    "generated_date": "2010-12-12T03:14:32Z",
    "data_type": "hwreport",
    "data_version": 1,
    "data": ... // structure depends on data_type and data_version
}

The user will typically have more than one piece of data like this, and will upload them individually. Each contains one type of data (plus a version number in case we change the structure and want to tell the difference). For the "hwreport" type, the "data" field can be like

{
    "os_unix": 1,
    "os_linux": 1,
    "os_macosx": 0,
    "os_win": 0,
    "gfx_card": "Tungsten Graphics, Inc ",
    "gfx_drv_ver": "OpenGL 2.1 Mesa 7.9",
    "gfx_mem": 0,
    "gl_vendor": "Tungsten Graphics, Inc",
    "gl_renderer": "Mesa DRI Mobile Intel® GM45 Express Chipset GEM 20100330 DEVELOPMENT ",
    "gl_version": "2.1 Mesa 7.9",
    "gl_extensions": "GL_ARB_copy_buffer [...] GL_OES_EGL_image",
    "video_xres": 1024,
    "video_yres": 768,
    "video_bpp": 24,
    "uname_sysname": "Linux",
    "uname_release": "2.6.35-gentoo-r5",
    "uname_version": "#1 SMP Wed Sep 1 11:53:07 BST 2010",
    "uname_machine": "x86_64",
    "cpu_identifier": "Intel Pentium Dual   T3400  @ 2.16GHz",
    "cpu_frequency": -1,
    "ram_total": 3924,
    "ram_free": 2221
}

which is what we already construct for our hwdetect.js script (and basically the same as system_info.txt). The game would generate and transmit that data once a month (or whatever) so we can usually tell if it changed. Other data types can be added whenever we feel like it.
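A small Python sketch of wrapping one piece of data in the envelope above, and of deciding when a fresh hwreport is due. The function names and the fixed 30-day interval are assumptions standing in for "once a month (or whatever)".

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

HWREPORT_INTERVAL = timedelta(days=30)  # assumed interval for re-sending hardware reports


def make_envelope(game_version: str, data_type: str, data_version: int,
                  data: Any, now: Optional[datetime] = None) -> dict:
    """Wrap one piece of data in the common envelope (one data type per upload)."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "version": game_version,       # e.g. "8832-release" or "custom build"
        "generated_date": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data_type": data_type,        # selects how "data" is interpreted
        "data_version": data_version,  # bump when the structure of "data" changes
        "data": data,
    }


def hwreport_due(last_sent: Optional[datetime], now: datetime) -> bool:
    """A hardware report is due if none was sent yet or the last one is a month old."""
    return last_sent is None or now - last_sent >= HWREPORT_INTERVAL
```

For example, `make_envelope("8832-release", "hwreport", 1, hw_data)` produces a dict that serialises directly with `json.dumps`.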


  • 2 weeks later...
  • 1 month later...

Yep, definitely. I'm thinking it would say something like:

  Help improve 0 A.D.!

You can automatically send us anonymous feedback that will help us to improve performance and compatibility and to fix bugs.

[ Enable feedback ] [ See technical details ]

in a box on the main menu screen (stuck on the left side so it doesn't block the buttons, probably). That message will always be displayed, until you enable it, at which point it'll change to say:

  Thank you for helping improve 0 A.D.!

Anonymous feedback is currently enabled.

If you want to send a message to the developers, you can enter one here:

[ ]

[ ]

[ Send message ] [ See technical details ] [ Disable feedback ]

The "technical details" button will describe exactly what data we'll send, and what we want to use it for. The message-sending feature is mostly just for fun - it seems nice to let people communicate back, and we can easily produce a listing of all the messages that get submitted.


Yeah, it'll probably be full of garbage, but that's fine - it's easy to skip over the useless messages and there might be occasional interesting comments or bug reports in there :). (And the messages wouldn't be made publicly visible, so that people will have less incentive to abuse it.)


The main menu's already got a button that links to the web site, but not many people seem to go to all the bother of registering and activating and posting just for quick feedback. There's a button for IRC webchat too and that seems to get more quick feedback ("just played the game and love it", "Nice game!!! Congratulations but in my Acer aspire one the game was slow....", "hEY aDD GAME SPEED CONTROL", "YOUR ALL MEANIES!", etc), but that's still only about 0.1% of the people who download the game. So I think it'd be interesting to make it as trivial as possible to send feedback and see what happens. We don't need to spend any effort maintaining it - just dump it all as a big list on a web page, accessible to WFG members (since making it public would invite spam and privacy concerns). In the worst case we can ignore it entirely and remove it from the next release, but I think there's a chance it could be useful.


(y)


Very nice!

Can you also show the OS?

Anyway, I haven't been able to compile 0ad on Linux since yesterday; it stops at:

SimContext.cpp
ParamNode.cpp
ComponentManagerSerialization.cpp
Linking simulation2

While waiting for Philip to answer, have you updated workspaces (running update-workspaces.sh on Linux)? Also, do you get an actual error or is that all you get?

lol... at first I thought it was my own mistake (that's why I didn't ask about it in the forum), but I've got the same problem on both Ubuntu and Arch GNU/Linux!
