detritux (thread author), posted March 27, 2011:
Hi,
I just tried to compile 0ad from the git repository on Arch Linux (64-bit). I get the following error when starting the game:

file.cpp(132): Function call failed: return value was -110301 (Error during IO)
Function call failed: return value was -110301 (Error during IO)
Location: file.cpp:132 (WaitUntilComplete)

I attached the crashlog.txt. Any help greatly appreciated.
Cheers,
Attachment: crashlog.txt
janwas, posted March 27, 2011:
Thanks for the report; aio_return failing is bad news. What kind of disk is this on?
Can you tell us the value of errno at that spot in the code?
Finally, could you please add some instrumentation to check what the actual return values of aio_error and aio_suspend were? I'd recommend replacing the while loop at file.cpp:124 with something like:

    int error = 1234, suspend = 1234;  // sentinel values, overwritten below
    for(;;)
    {
        error = aio_error(&req);
        if(error != EINPROGRESS)
        {
            debug_printf(L"aio_error %d %d\n", error, errno);
            break;
        }
        aiocb* const reqs = &req;
        suspend = aio_suspend(&reqs, 1, (timespec*)0);  // wait indefinitely
        debug_printf(L"aio_suspend %d %d\n", suspend, errno);
    }

and adding a [debug_]printf of errno and the return value of aio_return.
detritux (thread author), posted March 27, 2011 (edited March 27, 2011):
I added the debug output you asked for. During the loop:

aio_suspend 0 0
aio_error 1 0

And after the loop:

errno=0 aio_return=-1

I also added a debug print in the Open function of this same file (line 41), and it appears it's trying to open "/". Here is the line I used on line 42:

printf("Opening %s\n", pathname.string().c_str());

Any ideas?
Cheers,
janwas, posted March 27, 2011:
Thanks for that info!
So we have a call to aio_error returning EINPROGRESS, then a successful aio_suspend, but then aio_error returning EPERM (1). That's weird.
There's a scary description of a race condition with a similar result:
https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=595499
However, this always happens, right? (i.e. it's not a race condition)
Can you run the game in strace and post the resulting log? Maybe we'll see some obviously incorrect parameter.
Just to rule out other weirdness: this is on a plain hard disk with no special/rare filesystem, right?
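When reading instrumentation output like the above, it can help to spell out what an aio_error() result means. The following is a minimal sketch, not 0ad code; DescribeAioError is a hypothetical name. It follows the POSIX contract: 0 means the request completed, EINPROGRESS means it is still pending, ECANCELED means it was cancelled, -1 means aio_error() itself could not query the request, and any other value is the errno the operation failed with.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: turn an aio_error() result into readable text.
std::string DescribeAioError(int ret)
{
    if (ret == 0)
        return "completed";                  // request finished successfully
    if (ret == EINPROGRESS)
        return "in progress";                // keep waiting (e.g. via aio_suspend)
    if (ret == ECANCELED)
        return "cancelled";                  // aio_cancel() got to it first
    if (ret == -1)                           // aio_error() itself failed, errno is set
        return std::string("query failed: ") + std::strerror(errno);
    return std::string("failed: ") + std::strerror(ret);  // ret is the errno value
}
```

With the value from the log, DescribeAioError(1) describes EPERM, which glibc reports as "Operation not permitted".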
detritux (thread author), posted March 27, 2011:
Yes, it happens every time I try to run pyrogenesis, pyrogenesis_dbg, test or test_dbg.
I attached the strace. Running on a standard hard drive with ext4.
Attachment: strace.txt
Ykkrosh, posted March 27, 2011:
ricotz on IRC reported the same error (backtrace) today, and said "also previously working builds failing now, so it seems to be an ubuntu problem". But if it happens on Arch too, then presumably it's not exclusively an Ubuntu problem, so I have no idea really.
janwas, posted March 27, 2011:
Hm, Philip has wowed me before with an strace log listing each syscall and its parameters. Unfortunately that doesn't seem to have worked here: it just shows the usual program output (or did we get the wrong file?).
I am relieved to hear "previously working builds failing", i.e. it isn't due to recent changes and is probably an OS bug. This is actually plausible because I don't think many programs use aio.
Are you willing and able to report this bug in your distro? Not sure which channels are appropriate there.
It'd be interesting to just call those basic file APIs on existing files. It doesn't seem to be failing immediately, since there is mention of texture conversion in the backtrace and we open/read other files first (several config files and then hwdetect.js). If we get a relatively simple reproducible test case (probably requires reading multiple files) showing the problem, it's much more likely to be fixed.
Ykkrosh, posted March 28, 2011:
Running it like "strace ./pyrogenesis 2>&1 >output.txt" should save the output (which goes to stderr by default).
Ykkrosh, posted March 28, 2011:
Oops, I think that's wrong. It should be "strace ./pyrogenesis 2>output.txt" or "strace ./pyrogenesis 2>&1 | tee output.txt" or similar, hopefully.
detritux (thread author), posted March 28, 2011:
Here is the new strace. I'll try to get a simple example and see if I can report this as a bug to Arch.
Cheers,
Ugo
Attachment: output.txt
janwas, posted March 28, 2011:
Thank you for this log, it is very helpful. I recognize events from various stages of GameSetup.cpp's Init():

InitVfs
g_Logger = new CLogger;
CNetHost::Initialize();
CONFIG_Init(args);

and, crucially, nothing afterwards. This means our very first attempt to load a file fails, which should make this much easier to reproduce.
It may be important to set up one or more FAM requests via dir_watch_Add, but otherwise, a simple File::Issue and FileImpl::WaitUntilComplete ought to cause the breakage.

open("/home/ugo/Projects/0ad/binaries/data/config/default.cfg", O_RDONLY) = 6
sched_getparam(8986, { 0 }) = 0
sched_getscheduler(8986) = 0 (SCHED_OTHER)
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9056976000
mprotect(0x7f9056976000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f9056978fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f90569799d0, tls=0x7f9056979700, child_tidptr=0x7f90569799d0) = 8990
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fff99caa97c, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
write(1, "file.cpp(132): Function call fai"..., 80
file.cpp(132): Function call failed: return value was -110301 (Error during IO)

And this is supposed to be lio_listio and aio_suspend? So the horrible old glibc emulation of aio via threads is still in place? Dear lord.
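A stand-alone test case of the kind asked for earlier in the thread might start from the following minimal sketch. It is not 0ad's File::Issue/FileImpl::WaitUntilComplete: AioReadPrefix is a hypothetical name, and it assumes that a single aio_read() followed by the same aio_error/aio_suspend wait loop goes through glibc's thread-based aio emulation just like 0ad's requests do (0ad issues them via lio_listio, so the path may not be identical). On older glibc versions it must be linked with -lrt.

```cpp
#include <aio.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical repro: read up to `size` bytes from the start of `path` with one
// aio_read(), then wait the way the instrumented WaitUntilComplete loop does.
// Returns the number of bytes read, or -1 on any failure.
ssize_t AioReadPrefix(const char* path, std::vector<char>& buf, size_t size)
{
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        std::perror("open");
        return -1;
    }
    buf.assign(size, '\0');

    aiocb req;
    std::memset(&req, 0, sizeof(req));
    req.aio_fildes = fd;
    req.aio_buf = buf.data();
    req.aio_nbytes = size;
    req.aio_offset = 0;
    req.aio_sigevent.sigev_notify = SIGEV_NONE;  // no completion signal; we poll below

    if (aio_read(&req) != 0) {
        std::perror("aio_read");
        close(fd);
        return -1;
    }

    const aiocb* const reqs[1] = { &req };
    int error;
    while ((error = aio_error(&req)) == EINPROGRESS)
        aio_suspend(reqs, 1, nullptr);  // wait indefinitely; after any early return, re-check

    const ssize_t ret = aio_return(&req);  // also retires the request
    close(fd);
    if (error != 0) {
        // 0ad's instrumented loop saw error == 1 (EPERM) at this point on affected kernels.
        std::fprintf(stderr, "aio_error=%d (%s) aio_return=%zd\n",
                     error, std::strerror(error), ret);
        return -1;
    }
    return ret;
}
```

On an unaffected system, AioReadPrefix("default.cfg", buf, 4096) returns the number of bytes read. On one of the kernels discussed here, one would expect the -1 path, although this sketch has not been run against them.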
Vincent, posted April 2, 2011:
Hi,
I was pointed here from another thread (http://www.wildfiregames.com/forum/index.php?showtopic=14568), where I reported a similar (likely the same) issue. I've also attached a log from strace, if it helps.
If it's of any relevance, I'm running 64-bit Debian GNU/Linux, with a kernel based on 2.6.38.
Attachment: output.txt
janwas, posted April 2, 2011:
Thanks for this report. I see the same thing: the first aio call announces itself with sched_getparam, and boom.
Interestingly, this is preceded by "ERROR: Error initializing FAM" on stdout (our output).
Do you have gamin installed?
Ykkrosh, posted April 2, 2011:
The game works fine for me when Gamin is missing. I'd guess the problem might be the 2.6.38 kernel, given that it only started recently; I can try testing that myself when I next reboot. Has anyone run on 2.6.38 without getting this error, or has anyone got this error on earlier kernel versions?
janwas, posted April 2, 2011:
A patch for disabling aio is attached. Performance is going to suffer, but it should at least work (tested on Windows).
Attachment: disable_aio.patch
Vincent, posted April 3, 2011:
Just tested the patch as well, and now 0ad builds and runs without a hitch! Thank you! Now I can finally enjoy the game.
Ykkrosh, posted April 3, 2011:
I updated to kernel 2.6.38 (from 2.6.36), and now I get this error, so it definitely seems kernel-related.
Ykkrosh, posted April 3, 2011:
Broken by da48524eb20662618854bb3df2db01fc65f3070c.
Fixed by 243b422af9ea9af4ead07a8ad54c90d4f9b6081a.
2.6.38.2 has the first commit but not the second (2.6.38.1 has neither). I don't know if anyone is going to backport the fix into a 2.6.38.3 or something.
There's no possible user-space workaround, except for not using glibc aio at all.
Ykkrosh, posted April 3, 2011:
janwas wrote: "A patch for disabling aio is attached. Performance is going to suffer"
I see no measurable difference in load times when running from SVN (no public.zip) with hot or cold cache.
Could we just apply this patch with "#if OS_LINUX / #define DISABLE_AIO 1", at least until some time in the future when nobody is still using the buggy kernel and there's a measurable performance impact? aio is presumably very rarely used (else someone would likely have noticed the bug before the kernel was released), so avoiding it is good for compatibility.
janwas, posted April 3, 2011:
Ykkrosh wrote: "I see no measurable difference in load times, when running from SVN (no public.zip) with hot or cold cache."
As you note, that's because you're not using an archive (and archives aren't compressed in 0ad anyway). However, neither is true at work, so disabling aio would hurt there. As discussed, aio is now disabled on Linux (as if that patch were applied).
Gallaecio, posted April 7, 2011:
If it really decreases performance, would it be possible to somehow check the kernel version before compilation, so that the exception applies only to that kernel? As I understand it, previous kernels and those to be released later don't have this issue.
janwas, posted April 7, 2011:
Hm, I don't think this is worth additional complexity in the build system.
As noted above, this only hurts if you want to do some computation while waiting for previously issued asynchronous I/Os to complete. We (and most other applications) don't, but software at work that shares this codebase does. However, it's Windows-only and not affected by the disabling, so I don't feel any pressing urge to change things.
Ykkrosh, posted April 7, 2011:
(The check would have to be at runtime anyway, since you could build the game and then upgrade your kernel to a buggy one, or could run a copy of the game that was built on a different system.)
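A runtime check along those lines could start from uname(), which reports the release string of the kernel that is actually running. The sketch below only turns that string into numbers; KernelVersion, ParseKernelRelease and RunningKernel are hypothetical names. Deciding which versions count as buggy is deliberately left out, since the thread only pins down that 2.6.38.2 carries the breaking commit without the fix, and a later report in this thread shows the error on 3.4.6 as well.

```cpp
#include <sys/utsname.h>
#include <cstdio>
#include <string>

// Numeric parts of a kernel release such as "2.6.38.2-1-ARCH".
// Components that are absent stay 0 (e.g. "3.4.6-1" has no fourth number).
struct KernelVersion
{
    int major = 0, minor = 0, patch = 0, sub = 0;
};

// Parse the leading dotted numbers. sscanf stops at the first character that
// does not match the format, so suffixes like "-1-ARCH" or "-generic" are ignored.
KernelVersion ParseKernelRelease(const std::string& release)
{
    KernelVersion v;
    std::sscanf(release.c_str(), "%d.%d.%d.%d", &v.major, &v.minor, &v.patch, &v.sub);
    return v;
}

// Version of the kernel running right now (not the one the game was built on).
// Returns all zeros if uname() fails.
KernelVersion RunningKernel()
{
    struct utsname info;
    if (uname(&info) != 0)
        return KernelVersion();
    return ParseKernelRelease(info.release);
}
```

A caller could then compare the fields against a list of known-bad releases before choosing between aio and synchronous reads, with the caveat that such a list needs upkeep as new reports arrive.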
Gallaecio, posted April 9, 2011:
Ykkrosh wrote: "(The check would have to be at runtime anyway, since you could build the game then upgrade your kernel to a buggy one, or could run a copy of the game that was built on a different system.)"
I'm thinking of distros offering 0ad, which could rebuild it as they upgrade the kernel if needed; I'm currently rebuilding it with the patch so that it runs with our upgraded kernel. But if the performance issue only affects Windows, I guess it doesn't matter.
Gallaecio, posted August 19, 2012:
I am running into this with Linux 3.4.6, and the patch fails to apply (I am guessing because of intermediate changes in lib/file/file.cpp).
Maybe we need a parameter to ./update-workspaces.sh that disables aio (e.g. --disable-aio)?