Jump to content

File.cpp(132): Function Call Failed: Return Value Was -110301 (Error During Io)


detritux
 Share

Recommended Posts

Hi,

I just tried to compile 0ad from the git repository on arch linux (64). I get the following error when starting the game:

file.cpp(132): Function call failed: return value was -110301 (Error during IO)

Function call failed: return value was -110301 (Error during IO)

Location: file.cpp:132 (WaitUntilComplete)

I attached the crashlog.txt.

Any help greatly appreciated.

Cheers,

crashlog.txt

Link to comment
Share on other sites

Thanks for the report; aio_return failing is bad news. What kind of disk is this on?

Can you tell us the value of errno at that spot in the code?

Finally, could you please add some instrumentation to check what the actual return values of

aio_error and aio_suspend were?

I'd recommend replacing the while loop at file.cpp:124 with something like:


int error = 1234, suspend = 1234;
for(;
{
error = aio_error(&req);
if(error != EINPROGRESS)
{
debug_printf(L"aio_error %d %d\n", error, errno);
break;
}
aiocb* const reqs = &req;
suspend = aio_suspend(&reqs, 1, (timespec*)0); // wait indefinitely
debug_printf(L"aio_suspend %d %d\n", suspend, errno);
}

and adding a [debug_]printf of errno and the return value of aio_return.

Link to comment
Share on other sites

I added the debug you asked:

During the loop:

aio_suspend 0 0

aio_error 1 0

And after the loop:

errno=0 aio_return=-1

I also added a debug in the Open function of this same file (line 41), and it appears it's trying to open "/" :

Here is the line I used on line 42.

printf("Opening %s\n", pathname.string().c_str());

Any ideas?

Cheers,

Edited by detritux
Link to comment
Share on other sites

Thanks for that info!

So we have a call to aio_error returning EINPROGRESS, then successful aio_suspend, but aio_error returning EPERM. That's weird.

There's a scary description of a race condition with a similar result:

https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=595499

However, this always happens, right? (i.e. it's not a race condition)

Can you run the game in strace and post the resulting log? Maybe we'll see some obviously incorrect parameter.

Just to rule out other weirdness: this is on a plain hard disk with no special/rare filesystem, right?

Link to comment
Share on other sites

hm, Philip has wow-ed me before with an strace log listing each syscall and the parameters.

Unfortunately that doesn't seem to have worked here - it just shows the usual program output (or did we get the wrong file?)

I am relieved to hear "previously working builds failed", i.e. it isn't due to recent changes and is probably an OS bug. This is actually plausible because I don't think many programs use aio.

Are you willing and able to report this bug in your distro? Not sure which channels are appropriate there.

It'd be interesting to just call those basic file APIs on existing files - it doesn't seem to be failing immediately, since there is mention of texture conversion in the backtrace and we open/read other files first (several config files and then hwdetect.js). If we get a relatively simple reproducible test case (probably requires reading multiple files) showing the problem, it's much more likely to be fixed.

Link to comment
Share on other sites

Thank you for this log, it is very helpful.

I recognize events from various stages of GameSetup.cpp's Init():

InitVfs

g_Logger = new CLogger;

CNetHost::Initialize();

CONFIG_Init(args);

and - crucially - nothing afterwards. This means our very first attempt to load a file fails, which should make this much easier to reproduce.

It may be important to set up one or more FAM requests via dir_watch_Add, but otherwise, a simple File::Issue and FileImpl::WaitUntilComplete ought to cause the breakage.


open("/home/ugo/Projects/0ad/binaries/data/config/default.cfg", O_RDONLY) = 6
sched_getparam(8986, { 0 }) = 0
sched_getscheduler(8986) = 0 (SCHED_OTHER)
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f9056976000
mprotect(0x7f9056976000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f9056978fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f90569799d0, tls=0x7f9056979700, child_tidptr=0x7f90569799d0) = 8990
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fff99caa97c, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
write(1, "file.cpp(132): Function call fai"..., 80file.cpp(132): Function call failed: return value was -110301 (Error during IO)

And this is supposed to be lio_listio and aio_suspend? So the horrible old glibc emulation of aio via threads is still in place? Dear lord.

Link to comment
Share on other sites

The game works fine for me when Gamin is missing. I'd guess the problem might be when using a 2.6.38 kernel, given it's only started recently - I can try testing that myself when I next reboot. Has anyone run on 2.6.38 without getting this error, or has anyone got this error on earlier kernel versions?

Link to comment
Share on other sites

A patch for disabling aio is attached. Performance is going to suffer

I see no measurable difference in load times, when running from SVN (no public.zip) with hot or cold cache.

Could we just apply this patch with a "#if OS_LINUX / #define DISABLE_AIO 1", at least until some time in the future when nobody is still using the buggy kernel and there's a measurable performance impact? aio is presumably very rarely used (else someone would have likely noticed the bug before releasing the kernel), so avoiding it is good for compatibility.

Link to comment
Share on other sites

I see no measurable difference in load times, when running from SVN (no public.zip) with hot or cold cache.

As you note, that'd be due to the fact that you're not using an archive, which aren't compressed in 0ad anyway.

However, neither is true at work, so disabling AIO would hurt. As discussed, aio is now disabled (as if that patch were applied) on Linux.

Link to comment
Share on other sites

hm, I don't think this is worth additional complexity in the build system.

As noted above, this only hurts if you're wanting to do some computation while waiting for the previous asynchronous I/Os to complete. We (and most other) applications don't, but stuff at work that shares this codebase does. However, it's Windows-only and not affected by the disabling, so I don't feel any pressing urge to change things.

Link to comment
Share on other sites

(The check would have to be at runtime anyway, since you could build the game then upgrade your kernel to a buggy one, or could run a copy of the game that was built on a different system.)

I'm thinking in distros offering 0ad, which could rebuild it as they upgrade kernel if needed, as I'm currently rebuilding it with the patch to run with our upgraded kernel. But if the perfomance issue is only in Windows, I guess it doesn't matter.

Link to comment
Share on other sites

  • 1 year later...

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...