Jump to content
Sign in to follow this  
acme_pjz

aken.sys security concerns

Recommended Posts

Hello all developers,

I'm a Windows user and recently I download and installed 0 A.D. alpha8 version. After installation I surprisingly found some kernel-mode driver (sys) files in the game's binary folder, and there aren't any READMEs or documentations which explains what these drivers are and what they will use for. And in file property dialog I found that these drivers have invalid digital signatures (this situation often occurs in malicious software, namely 'fake digital signature'). You know that nowadays the Internet are full of malicious software, virus, trojans, backdoors and rootkits, and ordinal software should not contain any kernel-mode (Ring0) code, so at first I thought these files are rootkits and delete them immediately.

After that I searched the 0 A.D. forum and the source code, found that these files are high resolution timers which reads HPET timer, and the kernel-mode driver exports functions including manipulating \\Device\\PhysicalMemory, manipulating ports and reading/writing MSR register and so on. According to my experience and Windows experts, MSR register is used by SYSENTER instruction and reading/writing it can do some malicious stuff such as kernel function hook. And send malicious commands to certain port could lead to some behavior which can be used by malicious software, for example send commands to shut down the power immediately. Moreover, manipulating \\Device\\PhysicalMemory can read/write any position of kernel memory, including SSDT, shadow SSDT and other important function tables so one can use this to make malicious program or rootkits.

I believe that 0 A.D. contains no malicious code, but once the game loads the kernel-mode driver, then other malicious program can make use of this driver to perform harmful operations so the driver can be a huge security loophole of the system. I strongly recommend move all the HPET timer operations to the driver and remove the public export function of manipulating \\Device\\PhysicalMemory and ports and MSR register. IMHO the best solution is completely remove the kernel-mode driver (at least for normal players), because the usage of high resolution timer post on the forum says that it will be used on profiling only.

Share this post


Link to post
Share on other sites

Thanks for joining the forum to post your concern!

I imagine the similar winring0 library also removed most of its

functions after version 1.3 for these same reasons.

Let's begin with the signature. You will find it to be

basically OK and trusted by a root certificate, so the

driver's integrity and source can at least be established.

However, Vista and Win7 have apparently started requiring

an additional bit in the certificate, and I'm not clear

on which. "Basic Constraints" and "Key Usage" are marked

with an exclamation point, so those are probably the

culprits. "Digital Signature (80)" looks OK - maybe they

also want an additional one for kernel-mode drivers, but

I haven't come across mention of this and don't see which

of the other possibilities it would be.

Basic Constraints is AFAIK used to limit the chain length,

and I can't do anything about that. Indeed one problem we

are facing is that the certificate is from Fraunhofer,

but I will be leaving their employ within 2 weeks, having

finished my PhD. We therefore cannot count on changes to

the certificate or driver.

Now let's look at the situation concerning driving loading.

Due to the above issue, x86 Windows Vista and later will

refuse to start the Aken service. The x64 versions of

Windows are even more restrictive - they require signing

with a additional "cross certificate". On all of these

systems, the driver will not load unless the user disables

CodeIntegrity checks (resulting in a warning ever time such

drivers load) or enables test-signing mode (which

aken_install.bat is equipped to do). The latter is also a

security hazard in that it allows anything with a semi-decent

(even self-signed) certificate to load. Ironically enough,

the bone-headed attempts at securing the system by effectively

disallowing independent drivers lead to less security. You

will find a similar approach here:

http://www.freeotfe.org/docs/Main/impact_of_kernel_driver_signing.htm

On WinXP x86, the driver will actually load without user

intervention - unless the command-line argument -wNoMahaf

is given. That system is also where normal users' need for

the driver is greatest, because the time source may only have

a resolution of 1 or more ms (too low for game usage) and even

fall behind over time.

So, where to go from here? On the one hand, we could argue

that WinXP is totally insecure anyway because everything

basically runs as Admin. mahaf.cpp could easily examine the

Windows version and act accordingly. We can also throw

up our hands and say we'll do timing as badly as other

games - though problems have been reported, at least we

won't be alone. For example, -wNoMahaf could be made the

default by means of a windows shortcut. It could also be

done from within the game code, but hopefully in such a

fashion that the apps at work don't need an EnableMahaf()

call to be added. On that point - again, I will soon be

switching jobs, and the new employer is quite careful with

IP rights. It is possible that I will no longer safely be

able to contribute to 0ad - that will need to be discussed

with them first. (It would be a shame, because then the

codebase - which I am sure will continue to see use - would

fork and no longer be updated.)

One final point - the driver is also used at work to provide

access to the performance counters, some of which are not

available in VTune/Parallel Amplifier (e.g. Nehalem uncore

for bandwidth/memory controller measurements). This requires

write access to the MSRs. Moving all the logic into the driver

would be possible, but another huge change that won't happen

quickly, i.e. within the next 2 weeks.

So - what's y'all's opinion? How shall we proceed?

Share this post


Link to post
Share on other sites

On WinXP x86, the driver will actually load without user

intervention - unless the command-line argument -wNoMahaf

is given. That system is also where normal users' need for

the driver is greatest, because the time source may only have

a resolution of 1 or more ms (too low for game usage) and even

fall behind over time.

Could/should we set up the hwdetect reporter to report the resolution (and other relevant data) for each of the timers available to each user? That might help us quantify the number of users who have inadequate time sources and would benefit from this, rather than just hypothesising their existence.

(1ms resolution doesn't sound like it should cause serious problems anyway - I think the only semi-accurate measurement we really need is the length of a frame, so that we can interpolate unit movement smoothly across frames, but 1ms is a small fraction of the ideal 16ms frame length so it shouldn't cause noticeable interpolation error, and GPUs introduce latency with greater jitter than 1ms anyway.)

(Falling behind doesn't sound like it should cause serious problems either - a small amount of timer drift is inevitable (due to temperature and relativity etc) so the networking code needs to cope with it anyway (I think it currently doesn't but that's the least of its problems and it needs redesigning), and then it should cope fine with larger timer drift, as long as it's not worse than maybe 10ms drift per 1000ms (which would cause graphical jerkiness as it gets out of sync with other players by more than a frame length).)

We therefore cannot count on changes to the certificate or driver.

Being responsible for distributing unfixable code sounds like a pretty bad idea in general to me (as well as it being counter to the aims of the GPL, by removing users' control over the code running on their computer).

So - what's y'all's opinion? How shall we proceed?

My opinion (which I think I have given before) is that the solution is worse than the problem (in terms of the driver code introducing complexity and potential insecurity and potential bugginess and unfixability, and rightfully scaring users, and generally being an evil thing for a user-space game to do, vs the problem of getting slightly less accurate timers for a probably very small proportion of users), so I'd still be happy to remove the solution entirely :). (I'd be happier if we subsequently collected statistics of how small/large a proportion of users are affected by bad timers, to determine the size of the problem and either become more confident in the decision or discover a need to reassess the situation.)

Share this post


Link to post
Share on other sites

I think it would be enough to just report timer_Resolution.

(Incidentally, a colleague asked to namespacify timer.h, turning it

into timer::Resolution etc - any objections?)

The current code only allows choosing one timer (the best

available) and it'd be a spot of a bother to get them all.

However, we can still deduce a lot from that - presence of

an HPET can be gleaned from the chipset (i.e. CPU), and

I would recognize the other timers based on their frequency

(because the CPU frequency is also known).

Let me expand on the 1 ms. That's only if somewhat calls timeSetPeriod

and speeds up the timer interrupts to 1 kHz, which can slow down

the entire system by 10-15% and reduce "battery run time on mobile

systems by as much as 25 percent" [http://download.microsoft.com/download/3/0/2/3027D574-C433-412A-A8B6-5E0A75D5B237/Timer-Resolution.docx]

Certainly not an ideal solution, but it's still better than the

default of 15 ms (!), which is just terrible. And I suppose the

driver, while evil, may still be an attractive option on WinXP

laptops with semi-decent graphics cards.

As to falling behind, here's someone mentioning > 5 s slippage in about

30 minutes play (= 2777 ppm), which apparently took down their

network protocol:

http://osdir.com/ml/games.devel.windows/2002-09/msg00000.html

That's FAR worse than thermal drift, which is on the order of

200 ppm for a really grungy oscillator, and maybe 20 ppm for a

decent quartz.

However, if the network protocol is quite robust to such things,

then we can maybe forget about the slippage. I have no idea why

the previously mentioned game had that problem, but doubt it was

because the developers were idiots.

Good point about the spirit of the GPL. As you can tell, I really

hate the new driver signing hoops, especially because it's to

prevent circumventing DRM (which is truly something that removes

user control). As is, the driver was designed to provide all the

requisite possibilities and therefore not require changes.

Unfortunately that does bring the security issues with it :/

Changes to the driver are actually still possible - the current

signature is no better (actually apparently worse) than some

self-signing thing that could fairly easily be managed.

aken_install would still need to enable test mode, and

additionally just install a certificate.

That said, I agree about the insecurity and scariness. Again, it

comes down to how we balance that vs. battery life or timer problems.

Maybe we'd be best served with some kind of opt-in thing, which

is basically what aken_install already is. I have therefore

disabled StartDriver by default in mahaf.cpp.

Anyway, it certainly would be interesting to get that data

on the number of people with bad timers. Would you please

add timer_Resolution to the hwreport?

Share this post


Link to post
Share on other sites

Hi guys, I just installed 0 A.D. for the first time (Alpha 8). About 5 seconds after launching the game, it minimised and this message popped up:

post-13931-0-24766800-1325703633_thumb.p

I'm using Windows 7 x64 Professional by the way.

Share this post


Link to post
Share on other sites

Thanks for mentioning so. That is indeed the expected behavior on Win7 x64.

The error message is misleading or false in that the driver is not unsigned - it is just not signed with a special Microsoft certificate.

You can work around it by enabling test mode (see aken_install.bat) or launching the game with the -wNoMahaf command line switch or updating the game to the most recent SVN revision, which no longer attempts to start the driver. (It will still be accessible if you have used aken_install, though.)

Share this post


Link to post
Share on other sites
(Incidentally, a colleague asked to namespacify timer.h, turning it into timer::Resolution etc - any objections?)

(No, except for the minor theoretical concern that using the generic name "timer" in the global scope is more likely to conflict with other code (which might use "timer" as a global variable or a class etc) than the current function names are, so it's kind of defeating the goal of namespaces. I guess it should ideally be called something more like "jwlib::timer::Resolution()" to avoid potential naming collisions, except probably with a better name :))

it certainly would be interesting to get that data on the number of people with bad timers. Would you please add timer_Resolution to the hwreport?

I failed to do this earlier, but I've added it now, so I'll wait another while then look at the data.

Share this post


Link to post
Share on other sites
using the generic name "timer" in the global scope is more likely to conflict with other code

True. It's on the back burner for now as I've already left for Singapore; we'll see what the new employer's policy is.

And yes, naming is especially hard for this kind of miscellany. jwlib isnt so bad, cbloom appararently reached the same conclusion with cblib ;)

I failed to do this earlier, but I've added it now, so I'll wait another while then look at the data.

Cool, thanks! I am looking forward to seeing this data.

Share this post


Link to post
Share on other sites

Preliminary data from SVN users:


Frequency | Users
----------------|------
3656.000000 MHz | 1
3635.000000 MHz | 1
3392.000000 MHz | 1
3013.000000 MHz | 1
2194.000000 MHz | 1
2100.000000 MHz | 1
2094.000000 MHz | 1
1995.000000 MHz | 1
1495.000000 MHz | 1
1000.000000 MHz | 47
14.318180 MHz | 3
14.285714 MHz | 1
3.579545 MHz | 2
3.092138 MHz | 1
3.092119 MHz | 1
2.441416 MHz | 1
1.948281 MHz | 1
1.000000 MHz | 1
1.000000 MHz | 1
1.000000 MHz | 2
0.000500 MHz | 2

(If a single user reports different frequencies each time they run the game, they'll be counted once for each value reported; I don't know if that's occurring in this data.)

Share this post


Link to post
Share on other sites

Interesting, thanks for the data. Looks like 9 people with the TSC (though the 1.5 GHz one is quite a low frequency).

The vast majority is apparently running Linux, which reports 1 GHz. Also interesting - they probably scale the

TSC. 4 are using the HPET - unsure whether on Linux or via Aken (and one of them has the frequency hopping thing enabled,

which is why it doesnt exactly hit the frequency of the classic PC master clock). 2 using PMT, which is rather buggy and bad.

It'd really be good to combine this data with OS. Then we've got (most likely) 4 Windows users where QPC is returning TSC/1024.

2 poor guys are most probably using our TGT fallback - that's the one with miserable millisecond resolution and the falling-behind bug.

This is regrettable - I'd love to ask them what hardware they are on. Can we get at that data?

Finally, we have 4 people with 1 MHz (why aren't they combined into one bin) - not sure what that one is, probably Linux again.

It looks like 2(PMT)+4(scaled TSC)+2 TGT out of ~15-20 Windows users would benefit from HPET (the ones with scaled TSC only slighty).

Does seem worthwhile for them. Interestingly, the majority is running Linux (or Mac, I have no idea what they report).

Really do need the OS info to draw a conclusion :)

Share this post


Link to post
Share on other sites
we have 4 people with 1 MHz (why aren't they combined into one bin)

They're not quite 1MHz - their resolutions are different from around the 13th significant figure.

Really do need the OS info to draw a conclusion :)

Does this work better? (Data is grouped by OS and frequency; "CPUs" is the CPU identifiers of everyone in that row, skipping the Linux row because there's loads. The OS "Unix" is a non-Linux Unix, presumably BSD.)

I hope I'm not influencing the results too much with all my VMs

We probably need a larger sample size before drawing any reliable conclusions, in which case your influence will be negligible :P

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Sign in to follow this  

×
×
  • Create New...