Alexander Livingstone Posted December 5, 2021

I will read all of that and follow up tomorrow, sorry, I'm tired.
Stan` Posted December 5, 2021

4 hours ago, nwtour said:
That 2.5 percent is offset by more efficient indexing in the file system. If you switch from reading file contents to accessing file meta-information, the file system wins back the time the archive saves by avoiding fragmentation.

10 million stat() calls:
perl -e 'my $i = 0; while (1) { stat("./binaries/system/pyrogenesys"); $i++; exit unless $i % 10_000_000; }'
10.47 sec

10 million tar lookups:
perl -MArchive::Tar -e 'my $a = Archive::Tar->new("test.tar"); while (1) { $a->contains_file("file"); $i++; exit unless $i % 10_000_000; }'
118.23 sec

I suppose Linux does things differently too.
vladislavbelov Posted December 6, 2021

22 hours ago, nwtour said:
That 2.5 percent is offset by more efficient indexing in the file system. If you switch from reading file contents to accessing file meta-information, the file system wins back the time the archive saves by avoiding fragmentation.

There is also the file system sector/cluster size, which affects the alignment of files (it's usually 4 KiB on Windows). I've tested a few file sizes on Windows with two mods containing the same list of files, one unpacked and one packed:

File size 480 bytes: unpacked costs +972% reading time compared to packed (total read ~4 MiB).
File size 2333 bytes: unpacked costs +690% reading time compared to packed (total read ~80 MiB).
File size 111333 bytes: unpacked costs +9.7% reading time compared to packed (total read ~1 GiB).
File size 521111 bytes: unpacked costs -32% reading time compared to packed (total read ~1 GiB).
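A quick back-of-the-envelope calculation shows why tiny files are hit so hard when unpacked. With a 4 KiB allocation unit, a 480-byte file still occupies a full cluster, and every file additionally pays its own directory lookup and open(). The snippet below is only an illustration of the resulting read amplification; the cluster size and file count are assumptions, not values from the test above:

#include <cstdio>

int main() {
    const long long clusterSize = 4096;       // assumed 4 KiB allocation unit
    const long long fileSize = 480;           // payload per file, as in the test
    const long long totalPayload = 4LL << 20; // ~4 MiB of useful data

    const long long files = totalPayload / fileSize;                      // ~8738 files
    const long long clustersPerFile = (fileSize + clusterSize - 1) / clusterSize;
    const long long bytesTouched = files * clustersPerFile * clusterSize;

    std::printf("files: %lld, payload: %lld bytes\n", files, files * fileSize);
    std::printf("clusters touched: %lld bytes (x%.1f amplification)\n",
                bytesTouched, double(bytesTouched) / double(files * fileSize));
    return 0;
}

That works out to roughly 8.5x more bytes touched per pass, on top of a per-file open() and directory lookup, which is at least consistent with the order-of-magnitude gap in the 480-byte row; for the half-megabyte files the alignment waste is negligible and the advantage flips.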
nwtour Posted December 6, 2021 (edited)

1 hour ago, vladislavbelov said:
File size 480 bytes: unpacked costs +972% reading time compared to packed (total read ~4 MiB).
File size 2333 bytes: unpacked costs +690% reading time compared to packed (total read ~80 MiB).
File size 111333 bytes: unpacked costs +9.7% reading time compared to packed (total read ~1 GiB).
File size 521111 bytes: unpacked costs -32% reading time compared to packed (total read ~1 GiB).

On a hard disk with a 128 MB cache (WDC WD2005FBYZ), repeatedly reading 4 megabytes from a single file is very cheap compared to reading scattered files from the file system. If by that you meant "contiguous storage", then I agree.
vladislavbelov Posted December 6, 2021

31 minutes ago, nwtour said:
On a hard disk with a 128 MB cache (WDC WD2005FBYZ), repeatedly reading 4 megabytes from a single file is very cheap compared to reading scattered files from the file system. If by that you meant "contiguous storage", then I agree.

4 MiB are read in both cases: a) for the unpacked mod the files are read in random order, file by file, 480 bytes each; b) for the packed mod they are read in random order from a ZIP archive, file by file, 480 bytes each. All tests run multiple prewarm passes (they read the total size several times), so in theory the HDD caches should be warm. Ideally I would expect that if I read multiple files whose total size is smaller than the cache, the unpacked and packed mods would take a similar time. But as I mentioned, a file system has its own meta-information and alignment, which can increase the "reading cost" of small files.
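For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of such a harness. It is not the test described above: the packed case is stood in for by seeking into one concatenated pack file (no ZIP parsing or decompression), and the file names, counts and sizes are made-up parameters that assume the test files have been generated beforehand.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical layout: files "small_0.bin" .. "small_8191.bin", plus one
// "pack.bin" holding the same payloads back to back (offset = index * kFileSize).
constexpr int kFiles = 8192;
constexpr int kFileSize = 480;

static double readUnpacked(const std::vector<int>& order, std::vector<char>& buf) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i : order) {
        std::ifstream f("small_" + std::to_string(i) + ".bin", std::ios::binary);
        f.read(buf.data(), kFileSize);                 // one open + read per file
    }
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

static double readPacked(const std::vector<int>& order, std::vector<char>& buf) {
    std::ifstream pack("pack.bin", std::ios::binary);  // opened once
    auto t0 = std::chrono::steady_clock::now();
    for (int i : order) {
        pack.seekg(static_cast<std::streamoff>(i) * kFileSize);
        pack.read(buf.data(), kFileSize);              // seek + read, no per-file open
    }
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<int> order(kFiles);
    for (int i = 0; i < kFiles; ++i) order[i] = i;
    std::shuffle(order.begin(), order.end(), std::mt19937(42));  // random access order

    std::vector<char> buf(kFileSize);
    for (int pass = 0; pass < 3; ++pass) {             // pass 0 doubles as prewarm
        double tu = readUnpacked(order, buf);
        double tp = readPacked(order, buf);
        std::printf("pass %d: unpacked %.3fs, packed %.3fs\n", pass, tu, tp);
    }
    return 0;
}

Note that after the first pass most data will be served from the operating system's page cache rather than from the drive, so this only mirrors the prewarmed scenario described above; measuring the disk itself would require dropping caches between passes.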
nwtour Posted December 6, 2021

1 hour ago, vladislavbelov said:
4 MiB are read in both cases: a) for the unpacked mod the files are read in random order, file by file, 480 bytes each; b) for the packed mod they are read in random order from a ZIP archive, file by file, 480 bytes each. All tests run multiple prewarm passes (they read the total size several times), so in theory the HDD caches should be warm. Ideally I would expect that if I read multiple files whose total size is smaller than the cache, the unpacked and packed mods would take a similar time. But as I mentioned, a file system has its own meta-information and alignment, which can increase the "reading cost" of small files.

The drive has its own cache, and it is very fast: it reads ahead and behind around the blocks requested by seek()/read() system calls. With repeated requests to a single file this lets the whole file end up in the cache and be served at the speed of the SATA-2 interface. See https://www.alphr.com/what-is-hard-drive-cache/ ("Block Reading Ahead and Behind").

You are arguing that both tests ran out of the disk cache and the archive was still 10 times faster. That cannot be.

My test showed that meta-information from the file system is always faster than meta-information from an archive. The file system is a hot, indexed database of file information living in kernel space, while any archive answers lookups in a primitive way.
vladislavbelov Posted December 6, 2021

38 minutes ago, nwtour said:
You are arguing that both tests ran out of the disk cache and the archive was still 10 times faster. That cannot be.

It might be hard to believe, but it can.

39 minutes ago, nwtour said:
The file system is a hot, indexed database of file information living in kernel space, while any archive answers lookups in a primitive way.

That's true, but there is one small detail. Even the most powerful general-purpose file system is still general-purpose, which means it has its own tradeoffs and can't be optimal for every workload. The same goes for archives. The trick is that when we load a mod in pyrogenesis we cache the list of its files and store an offset for each of them, which means we don't need an expensive system call for every file.

43 minutes ago, nwtour said:
My test showed that meta-information from the file system is always faster than meta-information from an archive.

I can only assume that in your case tar has a worse index.
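To make that concrete, here is a minimal sketch of such an offset index. It is not the actual pyrogenesis VFS code: the class and file names are made up, and the toy "archive" is just uncompressed payloads written back to back with a hand-written directory, where a real loader would fill the map from the ZIP central directory. What it illustrates is that after one scan at mod-load time, a lookup is a hash probe and a read is a single seek+read on an already-open handle, with no per-file open() or stat().

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Where one file's bytes live inside the archive.
struct ArchiveEntry {
    uint64_t offset;
    uint64_t size;
};

class ArchiveIndex {
public:
    // The directory is built once when the mod is loaded; here it is simply
    // handed in, a real implementation would parse the ZIP central directory.
    ArchiveIndex(const std::string& archivePath,
                 std::unordered_map<std::string, ArchiveEntry> entries)
        : m_file(archivePath, std::ios::binary), m_entries(std::move(entries)) {}

    // Lookup is a hash probe; no stat() per file.
    bool contains(const std::string& name) const {
        return m_entries.count(name) != 0;
    }

    // Reading is one seek + one read on the already-open archive handle.
    std::vector<char> read(const std::string& name) {
        const ArchiveEntry& e = m_entries.at(name);
        std::vector<char> buf(e.size);
        m_file.seekg(static_cast<std::streamoff>(e.offset));
        m_file.read(buf.data(), static_cast<std::streamsize>(e.size));
        return buf;
    }

private:
    std::ifstream m_file;
    std::unordered_map<std::string, ArchiveEntry> m_entries;
};

int main() {
    // Build a toy uncompressed "archive": two payloads back to back.
    {
        std::ofstream out("mod.pack", std::ios::binary);
        out << "hello world" << "some other data";
    }
    std::unordered_map<std::string, ArchiveEntry> dir = {
        {"greeting.txt", {0, 11}},
        {"other.txt",    {11, 15}},
    };

    ArchiveIndex index("mod.pack", std::move(dir));
    std::vector<char> data = index.read("greeting.txt");
    std::printf("%.*s\n", static_cast<int>(data.size()), data.data());
    std::printf("contains other.txt: %d\n", index.contains("other.txt") ? 1 : 0);
    return 0;
}

Once the map is built, each lookup is a constant-time hash probe in user space; whether that beats the kernel's own dentry/inode caches for metadata queries depends on the workload, which is exactly the tradeoff being debated above.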
nwtour Posted December 7, 2021 (edited)

19 hours ago, vladislavbelov said:
The trick is that when we load a mod in pyrogenesis we cache the list of its files and store an offset for each of them, which means we don't need an expensive system call for every file.

Red card for unsportsmanlike conduct. What you get that way is a re-implementation of a NoSQL database.
vladislavbelov Posted December 7, 2021

8 minutes ago, nwtour said:
Red card for unsportsmanlike conduct. What you get that way is a re-implementation of a NoSQL database.

"It's not stupid if it works." So now you see why we can use a plain archive instead of raw files.