Alexander Livingstone Posted December 5, 2021

I will read all of that and follow up tomorrow, sorry, I'm tired.
Stan` Posted December 5, 2021

4 hours ago, nwtour said:
That 2.5 percent is offset by more efficient indexing in the file system. If you switch from reading file contents to accessing file meta-information, the file system wins back the time the archive saves by avoiding fragmentation.

10 million stat() calls:
perl -e 'my $i = 0; while (1) { stat("./binaries/system/pyrogenesys"); $i++; exit unless $i % 10_000_000; }'
10.47 sec

10 million tar lookups:
perl -MArchive::Tar -e 'my $a = Archive::Tar->new("test.tar"); while (1) { $a->contains_file("file"); $i++; exit unless $i % 10_000_000; }'
118.23 sec

I suppose Linux does things differently too.
vladislavbelov Posted December 6, 2021

22 hours ago, nwtour said:
That 2.5 percent is offset by more efficient indexing in the file system. If you switch from reading file contents to accessing file meta-information, the file system wins back the time the archive saves by avoiding fragmentation.

There is also the file system sector/cluster size, which affects the alignment of files (it's usually 4 KiB on Windows). I've tested a few file sizes on Windows with two mods containing the same list of files, one unpacked and one packed:

File size 480 bytes: unpacked costs +972% reading time compared to packed (total read ~4 MiB).
File size 2333 bytes: unpacked costs +690% reading time compared to packed (total read ~80 MiB).
File size 111333 bytes: unpacked costs +9.7% reading time compared to packed (total read ~1 GiB).
File size 521111 bytes: unpacked costs -32% reading time compared to packed (total read ~1 GiB).
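A quick back-of-the-envelope calculation shows why tiny files are hit so hard when unpacked. With a 4 KiB allocation unit, a 480-byte file still occupies a full cluster, and every file additionally pays its own directory lookup and open(). The snippet below is only an illustration of the resulting read amplification; the cluster size and file count are assumptions, not values from the test above:

#include <cstdio>

int main() {
    const long long clusterSize = 4096;       // assumed 4 KiB allocation unit
    const long long fileSize = 480;           // payload per file, as in the test
    const long long totalPayload = 4LL << 20; // ~4 MiB of useful data

    const long long files = totalPayload / fileSize;                      // ~8738 files
    const long long clustersPerFile = (fileSize + clusterSize - 1) / clusterSize;
    const long long bytesTouched = files * clustersPerFile * clusterSize;

    std::printf("files: %lld, payload: %lld bytes\n", files, files * fileSize);
    std::printf("clusters touched: %lld bytes (x%.1f amplification)\n",
                bytesTouched, double(bytesTouched) / double(files * fileSize));
    return 0;
}

That works out to roughly 8.5x more bytes touched per pass, on top of a per-file open() and directory lookup, which is at least consistent with the order-of-magnitude gap in the 480-byte row; for the half-megabyte files the alignment waste is negligible and the advantage flips.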
nwtour Posted December 6, 2021 (edited)

1 hour ago, vladislavbelov said:
File size 480 bytes: unpacked costs +972% reading time compared to packed (total read ~4 MiB).
File size 2333 bytes: unpacked costs +690% reading time compared to packed (total read ~80 MiB).
File size 111333 bytes: unpacked costs +9.7% reading time compared to packed (total read ~1 GiB).
File size 521111 bytes: unpacked costs -32% reading time compared to packed (total read ~1 GiB).

On a hard disk with a 128 MB cache (WDC WD2005FBYZ), repeatedly reading 4 megabytes from a single file is very cheap compared to reading scattered files from the file system. If by that you meant "contiguous storage", then I agree.
vladislavbelov Posted December 6, 2021

31 minutes ago, nwtour said:
On a hard disk with a 128 MB cache (WDC WD2005FBYZ), repeatedly reading 4 megabytes from a single file is very cheap compared to reading scattered files from the file system. If by that you meant "contiguous storage", then I agree.

4 MiB are read in both cases: a) for the unpacked mod the files are read in random order, file by file, 480 bytes each; b) for the packed mod they are read in random order from a ZIP archive, file by file, 480 bytes each. All tests run multiple prewarm passes (they read the total size several times), so in theory the HDD caches should be warm. Ideally I would expect that if I read multiple files whose total size is smaller than the cache, the unpacked and packed mods would take a similar time. But as I mentioned, a file system has its own meta-information and alignment, which can increase the "reading cost" of small files.
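For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of such a harness. It is not the test described above: the packed case is stood in for by seeking into one concatenated pack file (no ZIP parsing or decompression), and the file names, counts and sizes are made-up parameters that assume the test files have been generated beforehand.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical layout: files "small_0.bin" .. "small_8191.bin", plus one
// "pack.bin" holding the same payloads back to back (offset = index * kFileSize).
constexpr int kFiles = 8192;
constexpr int kFileSize = 480;

static double readUnpacked(const std::vector<int>& order, std::vector<char>& buf) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i : order) {
        std::ifstream f("small_" + std::to_string(i) + ".bin", std::ios::binary);
        f.read(buf.data(), kFileSize);                 // one open + read per file
    }
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

static double readPacked(const std::vector<int>& order, std::vector<char>& buf) {
    std::ifstream pack("pack.bin", std::ios::binary);  // opened once
    auto t0 = std::chrono::steady_clock::now();
    for (int i : order) {
        pack.seekg(static_cast<std::streamoff>(i) * kFileSize);
        pack.read(buf.data(), kFileSize);              // seek + read, no per-file open
    }
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<int> order(kFiles);
    for (int i = 0; i < kFiles; ++i) order[i] = i;
    std::shuffle(order.begin(), order.end(), std::mt19937(42));  // random access order

    std::vector<char> buf(kFileSize);
    for (int pass = 0; pass < 3; ++pass) {             // pass 0 doubles as prewarm
        double tu = readUnpacked(order, buf);
        double tp = readPacked(order, buf);
        std::printf("pass %d: unpacked %.3fs, packed %.3fs\n", pass, tu, tp);
    }
    return 0;
}

Note that after the first pass most data will be served from the operating system's page cache rather than from the drive, so this only mirrors the prewarmed scenario described above; measuring the disk itself would require dropping caches between passes.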
nwtour Posted December 6, 2021

1 hour ago, vladislavbelov said:
4 MiB are read in both cases: a) for the unpacked mod the files are read in random order, file by file, 480 bytes each; b) for the packed mod they are read in random order from a ZIP archive, file by file, 480 bytes each. All tests run multiple prewarm passes (they read the total size several times), so in theory the HDD caches should be warm. Ideally I would expect that if I read multiple files whose total size is smaller than the cache, the unpacked and packed mods would take a similar time. But as I mentioned, a file system has its own meta-information and alignment, which can increase the "reading cost" of small files.

The drive has its own cache, and it is very fast: it reads ahead and behind around the blocks requested by seek()/read() system calls. With repeated requests to a single file this lets the whole file end up in the cache and be served at the speed of the SATA-2 interface. See https://www.alphr.com/what-is-hard-drive-cache/ ("Block Reading Ahead and Behind").

You are arguing that both tests ran out of the disk cache and the archive was still 10 times faster. That cannot be.

My test showed that meta-information from the file system is always faster than meta-information from an archive. The file system is a hot, indexed database of file information living in kernel space, while any archive answers lookups in a primitive way.
vladislavbelov Posted December 6, 2021

38 minutes ago, nwtour said:
You are arguing that both tests ran out of the disk cache and the archive was still 10 times faster. That cannot be.

It might be hard to believe, but it can.

39 minutes ago, nwtour said:
The file system is a hot, indexed database of file information living in kernel space, while any archive answers lookups in a primitive way.

That's true, but there is one small detail. Even the most powerful general-purpose file system is still general-purpose, which means it has its own tradeoffs and can't be optimal for every workload. The same goes for archives. The trick is that when we load a mod in pyrogenesis we cache the list of its files and store an offset for each of them, which means we don't need an expensive system call for every file.

43 minutes ago, nwtour said:
My test showed that meta-information from the file system is always faster than meta-information from an archive.

I can only assume that in your case tar has a worse index.
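To make that concrete, here is a minimal sketch of such an offset index. It is not the actual pyrogenesis VFS code: the class and file names are made up, and the toy "archive" is just uncompressed payloads written back to back with a hand-written directory, where a real loader would fill the map from the ZIP central directory. What it illustrates is that after one scan at mod-load time, a lookup is a hash probe and a read is a single seek+read on an already-open handle, with no per-file open() or stat().

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Where one file's bytes live inside the archive.
struct ArchiveEntry {
    uint64_t offset;
    uint64_t size;
};

class ArchiveIndex {
public:
    // The directory is built once when the mod is loaded; here it is simply
    // handed in, a real implementation would parse the ZIP central directory.
    ArchiveIndex(const std::string& archivePath,
                 std::unordered_map<std::string, ArchiveEntry> entries)
        : m_file(archivePath, std::ios::binary), m_entries(std::move(entries)) {}

    // Lookup is a hash probe; no stat() per file.
    bool contains(const std::string& name) const {
        return m_entries.count(name) != 0;
    }

    // Reading is one seek + one read on the already-open archive handle.
    std::vector<char> read(const std::string& name) {
        const ArchiveEntry& e = m_entries.at(name);
        std::vector<char> buf(e.size);
        m_file.seekg(static_cast<std::streamoff>(e.offset));
        m_file.read(buf.data(), static_cast<std::streamsize>(e.size));
        return buf;
    }

private:
    std::ifstream m_file;
    std::unordered_map<std::string, ArchiveEntry> m_entries;
};

int main() {
    // Build a toy uncompressed "archive": two payloads back to back.
    {
        std::ofstream out("mod.pack", std::ios::binary);
        out << "hello world" << "some other data";
    }
    std::unordered_map<std::string, ArchiveEntry> dir = {
        {"greeting.txt", {0, 11}},
        {"other.txt",    {11, 15}},
    };

    ArchiveIndex index("mod.pack", std::move(dir));
    std::vector<char> data = index.read("greeting.txt");
    std::printf("%.*s\n", static_cast<int>(data.size()), data.data());
    std::printf("contains other.txt: %d\n", index.contains("other.txt") ? 1 : 0);
    return 0;
}

Once the map is built, each lookup is a constant-time hash probe in user space; whether that beats the kernel's own dentry/inode caches for metadata queries depends on the workload, which is exactly the tradeoff being debated above.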
nwtour Posted December 7, 2021 (edited)

19 hours ago, vladislavbelov said:
The trick is that when we load a mod in pyrogenesis we cache the list of its files and store an offset for each of them, which means we don't need an expensive system call for every file.

Red card for unsportsmanlike conduct. What you get that way is a re-implementation of a NoSQL database.
vladislavbelov Posted December 7, 2021

8 minutes ago, nwtour said:
Red card for unsportsmanlike conduct. What you get that way is a re-implementation of a NoSQL database.

"It's not stupid if it works." So now you see why we can use a plain archive instead of raw files.