Jim's Tutorials

Spring 2010

Jim's comments

I agree that it's disappointing that there wasn't more of a measurable difference across these different file system parameters.
Given that you're using a standard benchmark, your method looks plausible. I don't know enough about disk performance factors to be sure. Though I do wonder what the factor limiting performance is, if it isn't this block size, and if there are other parameters that might show a bigger effect - or if the disk driver or hardware is somehow compensating.
I suppose it's also conceivable that these benchmarks are designed to measure the hardware performance (and therefore are doing just that) rather than the typical system file usage performance. Though I don't see why the two would be different.
Anyway, I think it's interesting that this approach (which still feels reasonable) doesn't confirm the discussion in the text.
- Jim

Richard's work

The goal of this project was to test the effect of different block sizes on hard disk transfer speeds. I did this by formatting several partitions of a hard disk to have different block sizes, then running a benchmark utility on the partitions.
Methodology
Hardware:
Intel Atom 230 processor
1GB Kingston RAM
200GB Seagate Barracuda 7200.7 hard disk drive
Software:
Arch Linux, running 64-bit Linux 2.6.33
IOzone 3.47, compiled to an AMD-64 executable on the test machine using GCC 4.5.0
In this test, I analyzed two file systems, NTFS and Ext4: a block size of 512 bytes on NTFS only, and block sizes of 1KB, 2KB, and 4KB on both file systems. I initially planned to examine BTRFS and XFS as well, but BTRFS (version 0.19) does not seem to support variable block sizes as well as it claims, and XFS did not cooperate with my version of Linux.

The file systems were created with the standard Linux mkfs.ntfs and mkfs.ext4 commands. I changed the NTFS “sector size” and the Ext4 “block size” from their defaults, and I assume that both settings refer to the file system block size rather than the physical sector size. Each file system was given a blank 10GB partition that had not been written to before the test. I performed the test with a shell script (attached) that mounted each partition, moved the IOzone binary to it, ran the benchmark, and copied the results file to my home directory.
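As a rough illustration of the formatting step, the sketch below builds (but does not run) one format command per file system and block size. It is not the attached bench.sh: the helper names and the device paths are hypothetical. The only flags used are `-b` for mkfs.ext4 (block size in bytes) and `-s` for mkfs.ntfs (sector size in bytes), the latter because the write-up says the "sector size" option was the one changed. mkfs.ntfs also has a `-c` cluster-size option, which may be the closer analogue of a file system block size.

```python
# Hypothetical sketch: helper names and device paths are assumptions.
# Block sizes are the ones listed in the write-up.
CONFIGS = {
    "ext4": [1024, 2048, 4096],
    "ntfs": [512, 1024, 2048, 4096],
}


def mkfs_command(fs, block_size, device):
    """Return the argv list that would format `device`; nothing is executed."""
    if fs == "ext4":
        # mkfs.ext4 takes the block size in bytes via -b.
        return ["mkfs.ext4", "-b", str(block_size), device]
    if fs == "ntfs":
        # -s sets the sector size; -c (cluster size) is the other candidate.
        return ["mkfs.ntfs", "-s", str(block_size), device]
    raise ValueError(f"unsupported file system: {fs}")


def all_commands(devices):
    """Pair each (file system, block size) with the next device in `devices`."""
    pairs = [(fs, bs) for fs, sizes in CONFIGS.items() for bs in sizes]
    return [mkfs_command(fs, bs, dev) for (fs, bs), dev in zip(pairs, devices)]
```

Returning argument lists rather than running them keeps the sketch safe to try; each list could be passed to `subprocess.run` on a machine where the target partitions are truly disposable.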

The benchmarks that were taken in this test were: write speed, read speed, rewrite speed, reread speed, random read speed, random write speed, and strided read speed. Although I believe that other data such as CPU usage or disk latency would have been valuable, only disk transfer rate was recorded.
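The measurements listed above correspond to IOzone's numbered tests. The following is a minimal sketch of how such an invocation might be assembled. The exact command line used in the attached script is not shown in this write-up, so the function name, the scratch file path, and the default sizes are assumptions.

```python
# IOzone groups its tests by number:
#   0 = write/rewrite, 1 = read/reread, 2 = random read/write, 5 = strided read
TESTS = [0, 1, 2, 5]


def iozone_command(scratch_file, max_file_size="2g", max_record="16m"):
    """Build an IOzone argv list (assumed defaults; nothing is executed)."""
    cmd = ["iozone", "-a"]             # -a: automatic mode over file/record sizes
    for test in TESTS:
        cmd += ["-i", str(test)]       # -i: select a test by number
    cmd += ["-g", max_file_size]       # -g: largest file size in automatic mode
    cmd += ["-q", max_record]          # -q: largest record size in automatic mode
    cmd += ["-f", scratch_file]        # -f: where the temporary test file lives
    return cmd
```

Pointing `-f` at a path on the partition under test is what makes the benchmark exercise that particular file system.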

Results and Discussion
Although all of the recorded data is provided, note that the read data for file sizes below 2GB is less reliable: the test machine had enough RAM to buffer most or all of the test file, so those figures reflect the read speed of RAM rather than of the disk. I therefore base my discussion on the 2GB test file. IOzone also reads and writes files in several record sizes; I discuss only the largest, 16MB. The smaller record sizes never produced significantly better performance in the tests used, and typically produced worse.
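The reasoning about RAM buffering can be expressed as a simple filter: only test files strictly larger than physical memory are guaranteed not to fit in the page cache. The sketch below assumes file sizes that double from 64KB up to 2GB and uses the 1GB of RAM from the hardware list; the function name is hypothetical.

```python
GIB = 1024 ** 3


def sizes_beyond_cache(file_sizes, ram_bytes):
    """Keep only the file sizes (in bytes) that cannot fit entirely in RAM."""
    return [size for size in file_sizes if size > ram_bytes]


# Assumed sequence: 64 KiB doubling up to 2 GiB (16 sizes in total).
sizes = [64 * 1024 * 2 ** i for i in range(16)]
trustworthy = sizes_beyond_cache(sizes, ram_bytes=1 * GIB)
```

With these inputs only the 2GB size survives the filter, which matches the choice to base the discussion on that file.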

Finally, the IOzone documentation does not discuss statistical variance. I am not sure whether repeating a test would yield the same data, although I attempted to minimize all other variables and believe that re-collected data would at least be similar.
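One way to address the variance question would be to repeat each configuration several times and compare the spread with the gap between configurations. The sketch below is an assumption about how that could be done, not something that was part of this test; the function names and the threshold `k` are hypothetical, and the comparison is a crude heuristic rather than a proper significance test.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    cv = stdev / mean if mean else 0.0
    return mean, stdev, cv


def clearly_different(a, b, k=2.0):
    """True if the two means differ by more than k times the larger spread."""
    mean_a, stdev_a, _ = summarize(a)
    mean_b, stdev_b, _ = summarize(b)
    return abs(mean_a - mean_b) > k * max(stdev_a, stdev_b)
```

Each argument would be a list of transfer rates from repeated runs of the same test on one configuration. If `clearly_different` returns False for two block sizes, the observed gap is within run-to-run noise by this rough measure.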

The results of the test were disappointing, and perhaps speak to the futility of "tuning" modern file systems. I used as many block sizes as the tested file systems would allow, and expected a noticeable shift in performance when reading and writing larger file sizes, but this, overall, was not the case. EXT4 did achieve increased write speed with a block size > 1KB, but the speed of all the other tests on it actually decreased. NTFS had increased performance with larger block size, but NTFS with 512B block size had the highest reread performance of any of the tested settings; its random read performance was also among the highest, although those did not vary much.

I did not perform a test that reads many smaller files, which is a more common task for most purposes, because I was more interested in whether a larger block size has a benefit than in whether it has a drawback. Having not found much of a benefit, testing for such a drawback would be pointless; there is no reason to prefer the larger size over the smaller.

This data counters Tanenbaum’s discussion on block size. It seems that modern file systems and operating systems cache, read ahead, and otherwise work around block size to the point that it is almost insignificant in the performance of a block device. Further, with the size of modern hard disks, it seems that the discussion on the size/space tradeoff of hard disk block sizes is moot. In order to eke greater performance from hard drives, one must look at tunables other than block size.
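The size/space tradeoff mentioned above comes from the fact that a file occupies a whole number of blocks, so the last block is usually only partly used. A minimal worked sketch of that arithmetic follows; the function names are hypothetical, and it ignores file system details such as metadata or small files stored inline.

```python
def allocated_bytes(file_size, block_size):
    """Bytes of disk space taken by a file that occupies whole blocks."""
    blocks = -(-file_size // block_size)   # ceiling division on integers
    return blocks * block_size


def slack_bytes(file_size, block_size):
    """Unused bytes in the file's final block."""
    return allocated_bytes(file_size, block_size) - file_size
```

For example, a 1000-byte file takes one 4096-byte block and leaves 3096 bytes unused, while with 512-byte blocks it takes two blocks and leaves only 24 bytes unused. On a 200GB disk this waste matters only when there are very many small files.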

Footnote about looking at the actual data: It is space-delimited, so while it can be opened as a text file, it’s a lot nicer to look at if you import it into Excel or something and tell it to split on spaces (and then trim out the irrelevant text at the top).
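As an alternative to a spreadsheet, the data rows can be pulled out with a few lines of Python. This is a sketch under assumptions: it treats any line made up entirely of whitespace-separated integers as a data row, with file size in KB first, record size in KB second, and transfer rates after that. The function and key names are hypothetical, and the column order should be checked against the header in the actual results file.

```python
def parse_iozone_rows(text):
    """Extract numeric data rows, skipping the descriptive text around them."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3:
            continue                      # blank line or too short to be data
        try:
            values = [int(token) for token in tokens]
        except ValueError:
            continue                      # header or other non-numeric line
        rows.append({
            "file_kb": values[0],
            "record_kb": values[1],
            "rates": values[2:],          # one rate per test column
        })
    return rows


def largest_run(rows):
    """Return the row with the largest file size, then largest record size."""
    return max(rows, key=lambda row: (row["file_kb"], row["record_kb"]))
```

Reading a results file into a string and passing it to `parse_iozone_rows`, then calling `largest_run`, would select the 2GB file with 16MB records discussed above.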

One more footnote: I just realized that I was writing the benchmark data to a file on the same partition and drive. Oops. I don't think it should have done much, though.

attachments

name            last modified        size
bench.sh        May 6 2010 1:39 pm   1.50kB
benchdata.zip   May 6 2010 1:39 pm   60.9kB
benchmark.png   May 6 2010 1:29 pm   40.7kB