ZFS space accounting & storage efficiency: quick notes

  1. The official guidelines rule; start there, and defer to them when in doubt.

  2. First of all, use ls -ls --block-size=1 or du -B 1 to read a file's physical size. Physical size traditionally means how much disk space the file occupies, but that intuition does not quite hold in ZFS. I performed some experiments: the reported physical size is not affected by vdev type or raidZ level, but it apparently is affected by compression and by other fs metadata such as extended attributes. A quick demo follows below.

    For incompressible files, the physical size is always greater than the actual (apparent) size, so ZFS is never 100% space-efficient in this case. I couldn't find any way to improve storage efficiency here (ZFS pre-allocates metadata space, so turning off checksums does not reduce the space used).
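
    A minimal sketch of the measurement, assuming a ZFS dataset mounted at the hypothetical path /tank/test (the path, file name, and expected numbers are placeholders, not measurements):

    ```sh
    # Compare apparent vs. physical size for an incompressible file
    # on an assumed ZFS dataset mounted at /tank/test.
    dd if=/dev/urandom of=/tank/test/blob bs=1M count=16   # 16 MiB of incompressible data
    sync /tank/test/blob                   # make sure ZFS has committed the blocks
    ls -l /tank/test/blob                  # apparent size: exactly 16777216 bytes
    ls -ls --block-size=1 /tank/test/blob  # first column: physical bytes charged
    du -B 1 /tank/test/blob                # same physical figure; expect it to exceed 16 MiB
    ```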

  3. This awesome blog post covers ZFS's under-the-hood layers; a lot happens between this physical file size and what actually lands on disk. According to the post, ZFS splits the file by recordsize, compresses each split and pads it to the nearest 512 B (there are nuances here, please refer to the blog post), then distributes the result across “physical blocks” (whose size is set by the ashift parameter when the pool was created). This distribution may be straightforward in non-raidZ vdevs.

    One storage-efficiency pitfall here is disabling compression. It seems to me that without compression ZFS stores each recordsize split as-is: if a file ends in the middle of a split, that split is zero-padded up to the full recordsize and written to disk with no reduction of those zeros. The demo after the next paragraph illustrates this.

    Zvols behave differently, but they are covered in that post too.
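
    A rough sketch of the padding pitfall, again on the hypothetical dataset tank/test; the recordsize, file names, and expected numbers are assumptions, not measurements:

    ```sh
    # With compression off, a file one byte past a record boundary needs a
    # second full record, and its zero padding is stored verbatim.
    zfs set recordsize=128K tank/test
    zfs set compression=off tank/test
    head -c $((128*1024 + 1)) /dev/urandom > /tank/test/tail
    sync /tank/test/tail
    du -B 1 /tank/test/tail    # expect roughly 256 KiB charged for 128 KiB + 1 B

    # With compression on, the mostly-zero second record shrinks to almost nothing.
    zfs set compression=lz4 tank/test
    head -c $((128*1024 + 1)) /dev/urandom > /tank/test/tail2
    sync /tank/test/tail2
    du -B 1 /tank/test/tail2   # expect just over 128 KiB
    ```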

  4. In the case of raidZ, here is another very informative blog post describing the additional padding ZFS does in raidZ. The spreadsheet its author created is very useful when choosing a vdev width for your workload (more specifically, for your recordsize or volblocksize).

    Two factors that affect storage efficiency (a worked sketch follows the list):

    • For an X-wide raidZn array, ZFS normally allocates (X-n) data blocks per n parity blocks. But at the end of a file, ZFS allocates the parity blocks even when fewer than (X-n) data blocks remain, so the very last parity blocks aren't as efficient. Let's define “inefficient space” as how much the file could grow without needing more parity blocks; that is ((X-n) - (filesize in blocks mod (X-n))) mod (X-n). Assuming file sizes are evenly distributed, the average “inefficient space” per file is about (X-n)/2 * blocksize.
    • For a raidZn array, ZFS always allocates a multiple of (n+1) blocks to hold the data plus parity, padding as needed. The padding ranges from 0 to n blocks, so assuming it is evenly distributed, the average wasted space per allocation is about n/2 * blocksize.
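
    Putting the two factors together, here is a minimal sketch of the allocation model from that post (raidz_alloc is a hypothetical helper, not a ZFS tool; the 6-wide raidz2 / ashift=12 numbers are just an example):

    ```sh
    # Estimate the sectors raidz allocates for one logical block:
    # parity per stripe of (width - parity) data sectors, then the total
    # rounded up to a multiple of (parity + 1).
    raidz_alloc() {
      local width=$1 parity=$2 recordsize=$3 sector=$4
      local data=$(( (recordsize + sector - 1) / sector ))                 # data sectors
      local stripes=$(( (data + (width - parity) - 1) / (width - parity) ))
      local total=$(( data + stripes * parity ))                           # data + parity
      echo $(( ( (total + parity) / (parity + 1) ) * (parity + 1) ))       # pad to multiple of n+1
    }

    # A 128 KiB record on a 6-wide raidz2 with ashift=12 (4 KiB sectors):
    raidz_alloc 6 2 131072 4096   # -> 48 sectors (192 KiB) for 128 KiB of data
    # A 4 KiB volblock on the same vdev is much worse:
    raidz_alloc 6 2 4096 4096     # -> 3 sectors (1 data + 2 parity), 33% efficient
    ```

    This is the same kind of calculation the spreadsheet automates across vdev widths and block sizes.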

You can reach out to me at [email protected], or via Mastodon.