This week we reported to Apple a serious flaw in macOS that can lead to data loss when using an APFS-formatted disk image. Until Apple issues a macOS update that resolves this problem, we're dropping support for APFS-formatted disk images.
Note: What I describe below applies to APFS sparse disk images only; ordinary APFS volumes (e.g. your SSD startup disk) are not affected by this problem. While the underlying problem here is very serious, it is not likely to be widespread, and it applies to only a small subset of backups. Disk images are not used for most backup tasks; they are generally only applicable when making backups to network volumes. If you make backups to network volumes, read on to learn more.
Disk images are handy devices. They're files, but they act like a hard drive – you mount a disk image by double-clicking the file, then it behaves like it's another hard drive attached to your Mac. macOS has been using disk images for decades, and we find them particularly useful when making backups to network volumes. By formatting the disk image volume using an Apple-native format, we can do things like back up system files.
Naturally, when Apple introduced APFS in macOS High Sierra, we sought to offer support for using APFS on destination disk images when doing so would match the format of the source volume. As far as creating and mounting disk images is concerned, APFS and HFS+ are easily interchangeable, so adding support for APFS was very straightforward. Unnoticed by us, Apple, and thousands of developers, however, is a very subtle behavioral difference that is specific to APFS on a sparse disk image.
Earlier this week I noticed that an APFS-formatted sparsebundle disk image volume showed ample free space, even though the underlying disk was completely full. Curious, I copied a video file to the disk image volume to see what would happen. The whole file copied without error! I opened the file, verified that the video played back start to finish, checksummed the file – as far as I could tell, the file was intact and whole on the disk image. When I unmounted and remounted the disk image, however, the video was corrupted. If you've ever lost data, you know the kick-in-the-gut feeling that would have ensued. Thankfully, I was just running some tests and the file that was corrupted was just test data. Taking a closer look, I discovered two bugs in macOS's "diskimages-helper" service that lead to this result.
An APFS volume's free space doesn't reflect a smaller amount of free space on the underlying disk
In the past with HFS+ formatted disk images, the disk image volume would automatically adjust its free space to accommodate any differences between the disk image volume's capacity and the actual amount of free space on the underlying disk. So for example, if you had created a disk image with a capacity of 500GB on a 500GB network volume, but then you added 400GB of stuff to the network volume outside of the disk image, now there's only 100GB of space for stuff on the disk image. Accordingly, when you mount the disk image, it would report its own disk usage as 400GB and its free space as 100GB (even if there is literally nothing on the disk image volume). The math always felt weird, but the result was right – the disk image can't practically accommodate more than 100GB of data, so the free space should reflect that. This behavior is documented in Apple's hdiutil man page:
To prevent errors when a filesystem inside of a sparse image has more free space than the volume holding the sparse image, HFS+ volumes inside sparse images will report an amount of free space slightly less than the amount of free space on the volume on which image resides.
This behavior has been a known quantity for many, many years. HFS+ still performs this adjustment on High Sierra, so it does not appear to be a regression, rather just an oversight that is specific to APFS.
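The arithmetic in the 500GB example can be sketched in a few lines of Python. This is only a simplified model of the adjustment, not Apple's code: the function name and the `adjusts_for_host` flag are made up for illustration, and the small safety margin that the man page mentions ("slightly less") is ignored.

```python
def reported_space(image_capacity_gb, image_used_gb, host_free_gb,
                   adjusts_for_host=True):
    """Return (reported_used_gb, reported_free_gb) for a sparse image volume.

    Simplified model (an assumption, not Apple's implementation): with the
    adjustment, free space is capped at what the underlying disk can hold.
    """
    free_inside_image = image_capacity_gb - image_used_gb
    if adjusts_for_host:
        reported_free = min(free_inside_image, host_free_gb)
    else:
        # No adjustment: the volume reports its nominal free space.
        reported_free = free_inside_image
    reported_used = image_capacity_gb - reported_free
    return reported_used, reported_free


# Empty 500GB image on a disk with only 100GB left.
print(reported_space(500, 0, 100))                          # (400, 100)
print(reported_space(500, 0, 100, adjusts_for_host=False))  # (0, 500)
```

The first result matches the HFS+ behavior described above; the second is the kind of figure the APFS disk image volume reported in my tests.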
If this were the only bug, however, this issue would be just an annoyance. The larger issue occurs when any application tries to write more data to the disk image volume than the underlying disk can accommodate.
The diskimages-helper application doesn't report errors when write requests fail to grow the disk image
The diskimages-helper application works quietly in the background, responding to filesystem requests made to the disk image volume. It's essentially a broker, or middle-man. There's a disk image file on disk, but applications don't interact directly with that file; they need to interact with a filesystem. So the diskimages-helper application presents a filesystem interface on top of that disk image file. When you make a write request to a mounted disk image volume, the request goes to the diskimages-helper application, which translates that request into changes to the disk image file.
When you initially create a "sparse" disk image file, that file is very small, e.g. <100MB. It's just large enough to hold some pre-allocated space for the filesystem structures. As you copy files to the disk image volume, the file grows. Herein lies the bug. Following the earlier example, suppose you attempt to copy 200GB of data to that 500GB disk image file. This shouldn't be possible, because there was only 100GB of free space left on the underlying disk. The APFS disk image reports that there's 500GB of free space available, though, so what the heck, let's do this! The first 100GB of data does successfully get written into the disk image file – the disk image file has grown now to 100GB. But now the underlying disk is completely full, and the disk image file can no longer grow – the diskimages-helper application is getting "No space left on device" errors when trying to write data to its band files. At this point, you'd think that the diskimages-helper application would do one of the following:
- Report a "No space left on device" error to the process making the write request
- Refuse additional write requests – sorry, no more space
- Unmount the disk image – we have to stop this insanity
- Quit – please, just stop pretending to do something that you're not actually doing!
- Panic the kernel – we're writing data into a VOID, STOP!
Alas, none of those things happen (and no, it should never panic the kernel, but writing to the void is equally unreasonable). diskimages-helper continues to accept writes, and the application asking to write the data continues to send data, eventually completing with apparent success.
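To make the failure mode concrete, here is a toy model in Python. It is not how diskimages-helper is implemented; `ToyBackingDisk` and `ToyImageHelper` are hypothetical names, and the only thing the sketch tries to show is the difference between a broker that passes "No space left on device" back to the caller and one that swallows it.

```python
import errno


class ToyBackingDisk:
    """Stands in for the underlying disk; raises ENOSPC once it is full."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes
        self.stored = bytearray()

    def append(self, data):
        if len(data) > self.free_bytes:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.stored += data
        self.free_bytes -= len(data)


class ToyImageHelper:
    """Hypothetical broker between an application and the backing disk."""

    def __init__(self, disk, propagate_errors):
        self.disk = disk
        self.propagate_errors = propagate_errors
        self.recorded_size = 0  # file size as recorded in filesystem metadata

    def write(self, data):
        try:
            self.disk.append(data)
        except OSError:
            if self.propagate_errors:
                raise  # the first option in the list above
            # Otherwise the error is dropped and the write looks successful.
        self.recorded_size += len(data)
        return len(data)


if __name__ == "__main__":
    # Try to write 200 bytes, in 50-byte chunks, to a disk with 100 bytes free.
    silent = ToyImageHelper(ToyBackingDisk(100), propagate_errors=False)
    for _ in range(4):
        silent.write(b"x" * 50)
    print(silent.recorded_size, len(silent.disk.stored))  # 200 100
```

With `propagate_errors=True`, the third `write` raises `OSError` and the application copying the file learns about the problem immediately. With `propagate_errors=False`, every write appears to succeed and the recorded size says 200 bytes, while only 100 bytes ever reached the disk.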
The final illusion
After files failed to actually make it to a physical disk somewhere, you'd think (hope?) that perhaps, at least, the file would appear smaller on the disk image. This is probably the most alarming part of this bug – because the filesystem structures are stored on a section of pre-allocated space on the underlying disk, the diskimages-helper application has no trouble updating filesystem metadata. So file size, modification date, permissions, etc – all of those attributes are fine. In yet another bizarre twist, we found that many times a truncated file would even validate a checksum test. Presumably the diskimages-helper retains some of the file data in RAM, because again, the data never made it to the underlying disk. This part is perhaps the hardest to explain in text, so I created a video to demonstrate the problem:
Proactively avoiding data loss
CCC creates and uses disk images when you select "New disk image..." from the Destination selector. Starting in CCC 5.0.4 and up to CCC 5.0.8, CCC will automatically create an APFS-formatted disk image if the source is an APFS-formatted volume. Today we're posting an update to CCC (5.0.9) that will revoke support for creating APFS-formatted disk images. Additionally, if you have a task that is currently backing up to an APFS-formatted disk image, CCC will issue a warning when that task completes indicating that APFS-formatted disk images are no longer supported by CCC. This will not prevent you from using an APFS-formatted disk image, and indeed, if your underlying destination is not overly full, there's no need for panic here. Nevertheless, disk images eventually fill up, so our recommendation to our users will be to migrate away from an APFS-formatted disk image at your earliest convenience.
What should the average user do?
The average CCC user will be unaffected by this APFS shortcoming. CCC 4 users are completely unaffected; users not yet on High Sierra are unaffected. Our usage statistics indicate that fewer than 7% of CCC backup tasks leverage a sparse disk image, and of those, fewer than 12% are APFS-formatted. We recommend that CCC 5 users update to 5.0.9, and then briefly review their backup tasks – open CCC, select each task, and read the task plan. If the task is configured to back up to a disk image, it will plainly state that, e.g.:
CCC will copy selected items from Macintosh HD into a disk image at Macintosh HD.sparsebundle on NAS Backup.
If you do have a task that is configured to back up to a sparsebundle or sparseimage disk image, hover your mouse over the source icon. A tooltip will indicate how the source volume is formatted. If your source's filesystem is APFS, then your destination disk image might be formatted as APFS as well. When you run that backup task, CCC will update the disk image as usual, and then when the task completes, CCC will issue a warning if that disk image is formatted as APFS. If you see that warning, we recommend deleting the destination disk image at your earliest convenience. Again, if the underlying destination is not very full and has never been near capacity (e.g. if it has always had more than 50GB of free space), then there's no reason to be alarmed and you can remove the destination disk image on a weekend or at night when you have ample time to allow CCC to recreate an HFS+ formatted disk image. If your underlying disk has ever filled up to capacity, though, you should delete the disk image and allow CCC to replace it with an HFS+ formatted disk image.
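If you'd rather check the free space on the underlying destination from a script, a minimal Python sketch follows. The function names are my own, the 50GB default simply mirrors the example figure above, and the path in the usage note is hypothetical. Note the limitation: this reports the free space right now and cannot tell you whether the disk was ever full in the past.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes


def host_free_gb(image_path):
    """Free space, in GB, on the volume that holds the disk image file.

    A sparsebundle is a folder and a sparseimage is a single file;
    shutil.disk_usage accepts either kind of path.
    """
    return shutil.disk_usage(image_path).free / GB


def has_comfortable_margin(image_path, minimum_gb=50):
    """True if the underlying volume currently has more than minimum_gb free."""
    return host_free_gb(image_path) > minimum_gb
```

For example, `has_comfortable_margin("/Volumes/NAS Backup/Macintosh HD.sparsebundle")` would check the network volume from the task plan above, assuming it is mounted at that location.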
Note: Disk images that end with a .dmg suffix are not affected by this problem. The storage for .dmg disk image files is pre-allocated when those files are created, so their storage space is guaranteed. CCC still supports APFS-formatted .dmg files.
Is this a problem specific to CCC?
No, this problem will affect any application that writes to APFS-formatted sparse disk images that reside on a full or nearly-full disk. I tested this scenario with copies via the Terminal application, Finder copies, and even exported a file from QuickTime. In every case, the application that was copying or creating the file was completely unaware that any problem had occurred when writing the data to disk. In the QuickTime case, I was able to immediately open the exported file and play it start to finish. After ejecting and remounting the disk image, QuickTime couldn't open the file. The core of this problem resides in macOS's diskimages-helper application and can only be resolved by an update to macOS.
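If you want to check a copy yourself, compare it against the original only after you have ejected and remounted the disk image; in my tests a checksum taken immediately after the copy could still validate. A minimal sketch in Python (the function names are illustrative, and reading a damaged file may raise an error rather than return a mismatch):

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_source(source_path, copy_path):
    """True if both files have identical content."""
    return sha256_of(source_path) == sha256_of(copy_path)
```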
Until Apple resolves this disk images bug, we strongly recommend that people avoid using APFS-formatted sparse disk images for any purpose with any application.
Update March 30, 2018: This issue persists on macOS 10.13.4 (17E199)