A bug in macOS 10.15.5 impacts bootable backups, but we've got you covered

Update (July 16, 2020): Apple fixed the underlying OS issue described below in macOS 10.15.6. We'll have a CCC update out in the next week or so that puts our workaround on the back burner. Every challenge opens up new opportunities, and that's actually how we're seeing this incident in retrospect. Rather than just hoping for a fix, we invested in a solution, and that solution puts us in a solid position for the next major OS.

Update (May 29, 2020): This issue is now addressed in CCC 5.1.18, which is available for immediate download – CCC can make bootable backups of macOS 10.15.5. Choose "Check for Updates" from the Carbon Copy Cloner menu to get the latest version of CCC. Special thanks to my team members for helping put this release together so quickly, the folks at Wordcrafts for a wicked-fast turnaround on these UI translations, and the several beta testers who helped us knock out some kinks along the way.


Early last week we discovered an APFS filesystem bug in a beta of macOS 10.15.5. The technical details of the bug are laid out below, but the short version is that we're no longer able to use our own file copier to establish an initial bootable backup of a macOS Catalina System volume. To be very clear – existing backups are unaffected, and this has no effect on CCC's ability to preserve your data, nor any effect on the integrity of the filesystems on your startup disk or your backup disk. The impact of this bug is limited to the initial creation of a bootable backup.

So that's a lemon... But hey, summer has arrived here in the northern hemisphere, so let's make some lemonade!

Creating bootable backups in a post-10.15.5 world

Last year at Apple's Developer Conference, Apple suggested that backup software could use Apple's "Apple Software Restore" (ASR) for cloning APFS volume groups. Initially I dismissed this – I shouldn't have to use Apple's black-box utility to do my job; I prefer to take full responsibility for my backups. Anticipating a world in which Apple continues to restrict access to APFS rather than grant it, though, we decided to invest a fair amount of time evaluating this functionality, and we've been beta testing it for the last 8 months. I don't like to lean on ASR for general backups because it has some shortcomings and doesn't give any insight into its internal activity (e.g. files copied, errors encountered), but in this very narrowly-defined case, we can leverage Apple's proprietary utility just to establish bootable backups. We posted a beta last Sunday with new UI around this functionality, and we intend to continue producing bootable backups by leveraging ASR for the initial backup.

I already have backups of 10.15.4, do I need to do anything special after updating to 10.15.5?

No. If you established your backup on a previous version of Catalina, then your backup volume already has functional firmlinks and CCC will continue to update that volume just fine. Apply the CCC 5.1.18 update, but no additional special steps are required.

I'm trying to create a new backup of 10.15.5. How do I proceed to create a bootable backup?

If you're running 10.15.5 and you're backing up a Catalina system volume to an empty disk, then you should apply the CCC 5.1.18 update. After updating to 5.1.18, run your backup task again. If any corrective action is required, CCC will present the options to you automatically. If your backup task runs successfully, you're all set.

The new functionality is documented here:

Cloning macOS System volumes with Apple Software Restore


What did Apple break in 10.15.5?

The chflags() system call can no longer set the SF_FIRMLINK flag on a folder on an APFS volume. Rather than fail with an error code that we would have detected, it fails silently – it returns 0 (success), but never sets the special flag. That's a bug in the APFS filesystem implementation of chflags – if a system call doesn't do what you ask it to do, it's supposed to return an error code, not success. That's a fairly nasty bug too. Apple preaches that you should always check your error codes, and we do – religiously. This bug slipped past us for who knows how long because the system call reports success.
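
One defensive pattern against this kind of silent failure is to read the flags back after setting them instead of trusting the return value. What follows is a minimal Python sketch of that idea, not CCC's actual implementation. The helper names (missing_flags, set_flags_verified) are hypothetical, and the numeric value of SF_FIRMLINK is an assumption taken from the macOS header, since Python's stat module does not define it.

```python
import errno
import os

# Assumption: value of SF_FIRMLINK as defined in macOS's <sys/stat.h>.
# Python's stat module does not export this constant.
SF_FIRMLINK = 0x00800000


def missing_flags(requested, actual):
    """Return the bits of `requested` that are not present in `actual`."""
    return requested & ~actual


def set_flags_verified(path, flags, chflags=None, lstat=os.lstat):
    """Add `flags` to the file flags of `path`, then read them back to confirm.

    `chflags` and `lstat` are injectable (hypothetical parameters) so the
    check can be exercised without a real APFS volume. os.chflags itself
    exists only on macOS and the BSDs.
    """
    if chflags is None:
        chflags = os.chflags
    current = lstat(path).st_flags
    chflags(path, current | flags)   # may report success without doing anything
    after = lstat(path).st_flags     # so ask the filesystem what it stored
    missing = missing_flags(flags, after)
    if missing:
        raise OSError(
            errno.EPERM,
            "chflags reported success but did not set 0x%08x" % missing,
            path,
        )
```

The cost of this approach is one extra lstat() per flag change, which is negligible for flags that are only set during the first backup of a system volume.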

We don't need to set many of these flags, nor set them frequently – just on the first backup of the macOS system volume. It happens to be essential to the functionality of an APFS volume group, though, so the failure to set these flags means that new full-system backups created on 10.15.5 and later won't be bootable, and it will appear as if none of your data is on the destination (to be clear, though, all of the data is backed up). Kind of the opposite of what we're trying to do here. It's hard to find kind words to express my feelings towards Apple right now. Suffice it to say, though, I'm extremely disappointed that Apple would introduce this kind of bug in a dot-release OS update. We've seen 5 major updates to Catalina now; we should expect to see higher quality than this from an operating system.

Last Monday (May 18) I filed a bug report with Apple on this subject (FB7706647). I also opened an incident with Apple's Developer Technical Support. I got a sympathetic ear, and someone to advocate my case with Apple Engineering – which I greatly appreciate. However, it was for naught. The last 10.15.5 beta shipped last Wednesday (May 20) – without a fix. Then today, Apple shipped 10.15.5 with this nasty little bug, and here we are, creating whole, but slightly less functional backups.

Could this simply be a security fix – maybe Apple doesn't want third parties to create firmlinks?

If that's the case – if this is not actually a bug and is actually an intentional change by Apple, then I would argue that this is far worse than a bug. First, if third parties should not set or remove the SF_FIRMLINK flag, then that should be documented alongside the flag's definition (i.e. in /usr/include/sys/stat.h). Second, if you're not going to allow the setting of the SF_FIRMLINK flag, then the system call should return -1 and set errno to EPERM – reporting success and failing is reprehensible. Last, and most important – making such a change in a production OS release with no warning is openly hostile to third-party developers who were relying on the documented functionality.
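
For contrast, here is a hedged sketch of what a caller could do if the refusal were reported honestly. In Python, a system call that returns -1 with errno set to EPERM surfaces as an OSError carrying that errno, so the caller can detect it and react. The function name try_set_flags and its chflags parameter are hypothetical stand-ins for illustration only.

```python
import errno
import os


def try_set_flags(path, flags, chflags):
    """Call `chflags(path, flags)` and report the outcome as a short string.

    `chflags` is passed in (e.g. os.chflags on macOS) so the error path can
    be demonstrated with a stand-in function on any platform.
    """
    try:
        chflags(path, flags)
    except OSError as exc:
        if exc.errno == errno.EPERM:
            # An honest refusal: the caller can log it or switch strategies.
            return "refused"
        return "failed: " + os.strerror(exc.errno)
    return "ok"
```

With an explicit EPERM, the failure is visible at the call site the moment it happens, rather than surfacing later as a backup that won't boot.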

I suppose we'll find out in future OS releases whether this was a simple filesystem bug that slipped into a production OS release, or if Apple finds it to be an acceptable practice to blindside developers by silently removing documented functionality in the middle of a production OS release cycle.