Thoughts about progress indication

Carbon Copy Cloner has two different methods of providing progress indication, "fast" and "more accurate". "Perfectly smooth progress indication every time" is what many people want, along with an estimate of how long the task will take, but they often don't realize what that costs for a backup task that copies only the items that have changed since a previous backup task.

Fast

If you don't deselect anything from CCC's list of items to be copied, CCC will start copying files immediately. Because CCC has no idea how much data will have to be copied to the destination, it bases progress indication on the number of files that have been processed vs. the number of files reported to exist on the source volume. If you have several really, really large files, progress indication may appear a bit uneven -- the progress bar will move steadily as CCC processes smaller files, but will move very slowly when large files are encountered. People generally want CCC to start copying right away, though, because it's faster to do it that way, and people are impatient.
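To make the unevenness concrete, here is a minimal sketch of count-based progress. It is not CCC's code; the function name and the file sizes are hypothetical and chosen only to illustrate the effect.

```python
def count_based_progress(file_sizes, total_files):
    """Yield the progress fraction after each file is processed.

    Every file counts equally, no matter how large it is.
    """
    for processed, _size in enumerate(file_sizes, start=1):
        yield processed / total_files


# Hypothetical source volume: nine 1 MB files followed by one 10,000 MB file.
sizes_mb = [1] * 9 + [10_000]
fractions = list(count_based_progress(sizes_mb, total_files=len(sizes_mb)))

# Share of the data that has been handled once the nine small files are done.
data_share_after_small_files = sum(sizes_mb[:9]) / sum(sizes_mb)
```

In this made-up case the bar reads 90% after the nine small files, even though they account for less than 0.1% of the data. The last 10% of the bar then has to cover nearly all of the copying time.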

More accurate

If you have excluded any items from the list of items to be copied, the total number of files on the source volume (which is easy to obtain very quickly) is no longer an accurate measure of how many files will be considered in the backup task. CCC will pre-scan the source volume to get an exact list of the items that will be considered for copying. With this list in hand, CCC knows exactly how much data is in the data set that you're copying, so it can provide progress indication based on the amount of data that has already been considered vs. the total data in the data set. This method has its drawbacks as well: if you have several really, really large files that CCC decides are already up to date, the progress bar will jump ahead quite a bit when those files are considered.
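The jump can be illustrated with a small simulation of data-based progress. This is a sketch under stated assumptions, not CCC's implementation: the function name, the file sizes, and the 100 MB/s copy rate are hypothetical, and files that are already up to date are assumed to cost no time at all.

```python
def byte_based_progress(files, copy_rate_mb_per_s=100.0):
    """Return (elapsed_seconds, fraction_done) after each file is considered.

    files is a list of (size_mb, needs_copy) pairs. Progress is measured as
    data considered so far divided by total data in the data set.
    """
    total_mb = sum(size_mb for size_mb, _ in files)
    considered_mb = 0
    elapsed = 0.0
    timeline = []
    for size_mb, needs_copy in files:
        if needs_copy:
            # Only files that actually get copied consume time (simplification).
            elapsed += size_mb / copy_rate_mb_per_s
        considered_mb += size_mb
        timeline.append((elapsed, considered_mb / total_mb))
    return timeline


# Hypothetical data set: a 10,000 MB file that is already up to date,
# sandwiched between two 500 MB files that need to be copied.
files = [(500, True), (10_000, False), (500, True)]
timeline = byte_based_progress(files)
```

In this example the first 500 MB file takes 5 seconds and moves the bar by less than 5%. The up-to-date 10,000 MB file then moves the bar by about 91% without any time passing.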

Perfectly Smooth Progress Indication Every Time

The only way to achieve perfectly smooth progress indication, and to provide a "time remaining" estimate, would be to pre-scan both the source and the destination and compare those lists of files to determine not only how much data is in the data set, but also which files are already up to date and which files will need to be copied. This would give a third data point, "total data that needs to be copied", and CCC could use that to determine progress.

I have thus far decided against implementing this kind of progress indication because it is impractical. The goal of running CCC is to get your stuff backed up and to do so with as little effect on your productivity as possible. When you are updating an existing backup, the time that it would take to do what is basically a dry run of the backup is a significant portion of the time that it would take to actually do the backup. I don't think I've ever drawn up these numbers before, so I decided to do that this morning. The first set of results is from a backup task that I ran last night. The second set is from a dry run this morning.


Backup task (last night):

  • 641,633 files and folders to consider
  • 141.67 GB of data in the data set
  • 186 seconds to pre-scan the source volume
  • 1.8 GB, 1,145 files copied
  • 280 seconds for the comparing and copying phase
  • 466 seconds total time

Dry run (this morning):

  • 641,759 files and folders to consider
  • 142.02 GB of data in the data set
  • 174 seconds to pre-scan the source volume
  • 0 files copied (because it's a dry run)
  • 205 seconds for the comparing and copying phase
  • 379 seconds total time

So you can see that, even though CCC processes roughly 3,450 to 3,700 files per second during the pre-scan, it simply takes a fair amount of time to collect that kind of information.
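The comparison between the two runs can be worked out directly from the figures listed above. The short script below only does that arithmetic; the dictionary keys and the function name are made up for this illustration.

```python
# Figures taken from the two result sets above.
runs = {
    "backup": {"items": 641_633, "prescan_s": 186, "compare_copy_s": 280},
    "dry_run": {"items": 641_759, "prescan_s": 174, "compare_copy_s": 205},
}


def summarize(run):
    """Return the total time and the pre-scan rate for one run."""
    return {
        "total_s": run["prescan_s"] + run["compare_copy_s"],
        "prescan_items_per_s": run["items"] / run["prescan_s"],
    }


backup = summarize(runs["backup"])
dry_run = summarize(runs["dry_run"])

# How long the dry run took relative to the real backup.
dry_run_share = dry_run["total_s"] / backup["total_s"]

print(f"backup total:  {backup['total_s']} s")
print(f"dry run total: {dry_run['total_s']} s")
print(f"dry run / backup: {dry_run_share:.0%}")
```

The dry run took 379 of the backup's 466 seconds, or about 81%. Running a dry run before every backup purely for smoother progress indication would therefore come close to doubling the total time in this case.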

"Why can't CCC use the information from the dry run to determine which files need to be copied?"

In theory, CCC could collect a list of files that should be updated during the dry run, then copy just those items during a second "live" pass. While this is probably reasonable for a small data set, it's less practical for a larger data set. The filesystem is constantly changing, especially when the source volume is your startup disk. Files come and go and are updated every minute that the computer is awake, so the list of files that should be copied is constantly in flux. This kind of method would also be more complex and prone to errors. Considering the importance of the task and that the benefit is entirely cosmetic, this is something that I have decided not to offer in CCC. That's not to say that I won't ever consider it, especially if that kind of method offered some other benefit. For example, people often wonder "What files is CCC planning to delete or archive on the destination?" or "Will the destination have enough room to complete an incremental update to my existing backup set?" These are features that I would like to provide in CCC, and the "dry run" functionality is likely how I would offer them.
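As a rough illustration of what such a dry run could report, the sketch below compares two snapshots of file metadata without copying anything. It is not how CCC works internally. The function name, the snapshot format ({path: (size_bytes, mtime)}), and the sample data are all hypothetical, and the free-space check is deliberately conservative because it ignores any space reclaimed by deleting or replacing files.

```python
def plan_dry_run(source, destination, free_bytes):
    """Compare two {path: (size_bytes, mtime)} snapshots and report a plan."""
    # A file needs copying if it is missing on the destination or differs.
    to_copy = sorted(p for p, meta in source.items() if destination.get(p) != meta)
    # A file is a deletion (or archive) candidate if it no longer exists on the source.
    to_delete = sorted(p for p in destination if p not in source)
    bytes_to_copy = sum(source[p][0] for p in to_copy)
    return {
        "to_copy": to_copy,
        "to_delete": to_delete,
        "bytes_to_copy": bytes_to_copy,
        # Conservative: does not count space freed by deletions or replacements.
        "fits": bytes_to_copy <= free_bytes,
    }


# Hypothetical snapshots taken at one moment in time.
source = {"a.txt": (100, 1), "b.bin": (5000, 7), "c.log": (40, 3)}
destination = {"a.txt": (100, 1), "b.bin": (4000, 5), "old.tmp": (10, 2)}

plan = plan_dry_run(source, destination, free_bytes=10_000)
```

Here the plan lists b.bin and c.log to copy (5,040 bytes in total) and old.tmp as a deletion candidate. The plan is only as fresh as the snapshots it was built from, which is the problem described above: on a busy startup disk, the source may have changed by the time a second "live" pass begins.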