Backup software that doesn’t back up—by design

Update 2018-01-23: Arq support has announced the release of version 5.11 with background validation, obsoleting some of the concerns in this post. 😁 Further comments below.


Sorry for the clickbait headline, but this is important. As of this writing, I’m watching in horrified fascination as Arq Backup validates an 8.8 TB backup set on Amazon Cloud Drive. It started on March 6th at 02:00 and is not close to being finished. Arq has been validating for over 4 days(!) and probably won’t finish until the 7th day, when it will rest. 😏

(Boy was I optimistic when I first wrote this. See updates below for the reality.)

This would be fine if it did this maintenance activity in the background in parallel with the backups, or at least paused to let other backups run. It doesn’t. My hourly backups have been patiently waiting for 106 (count ’em) hours and will probably be stalled for ~170 hours before the validation finishes. I don’t dare stop this validation because I don’t know whether Arq is smart enough to resume next time, or if it starts all over from the beginning again if it never completes once. I do know that if I interrupt a validation to allow other backups to run, the validation does not resume for that backup the next time it runs. It waits the default 60 days before it validates again. I’ve been told that subsequent validations supposedly won’t take as long (this is the second validation of this set), but in the meantime no backups for days.

To aggravate the situation, Arq doesn’t warn you that this validation has started and your data is unprotected. Unless you watch it like a hawk, like I do, then you will be blissfully unaware that your data is at risk. Unprotected for days at a time. That critical project you’re working on? I sure hope you have other backup schemes like Time Machine.

You can’t rely on Arq as your sole backup means.

Unfortunately, Arq is my sole offsite backup means. For a week my data will be at risk, backed up only on a Time Machine disk. In case of fire or theft, I’m screwed. The risk is low, but isn’t that why we back up offsite?

As I said to Arq Support in early January when I first noticed this aberrant behaviour,

Backup programs are like insurance. You hope you never need it, but it can be a life saver when you do. Would you be happy with a car that turned off ALL safety systems—air-bags, seatbelt pretensioners, stability control, anti-lock brakes—for [170] hours of driving because it was running a diagnostic on the air-bags and wouldn’t stop until you manually halted it?

And it didn’t warn you?

Arq’s behaviour is by design. Backup software that doesn’t back up for days, and doesn’t tell you it’s not backing up. By design. Imagine! 😱

Arq is still better than CrashPlan, although I don’t recall CrashPlan halting backups for this long.


Update 2017-03-12:

Remember how I said I’d have to rely on my Time Machine backup because Arq was so unreliable? Yeah, well, I found out this morning that Time Machine silently stopped backing up 36 hours ago, leaving my data with no backups for that window of time. Fortunately, since I’ve known Time Machine to do this on occasion, I had written an audit script to notify me when it had stopped for more than 24 hours. I had to set it as high as 24 because Time Machine periodically reindexes for many hours and I was getting too many false alerts. Once Time Machine starts indexing, there’s nothing you can do but let it run. AND, as of macOS Sierra (siooma?), Time Machine only runs when it feels like it.
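
For illustration, here is a minimal Python sketch of that kind of staleness check. It is not the actual audit script. The function names and the sample timestamps are hypothetical, and how the time of the last successful backup is obtained is deliberately left out.

```python
from datetime import datetime

# Assumption: alert only after 24 hours, to tolerate long reindexing runs.
DEFAULT_THRESHOLD_HOURS = 24.0


def hours_since(last_backup: datetime, now: datetime) -> float:
    """Return the number of hours elapsed since the last completed backup."""
    return (now - last_backup).total_seconds() / 3600


def is_stale(last_backup: datetime, now: datetime,
             threshold_hours: float = DEFAULT_THRESHOLD_HOURS) -> bool:
    """True when the last backup is older than the alert threshold."""
    return hours_since(last_backup, now) > threshold_hours


if __name__ == "__main__":
    # Hypothetical timestamps, 36 hours apart.
    last = datetime(2017, 3, 10, 20, 0)
    now = datetime(2017, 3, 12, 8, 0)
    if is_stale(last, now):
        print(f"WARNING: no backup for {hours_since(last, now):.0f} hours")
```

A check like this only helps if it runs on its own schedule, independent of the backup program it is watching.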


Update 2017-03-14:

Yesterday, Arq decided it needed to re-upload files it backed up a long time ago that haven’t changed. Does this mean the validation failed and Arq is just doing its job to ensure the backup is intact? If so, who borked the original backup? Arq or Amazon? Any time a validation feels it needs to re-upload files, that should be a major red flag to the developer to either ensure his program isn’t screwing up, or for him to write up a serious bug report to the cloud service being used as the backup destination. Someone’s program is buggy.

As of this writing, I’m forecasting Arq will be finished validating in 16 days (gasp!). Did I mention all other backups have stopped waiting for this validation to complete? So yeah, 16 days without any hourly backups running. By design.


Update 2017-03-20:

This is the two week anniversary of Arq stopping all backups while it validates a backup set. PARTY TIME!

No, wait. I should be in mourning for the loss of my backups. I’ve been using backup software on the Macintosh since Redux on a Mac Plus, when I backed up a massive 100 MB hard disk onto a hundred 1.4 MB floppy disks. I’ve forgotten all the backup software I’ve tried. In recent memory I administered Retrospect at a company that backed up the entire disk on each of 10,000 computers using 100+ servers at 10 sites, daily. I’ve used Time Machine and CrashPlan for personal use. I’ve never seen backup software that stops backing up for over two weeks while it does maintenance. This is what Arq Backup does by design. I can’t stress that enough.

Neither Redux nor Retrospect needed to do maintenance. The worst I can remember CrashPlan stopping backups for maintenance was a week, and I think they fixed that as I only saw maintenance for hours while it synchronized blocks before I gave up on it and moved to Arq.

I’m hoping Arq’s next validation of this set is much shorter as I’ve been promised by support. That’s why I’m adamant about letting this validation complete since it’s the second validation I remember seeing and I wouldn’t consider two weeks to be a short amount of time. I aborted the previous validation because I needed the other backups to run. I suspect Arq doesn’t resume from an aborted validation, but starts over.

Since Arq’s default interval before validating is 60 days, if you have a large amount of data and a fast uplink to Amazon Cloud Drive, you can probably expect to see the same behaviour I’m seeing when it does your first validation. Using my ETA of ~16 days for 8.8 TB, expect Arq to take 45 hours per TB to validate the backup in your first 60 days. I don’t know how much faster it would be to a local drive. I’ll know in a couple of months if Arq will, in fact, validate faster the next time. But it’s totally unacceptable to halt backups for two weeks even once. Hourly backups should run hourly, right? Unfortunately, I’m not sure the Arq developers agree with me.
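
The per-terabyte figure is simple arithmetic. Here is a small Python sketch of it, with hypothetical function names. Note that 16 days over 8.8 TB works out to roughly 43.6 hours per TB, so 45 is a rounded, slightly pessimistic planning number, and it reflects one setup and one destination only.

```python
def hours_per_tb(days: float, terabytes: float) -> float:
    """Observed validation rate: hours spent per terabyte validated."""
    return days * 24 / terabytes


def estimated_validation_hours(terabytes: float,
                               rate_hours_per_tb: float = 45.0) -> float:
    """Rough first-validation estimate for a backup set of the given size."""
    return terabytes * rate_hours_per_tb


if __name__ == "__main__":
    rate = hours_per_tb(days=16, terabytes=8.8)
    print(f"Observed rate: {rate:.1f} hours per TB")
    print(f"Estimate for 2 TB: {estimated_validation_hours(2):.0f} hours")
```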

IMHO, Arq desperately needs to multi-task validations and backups. Even different backup sets should proceed concurrently if the source and destinations are different between sets. I’m not holding out much hope for that though.
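
To make the request concrete, here is a toy Python sketch of the idea: validation runs on a worker thread while scheduled backups keep completing. This is purely illustrative and says nothing about how Arq is built. The `run_demo` function and the log messages are hypothetical, and an event stands in for the long validation pass.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_demo(backup_count: int = 3) -> list:
    """Run a simulated validation in the background while backups proceed."""
    log = []
    lock = threading.Lock()
    validation_may_finish = threading.Event()

    def record(entry: str) -> None:
        # Guard the shared log, since two threads append to it.
        with lock:
            log.append(entry)

    def validate() -> None:
        # Stand-in for a long validation pass: block until released.
        validation_may_finish.wait()
        record("validation finished")

    with ThreadPoolExecutor(max_workers=1) as pool:
        validation = pool.submit(validate)
        # Scheduled backups are not held up by the running validation.
        for n in range(1, backup_count + 1):
            record(f"backup {n} finished")
        validation_may_finish.set()
        validation.result()
    return log


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The point of the design is only that maintenance and backups do not share a single queue. A real implementation would also have to deal with both jobs touching the same destination at once.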


Update 2017-03-22 16:26:04:

🎼 Celebrate! Celebrate! 🎶 Dance to the music! 🎵

Arq Backup finally—FINALLY—finished validating a 9 TB backup after

16 days, 13 hours, 26 minutes, and 0 seconds

810 backups missed

Un-be-lievable. Backup software not backing up for over 16 days without warning, leaving you at risk of data loss, is like…

  • Your car disabling all safety systems for 16 days of driving while it runs a routine diagnostic, but doesn’t warn you.
  • Your security system DVR not recording any video for 16 days while it runs a disk check, but doesn’t warn you.
  • The aircraft anti-collision system not being functional for 400 flying hours because it’s doing a self-test, without informing the pilot.

You get the idea. To say I am appalled at this design choice is to be generous. I can only hope the developers of Arq Backup understand the severity of this scalability issue, ask themselves “what the hell were we thinking,” and get it fixed ASAP.

And one additional bug that needs to be fixed. I have Arq configured NOT to “Include file list in backup logs and email reports.” At the end of the validation, it emailed a 60,346 line message detailing every block it uploaded.

I’ll close by saying…

It is never acceptable for any critical software service to stop performing its primary function for 16 days, let alone without warning the user.


Update 2017-05-07:

Has it been 60 days already?! My, how time flies when you don’t have to babysit backups. Yes, it’s been 60 days since Arq did a validation so it was scheduled to do another on the 6th. I know you’re all wondering if what support told me about validations being shorter after the first one was true. I won’t leave you in suspense.

Yes. The validation that started on the 6th was indeed much faster than the validation that started this posting. Instead of taking 397.5 hours to validate, it only took a mere 26 hours, 43 minutes, and 29 seconds. I would celebrate this except for one teensy detail.

None of my hourly backups ran for 26 hours!
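
For anyone checking the numbers, here is a short Python sketch that converts the two reported durations to hours and compares them. The variable names are mine. The durations are the ones quoted in this post.

```python
from datetime import timedelta

# Durations as reported in this post.
first_full_validation = timedelta(days=16, hours=13, minutes=26)
repeat_validation = timedelta(hours=26, minutes=43, seconds=29)


def as_hours(duration: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return duration.total_seconds() / 3600


# Dividing one timedelta by another yields a plain float ratio.
speedup = first_full_validation / repeat_validation

if __name__ == "__main__":
    print(f"First full validation: {as_hours(first_full_validation):.1f} hours")
    print(f"Repeat validation:     {as_hours(repeat_validation):.1f} hours")
    print(f"Speedup:               about {speedup:.1f}x")
```

Roughly a fifteenfold improvement, and still more than a full day with no hourly backups.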

The other nasty discovery is that Arq does not resume from an aborted validation. It starts over from the beginning. This was the third validation, and I had aborted the first one to let other backups run. That is why the second validation took 16.5 days: it had started from the beginning.

With large backups, you simply must let Arq finish validating or you will be perpetually in a lengthy validation cycle as it starts from the beginning each time. Sad, but true.

  • Don’t manually abort validations.
  • Don’t reboot your computer.
  • Don’t do anything that would cause the validation to fail.

Arq will penalize you severely if you do. Oh, you still need those hourly backups to run? Tough. Arq doesn’t care. You’ll have to find some other way to safeguard your data while Arq does maintenance instead of performing its primary function of backing up your data. And since Arq doesn’t warn you it’s off to La La Land, you’d better watch it closely if you value your data.

Man, I hope Arq Backup 6 fixes this über-serious design flaw.


Update 2017-06-03:

Arq did an object cleanup today. It’s not clear what that means as Arq’s Help is notoriously weak on this and other areas of its operation. I’m guessing it has something to do with thinning backups.

It ran for over 7 hours, once again blocking all other backups from running.

It’s nice that Arq does all this validation and cleanup. It would be much nicer if it didn’t stop doing its primary function, backing up our data, while it did maintenance. Can you imagine a restaurant that stopped serving customers while an employee mopped the floor or cleaned the washrooms? Or if the city shut down a residential street while the street sweeper cleaned the roads? I consider reliably backing up my data to be far, far more important.


Update 2018-01-23: Arq support has announced the release of version 5.11 with background validation, obsoleting this post. 😁

However, I will wait a week or so before updating to see if early adopters report any problems. This is a major change but a most welcome one.

 

Author: Tom

Destroyer of software. If I haven't tested it, it hasn't been tested.

13 thoughts on “Backup software that doesn’t back up—by design”

  1. Thanks for this information about Arq – quite an eye-opener.
    Arq’s online help is not so good and doesn’t state clearly that this happens.
    What was Arq’s response to you about all this?
    I’m wondering if I’ve chosen the right software now…

    1. I agree that Arq’s online help is poor. I’ve worked at a company backing up 10,000 computers daily on 100+ servers at multiple sites using a competing product, so I have strong opinions on backup software. Arq’s only response was that validation is slow the first time but would be faster on subsequent validations. That is correct, but if you’re backing up a lot of data between those 60-day validations, then it will still stop backing up for hours or days. I don’t care what maintenance Arq needs to do. Backup software needs to be designed to continue performing its primary function while it does maintenance in the background.

      Worse though, is that Arq doesn’t warn you that your data is not being backed up while it takes days or weeks to validate. Using Arq, you simply must have another backup method to cover you while Arq does extended maintenance.

      I still use Arq. I’m content to live with its quirks and deficiencies because it’s the best solution for my needs. It’s OK for simple backups, but it scales poorly for large backups with multiple destinations and/or schedules.

    2. It’s fixed in Arq 5.11. Validation happens in the background now, independent of the backup process, so backups happen as scheduled with normal performance. Validation also uses multiple threads where before it was single-threaded.

  2. Thank you so much for writing this, Tom. I found it while trying to google why in tarnation my Arq verify is taking days even though it is verifying with a fast local machine over a gigabit pipe. Now I know 🙂

    My own read on this is that Arq just isn’t there yet, and that makes me sad, because like you, I really want to like it.

    In the meantime, I have been having stellar results with Duplicacy command line on both Linux and Windows. Super super fast, and less inscrutable by far. Maybe give it a go? It’s free.

    1. Thank you for the suggestion about Duplicacy. I’ll have a look at it. The problem with switching between backup client brands is that you have to start over from scratch again and lose the file version history, and deleted files.

      I suspect Arq started out as something for the developer who had simple needs and has grown organically since then. While it’s nice that it supports several cloud services, I would be much happier if its core engine was more robust. It has other issues I haven’t documented on this blog. It needs a lot of work before it’s the kind of backup software you can set and forget, and trust it to do its job.

    2. I checked out Duplicacy’s GUI license. It’s per computer whereas Arq’s is per user. I have a lot of old computers I keep around that do various dedicated functions, so Arq’s licensing is much less expensive for me.

      I still need to look into the CLI more. The license page is confusing. It says, “The CLI version is free for personal use so there are no personal licenses for the CLI version.” It also shows pricing of $20/user/year. I guess that means only if you’re using the CLI for business.

  3. Arq 5.11 was just released and it adds validation as a separate process, and it is also multi-threaded.

    I’m testing it now. I stumbled upon your post when my backup stopped for a few days and validation was the culprit.

    1. You must leave it on. With the release of 5.11 with the new background validation, that may have changed. Contact Arq support for more information.

  4. I checked my backups today to see it was doing a validation, so I checked the version – I am running the new version 5.11.0. I found that it was 18% through a 2 TB data store after 54 hours. The backups are stored on local disk, and the NAS it backs up from is on the local gigabit network. It had completed validation on two other sets of data from the same source/backup destination exceeding 3TB in about five hours, so no clue why this one was slow. You’ll be unsurprised to know that in the 60 hours since the validation started, none of my hourly backups had run. Nice that they SAY it does background validation – I don’t know that I’m seeing it. Maybe it’s only after the first validation? This is my first validation since switching to Arq – Maybe I’ll just keep cancelling it until they get it right.

    1. Only yesterday was I brave enough to install it on one computer. I haven’t verified that it does background validation though. I haven’t been brave enough to install it on my main computer with the bulk of my critical data. When I saw 5.11.0, .1, and .2 come out in rapid succession, that made me a bit leery. IF this works as claimed, fast and in the background, it will be a very nice feature. There are so many design changes needed for reliability and scalability. I’m hoping this is one step in that direction.

      Note that 5.11.0 had a nasty memory leak. Make sure you’re running 5.11.2 or later.
