You can’t make me

This post started off as a simple update to my previous post. But as I delved into the problem that I was seeing with Arq Backup, I went further down the rabbit hole until I decided it warranted its own post.

While keeping an eye on my changes to my monitor scripts to support parallel execution of backups and validation, I noticed that Arq went off the rails and took over 11 hours to scan 302 GB, which it normally does in a few minutes. Why? I have no idea. There was nothing in the logs.

Backup session for /Users/tom started on February 9, 2018 at 12:01:17 AM EST
Found completed backup record for /Users/tom
Saved backup record for /Users/tom
Scanned 302.017 GB (540713 files)
Uploaded 258.2 MB
Backup session for /Users/tom ended on February 9, 2018 at 11:13:32 AM EST

So of course, my overnight and hourly backup scripts were all delayed until late this morning. 😞

I checked again on the 10th, and this time it took 10 hours to run. Arq is so unpredictable. Sigh. So now I’ve turned on Debug logging to see if I can find out why Arq is slowing to a crawl.

Then, because I’m poking around in the logs, I see that Arq is trying to parse /Users/tom/Library/Arq/config/targets/.DS_Store, a macOS system file that many folders have. If all Arq wants to process are .target files, then why is it trying to parse invisible system files it’s got no business looking in? 🙄

2018/02/08 13:00:58:821 ERROR [thread 307] failed to parse target file .DS_Store: Error Domain=TargetFactoryErrorDomain Code=-1 “error parsing /Users/tom/Library/Arq/config/targets/.DS_Store: error parsing XML plist: Error Domain=NSXMLParserErrorDomain Code=4 “Line 1: Document is empty
” UserInfo={NSLocalizedDescription=Line 1: Document is empty

With debug logging enabled I check on the morning of the 11th and notice some bizarre stuff, which is becoming the norm with this app. Sure enough, my backup didn’t finish until 10:15 this morning. Again.

The first logs I wanted to check were the arqcommitter logs, which are the logs for the process that actually does most of the work backing up. But guess what? Arq had deleted all previous logs prior to 10:11, so I couldn’t find out what was happening at 00:01 when my backup was supposed to run.

So then I checked the Arq Agent logs. At least Arq had left log files going back to the beginning of the month. As I looked in these logs I noticed 8,000(!) occurrences in the last 24 hours of:

2018/02/11 00:00:00:309 DEBUG [thread 307] 0x7fc973e11f70 AgentAppController.m:899 power source changed to AC Power; on battery power = false

The problem is, I’m running on a Mac mini which doesn’t even have a battery! So it’s theoretically impossible to switch away from battery power as frequently as several times per second.

I don’t know if this is yet another Arq bug, or yet another macOS High Sierra bug. When it’s a tossup as to which is the flakier, and they’re supposed to be the most reliable software running—well—what can you do?

Then I see logs like this:

2018/02/11 00:01:04:960 DETAIL [thread 307] 08044CA0-0B10-4F80-BD8C-6ABAEAD91A47: next validate date=2017-10-09 00:31:57 +0000

Ya gotta love that the first timestamp is local time (EST or UTC-0500) and the next timestamp is UTC+0000, requiring mental arithmetic for every log that does crap like that.

As I skim through the logs until the 00:01 start time for the backup that is now taking hours instead of minutes to run, I see it start, start and finish running the pre-backup script, spit out 20 more spurious battery power logs, start the backup session for the folder, find a completed backup record at 00:01:31 and then … nothing until 08:00:04. It just stops. Even the spurious battery power logs stop, which makes me think that they’re definitely due to an Arq bug. Anybody got a can of Raid®? A Costco-sized can?

This looks like the notorious Arq bug where you can set a pause interval for a specific destination but it applies globally! It works this way because Arq only runs one backup at a time yet bizarrely lets a backup start before it’s paused, thus hanging all backups. This was reported a long time ago, but maybe it’s never been fixed.

So, maybe it’s my fault. Maybe I turned on a pause for between 00:00 and 08:00. It sure looks like it, right? I have 10 destinations configured and I check each one. None of them are configured to pause at all, let alone within this time interval. And with nothing in the logs as to why the damn thing is pausing, how the hell do I get Arq Backup to—you know—back up? This is a new problem in 5.11.

I know humans are all supposed to get 8 hours of sleep at the same time of night, but I didn’t think that applied to software.

So where do I go from here? I checked all the .target XML files and the pauseDuringWindow option was false. Maybe something turned this on and I need to force Arq to reread its configuration. So I guess I’ll perform the standard debugging voodoo of killing the Arq Agent, or better yet, restarting the Mac.

Then, since Arq suspiciously deleted all arqcommitter logs from the time I need them, I guess I have no choice but to watch the logs at 00:01 tomorrow morning and hope to see something before Arq once again hides the evidence.


Update 2018-02-11 23:15:

I’m 45 minutes away from finding out, hopefully, why Arq stalls just past midnight. But in the meantime, my wife’s computer is sending me emails with hundreds of logs (many identical) like:

Error: /Users/<userid>/Music/iTunes/iTunes Library.itl: The operation couldn’t be completed. (MissingErrorDomain error 0.)
Error: /Users/<userid>/Music/iTunes/iTunes Library.itl: db select: no row!
Error: /Users/<userid>/Music/iTunes/iTunes Library.itl: The operation couldn’t be completed. (MissingErrorDomain error 0.)

I’ve said it a couple of times before, I don’t understand why Arq thinks I want to look at hundreds, or thousands, or tens of thousands of the same log. Tell me once, tell me how many identical logs, and knock it off.

Oh it’s great to have these logs I guess, but they’re kinda useless. First of all, what the hell does MissingErrorDomain even mean, and just what the hell am I supposed to do to correct it?

Anyway, I Googled that error and came across Arq’s release notes where … wait for it … this bug has been fixed twice before. In 5.1.8 on July 4, 2016, and  5.3.3 on October 16, 2016. Well, here we are at 5.11.2 on February 11, 2018 and the damn error is back. I think the Arq developers need to beef up their regression testing.

I’m guessing I need to clear Arq’s cache yet again. I don’t know why its caching is so flaky. I’ve cleared the cache on various machines numerous times to correct a variety of errors and the beat goes on and on.

I mean, it’s nice that Arq has extensive error detection and all, but its recovery is poor. A lot of manual intervention is required for bugs that shouldn’t even be in a 5.11 app.

I’m an unpaid babysitter.


2018-02-12 00:01:

The good news is that restarting the Mac yesterday kicked Arq in the butt and it’s now running “normally” in that it’s completing the backup immediately instead of pausing until 08:00.

The bad news is that I’m friggin’ up past midnight babysitting Arq. 😡 This is becoming a second career.


2018-02-17:

I shut down Arq at noon on February 12th until 15:30 on the 14th so I could backup, format, and restore a failing 4 TB drive. I didn’t want Arq’s I/O interfering with backing up and restoring this drive to other drives (I needed the free space on one of Arq’s destination disks). What, are you kidding? Of course I didn’t use Arq for that—I used ChronoSync. Much, much faster than Arq, I can easily split the 4 TB amongst multiple destination disks, I can watch the progress, and reversing the backup to a restore-all is trivial. It would be tedious to use Arq for this purpose.

At 15:30 on the 14th I started Arq again and it ran fine on the morning of the 15th. Then, on the 16th…

… you guessed it, Arq is once again hanging at midnight and not finishing a backup that normally takes a few minutes, until 11 hours later. And, last night my wife had to clear the cache on her computer to stop Arq bitching about 1,236 missing .pack files, which Arq felt duty-bound to log every damn one in the email it sent. But I digress.

I did a reboot of my computer last time to clear the bug. This time I’m going to see if just restarting the Arq agent will work. Version 5.11.2 is more of a PITA than normal.

Arq is by far the buggiest backup software I’ve ever used. I used to tell people, “Yeah, I’ve seen far too many bugs in this app, but go ahead, try it out for free for 30 days. It might work for you.” No more. You just never know what new and “wonderful” bug will strike Arq next.

Author: Tom

Destroyer of software. If I haven't tested it, it hasn't been tested.

Leave a Reply

Your email address will not be published. Required fields are marked *