I’m old enough to remember the “good ol’ days” when an OS X app was AppleScriptable, often complete with triggers. While AppleScript is far from being my favourite language, it does some things very well and, hey, something is better than nothing. Thanks to the notorious iTunes and its partner in crime, iCloud Match, I’ve needed AppleScript to fix major problems in the iTunes database. It was a lifesaver.
Some apps, especially cloud apps, will provide an API you can use to execute functions and receive event notifications. Then there are the apps that simply call a shell script but don’t accept much control in return. Finally, there are apps that are closed, isolated, and give you the finger if you try to control them.
Over the last year, I’ve had to spend a lot of time on Arq Backup due to some serious bugs and design deficiencies. I’ve written a complex hodgepodge of Python scripts, Hazel triggers, and a PHP script that gives me a web interface into Arq’s status so I can watch it like a hawk. I wish I could set it and forget it, but Arq requires a lot of babysitting. The interface is complex because Arq doesn’t have an API. It will call pre- and post-backup shell scripts, but there are major problems with its design.
Documentation is for pussies
At the time of this post, Arq’s online documentation makes no mention of scripting, even though Arq’s destination configuration lets you enter pre- and post-backup scripts. You can’t even search the documentation from that page. Google provides free custom site search, so there’s really no excuse these days (or for the last decade) not to have a search box on your documentation.
No problem, I’ll just search Google directly for “Arq Backup script”. Cool, the first result is “Scripting Arq – Arq Backup”. However, when I tried to open that page, there was no response. Nor was there a response from the www.haystacksoftware.com home page.
Thankfully, Google has a cached copy I can look at. But it lists only a few terminal commands and says nothing about the pre- and post-backup scripts.
The macOS Arq app has the same help as the online version, but at least it has a search box … which returns zero results for “script”.
OK, pet peeve time…
Every single user interface element that a user interacts with in your application needs to be clearly documented. If you offer a checkbox, I need to know when and why I should check it. If you offer fields for pre- and post-backup scripts, I need to know: what the entry restrictions are; exactly when the script is called; whether it runs as root or as the user; whether it’s a blocking call; whether it’s run from within a shell and, if so, which shell; what arguments are passed and what their types and ranges are; whether I can pass my own arguments and whether they can be quoted or escaped; what data or exit code can be returned to Arq; and what Arq will do with it.
You’re doing your users a serious disservice if you expect them to reverse engineer your scripting options. Please, document them!
No, I’m not going to document what I’ve learned here. If you need to know more about scripting, contact Arq support and ask them why there’s no documentation.
Well, if you read on you’ll learn more about scripting as I point out its deficiencies.
Don’t argue with me
Fine, you’ve managed to figure out what text Arq will accept in a destination’s Before and After Backup fields. Then you quickly find out that Arq doesn’t pass you any arguments, such as which backup is running. Sure, you can pass your own arguments, subject to severe restrictions that I’ll leave as an exercise for the reader to reverse engineer.
So let’s assume you pass some unique identifier that tells your script which backup is running. These scripts run only for backups (more on that later). Well, now you can do things like dump a SQL database first or other prep work. Sweet.
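For illustration, here’s a minimal Python sketch of a pre-backup script that records which backup started in a JSON status file (not the actual scripts described here). It assumes you typed a hypothetical identifier such as laptop-home after the script path in the Before Backup field; the status file path and the write_status helper are inventions for this example, not anything Arq provides.

```python
import json
import os
import sys
import tempfile
from datetime import datetime, timezone

# Hypothetical location; a status web page would read this file.
STATUS_PATH = os.path.expanduser("~/arq-status/status.json")


def write_status(path: str, backup_id: str, state: str) -> dict:
    """Record the state of one backup in a JSON file, replacing the file atomically."""
    try:
        with open(path, encoding="utf-8") as f:
            status = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        status = {}
    status[backup_id] = {
        "state": state,
        "updated": datetime.now(timezone.utc).isoformat(),
    }
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file, then rename it over the original so a reader
    # (such as the PHP page) never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(status, f, indent=2)
    os.replace(tmp_path, path)
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Arq passes nothing itself; the identifier is whatever you typed after the script path.
    write_status(STATUS_PATH, sys.argv[1], "running")
```

The atomic rename matters because the web page may read the file at any moment, including mid-write.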
In my case, I’m interested in status information, and there’s not much status at the beginning of a backup, but at least I can indicate which backup has started.
This is the end … maybe
The post-backup script suffers from the same argument restrictions as the pre-backup script.
My post-backup script wants to know whether the backup was successful. The only place that status is recorded is the backup’s log file, which is named with a timestamp, and I couldn’t find any way to pass the log file name as a parameter. Thank goodness, in this case only, that Arq serializes backups, because then you know that the newest log file belongs to the backup that’s running. Or … does it? Read on.
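Here’s a small Python sketch of the “newest log wins” approach. It doesn’t assume any particular log folder location, so you pass in whatever folder your Arq installation writes to; newest_log is a hypothetical helper, and choosing by modification time rather than by the timestamp in the file name is a design choice of this sketch, not something Arq specifies.

```python
from pathlib import Path
from typing import Optional


def newest_log(log_dir: Path, pattern: str = "*") -> Optional[Path]:
    """Return the most recently modified file in log_dir, or None if there are none.

    Modification time is used instead of the timestamp in the file name, so this
    doesn't depend on how the name is formatted. The result is only trustworthy
    while Arq runs one activity at a time.
    """
    candidates = [p for p in log_dir.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```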
Parse this farce
To get the status of the backup, you have to parse the latest log file. But guess what? Arq calls the post-backup script before the final status line has been written to the log, or perhaps before it’s been flushed to disk. Fortunately, I found through trial and error that if the script waits about two seconds, it can read the final line in the log.
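A minimal Python sketch of that wait-then-read step follows. The two-second default mirrors the trial-and-error figure above and is a guess rather than a documented guarantee; last_log_line is a hypothetical helper, and treating the log as UTF-8 text is an assumption.

```python
import time
from pathlib import Path


def last_log_line(log_path: Path, settle_seconds: float = 2.0) -> str:
    """Return the last non-blank line of a log after giving the writer time to finish."""
    # The post-backup script runs before the final status line is readable,
    # so wait first; about two seconds worked in practice, but it's a guess.
    time.sleep(settle_seconds)
    last = ""
    # Assumes UTF-8 text; undecodable bytes are replaced rather than raising.
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.rstrip("\r\n")
    return last
```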
And really, what programmer doesn’t love parsing text files that can and do change on the whim of the developer? This is much more fun when you have different computers running different versions of the app and the developer has changed the log format. Just peachy. This happened in the crossover from Arq 5.10 to 5.11.
This is why programmers want APIs. A well-written API doesn’t change frequently. If it does change, there’s a transition period in which the old API, or parts of it, are deprecated before the final change, and old and new are allowed to coexist. Perhaps the data is presented as XML, or even better, JSON. Ideally, international standards are used, such as ISO 8601 for dates, rather than something ambiguous like 11/1/18. Unfortunately, Arq’s logs use three(!) date formats that I can remember seeing: ‘February 6, 2018 at 00:01:05 EST’, ‘February 6, 2018 at 12:01:05 AM EST’, and ‘2018-02-06 05:01:05 +0000’. Perhaps there are more, but this is silly enough, and why should I need to guess?
With text logs, the text tends to change without warning and may not even be consistent. Give me a well-thought-out API any day.
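If you’re stuck parsing those timestamps anyway, normalizing them to ISO 8601 up front keeps the rest of the code sane. Here’s a Python sketch that handles the three formats quoted above. The TZ_ABBREVIATIONS table is an assumption covering only Eastern time and UTC; time zone abbreviations are ambiguous in general, and strptime’s %Z only recognizes a few names, so the abbreviation is split off and looked up by hand. %B and %p also assume an English (C) locale.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping; only the abbreviations you actually see in your logs belong here.
TZ_ABBREVIATIONS = {
    "EST": timezone(timedelta(hours=-5)),
    "EDT": timezone(timedelta(hours=-4)),
    "UTC": timezone.utc,
    "GMT": timezone.utc,
}

# Formats whose zone is a trailing abbreviation (stripped before parsing).
NAMED_ZONE_FORMATS = (
    "%B %d, %Y at %H:%M:%S",     # February 6, 2018 at 00:01:05 EST
    "%B %d, %Y at %I:%M:%S %p",  # February 6, 2018 at 12:01:05 AM EST
)


def parse_arq_timestamp(text: str) -> datetime:
    """Parse any of the three observed log timestamp styles into an aware UTC datetime."""
    text = text.strip()
    # Style with a numeric offset: 2018-02-06 05:01:05 +0000
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z").astimezone(timezone.utc)
    except ValueError:
        pass
    # The other two styles end in a zone abbreviation; look it up ourselves.
    body, _, abbrev = text.rpartition(" ")
    tz = TZ_ABBREVIATIONS.get(abbrev)
    if tz is None:
        raise ValueError(f"unknown time zone abbreviation in {text!r}")
    for fmt in NAMED_ZONE_FORMATS:
        try:
            return datetime.strptime(body, fmt).replace(tzinfo=tz).astimezone(timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")
```

All three sample strings describe the same instant, so each one parses to the same value, and parse_arq_timestamp('February 6, 2018 at 12:01:05 AM EST').isoformat() gives '2018-02-06T05:01:05+00:00'.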
Should you have the misfortune to have your backup fail, Arq will write a line into the log saying the backup aborted and … won’t call your post-backup script!
So the last thing my status page knows is that backup A is running, and when the next backup starts, that backup B is also running. But that’s incorrect. Arq serializes backups, so it’s impossible for two to be running at the same time.
I guess I could wait until another pre-backup script is called to see if the last backup aborted, but that could be hours later. Meanwhile, the status is incorrect.
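That fallback is at least easy to express. Here’s a minimal Python sketch that leans on Arq serializing backups: when a pre-backup script starts, any other backup still marked running must have ended without its post-backup script being called. It only corrects the status when the next backup starts, which is exactly the delay problem, and both start_backup and the status dictionary layout are hypothetical. The dictionary should track backups only, since validation can run in parallel as of 5.11.

```python
def start_backup(status: dict, backup_id: str) -> list:
    """Mark backup_id as running and return the IDs now presumed aborted.

    Relies on backups never overlapping: if another backup is still marked
    "running", its post-backup script never ran, so it most likely aborted.
    """
    presumed_aborted = []
    for other_id, entry in status.items():
        if other_id != backup_id and entry.get("state") == "running":
            entry["state"] = "presumed aborted"
            presumed_aborted.append(other_id)
    status[backup_id] = {"state": "running"}
    return presumed_aborted
```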
So how the hell do I know when to update the status of the aborted backup if Arq won’t call my post-backup script?
Hazel to the rescue!
Hazel monitors folders and files for changes, so I tell Hazel to watch Arq’s log folder, and it gets triggered every time a file changes. Yes, this means I have to handle the normal case where both my post-backup script and the script Hazel calls run for the same backup. But at least when Arq aborts a backup, I can update the status on my monitor page within seconds.
Yes, I know this is getting really kludgey, but it works.
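The Hazel-triggered script and the post-backup script can then share the same logic. Here’s a rough Python sketch; the marker strings are placeholders, not Arq’s actual log text, and classify and record_state are hypothetical helpers. The important property is idempotence: reporting the same state twice changes nothing, so it doesn’t matter that both scripts fire for a normal backup.

```python
# Placeholder marker text: check what your Arq version really writes, and
# expect to revisit it whenever the log format changes.
ABORT_MARKER = "aborted"
SUCCESS_MARKER = "backup complete"


def classify(last_line: str) -> str:
    """Map a log's final line to a coarse state using simple substring checks."""
    lowered = last_line.lower()
    if ABORT_MARKER in lowered:
        return "aborted"
    if SUCCESS_MARKER in lowered:
        return "succeeded"
    return "running"


def record_state(status: dict, backup_id: str, new_state: str) -> bool:
    """Store new_state for backup_id; return True only if it actually changed.

    The post-backup script and the Hazel-triggered script often report the
    same finished backup, so a repeated report is a harmless no-op.
    """
    entry = status.setdefault(backup_id, {})
    if entry.get("state") == new_state:
        return False
    entry["state"] = new_state
    return True
```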
Maybe I will and maybe I won’t
My status web page likes to know what’s going on, so with the pre- and post-backup scripts writing status to a JSON file that the PHP script can read, I’m all set, right?
You see, Arq only calls your pre and post-backup scripts for—well—backups. The scripts are not called for the maintenance activities of object cleanup, validation, or budget enforcement. No problemo, I rely on Hazel to catch any of these events starting and stopping.
Oh, that backup schedule you’ve carefully crafted so as not to impact your work? It’s ignored when it comes to maintenance. These activities seemingly run at some random time during the day. If there’s a method behind this madness, the documentation won’t tell you about it.
In general, don’t rely on Arq’s documentation to tell you how it behaves.
As of Arq version 5.11, validation runs in parallel with backups. I presume object cleanup and budget enforcement don’t, because the 5.11 announcement doesn’t mention them.
Currently, my status monitor assumes only one activity per backup at a time, so I have my work cut out for me to show both a backup and a validation status at the same time. And remember when I said I could rely on the most recent log being for the current backup? Yeah, maybe not. If a validation starts in parallel with a backup, two log files will be open at the same time, and I can’t rely on the most recent one belonging to the running backup. I haven’t tested this yet, nor should I have to guess. The behaviour should be clearly documented.
By the way, that new multi-threaded validation pushed CPU usage to 600% (six cores), causing the fans on my poor Mac mini to run full blast. That’s fine unless you’re recording a podcast, on the phone, or doing something else that needs the CPU more urgently. If that’s the case, remember to use Pause Activity… during that important work. This assumes that pausing activity pauses all activity; the documentation says nothing about validation at all.
I wish developers would design in automation from the beginning. Bolting it on later always causes problems. But I also suspect very few Arq users use automation, partly because of all the problems I mentioned above, so I don’t expect many resources to be committed to building a proper API.
Arq is not a product targeted at professionals, or medium or larger businesses. Manage your expectations accordingly.