The case of the missing API

I’m old enough to remember the “good ol’ days” when an OS X app was AppleScriptable, including having triggers. While AppleScript is far from being my favourite language, it does some things very well and, hey, something is better than nothing. Thanks to the notorious iTunes and its partner in crime, iCloud Match, I’ve needed AppleScript to fix major problems in the iTunes database. It was a lifesaver.

Some apps, especially cloud apps, will provide an API you can use to execute functions and receive event notifications. Then there are the apps that simply call a shell script but don’t accept much control in return. Finally, there are apps that are closed, isolated, and give you the finger if you try to control them.

In the last year, I’ve had to expend a lot of time on Arq Backup due to some serious bugs and design deficiencies. I’ve written a complex hodgepodge of Python scripts, Hazel triggers, and a PHP script to give me a web interface into Arq’s status so I can watch it like a hawk. I wish I could set and forget, but Arq requires a lot of babysitting. It’s a complex interface because Arq doesn’t have an API. It will call pre- and post-backup shell scripts, but there are major problems with its design.

Documentation is for pussies

At the time of this post, Arq’s online documentation makes no mention of scripting, even though Arq’s destination configuration lets you enter pre- and post-backup scripts. You can’t even search the documentation from that page. I mean, Google provides free custom website search, so there’s really no excuse these days (or for the last decade) not to have a search box for your documentation.

No problem, I’ll just search Google directly for “Arq Backup script”. Cool, the first result is Scripting Arq – Arq Backup. However, when I tried to access that page, there was no response. Nor was there a response from the www.haystacksoftware.com home page.

Thankfully, Google has a cached copy I could look at, but it lists only a few Terminal commands and makes no mention of the pre- and post-backup scripts.

The macOS Arq app has the same help as the online help, but at least there’s a search box … that yields zero results searching for “script”.

OK, pet peeve time…

Every single user interface element that a user interacts with in your application needs to be clearly documented. If you offer a checkbox, I need to know when and why I should check it. If you offer fields for pre- and post-backup scripts, I need to know: what the entry restrictions are; exactly when the script is called; whether it runs as root or as the user; whether it’s a blocking call; whether it’s called from within a shell and, if so, which shell; what arguments are passed, and their types and ranges; whether I can pass my own arguments, and whether they can be quoted or escaped; what data or exit code can be returned to Arq; and what Arq will do with it.

You’re doing your users a serious disservice if you expect them to reverse engineer your scripting options. Please, document them!

No, I’m not going to document what I’ve learned here. If you need to know more about scripting, contact Arq support and ask them why there’s no documentation.

Well, if you read on you’ll learn more about scripting as I point out its deficiencies.

Don’t argue with me

Fine, you’ve managed to figure out what text Arq accepts in a destination’s Before and After Backup fields. Then you quickly find out that Arq doesn’t pass you any arguments, such as which backup is running. Sure, you can pass in your own arguments, with severe restrictions I’ll leave as an exercise for the reader to reverse engineer.

So let’s assume you create some unique identifier that will tell your script what backup is running. It’s only for backups (more on that later). Well, now you can do things like dump a SQL database first or other prep work. Sweet.

In my case, I’m interested in status information. There’s not much status at the beginning of a backup, but at least I can indicate which backup has started.
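Here’s a minimal sketch of what such a pre-backup script could look like in Python. It assumes you typed a made-up identifier after the script path in the Before Backup field (for example `laptop-b2`) and that Arq passes it to the script as the first argument. The status file path `/tmp/arq_status.json` and its JSON layout are hypothetical choices for illustration. Arq defines neither of them.

```python
import json
import sys
import time
from pathlib import Path

# Hypothetical location of the status file that the PHP status page reads.
STATUS_FILE = Path("/tmp/arq_status.json")


def load_status(path: Path) -> dict:
    """Return the current status dict, or an empty one if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def mark_started(backup_id: str, path: Path = STATUS_FILE) -> dict:
    """Record that the backup with this (self-chosen) identifier has started."""
    status = load_status(path)
    status[backup_id] = {"state": "running", "started": time.time()}
    # Write to a temporary file and rename it, so a reader never sees a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(status, indent=2))
    tmp.replace(path)
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the identifier typed after the script path arrives as argv[1].
    mark_started(sys.argv[1])
```

The temp-file-and-rename step is there because the PHP page may read the file at any moment.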

This is the end … maybe

The post-backup script suffers from the same argument restrictions as the pre-backup script.

My post-backup script wants to know whether the backup was successful. All the information I need on the status of the backup is only in the log, the log is named with a timestamp, and I couldn’t find any way to pass the log file name as a parameter. Thank goodness, in this case only, that Arq serializes backups, because then you know that the newest log file is for the backup that’s running. Or … is it? Read on.
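Picking the newest log can be sketched like this. It relies on the assumption that Arq runs only one backup at a time. The log folder’s location and file naming aren’t documented, so the directory you pass in and the glob pattern are assumptions you’ll need to adjust for your own machine. The sketch compares modification times instead of trying to parse the timestamped names.

```python
from pathlib import Path
from typing import Optional


def newest_log(log_dir: Path, pattern: str = "*") -> Optional[Path]:
    """Return the most recently modified file in log_dir matching pattern, or None."""
    logs = [p for p in log_dir.glob(pattern) if p.is_file()]
    if not logs:
        return None
    # Only meaningful if a single activity writes logs at a time (serialized backups).
    return max(logs, key=lambda p: p.stat().st_mtime)
```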

Parse this farce

To grab the status of the backup, you have to parse the latest log file. But guess what? Arq calls the post-backup script before the final status line has been written to the log, or maybe before it has been flushed to disk. Fortunately, I found through trial and error that if the script waits about two seconds, it can read the final line in the log.
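Instead of a fixed two-second sleep, a script can poll the log until the last line looks final and give up after a timeout. This is a sketch only. `is_final` is a hypothetical predicate you supply, because the wording of Arq’s final status line isn’t documented and has changed between versions.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def last_line(path: Path) -> str:
    """Return the last line of a text file, or an empty string if it is empty."""
    lines = path.read_text(errors="replace").splitlines()
    return lines[-1] if lines else ""


def wait_for_final_line(path: Path, is_final: Callable[[str], bool],
                        timeout: float = 10.0, interval: float = 0.5) -> Optional[str]:
    """Poll the log until its last line satisfies is_final; return None after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        line = last_line(path)
        if is_final(line):
            return line
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Polling returns as soon as the line appears, and the timeout stops the script from hanging if the line never shows up.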

And really, what programmer doesn’t love parsing text files that can and do change on the whim of the developer? This is much more fun when you have different computers running different versions of the app and the developer has changed the log format. Just peachy. This happened in the crossover from Arq 5.10 to 5.11.

This is why programmers want APIs. A well-written API doesn’t change frequently. If it does change, there’s a transition period in which the old API, or parts of it, are deprecated before the final change, and old and new are allowed to coexist. Perhaps the data is presented as XML or, even better, JSON. International standards are hopefully used, such as ISO 8601 for dates, not something ambiguous like 11/1/18. Unfortunately, Arq’s logs use three(!) date formats that I can remember seeing: ‘February 6, 2018 at 00:01:05 EST’, ‘February 6, 2018 at 12:01:05 AM EST’, and ‘2018-02-06 05:01:05 +0000’. Perhaps there are more, but this is silly enough, and why should I need to guess?
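Here’s roughly what handling those three formats looks like in Python. The format strings come from the three examples above. The time-zone table is an assumption: abbreviations like EST are ambiguous in general, and Arq doesn’t say which ones it emits.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping; extend it for whatever abbreviations your logs actually contain.
TZ_ABBREVIATIONS = {
    "EST": timezone(timedelta(hours=-5)),
    "EDT": timezone(timedelta(hours=-4)),
    "UTC": timezone.utc,
    "GMT": timezone.utc,
}

# The two "named zone" formats from the examples, without the trailing abbreviation.
NAMED_ZONE_FORMATS = (
    "%B %d, %Y at %H:%M:%S",     # February 6, 2018 at 00:01:05
    "%B %d, %Y at %I:%M:%S %p",  # February 6, 2018 at 12:01:05 AM
)


def parse_arq_timestamp(text: str) -> datetime:
    """Parse any of the three observed formats into a timezone-aware datetime."""
    text = text.strip()
    # ISO-like form with a numeric offset: 2018-02-06 05:01:05 +0000
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")
    except ValueError:
        pass
    stamp, _, abbr = text.rpartition(" ")
    tz = TZ_ABBREVIATIONS.get(abbr)
    if tz is None:
        raise ValueError(f"unknown time zone abbreviation: {abbr!r}")
    for fmt in NAMED_ZONE_FORMATS:
        try:
            return datetime.strptime(stamp, fmt).replace(tzinfo=tz)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")
```

All three sample strings describe the same instant, 05:01:05 UTC on February 6, 2018, and they compare equal once parsed. That is the kind of guesswork an ISO 8601 field would make unnecessary.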

With a text log, text tends to change without warning, and may not even be consistent. Give me a well thought out API any day.

Abort! Abort!

Should you have the misfortune to have your backup fail, Arq will write a line into the log saying the backup aborted and … won’t call your post-backup script!

As far as my status page knows, backup A is still running and now backup B is running too. But that’s incorrect: Arq serializes backups, so it’s impossible for two to be running at the same time.

I guess I could wait until another pre-backup script is called to see if the last backup aborted, but that could be hours later. Meanwhile, the status is incorrect.

So how the hell do I know when to update the status of the aborted backup if Arq won’t call my post-backup script?

Hazel to the rescue!

Hazel monitors folders and files for changes. So I tell Hazel to monitor Arq’s log folder and it gets triggered every time a file changes. So yes, I do have to handle the normal situation where both my post-backup script and the script Hazel calls are run. But at least when Arq aborts a backup, I can update the status on my monitor page within seconds.
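Because both the post-backup script and Hazel’s script can fire for the same backup, the status update needs to be idempotent: running it twice should leave the same result. This is a minimal sketch. It assumes a JSON status file that maps a backup identifier to a state. The success and abort marker strings are hypothetical, so check your own logs for the real wording. When Hazel calls it without an identifier, it relies on Arq serializing backups and updates whichever entry is marked running.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical marker strings: Arq's log wording is undocumented and has changed
# between versions, so verify these against your own logs.
ABORT_MARKER = "aborted"
SUCCESS_MARKER = "complete"


def classify(last_line: str) -> str:
    """Map the last log line to a coarse state."""
    lowered = last_line.lower()
    if ABORT_MARKER in lowered:
        return "aborted"
    if SUCCESS_MARKER in lowered:
        return "succeeded"
    return "running"


def record_result(status_file: Path, log_file: Path,
                  backup_id: Optional[str] = None) -> dict:
    """Update one backup's state from its log; calling it twice is harmless."""
    try:
        status = json.loads(status_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        status = {}
    if backup_id is None:
        # Called from Hazel without an identifier: backups are serialized,
        # so assume the entry currently marked "running" is the one that ended.
        running = [k for k, v in status.items() if v.get("state") == "running"]
        if not running:
            return status  # nothing to do, e.g. the post-backup script got there first
        backup_id = running[0]
    lines = log_file.read_text(errors="replace").splitlines()
    state = classify(lines[-1] if lines else "")
    entry = status.setdefault(backup_id, {})
    if entry.get("state") != state or entry.get("log") != str(log_file):
        entry.update(state=state, log=str(log_file))
        # Write atomically so the status page never reads a half-written file.
        tmp = status_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(status, indent=2))
        tmp.replace(status_file)
    return status
```

This sketch also covers the abort case. Arq skips the post-backup script there, but Hazel still sees the log change and marks the running backup as aborted.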

Yes, I know this is getting real kludgey, but it works.

Maybe I will and maybe I won’t

My status web page likes to know what’s going on, so with the pre- and post-backup scripts writing status into a JSON file that the PHP script can read, I’m all set, right?

Nope.

You see, Arq only calls your pre and post-backup scripts for—well—backups. The scripts are not called for the maintenance activities of object cleanup, validation, or budget enforcement. No problemo, I rely on Hazel to catch any of these events starting and stopping.

Oh, that backup schedule you’ve carefully crafted so as not to impact your work? It’s ignored when it comes to maintenance. These activities seemingly run at some random time during the day. If there’s a method behind this madness, the documentation won’t tell you about it.

In general, don’t rely on Arq’s documentation to tell you how it behaves.

Parallel universe

As of Arq version 5.11, validation runs in parallel with backups. I presume object cleanup and budget enforcement do not, because the 5.11 announcement didn’t mention them.

Currently, my status monitor assumes only one activity per backup at a time, so I have some work cut out for me to show both a backup and a validation status at the same time. And remember above where I said I could rely on the most recent log being for the current backup? Yeah, maybe not. If a validation starts in parallel with a backup, there will be two log files open at the same time, and I can’t rely on the most recent one being for the currently running backup. I haven’t tested this yet, nor should I have to guess. The behaviour should be clearly documented.
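One way to restructure the status file is to track each activity separately per destination, so a backup and a validation can both show as running at once. The destination and activity names here are hypothetical labels. The sketch leaves out how you detect that a validation has started, which in my setup means Hazel.

```python
import json
from pathlib import Path


def set_activity(status_file: Path, destination: str, activity: str, state: str) -> dict:
    """Set the state of one activity (backup, validation, ...) for one destination."""
    try:
        status = json.loads(status_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        status = {}
    status.setdefault(destination, {})[activity] = state
    # Write atomically so the status page never reads a half-written file.
    tmp = status_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(status, indent=2))
    tmp.replace(status_file)
    return status


def running_activities(status: dict, destination: str) -> list:
    """Return the names of all activities currently marked running for a destination."""
    return sorted(a for a, s in status.get(destination, {}).items() if s == "running")
```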

By the way, that new multi-threaded validation spun up 600% CPU (6 cores) causing the fans on my poor Mac mini to come on full blast. This is not bad, unless you’re recording a podcast, or on the phone, or doing something else that needs the CPU more urgently. If that’s the case, remember to Pause Activity… during that important work. This assumes that pausing activity pauses all activity. The documentation says nothing about validation at all.

Wishful thinking

I wish developers would design in automation from the beginning. Bolting it on later always causes problems. But I also suspect very few Arq users use automation, partly because of all the problems I mentioned above, so I don’t expect many resources to be committed to building a proper API.

Arq is not a product targeted at professionals or at medium and large businesses. Manage your expectations accordingly.


Author: Tom

Destroyer of software. If I haven't tested it, it hasn't been tested.

4 thoughts on “The case of the missing API”

  1. So I was all set to use Arq until I read about users such as you who are having difficulties restoring backed-up data. So, what alternative data format (.tar.gz/.xz) and tool would you recommend to someone who wants to back up around 500GB of data?

    1. I have far more problems actually backing up data than restoring. I’ve had to do a few restores, sometimes of a single file, sometimes of a folder. Arq has some deficiencies, such as only being able to restore a single item at a time, file or folder. I know that sounds pretty silly when you consider Macs have been able to select multiple items since almost forever, but that’s where we are in 2018. If you restore a folder, it won’t roll the folder back to the snapshot; any newer files are left in place. Sometimes this is what you want and sometimes not. I honestly haven’t investigated all the ins and outs of restoring files and folders. In some ways, I’m afraid of what I’ll find.

      If you have simple needs, Arq will probably work fine. I’ve backed up a dozen TB and, although Arq gives me a lot of grief, it’s still the best I’ve found for my needs. I’ve configured 10 destinations for one user account alone. I run Arq on 4 computers and seem to be fighting some problem at least once a month. But so far I haven’t lost a lot of data or been permanently unable to back up or restore, although I’ve come close several times, requiring extensive intervention.

      Arq has a 30-day free trial. I’d say give it a try. Back up everything you can, try as many configurations of backups as you think you’ll need, and try some test restores to see if Arq is right for you. I’m sticking with Arq until something better comes along, or it totally lets me down.

  2. The UI doesn’t seem very intuitive to me. The little search box couldn’t find my file quickly although there were just two files backed up. For the moment, I’m using Duplicati. It’s free and satisfies my current needs. Thanks for your perspective.

    1. I absolutely agree that the UI is terrible. Create two backups to the same destination and Arq names them the same. If you have 10 destinations configured like I do, you have to remember the order they appear in the left panel of the app if you want to use the menu bar’s Backup Now. I’m not the only one who’s pointed that out to the developers, yet it remains unfixed.

      I wrote a post about how Arq’s search feature is useless.

      I tried Duplicati, but it failed to back up even a single file and only gave me a cryptic error message, so I gave up. Another reader recommended Duplicacy. Take a look at that one.
