Recovering from an Arq Backup semi-permanent error

Error: pack index data length 3292 is smaller than size of pack_index + index_objects

I’ve experienced this semi-permanent Arq Backup bug often enough, sadly, and on enough computers, that I’ve become adept at resolving it without contacting Arq’s support. That means I can get the backup running again quickly instead of going back and forth with support.

If you’re seeing these errors too, follow along to see what I do to correct this nasty condition and allow the backup to resume.

I call this a “semi-permanent” error because, although you’re going to lose some data correcting it, you should be able to recover your backup so that it can resume. Arq will not recover on its own, though; manual intervention is required.

The first time I encountered this bug, support helped me locate the offending file, thus allowing the backup to proceed … until it encountered another error. It got to the point where I had to write a Python script to look at all the files and find the corrupted ones. There were many, and no explanation was ever offered as to why Arq corrupted all those files.

That was back in May 2017. It’s now November and I’m still seeing these bugs on occasion, so I must conclude that no fix has been applied. Or perhaps a fix was applied, but no code was added to detect and correct already-corrupted backups. Data corruption bugs are among the most serious kinds, so I was hoping a fix would be a higher priority. Or it could just be damn difficult to reproduce and fix.

WARNING: If you choose to try this yourself, you’re doing so at your own risk. I won’t be responsible for you permanently and irretrievably damaging your backup. If you’re at all squeamish, just contact Arq support and let them tell you how to fix it.

Symptom

Hopefully, you’re using Arq’s error reporting and are getting emailed when an error occurs. The subject line will tell you the number of errors. That number used to be wrong due to another Arq bug, but it appears to be working correctly now.

Scan through the email until you see something like:

Backup session for /Users started on November 6, 2017 14:19:50 EST
Error: pack index data length 3292 is smaller than size of pack_index + index_objects
Failed to read current backup record. Contact support@arqbackup.com for assistance.

Locating the problem

Before you can fix this bug, you must enable Arq’s Debug logging. Launch Arq, open its Preferences, click on the Advanced tab, and near the bottom you will see Log Level: Info (the default). Switch that to Debug, then close the Preferences.

Run the backup again using the app’s Backups > Backup Now menu and wait for it to complete. Then I suggest you pause Arq for an hour or so, so that no further backup runs complicate the logs. Use the app’s Backups > Pause… menu.

The email you get will not have sufficient information; you’re going to have to dig into Arq’s internal logs. These steps are for macOS. I’ve never had success running Arq under Windows in a VM, so I don’t know what the equivalent steps are there.

Open the Mac’s Console app from the Utilities folder on the computer that’s experiencing the problem and, using the left selection column, pick the ~/Library/Logs/arqcommitter logs, then select today’s date.

Use command-F to open a search box and enter “failed to load” without the quotes. This should take you to two lines that look something like the ones below. Confirm both lines contain the “…pack index data length <nnn> is smaller than…” message from your email.

2017/11/06 14:20:54:934 DETAIL [thread 330f] failed to load <PackId: packset=68A8C615-F6F8-4D26-B0F3-22EDD5280E79-trees,sha1=003690d4efe598a4ca790e591691d5a285ef9c2c>: Error Domain=PackIndexErrorDomain Code=-1 "pack index data length 3292 is smaller than size of pack_index + index_objects" UserInfo=0x7ff034ae2b30 {NSLocalizedDescription=pack index data length 3292 is smaller than size of pack_index + index_objects}
2017/11/06 14:20:54:934 ERROR [thread 330f] PIELoader: load error occurred: Error Domain=PackIndexErrorDomain Code=-1 "pack index data length 3292 is smaller than size of pack_index + index_objects" UserInfo=0x7ff034ae2b30 {NSLocalizedDescription=pack index data length 3292 is smaller than size of pack_index + index_objects}

You’re interested in the packset=… value in the first log line (and the sha1=… value right after it, which we’ll need shortly). Now comes the fun part: locating this folder.
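If you’d rather not eyeball the log, here’s a minimal Python sketch that pulls those two values out of the most recent log file. It assumes the per-date log files live in ~/Library/Logs/arqcommitter and that the lines match the format shown above; adjust as needed.

import glob
import os
import re

# Assumption: per-date log files live here, as described above;
# we scan the most recently modified one.
log_dir = os.path.expanduser("~/Library/Logs/arqcommitter")
latest = max(glob.glob(os.path.join(log_dir, "*")), key=os.path.getmtime)

# Matches the "failed to load <PackId: packset=...,sha1=...>" lines shown above.
pattern = re.compile(r"packset=([0-9A-F-]+-(?:blobs|trees)),sha1=([0-9a-f]{40})")

with open(latest, errors="replace") as f:
    for line in f:
        if "failed to load" in line:
            match = pattern.search(line)
            if match:
                print(f"packset: {match.group(1)}  pack file: {match.group(2)}")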

Navigate to the destination where you’ve directed Arq to store the backup. In my case, it was on a server so I simply accessed the server volume. If you’re using a cloud service, you will need to use that service’s file viewer. I don’t use any cloud services, so you’re on your own here.

If you’re only storing one backup there, you’re lucky. Otherwise you have to determine which of the cryptically-named folders is the one you’re interested in.

In the Arq app, click on the destination in the CONFIGURE BACKUPS section of the left column. In the right column you will see something like:

Backup Set UUID: DA2CE692-5471-45A2-9F20-8B27FE1FD847

Open that same-named folder, then open the nested “packsets” folder inside it. You will probably see several pairs of …-blobs and …-trees folders. One of those will have the same name as in the error message above; in my case, “68A8C615-F6F8-4D26-B0F3-22EDD5280E79-trees”.

Note: I’ve seen corrupted files in both the blobs and trees folders, so check carefully which one your error message is flagging.

Don’t panic!

Before you open that folder, understand that, in my case, it contains … wait for it … 28,816 files! Needless to say, this causes some issues when viewing its contents. The Mac’s Finder will initially tell you there are no files in the folder; just be patient, and they’ll be displayed eventually. Even the Terminal command ls takes some time to display results.

When you see the list of files, you will note they are paired as .index (or .index2) and .pack files. The sorting is not alphabetic; it’s kinda numeric, since the filenames are strings of hexadecimal characters. To find the correct file, I found it easiest to just start typing the first characters of the filename so that the Finder highlights it. Which filename, you ask? Why, the one after “sha1=” in the error message above. For me, “003690d4efe598a4ca790e591691d5a285ef9c2c.index” and .pack.

You can confirm this is the bad pair of files by opening the .pack file in a text editor. The file should start with “PACK” followed by a lot of garbage. Go ahead and open another .pack file for comparison. Most likely the bad file will be all nulls, or will start with anything but “PACK”.
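If you prefer, here’s a tiny Python check of the same thing. The filename below is the one from my error message; substitute your own.

# Assumption: a healthy .pack file begins with the 4-byte signature
# b"PACK", per the description above. Substitute your own filename.
path = "003690d4efe598a4ca790e591691d5a285ef9c2c.pack"

with open(path, "rb") as f:
    head = f.read(4)

if head == b"PACK":
    print("looks OK (starts with PACK)")
else:
    print("suspect: first bytes are", head)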

Solution

Delete that pair of files. Say what?! Yes, you’re going to have to lose some data. No, I don’t know what data will be lost. Sorry, I didn’t write this bug so you, like me, are just going to have to live with an unknown hole in your backup. Yes, that is bad. Yes, I think it should be fixed too.

Don’t blame yourself. Arq runs as a service without user interaction after you’ve configured it. So when Arq starts reporting errors, those are errors due to bugs in the program. Of course, I’m assuming you respected the README.TXT file at the root level of the destination that reads:

Please do not modify this folder or any of the files or folders within it or your Arq backups may become unreadable.
If you have questions, please email support@arqbackup.com.

Arq is a complex program with complex data structures, so you don’t mess around with these files unless it’s an emergency. Well, since your backup isn’t working anymore, this constitutes an emergency, so you’re just going to have to delete that pair of files.

In my case, the corrupted files were last modified on November 24, 2016, almost a year ago. Why Arq is just complaining about them now is … uh … interesting. Arq is supposed to verify backups every 60 days (configurable) so why wasn’t this detected multiple times since the files were corrupted? This is a common occurrence with this bug. The corrupted files are months old, Arq’s verification doesn’t catch them, and they only mysteriously become a problem months later.

You’re going to want to clear Arq’s cache for the backup in error, too. In the Verification step below, I didn’t wait for the backup to complete, because Arq needed to re-cache the pack indices and that takes a long time, so I went to bed. I woke up to error messages about missing .pack files. Clearing the cache once or twice will resolve this issue. I don’t know why clearing it more than once should be necessary, but that’s what I’ve seen.

The consequences of deleting the cache are that the next backup runs much slower as it has to recache from the destination. A small price to pay to get a working backup again. Bonus! You recover lost disk space.

However, when I went to do that, Arq hung. Twice. Another nasty bug. After force quitting Arq the second time, I decided to manually clear out the cache. The cache files are located in ~/Library/Arq/Cache.noindex/. Rather than trying to figure out which cryptically-named folder belongs to which backup, I just delete them all. These cache folders seem to grow forever (bug or design intent?); I’ve had caches of 50-90 GB on other computers that started off at a tenth that size. I’ve seen one other person report the same problem on Twitter, so it’s not just my configuration. Most people wouldn’t even think to check Arq’s cache until they start getting low on disk space and wonder why.

Once a month, I check the size of Arq’s caches and decide whether they’ve grown bigger than I like.
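Here is the sort of quick check I mean: a Python sketch that totals each cache folder under ~/Library/Arq/Cache.noindex/. That path is from my Mac; adjust it if your setup differs.

import os

# Assumption: per-destination cache folders live directly under this
# path, as on my machine; adjust for your Arq version if needed.
cache_root = os.path.expanduser("~/Library/Arq/Cache.noindex")

for entry in sorted(os.listdir(cache_root)):
    folder = os.path.join(cache_root, entry)
    if not os.path.isdir(folder):
        continue
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished mid-walk; ignore it
    print(f"{total / 1e9:6.1f} GB  {entry}")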

I don’t like that I have to do cache maintenance myself. Since Arq seems to use SQLite databases for the caches, I speculate that stale entries aren’t being deleted and the databases are never vacuumed (compacted). Whatever the issue, caches shouldn’t grow continuously by leaps and bounds; small growth, proportional to the archive’s growth, would be normal.

So, back to clearing the cache Arq’s way… Open Arq’s Preferences, click on the Destinations tab, select the backup you’re working on, click the Clear Cache… button on the lower-right. If Arq doesn’t hang (which it seldom does), it will restart the Arq Agent and itself.

Verification

Leave Arq’s logging set to Debug and force another backup. When I first encountered this bug, so many files were corrupted that I needed to write a Python script to scan through the 14,000+ .index files to find all the ones in error (I scanned the index files rather than the pack files because the index files are smaller; a sketch of a similar scan appears after the update below). Let’s hope you don’t encounter that nightmare. If you’re unlucky, you might need to do this more than once, which is why you should leave Debug logging on for now. If you encounter more than three corrupted files, it could be time to turn to Arq support. At the least, reporting it gives them an indication of how widespread the problem is. Even if you’ve been brave enough to fix the bug yourself, report it to Arq support anyway.

If you’re at the point where you need a script to fix your backups, you need to let Arq support know and pass the buck to them.

Regrettably for me, as I was writing up this solution while fixing my backup, it encountered 5 more corrupted files in the -trees folder and 5 more in the -blobs folder, making a total of 11 pairs(!) including the original, all last modified on the same day almost a year ago. Geez, how much data did I lose? And I had only checked 1 of the 5 pairs of folders under the packsets parent folder. I definitely need to check them all; who knows what other corrupted files are in the unchecked folders?

Update: I modified my script to scan all the -blobs and -trees folders, and also to search for the UTF-16 encoded pattern “com.apple.system.Security”, which I’ve found repeated in corrupted files that are not all nulls. When I first reported this to support, I hoped it would be a clue to what was causing the corruption, but since the bug remains to this day, I guess not.
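For the curious, here is a stripped-down Python sketch of that kind of scan. Unlike my actual script, it reads the .pack files rather than the .index files, since the .pack signature is simple to check, and it looks for the three corruption signatures described in this post: no “PACK” header, a file of all nulls, and the repeated UTF-16 pattern. Treat it as an illustration of the idea, not a statement of Arq’s file format, and note that the pattern check can false-positive on legitimately backed-up data, so inspect anything it flags before deleting.

import os
import sys

# The pattern I found repeated in corrupted files, UTF-16 encoded.
# Assumption: little-endian with no byte-order mark; inspect one of
# your own corrupted files if this doesn't match.
BAD_PATTERN = "com.apple.system.Security".encode("utf-16-le")

def corruption_reason(path):
    # Reads the whole file into memory; Arq's pack files are modest
    # in size, but chunk this if yours are huge.
    with open(path, "rb") as f:
        data = f.read()
    if not data.strip(b"\x00"):
        return "all nulls"
    if not data.startswith(b"PACK"):
        return "does not start with PACK"
    if BAD_PATTERN in data:
        return "contains the com.apple.system.Security pattern"
    return None

# Usage: python scan_packs.py /path/to/<BackupSetUUID>/packsets
root = sys.argv[1]
for dirpath, _dirnames, filenames in os.walk(root):
    for name in sorted(filenames):
        if name.endswith(".pack"):
            full = os.path.join(dirpath, name)
            reason = corruption_reason(full)
            if reason:
                print(f"{full}: {reason}")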

Even though Arq wasn’t complaining about them, this second pass found 18 more corrupted pairs for a total of 29 (count ’em) pairs. Yikes! So it’s a good thing I checked. Who knows when Arq would start complaining about those files? But I do mourn the loss of my data, whatever it was.

Without a script to locate the corrupted files, it would not be practical for anyone, including Arq support, to fix this backup. Arq needs to self-heal.

Anyway, when you search for “failed to load” again to find subsequent corrupted files, make sure you’re looking at the latest entries. You can confirm this by checking the scroll bar at the right of the Console window while looking at the logs.

To manage your expectations: after clearing the cache and starting the backup again, my backup took 2.5 hours to complete. Most of that time was recaching, as not much data was uploaded. Still, I have a working backup again.

At this point, while your backup is running (fingers crossed), you may be wondering like I am, “If Arq knows exactly which files are the problem, and knows that deleting them is the only solution to allow the backup to resume, then why doesn’t it delete these files automatically and repair the backup itself?” That is a good question.

Completion

If you’ve found and deleted all the corrupted files, you can now switch Arq’s logging back to Info.

Conclusion

Congratulations! That wasn’t so bad was it? 😉

I continue to use Arq because it’s the best solution for me at this time. That does not mean I’m in any way satisfied with the number and severity of the bugs I’ve encountered. I can only hope that they get addressed soon, or that I find a more robust backup program. Unfortunately, many of the bugs I’ve found in Arq were encountered long after the 30-day free trial ended, and I have no reason to believe any other software I might test would be any different. Then there’s the issue of losing all the backup history and starting over from scratch with another backup program. I did that once with CrashPlan, and once when I dropped Amazon Cloud Drive after its bait-and-switch unlimited data offer was rescinded. I’m tired of starting from scratch. Better the devil you know …

Addendum

2017-11-07: While I was writing up an update to this post, my daughter forwarded me her third email from Arq, this one reporting 1177 missing .pack files. Here is just one of the errors:

Error: /Users/<userid>/Documents/BusyCal Backups/BusyCal 2017-11-05 12-36.bcbu/Info.plist: /<userid>/D31EE621-7DDD-4A50-8EF0-E99B83857445/packsets/F2B05A32-D850-4A3D-9DA8-416046F21C81-blobs/6b5bc267f76b3929945b9d6ddbb42ffa0730078d.pack not found at destination

The backup will never work properly until these errors are corrected. This makes the fifth* incident on three different computers in recent memory.

In one incident, over 9000 errors were reported. Arq should throttle these: after the first 10 errors, it could simply say “9000 similar errors.” I don’t need massively long emails; I’m not going to look at every line.

Previous times, I managed to get the backups back on track by clearing the cache. This time, I first ran the same script I used above to check whether the underlying cause was pack corruption. Nope. So the only fix I know of is to clear the cache every time Arq thinks a .pack file is missing.

I’m really getting tired of the amount of hand-holding Arq is starting to require.

* As of 2018-01-14, it’s now at nine incidents that I remember, but I’m not going to give a daily update of this frequent problem that is still unfixed as of Arq 5.10.


2 thoughts on “Recovering from an Arq Backup semi-permanent error”

  1. Humph. You are the top Google result for “pack not found at destination”, which is the error I’m getting now. 127,000 instances of this error. I’m also getting tired of the amount of hand-holding Arq is requiring…

    1. Try clearing the cache as a first step. The post gives instructions on how to do that. I’ve had to do this frequently on multiple computers, so it’s a common bug with Arq that’s been present far too long.
