Restic Disk Usage Has Me Tearing My Hair Out

Hello. I am a former Oracle database administrator, not an Oracle employee. A couple of comments:

  1. The database is shut down while the backup is run. This is a good method for simple backups; there is much more complexity if the database has to be kept running.
  2. While I have never administered MySQL backup / restore, I suspect the lack of deduplication is because there is no duplication. In an Oracle database, every block (whatever minimum unit the database uses to write to disk) contains a number related to the last time something in that block was changed. Every update therefore gives each changed block a new number. The database block size likely does not match the restic chunk size, but in any production database it is very likely that many, perhaps most, blocks are being changed. From restic’s point of view those blocks cannot be deduplicated, so it stores new ones.
  3. I do not know whether there is a header at the start of the tablespace (the disk file) that gets set to the maximum of these numbers. If so, I would guess that only the restic chunks that actually changed would be backed up. But what would happen if the first row is changed to add one character? Would that shift the data so that all subsequent chunks look different? That would stop deduplication. (See the sketch just below this list.)
    Further reading: MySQL log sequence numbers
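
Worth noting: restic chunks files by content rather than at fixed offsets, so chunk boundaries should re-synchronize shortly after a small insertion instead of shifting everything behind it. A crude way to test this yourself (throwaway repository, password and file names; restic assumed to be on the PATH):

export RESTIC_PASSWORD=test                 # throwaway test password
restic -r /tmp/testrepo init
head -c 256M /dev/urandom > big.bin
restic -r /tmp/testrepo backup big.bin      # first run uploads roughly the full 256 MiB
printf 'X' | cat - big.bin > shifted.bin && mv shifted.bin big.bin
restic -r /tmp/testrepo backup big.bin      # "Added to the repository" should now be small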

Please, please make sure to try a recovery and actually start the database. I have had many people claim they had recovered a database by restoring its files, only to see serious error messages when the database was actually started.
Good luck.

Yeah, that’s correct - both are compressed, it’s just that ZFS isn’t able to compress encrypted data very well. This is perfectly expected and is not specific to encrypted files produced by restic - the same would happen with any other encrypted data/files.

Yeah, I agree. But it’s still an important baseline - if we can establish this then there’s less confusion about the compression’s effect on things, and we can then start determining if the growth of the repository better matches the growth of the data you back up with restic.

It’s not so much that restic cancels anything - you are talking about two fundamentally different types of data here. You simply cannot expect the destination ZFS to compress encrypted files, regardless of what produced those files. In other words, if you want an encrypted backup, you’ll get less compression.
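
You can see this for yourself with any encryption tool and any compressor, not just restic and ZFS. A quick sketch (file names are arbitrary):

head -c 100M /dev/zero > plain.bin                                  # highly compressible input
openssl enc -aes-256-ctr -pbkdf2 -pass pass:x -in plain.bin -out enc.bin
gzip -k plain.bin enc.bin
ls -lh plain.bin.gz enc.bin.gz                                      # roughly 100 KB vs 100 MB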

It would be a shame if you have to trade away really good backup software for something else, but I totally get that if you rely on the compression you might have to ditch the encryption aspect of your backups. It’s a tradeoff you can make once you know the reasons for and the effects of such a decision.

@forbin Have you considered backing up dumps of the databases instead of the database directories themselves? E.g. if you could run a dump, compress it in what’s generally called an “rsyncable” way, and pipe that to restic, this might perhaps yield a better overall result than having restic back up the raw database files (which I think is what @punchcard was saying too).

For example, something like mysqldump ... | gzip --rsyncable | restic ... --stdin --stdin-filename siteFoo.sql might be worth trying. Perhaps you need some additional option to mysqldump to optimize how it lines up the data in the file, to make sure it’s as compressible as it can be, but try it like this first if you don’t know.
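
For instance, something along these lines (an untested sketch - the repository path, password file and database name are placeholders):

mysqldump --single-transaction siteFoo \
  | gzip --rsyncable \
  | restic -r /backups/restic_repo --password-file /root/.restic_pw backup --stdin --stdin-filename siteFoo.sql.gz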

I guess I should say, if restic does not have an option to disable encryption (thereby enabling the full benefit of compression) then I may have to look for something else. I agree, switching would be a shame, as I have been delighted with restic in other important ways. I just may not have enough storage to support it. Enterprise NVMe isn’t cheap, as you know, and our backup infrastructure was built on the concept of oversubscribing the disks. I have not given up yet, though. The second puzzle is the dramatic growth of the restic repo, basically doubling every day. I can only keep about 3-4 days of backups before the 15 TB destination volume fills up, which is just weird when you’re talking about a 2.2 TB source data set and restic supposedly doing incrementals. If I can solve that one, maybe I can keep restic.

Please consider what I last wrote about using database dumps, zipped and then piped to restic. At least please address it; it’s a bit annoying to suggest things and not even get an answer :wink:

Yeah, let’s use the non-ZFS testing as a baseline to make sure that the repository grows by pretty much only what restic adds to it. Removing the compression and/or ZFS factor helps clear things up a bit.

I’ve considered it a few times over the years, but mysqldump is orders of magnitude slower. The way we do backups is: we flush the tables and logs and place the database in a read-locked state. We then take an LVM snapshot of the whole folder and release the lock. The database ends up being locked for less than 1 second, so we can basically run a 24x7 shop and customers can access the system day and night. The worst thing that happens is that their application may appear to pause for a split second. Nobody even notices. We then rsync the snapshot to the first storage server. This gives us a complete copy of the whole MySQL folder, with all the databases, logs, configuration files, etc., which makes it super easy to restore an instance to the exact running condition. With mysqldump, that’s all a lot harder and slower, and introduces more issues for customers who want 24x7 access. Every database has its own filesystem, so we typically do 5 snapshots and 5 rsyncs in parallel. Thus, we can back up 100+ MySQL instances with 2-3 TB of data in about 40 minutes, without downtime, and nobody sees a performance problem.
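
Roughly, the locking and snapshot part can be sketched like this (simplified; the volume names, sizes and paths are made up):

mysql -u root <<'SQL'
FLUSH TABLES WITH READ LOCK;
FLUSH LOGS;
-- "system" runs a shell command from inside the mysql client, so the read lock is still held while the snapshot is taken
system lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql
UNLOCK TABLES;
SQL

mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ storage1:/backups/siteFoo/
umount /mnt/mysql_snap
lvremove -f /dev/vg0/mysql_snap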

Sorry, I kind of lied about that because I didn’t want to complicate the discussion. It’s not really shut down; it’s just placed in a safe quiescent state for a split second.

Sounds like you know what you’re doing :slight_smile: I get that it might not be an option to use dumps instead :confused: I guess maybe you could use a read-only replica to do the dumping from, but then you’d still need the additional disk space for that. Hm, what about doing something like tar -cf - <path-to-your-snapshot-copy-of-the-database> | gzip --rsyncable | restic backup --stdin ...?

Makes sense - if most database blocks get a new change number on every update, restic will see most of them as new data.

That’s good advice. We have tested our DR methodology many times and it works well.

Not to mention CPU and memory. If funds were limitless… :slight_smile:

That spell is not in my magick book, but it looks like something worth investigating.

Hello @forbin, I had some déjà vu, as restic+ZFS was discussed with you before on a similar topic of data sizes. :slight_smile:
Refer to Why is the repo 300% larger than the source folder? - #23 by forbin

  1. Did that ever get resolved?

  2. How did you conclude that the remote disk ran out of space - based on the du command, or did ZFS report that it was full?

  1. No. Still struggling to understand why that’s happening.

  2. Because nothing could be written to the drive, and the df command showed it was 100% full. Also, restic itself would not run because it complained that the disk was full.

I can’t help directly, but I can say MySQL database dumps tend to have unique things in them, such as the date / time the dump was made, which can impact deduplication. With a bit of luck the chunk boundaries will still align and you’ll get deduplication, but experiments / validation would be required.
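
If the dump route is ever revisited: mysqldump has a --skip-dump-date option that omits the timestamp from the "Dump completed on" trailer, so dumps of unchanged data should come out byte-identical (the database name is a placeholder):

mysqldump --skip-dump-date siteFoo | gzip --rsyncable | restic backup --stdin --stdin-filename siteFoo.sql.gz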

I dunno man.

The source folder on the source server is 2.2 TB in size (4.5 TB uncompressed) and has 141 sub-folders.

On the restic repo on the destination server, I did…

restic forget --keep-last 1

restic prune

After it’s all done…

restic snapshots | grep snapshots

141 snapshots

Same number of snapshots as source folders, but used size is twice as much as the uncompressed size of the source…

du -hs restic_repo

9.4T restic_repo

I give up.
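
Before writing the numbers off entirely, it might be worth comparing restic’s own accounting with what the filesystem reports, to see whether the repo really references 9.4 T of unique data or whether du/ZFS accounting is inflating it (a sketch, run next to the repo):

restic -r restic_repo stats --mode raw-data            # data actually referenced by the repository
du -hs restic_repo                                     # what the filesystem reports
zfs list -o name,used,logicalused,compressratio        # ZFS's view, including compression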

Have you tried another backup solution (e.g. Borg) and compared the final size?

Throwing a Hail Mary here.

Could this be something in Go that is causing this weird behavior? I say this because some users of Kopia (also written in Go) have reported a similar issue.

@forbin Still waiting for your report after testing the exact same steps but with e.g. EXT4 or whatever non-ZFS filesystem (and for cryin’ out loud not BTRFS either). If you don’t want to take a systematic approach to debugging this problem, no wonder you give up.


After our last conversation, I concluded that part of the problem is that restic stores data as basically incompressible encrypted blobs, and ZFS is providing no value in that scenario. For whatever reason, after a few days, 2.2 TB of compressed source data (4.5 TB uncompressed) was ballooning to 9 TB on the destination server, so the volumes kept filling up and we were unable to keep more than 1 or 2 incremental backups. If it worked somewhat better on EXT4, that would have been interesting but not terribly useful, since the storage servers will stay on ZFS.

I originally switched to restic because my previous solution, rdiff-backup, was inexplicably running extremely slowly. I finally worked around that problem, and now it is fast again. After removing the restic repositories and performing my backups with rdiff-backup, the source and destination servers both show 2.2 TB of data, as expected, and incrementals only add a few hundred MB. The ZFS compression ratio is also up to almost 400%.

Restic has a lot of cool features, but it looks like my storage stack likes rdiff-backup better and it meets our basic needs. Too bad we couldn’t get the problems ironed out, but I’ve been hammering on this issue (perhaps not as “systematically” as you would evidently prefer) for months, and we’re out of time. I do appreciate the input we received.

I do understand that since the servers will stay on ZFS, it won’t help you solve your problem by performing the exact same test that we were discussing earlier using another filesystem.

However, restic will only add the amount of data that changed in the source files since the last backup, due to how it chunks and deduplicates the data it backs up. It’s quite unlikely that you hit a bug where restic for some odd reason adds more data than expected. If your used storage is ballooning, the cause is likely something else, and that’s what I was curious to find out.
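
If you ever want to verify this, restic diff reports exactly what changed between two snapshots, including how much data was added (the snapshot IDs below are placeholders):

restic -r restic_repo diff 1a2b3c4d 5e6f7a8b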

So in summary, if you need to keep the current configuration, then you’ll have to live with the issue of the ballooning storage, which is due to factors that we have not yet identified, but that are unlikely to be caused by restic itself. It’s probably more a matter of how you back up the data and how the data is structured, in combination with ZFS features. But it’s really hard to know, because we only see glimpses of your setup and configuration.

I totally understand that you want to just get moving with other matters, so I guess we’ll have to leave this one as it is :slight_smile:

Since your problem is related to encrypted data being incompressible (no matter which software you use for the encryption), you should try the newest version of restic (0.14), because it includes compression :slight_smile:
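
For example (a sketch assuming restic 0.14+; the repository path is a placeholder):

restic init -r /backups/restic_repo --repository-version 2   # new v2 repositories compress by default
restic backup -r /backups/restic_repo --compression max /data/to/back/up

An existing repository can also be upgraded in place and its old data repacked:

restic migrate upgrade_repo_v2 -r /backups/restic_repo
restic prune --repack-uncompressed -r /backups/restic_repo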


@forbin If you can find the time, I’d also be very interested in how restic performs with compression. :slight_smile: