Why is restic not able to restore all files 100% like they were before?

restician · July 5, 2023, 8:01am

Hi,

I am trying to restore a deleted file exactly like it was before deletion. I am not able to do it.

This is the file before deletion (ls -lkh)

1.1G  -rw------- 1 root root /var/snap/lxd/common/lxd/disks/default.img (1045376 bytes)

This is the file after restic restore

31G   -rw------- 1 root root /var/snap/lxd/common/lxd/disks/default.img (31457288)

This is the file after restic restore --sparse

1021M -rw------- 1 root root /var/snap/lxd/common/lxd/disks/default.img (1050484)

Is it possible to restore the file like it was before (in terms of disk space)? If yes, how? If not, why?

And my second question, why is the --sparse option not default? Is there a potential problem with that option. Is there a reason I should not add it to all my restore scripts?

Just to clarify, all three files are identical (“same sha256 checksum”) except for the space used on the drive.
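
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two sizes can be read for a single file. It assumes GNU `stat`; the function name `size_report` is made up for illustration and is not part of restic.

```shell
# size_report FILE: print "apparent size <TAB> allocated size", both in bytes.
# Assumes GNU stat: %s = apparent size, %b = allocated blocks, %B = bytes per block.
# The ( ) body runs in a subshell, so the helper variables stay local.
size_report() (
  apparent=$(stat -c '%s' -- "$1") || exit 1
  blocks=$(stat -c '%b' -- "$1") || exit 1
  unit=$(stat -c '%B' -- "$1") || exit 1
  printf '%s\t%s\n' "$apparent" "$((blocks * unit))"
)
```

Running `size_report` on the file before deletion and after each restore should show the same first number every time, with only the second number changing, which matches the identical checksums.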

alexweiss · July 5, 2023, 10:13am

restic does not handle sparse files (correctly). During backup, no information about sparseness is saved, and the --sparse option for restore is nothing but a simple hack that replaces empty blobs (= blobs full of zeroes) with a sparse hole in the restored file.
This also means that a non-sparse file containing only zeroes (assumption: large enough) will be restored as a sparse file if you use restore --sparse.

Unless the sparseness information is completely saved by backup, a file cannot be exactly restored regarding sparseness.
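
The same effect can be observed outside restic. The sketch below is only an analogy: it uses GNU `cp --sparse=always`, which likewise turns runs of zero bytes into holes, and it is not restic's implementation. The function names are made up for illustration.

```shell
# make_zero_file FILE MIB: write MIB mebibytes of real zero bytes (not a hole).
make_zero_file() {
  dd if=/dev/zero of="$1" bs=1M count="$2" 2>/dev/null
}

# sparse_copy SRC DST: copy SRC, turning runs of zero bytes into holes
# where the filesystem supports them (GNU cp).
sparse_copy() {
  cp --sparse=always -- "$1" "$2"
}

# allocated_kib FILE: disk space in use, in KiB, as reported by du.
allocated_kib() {
  du -k -- "$1" | cut -f1
}
```

For example, after `make_zero_file zeros.bin 8` and `sparse_copy zeros.bin zeros-copy.bin`, `cmp zeros.bin zeros-copy.bin` reports no difference, while `allocated_kib` will typically show a much smaller value for the copy than for the original.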

restician · July 5, 2023, 11:23am

@alexweiss Thanks.

Unless the sparseness information is completely saved by backup, a file cannot be exactly restored regarding sparseness

How about rustic and sparse files?

[Found the answer in issue #3914. Rustic does not support sparse files either.]

This also means that a non-sparse file containing only zeroes (assumption: large enough) will be restored as a sparse file if you use restore --sparse.

Is there any downside to that? I mean, could this create any issues?

I am wondering now whether restic is the right tool for me. Or are all backup tools like that? Am I expecting too much from a backup tool?

MichaelEischer · July 5, 2023, 7:22pm

What is the reason why you want to exactly restore the sparse regions of a file? It won’t affect the file content, so it’s purely a non-functional aspect. What is “exact” enough? Technically, the restored file contents will always be stored at different parts of a disk than before and thus yield a different data layout on disk.

kapitainsky · July 5, 2023, 7:27pm

Spending time on this would be counterproductive and of dubious use. At the moment, restic restore --sparse does the best available thing: it restores the content while keeping the ballooning of originally sparse files to a minimum.

restician · July 6, 2023, 7:54am

What is the reason why you want to exactly restore the sparse regions of a file? It won’t affect the file content, so it’s purely a non-functional aspect.

@MichaelEischer I guess the problem is that with Restic’s default options a few “innocent looking” files could suddenly take a huge amount of disk space after a restore (see my example, 1GB vs 30GB). I would not consider this a “non-functional aspect” even though the content is technically the same.

The work-around would be to use the --sparse option, but then apparently all files could become sparse, and apparently that is not a good thing either. Of course we could try to exclude and include sparse files in separate backup/restore runs. Often we don’t know in advance which files are sparse, and although a shell script could find them, the hope was that the backup tool would do all that.

What is “exact” enough?

Sparse files from the source should be restored as sparse by default. If the restored sparse file is actually a few MBs smaller, that would be “exact enough” for me.

MichaelEischer · July 6, 2023, 7:25pm

Once the backup command detects that a file is sparse, then we can just as well restore it exactly. There’s already an issue for that: Precise tracking of sparseness information · Issue #3914 · restic/restic · GitHub.

restician · July 7, 2023, 2:17am

Ok, great. Then there is hope (like with “compression”).

Sure, but it is not necessary, because, as you said already, “it’s purely a non-functional aspect”. To save time in the implementation, I would go for the simple version (not “preserving the exact file regions”).

restician · July 7, 2023, 9:20am

I will use this Linux one-liner to find and exclude sparse files

find . -type f -printf "%S\t%p\n" | gawk '$1 < 1.0 {print}'

That results in a rather large number of files, though. I just need to find a tool that handles sparse files correctly to back up those files (maybe Duplicacy).
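
A variant of the one-liner that prints only the paths may be handier for building an exclude list. This is a rough sketch assuming GNU `find`, whose `%S` is the ratio of allocated blocks times block size to file size. The name `list_sparse` is made up, and paths containing newlines are not handled.

```shell
# list_sparse DIR: print the paths of files under DIR whose allocated size
# is smaller than their apparent size (GNU find "%S" ratio below 1.0).
list_sparse() {
  find "$1" -type f -printf '%S\t%p\n' \
    | awk '$1 < 1.0 { print substr($0, index($0, "\t") + 1) }'
}
```

The output could be redirected to a file and passed to `restic backup --exclude-file=...`. Two caveats: restic reads the lines of an exclude file as patterns, so paths containing characters such as `*` or `[` may need escaping, and the ratio can also drop below 1.0 for reasons other than holes, for example on filesystems with transparent compression.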

martinleben · July 7, 2023, 8:19pm

Why does it matter? Or rather, what breaks if you always use --sparse?

AlBundy · July 7, 2023, 9:10pm

without --sparse: sparse files are restored “unsparse”, so they need more space
with --sparse: “unsparse” files are restored sparse, so they need less space

In both cases a restore of your backup does not match the original state, and this is not what I would expect from a backup tool.
→ and of course I am not talking about where the bits are stored on the disk.

In my opinion, size, timestamps, content, permissions and maybe more should be the same after a restore.
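
A rough sketch of how such a check could look for a single file, assuming GNU `stat` and `cmp`. The function name `restore_diff` is made up for illustration, and the set of attributes is only an example.

```shell
# restore_diff ORIGINAL RESTORED: print the attributes that differ between
# the two files; prints nothing if everything checked is identical.
# The ( ) body runs in a subshell, so the loop variables stay local.
restore_diff() (
  cmp -s -- "$1" "$2" || echo "content differs"
  # GNU stat formats: %s apparent size, %a permission bits, %U:%G owner,
  # %Y mtime in epoch seconds, %b allocated blocks
  for fmt in 'size %s' 'mode %a' 'owner %U:%G' 'mtime %Y' 'blocks %b'; do
    a=$(stat -c "$fmt" -- "$1")
    b=$(stat -c "$fmt" -- "$2")
    [ "$a" = "$b" ] || printf '%s -> %s\n' "$a" "$b"
  done
)
```

With a sparse original and a restore made without --sparse, one would expect only the `blocks` line to show up.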

martinleben · July 7, 2023, 10:39pm

Yes, with --sparse the restored files are smaller than the originals. Yes, there is a ticket to follow up on that, so if this is the only complaint, then you can stop reading here.

My question still stands: What breaks if you always use --sparse? AFAICT nothing, except someone’s test routine which might compare actual disk usage, as measured by, for example, “du”.

kapitainsky · July 7, 2023, 11:01pm

A small problem is that --sparse is not the default, so some users might be caught by surprise. But otherwise spot on.

restician · July 8, 2023, 4:26am

I asked this question already, but the developer dodged the question

I don’t know the answer either, but if there is no problem why aren’t all files on your Linux (or Windows or APFS) installation sparse? Why isn’t ‘restore sparse’ the default in Restic? Why isn’t “cp --sparse” the default? And so on. Common sense suggests “there is probably a reason”.

I saw this in a recent StackExchange question.

IMO people should avoid creating/using sparse files unless they absolutely need them. Such files result in insane amount of FS fragmentation and extra work from the FS driver. There are many more disadvantages and pitfalls. …

… all works well if you have >80% free space (looks crazy but that’s what it is). If you’re under 60% and have lots of files, it all goes downhill fast. ext4 cannot defragment free space, only individual files and fragmentation quickly becomes an insurmountable issue. AFAIK xfs is the only native Linux FS which can defragment everything (files and free space). [1]

[1] How transparent are sparse files for applications? (Unix & Linux Stack Exchange)

alexweiss · July 8, 2023, 5:07am

One use case for non-sparse files: if you want to “reserve” some space on disk for whatever purpose, you could save an empty non-sparse file which you can delete on demand. If you are in such a setting and run backup + restore with --sparse, you’ll lose that “space reservation”.
Funnily, exactly this option was discussed in this forum as a remedy for repositories being unprunable due to full disks.

Related topic: xkcd: Workflow
This also applies to things changed by a backup + restore

martinleben · July 8, 2023, 8:37am

I don’t think anybody dodged the question, actually. When I read it I thought it was rhetorical. Just go for it!

restician · July 10, 2023, 12:25pm

I tested Duplicacy with sparse files and, out of the box, Duplicacy restores sparse files exactly like they were (“sparse in, sparse out”, “same size on disk”).

I will have to use Duplicacy now too. Unfortunately, because Duplicacy is really an “unpleasant to use” backup tool. I guess I can still use Restic for folders which do not have special files, like my “image and video” collection.