Unified view of latest versions of all files in all snapshots

glego · November 15, 2023, 8:56am

I’m also searching for a backup solution with similar features. I need a backup system that retains copies of files in the destination even after they’re deleted from the source, along with block-based storage features like deduplication, encryption, and integrity checks, similar to Restic.

My ultimate objective is to maintain a ‘latest’ folder where all files are stored indefinitely. This means files should never be deleted from this folder. If a file gets updated or replaced, I want to be able to access its previous versions through snapshots.

Have you discovered a method to accomplish this using rclone or Restic?

Kind regards,
Glenn

rawtaz · November 15, 2023, 10:13am

Restic indeed does all of this, so it should match your needs/requirements.

How do you envision that this works when files are all in one big pile and also when different versions of the same file are named the same, if you want all versions of all files represented in that one folder? Say you have ten versions of file A in that folder, and this file comes from your local computer where it was named just “A”, do you expect to have it renamed all the time in the remote folder or what are your intentions?

This is exactly what you have with restic.

What you described so far matches what restic does 100%.

EDIT: Correction, it does not match the “having a latest folder where all files are stored indefinitely”, but since you then said that you wanted to access those files’ different versions using snapshots, I wrote that restic matches your request fully.

kapitainsky · November 15, 2023, 11:39am

As @rawtaz described, restic meets all your requirements except the one about a “folder with all files including deleted”.

But there are backup programs which can do this. For example, I use other software (not free though, and macOS/Windows only) which has an option called “Keep deleted files in subsequent backup records”. It is effectively a different way to view the latest snapshot.
I would never consider turning it on but everybody has different needs.

If this is your ultimate goal, send me a private message for the name. I do not want to use this forum for advertising.

glego · November 15, 2023, 1:34pm

Hi @rawtaz ,

Thank you for your insights. I understand that Restic covers most of my needs, but it doesn’t fulfill the specific requirement of preserving deleted files in the latest backup snapshot. This unique feature is critical for my scenario, where I need to maintain an indefinite history of a particular folder and its subfolders, along with their files.

To clarify, here’s how I envision the backup process:

Backup 1: If I start with ten files (abcdefghij) in the source, the same ten files should be in the latest backup (abcdefghij).
Backup 2: If I have only five files (fghij) in the source, the backup should still retain all ten original files (abcdefghij).
Backup 3: If I add new files, making 12 in total (ab fghij klmno) in the source, the backup should expand to include 15 files (abcdefghijklmno).

This approach ensures that even as files are deleted from the source, they remain intact in subsequent backups. Additionally, if a file is updated or a new one appears, it’s added to the latest version.
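The three backups above can be sketched as a simple merge. This is only an illustration of the desired ‘latest’ view, not something restic provides; the function name `unified_view` and the version labels are made up for the example, and it assumes that files a and b come back in Backup 3 as new versions.

```python
def unified_view(snapshots):
    """Merge snapshots (oldest first) into one view.

    Each snapshot is a dict mapping file name -> version label.
    A later version replaces an earlier one; nothing is ever removed.
    """
    view = {}
    for snapshot in snapshots:
        view.update(snapshot)
    return view


# Backup 1: ten files in the source.
backup1 = {name: "v1" for name in "abcdefghij"}
# Backup 2: only five files remain in the source.
backup2 = {name: "v1" for name in "fghij"}
# Backup 3: twelve files in the source (a and b assumed to be new versions).
backup3 = {name: "v1" for name in "fghijklmno"}
backup3.update({name: "v2" for name in "ab"})

latest = unified_view([backup1, backup2, backup3])
print("".join(sorted(latest)))    # abcdefghijklmno
print(latest["a"], latest["c"])   # v2 v1
```

Files c, d and e survive from Backup 1 even though they are gone from the source, and a and b show their most recent version.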

I get that the files are accessible in their respective snapshots, but in the scenario I described, I’d need to manually go through Backup 1 to retrieve files (cde). This process could become cumbersome if there are numerous snapshots to sift through.

I understand this is a specialized requirement, particularly useful for edge devices with limited disk space, where maintaining historical records and backups simultaneously is needed, without needing to navigate through multiple snapshots.

Thanks for taking time on this matter.

Best regards, Glenn

rawtaz · November 15, 2023, 1:53pm

You really need to answer the question I asked you earlier. The reason is that this entire discussion boils down to how the backed up/stored files are represented.

glego · November 15, 2023, 2:08pm

Maybe I’m not fully grasping your query, as I’m uncertain about the location of this large collection of ‘A’ files you mentioned. In my earlier example, I illustrated having two versions of files a and b: one version in snapshot 1 and a second version in snapshot 3. In the latest backup, it would display the second version from snapshot 3. Could you please clarify your question a bit more?

doscott · November 15, 2023, 2:17pm

I believe you have a misunderstanding of how restic works. Files are not backed up; the data that makes up the files is backed up. If you never perform a forget/prune operation, all of the data needed to restore any version of any file ever “backed up” continues to exist in the repository, regardless of whether the file still exists in the source.

If you want the ability to use the forget/prune operation for some data and not for others, this can be achieved through the appropriate use of tagged backups, or possibly the use of multiple repositories for specific retention models.

As the others have said, restic can do everything you have asked for, but you need to look at how it works without comparing it to other backup methods you have looked at.

glego · November 15, 2023, 2:23pm

I understand, thanks for the clarification once more @doscott.

@rawtaz indeed pinpointed the core issue: it’s about how the snapshots are represented. I’m not looking to show just the latest snapshot, or to use tags or anything similar, but rather a composite view that combines all snapshots, showing every file ever backed up in its most recent version.

If someone could demonstrate how to accomplish this, it would be greatly appreciated.

rawtaz · November 15, 2023, 2:25pm

I see. So you mean that in the latest snapshot, only the very latest version of all files that were ever backed up, and regardless of whether they were then later deleted, should be shown.

This also naturally means that if you ever want to get access to any of the older versions, you will have to look in older snapshots. In restic you can do this using the find command (there’s also ls), which you can use to find a file regardless of it being the most recent version of itself or not.
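As a rough sketch of how such a lookup could be scripted, the snippet below builds an index from file path to the most recent snapshot containing it, using the line-delimited output of `restic ls --json`. The field names (`struct_type`, `message_type`, `type`, `path`, `short_id`) are assumptions about that JSON output and should be checked against your restic version; `parse_ls_json` and `build_index` are hypothetical helper names, and the sample listings are invented.

```python
import json


def parse_ls_json(output):
    """Return (snapshot_id, file_paths) from `restic ls --json` text.

    Assumes one JSON object per line: a snapshot record, then one
    record per node. Field names are assumptions, see the note above.
    """
    snapshot_id = None
    paths = []
    for line in output.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        kind = record.get("message_type") or record.get("struct_type")
        if kind == "snapshot":
            snapshot_id = record.get("short_id") or record.get("id")
        elif kind == "node" and record.get("type") == "file":
            paths.append(record["path"])
    return snapshot_id, paths


def build_index(listings):
    """Map each path to the newest snapshot that contains it.

    `listings` must be ordered oldest first; later snapshots overwrite
    earlier entries, and paths deleted from the source stay in the index.
    """
    index = {}
    for output in listings:
        snapshot_id, paths = parse_ls_json(output)
        for path in paths:
            index[path] = snapshot_id
    return index


# Invented sample data standing in for two real `restic ls --json` runs.
old = "\n".join([
    '{"struct_type": "snapshot", "short_id": "aaaa1111"}',
    '{"struct_type": "node", "type": "dir", "path": "/data"}',
    '{"struct_type": "node", "type": "file", "path": "/data/c"}',
    '{"struct_type": "node", "type": "file", "path": "/data/f"}',
])
new = "\n".join([
    '{"struct_type": "snapshot", "short_id": "bbbb2222"}',
    '{"struct_type": "node", "type": "file", "path": "/data/f"}',
])

index = build_index([old, new])
print(index)
```

In practice each listing would come from running `restic ls --json <snapshot-id>` once per snapshot; the resulting index tells you which snapshot to restore a given file from, including files that no longer exist in the source.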

How would you in that latest snapshot of yours want to see files that have gotten their name changed? File A which was named A but was later renamed to B, should the latest snapshot only list B or should it also list A (which is then an older version, technically), or both? If both, why should renamed files be shown in different versions in the same (latest) snapshot, but not non-renamed files?

All in all, is it really so hard to use the find command to just find the files when you need to go back to an older version? Restic has all of the features you need, besides this very very specific and maybe somewhat inconsistent way of presenting the backed up files, and honestly I don’t know of any other software that has what you want either.

You could use Git, but it too will not display all the files. You can use rclone or rsync, with some archiving/renaming, but these are not the same type of thing and you will lack most of the features restic can help you with. I think you’re pretty much overcomplicating this, but then again I don’t know your specific use case - why you want this particular functionality.

glego · November 15, 2023, 7:47pm

Yes, that’s precisely what I’m looking for.

Ultimately, my goal is to mount this combined snapshot representation using FUSE and share it through Samba for accessibility from multiple clients. To effectively utilize the find command in this setup, it seems I’ll need to develop a wrapper or some kind of user interface.

Great question. In the scenario where files are renamed, I anticipate that both files would appear in the unified snapshot view. Ideally, Restic would handle deduplication, thus not consuming double storage space for these files. If a new file version is created later, it would then effectively “overwrite” the previous version in this representation.

Currently, I’m employing a two-step process where I first use rclone copy to transfer data to a large storage space, and then I apply Restic for backups on this large storage account. My aim now is to streamline this process by combining these steps into a single, more efficient operation.

In this situation, we have numerous edge devices that store images until they reach near-full capacity or surpass a 14-day retention period. For troubleshooting purposes, we often need to retrospectively examine specific states of these devices. However, this is challenging because the filenames are generated randomly, and we rarely have accurate dates to determine which snapshot to search. This lack of precise information makes it difficult to pinpoint the needed data and ascertain its location in the respective snapshots.

In conclusion, my hope was that this unified snapshot representation would be feasible, potentially leading to significant storage savings. I appreciate everyone’s contributions and insights on this matter.

rawtaz · November 15, 2023, 8:06pm

Again, it already does.

Restic already stores all the backed up data deduplicated, so that is yet again one of the things it has. All it does not have is this very specific representation of the snapshots that you’re looking for.

I’m going to break this recent discussion out into its own thread

glego · November 16, 2023, 7:43am

The more I reflect on it, what I aim to create is essentially a cold storage system for historical data, forming a ‘master view’ of a folder’s entire history. This approach is primarily motivated by the potential for storage savings. For instance, in my current setup, my ‘hot’ storage (the rclone copy) occupies about 1.6 TB, while my ‘cold’ storage (the Restic backup) is under 900 GB. This feature makes sense for my use case. Even in a personal context, if my NAS were full, I could delete files from it and mount this unified view back onto the NAS, effectively mirroring it in cold storage in the cloud.

Edit: Would it be appropriate to suggest this feature on GitHub, or does it fundamentally contradict the essence of Restic as a backup tool?

sc2maha · November 16, 2023, 12:39am

This does not make sense to me, though @rawtaz has explained it very well so I hope you are clear now.

It sounds to me like what you want is versioned backups (a la restic) but with filenames and directory structure preserved (a la rclone). You may want to look at rsnapshot – though I warn you it will take much more space than restic will, for an equivalent number of “snapshots” because rsnapshot just uses hardlinks for dedup. (Consider a 1 GB file where a 1 MB change has occurred since the last backup. Restic will use 1 MB (approx) for the next backup; rsnapshot will use 1 GB).
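To put rough numbers on that comparison, here is a back-of-the-envelope calculation. It assumes a hypothetical ten snapshots with the same 1 GB file changing by about 1 MB before each one, and it ignores compression, chunk boundaries and metadata overhead.

```python
FILE_MB = 1024      # the 1 GB file from the example
CHANGE_MB = 1       # approximate change between backups
SNAPSHOTS = 10      # hypothetical number of snapshots


def block_dedup_total(file_mb, change_mb, snapshots):
    """Full file once, then only the changed data per later snapshot."""
    return file_mb + (snapshots - 1) * change_mb


def hardlink_total(file_mb, snapshots):
    """A changed file cannot be hardlinked, so each snapshot stores a full copy."""
    return file_mb * snapshots


restic_mb = block_dedup_total(FILE_MB, CHANGE_MB, SNAPSHOTS)
rsnapshot_mb = hardlink_total(FILE_MB, SNAPSHOTS)
print(restic_mb)      # 1033
print(rsnapshot_mb)   # 10240
```

Under these assumptions the hardlink approach needs roughly ten times the space for that one file, while unchanged files cost next to nothing in either tool.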

sc2maha · November 16, 2023, 1:13am

You should also consider restic find. It’s pretty decent at finding stuff, though I am not a JSON/jq expert so I tend to process the non-JSON output for what I need from it.