Locating files that were backed up unintentionally

I’m trying to reduce the size of my backups. In the past I’ve caught large files that were unintentionally being backed up due to misconfiguration on my part.

Is there a way to list all files in a repository, sorted by size? I’d like to dump that to a log file and scan the list manually. It might also be useful to sort directories by their recursive size instead so I can better prioritize my search path.

Alternatively, can anyone else think of a useful way to achieve the desired outcome (locate unintentional backups of large files)?

Thank you,
Gili

If nothing else, you can always use restic’s mount command (on a supported OS and FUSE version, of course) and simply treat the mounted directory like any other filesystem.

That said, you could use the ls command with the --json option and pipe that through jq to format the data however you want it and then through sort or similar.

And is there a way to also delete files known to be superfluous from a backup? restic mount seems to create a read-only mount.

No, restic does not rewrite existing snapshots. One option is to correct your backup sets and run a new backup, so you get a new snapshot with only the things you want in it, and then forget and prune the old snapshots, leaving only the snapshot and data you want to keep. Another option is to try the PR at Implement 'rewrite' command to exclude files from existing snapshots by dionorgua · Pull Request #2731 · restic/restic · GitHub, but that’s something you do at your own risk :slight_smile:

Thanks for the link, I have subscribed to it. For now I can live with the excess space used, though I’m looking forward to a release that includes it.

restic ls --long SNAPSHOTID | sort -k 4

If you are sure that ‘unintentional’ files are always bigger than a certain size you can use the --exclude-larger-than flag when doing backups.

make that

restic ls --long SNAPSHOTID | sort -n -k 4

Thank you. Is there a way to run this command across all snapshots? I am on Windows so scripting is rather annoying.

Also, the above command does not seem to work for Windows’ sort command. It complains:

Input file specified two times.
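
As far as I can tell, Windows’ sort does not accept the GNU-style -n and -k options and treats them as file names, which would explain the message. Below is a minimal sketch of the same numeric sort in Java, assuming each entry line of `restic ls --long` has the columns mode, uid, gid, size, date, time, path. The class name SortBySize and the method sortBySize are just illustrative names, not part of restic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortBySize
{
	// One output line together with the size parsed from its fourth column
	record Entry(long size, String line)
	{
	}

	/**
	 * Sorts "restic ls --long" lines by size, smallest first (like "sort -n -k 4").
	 * Assumed column layout: mode, uid, gid, size, date, time, path.
	 */
	static List<String> sortBySize(List<String> lines)
	{
		List<Entry> entries = new ArrayList<>();
		for (String line : lines)
		{
			String[] tokens = line.trim().split("\\s+", 7);
			if (tokens.length < 7)
				continue; // too short to be an entry line
			try
			{
				entries.add(new Entry(Long.parseLong(tokens[3]), line));
			}
			catch (NumberFormatException e)
			{
				// fourth column is not a number (e.g. the "snapshot ... of" header): skip it
			}
		}
		entries.sort(Comparator.comparingLong(Entry::size));
		return entries.stream().map(Entry::line).toList();
	}

	public static void main(String[] args) throws IOException
	{
		// Read the listing from standard input and print it sorted by size
		BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
		List<String> lines = in.lines().toList();
		for (String line : sortBySize(lines))
			System.out.println(line);
	}
}
```

With a recent JDK (16 or later, because of the record and Stream.toList()), this should be runnable as a single source file, e.g. `restic ls --long SNAPSHOTID | java SortBySize.java`.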

Are you able to access the WSL?

Scripting there should work just as it does under Linux (since it effectively is Linux).

If you can’t, then PowerShell is still very powerful, but I have no experience in scripting it.

For what it’s worth, here is a script I wrote in Java for Windows:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Restic
{
	private static final String REPOSITORY = "[your repository]";
	private static final String PASSWORD = "[your password]";

	public static void main(String[] args) throws IOException, InterruptedException
	{
		Restic restic = new Restic();
		List<String> snapshots = restic.getSnapshots();
		System.out.println("snapshots: " + snapshots);
		List<FileInfo> sortedFiles = new ArrayList<>();
		for (String snapshot : snapshots)
		{
			sortedFiles.addAll(restic.listFiles(snapshot).stream().filter(info -> info.size() >= 100_000_000L).
				toList());
		}
		sortedFiles.sort(Comparator.comparingLong(FileInfo::size).reversed());
		for (FileInfo fileInfo : sortedFiles)
			System.out.println(fileInfo);
	}

	/**
	 * @return the IDs of the snapshots taken on the most recent date found in the repository
	 * @throws IOException          if an I/O error occurs
	 * @throws InterruptedException if the process is interrupted
	 */
	private List<String> getSnapshots() throws IOException, InterruptedException
	{
		List<String> lines = run("snapshots");
		List<String> result = new ArrayList<>();
		LocalDate latestDate = null;
		boolean prefixFound = false;
		for (String line : lines)
		{
			if (!prefixFound)
			{
				prefixFound = line.matches("^-+$");
				continue;
			}
			if (line.matches("^-+$"))
			{
				// suffix found
				break;
			}
			String[] tokens = line.split("\\s+", 5);
			String snapshot = tokens[0];
			if (snapshot.length() < 8)
				continue;
			String dateAsString = tokens[1];
			LocalDate date = LocalDate.parse(dateAsString);
			if (latestDate == null || date.compareTo(latestDate) > 0)
			{
				latestDate = date;
				result.clear();
			}
			result.add(snapshot);
		}
		return result;
	}

	/**
	 * @param snapshot a repository snapshot
	 * @return the list of files in the snapshot
	 * @throws IOException          if an I/O error occurs
	 * @throws InterruptedException if the process is interrupted
	 */
	private List<FileInfo> listFiles(String snapshot) throws IOException, InterruptedException
	{
		Pattern prefix = Pattern.compile("^snapshot " + snapshot + " of \\[([^]]+)]");
		List<String> lines = run("ls", snapshot, "--long");
		List<FileInfo> result = new ArrayList<>();
		String path = "";
		for (String line : lines)
		{
			if (path.isEmpty())
			{
				Matcher matcher = prefix.matcher(line);
				if (matcher.find())
					path = matcher.group(1).replaceAll("\\\\", "/");
				continue;
			}
			String[] tokens = line.split("\\s+", 7);
			if (tokens.length < 7)
				continue; // not a file entry (e.g. a blank or informational line)
			result.add(new FileInfo(path + tokens[6], Long.parseLong(tokens[3])));
		}
		return result;
	}

	private record FileInfo(String path, long size)
	{
	}

	/**
	 * @param command the restic command to run
	 * @return the output of the command
	 * @throws IOException          if an I/O error occurs
	 * @throws InterruptedException if the process is interrupted
	 */
	private List<String> run(String... command) throws IOException, InterruptedException
	{
		List<String> prefixedCommand = new ArrayList<>(Arrays.asList("cmd.exe", "/c", "restic", "-r", REPOSITORY));
		prefixedCommand.addAll(Arrays.asList(command));
		ProcessBuilder pb = new ProcessBuilder(prefixedCommand).
			redirectErrorStream(true);
		Map<String, String> env = pb.environment();
		env.put("RESTIC_PASSWORD", PASSWORD);
		Process process = pb.start();
		List<String> result = new ArrayList<>();
		// The command is passed as process arguments and the password via RESTIC_PASSWORD,
		// so nothing needs to be written to the process's standard input.
		try (BufferedReader in = new BufferedReader(new InputStreamReader(process.getInputStream())))
		{
			while (true)
			{
				String line = in.readLine();
				if (line == null)
					break;
				result.add(line);
				System.out.println(line);
			}
		}
		int rc = process.waitFor();
		if (rc != 0)
		{
			System.out.println(result);
			throw new IOException("Process failed with return code: " + rc);
		}
		return result;
	}
}

It’ll list all files over 100 MB in the snapshots from the most recent date, sorted from biggest to smallest. This helped me find a couple of unwanted files.

Is that what you end up with when using Windows: Java for scripting tasks? xD

Indeed. It’s easier to code in languages I am familiar with than to lose my mind with PowerShell :slight_smile:

It was that or Python and I am more familiar with Java.

Thanks for that suggestion - Mounting the repository and then running baobab (the Gnome disk usage analyzer tool) against the latest snapshot seems to work fine, and gives a nice interactive view of the disk space inside the snapshot (which is IMHO a bit easier to navigate than a long list of big files, since it also helps to find big directories with a lot of small files).
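
For anyone who cannot mount the repository (on Windows, for example), a rough equivalent of the "big directories" view can be computed from a flat list of files. The following is only a sketch: it assumes you already have each file's path (with forward slashes) and size, for instance from the Java program earlier in this thread, and the names DirectorySizes and recursiveSizes are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectorySizes
{
	// A file path (using '/' as separator) and its size in bytes
	record FileInfo(String path, long size)
	{
	}

	/**
	 * Adds each file's size to every ancestor directory of that file,
	 * giving the recursive size of each directory.
	 */
	static Map<String, Long> recursiveSizes(List<FileInfo> files)
	{
		Map<String, Long> totals = new HashMap<>();
		for (FileInfo file : files)
		{
			String path = file.path();
			// Walk from the file's parent directory up towards the root
			for (int slash = path.lastIndexOf('/'); slash >= 0; slash = path.lastIndexOf('/', slash - 1))
			{
				String directory = slash == 0 ? "/" : path.substring(0, slash);
				totals.merge(directory, file.size(), Long::sum);
			}
		}
		return totals;
	}

	public static void main(String[] args)
	{
		// Hypothetical sample data; replace with the real file list
		List<FileInfo> files = List.of(
			new FileInfo("/home/a/x.bin", 100L),
			new FileInfo("/home/a/y.bin", 50L),
			new FileInfo("/home/b/z.bin", 7L));
		// Print directories from largest to smallest recursive size
		recursiveSizes(files).entrySet().stream()
			.sorted(Map.Entry.<String, Long>comparingByValue().reversed())
			.forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
	}
}
```

Sorting the result by value, as main does, puts the heaviest directories first, which also surfaces directories that are large only because they contain many small files.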
