cowwoc
December 5, 2021, 7:44am
1
I’m trying to reduce the size of my backups. In the past I’ve caught large files that were unintentionally being backed up due to misconfiguration on my part.
Is there a way to list all files in a repository, sorted by size? I’d like to dump that to a log file and scan the list manually. It might also be useful to sort directories by their recursive size instead so I can better prioritize my search path.
Alternatively, can anyone else think of a useful way to achieve the desired outcome (locate unintentional backups of large files)?
Thank you,
Gili
rawtaz
December 5, 2021, 12:30pm
2
If nothing else, you can always use restic’s mount command (on a supported OS and FUSE version of course) and simply use the mounted directory like any other filesystem.
That said, you could use the ls command with the --json option, pipe that through jq to format the data however you want it, and then pipe the result through sort or similar.
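For anyone without jq, the same idea can be sketched in Java. This is only a rough sketch: it assumes each line of restic ls --json output is one JSON object and that file entries carry "type", "path" and "size" fields, so check the output of your restic version first. LsJson and largest are made-up names, and the regular expressions stand in for a real JSON parser, which means paths are left JSON-escaped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LsJson {
    record Entry(String path, long size) {}

    // Assumed field names; verify against your restic version's JSON output.
    private static final Pattern TYPE = Pattern.compile("\"type\"\\s*:\\s*\"file\"");
    private static final Pattern PATH = Pattern.compile("\"path\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
    private static final Pattern SIZE = Pattern.compile("\"size\"\\s*:\\s*(\\d+)");

    /** Extracts file entries from JSON lines and orders them biggest first. */
    static List<Entry> largest(List<String> jsonLines) {
        List<Entry> result = new ArrayList<>();
        for (String line : jsonLines) {
            // Skip the snapshot header line, directories and anything else
            if (!TYPE.matcher(line).find())
                continue;
            Matcher path = PATH.matcher(line);
            Matcher size = SIZE.matcher(line);
            if (path.find() && size.find())
                result.add(new Entry(path.group(1), Long.parseLong(size.group(1))));
        }
        result.sort(Comparator.comparingLong(Entry::size).reversed());
        return result;
    }
}
```

Feed it the lines read from a restic ls --json SNAPSHOTID process and print the first few entries to see the biggest files.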
NobbZ
December 5, 2021, 2:11pm
3
And is there a way to also delete files known to be superfluous from a backup? restic mount seems to create a read-only mount.
rawtaz
December 5, 2021, 2:24pm
4
No, restic does not rewrite existing snapshots. One option is to correct your backup sets and run a new backup so you get a new snapshot with only the things you want in it, then forget and prune the old snapshots, leaving only the snapshot and data you want to keep. Another option is to try the PR “Implement 'rewrite' command to exclude files from existing snapshots” by dionorgua (Pull Request #2731 in restic/restic on GitHub), but that is something you do at your own risk.
NobbZ
December 5, 2021, 2:29pm
5
Thanks for the link, I have subscribed to it. For now I can live with the excess space used, though I’m looking forward to a release that includes it.
764287
December 5, 2021, 3:03pm
6
restic ls --long SNAPSHOTID | sort -k 4
If you are sure that ‘unintentional’ files are always bigger than a certain size, you can use the --exclude-larger-than flag when doing backups.
make that
restic ls --long SNAPSHOTID | sort -n -k 4
cowwoc
December 5, 2021, 5:40pm
8
restic ls --long SNAPSHOTID | sort -n -k 4
Thank you. Is there a way to run this command across all snapshots? I am on Windows so scripting is rather annoying.
Also, the above command does not seem to work with Windows’ sort command. It complains:
Input file specified two times.
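The built-in Windows sort does not accept the -n and -k options and treats them as file names, hence the error. As a portable stand-in, here is a minimal Java sketch of what sort -n -k 4 does to restic ls --long output. It assumes the seven-column layout (mode, uid, gid, size, date, time, path); SizeSort, sizeOf and sortBySize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SizeSort {
    /** Returns the size column (4th field) of an "ls --long" line, or -1 if the line has none. */
    static long sizeOf(String line) {
        String[] tokens = line.trim().split("\\s+", 7);
        if (tokens.length < 7)
            return -1;
        try {
            return Long.parseLong(tokens[3]);
        } catch (NumberFormatException e) {
            // Header or status lines have no numeric size column
            return -1;
        }
    }

    /** Returns a copy of the lines ordered by ascending size. */
    static List<String> sortBySize(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong(SizeSort::sizeOf));
        return sorted;
    }
}
```

Lines without a numeric size (such as the snapshot header) sort to the top, so the biggest files end up at the bottom of the list.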
NobbZ
December 5, 2021, 7:17pm
9
Are you able to use WSL?
Scripting there should work just as it does under Linux (as it’s effectively Linux).
If you can’t, PowerShell is still very powerful, but I have no experience scripting with it.
cowwoc
December 7, 2021, 5:11am
10
For what it’s worth, here is a Java program I wrote for Windows:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Restic
{
    private static final String REPOSITORY = "[your repository]";
    private static final String PASSWORD = "[your password]";

    public static void main(String[] args) throws IOException, InterruptedException
    {
        Restic restic = new Restic();
        List<String> snapshots = restic.getSnapshots();
        System.out.println("snapshots: " + snapshots);
        // Collect every file of at least 100 MB from the selected snapshots
        List<FileInfo> sortedFiles = new ArrayList<>();
        for (String snapshot : snapshots)
        {
            sortedFiles.addAll(restic.listFiles(snapshot).stream().
                filter(info -> info.size() >= 100_000_000L).
                toList());
        }
        // Biggest files first
        sortedFiles.sort(Comparator.comparingLong(FileInfo::size).reversed());
        for (FileInfo fileInfo : sortedFiles)
            System.out.println(fileInfo);
    }

    /**
     * @return the IDs of the snapshots taken on the most recent date in the restic repository
     * @throws IOException          if an I/O error occurs
     * @throws InterruptedException if the process is interrupted
     */
    private List<String> getSnapshots() throws IOException, InterruptedException
    {
        List<String> lines = run("snapshots");
        List<String> result = new ArrayList<>();
        LocalDate latestDate = null;
        boolean prefixFound = false;
        for (String line : lines)
        {
            // The snapshot rows sit between two lines made up of dashes
            if (!prefixFound)
            {
                prefixFound = line.matches("^-+$");
                continue;
            }
            if (line.matches("^-+$"))
            {
                // suffix found
                break;
            }
            // Columns: ID, date, time, host, remainder
            String[] tokens = line.split("\\s+", 5);
            String snapshot = tokens[0];
            if (snapshot.length() < 8)
                continue;
            String dateAsString = tokens[1];
            LocalDate date = LocalDate.parse(dateAsString);
            if (latestDate == null || date.compareTo(latestDate) > 0)
            {
                // A newer date was found: discard snapshots from older dates
                latestDate = date;
                result.clear();
            }
            result.add(snapshot);
        }
        return result;
    }

    /**
     * @param snapshot a repository snapshot
     * @return the list of files in the snapshot
     * @throws IOException          if an I/O error occurs
     * @throws InterruptedException if the process is interrupted
     */
    private List<FileInfo> listFiles(String snapshot) throws IOException, InterruptedException
    {
        // Header line of the form: snapshot <id> of [<path>] ...
        Pattern prefix = Pattern.compile("^snapshot " + snapshot + " of \\[([^\\]]+)\\]");
        List<String> lines = run("ls", snapshot, "--long");
        List<FileInfo> result = new ArrayList<>();
        String path = "";
        for (String line : lines)
        {
            if (path.isEmpty())
            {
                // Skip everything up to and including the header line
                Matcher matcher = prefix.matcher(line);
                if (matcher.find())
                    path = matcher.group(1).replaceAll("\\\\", "/");
                continue;
            }
            // Columns: mode, uid, gid, size, date, time, path
            String[] tokens = line.split("\\s+", 7);
            result.add(new FileInfo(path + tokens[6], Long.parseLong(tokens[3])));
        }
        return result;
    }

    private record FileInfo(String path, long size)
    {
    }

    /**
     * @param command the restic command to run
     * @return the output of the command
     * @throws IOException          if an I/O error occurs
     * @throws InterruptedException if the process is interrupted
     */
    private List<String> run(String... command) throws IOException, InterruptedException
    {
        List<String> prefixedCommand = new ArrayList<>(Arrays.asList("cmd.exe", "/c", "restic", "-r", REPOSITORY));
        prefixedCommand.addAll(Arrays.asList(command));
        ProcessBuilder pb = new ProcessBuilder(prefixedCommand).
            redirectErrorStream(true);
        // Pass the password through the environment so restic does not prompt for it
        Map<String, String> env = pb.environment();
        env.put("RESTIC_PASSWORD", PASSWORD);
        Process process = pb.start();
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(process.getInputStream())))
        {
            while (true)
            {
                String line = in.readLine();
                if (line == null)
                    break;
                result.add(line);
                System.out.println(line);
            }
        }
        int rc = process.waitFor();
        if (rc != 0)
        {
            System.out.println(result);
            throw new IOException("Process failed with return code: " + rc);
        }
        return result;
    }
}
It’ll list all files of 100 MB or more in the snapshots from the most recent date, sorted from biggest to smallest. This helped me find a couple of unwanted files.
rawtaz
December 7, 2021, 11:05am
11
Is that what you end up with when using Windows: Java for scripting tasks? xD
cowwoc
December 7, 2021, 12:57pm
12
Is that what you end up with when using Windows: Java for scripting tasks? xD
Indeed. It’s easier to code in languages I am familiar with than to lose my mind with PowerShell.
It was that or Python, and I am more familiar with Java.
Thanks for that suggestion. Mounting the repository and then running baobab (the GNOME disk usage analyzer) against the latest snapshot seems to work fine and gives a nice interactive view of the disk space inside the snapshot. IMHO that is a bit easier to navigate than a long list of big files, since it also helps to find big directories with a lot of small files.
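Where mounting is not an option, rolling file sizes up into their parent directories gives a similar picture of which directories are big. A minimal sketch, assuming you already have each file's path (with forward slashes) and size, for example from the listing program earlier in this thread. DirSizes, recursiveSizes and largestFirst are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DirSizes {
    /** Adds each file's size to every ancestor directory of its path (the root "/" is skipped). */
    static Map<String, Long> recursiveSizes(Map<String, Long> fileSizes) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            String path = file.getKey();
            // Walk up the path: /a/b/c.bin -> /a/b -> /a
            for (int slash = path.lastIndexOf('/'); slash > 0; slash = path.lastIndexOf('/', slash - 1))
                totals.merge(path.substring(0, slash), file.getValue(), Long::sum);
        }
        return totals;
    }

    /** Returns the directory paths ordered from largest to smallest recursive size. */
    static List<String> largestFirst(Map<String, Long> totals) {
        List<String> dirs = new ArrayList<>(totals.keySet());
        dirs.sort((a, b) -> Long.compare(totals.get(b), totals.get(a)));
        return dirs;
    }
}
```

Printing the first few entries of largestFirst together with their totals also surfaces directories that are large only because they hold many small files.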