New archiver code, please test!


#21

It’d be great if you could try to reproduce it! This seems to be an odd corner case that I’d like to handle better.

This may well be your shell fooling you. You could try inserting ruby -e 'puts ARGV.inspect' in front of the restic command to see the expanded arguments:

$ ruby -e 'puts ARGV.inspect' restic backup ~

And so on
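
If Ruby is not installed, a small shell function can serve the same purpose. This is a minimal sketch; show_args is a hypothetical helper name, not part of restic:

```shell
# Print each argument on its own line, wrapped in <>, so that
# whitespace and trailing slashes are easy to spot.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# The shell expands ~ before the function (or restic) ever sees it.
show_args restic backup ~
show_args restic backup ~/
```

Comparing the two outputs shows exactly what the shell hands to restic, including any trailing slash.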


#22

Maybe adding KillMode=process to the [Service] section changed the behavior (Heisenbug). It’s only a guess.


#23

@fd0 @Smedley_Butler

I have the same issue when leaving a trailing slash ( / ) when backing up my home directory. The ruby command shows the only difference as the trailing slash when using ~/ instead of $HOME.

It may be unrelated, but I’ve noticed that if you leave a trailing forward slash ‘/’ on the end of the backup directory, restic appears to try to back up each file once again (or at least reports it as a new file in the summary with -v).

The following seems to reproduce this behaviour (you can make $BASE pretty much anything you want, but note that this will create 100 small files, so it should probably be on temporary storage. Also make sure not to use a trailing slash ‘/’ in BASE):

export BASE=/tmp/test-restic
mkdir "$BASE" "$BASE-back"
export RESTIC_REPOSITORY="$BASE-back"
for j in {1..100}; do
  echo "$j" > "$BASE/$j"
done
restic init
restic backup -v "$BASE"   # Works as expected - discovers 100 new files
restic backup -v "$BASE"   # Works as expected - discovers 0 new files/changes
restic backup -v "$BASE/"  # Note the trailing '/' - discovers 100 new files...?
restic backup -v "$BASE/"  # Note the trailing '/' - discovers 100 new files again...?

This gives the following output for the restic backup commands:
Fine ( Initial Backup, without the trailing / ):

...
Files:         100 new,     0 changed,     0 unmodified
Dirs:            1 new,     0 changed,     0 unmodified
Added:      1.010 KiB

snapshot ecbb2088 saved

Fine ( Second Backup, without the trailing / ):

...
Files:           0 new,     0 changed,   100 unmodified
Dirs:            0 new,     0 changed,     1 unmodified
Added:      0 B  

snapshot 9f741c05 saved

Weird ( Third Backup, with the trailing / ):

Files:         100 new,     0 changed,     0 unmodified
Dirs:            1 new,     0 changed,     0 unmodified
Added:      0 B  

snapshot 98a4e397 saved

The fourth restic backup (with the trailing ‘/’ again) shows the 100 new files again.
The same is true if there are subdirectories as well: each file is considered new.

Sorry for the long post. Thanks for restic though, looks awesome!

Hope this helps,
jedi453


#24

Ah, that’s it! I can reproduce it and I’ll take care of it. Thanks for spending the time to debug/describe this issue!


#25

Will be resolved in https://github.com/restic/restic/pull/1744
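
Until a restic build with that fix is installed, one possible workaround is to strip the trailing slash before the path reaches restic. A minimal sketch, where strip_trailing_slash is a hypothetical helper and not part of restic:

```shell
# Hypothetical helper: remove trailing slashes from a path, but keep
# a lone "/" intact so that the root directory still works.
strip_trailing_slash() {
    path=$1
    while [ "$path" != "/" ] && [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    printf '%s\n' "$path"
}
```

With the reproduction above, this could be used as restic backup -v "$(strip_trailing_slash "$BASE/")", which passes /tmp/test-restic to restic.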


#26

One bit of good feedback from my experience: the new archiver seems to improve performance significantly.

My daily backup would previously take about 5 hours; it now seems to take around 2.5 hours.

This is a backup of 9 KVM VMs, taken from the host via guestmount of their backing storage mounted from an LVM snapshot (so the VMs can continue running during the backup, while the backup as a whole stays consistent across all VMs). The setup is somewhat complex, so it is difficult to tell where the bottleneck originally was! (Approx. 2.5M files and 1 TB of data in total, with a relatively small amount of churn across machines: around 1.6 GB actually needs to be backed up.)


#27

Is that a machine with spinning discs or SSDs?


#28

All local spinning disks.


#29

Attention: Please don’t use the new archiver for machines which have a hardware watchdog!

I just found out the hard way that the new archiver code (which does open() and then fstat()) starts the hardware watchdog on my Thinkpad, leading to a reboot 30 seconds later!

I need to fix this, but can only do this tomorrow.


#30

Ok, that was easier than I thought, fix incoming: https://github.com/restic/restic/pull/1751


#31

Woah, I’ve not heard of this before. What is a watchdog and what does it do? And what does it have to do with hardware? This seems like a filesystem issue. (Forgive my ignorance and the off-topic-ness :slight_smile: )


#32

There is a (watch)dog on Mars! https://en.wikipedia.org/wiki/Watchdog_timer
So from what I could gather it’s more of a process. But anybody else with more knowledge, please fill us in or point the way.


#33

A watchdog is a device (hardware) or implementation (software) which, once started, needs to be reset regularly. Otherwise the machine or device (think IoT appliance) is reset. This is commonly used in embedded devices and microcontrollers to make sure the program does not get stuck, even in the event of a programming error. Once your process running on the machine stops resetting the watchdog (maybe because it went into an endless loop or crashed), the complete machine is reset and starts fresh.

Most server hardware (and apparently my Thinkpad) has such a device. For most Intel mainboards it’s a real hardware device which will physically reset the system.
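
The "reset it regularly or the machine reboots" logic can be modelled in a few lines of shell. This is only a software sketch of the idea, not how a hardware watchdog or restic works; the function names and the heartbeat file are assumptions made for illustration:

```shell
# "Pet" the watchdog: record the current time as the last sign of life.
pet_watchdog() {
    date +%s > "$1"
}

# Succeed (exit status 0) if the last pet is older than the timeout,
# i.e. the point at which a real watchdog would reset the machine.
watchdog_expired() {
    heartbeat_file=$1
    timeout_seconds=$2
    last_pet=$(cat "$heartbeat_file")
    now=$(date +%s)
    [ $((now - last_pet)) -gt "$timeout_seconds" ]
}
```

A healthy program would call pet_watchdog in its main loop. If it hangs, the timestamp goes stale and watchdog_expired starts to succeed, which is when a supervisor (or, in the hardware case, the watchdog device itself) would trigger the reset.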


#34

This doesn’t mean that the structure of the backup changes, right? Just the reading is different?

Asking because I live in a very rural area with painfully slow internet (one of the big reasons I chose to try Restic was the deduplication). After a restic backup to a local repo, that repo gets rcloned to b2, and that second part takes months, so if using the new archiver means I’d have to start the upload from scratch, that would be a huge bummer.

PS Thanks for all your hard work on Restic!


#35

Ah, what that meant was that the structure of the snapshots changes (look up what #549 means). If you run restic backup /home/user/work, the old archiver code would create a snapshot which contains /work, whereas the new archiver code creates a snapshot which contains /home/user/work. The data is not re-uploaded to the repo and the deduplication still works the same way, but since the snapshot structure is different, restic will likely re-read all files. That only means local IO and CPU load, though, and will not cause data to be re-uploaded.

You can look at the size of your local repo before and after you run the first backup with the new archiver code. It shouldn’t grow that much. Please report back how it goes!
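
One way to do that before/after comparison is sketched below. The helper names are hypothetical and the measurement is rough, since du reports disk usage in whole KiB blocks:

```shell
# Print the disk usage of a directory in KiB (du -k is POSIX).
repo_size_kib() {
    du -sk "$1" | cut -f1
}

# Run a command and report how much the directory grew while it ran.
report_growth() {
    repo_dir=$1
    shift
    before=$(repo_size_kib "$repo_dir")
    "$@"
    after=$(repo_size_kib "$repo_dir")
    echo "repo grew by $((after - before)) KiB"
}
```

For a local repo at a placeholder path such as /srv/restic-repo, this could be called as report_growth /srv/restic-repo restic -r /srv/restic-repo backup /home/user/work.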


#36

Sounds good, and thanks for the quick response. I heard about the new archiver a week or so ago while reviewing #549, trying to sort out an issue with --one-file-system and backup / /mnt (where /mnt is on a different partition / external drive).

Will check out the new archiver soon and see what happens.


#37

/edit: This has been fixed in #1776 - issue was reported in #1775

So I think even after #1744 I am still having some issues when starting a backup with /home/moritz/ as the source directory to back up.
I noticed it because the CPU usage spiked and the backup job, which usually takes 13 seconds, had been running in a loop for 20 minutes.

Here is a screenshot of the strace

Command I ran:

restic -vv -r b2:restic:/ -p /home/moritz/.restic/passphrase backup /home/moritz/ --exclude /home/moritz/.restic* --one-file-system --tag 'systemd /home/moritz' -o b2.connections=20

Then the following output is generated and restic sits there idling with very high CPU usage:

open repository
lock repository
load index files
using parent snapshot a502ea25
start scan on [/home/moritz/]
start backup on [/home/moritz/]

The parent snapshot is the correct, last snapshot for that directory.

I changed the systemd service to use /home/moritz (without the trailing slash) so that it works again, but I think that’s some sort of regression in the new archiver code.

Or if the behavior changed, then the documentation should be changed to reflect the new behavior.
https://restic.readthedocs.io/en/latest/040_backup.html#including-and-excluding-files

A trailing / is ignored, a leading / anchors the pattern at the root directory. This means, /bin matches /bin/bash but does not match /usr/bin/restic.
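
The rule quoted above can be illustrated for the simplest case, a literal pattern with a leading slash. This is a simplified model and not restic's actual matcher (it ignores wildcards entirely); anchored_match is a hypothetical name:

```shell
# Simplified model of an anchored, literal exclude pattern:
# the trailing slash is dropped, then the pattern must match the path
# itself or one of its parent directories, starting at the root.
anchored_match() {
    pattern=${1%/}
    path=$2
    case "$path" in
        "$pattern"|"$pattern"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under this model, anchored_match /bin /bin/bash succeeds and anchored_match /bin /usr/bin/restic fails, and /home/moritz/ behaves the same as /home/moritz.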