New archiver code, please test!

Hey, I’ve just merged the long-standing PR #1494 which resolves a major issue regarding the file/directory structure within a snapshot, this was tracked as issue #549.

It completely replaces the core archiver code, and now I’d like you to test this new code! I’m already using it for my personal backups for a few weeks, so rest assure I’ve given it some testing already!

For an overview of what has changed, please have a look at the corresponding changelog entry.

Give it a try and report back here! Make sure to try out restic backup --verbose (which gives much more information what restic does), and even restic backup --verbose --verbose (which will print a detailed output for each file restic saves).

This was about four months of work for me, I’m very eager to get some feedback! Please report back what your experiences are (good or bad, doesn’t matter)!

3 Likes

Working on testing this, unfortunately having a lot of trouble with DO Spaces at the moment. But I will say that the scanning portion appears to be vastly quicker. But until Spaces works fully again, I can’t do great testing but here are my initial results.

Thanks for all the great work as always!

-Adam

open repository
password is correct
lock repository
load index files
using parent snapshot bd60875b
start scan
start backup
scan finished in 91.848s
processed 477.667 MiB in 1:49

Files:        6338 new,     0 changed,     0 unmodified
Dirs:          604 new,     0 changed,     0 unmodified
Added:      11.164 KiB

snapshot 1d30a349 saved

So after further testing, I found out that Restic before these changes was being killed by OOM. It’s a 1GB box that restic took over initially 40% of the RAM and then on actual backup would eat up all the rest of the RAM until it was killed, about 3% of the way into the process).

Now, it still takes up about 40% of the RAM as before, but then when it actually goes to backup the files, it finishes without error and takes up about half the amount of RAM as before. So in that regards alone, it’s a win!

-Adam

1 Like

Just a Backup or are their any specific flow you would like to see?

Just a backup, report what you see and how you like the new output (status and normal), and give -v a try :slight_smile:

I’m testing this now and I’m seeing that every time restic runs, it’s taking much longer than it did before.

What used to take under a minute is now taking over 20 minutes and it appears to be saving all files each time it’s run.

At the completion of the run it tells me that there are 17767 new files, 0 changed and 0 unmodified.

2141 new dirs, 0 changed and 0 unmodified.

Does this look right?

I’m running version 0.8.3 (v0.8.3 -191 -g6d4c40f8 compiled with go1.9.5 on linux and backing up to local USB drive.

For the first run this is expected, the structure of the snapshot is slightly different so restic can’t find the previous files and re-reads everything. For subsequent runs it should be fast. Is that the case?

No, every run that I have done has taken over 20 minutes. The last one took 21 minutes.

Additional information.

open repository
password is correct
lock repository
load index files
start scan
start backup
scan finished in 9.284s
uploaded intermediate index 8bc11e65
processed 247.078 GiB in 21:00

Files:       17780 new,     0 changed,     0 unmodified
Dirs:         2147 new,     0 changed,     0 unmodified
Added:      46.475 MiB

snapshot e077e7b5 saved

That’s odd, restic does not find a parent snapshot. Hm. Can you paste the exact call for restic (including the paths to the exact directories/files you’re saving) somewhere? Or send it to me in a message?

You can try passing the last snapshot ID in --parent <id>, but usually restic should detect that by itself.

It seems that the issue is coming from me using the following to define the path of what I’m backing up:

~/ 

When I use the following, it’s all good.

/home/gumper

I’m quite certain that in the past I always used the first path shown above.

I just changed my normal backup and it seemed to be stuck. I can back to it this evening and killed it:

$ ./x99-daily.sh
open repository
password is correct
found 1 old cache directories in /home/wscott/.cache/restic, pass --cleanup-cache to remove them
lock repository
load index files
using parent snapshot 00941b4f
start scan
start backup
scan finished in 481.368s
[10:26] 0.72%  20593 files 2.869 GiB, total 3156788 files 397.372 GiB, 0 errors ETA 23:54:41
[10:27] 0.72%  20593 files 2.869 GiB, total 3156788 files 397.372 GiB, 0 errors ETA 23:56:58
[12:35] 0.72%  20593 files 2.869 GiB, total 3156788 files 397.372 GiB, 0 errors ETA 28:50:19
[14:42:53] 0.72%  20593 files 2.869 GiB, total 3156788 files 397.372 GiB, 0 errors ETA 2023:25:06
[14:43:09] 0.72%  20593 files 2.869 GiB, total 3156788 files 397.372 GiB, 0 errors ETA 2024:01:46
signal interrupt received, cleaning up

(the separate lines are where I hit enter because it was overwriting the same line)

So in 14 hours it made no progress.

The command line looks like this:

restic backup --verbose  \
    /home/wscott \
    --exclude-caches \
    --exclude-file exclude.list

That normally is pretty fast.

I might have a hint. I am running it again and am watching the process. It has the directory $HOME/.lastpass/pipes open for a suspiciously long time. That directory has UNIX pipe files to talk to the lastpass process and I suspect your new scanner tried to read from them.

EDIT

yeah, when I ignored that directory the backup completed:

$ ./x99-daily.sh
open repository
password is correct
found 1 old cache directories in /home/wscott/.cache/restic, pass --cleanup-cache to remove them
lock repository
load index files
using parent snapshot 00941b4f
start scan
start backup
uploaded intermediate index d74d0235
scan finished in 471.370s
uploaded intermediate index ae851ffa
processed 397.594 GiB in 1:36:01

Files:       3165658 new,     0 changed,     0 unmodified
Dirs:        441782 new,     0 changed,     0 unmodified
Added:      1.067 GiB

snapshot eea5b1bd saved
1 Like

Oh, that sounds like a bug, thanks for the report!

Can confirm I’ve seen the same thing and resulting hang (pipe was within Firefox’s configuration directory in my case), found with lsof -p .

I’ve also seen an “error: OpenFile: open home/scponly/dev/null: permission denied” error for a /dev/null device file (was within a chroot environment for an scponly user, hence not the one in my /dev directly which is excluded from backups), although that didn’t cause a hang.

Aside from that, it seems to be working very well for me, and I very much like the new summary output at the end of the backup.

One small suggestion for the verbose output is that I’d be quite keen to be able to get the end summary out (new/changed files and dirs), without having the extra internals details like scan/backup stages, saving indexes etc. That’s because I’d quite like that summary in my overnight backup script but don’t need to see the extra stuff. Not a big deal if it stays as it is, though.

Many thanks for continuing to work on and improve this project – it’s been the perfect solution for my needs.

Running now latest master here:

# restic version
debug enabled
restic 0.8.3 (v0.8.3-191-g6d4c40f8) compiled with go1.10.1 on linux/amd64

Using B2 as backend with -o b2.connections=2.

It’s been runing now for several hours without doing much. It got stuck now in this state for quite a while (bandwidth consumption only changed by a few hundred KB within the last 2h or so):

● restic@moria.service - Backup moria using 'restic'
   Loaded: loaded (/etc/systemd/system/restic@.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/restic@moria.service.d
           └─bandwidth.conf, debug.conf, runtimemax.conf
   Active: active (running) since Mon 2018-04-30 08:10:15 CEST; 2h 37min ago
 Main PID: 1359 (restic)
       IP: 14.9M in, 748.3M out
    Tasks: 16 (limit: 4915)
   Memory: 4.0G
   CGroup: /system.slice/system-restic.slice/restic@moria.service
           └─1359 /usr/local/bin/restic backup --limit-upload=1600 --cache-dir=/var/cache/restic/moria --one-file-system --exclude-file=/etc/restic/moria/backup.exclude --excl>

Apr 30 08:10:15 moria systemd[1]: Started Backup moria using 'restic'.
Apr 30 08:10:15 moria restic[1359]: debug log file /var/log/restic/moria/debug.log
Apr 30 08:10:15 moria restic[1359]: debug enabled

Looking at the debug log, it keeps repeating those 806 index checks forever:

2018/04/30 10:58:34 repository/master_index.go:157  repository.(*MasterIndex).FullIndexes   2922    checking 806 indexes                                                        
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc4202b2000 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc4202b2100 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc420338000 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc4202b2180 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc42029e080 is final
[…]
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc425d97280 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc426d0ec00 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc42a821400 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc43204a780 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:160  repository.(*MasterIndex).FullIndexes   2922    index 0xc436e3d480 is final                                                 
2018/04/30 10:58:34 repository/master_index.go:172  repository.(*MasterIndex).FullIndexes   2922    return 0 indexes

Checking, how often this number of indexes was checked seems to confirm this:

# grep 'checking \w\+ indexes' /var/log/private/restic/moria/debug.log | awk '{print $7}' | uniq -c
      4 803
     31 804
     15 805
    297 806

…it started with just 4 times checking 803 indexes, checked 15 times 805 indexes but is now ATM stuck at having checked 297 times 806 indexes.

Every now and then, it refreshes locks between the index checks:

2018/04/30 10:58:26 restic/lock.go:77   main.refreshLocks   2876    refreshing locks                                                                                            
2018/04/30 10:58:26 restic/lock.go:223  restic.(*Lock).Refresh  2876    refreshing lock bb149df0                                                                                
2018/04/30 10:58:26 repository/repository.go:263    repository.(*Repository).SaveJSONUnpacked   2876    save new blob lock                                                      
2018/04/30 10:58:26 b2/b2.go:200    b2.(*b2Backend).Save    2876    Save <lock/79e7ca7a8d>, name locks/79e7ca7a8d5c6fcb7d04fb176eca2d528c882a9b2ba8153212801b56c369b882         
2018/04/30 10:58:26 b2/b2.go:205    b2.(*b2Backend).Save    2876      saved 140 bytes, err <nil>                                                                                
2018/04/30 10:58:26 debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP REQUEST -----------                                          
POST /b2api/v1/b2_upload_file/0eedcad4d93e4ac76d0d0d1f/c001_v0001038_t0052 HTTP/1.1                                                                                             
Host: pod-000-1038-01.backblaze.com                                                                                                                                             
User-Agent: blazer/0.3.0                                                                                                                                                        
Content-Length: 140                                                                                                                                                             
Authorization: [REDACTED]
Content-Type: application/octet-stream                                                                                                                                          
X-Blazer-Method: b2_upload_file                                                                                                                                                 
X-Blazer-Request-Id: 748                                                                                                                                                        
X-Bz-Content-Sha1: 98daf2199fe7b6e1d21f93723464b6c97a1e252b                                                                                                                     
X-Bz-File-Name: locks/79e7ca7a8d5c6fcb7d04fb176eca2d528c882a9b2ba8153212801b56c369b882                                                                                          
Accept-Encoding: gzip                                                                                                                                                           
                                                                                                                                                                                
                                                                                                                                                                                
2018/04/30 10:58:27 debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP RESPONSE ----------                                          
HTTP/1.1 200 OK                                                                                                                                                                 
Content-Length: 469                                                                                                                                                             
Cache-Control: max-age=0, no-cache, no-store                                                                                                                                    
Content-Type: application/json;charset=UTF-8                                                                                                                                    
Date: Mon, 30 Apr 2018 08:58:27 GMT                                                                                                                                             
Server: Apache-Coyote/1.1                                                                                                                                                       
                                                                                                                                                                                
                                                                                                                                                                                
2018/04/30 10:58:27 repository/repository.go:291    repository.(*Repository).SaveUnpacked   2876    blob <lock/79e7ca7a8d> saved                                                
2018/04/30 10:58:27 cache/backend.go:37 cache.(*Backend).Remove 2876    cache Remove(<lock/bb149df005>)                                                                         
2018/04/30 10:58:27 b2/b2.go:251    b2.(*b2Backend).Remove  2876    Remove <lock/bb149df005>                                                                                    
2018/04/30 10:58:27 debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP REQUEST -----------                                          
GET /file/backup-moria/locks/bb149df00553226eb6012cd61c770a84b30a3c3e6c1982632fdccec90d5cfbb9 HTTP/1.1                                                                          
Host: f001.backblazeb2.com                                                                                                                                                      
User-Agent: blazer/0.3.0                                                                                                                                                        
Authorization: [REDACTED]
Range: bytes=0-0                                                                                                                                                                
X-Blazer-Method: b2_download_file_by_name                                                                                                                                       
X-Blazer-Request-Id: 749                                                                                                                                                        
                                                                                                                                                                                
                                                                                                                                                                                
2018/04/30 10:58:28 debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP RESPONSE ----------                                          
HTTP/1.1 206 Partial Content                                                                                                                                                    
Content-Length: 1                                                                                                                                                               
Accept-Ranges: bytes                                                                                                                                                            
Cache-Control: max-age=0, no-cache, no-store                                                                                                                                    
Content-Range: bytes 0-0/140                                                                                                                                                    
Content-Type: application/octet-stream                                                                                                                                          
Date: Mon, 30 Apr 2018 08:58:28 GMT                                                                                                                                             
Server: Apache-Coyote/1.1                                                                                                                                                       
X-Bz-Content-Sha1: 75be25c4403469417343cdf4be3fd9a1c3f790ff                                                                                                                     
X-Bz-File-Id: 4_z0eedcad4d93e4ac76d0d0d1f_f107f3a01f630be09_d20180430_m085327_c001_v0001105_t0057                                                                               
X-Bz-File-Name: locks/bb149df00553226eb6012cd61c770a84b30a3c3e6c1982632fdccec90d5cfbb9                                                                                          
X-Bz-Upload-Timestamp: 1525078407000


2018/04/30 10:58:28 debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP REQUEST -----------                                          
POST /b2api/v1/b2_delete_file_version HTTP/1.1                                                                                                                                  
Host: api001.backblazeb2.com                                                                                                                                                    
User-Agent: blazer/0.3.0                                                                                                                                                        
Content-Length: 180                                                                                                                                                             
Authorization: [REDACTED]
X-Blazer-Method: b2_delete_file_version                                                                                                                                         
X-Blazer-Request-Id: 750                                                                                                                                                        
Accept-Encoding: gzip                                                                                                                                                           
                                                                                                                                                                                
                                                                                                                                                                                
2018/04/30 10:58:29 debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 2876    ------------  HTTP RESPONSE ----------                                          
HTTP/1.1 200 OK                                                                                                                                                                 
Content-Length: 190                                                                                                                                                             
Cache-Control: max-age=0, no-cache, no-store                                                                                                                                    
Content-Type: application/json;charset=UTF-8                                                                                                                                    
Date: Mon, 30 Apr 2018 08:58:29 GMT                                                                                                                                             
Server: Apache-Coyote/1.1 

AFAICT it doesn’t seem to be related to the handling of pipes as reported by @wscott:

COMMAND  PID   USER   FD      TYPE             DEVICE   SIZE/OFF     NODE NAME
restic  1359 restic  cwd       DIR              254,0       4096        2 /
restic  1359 restic  rtd       DIR              254,0       4096        2 /
restic  1359 restic  txt       REG              254,0   12296224 11542655 /usr/local/bin/restic
restic  1359 restic    0r      CHR                1,3        0t0     1030 /dev/null
restic  1359 restic    1u     unix 0x00000000b4e60444        0t0  7962977 type=STREAM
restic  1359 restic    2u     unix 0x00000000b4e60444        0t0  7962977 type=STREAM
restic  1359 restic    3w      REG              254,0 2973061813  7604481 /var/log/private/restic/moria/debug.log
restic  1359 restic    4u  a_inode               0,12          0     9886 [eventpoll]
restic  1359 restic    5r      DIR              254,0       4096  3407873 /home
restic  1359 restic    6r      DIR              254,0      12288  3407876 /home/elias
restic  1359 restic    7r      DIR              254,0       4096  3409850 /home/elias/.steam
restic  1359 restic   10u      REG              254,0    1234794  7602370 /var/log/private/restic/moria/restic-temp-pack-723406626 (deleted)
restic  1359 restic   13u      REG              254,0     669728  7602414 /var/log/private/restic/moria/restic-temp-pack-360908825 (deleted)
restic  1359 restic   17u      REG              254,0     102050  7602186 /var/log/private/restic/moria/restic-temp-pack-382709872 (deleted)

Can you please run restic manually with backup -v -v? Let’s see if there’s something there, it will print a line for each file/dir processed, and the status display will contain the list of files restic is currently reading.

For the debug log, you can ignore the index files altogether, it’s just the timer which uploads intermediate indexes. Just ignore the lines :wink: What are the last lines above the index checks?

It looks like we’re dealing with a Heisenbug here - once you try observing it, it’s gone…
Keeping -v -v now in my ExecStart= params to wait whether it happens at some point again.

Thanks a lot for all your work!

I’d assign this bug to an open() hanging on a fifo. Let’s retry with the fixed code.

This will be resolved here:

1 Like