SFTP: Estimates wobble a lot

Hi!

I’ve started using restic, and I like what I see so far. Thanks for writing it!

My second repository is a big one (> 1 TB), backed up to an SFTP server.

It’s been running for days now, but the estimate still changes a lot - a minute ago it was estimating 9h15 left, then 7h30, and now it’s going up again. These numbers look unrealistic to me, since it managed about 47% in the last 5 days, so it still has days to go.

Has someone looked at getting a better estimator for the ETA?
I think I’m mostly requesting that it converges over time to a realistic estimate. (It can be bad at the start, but as it transfers more data, the estimate should get better, and more stable.)

It’d also be good to add a “days” part for large numbers - ETA 288:15:37 is not a useful number for a standard human :slight_smile:
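For the “days” part, a formatter along these lines would do it (a sketch in Python to show the idea; restic itself is written in Go, and this is not its actual code):

```python
def format_eta(seconds: int) -> str:
    """Format a duration as 'D days HH:MM:SS', omitting the days part when zero."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days} days {hms}" if days else hms
```

With this, `288:15:37` would render as `12 days 00:15:37`.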

Thank you,
Thomas

Which restic version are you using? We’ve changed the ETA calculation for restic 0.16.0.

As restic deduplicates data before uploading, it is pretty much impossible to give a good estimate: restic simply does not know how large the upload for a given set of files will turn out to be.

Hi!
Thanks for the reply.
I’m using restic 0.16.0.
I understand that it’s hard to predict, but right now it looks like the estimate is based mostly on the last minute or the last file. For a long-running backup, it could instead be based on the last hour or the last 1000 files.
Then the estimates probably wouldn’t jump around so much.
I must confess I don’t know enough about good ETA algorithms to give advice, it’s just something that caught my eye. I can ignore it :slight_smile:
Thanks for your work on restic!
Thomas

ETA depends on many things, some of which restic has no control over or knowledge about. For example: are other users using the CPU or disk? Are other machines using the internet connection? At my house the TV goes through the internet - does the router give the TV Quality of Service priority? The internet does not have consistent throughput.
Also, different file types can take different amounts of time to process. For example, a text file will likely be small and compress a lot, while a VM disk image will likely be large and not compress well. So the ETA will vary quite a bit depending on what is being backed up at that moment. Some other software I’ve seen (I can’t remember which) had something like three estimates:

  1. based on everything from the beginning.
  2. based on everything in the last period (likely 5 to 10 minutes).
  3. based on the last minute.

My guess is that a better ETA might be possible, but the work required could be a lot. Data such as how fast the disk is responding, how many bytes per second can be sent through the internet, and how fast the server on the other side is responding might help with the estimate - but then the next user might complain about the overhead of collecting such data.
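All three variants can be computed from the same progress history; they differ only in how far back they look (again a hypothetical sketch - the function and variable names are mine):

```python
def eta(samples, total, horizon=None):
    """ETA in seconds from a list of (timestamp, bytes_done) samples.

    horizon limits the lookback in seconds; None uses the whole history.
    """
    if horizon is not None:
        cutoff = samples[-1][0] - horizon
        samples = [s for s in samples if s[0] >= cutoff]
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0 or b1 <= b0:
        return None
    rate = (b1 - b0) / (t1 - t0)  # bytes per second over the chosen span
    return (total - b1) / rate


# Steady 50 bytes/s for 10 minutes, sampled once per minute.
history = [(t, t * 50) for t in range(0, 601, 60)]
overall = eta(history, 100_000)               # 1. since the beginning
recent = eta(history, 100_000, horizon=300)   # 2. the last 5 minutes
last = eta(history, 100_000, horizon=60)      # 3. the last minute
```

With a perfectly steady transfer all three agree; they diverge exactly when throughput fluctuates, which is what makes the choice of lookback matter.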