In principle, the JSON format of Tree objects is extensible without
requiring a format change. In order to not loose information just play
it safe and reject rewriting trees for which we could loose data.
In some rare cases files could be created which contain null IDs (all
zero) in their content list. This was caused by a race condition between
growing the `Content` slice and inserting the blob IDs into it. In some
cases the blob ID was written to the old slice, which a short time
afterwards was replaced with a larger copy, that did not yet contain the
blob ID.
We previously checked whether the set of snapshots might have changed
based only on their number, which fails when as many snapshots are
forgotten as are added. Check for the SHA-256 of their id's instead.
The status bar got stuck once the first error was reported, the scanner
completed or some file was backed up. Either case sets a flag that the
scanner has started.
This flag is used to hide the progress bar until the flag is set. Due to
an inverted condition, the opposite happened and the status stopped
refreshing once the flag was set.
In addition, the scannerStarted flag was not set when the scanner just
reported progress information.
As the FileSaver is asynchronously waiting for all blobs of a file to be
stored, the number of active files is higher than the number of files
from which restic is reading concurrently. Thus to not confuse users,
only display files in the status from which restic is currently reading.
After reading and chunking all data in a file, the FutureFile still has
to wait until the FutureBlobs are completed. This was done synchronously
which results in blocking the file saver and prevents the next file from
being read.
By replacing the FutureBlob with a callback, it becomes possible to
complete the FutureFile asynchronously.
We always need both values, except in a test, so we don't need to lock
twice and risk scheduling in between.
Also, removed the resetting in Done. This copied a mutex, which isn't
allowed. Static analyzers tend to trip over that.
The channel-based algorithm had grown quite complicated. This is easier
to reason about and likely to be more performant with very many
CompleteBlob calls.
As long as only a small fraction of the data in a repository is
rewritten, the keepBlobs set will be rather small after cleaning it up.
As golang maps do not shrink their memory usage, just copy the contents
over to a new map. However, only copy the map if the cleanup removed at
least half the entries.
The set covers necessary, existing and duplicate blobs. This removes the
duplicate sets used to track whether all necessary blobs also exist.
This reduces the memory usage of prune by about 20-30%.
The RetryBackend tests depend on the mock backend. When the Backend
interface is eventually split from the restic package, this will lead to
a dependency cycle between backend and backend/mock. Thus split the
RetryBackend into a separate package to avoid this problem.
Archiver.Save queries the current time multiple times. This commit
removes one of these calls as they showed up while profiling a backup of
a nearly unchanged dataset containing 3 million files.
The string form was presumably useful before the introduction of
layouts, but right now it just makes call sequences and garbage
collection more expensive (the latter because every string contains
a pointer to be scanned).
if x { return true } return false => return x
fmt.Sprintf("%v", x) => fmt.Sprint(x) or x.String()
The fmt.Sprintf idiom is still used in the SecretString tests, where it
serves security hardening.
ID.UnmarshalJSON accepted non-JSON input with ' as the string delimiter.
Also, the error message for non-hex input was less informative than it
could be and it performed too many checks.
Changed ParseID to keep the error messages consistent.