Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
When the tomb is created with a canceled context, then the workers
started via `t.Go` exist nearly immediately. Once for the first time all
started goroutines have been stopped, it is not allowed to issue further
calls to `t.Go`. This is a problem when the started goroutines exit
immediately, as for example the first goroutine might already have
stopped before starting the second one, which is not allowed as once the
first goroutines has stopped no goroutines were running.
To fix this race condition the startup and main task of the archiver now
also run within a `t.Go` function. This also allows unifying the error
handling as it is no longer necessary to distinguish between errors
returned by the workers or the saveTree processing. The tomb now just
returns the first error encountered, which should also be the most
descriptive one.
If the context was canceled then saveTree might receive a treeID or not
depending on the timing. This could cause saveTree to incorrectly return
a nil treeID as valid. Fix this always returning an error when the
context was canceled in the meantime.
A canceled background context lets the blob/tree/fileSavers exit
without reporting an error. The error handling previously replaced
a 'context canceled' error received by the main backup method with
the error reported by the savers. However, in case of a canceled
background context that error is nil, causing restic to loose the
error and save a snapshot with a nil tree.
The archiver first called the Select function for a path before checking
whether the Lstat on that path actually worked. The RejectFuncs in
exclude.go worked around this by checking whether they received a nil
os.FileInfo. Checking first is more obvious and requires less code.
Previously the directory stats were reported immediately after calling
`SaveDir`. However, as the latter method saves the tree asynchronously
the stats were still initialized to their nil value. The stats are now
reported via a callback similar to the one used for the fileSaver.
- The SaveBlob method now checks for duplicates.
- Moves handling of pending blobs to MasterIndex.
-> also cleans up pending index entries when they are saved in the index
-> when using SaveBlob no need to care about index any longer
- Always check for full index and save it when storing packs.
-> removes the need of an index uploader
-> also removes the verbose "uploaded intermediate index" messages
- The Flush method now also saves the index
- Fix race condition when checking and saving full/non-finalized indexes
internal/archiver.readdir and internal/fs.ReadDir were unused.
internal/fs.ReadDirNames and internal/archiver.readdirnames were doing
nearly the same thing, except one sorted its output and opened with
fs.O_NOFOLLOW. Both were only used in internal/archiver.
Adds a SelectByName method to the archive and scanner which only require
the filename as input, and can thus be run before calling lstat on the
file. Can speed up scanning significantly if a lot of filename excludes
are used.
This commit changes how the worker goroutines for saving e.g. blobs
interact. Before, it was possible to get stuck sending an instruction to
archive a file or dir when no worker goroutines were available any more.
This commit introduces a `done` channel for each of the worker pools,
which is set to the channel returned by `tomb.Dying()`, so it is closed
when the first worker returned an error.
This commit changes the archiver so that low-level errors saving data to
the repo are returned to the caller (instead of being handled by the
error callback function). This correctly bubbles up errors like a full
temp file system and makes restic abort early and makes all other worker
goroutines exit.
The previous code tried to be as efficient as possible and only do a
single open() on an item to save, and then fstat() on the fd to find out
what the item is (file, dir, other). For normal files, it would then
start reading the data without opening the file again, so it could not
be exchanged for e.g. a symlink.
This behavior starts the watchdog on my machine when /dev is saved
with restic, and after a few seconds, the machine reboots.
This commit reverts the behavior to the strategy the old archiver code
used: run lstat(), then decide what to do. For normal files, open the
file and then run fstat() on the fd to verify it's still a normal file,
then start reading the data.
The downside is that for normal files we now do two stat() calls
(lstat+fstat) instead of only one. On the upside, this does not start
the watchdog. :)
A user discovered[1] that when the backup finishes during the upload of
an intermediate index, the upload is cancelled and the index never fully
saved, but the snapshot is saved and the backup finalizes without an
error. This lead to a situation where a snapshot references data that is
contained in the repo, but not referenced in any index, leading to
strange error messages.
This commit uses a dedicated context to signal the intermediate index
uploading routine to terminate after the last index has been uploaded.
This way, an upload running when the backup finishes is completed before
the routine terminates and the snapshot is saved.
[1] https://forum.restic.net/t/error-loading-tree-check-prune-and-forget-gives-error-b2-backend/406
Index.Has() is a faster then Index.Lookup() for checking if a blob exists
in the index. As the returned data is never used, this avoids a ton
of allocations.
When looking up a blob in the master index, with several
indexes present in the master index, a significant amount of time
is spent generating errors for each failed lookup. However, these
errors are often used to check if a blob is present, but the contents
are not inspected making the overhead of the error not useful.
Instead, change Index.Lookup (and Index.LookupSize) to instead return
a boolean denoting if the blob was found instead of an error. Also change
all the calls to these functions to handle the new function signature.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 820 897 +9.39%
BenchmarkMasterIndexLookupMultipleIndex-6 12821 2001 -84.39%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 5378 492 -90.85%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 17026 1649 -90.31%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 9 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 59 19 -67.80%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 22 6 -72.73%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 72 16 -77.78%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 160 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 3200 240 -92.50%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 1232 48 -96.10%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 4272 128 -97.00%
At the moment when two items to be saved have the same directory name,
restic only saves the first one to the repo. Let's say we have a
structure like this:
dir1
└── subdir
└── file
dir2
└── subdir
└── file
When restic is run on `dir1/subdir` and `dir2/subdir`, it will only save
the first `subdir`:
$ restic backup dir1/subdir dir2/subdir
[...]
$ restic ls -l latest
drwxr-xr-x 1000 100 0 2017-08-27 20:56:39 /subdir
-rw-r--r-- 1000 100 17 2017-08-27 20:56:39 /subdir/file
That's obviously a bad thing, caused by an early decision to strip the
full path to the files/dirs to save and only leave the last directory.
This commit partly resolves this by handling colliding names and
resolving the conflicts. Restic will now append a counter to the file
(`-123`) until the conflict is resolved. So in the example above, we'll
end up with the following structure:
$ restic ls -l latest
drwxr-xr-x 1000 100 0 2017-08-27 20:56:39 /subdir
-rw-r--r-- 1000 100 17 2017-08-27 20:56:39 /subdir/file
drwxr-xr-x 1000 100 0 2017-08-27 20:56:46 /subdir-1
-rw-r--r-- 1000 100 17 2017-08-27 20:56:46 /subdir-1/file
This partly addresses #549 and closes#1179.
At first I thought that the obvious correction would be to archive the
full path. But it turns out that collisions may still occur: Suppose you
have a file named `foo` in the current directory, and the parent directory
also contains a file `foo`. Archiving these with restic also causes a
collision, since restic strips the `../` from the first file:
$ restic backup ../foo foo
This also happens with `tar`, which does not handle the collision and
will happily archive two files called `foo`.
So, the best way forward is to handle name collisions and archive the
whole path. The latter will be tackled in a separate PR.