As the `Fatal` error type only includes a string, it becomes impossible
to inspect the contained error. This is for a example a problem for the
fuse implementation, which must be able to detect context.Canceled
errors.
Co-authored-by: greatroar <61184462+greatroar@users.noreply.github.com>
For hardlinked files, only the first instance of that file increases the
amount of bytes to restore. All later instances only increase the file
count but not the restore size.
restic must be able to refresh lock files in time. However, large
uploads over slow connections can cause the lock refresh to be stuck
behind the large uploads and thus time out.
The test had a 4% chance of not modified the data read from the
repository, in which case the test would fail. Change the data
manipulation to just modified each read operation.
This adds support for caching already rewritten trees, handling of load
errors and disabling the check that the serialization doesn't lead to
data loss.
The more generic RewriteNode callback replaces the SelectByName and
PrintExclude functions. The main part of this change is a preparation to
allow using the TreeRewriter for the `repair snapshots` command.
The builtin mechanism to capture a stacktrace in Go is to send a SIGQUIT
to the running process. However, this mechanism is not avaiable on
Windows. Thus, tweak the SIGINT handler to dump a stacktrace if the
environment variable `RESTIC_DEBUG_STACKTRACE_SIGINT` is set.
The SemaphoreBackend now uniformly enforces the limit of concurrent
backend operations. In addition, it unifies the parameter validation.
The List() methods no longer uses a semaphore. Restic already never runs
multiple list operations in parallel.
By managing the semaphore in a wrapper backend, the sections that hold a
semaphore token grow slightly. However, the main bottleneck is IO, so
this shouldn't make much of a difference.
The key insight that enables the SemaphoreBackend is that all of the
complex semaphore handling in `openReader()` still happens within the
original call to `Load()`. Thus, getting and releasing the semaphore
tokens can be refactored to happen directly in `Load()`. This eliminates
the need for wrapping the reader in `openReader()` to release the token.
The snapshot filtering internally converts relative paths to absolute
ones to ensure that the parent snapshots selection works for backups of
relative paths.
x/text/width.LookupRune has to re-encode its argument as UTF-8,
while LookupString operates on the UTF-8 directly.
The uint casts get rid of a bounds check.
Benchmark results, with b.ResetTimer introduced first:
name old time/op new time/op delta
TruncateASCII-8 69.7ns ± 1% 55.2ns ± 1% -20.90% (p=0.000 n=20+18)
TruncateUnicode-8 350ns ± 1% 171ns ± 1% -51.05% (p=0.000 n=20+19)
Since 0.15 (#4020), inodes are generated as hashes of names, xor'd with
the parent inode. That means that the inode of a/b/b is
h(a/b/b) = h(a) ^ h(b) ^ h(b) = h(a).
I.e., the grandchild has the same inode as the grandparent. GNU find
trips over this because it thinks it has encountered a loop in the
filesystem, and fails to search a/b/b. This happens more generally when
the same name occurs an even number of times.
Fix this by multiplying the parent by a large prime, so the combining
operation is not longer symmetric in its arguments. This is what the FNV
hash does, which we used prior to 0.15. The hash is now
h(a/b/b) = h(b) ^ p*(h(b) ^ p*h(a))
Note that we already ensure that h(x) is never zero.
Collisions can still occur, but they should be much less likely to occur
within a single path.
Fixes#4253.
Tests in cmd_forget_test.go need the same convenience function
that was implemented in snapshot_policy_test.go.
Function parseDuration(...) was moved to testing.go and renamed to
ParseDurationOrPanic(...).
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes#4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
This method had a buffer argument, but that was nil at all call sites.
That's removed, and instead LoadUnpacked now reuses whatever it
allocates inside its retry loop.
With debug logging enabled this method would take a lock and then
format the lock as a string. Since PR #4022 landed the string
formatting method has also taken the lock, so this deadlocks.
Instead just record the lock ID, as is done elsewhere.
The code always assumed that the upgrade happens in place. Thus writing
the upgrade to a separate file fails, when trying to remove the file
stored at that location.
Added missing call to scanFinished=true.
This was causing the percent and eta to never get
printed for backup progress even after the scan was finished.
The retry backend does not return the original error, if its execution
is interrupted by canceling the context. Thus, we have to manually
ensure that the invalid data error gets returned.
Additionally, use the retry backend for some of the repository tests, as
this is the configuration which will be used by restic.
The retry printed the filename twice:
```
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/04804cba82>): invalid data returned
```
now the warning has changed to
```
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: invalid data returned
```
The StdioWrapper is not used at all by the ProgressPrinters. It is
called a bit earlier than previously. However, as the password prompt
directly accessed stdin/stdout this doesn't cause problems.
Mostly changed the ones that repeat the name of a system call, which is
already contained in os.PathError.Op. internal/fs.Reader had to be
changed to actually return such errors.
TestRepository and its variants always returned no-op cleanup functions.
If they ever do need to do cleanup, using testing.T.Cleanup is easier
than passing these functions around.
We now check for space that is not reserved for the root user on the
remote, and the check is no longer in a defer block because it wouldn't
fire. Some change in the surrounding code may have led the deferred
function to capture the wrong err variable.
Fixes#3336.
IDs.Less can be rewritten as
string(list[i][:]) < string(list[j][:])
Note that this does not copy the ID's.
The Uniq method was no longer used.
The String method has been reimplemented without first copying into a
separate slice of a custom type.
The Test method was only used in exactly one place, namely when trying
to create a new repository it was used to check whether a config file
already exists.
Use a combination of Stat() and IsNotExist() instead.
Since #3940 the rclone backend returns the commands exit code if it
fails to start. The list of expected errors was missing the "file
already closed"-error which can occur if the http test request first
learns about the closed pipe to rclone before noticing the canceled
context.
Go internally makes sure that a file descriptor is unusable once it was
closed, thus this cannot have unintended side effects (like accidentally
reading from the wrong file due to a reused file descriptor).
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.
This commit only mechanically replaces the function calls.
Without comma-ok, the runtime inserts the same check with a similar
enough panic message:
interface conversion: interface {} is nil, not *syscall.Stat_t
The new genericized LRU cache no longer needs to have the IDs separately
allocated:
name old time/op new time/op delta
Add-8 494ns ± 2% 388ns ± 2% -21.46% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Add-8 176B ± 0% 152B ± 0% -13.64% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Add-8 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10)
Hard links to the same file now get the same inode within the FUSE
mount. Also, inode generation is faster and, more importantly, no longer
allocates.
Benchmarked on Linux/amd64. Old means the benchmark with
sink = fs.GenerateDynamicInode(1, sub.node.Name)
instead of calling inodeFromNode. Results:
name old time/op new time/op delta
Inode/no_hard_links-8 137ns ± 4% 34ns ± 1% -75.20% (p=0.000 n=10+10)
Inode/hard_link-8 33.6ns ± 1% 9.5ns ± 0% -71.82% (p=0.000 n=9+8)
name old alloc/op new alloc/op delta
Inode/no_hard_links-8 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
Inode/hard_link-8 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
Inode/no_hard_links-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
Inode/hard_link-8 0.00 0.00 ~ (all equal)
In principle, the JSON format of Tree objects is extensible without
requiring a format change. In order to not loose information just play
it safe and reject rewriting trees for which we could loose data.
The lock test creates a lock and checks that it is not stale. However,
it is possible that the lock is refreshed concurrently, which updates
the lock timestamp. Checking the timestamp in `Stale()` without
synchronization results in a data race. Thus add a lock to prevent
concurrent accesses.
The lock test creates a lock and checks that it is not stale. This also
tests whether the corresponding process still exists. However, it is
possible that the lock is refreshed concurrently, which updates the lock
timestamp. Calling `processExists()` with a value receiver, however,
creates an unsynchronized copy of this field. Thus call the method using
a pointer receiver.
In some rare cases files could be created which contain null IDs (all
zero) in their content list. This was caused by a race condition between
growing the `Content` slice and inserting the blob IDs into it. In some
cases the blob ID was written to the old slice, which a short time
afterwards was replaced with a larger copy, that did not yet contain the
blob ID.
Automatically fall back to hiding files if not authorized to permanently
delete files. This allows using restic with an append-only application
key with B2. Thus, an attacker cannot directly delete backups with the
API key used by restic.
To use this feature create an application key without the deleteFiles
capability. It is recommended to restrict the key to just one bucket.
For example using the b2 command line tool:
b2 create-key --bucket <bucketName> <keyName> listBuckets,readFiles,writeFiles,listFiles
Suggested-by: Daniel Gröber <dxld@darkboxed.org>
We previously checked whether the set of snapshots might have changed
based only on their number, which fails when as many snapshots are
forgotten as are added. Check for the SHA-256 of their id's instead.
The status bar got stuck once the first error was reported, the scanner
completed or some file was backed up. Either case sets a flag that the
scanner has started.
This flag is used to hide the progress bar until the flag is set. Due to
an inverted condition, the opposite happened and the status stopped
refreshing once the flag was set.
In addition, the scannerStarted flag was not set when the scanner just
reported progress information.
As the FileSaver is asynchronously waiting for all blobs of a file to be
stored, the number of active files is higher than the number of files
from which restic is reading concurrently. Thus to not confuse users,
only display files in the status from which restic is currently reading.
After reading and chunking all data in a file, the FutureFile still has
to wait until the FutureBlobs are completed. This was done synchronously
which results in blocking the file saver and prevents the next file from
being read.
By replacing the FutureBlob with a callback, it becomes possible to
complete the FutureFile asynchronously.
We always need both values, except in a test, so we don't need to lock
twice and risk scheduling in between.
Also, removed the resetting in Done. This copied a mutex, which isn't
allowed. Static analyzers tend to trip over that.
The channel-based algorithm had grown quite complicated. This is easier
to reason about and likely to be more performant with very many
CompleteBlob calls.
As long as only a small fraction of the data in a repository is
rewritten, the keepBlobs set will be rather small after cleaning it up.
As golang maps do not shrink their memory usage, just copy the contents
over to a new map. However, only copy the map if the cleanup removed at
least half the entries.
The set covers necessary, existing and duplicate blobs. This removes the
duplicate sets used to track whether all necessary blobs also exist.
This reduces the memory usage of prune by about 20-30%.
The RetryBackend tests depend on the mock backend. When the Backend
interface is eventually split from the restic package, this will lead to
a dependency cycle between backend and backend/mock. Thus split the
RetryBackend into a separate package to avoid this problem.
Archiver.Save queries the current time multiple times. This commit
removes one of these calls as they showed up while profiling a backup of
a nearly unchanged dataset containing 3 million files.
The string form was presumably useful before the introduction of
layouts, but right now it just makes call sequences and garbage
collection more expensive (the latter because every string contains
a pointer to be scanned).
if x { return true } return false => return x
fmt.Sprintf("%v", x) => fmt.Sprint(x) or x.String()
The fmt.Sprintf idiom is still used in the SecretString tests, where it
serves security hardening.
ID.UnmarshalJSON accepted non-JSON input with ' as the string delimiter.
Also, the error message for non-hex input was less informative than it
could be and it performed too many checks.
Changed ParseID to keep the error messages consistent.
FindFilteredSnapshots no longer prints errors during snapshot loading on
stderr, but instead passes the error to the callback to allow the caller
to decide on what to do.
In addition, it moves the logic to handle an explicit snapshot list from
the main package to restic.
The helper function uidGidInt used strconv.ParseInt instead of
ParseUint, so it silently ignored some invalid user/group IDs.
Also, improve the error message. "Invalid UID" is more informative than
having "ParseInt" twice (*strconv.NumError displays the function name).
Finally, the user.User struct can be passed by pointer to get reduce
code size.
This package is no longer needed, since we can use the stdlib's
http.NewRequestWithContext.
backend/rclone already did, but it needed a different error check due to
a difference between net/http and ctxhttp.
Also, store the http.Client by value in the REST backend (changed to a
pointer when ctxhttp was introduced) and use errors.WithStack instead
of errors.Wrap where the message was no longer accurate. Errors from
http.NewRequestWithContext will start with "net/http" or "net/url", so
they're easy to identify.
The backup command failed if a directory contains duplicate entries.
Downgrade the severity of this problem from fatal error to a warning.
This allows users to still create a backup.
SaveTree did not use the TreeSaver but rather managed the tree
collection and upload itself. This prevents using the parallelism
offered by the TreeSaver and duplicates all related code. Using the
TreeSaver can provide some speed-ups as all steps within the backup tree
now rely on FutureNodes. This can be especially relevant for backups
with large amounts of explicitly specified files.
The main difference between SaveTree and SaveDir is, that only the
former can save tree blobs in which nodes have a different name than the
actual file on disk. This is the result of resolving name conflicts
between multiple files with the same name. The filename that must be
used within the snapshot is now passed directly to
restic.NodeFromFileInfo. This ensures that a FutureNode already contains
the correct filename.
According to the documentation of exec.Cmd Wait() must not be called
before completing all reads from the pipe returned by StdErrPipe(). Thus
return a context that is canceled once rclone has exited and use that as
a precondition to calling Wait(). This should ensure that all errors
printed to stderr have been copied first.
When rclone fails during the connection setup this currently often
results in a context canceled error. Replace this error with the exit
code from rclone.
When backing up many small files, the unbuffered channels frequently
cause the FileSaver to block when reporting progress information. Thus,
add buffers to these channels to avoid unnecessary scheduling.
As the status information is purely informational, it doesn't matter
that the status reporting shutdown is somewhat racy and could miss a few
final updates.
The only use cases in the code were in errors.IsFatal, backend/b2,
which needs a workaround, and backend.ParseLayout. The last of these
requires all backends to implement error unwrapping in IsNotExist.
All backends except gs already did that.
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.
The checker gets a simpler replacement.
While searching for lock file from concurrently running restic
instances, restic ignored unreadable lock files. These can either be
in fact invalid or just be temporarily unreadable. As it is not really
possible to differentiate between both cases, just err on the side of
caution and consider the repository as already locked.
The code retries searching for other locks up to three times to smooth
out temporarily unreadable lock files.
Restic continued e.g. a backup task even when it failed to renew the
lock or failed to do so in time. For example if a backup client enters
standby during the backup this can allow other operations like `prune`
to run in the meantime (after calling `unlock`). After leaving standby
the backup client will continue its backup and upload indexes which
refer pack files that were removed in the meantime.
This commit introduces a goroutine explicitly monitoring for locks that
are not refreshed in time. To simplify the implementation there's now a
separate goroutine to refresh the lock and monitor for timeouts for each
lock. The monitoring goroutine would now cause the backup to fail as the
client has lost it's lock in the meantime.
The lock refresh goroutines are bound to the context used to lock the
repository initially. The context returned by `lockRepo` is also
cancelled when any of the goroutines exits. This ensures that the
context is cancelled whenever for any reason the lock is no longer
refreshed.
Some backends generate additional files for each existing file, e.g.
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef.sha256
For some commands this leads to an "multiple IDs with prefix" error when
trying to reference a snapshot.
Failing to process data requested from the cache usually indicates a
problem with the returned data. Assume that the cache entry is somehow
damaged and retry downloading it once.
Sparse files contain large regions containing only zero bytes. Checking
that a blob only contains zeros is possible with over 100GB/s for modern
x86 CPUs. Calculating sha256 hashes is only possible with 500MB/s (or
2GB/s using hardware acceleration). Thus we can speed up the hash
calculation for all zero blobs (which always have length
chunker.MinSize) by checking for zero bytes and then using the
precomputed hash.
The all zeros check is only performed for blobs with the minimal chunk
size, and thus should add no overhead most of the time. For chunks which
are not all zero but have the minimal chunks size, the overhead will be
below 2% based on the above performance numbers.
This allows reading sparse sections of files as fast as the kernel can
return data to us. On my system using BTRFS this resulted in about
4GB/s.
The restorer can issue multiple calls to WriteAt in parallel. This can
result in unexpected orderings of the Truncate and WriteAt calls and
sometimes too short restored files.
We can either preallocate storage for a file or sparsify it. This
detects a pack file as sparse if it contains an all zero block or
consists of only one block. As the file sparsification is just an
approximation, hide it behind a `--sparse` parameter.
This writes files by using (*os.File).Truncate, which resolves to the
truncate system call on Unix.
Compared to the naive loop,
for _, b := range p {
if b != 0 {
return false
}
}
the optimized allZero is about 10× faster:
name old time/op new time/op delta
AllZero-8 1.09ms ± 1% 0.09ms ± 1% -92.10% (p=0.000 n=10+10)
name old speed new speed delta
AllZero-8 3.84GB/s ± 1% 48.59GB/s ± 1% +1166.51% (p=0.000 n=10+10)
`restic unlock` now only shows `successfully removed locks` if there were locks to be removed.
In addition, it also reports the number of the removed lock files.
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.
> name old time/op new time/op delta
> MasterIndexEach-16 6.68s ±24% 0.96s ± 2% -85.64% (p=0.008 n=5+5)
This is especially useful if ssh asks for a password or if closing the
initial connection could return an error due to a problematic server
implementation.
bazil/fuse expects us to return context.Canceled to signal that a
syscall was successfully interrupted. Returning a wrapped version of
that error however causes the fuse library to signal an EIO (input/output
error). Thus unwrap context.Canceled errors before returning them.
rclone can exit early for example when the connection to rclone is
relayed for example via ssh: `-o rclone.program='ssh user@example.org
forced-command'`
For backends which are able to atomically replace files, we just can
overwrite the old copy, if it is necessary to retry an upload. This has
the benefit of issuing one operation less and might be beneficial if a
backend storage, due to bugs or similar, could mix up the order of the
upload and delete calls.
When hard deleting the latest file version on B2, this uncovers earlier
versions. If an upload required retries, multiple version might exist
for a file. Thus to reliably delete a file, we have to remove all
versions of it.
Ignored packs were reported as an empty pack by EachByPack. The most
immediate effect of this is that the progress bar for rebuilding the
index reports processing more packs than actually exist.
Previously the buffer was grown incrementally inside `repo.LoadUnpacked`.
But we can do better as we already know how large the index will be.
Allocate a bit more memory to increase the chance that the buffer can be
reused in the future.
Instead of first checking whether a file is in the repository cache and
then opening it, we just can open the file. This saves one stat call. If
the file is in the cache, everything is fine and otherwise the code
follows its normal fallback path.
sort.Sort is not guaranteed to be stable. Go 1.19 has changed the
sorting algorithm which resulted in changes of the sort order. When
comparing snapshots with identical timestamp but different paths and
tags lists, there is not meaningful order among them. So just keep their
order stable.
Cleanly separate the directory presentation and the snapshot directory
structure. SnapshotsDir now translates the dirStruct into a format
usable by the fuse library and contains only minimal special case rules.
All decisions have moved into SnapshotsDirStructure which now creates a
fully preassembled tree data structure.
For large pack sizes we might be only interested in the first and last
blob of a pack file. Thus stream a pack file in multiple parts if the
gaps between requested blobs grow too large.
Also make the errors a bit less verbose by not prepending the operation,
since pkg/xattr already does that. Old errors looked like
Listxattr: xattr.list /myfiles/.zfs/snapshot: invalid argument
pkg/sftp.Client.MkdirAll(d) does a Stat to determine if d exists and is
a directory, then a recursive call to create the parent, so the calls
for data/?? each take three round trips. Doing a Mkdir first should
eliminate two round trips for 255/256 data directories as well as all
but one of the top-level directories.
Also, we can do all of the calls concurrently. This may reintroduce some
of the Stat calls when multiple goroutines try to create the same
parent, but at the default number of connections, that should not be
much of a problem.
FutureBlob now uses a Take() method as a more memory-efficient way to
retrieve the futures result. In addition, futures are now collected
while saving the file. As only a limited number of blobs can be queued
for uploading, for a large file nearly all FutureBlobs already have
their result ready, such that the FutureBlob object just consumes
memory.
There is no real difference between the FutureTree and FutureFile
structs. However, differentiating both increases the size of the
FutureNode struct.
The FutureNode struct is now only 16 bytes large on 64bit platforms.
That way is has a very low overhead if the corresponding file/directory
was not processed yet.
There is a special case for nodes that were reused from the parent
snapshot, as a go channel seems to have 96 bytes overhead which would
result in a memory usage regression.
After the `BlobSaver` job is submitted, the buffer can be released and
reused by another `FileSaver` even before `BlobSaver.Save` returns. That
FileSaver will the change `buf.Data` leading to wrong backup statistics.
Found by `go test -race ./...`:
WARNING: DATA RACE
Write at 0x00c0000784a0 by goroutine 41:
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:176 +0x789
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Previous read at 0x00c0000784a0 by goroutine 29:
github.com/restic/restic/internal/archiver.(*BlobSaver).Save()
/home/michael/Projekte/restic/restic/internal/archiver/blob_saver.go:57 +0x1dd
github.com/restic/restic/internal/archiver.(*BlobSaver).Save-fm()
<autogenerated>:1 +0xac
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:191 +0x855
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
Use only a single not completed pack file to keep the number of open and
active pack files low. The main change here is to defer hashing the pack
file to the upload step. This prevents the pack assembly step to become
a bottleneck as the only task is now to write data to the temporary pack
file.
The tests are cleaned up to no longer reimplement packer manager
functions.
Now with the asynchronous uploaders there's no more benefit from using
more blob savers than we have CPUs. Thus use just one blob saver for
each CPU we are allowed to use.
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
As MergeFinalIndex and index uploads can occur concurrently, it is
necessary for MergeFinalIndex to check whether the IDs for an index were
already set before merging it. Otherwise, we'd loose the ID of an index
which is set _after_ uploading it.
The GlobalOptions struct now embeds a backend.TransportOptions, so it
doesn't need to construct one in open and create. The upload and
download limits are similarly now a struct in internal/limiter that is
embedded in GlobalOptions.
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.
Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
... called backend/sema. I resisted the temptation to call the main
type sema.Phore. Also, semaphores are now passed by value to skip a
level of indirection when using them.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
When given a buf that is big enough for a compressed blob but not its
decompressed contents, the copy at the end of LoadBlob would skip the
last part of the contents.
Fixes#3783.
This isn't doing anything. Channels should get cleaned up by the GC when
the last reference to them disappears, just like all other data
structures. Also inlined BufferPool.Put in Buffer.Release, its only
caller.
fd05037e1a changed the allocation batch
size from 256 to 128 under the assumption that an indexEntry is 60 bytes
on amd64, but it's 64: structs are padded out to a multiple of 8 for
alignment reasons. That means we'd waste no space in malloc even without
the batch allocation, at least on 64-bit machines. While that strategy
cuts the overallocation down dramatically for many small indexes, it also
seems to slow allocation down (Go 1.18, Linux, amd64, -benchtime=2s):
name old time/op new time/op delta
DecodeIndex-8 4.67s ± 5% 4.60s ± 1% ~ (p=0.953 n=10+5)
DecodeIndexParallel-8 4.67s ± 3% 4.60s ± 1% ~ (p=0.953 n=10+5)
IndexHasUnknown-8 37.8ns ± 8% 36.5ns ±14% ~ (p=0.841 n=5+5)
IndexHasKnown-8 38.5ns ±12% 37.7ns ±10% ~ (p=0.968 n=5+5)
IndexAlloc-8 615ms ±18% 607ms ± 1% ~ (p=1.000 n=10+5)
IndexAllocParallel-8 245ms ±11% 285ms ± 6% +16.40% (p=0.001 n=10+5)
MasterIndexAlloc-8 286ms ± 9% 275ms ± 2% ~ (p=1.000 n=10+5)
LoadIndex/v1-8 27.0ms ± 4% 26.8ms ± 1% ~ (p=0.690 n=5+5)
LoadIndex/v2-8 22.4ms ± 1% 22.8ms ± 2% +1.48% (p=0.016 n=5+5)
name old alloc/op new alloc/op delta
IndexAlloc-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.000 n=8+4)
IndexAllocParallel-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.008 n=8+5)
MasterIndexAlloc-8 213MB ± 0% 159MB ± 0% -25.47% (p=0.000 n=10+5)
name old allocs/op new allocs/op delta
IndexAlloc-8 913k ± 0% 2632k ± 0% +188.19% (p=0.008 n=5+5)
IndexAllocParallel-8 913k ± 0% 2632k ± 0% +188.21% (p=0.008 n=5+5)
MasterIndexAlloc-8 318k ± 0% 1172k ± 0% +267.86% (p=0.008 n=5+5)
Instead, this patch sets a batch size of 4, which means no space is
wasted by malloc on 64-bit and very little on 32-bit. It still gets very
close to the savings from not allocating in batches, without requiring
special code for bits.UintSize==64. Benchmark results, again for
Linux/amd64:
name old time/op new time/op delta
DecodeIndex-8 4.67s ± 5% 4.83s ± 9% ~ (p=0.315 n=10+10)
DecodeIndexParallel-8 4.67s ± 3% 4.68s ± 4% ~ (p=0.315 n=10+10)
IndexHasUnknown-8 37.8ns ± 8% 44.5ns ±19% ~ (p=0.095 n=5+5)
IndexHasKnown-8 38.5ns ±12% 36.9ns ± 8% ~ (p=0.690 n=5+5)
IndexAlloc-8 615ms ±18% 628ms ±18% ~ (p=0.218 n=10+10)
IndexAllocParallel-8 245ms ±11% 262ms ± 9% +7.02% (p=0.043 n=10+10)
MasterIndexAlloc-8 286ms ± 9% 287ms ±13% ~ (p=1.000 n=10+10)
LoadIndex/v1-8 27.0ms ± 4% 26.8ms ± 0% ~ (p=1.000 n=5+5)
LoadIndex/v2-8 22.4ms ± 1% 22.5ms ± 0% ~ (p=0.056 n=5+5)
name old alloc/op new alloc/op delta
IndexAlloc-8 446MB ± 0% 446MB ± 0% ~ (p=1.000 n=8+10)
IndexAllocParallel-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.000 n=8+8)
MasterIndexAlloc-8 213MB ± 0% 160MB ± 0% -25.02% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
IndexAlloc-8 913k ± 0% 1333k ± 0% +45.94% (p=0.000 n=8+10)
IndexAllocParallel-8 913k ± 0% 1333k ± 0% +45.94% (p=0.000 n=8+8)
MasterIndexAlloc-8 318k ± 0% 525k ± 0% +64.99% (p=0.000 n=10+10)
The allocation method indexmap.newEntry has also been rewritten in a
form that is a few instructions shorter.
Apparently SMB/CIFS on Linux/macOS returns somewhat random errnos when
trying to sync a windows share which does not support calling fsync for
a directory.
This functionality has gone unused since
4b3dc415ef changed hashing.Reader's only
client to use ioutil.ReadAll on a bufio.Reader wrapping the hashing
Reader.
Reverts bcb852a8d0.
This removes RunWorkers, which had become mere overhead by successive
refactors. It also ensures that each former user of that function
returns any context error that occurs, so failure to complete an
operation is always reported as an error.
Apparently it can take a moment between closing a tempfile marked as
DELETE_ON_CLOSE and it actually being deleted. During that time the file
is inaccessible. Thus just skip deleting the temp file on windows.
Tree packs are cached locally at clients and thus benefit a lot from
being compressed. Ensure this be having prune always repack pack files
containing uncompressed trees.
A compressed index is only about one third the size of an uncompressed
one. Thus increase the number of entries in an index to avoid cluttering
the repository with small indexes.
The config file is not compressed as it should remain readable by older
restic versions such that these can return a proper error.
As the old format for unpacked data does not include a version header,
make use of a trick: The old data is always encoded as JSON. Thus it can
only start with '{' or '['. For any other value the first byte indicates
a versioned format. The version is set to 2 for now. Then the zstd
compressed data follows.
As repack streams packs these occupy one backend connection. Uploading a
new pack also requires a backend connection. To prevent a deadlock
during repack when reaching the backend connections limit, simply limit
the repackWorker count to always leave one connection for uploading.
* Write new file payload to a temp file before touching the original
binary. Minimizes the possibility of failing mid-write and corrupting
the binary.
* On Windows, move the original binary out to a temp file rather than
removing it as the running binary is locked. Fixes issue #2248.
When resolving snapshotIDs in FindFilteredSnapshots either
FindLatestSnapshot or FindSnapshot is called. Both operations issue a
list operation to the backend. When for example passing a long list of
snapshot ids to `forget` this could lead to a large number of list
operations.
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
Fixes#3687. Uses the cast suggested by @MichaelEischer, except that the
contant isn't cast along, because it's untyped and will be converted by
the compiler as necessary.
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
This is quite similar to gitignore. If a pattern is suffixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with `--exclude-file` to cancel the exclusion of some files.
Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep `*.c` in some directory should not use:
~/work
!~/work/*.c
But:
~/work/*
!~/work/*.c
I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:
$HOME/**/*
!$HOME/Documents
!$HOME/code
!$HOME/.emacs.d
!$HOME/games
# [...]
node_modules
*~
*.o
*.lo
*.pyc
# [...]
$HOME/code/linux/*
!$HOME/code/linux/.git
# [...]
There are some limitations for this change:
- Patterns are not mixed accross methods: patterns from file are
handled first and if a file is excluded with this method, it's not
possible to reinclude it with `--exclude !something`.
- Patterns starting with `!` are now interpreted as a negative
pattern. I don't think anyone was relying on that.
- The whole list of patterns is walked for each match. We may
optimize later by exiting early if we know no pattern is starting
with `!`.
Fix#233
Load tree blobs with more than 50MB only from a single goroutine. Very
large tree blobs with for example 400 MB size can otherwise require
roughly 1GB * streamTreeParallelism memory.
Create a temporary file with a sufficiently random name to essentially
avoid any chance of conflicts. Once the upload has finished remove the
temporary suffix. Interrupted upload thus will be ignored by restic.
This ensures that restic won't create lots of new lock files without
deleting them later on.
In some cases a Delete operation on a backend can return a "File does
not exist" error even though the Delete operation succeeded. This can
for example be caused by request retries. This caused restic to forget
about the new lock file and continue trying to remove the old (already
deleted) lock file.
The function supports efficiently loading a specified list of blobs from
a single pack in a streaming fashion. That is there's no need for
temporary files independent of the pack size.
The archiver uses FS.OpenFile, where FS is an instance of the FS
interface. This is different from fs.OpenFile, which uses the OpenFile
method provided by the fs package.
Citing Kerrisk, The Linux Programming Interface:
The O_NOATIME flag is intended for use by indexing and backup
programs. Its use can significantly reduce the amount of disk
activity, because repeated disk seeks back and forth across the
disk are not required to read the contents of a file and to update
the last access time in the file’s i-node[.]
restic used to do this, but the functionality was removed along with the
fadvise call in #670.
Currently, `restic backup` (if a `--parent` is not provided)
will choose the most recent matching snapshot as the parent snapshot.
This makes sense in the usual case,
where we tag the snapshot-being-created with the current time.
However, this doesn't make sense if the user has passed `--time`
and is currently creating a snapshot older than the latest snapshot.
Instead, choose the most recent snapshot
which is not newer than the snapshot-being-created's timestamp,
to avoid any time travel.
Impetus for this change:
I'm using restic for the first time!
I have a number of existing BTRFS snapshots
I am backing up via restic to serve as my initial set of backups.
I initially `restic backup`'d the most recent snapshot to test,
then started backing up each of the other snapshots.
I noticed in `restic cat snapshot <id>` output
that all the remaining snapshots have the most recent as the parent.
The missing eof with http2 when a response included a content-length
header but no data, has been fixed in golang 1.17.3/1.16.10. Therefore
just drop the canary test and schedule it for removal once go 1.18 is
required as minimum version by restic.
Because there is no guarantee that a cleanup of these directories will occur
after the "restic check", we extend the behavior to detect and manage these
specific cache directories and allow their cleanup too.
Package internal/dump has been reworked so its API consists of a single
type Dumper that handles tar and zip formats. Tree loading and node
writing happen concurrently.
When deleting a file, B2 sometimes returns a "500 Service Unavailable"
error but nevertheless correctly deletes the file. Due to retries in
the B2 library blazer, we sometimes also see a "400 File not present"
error. The retries of restic for the delete request then fail with
"404 File with such name does not exist.".
As we have to rely on request retries in a distributed system to handle
temporary errors, also consider a delete request to be successful if the
file is reported as not existing. This should be safe as B2 claims to
provide a strongly consistent bucket listing and thus a missing file
shouldn't mysteriously show up again later on.
restic dump uses bloblru.Cache to keep buffers alive, but also reuses
evicted buffers. That means large buffers may be used to store small
blobs, causing the cache to think it's using less memory than it
actually does.
This can be caused when the test has uploaded four blobs, then queues
two blobs for upload which are delayed. Then a seventh file can be
opened which lead to a test failure.
The rest config normally uses prepareURL to sanitize URLs and ensures
that the URL ends with a slash. However, the test used an URL without a
trailing slash, which after the rest server changes causes test
failures.
For files below 256MB this uses the md5 hash calculated while assembling
the pack file. For larger files the hash for each 100MB part is
calculated on the fly. That hash is also reused as temporary filename.
As restic only uploads encrypted data which includes among others a
random initialization vector, the file hash shouldn't be susceptible to
md5 collision attacks (even though the algorithm is broken).
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
This can be used to check how large a backup is or validate exclusions.
It does not actually write any data to the underlying backend. This is
implemented as a simple overlay backend that accepts writes without
forwarding them, passes through reads, and generally does the minimal
necessary to pretend that progress is actually happening.
Fixes#1542
Example usage:
$ restic -vv --dry-run . | grep add
new /changelog/unreleased/issue-1542, saved in 0.000s (350 B added)
modified /cmd/restic/cmd_backup.go, saved in 0.000s (16.543 KiB added)
modified /cmd/restic/global.go, saved in 0.000s (0 B added)
new /internal/backend/dry/dry_backend_test.go, saved in 0.000s (3.866 KiB added)
new /internal/backend/dry/dry_backend.go, saved in 0.000s (3.744 KiB added)
modified /internal/backend/test/tests.go, saved in 0.000s (0 B added)
modified /internal/repository/repository.go, saved in 0.000s (20.707 KiB added)
modified /internal/ui/backup.go, saved in 0.000s (9.110 KiB added)
modified /internal/ui/jsonstatus/status.go, saved in 0.001s (11.055 KiB added)
modified /restic, saved in 0.131s (25.542 MiB added)
Would add to the repo: 25.892 MiB
Add comment that the check is based on the stdlib HTTP2 client. Refactor
the checks into a function. Return an error if the value in the
Content-Length header cannot be parsed.
Ensure that only snapshots made in the past are taken into account when running restic forget with the within switches (--keep-within, --keep-within- hourly, and friends)
Allow keeping hourly/daily/weekly/monthly/yearly snapshots for a given time period.
This adds the following flags/parameters to restic forget:
--keep-within-hourly duration
--keep-within-daily duration
--keep-within-weekly duration
--keep-within-monthly duration
--keep-within-yearly duration
Includes following changes:
- Add tests for --keep-within-hourly (and friends)
- Add documentation for --keep-within-hourly (and friends)
- Add changelog for --keep-within-hourly (and friends)
The first test function ensures that the workaround works as expected.
And the second test function is intended to fail as soon as the issue
has been fixed in golang to allow us to eventually remove the
workaround.
The golang http client does not return an error when a HTTP2 reply
includes a non-zero content length but does not return any data at all.
This scenario can occur e.g. when using rclone when a file stored in a
backend seems to be accessible but then fails to download.
Failed pack/blob downloads should be retried. For blobs that fail
decryption assume that the pack file is really damaged and try to
restore the remaining blobs.
* Stop prepending the operation name: it's already part of os.PathError,
leading to repetitive errors like "Chmod: chmod /foo/bar: operation not
permitted".
* Use errors.Is to check for specific errors.
Since the fileInfos are returned in a []interface, they're already
allocated on the heap. Making them pointers explicitly means the
compiler doesn't need to generate fileInfo and *fileInfo versions of the
methods on this type. The binary becomes about 7KiB smaller on
Linux/amd64.
mintty on windows always uses pipes to connect stdout between processes
and for the terminal output. The previous implementation always assumed
that stdout connected to a pipe means that stdout is displayed on a
mintty terminal. However, this detection breaks when using pipes to
connect processes and for powershell which uses pipes when redirecting
to a file.
Now the pipe filename is queried and matched against the pattern used by
msys / cygwin when connected to the terminal. In all other cases assume
that a pipe is just a regular pipe.
Previously the progress bar / status update interval used
stdoutIsTerminal to determine whether it is possible to update the
progress bar or not. However, its implementation differed from the
detection within the backup command which included additional checks to
detect the presence of mintty on Windows. mintty behaves like a terminal
but uses pipes for communication.
This adds stdoutCanUpdateStatus() which calls the same terminal detection
code used by backup. This ensures that all commands consistently switch
between interactive and non-interactive terminal mode.
stdoutIsTerminal() now also returns true whenever stdoutCanUpdateStatus()
does so. This is required to properly handle the special case of mintty.
The error returned when finishing the upload of an object was dropped.
This could cause silent upload failures and thus data loss in certain
cases. When a MD5 hash for the uploaded blob is specified, a wrong
hash/damaged upload would return its error via the Close() whose error
was dropped.
The azureAdapter was used directly without a pointer, but the Len()
method was only defined with a pointer receiver (which means Len() is
not present on a azureAdapter{}, only on a pointer to it).
Bugs in the error handling while uploading a file to the backend could
cause incomplete files, e.g. https://github.com/golang/go/issues/42400
which could affect the local backend.
Proactively add sanity checks which will treat an upload as failed if
the reported upload size does not match the actual file size.
Before, the scanner would could files twice if they were included in the
list of backup targets twice, e.g. `restic backup foo foo/bar` would
could the file `foo/bar` twice.
This commit uses the tree structure from the archiver to run the
scanner, so both parts see the same files.
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
When the tomb is created with a canceled context, then the workers
started via `t.Go` exist nearly immediately. Once for the first time all
started goroutines have been stopped, it is not allowed to issue further
calls to `t.Go`. This is a problem when the started goroutines exit
immediately, as for example the first goroutine might already have
stopped before starting the second one, which is not allowed as once the
first goroutines has stopped no goroutines were running.
To fix this race condition the startup and main task of the archiver now
also run within a `t.Go` function. This also allows unifying the error
handling as it is no longer necessary to distinguish between errors
returned by the workers or the saveTree processing. The tomb now just
returns the first error encountered, which should also be the most
descriptive one.
Depending on the used backend, operations started with a canceled
context may fail or not. For example the local backend still works in
large parts when called with a canceled context. Backends transfering
data via http don't work. It is also not possible to retry failed
operations in that state as the RetryBackend will abort with a 'context
canceled' error.
Ensure uniform behavior of all backends by checking for a canceled
context by checking for a canceled context as a first step in the
RetryBackend. This ensures uniform behavior across all backends, as
backends are always wrapped in a RetryBackend.
If the context was canceled then saveTree might receive a treeID or not
depending on the timing. This could cause saveTree to incorrectly return
a nil treeID as valid. Fix this always returning an error when the
context was canceled in the meantime.
A canceled background context lets the blob/tree/fileSavers exit
without reporting an error. The error handling previously replaced
a 'context canceled' error received by the main backup method with
the error reported by the savers. However, in case of a canceled
background context that error is nil, causing restic to loose the
error and save a snapshot with a nil tree.
The counter value needs to be aligned to 64 bit in memory for the
atomic functions to work on some platform (such as 32 bit ARM).
The atomic package says in its documentation:
> These functions require great care to be used correctly. Except for
> special, low-level applications, synchronization is better done with
> channels or the facilities of the sync package.
This commit replaces the atomic functions with a simple sync.Mutex, so
we don't have to care about alignment.
This adds support for the following environment variables, which were
previously missing:
OS_USER_ID User ID for keystone v3 authentication
OS_USER_DOMAIN_ID User domain ID for keystone v3 authentication
OS_PROJECT_DOMAIN_ID Project domain ID for keystone v3 authentication
OS_TRUST_ID Trust ID for keystone v3 authentication
The canUpdateStatus check was simplified in #2608, but it accidentally flipped
the condition. The correct check is as follows: If the output is a pipe then
restic probably runs in mintty/cygwin. In that case it's possible to
update the output status. In all other cases it isn't.
This commit inverts to condition again to offer the previous and correct
behavior.
On shutdown the backup commands waits for the terminal output goroutine
to stop. However while running in the background the goroutine ignored
the canceled context.