Some tests have to explicitly create pack files with blobs that don't
match their ID. For those blobs the builtin verification of the
repository must be disabled.
The method is not available on the restic.Repository interface that is
used for testing. Drop the call as a small amount of additional index
writes is not a problem.
Currently, the cmd/restic package contains a significant amount of code
that modifies repository internals. This code should in the mid-term
move into the repository package.
`writeStatus` also cleans no longer used status lines.
The old code actually cleaned one line too much. However, as that line
was never used it makes no difference.
If a lock could not be loaded, then restic would check all lock files
again. These repeated checks are not useful as the status of a lock file
cannot change unless its ID changes too. Thus, skip already check lock
files on retries.
LoadBlobsFromPack is now part of the repository struct. This ensures
that users of that method don't have to deal will internals of the
repository implementation.
The filerestorer tests now also contain far fewer pack file
implementation details.
To only stream the content of a pack file once, check used StreamPack
with a custom pack load function. This combination was always brittle
and complicates using StreamPack everywhere else. Now that StreamPack
internally uses PackBlobIterator use that primitive instead, which is a
much better fit for what the check command requires.
If client.Do returns an error, then there's no body that has to be
closed. For requests for which we are not interested in the response
body, immediately drain and close the body to make sure it isn't
forgotten later on.
This change in particular adds the missing `Close()` call for the
`List()` command.
It was only used in two places:
- stats: apparently as a minor performance optimization, which is
unlikely to be important
- find: filtered directories would be ignored. However, this
optimization missed that it is possible that two directories have the
exact same content. Such directories would be incorrectly ignored too.
Example:
```
mkdir test test/a test/b
restic backup test
restic find latest test/b
-> incorrectly does not return anything
```
Thus, remove the functionality as it's apparently too complex to use
correctly.
rclone returns a "not found" error if an internal error occurs while
listing a folder. Ignoring this error lets restic erroneously think that
there are no files, which can cause `prune` to wipe the whole
repository.
Writing these blobs to their files can take a long time and consequently
cause the backend connection to time out. Avoid that by retrieving these
blobs separately.
If an error occurred while streaming a pack file, this could result in
passing some of the blobs multiple times to the callback function. This
significantly complicates using StreamPack correctly and is unnecessary.
Retries do not change the content of a blob and thus only deliver the
same result over and over again.
Since Go 1.21, most reparse points are considered as irregular files.
Depending on the underlying driver these can exhibit nearly arbitrary
behavior. When encountering such a file, restic returned an
indecipherable error message: `error: invalid node type ""`.
Add the filepath to the error message and state that the file type is
not supported.
The behavior of the new option should reflect the behavior of normal backups: when the command exit code is zero and there is no output in the stdout, emit a warning but create the snapshot. This commit fixes the integration tests and the ReadCloserCommand struct.
Return with an error containing the stderr of the given command in case it fails. No new snapshot will be created and future prune operations on the repository will remove the unreferenced data.
Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
In order to determine whether to save a snapshot, we need to capture the exit code returned by a command. In order to provide a nice error message, we supply stderr as well.
Signed-off-by: Sebastian Hoß <seb@xn--ho-hia.de>
This removes code that is only used within a backend implementation from
the backend package. The latter now only contains code that also has
external users.
For now, the guide is only shown if the blob content does not match its
hash. The main intended usage is to handle data corruption errors when
using maximum compression in restic 0.16.0
Allow setting custom arguments for the `sftp` backend, by using the
`sftp.args` option. This is similar to the approach already implemented
in the `rclone` backend, to support new arguments without requiring
future code changes for each different SSH argument.
Closes#4241
Store oversized blobs in separate pack files as the blobs is large
enough to warrant its own pack file. This simplifies the garbage
collection of such blobs and keeps the cache smaller, as oversize (tree)
blobs only have to be downloaded if they are actually used.
The old version was taken from an MPL-licensed library. This is a
cleanroom implementation. The code is shorter and it's now explicit that
only Linux ACLs are supported.
The test uses `WithTimeout` to create a context that cancels the List
operation after a given delay. Several backends internally use a derived
child context created using WithCancel.
The cancellation of a context first closes the done channel of the
context (here: the `WithTimeout` context) and _afterwards_ propagates
the cancellation to child contexts (here: the `WithCancel` context).
Therefor if the List implementation uses a child context, then it may
take a moment until that context is also cancelled. Thus give the
context cancellation a moment to propagate.
A stale lock may be refreshed if it continues to exist until after a
replacement lock has been created. This ensures that a repository was
not unlocked in the meantime.
When transferring a repository from S3 to, for example, a local disk
then all empty folders will be missing.
When saving files, the missing intermediate folders are created
automatically. Therefore, missing directories can be ignored by the
`List()` operation.
Linux allows the use of non-`user.` extended attributes on symlinks. One
of the main users of this functionality is SELinux's `security.selinux`
xattr for storing a path's label. By storing symlink xattrs, restic is
now suitable for backing up the root filesystem on Linux distributions
that use SELinux.
This commit adds support for symlink xattrs when backing up data,
restoring data, and mounting snapshots via a fuse mount. All calls to
the xattr library have been updated to the use `L` variants of the
various functions, which always operate on the path given, without
following symlinks.
Fixes: #4375
Signed-off-by: Andrew Gunnerson <accounts+github@chiller3.com>
Conceptually the backend configuration should be validated when creating
or opening the backend, but not when filling in information from
environment variables into the configuration.
This unified construction removes most backend-specific code from
global.go. The backend registry will also enable integration tests to
use custom backends if necessary.
The ETA restic displays was based on a rate computed across the entire
backup operation. Often restic can progress at uneven rates. In the worst
case, restic progresses over most of the backup at a very high rate and
then finds new data to back up. The displayed ETA is then unrealistic and
never adapts.
Restic now estimates the transfer rate based on a sliding window, with the
goal of adapting to observed changes in rate. To avoid wild changes in the
estimate, several heuristics are used to keep the sliding window wide
enough to be relatively stable.
In order to change the backend initialization in `global.go` to be able
to generically call cfg.ApplyEnvironment() for supported backends, the
`interface{}` returned by `ParseConfig` must contain a pointer to the
configuration.
An alternative would be to use reflection to convert the type from
`interface{}(Config)` to `interface{}(*Config)` (from value to pointer
type). However, this would just complicate the type mess further.
Iterating through the indexmap according to the bucket order has the
problem that all indexEntries are accessed in random order which is
rather cache inefficient.
As we already keep a list of all allocated blocks, just iterate through
it. This allows iterating through a batch of indexEntries without random
memory accesses. In addition, the packID will likely remain similar
across multiple blobs as all blobs of a pack file are added as a single
batch.
This data structure reduces the wasted memory to O(sqrt(n)). The
top-layer of the hashed array tree (HAT) also has a size of O(sqrt(n)),
which makes it cache efficient. The top-layer should be small enough to
easily fit into the CPU cache and thus only adds little overhead
compared to directly accessing an index entry via a pointer.
The indexEntry objects are now allocated in a separate array. References
to an indexEntry are now stored as array indices. This has the benefit
of allowing the garbage collector to ignore the indexEntry objects as
these do not contain pointers and are part of a single large allocation.
New and its helpers used to create the cache directories several times
over. They now only do so once. The added test ensures that the cache is
produced in a consistent state when parts are deleted.