Depending on the used backend, operations started with a canceled
context may fail or not. For example the local backend still works in
large parts when called with a canceled context. Backends transfering
data via http don't work. It is also not possible to retry failed
operations in that state as the RetryBackend will abort with a 'context
canceled' error.
Ensure uniform behavior of all backends by checking for a canceled
context by checking for a canceled context as a first step in the
RetryBackend. This ensures uniform behavior across all backends, as
backends are always wrapped in a RetryBackend.
If the context was canceled then saveTree might receive a treeID or not
depending on the timing. This could cause saveTree to incorrectly return
a nil treeID as valid. Fix this always returning an error when the
context was canceled in the meantime.
A canceled background context lets the blob/tree/fileSavers exit
without reporting an error. The error handling previously replaced
a 'context canceled' error received by the main backup method with
the error reported by the savers. However, in case of a canceled
background context that error is nil, causing restic to loose the
error and save a snapshot with a nil tree.
The counter value needs to be aligned to 64 bit in memory for the
atomic functions to work on some platform (such as 32 bit ARM).
The atomic package says in its documentation:
> These functions require great care to be used correctly. Except for
> special, low-level applications, synchronization is better done with
> channels or the facilities of the sync package.
This commit replaces the atomic functions with a simple sync.Mutex, so
we don't have to care about alignment.
This adds support for the following environment variables, which were
previously missing:
OS_USER_ID User ID for keystone v3 authentication
OS_USER_DOMAIN_ID User domain ID for keystone v3 authentication
OS_PROJECT_DOMAIN_ID Project domain ID for keystone v3 authentication
OS_TRUST_ID Trust ID for keystone v3 authentication
The canUpdateStatus check was simplified in #2608, but it accidentally flipped
the condition. The correct check is as follows: If the output is a pipe then
restic probably runs in mintty/cygwin. In that case it's possible to
update the output status. In all other cases it isn't.
This commit inverts to condition again to offer the previous and correct
behavior.
On shutdown the backup commands waits for the terminal output goroutine
to stop. However while running in the background the goroutine ignored
the canceled context.
In #2584 this was changed to use the uid/gid of the root node. This
would be okay for the top-level directory of a snapshot, however, this
change also applied to normal directories within a snapshot. This
change reverts the problematic part and adds a test that directory
attributes are represented correctly.
This code is more strict in what it expects to find in the backend:
depending on the layout, either a directory full of files or a directory
full of such directories.
a gs service account may only have object permissions on an existing
bucket but no bucket create/get permissions.
these service accounts currently are blocked from initialization a
restic repository because restic can not determine if the bucket exists.
this PR updates the logic to assume the bucket exists when the bucket
attribute request results in a permissions denied error.
this way, restic can still initialize a repository if the service
account does have object permissions
fixes: https://github.com/restic/restic/issues/3100
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
By construction these two errors always show up in pairs: 'size could
not be found' is printed when the blob is not found in the repository
index. That blob is also part of the `blobs` array. Later on, check
iterates over that array and checks whether the blob is marked as
existing. Which cannot be the case as that mark is generated by
iterating over the repository index.
The merged warning no longer reports the blob index within a file. That
information could also be derived by printing the affected tree using
`cat` and searching for the blob.
Due to the return if !isFile, the IsDir branch in List was never taken
and subdirectories were traversed recursively.
Also replaced isFile by an IsRegular check, which has been equivalent
since Go 1.12 (golang/go@a2a3dd00c9).
The restic security model includes full trust of the local machine, so
this should not fix any actual security problems, but it's better to be
safe than sorry.
Fixes#2192.
The file permissions included a go specific directory bit which
accidentially forced the usage of the GNU header format. This leads
to problems with 7zip on Windows or when extended attributes are
used.
The VSS support works for 32 and 64-bit windows, this includes a check that
the restic version matches the OS architecture as required by VSS. The backup
operation will fail the user has not sufficient permissions to use VSS.
Snapshotting volumes also covers mountpoints but skips UNC paths.
The io.Reader interface does not support contexts, such that it is
necessary to embed the context into the backendReaderAt struct. This has
the problem that a reader might suddenly stop working when it's
contained context is canceled. However, this is now problem here as the
reader instances never escape the calling function.
The list operation used by RemoveStaleLocks or RemoveAllLocks will
already be canceled by the passed in context. Therefore we can also just
cancel the remove operation as the unlock command won't process all lock
files anyways.
This is no change in behavior as a canceled context did later on cause
the config file creation to fail. Therefore this change just lets the
repository initialization fail a bit earlier.
A pattern part containing "/" is used to mark a path or a pattern as
absolute. However, on Windows the path separator is "\" such that glob
patterns like "?" could match the marker. The code now explicitly skips
the marker when the pattern does not represent an absolute path.
We now use v4 of the module. `backoff.WithMaxRetries` no longer repeats
an operation endlessly when a retry count of 0 is specified. This
required a few fixes for the tests.
The file is already created with the proper permissions, thus the chmod
call is not critical. However, some file systems have don't allow
modifications of the file permissions. Similarly the chmod call in the Remove
operation should not prevent it from working.
Many instances of errors.Wrap in this package would produce messages
like "Open: open <filename>: no such file or directory"; those now omit
the first "Open:" (or "Stat:", or "MkdirAll"). The function readVersion
now appends its own name to the error message, rather than the function
that failed, to make it easier to spot. Other function names (e.g.,
Load) are already added further up in the call chain.
The archiver first called the Select function for a path before checking
whether the Lstat on that path actually worked. The RejectFuncs in
exclude.go worked around this by checking whether they received a nil
os.FileInfo. Checking first is more obvious and requires less code.
When the backup is interrupted for some reason while the scanner is
still active this could lead to a deadlock. Interruptions are triggered
by canceling the context object used by both the backup progress UI and
the scanner. It is possible that a context is canceled between the
respective check in the scanner and it calling the `ReportTotal` method
of the UI. The latter method sends a message to the UI goroutine.
However, a canceled context will also stop that goroutine, which can
cause the channel send operation to block indefinitely.
This is resolved by adding a `closed` channel which is closed once the
UI goroutine is stopped and serves as an escape hatch for reported UI
updates.
This change covers not just the ReportTotal method but all potentially
affected methods of the progress UI implementation.
Makes the following corrections to the "Applying Policy:" output:
- keep the last 1 snapshots snapshots => keep 1 latest snapshots
- keep the last 1 snapshots, 3 hourly, 5 yearly snapshots => keep 1 latest, 3 hourly, 5 yearly snapshots
This allows creating multiple repositories with identical chunker
parameters which is required for working deduplication when copying
snapshots between different repositories.
Previously the directory stats were reported immediately after calling
`SaveDir`. However, as the latter method saves the tree asynchronously
the stats were still initialized to their nil value. The stats are now
reported via a callback similar to the one used for the fileSaver.
The slicing operator `slice[low:high]` default to 0 for the lower bound and
len(slice) for the upper bound when either or both are not specified.
Fix the code to use `cap(slice)` to check for the slice capacity.
If a blob in a pack file can be decrypted successfully but contains data
that results in a different hash than stated in the header pack, then
abort repacking. As both the pack header and the blob are
cryptographically verified this either means than a malicious entity
tampered with the backup or indicates hardware problems on the client.
prune should fail with an error in both cases.
The seen BlobSet always contained a subset of the entries in blobs.
Thus use blobs instead and avoid the memory overhead of the second set.
Suggested-by: Alexander Weiss <alex@weissfam.de>
As the connection to the rclone child process is now closed after an
unexpected subprocess exit, later requests will cause the http2
transport to try to reestablish a new connection. As previously this never
should have happened, the connection called panic in that case. This
panic is now replaced with a simple error message, as it no longer
indicates an internal problem.
- Add Open() functionality to dir
- only access index for blobs when file is read
- Implement NodeOpener and put one-time file stuff there
- Add comment about locking as suggested by bazil.org/fuse
=> Thanks at Michael Eischer for suggesting the last two improvements
Calling `Close()` on the rclone backend sometimes failed during test
execution with 'signal: Broken pipe'. The stdio connection closed both
the stdin and stdout file descriptors at the same moment, therefore
giving rclone no chance to properly send any final http2 data frames.
Now the stdin connection to rclone is closed first and will only be
forcefully closed after a timeout. In case rclone exits before the
timeout then the stdio connection will be closed normally.
restic did not notice when the rclone subprocess exited unexpectedly.
restic manually created pipes for stdin and stdout and used these for the
connection to the rclone subprocess. The process creating a pipe gets
file descriptors for the sender and receiver side of a pipe and passes
them on to the subprocess. The expected behavior would be that reads or
writes in the parent process fail / return once the child process dies
as a pipe would now just have a reader or writer but not both.
However, this never happened as restic kept the reader and writer
file descriptors of the pipes. `cmd.StdinPipe` and `cmd.StdoutPipe`
close the subprocess side of pipes once the child process was started
and close the parent process side of pipes once wait has finished. We
can't use these functions as we need access to the raw `os.File` so just
replicate that behavior.
The test now uses the fact that the sort is stable. It's not guaranteed
to be, but the test is cleaner and more exhaustive. sortCachedPacksFirst
no longer needs a return value.
In the Google Cloud Storage backend, support specifying access tokens
directly, as an alternative to a credentials file. This is useful when
restic is used non-interactively by some other program that is already
authenticated and eliminates the need to store long lived credentials.
The access token is specified in the GOOGLE_ACCESS_TOKEN environment
variable and takes precedence over GOOGLE_APPLICATION_CREDENTIALS.
If a data blob and a tree blob with the same ID (= same content) exist,
then the checker did not report a data or tree blob as unused when the
blob of the other type was still in use.
The `DuplicateTree` flag is necessary to ensure that failures cannot be
swallowed. The old checker implementation ignores errors from LoadTree
if the corresponding tree was already checked.
Backups traverse the file tree in depth-first order and saves trees on
the way back up. This results in tree packs filled in a way comparable
to the reverse Polish notation. In order to check tree blobs in that
order, the treeFilter would have to delay the forwarding of tree nodes
until all children of it are processed which would complicate the
implementation.
Therefore do the next similar thing and traverse the tree in depth-first
order, but process trees already on the way down. The tree blob ids are
added in reverse order to the backlog, which is once again reverted when
removing the ids from the back of the backlog.
The blobRefs map and the processedTrees IDSet are merged to reduce the
memory usage. The blobRefs map now uses separate flags to track blob
usage as data or tree blob. This prevents skipping of trees whose
content is identical to an already processed data blob. A third flag
tracks whether a blob exists or not, which removes the need for the
blobs IDSet.
Even though the checkTreeWorker skips already processed chunks,
filterTrees did queue the same tree blob on every occurence. This
becomes a serious performance bottleneck for larger number of snapshots
that cover mostly the same directories. Therefore decode a tree blob
exactly once.
The benchmark was actually testing the speed of index lookups.
name old time/op new time/op delta
SaveAndEncrypt-8 101ns ± 2% 31505824ns ± 1% +31311591.31% (p=0.000 n=10+10)
name old speed new speed delta
SaveAndEncrypt-8 41.7TB/s ± 2% 0.0TB/s ± 1% -100.00% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
SaveAndEncrypt-8 1.00B ± 0% 20989508.40B ± 0% +2098950740.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SaveAndEncrypt-8 0.00 123.00 ± 0% +Inf% (p=0.000 n=10+9)
(The actual speed is ca. 131MiB/s.)
A side remark to the definition of Index.blob:
Another possibility would have been to use:
blob map[restic.BlobHandle]*indexEntry
This would have led to the following sizes:
key: 32 + 1 = 33 bytes
value: 8 bytes
indexEntry: 8 + 4 + 4 = 16 bytes
each packID: 32 bytes
To save N index entries, we would therefore have needed:
N * OF * (33 + 8) bytes + N * 16 + N * 32 bytes / BP = N * 82 bytes
More precicely, using a pointer instead of a direct entry is the better memory choice if:
OF * 8 bytes + entrysize < OF * entrysize <=> entrysize > 8 bytes * OF/(OF-1)
Under the assumption of OF=1.5, this means using pointers would have been the better choice
if sizeof(indexEntry) > 24 bytes.
- The SaveBlob method now checks for duplicates.
- Moves handling of pending blobs to MasterIndex.
-> also cleans up pending index entries when they are saved in the index
-> when using SaveBlob no need to care about index any longer
- Always check for full index and save it when storing packs.
-> removes the need of an index uploader
-> also removes the verbose "uploaded intermediate index" messages
- The Flush method now also saves the index
- Fix race condition when checking and saving full/non-finalized indexes
This command can only be built on Darwin, FreeBSD and Linux
(and if we upgrade bazil.org/fuse, only FreeBSD and Linux:
https://github.com/bazil/fuse/issues/224).
Listing the few supported operating systems explicitly here makes
porting restic to new platforms easier.
The Save methods of the BlobSaver, FileSaver and TreeSaver return early
on when the archiver is stopped due to an error. For that they select on
both the tomb.Dying() and context.Done() channels, which can lead to a
race condition when the tomb is killed due to an error: The tomb first
closes its Dying channel before canceling all child contexts.
Archiver.SaveDir only aborts its execution once the context was
canceled. When the tomb killing is paused between closing its Dying
channel and canceling the child contexts, this lets the
FileSaver/TreeSaver.Save methods return immediately, however, ScanDir
still reads further files causing the test case to fail.
As a killed tomb always cancels all child contexts and as the Savers
always use a context bound to the tomb, it is sufficient to just use
context.Done() as escape hatch in the Save functions. This fixes the
mismatch between SaveDir and Save.
Adjust the tests to use contexts bound to the tomb for all interactions
with the Savers.
The previous implementation was repeating the implementation that is
found inside of io.WriteString. Simplify by making use of the stdlib's
implementation.
The username and hostname for new keys can be specified with the new
--user and --host flags, respectively. The flags are used only by the
`key add` command and are otherwise ignored.
This allows adding keys with for a desired user and host without having
to run restic as that particular user on that particular host, making
automated key management easier.
Co-authored-by: James TD Smith <ahktenzero@mohorovi.cc>
The method only ever receives *hashing.Writers, which don't implement
io.Closer. These come from packerManager.findPacker and have their
actual writers closed in Repository.savePacker. Moving the closing logic
to hashing.Writer results in "file already closed" errors.
When loading a blob, restic first looks up pack files containing the
blob. To avoid unnecessary work an already cached pack file is preferred.
However, if there is only a single pack file to choose from (which is
the normal case) sorting the one-element list won't change anything.
Therefore avoid the unnecessary cache check in that case.
The previous benchmark spent much of its time allocating RNGs and
generating too many random numbers. It now spends 90% of its time
hashing and half of the rest writing to files.
name old time/op new time/op delta
PackerManager-8 319ms ± 1% 247ms ± 1% -22.48% (p=0.000 n=20+18)
name old speed new speed delta
PackerManager-8 143MB/s ± 1% 213MB/s ± 1% +48.63% (p=0.000 n=10+18)
name old alloc/op new alloc/op delta
PackerManager-8 635kB ± 0% 92kB ± 0% -85.48% (p=0.000 n=10+19)
name old allocs/op new allocs/op delta
PackerManager-8 1.64k ± 0% 1.43k ± 0% -12.76% (p=0.000 n=10+20)
Archivers TestMetadataChanged incorrectly clears the Extended Attributes
from the expected metadata of the temporary file. This is incorrect as on
SELinux enabled filesystem, as the kernel will automaticly add a SElinux
label. However, since ExtendedAttributes{} != ExtendedAttributes{nil} we
still need to clear them if there are no attributes found.
The pool was used improperly, causing more allocations to be
performed than without it.
name old time/op new time/op delta
SaveAndEncrypt-8 36.8ms ± 2% 36.9ms ± 2% ~ (p=0.218 n=10+10)
name old speed new speed delta
SaveAndEncrypt-8 114MB/s ± 2% 114MB/s ± 2% ~ (p=0.218 n=10+10)
name old alloc/op new alloc/op delta
SaveAndEncrypt-8 21.1MB ± 0% 21.0MB ± 0% -0.44% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SaveAndEncrypt-8 79.0 ± 0% 77.0 ± 0% -2.53% (p=0.000 n=10+10)
The `dump`, `find`, `forget`, `ls`, `mount`, `restore`, `snapshots`,
`stats` and `tag` commands will now take into account multiple
`--host` and `-H` flags.
Much simpler implementation that guarantees each required pack
is downloaded only once (and hence does not need to manage
pack cache). Also improves large file restore performance.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
internal/archiver.readdir and internal/fs.ReadDir were unused.
internal/fs.ReadDirNames and internal/archiver.readdirnames were doing
nearly the same thing, except one sorted its output and opened with
fs.O_NOFOLLOW. Both were only used in internal/archiver.
Each of the random test files was split into the same five blobs. The
test fails once the fifth blob is passed on to `SaveBlob`. That is for
certain interleavings of goroutine execution it would be possible for
the test to trigger the testErr just after storing the first file.
The fixed test uses a different file content for each of the nine files
and fails after writing the fourth blob. The file content is also small
enough to ensure that for each file only a single blob is saved. This
guarantees that the test cannot fail before reading the first four
files. FileReadConcurrency = 2 allows up to two files queued for
processing. Therefore the test can at most open the sixth file before it
has to save the fourth file / blob which triggers the testErr.
internal/ui/jsonstatus and termstatus sound similar but are not related
in any way. Instead `internal/ui/backup` and `internal/ui/jsonstatus/status`
are the counterparts. Rename the latter to `internal/ui/json/backup` to
make this clear.
jsonstatus wrote the JSON output without synchronization to the
stdio_wrapper which caused mangling between different status lines.
Use the Print and Error methods of termstatus instead which use a
central goroutine to synchronize output.
I was running "golangci-lint" and found this two warnings
internal/checker/checker.go:135:18: (*Checker).LoadIndex$3 - result 0 (error) is always nil (unparam)
final := func() error {
^
internal/repository/repository.go:457:18: (*Repository).LoadIndex$3 - result 0 (error) is always nil (unparam)
final := func() error {
^
It turns out that these functions are used only in "RunWorkers(...)",
which is used only two times in whole project right after this "final"
functions.
And because these "final" functions always return "nil", I've
descided, that it would be better to remove requriments for "final" func
to return error to avoid magick "return nil" at their end.
In some (rare) cases "fake" UID 51234 may exist in a system running a
test. When this is the case, `cmp.Equal(want, node3)` will fail based on
difference between empty string and an actual username present in a
system.
Fixes github issue #2372
Restic used to quit if the repository password was typed incorrectly once.
Restic will now ask the user again for the repository password if typed incorrectly.
The user will now get three tries to input the correct password before restic quits.
Return valid directory info from Lstat() for parent directories of the
specified filename. Previously only "/" and "." were valid directories.
Also set directory mode as this is checked by archiver.
Closes#2063
Windows does not have a concept of a `change time` in the sense as Unix
has it: the field `CreationTime` of the `Win32FileAttributeData` struct
is not updated when attributes or content is changed. So from now on
we're using the `LastWriteTime` as the `change time` on Windows.
Since I could not remember what the value for `Check` means this commit
renames it to `SameFile`: when set to true, the test should make sure
that `FileChanged` should return false (=file is unmodified).
Sometimes restic gets bogus timestamps which cannot be converted to
JSON, because the stdlib JSON encoder returns an error if the year is
not within [0, 9999]. We now make sure that we at least record _some_
timestamp and cap the year either to 0000 or 9999. Before, restic would
refuse to save the file at all, so this improves the status quo.
This fixes#2174 and #1173
This commit is a followup to the addition of the --group-by flag for the
snapshots command. Adding the grouping code there introduced duplicated
code (the forget command also does grouping). This commit refactors
boths sides to only use shared code.
This commit changes the signatures for repository.LoadAndDecrypt and
utils.LoadAll to allow passing in a []byte as the buffer to use. This
buffer is enlarged as needed, and returned back to the caller for
further use.
In later commits, this allows reducing allocations by reusing a buffer
for multiple calls, e.g. in a worker function.
The `s3.storage-class` option can be passed to restic (using `-o`) to
specify the storage class to be used for S3 objects created by restic.
The storage class is passed as-is to S3, so it needs to be understood by
the API. On AWS, it can be one of `STANDARD`, `STANDARD_IA`,
`ONEZONE_IA`, `INTELLIGENT_TIERING` and `REDUCED_REDUNDANCY`. If
unspecified, the default storage class is used (`STANDARD` on AWS).
You can mix storage classes in the same bucket, and the setting isn't
stored in the restic repository, so be sure to specify it with each
command that writes to S3.
Closes#706
This change allows passing no arguments to rclone, using `-o
rclone.args=""`. It is helpful when running rclone remotely via SSH
using a key with a forced command (via `command=` in `authorized_keys`).
This commit changes the internal file system implementation for reading
data from stdin, it now returns an error when no bytes could be read. I
think it's worth failing in this case, the user instructed restic to
read some data from stdin, and no data was read at all. Maybe it was in
a pipe and some earlier stage failed.
See #2135 for a short discussion.
When restic reads the backup from stdin, the number of bytes processed
was always displayed as zero. The reason is that the UI for the archive
uses the total bytes as returned by the scanner, which is zero for
stdin. So instead we keep track of the real number of bytes processed
and print that at the end.
Closes#2136
Make restic forget --keep-within accept time ranges measured in hours and choose
accordingly which snapshots to keep and which to forget. Add relative tests.
don't create fileInfo structs for empty files. this saves memory.
this also avoids extra serial scan of all fileInfo, which should
make restore faster and more consistent.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
reworked restore error callback to use file location
path instead of much heavier Node. this reduced restore
memory usage by as much as 50% in some of my tests.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
* uses less memory as common prefix is only stored once
* stepping stone for simpler error callback api, which
will allow further memory footprint reduction
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
When the scanner is slower than the actual backup, the tomb cancels the
context passed to Scan(), which then returns ctx.Err(). In the end, the
main function prints an error message that is not helpful ("Context
cancelled") and exits with an error code although no error occurred.
The code now ignores the error in the context and just uses it for
cancellation. The scanner is not supposed to return an error anyway.
Closes#1978
Sometimes, users run restic without retaining the local cache
directories. This was reported several times in the past.
Restic will now print a message whenever a new cache directory is
created from scratch (i.e. it did not exist before), so users have a
chance to recognize when the cache is not kept between different runs of
restic.
This commit adds a command called `self-update` which downloads the
latest released version of restic from GitHub and replacing the current
binary with it. It does not rely on any external program (so it'll work
everywhere), but still verifies the GPG signature using the embedded GPG
public key.
By default, the `self-update` command is hidden behind the `selfupdate`
built tag, which is only set when restic is built using `build.go`. The
reason for this is that downstream distributions will then not include
the command by default, so users are encouraged to use the
platform-specific distribution mechanism.
Adds a SelectByName method to the archive and scanner which only require
the filename as input, and can thus be run before calling lstat on the
file. Can speed up scanning significantly if a lot of filename excludes
are used.
Accessing beyond the end of the file now removes the file from the cache
because it is assumed to be truncated. Usually, this means that the data
is fetched directly from the backend instead.
traverseTree() was meant to call enterDir() whenever a directory is
selected for restore, either explicitly or implicitly (=contains a file
which is to be restored). After restoring a file, leaveDir() is called
in reverse order for all intermediate directories so that the metadata
can be restored.
When a directory is selected implicitly, the metadata for it is
restored. This is different from the previous restorer behavior, which
created implicitly selected intermediate directories with permissions
0700 (only user can read/write it).
This commit changes the behavior back to the old one. Only a directory
is explicitly selected for restore, enterDir()/leaveDir() are called for
it. Otherwise, only visitNode() is called, so visitNode() needs to make
sure the parent directory exists. If the directory is explicitly
included, leaveDir() will then restore the metadata correctly.
When we decide to change the behavior (restore metadata for all
intermediate directories, even if selected implicitly), we should do
that in the selection functions, not here.
This finally resolves#1870
This fixes#1833, which consists of two different bugs:
* The `defer` in `cacheFile()` may remove a channel from the
`inProgress` map although it is not responsible for downloading the
file
* If the download fails, goroutines waiting for the file to be cached
assumed that the file was there, there was no way to signal the
error.
When the archiver is faster than the scanner, restic deadlocks. This
commit adds a `finished` channel to the struct in `ui/backup.go` so that
scanner results are ignored when the archiver is already finished.
Closes#1834
If our ssh process has died, not only the next, but all subsequent
calls to clientError() should indicate the error.
restic output when the ssh process is killed with "kill -9":
Save(<data/afb68adbf9>) returned error, retrying after 253.661803ms: Write: failed to send packet header: write |1: file already closed
Save(<data/afb68adbf9>) returned error, retrying after 580.752212ms: ssh command exited: signal: killed
Save(<data/afb68adbf9>) returned error, retrying after 790.150468ms: ssh command exited: signal: killed
Save(<data/afb68adbf9>) returned error, retrying after 1.769595051s: ssh command exited: signal: killed
[...]
error in cleanup handler: ssh command exited: signal: killed
Before this patch:
Save(<data/de698d934f>) returned error, retrying after 252.84163ms: Write: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 660.236963ms: OpenFile: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 568.049909ms: OpenFile: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 2.428813824s: OpenFile: failed to send packet header: write |1: file already closed
[...]
error in cleanup handler: failed to send packet header: write |1: file already closed
This commit changes how the worker goroutines for saving e.g. blobs
interact. Before, it was possible to get stuck sending an instruction to
archive a file or dir when no worker goroutines were available any more.
This commit introduces a `done` channel for each of the worker pools,
which is set to the channel returned by `tomb.Dying()`, so it is closed
when the first worker returned an error.
This commit changes the archiver so that low-level errors saving data to
the repo are returned to the caller (instead of being handled by the
error callback function). This correctly bubbles up errors like a full
temp file system and makes restic abort early and makes all other worker
goroutines exit.
This now keeps the cursor at the first column of the first status line
so that messages printed to stdout or stderr by some other part of the
progarm will still be visible. The message will overwrite the status
lines, but those are easily reprinted on the next status update.
The previous code tried to be as efficient as possible and only do a
single open() on an item to save, and then fstat() on the fd to find out
what the item is (file, dir, other). For normal files, it would then
start reading the data without opening the file again, so it could not
be exchanged for e.g. a symlink.
This behavior starts the watchdog on my machine when /dev is saved
with restic, and after a few seconds, the machine reboots.
This commit reverts the behavior to the strategy the old archiver code
used: run lstat(), then decide what to do. For normal files, open the
file and then run fstat() on the fd to verify it's still a normal file,
then start reading the data.
The downside is that for normal files we now do two stat() calls
(lstat+fstat) instead of only one. On the upside, this does not start
the watchdog. :)
Previously, the function read from ARGV[1] (hardcoded) rather than the
value passed to it, the command-line argument as it exists in globalOptions.
Resolves#1745
This adds two implementations of the new `FS` interface: One for the local
file system (`Local`) and one for a single file read from an
`io.Reader` (`Reader`).
This change removes the hardcoded Google auth mechanism for the GCS
backend, instead using Google's provided client library to discover and
generate credential material.
Google recommend that client libraries use their common auth mechanism
in order to authorise requests against Google services. Doing so means
you automatically support various types of authentication, from the
standard GOOGLE_APPLICATION_CREDENTIALS environment variable to making
use of Google's metadata API if running within Google Container Engine.
Before this change restic would attempt to JSON decode the error
message resulting in confusing `Decode: invalid character 'B' looking
for beginning of value` messages. Afterwards it will return `List
failed, server response: 400 Bad Request (400)`
This commit fixes a bug introduced in
e9ea268847: When an invalid lock is
encountered (e.g. if the file is empty), the code used to ignore that,
but now returns the error.
Now, invalid files are ignored for the normal lock check, and removed
when `restic unlock --remove-all` is run.
Closes#1652
As mentioned in issue [#1560](https://github.com/restic/restic/pull/1560#issuecomment-364689346)
this changes the signature for `backend.Save()`. It now takes a
parameter of interface type `RewindReader`, so that the backend
implementations or our `RetryBackend` middleware can reset the reader to
the beginning and then retry an upload operation.
The `RewindReader` interface also provides a `Length()` method, which is
used in the backend to get the size of the data to be saved. This
removes several ugly hacks we had to do to pull the size back out of the
`io.Reader` passed to `Save()` before. In the `s3` and `rest` backend
this is actively used.
This is a bug fix: Before, when the worker function fn in List() of the
RetryBackend returned an error, the operation is retried with the next
file. This is not consistent with the documentation, the intention was
that when fn returns an error, this is passed on to the caller and the
List() operation is aborted. Only errors happening on the underlying
backend are retried.
The error leads to restic ignoring exclusive locks that are present in
the repo, so it may happen that a new backup is written which references
data that is going to be removed by a concurrently running `prune`
operation.
The bug was reported by a user here:
https://forum.restic.net/t/restic-backup-returns-0-exit-code-when-already-locked/484
This pulls the header reads into a function that works in terms of the
number of records requested. This preserves the existing logic of
initially reading 15 records and then falling back if that fails.
In the event of a header with more than 15 records, it will read all
records, including the already-seen final 15 records.
Before, all backend implementations were required to return an error if
the file that is to be written already exists in the backend. For most
backends, that means making a request (e.g. via HTTP) and returning an
error when the file already exists.
This is not accurate, the file could have been created between the HTTP
request testing for it, and when writing starts. In addition, apart from
the `config` file in the repo, all other file names have pseudo-random
names with a very very low probability of a collision. And even if a
file name is written again, the way the restic repo is structured this
just means that the same content is placed there again. Which is not a
problem, just not very efficient.
So, this commit relaxes the requirement to return an error when the file
in the backend already exists, which allows reducing the number of API
requests and thereby the latency for remote backends.
During the development of #1524 I discovered that the Google Cloud
Storage backend did not yet use the HTTP transport, so things such as
bandwidth limiting did not work. This commit does the necessary magic to
make the GS library use our HTTP transport.
A user discovered[1] that when the backup finishes during the upload of
an intermediate index, the upload is cancelled and the index never fully
saved, but the snapshot is saved and the backup finalizes without an
error. This lead to a situation where a snapshot references data that is
contained in the repo, but not referenced in any index, leading to
strange error messages.
This commit uses a dedicated context to signal the intermediate index
uploading routine to terminate after the last index has been uploaded.
This way, an upload running when the backup finishes is completed before
the routine terminates and the snapshot is saved.
[1] https://forum.restic.net/t/error-loading-tree-check-prune-and-forget-gives-error-b2-backend/406
The logging in these functions double the time they take to execute.
However, it is only really useful on failures, which are better
reported by the calling functions.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 897 395 -55.96%
BenchmarkMasterIndexLookupMultipleIndex-6 2001 1090 -45.53%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 492 215 -56.30%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 1649 912 -44.69%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 1 -88.89%
BenchmarkMasterIndexLookupMultipleIndex-6 19 1 -94.74%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 6 0 -100.00%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 16 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 96 -40.00%
BenchmarkMasterIndexLookupMultipleIndex-6 240 96 -60.00%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 48 0 -100.00%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 128 0 -100.00%
Index.Has() is a faster then Index.Lookup() for checking if a blob exists
in the index. As the returned data is never used, this avoids a ton
of allocations.
When looking up a blob in the master index, with several
indexes present in the master index, a significant amount of time
is spent generating errors for each failed lookup. However, these
errors are often used to check if a blob is present, but the contents
are not inspected making the overhead of the error not useful.
Instead, change Index.Lookup (and Index.LookupSize) to instead return
a boolean denoting if the blob was found instead of an error. Also change
all the calls to these functions to handle the new function signature.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 820 897 +9.39%
BenchmarkMasterIndexLookupMultipleIndex-6 12821 2001 -84.39%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 5378 492 -90.85%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 17026 1649 -90.31%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 9 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 59 19 -67.80%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 22 6 -72.73%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 72 16 -77.78%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 160 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 3200 240 -92.50%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 1232 48 -96.10%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 4272 128 -97.00%
When setting up the index used for benchmarking, use math/rand instead of
crypto/rand since the generated ids don't need to be evenly distributed,
and not be secure against guessing. As such, use a different random id
function (only available during tests) that uses math/rand instead.
Load pack header length and 15 header entries with single backend
request. This eliminates separate header Load() request for most pack
files and significantly improves index.New() performance.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
This reduces the chance of duplicate blobs, otherwise the tests fail
(make the contents of a blob depend on a pseudo-random number instead of
the size, sizes may be duplicate).
Use result of single repository.List() to find both missing and
orphaned data packs. For 500GB repository this eliminates ~100K
repository.Test() calls and improves check time by >30M in my
environment (~45min before this change and ~7min after).
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
This is a follow-up on fb9729fdb9, which
runs the `ssh` in its own process group and selects that process group
as the foreground group. After the sftp connection is established,
restic switches back to the previous foreground process group.
This allows `ssh` to prompt for the password, but it won't receive
the interrupt signal (SIGINT, ^C) later on, because it is not in the
foreground process group any more, allowing a clean tear down.
When backing up several million files (>14M tested here) with few changes,
a large amount of time is spent failing to find an id in an index and creating
an error to signify this. Since this is checked using the Has method,
which doesn't use this error, this time creating the error is wasted.
Instead, directly check if the given id and type are present in the index.
This also avoids reporting all the packs containing this blob, further
reducing cpu usage.
We added previously a code to fix the issue of chaining
credentials, we do not need this anymore since the
upstream minio-go already has this relevant change.
Before, creating a new repo via REST would use the defaut HTTP client,
which is not a problem unless the server uses HTTPS and a TLS
certificate which isn't signed by a CA in the system's CA store. In this
case, all commands work except the 'init' command, which fails with a
message like "invalid certificate".
chaining failed because chaining provider
was only looking for subsequent credentials
provider after an error. Writer a new
chaining provider which proceeds to fetch
new credentials also under situations where
providers do not return but instead return
no keys at all.
Fixes https://github.com/restic/restic/issues/1422
List().
move comment regarding problematic List() backend api (it's s3's ListObjects
that has a problem, NOT swift's ObjectsWalk).
As per discussion in PR #1399.
This is a fix for the following situation (gh-1188):
List() grabs a semaphore token upon entry, starts a goroutine, and
does not release the token until the routine exits (via a defer).
The goroutine iterates over the results from ListCurrentObjects(),
sending them one at a time to a channel, where they are ultimately
processed by be.Load().
Since be.Load() also needs a token, this will result in deadlock if
b2.connections=1.
This fix changes List() so that the token is only held during the call
to ListCurrentObjects().
Add a RESTIC_PROGRESS_FPS environment variable to limit the interval
at which the progress indicator updates (allowed values: 1-60).
The default rate of 60 FPS can cause high terminal CPU load on some
systems, like iTerm2 on macOS with font anti-aliasing enabled.
Usage:
RESTIC_PROGRESS_FPS=1 restic ...
RESTIC_PROGRESS_FPS=60 restic ...
- be explicit when discarding returned errors from .Close(), etc.
- remove named return values from funcs when naked return not used
- fix some "err" shadowing when redeclaration not needed
Sometimes s3 listobjects for a directory includes an entry for that
directory. The restic s3 backend doesn't expect that and returns
an error.
Symptom is:
ReadDir: invalid key name restic/key/, removing prefix
restic/key/ yielded empty string
I'm not sure when s3 does that; I'm unable to reproduce it myself.
But in any case, it seems correct to ignore that when it happens.
Fixes#1068
By default, the GCS Go packages have an internal "chunk size" of 8MB,
used for blob uploads.
Media().Do() will buffer a full 8MB from the io.Reader (or less if EOF
is reached) then write that full 8MB to the network all at once.
This behavior does not play nicely with --limit-upload, which only
limits the Reader passed to Media. While the long-term average upload
rate will be correctly limited, the actual network bandwidth will be
very spikey.
e.g., if an 8MB/s connection is limited to 1MB/s, Media().Do() will
spend 8s reading from the rate-limited reader (performing no network
requests), then 1s writing to the network at 8MB/s.
This is bad for network connections hurt by full-speed uploads,
particularly when writing 8MB will take several seconds.
Disable resumable uploads entirely by setting the chunk size to zero.
This causes the io.Reader to be passed further down the request stack,
where there is less (but still some) buffering.
My connection is around 1.5MB/s up, with nominal ~15ms ping times to
8.8.8.8.
Without this change, --limit-upload 1024 results in several seconds of
~200ms ping times (uploading), followed by several seconds of ~15ms ping
times (reading from rate-limited reader). A bandwidth monitor reports
this as several seconds of ~1.5MB/s followed by several seconds of
0.0MB/s.
With this change, --limit-upload 1024 results in ~20ms ping times and
the bandwidth monitor reports a constant ~1MB/s.
I've elected to make this change unconditional of --limit-upload because
the resumable uploads shouldn't be providing much benefit anyways, as
restic already uploads mostly small blobs and already has a retry
mechanism.
--limit-download is not affected by this problem, as Get().Download()
returns the real http.Response.Body without any internal buffering.
Updates #1216
This PR adds the ability of chaining the credentials provider,
such that restic as a tool attempts to honor credentials from
multiple different ways.
Currently supported mechanisms are
- static (user-provided)
- IAM profile (only valid inside configured ec2 instances)
- Standard AWS envs (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- Standard Minio envs (MINIO_ACCESS_KEY, MINIO_SECRET_KEY)
Refer https://github.com/restic/restic/issues/1341
Windows, and to a lesser extent OS X, don't conform to XDG and have
their own preferred locations for caches.
On Windows, use %LOCALAPPDATA%/restic (i.e., ~/AppData/Local/restic). I
can't find authoritative documentation from Microsoft recommending
specifically which of %APPDATA%, %LOCALAPPDATA%, and %TEMP% should be
used for caches, but %LOCALAPPDATA% is where browsers store their
caches, so it seems like a good fit.
On OS X, use ~/Library/Caches/restic, which is recommended by the Apple
documentation. They do suggest using the application "bundle identifier"
as the base folder name, but restic doesn't have one, so I just used
"restic".
If the service account used with restic does not have the
storage.buckets.get permission (in the "Storage Admin" role), Create
cannot use Get to determine if the bucket is accessible.
Rather than always trying to create the bucket on Get error, gracefully
fall back to assuming the bucket is accessible. If it is, restic init
will complete successfully. If it is not, it will fail on a later call.
Here is what init looks like now in different cases.
Service account without "Storage Admin":
Bucket exists and is accessible (this is the case that didn't work
before):
$ ./restic init -r gs:this-bucket-does-exist:/
enter password for new backend:
enter password again:
created restic backend c02e2edb67 at gs:this-bucket-does-exist:/
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
Bucket exists but is not accessible:
$ ./restic init -r gs:this-bucket-does-exist:/
enter password for new backend:
enter password again:
create key in backend at gs:this-bucket-does-exist:/ failed:
service.Objects.Insert: googleapi: Error 403:
my-service-account@myproject.iam.gserviceaccount.com does not have
storage.objects.create access to object this-bucket-exists/keys/0fa714e695c8ecd58cb467cdeb04d36f3b710f883496a90f23cae0315daf0b93., forbidden
Bucket does not exist:
$ ./restic init -r gs:this-bucket-does-not-exist:/
create backend at gs:this-bucket-does-not-exist:/ failed:
service.Buckets.Insert: googleapi: Error 403:
my-service-account@myproject.iam.gserviceaccount.com does not have storage.buckets.create access to bucket this-bucket-does-not-exist., forbidden
Service account with "Storage Admin":
Bucket exists and is accessible: Same
Bucket exists but is not accessible: Same. Previously this would fail
when Create tried to create the bucket. Now it fails when trying to
create the keys.
Bucket does not exist:
$ ./restic init -r gs:this-bucket-does-not-exist:/
enter password for new backend:
enter password again:
created restic backend c3c48b481d at gs:this-bucket-does-not-exist:/
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
This commit adds code to synchronize downloading files to the cache.
Before, requests that came in for files currently downloading would fail
because the file was not completed in the cache. Now, the code waits
until the download is completed.
Closes#1278
This commit adds a function to the cache which can decide to proactively
load the complete pack file and store it in the cache. This is helpful
for pack files containing only tree blobs, as it is likely that the same
file is accessed again in the future.
This commits adds rudimentary support for a cache directory, enabled by
default. The cache directory is created if it does not exist. The cache
is used if there's anything in it, newly created snapshot and index
files are written to the cache automatically.
In the manual, state which standard roles the service account must
have to work correctly, as well as the specific permissions required,
for creating even more specific custom roles.
This was a bit tricky: We start the ssh binary, but we want it to ignore
SIGINT. In contrast, restic itself should process SIGINT and clean up
properly. Before, we used `setsid()` to give the ssh process its own
process group, but that means it cannot prompt the user for a password
because the tty is gone.
So, now we're passing in two functions that ignore SIGINT just before
the ssh process is started and re-install it after start.
The code now bundles tree blobs and data blobs into different pack
files, so we'll end up with pack files that either only contain data or
trees. This is in preparation to adding a cache (#1040), because
tree-only pack files can easily be cached later on.