The previous approach of rewriting all snapshots first, then flushing
the repository data and finally removing old snapshots has the downside
that an interrupted command execution leaves behind broken snapshots as
not all new data is already flushed.
This adds support for caching already rewritten trees, handling of load
errors and disabling the check that the serialization doesn't lead to
data loss.
The more generic RewriteNode callback replaces the SelectByName and
PrintExclude functions. The main part of this change is a preparation to
allow using the TreeRewriter for the `repair snapshots` command.
The files in a tree must be sorted in lexical order. However, this
cannot be guaranteed when appending a filename suffix. For two files
file, file.rep
where "file" is broken, this would result in
file.repaired, file.rep
which is no longer sorted.
In addition, adding a filename suffix is also prone to filename
collisions which would require a rather complex search for a
collision-free name in order to work reliably.
Simplify CLI options:
* Rename "DeleteSnapshots" to "Forget"
* Replace "AddTag" and "Append" with hardcoded values
Change output and snapshot modifications to be more in line with the
"rewrite" command.
The builtin mechanism to capture a stacktrace in Go is to send a SIGQUIT
to the running process. However, this mechanism is not avaiable on
Windows. Thus, tweak the SIGINT handler to dump a stacktrace if the
environment variable `RESTIC_DEBUG_STACKTRACE_SIGINT` is set.
When saving files to the local backend, in some cases the used fsync
calls are slow enough to cause the tests to time out. Thus, increase the
test timeouts as a stopgap measure until we can use the mem backend for
these tests.
The SemaphoreBackend now uniformly enforces the limit of concurrent
backend operations. In addition, it unifies the parameter validation.
The List() methods no longer uses a semaphore. Restic already never runs
multiple list operations in parallel.
By managing the semaphore in a wrapper backend, the sections that hold a
semaphore token grow slightly. However, the main bottleneck is IO, so
this shouldn't make much of a difference.
The key insight that enables the SemaphoreBackend is that all of the
complex semaphore handling in `openReader()` still happens within the
original call to `Load()`. Thus, getting and releasing the semaphore
tokens can be refactored to happen directly in `Load()`. This eliminates
the need for wrapping the reader in `openReader()` to release the token.
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes#4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
The output is now
```
-v, --verbose be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
instead of
```
-v, --verbose n be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
Fixes restic#719
If the option is passed, restic will wait the specified duration of time
and retry locking the repo every 10 seconds (or more often if the total
timeout is relatively small).
- Play nice with json output
- Reduce wait time in lock tests
- Rework timeout last attempt
- Reduce test wait time to 0.1s
- Use exponential back off for the retry lock
- Don't pass gopts to lockRepo functions
- Use global variable for retry sleep setup
- Exit retry lock on cancel
- Better wording for flag help
- Reorder debug statement
- Refactor tests
- Lower max sleep time to 1m
- Test that we cancel/timeout in time
- Use non blocking sleep function
- Refactor into minDuration func
Co-authored-by: Julian Brost <julian@0x4a42.net>
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes#4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
The output is now
```
-v, --verbose be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
instead of
```
-v, --verbose n be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
This method had a buffer argument, but that was nil at all call sites.
That's removed, and instead LoadUnpacked now reuses whatever it
allocates inside its retry loop.
The StdioWrapper is not used at all by the ProgressPrinters. It is
called a bit earlier than previously. However, as the password prompt
directly accessed stdin/stdout this doesn't cause problems.
The maximum for `--verbose=n` is n=2. Internally it is translated into a
scale from 0 to 3. However, the default (without verbose) is 1, thus the
verbosity level can only be increased two times.
Only the repacking of *un*compressed packs reduces the amount of
uncompressed data. Previously the counter even overflowed for fully
compressed repositories.
Commands should use the normal shutdown path. In addition, the Exitf
function was only used by `dump` and `restore` but not any other command
which introduces the risk of inconsistent behavior.
When reporting an error for a tree, the output message can overlap with
the progress bar output, e.g. `error for tree e91ef6fb:napshots`.
The fix only applies for this specific message and does not work on
Windows.
Revert what seems to be a typo introduced as part of the fix for #2041
in 2018 7d0f2eaf24.
`xbuild` does not look like a go build/tag keyword to me, I failed to
find documentation for it and using `go install -tags '!selfupdate' ...`
has no effect, i.e. self-update code is still compiled.
`+build` however works; updating the OpenBSD port/binary package
security/restic to apply this PR works as expected:
```
$ restic help | grep self
$ restic self-update
unknown command "self-update" for "restic"
```
(Using `go:build` now as per restic's style and gofmt.)
Previously, using `restic-0.14.0p1` on OpenBSD/amd64 7.2-current would
check for a newer version and probably attempt replacing the system wide
root-owned executable (on a read-only filesystem) as unprivileged user:
```
$ restic version
restic 0.14.0 compiled with go1.19.2 on openbsd/amd64
$ restic help | grep self
self-update Update the restic binary
$ restic self-update
writing restic to /usr/local/bin/restic
find latest release of restic at GitHub
restic is up to date
```
(It never tried to actually write besaid path; doing so would fail, so
the current message can be considered misleading.)
The scanner process has only cosmetic effect for the progress printer,
and can be disabled without impacting functionality when the user does
not need an estimate of completion.
In many cases the scanner process can provide beneficial priming of
the file system cache, so as general advice it should not be disabled.
However, tests have shown that backup of NFS and fuse based filesystems,
where stat(2) is relatively expensive, can be significantly faster
without the scanner.
The Original field is meant to remember the original snapshot id if e.g.
changing its tags. It was only set by the `rewrite` command if it was
not set previously. However, a rewritten snapshot is potentially rather
different from the original snapshot. Thus just always set the Original
field. This also makes it easier to later on detect and potentially
remove the original snapshots.
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.
This commit only mechanically replaces the function calls.
The old check did not consider files containing case insensitive
excludes. The check is now implemented as a function of the
excludePatternOptions struct to improve cohesion.
The test did not wait for the mount command to fully shutdown all
running goroutines. This caused the go race detector to report a data
race related to lock refreshes.
==================
WARNING: DATA RACE
Write at 0x0000021bdfdb by goroutine 667:
github.com/restic/restic/internal/backend/retry.TestFastRetries()
/restic/restic/internal/backend/retry/testing.go:7 +0x18f
github.com/restic/restic/cmd/restic.withTestEnvironment()
/restic/restic/cmd/restic/integration_helpers_test.go:175 +0x183
github.com/restic/restic/cmd/restic.TestMountSameTimestamps()
/restic/restic/cmd/restic/integration_fuse_test.go:202 +0xac
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1446 +0x216
testing.(*T).Run.func1()
/usr/lib/go/src/testing/testing.go:1493 +0x47
Previous read at 0x0000021bdfdb by goroutine 609:
github.com/restic/restic/internal/backend/retry.(*Backend).retry()
/restic/restic/internal/backend/retry/backend_retry.go:72 +0x9e
github.com/restic/restic/internal/backend/retry.(*Backend).Remove()
/restic/restic/internal/backend/retry/backend_retry.go:149 +0x17d
github.com/restic/restic/internal/cache.(*Backend).Remove()
/restic/restic/internal/cache/backend.go:38 +0x11d
github.com/restic/restic/internal/restic.(*Lock).Unlock()
/restic/restic/internal/restic/lock.go:190 +0x249
github.com/restic/restic/cmd/restic.refreshLocks.func1()
/restic/restic/cmd/restic/lock.go:86 +0xae
runtime.deferreturn()
/usr/lib/go/src/runtime/panic.go:476 +0x32
github.com/restic/restic/cmd/restic.lockRepository.func2()
/restic/restic/cmd/restic/lock.go:61 +0x71
[...]
Goroutine 609 (finished) created at:
github.com/restic/restic/cmd/restic.lockRepository()
/restic/restic/cmd/restic/lock.go:61 +0x488
github.com/restic/restic/cmd/restic.lockRepo()
/restic/restic/cmd/restic/lock.go:25 +0x219
github.com/restic/restic/cmd/restic.runMount()
/restic/restic/cmd/restic/cmd_mount.go:126 +0x1f8
github.com/restic/restic/cmd/restic.testRunMount()
/restic/restic/cmd/restic/integration_fuse_test.go:61 +0x1ce
github.com/restic/restic/cmd/restic.checkSnapshots.func1()
/restic/restic/cmd/restic/integration_fuse_test.go:90 +0x124
==================
Counting the first occurrence of a duplicate blob as used and counting
all other as duplicates, independent of which instance of the blob is
kept, is only accurate if all copies of the blob have the same size. This
is no longer the case for a repository containing both compressed and
uncompressed blobs.
Thus for duplicated blobs first count all instances as duplicates and
then subtract the actually used instance later on.
As long as only a small fraction of the data in a repository is
rewritten, the keepBlobs set will be rather small after cleaning it up.
As golang maps do not shrink their memory usage, just copy the contents
over to a new map. However, only copy the map if the cleanup removed at
least half the entries.
The set covers necessary, existing and duplicate blobs. This removes the
duplicate sets used to track whether all necessary blobs also exist.
This reduces the memory usage of prune by about 20-30%.
The RetryBackend tests depend on the mock backend. When the Backend
interface is eventually split from the restic package, this will lead to
a dependency cycle between backend and backend/mock. Thus split the
RetryBackend into a separate package to avoid this problem.
The comparison of the current time and the last lock refresh were using
seconds represented as integers. As the test only waits for up to one
second, the associated number truncation can cause the test to take
longer than once second and thus to fail.
Switch to nanoseconds to avoid this problem. This also slightly speeds
up the test.
FindFilteredSnapshots no longer prints errors during snapshot loading on
stderr, but instead passes the error to the callback to allow the caller
to decide on what to do.
In addition, it moves the logic to handle an explicit snapshot list from
the main package to restic.
The only use cases in the code were in errors.IsFatal, backend/b2,
which needs a workaround, and backend.ParseLayout. The last of these
requires all backends to implement error unwrapping in IsNotExist.
All backends except gs already did that.
Monotonic timers are paused during standby. Thus these timers won't fire
after waking up. Fall back to periodic polling to detect too large clock
jumps. See https://github.com/golang/go/issues/35012 for a discussion of
go timers during standby.
Restic continued e.g. a backup task even when it failed to renew the
lock or failed to do so in time. For example if a backup client enters
standby during the backup this can allow other operations like `prune`
to run in the meantime (after calling `unlock`). After leaving standby
the backup client will continue its backup and upload indexes which
refer pack files that were removed in the meantime.
This commit introduces a goroutine explicitly monitoring for locks that
are not refreshed in time. To simplify the implementation there's now a
separate goroutine to refresh the lock and monitor for timeouts for each
lock. The monitoring goroutine would now cause the backup to fail as the
client has lost it's lock in the meantime.
The lock refresh goroutines are bound to the context used to lock the
repository initially. The context returned by `lockRepo` is also
cancelled when any of the goroutines exits. This ensures that the
context is cancelled whenever for any reason the lock is no longer
refreshed.
Previously the global context was either accessed via gopts.ctx,
stored in a local variable and then used within that function or
sometimes both. This makes it very hard to follow which ctx or a wrapped
version of it reaches which method.
Thus just drop the context from the globalOptions struct and pass it
explicitly to every command line handler method.
We can either preallocate storage for a file or sparsify it. This
detects a pack file as sparse if it contains an all zero block or
consists of only one block. As the file sparsification is just an
approximation, hide it behind a `--sparse` parameter.
`restic unlock` now only shows `successfully removed locks` if there were locks to be removed.
In addition, it also reports the number of the removed lock files.
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.
> name old time/op new time/op delta
> MasterIndexEach-16 6.68s ±24% 0.96s ± 2% -85.64% (p=0.008 n=5+5)
This results in printing a `(default: $ENV) (default: value)` suffix for
the corresponding options which looks strange. In addition, some of the
environment variables might contain secrets which should not be
displayed.
`init` and `copy` use `--repo2` with two different meaning which has
proven to be confusing for users. `--from-repo` now consistently marks a
source repository from which data is read. `--repo` is now always the
target/destination repository.
After repacking every blob that should be kept must have been repacked.
We have seen a few cases in which a single blob went missing, which
could have been caused by a bitflip somewhere. This sanity check might
help catch some of these cases.
Unused blobs are not a problem but rather expected to exist now that
prune by default does not remove every unused blob. However, the option
has caused questions from users whether a repository is damaged or not,
so just remove that option.
Note that the remaining code is left intact as it is still useful for
our test cases.
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
raw-data summed up the size of the blob plaintexts. However, with
compression this makes little sense as the storage size in the
repository is lower due to compression. Thus sum up the actual size each
blob takes in the repository.
The GlobalOptions struct now embeds a backend.TransportOptions, so it
doesn't need to construct one in open and create. The upload and
download limits are similarly now a struct in internal/limiter that is
embedded in GlobalOptions.
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.
Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
Tree packs are cached locally at clients and thus benefit a lot from
being compressed. Ensure this be having prune always repack pack files
containing uncompressed trees.
The `stats` command checks inodes to not count hardlinked files multiple
times into the restore size. This check applies across all snapshots and
not only within snapshots. As a result the result size was far too low
when calculating it for multiple snapshots and it would vary depending
on the order in which snapshots were listed.
The new option allows prune to operate with nearly no scratch space by only removing
no longer necessary pack files and first deleting the index before
rebuilding it. By first deleting the index it becomes safe to just
delete no longer necessary pack files. However, as a downside there's
now the risk that the repository becomes inaccessible if prune fails.
To recover from that problem a user might have to manually delete the
repository index and then run (a full) `rebuild-index` again.
A compressed index is only about one third the size of an uncompressed
one. Thus increase the number of entries in an index to avoid cluttering
the repository with small indexes.
As an exception prune is still allowed to load the index before
snapshots, as it uses exclusive locks. In case of problems with locking
it is also better to load snapshots created after loading the index, as
this will lead to a prune sanity check failure instead of a broken snapshot.
When resolving snapshotIDs in FindFilteredSnapshots either
FindLatestSnapshot or FindSnapshot is called. Both operations issue a
list operation to the backend. When for example passing a long list of
snapshot ids to `forget` this could lead to a large number of list
operations.
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
During a backup the index is written before the corresponding snapshots.
To ensure that a concurrent/later restic run can read a snapshot's data,
restic thus must first load the snapshots and only afterwards the index.
Otherwise it is not possible to ensure that the loaded index is recent
enough to cover all of the snapshot's data.
Nodes in trees were always printed with a `+` in diff, regardless of
whether or not a dir was added or removed. Let's use the mode we were
passed in printDir().
Closes#3685
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
Removing data based on a policy when the attacker had the opportunity to
add data to your repository comes with some considerations. This is
added to the 060_forget.rst documentation.
That document is also updated to reflect that restic now considers
the current system time while running "forget".
References to the security considerations section are added:
- In `restic forget --help`
- In the threat model (design.rst)
- In the (030) setup section where an append-only setup is referenced
A reference is also to be added to the `rest-server` readme's
append-only paragraph (see my fork).
This commit also resolves a typo (amount->number for countable noun),
changes a password length recommendation into the metric that
actually matters when creating passwords (entropy) since I was editing
these doc files anyway, and updates the outdated copyright year in
`conf.py`.
Some wording in 060_forget (line 21..22) was changed to clarify what
"forget" and "prune" do, to try and avoid the apparent misconception
that "forget" does not remove any data.
There's no point in locking the repository just to list the currently
existing lock files. This won't work for an exclusively locked
repository and is also confusing to users.
Loading any parent tree for these only wastes time and memory.
Fixes#3641, where it was shown that the most recent tree will get
picked.
--parent is now implicitly ignored when --stdin is given.
cleanup handlers run in the order in which they are added. As Go calls
init() functions in lexical order, the cleanup handler from global.go
was registered before that from lock.go, which is the correct order.
Make this order explicit to ensure that this won't break accidentally.
Currently, `restic backup` (if a `--parent` is not provided)
will choose the most recent matching snapshot as the parent snapshot.
This makes sense in the usual case,
where we tag the snapshot-being-created with the current time.
However, this doesn't make sense if the user has passed `--time`
and is currently creating a snapshot older than the latest snapshot.
Instead, choose the most recent snapshot
which is not newer than the snapshot-being-created's timestamp,
to avoid any time travel.
Impetus for this change:
I'm using restic for the first time!
I have a number of existing BTRFS snapshots
I am backing up via restic to serve as my initial set of backups.
I initially `restic backup`'d the most recent snapshot to test,
then started backing up each of the other snapshots.
I noticed in `restic cat snapshot <id>` output
that all the remaining snapshots have the most recent as the parent.
Currently restic copy will copy each blob from every snapshot serially,
which has performance implications on high-latency backends such as b2.
This commit introduces 8x parallelism for blob downloads/uploads which
can improve restic copy operations up to 8x for repositories with many
small blobs on b2.
This commit also addresses the TODO comment in the copyTree function.
Related work:
A more thorough improvement of the restic copy performance can be found
in PR #3513
Closes#3595
Choosing to include `stdoutIsTerminal()` as:
- all other instances with `!opts.JSON` do so
- this likely will not affect anything, especially when autorun
- this seems to not be a meaningful enough summary
to include in auto-backup reports
JSON is still likely not guaranteed to work and this is a suboptimal
solution to this. Ideally, #1804 should refactor all print statements,
and define+document(+handle) when stdoutIsTerminal() should be used.
Else, it may end up more inconsistent and bulky
(duplicate lines, longer files).
Per Amazon's product page [1], S3 is officially called "Amazon S3". The
restic project uses the phrase "AWS S3" in some places. This patch
corrects the product name.
[1]:https://aws.amazon.com/s3/
Further code will also output to the terminal and the bar's cursor
positioning causes its output to overlap with the remaining output in a
racy way.
Fixes: #3344
Package internal/dump has been reworked so its API consists of a single
type Dumper that handles tar and zip formats. Tree loading and node
writing happen concurrently.
Running restic self-update --quiet no longer
prints "writing restic to /usr/local/bin/restic".
The only output printed with -q is failures or
"successfully updated restic to version 0.12.1"
https://github.com/restic/restic/pull/3535
fix test fail: changelog title can't end with `.`
shorten changelog title
After the refactoring status updates were no longer printed in quiet
mode or when the output is not an interactive terminal. However, the
JSON output is often piped to e.g. another program. Thus, don't set the
update frequency to 0 in that case. The status updates are still
disabled for backup --quiet.
This also reduces the status update frequency to 60fps compared to a
potentially much higher value before the refactoring.