After repacking every blob that should be kept must have been repacked.
We have seen a few cases in which a single blob went missing, which
could have been caused by a bitflip somewhere. This sanity check might
help catch some of these cases.
Unused blobs are not a problem but rather expected to exist now that
prune by default does not remove every unused blob. However, the option
has caused questions from users whether a repository is damaged or not,
so just remove that option.
Note that the remaining code is left intact as it is still useful for
our test cases.
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
raw-data summed up the size of the blob plaintexts. However, with
compression this makes little sense as the storage size in the
repository is lower due to compression. Thus sum up the actual size each
blob takes in the repository.
The GlobalOptions struct now embeds a backend.TransportOptions, so it
doesn't need to construct one in open and create. The upload and
download limits are similarly now a struct in internal/limiter that is
embedded in GlobalOptions.
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.
Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
Tree packs are cached locally at clients and thus benefit a lot from
being compressed. Ensure this be having prune always repack pack files
containing uncompressed trees.
The `stats` command checks inodes to not count hardlinked files multiple
times into the restore size. This check applies across all snapshots and
not only within snapshots. As a result the result size was far too low
when calculating it for multiple snapshots and it would vary depending
on the order in which snapshots were listed.
The new option allows prune to operate with nearly no scratch space by only removing
no longer necessary pack files and first deleting the index before
rebuilding it. By first deleting the index it becomes safe to just
delete no longer necessary pack files. However, as a downside there's
now the risk that the repository becomes inaccessible if prune fails.
To recover from that problem a user might have to manually delete the
repository index and then run (a full) `rebuild-index` again.
A compressed index is only about one third the size of an uncompressed
one. Thus increase the number of entries in an index to avoid cluttering
the repository with small indexes.
As an exception prune is still allowed to load the index before
snapshots, as it uses exclusive locks. In case of problems with locking
it is also better to load snapshots created after loading the index, as
this will lead to a prune sanity check failure instead of a broken snapshot.
When resolving snapshotIDs in FindFilteredSnapshots either
FindLatestSnapshot or FindSnapshot is called. Both operations issue a
list operation to the backend. When for example passing a long list of
snapshot ids to `forget` this could lead to a large number of list
operations.
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
During a backup the index is written before the corresponding snapshots.
To ensure that a concurrent/later restic run can read a snapshot's data,
restic thus must first load the snapshots and only afterwards the index.
Otherwise it is not possible to ensure that the loaded index is recent
enough to cover all of the snapshot's data.
Nodes in trees were always printed with a `+` in diff, regardless of
whether or not a dir was added or removed. Let's use the mode we were
passed in printDir().
Closes#3685
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
Removing data based on a policy when the attacker had the opportunity to
add data to your repository comes with some considerations. This is
added to the 060_forget.rst documentation.
That document is also updated to reflect that restic now considers
the current system time while running "forget".
References to the security considerations section are added:
- In `restic forget --help`
- In the threat model (design.rst)
- In the (030) setup section where an append-only setup is referenced
A reference is also to be added to the `rest-server` readme's
append-only paragraph (see my fork).
This commit also resolves a typo (amount->number for countable noun),
changes a password length recommendation into the metric that
actually matters when creating passwords (entropy) since I was editing
these doc files anyway, and updates the outdated copyright year in
`conf.py`.
Some wording in 060_forget (line 21..22) was changed to clarify what
"forget" and "prune" do, to try and avoid the apparent misconception
that "forget" does not remove any data.
There's no point in locking the repository just to list the currently
existing lock files. This won't work for an exclusively locked
repository and is also confusing to users.
Loading any parent tree for these only wastes time and memory.
Fixes#3641, where it was shown that the most recent tree will get
picked.
--parent is now implicitly ignored when --stdin is given.
cleanup handlers run in the order in which they are added. As Go calls
init() functions in lexical order, the cleanup handler from global.go
was registered before that from lock.go, which is the correct order.
Make this order explicit to ensure that this won't break accidentally.
Currently, `restic backup` (if a `--parent` is not provided)
will choose the most recent matching snapshot as the parent snapshot.
This makes sense in the usual case,
where we tag the snapshot-being-created with the current time.
However, this doesn't make sense if the user has passed `--time`
and is currently creating a snapshot older than the latest snapshot.
Instead, choose the most recent snapshot
which is not newer than the snapshot-being-created's timestamp,
to avoid any time travel.
Impetus for this change:
I'm using restic for the first time!
I have a number of existing BTRFS snapshots
I am backing up via restic to serve as my initial set of backups.
I initially `restic backup`'d the most recent snapshot to test,
then started backing up each of the other snapshots.
I noticed in `restic cat snapshot <id>` output
that all the remaining snapshots have the most recent as the parent.
Currently restic copy will copy each blob from every snapshot serially,
which has performance implications on high-latency backends such as b2.
This commit introduces 8x parallelism for blob downloads/uploads which
can improve restic copy operations up to 8x for repositories with many
small blobs on b2.
This commit also addresses the TODO comment in the copyTree function.
Related work:
A more thorough improvement of the restic copy performance can be found
in PR #3513
Closes#3595
Choosing to include `stdoutIsTerminal()` as:
- all other instances with `!opts.JSON` do so
- this likely will not affect anything, especially when autorun
- this seems to not be a meaningful enough summary
to include in auto-backup reports
JSON is still likely not guaranteed to work and this is a suboptimal
solution to this. Ideally, #1804 should refactor all print statements,
and define+document(+handle) when stdoutIsTerminal() should be used.
Else, it may end up more inconsistent and bulky
(duplicate lines, longer files).
Per Amazon's product page [1], S3 is officially called "Amazon S3". The
restic project uses the phrase "AWS S3" in some places. This patch
corrects the product name.
[1]:https://aws.amazon.com/s3/
Further code will also output to the terminal and the bar's cursor
positioning causes its output to overlap with the remaining output in a
racy way.
Fixes: #3344
Package internal/dump has been reworked so its API consists of a single
type Dumper that handles tar and zip formats. Tree loading and node
writing happen concurrently.
Running restic self-update --quiet no longer
prints "writing restic to /usr/local/bin/restic".
The only output printed with -q is failures or
"successfully updated restic to version 0.12.1"
https://github.com/restic/restic/pull/3535
fix test fail: changelog title can't end with `.`
shorten changelog title
After the refactoring status updates were no longer printed in quiet
mode or when the output is not an interactive terminal. However, the
JSON output is often piped to e.g. another program. Thus, don't set the
update frequency to 0 in that case. The status updates are still
disabled for backup --quiet.
This also reduces the status update frequency to 60fps compared to a
potentially much higher value before the refactoring.
* PrintProgress no longer does unnecessary Sprintf calls, and performs
fewer allocations in general
* newProgressMax's callback checks whether the terminal supports
line updates once instead of once per call
* the callback looks up the terminal width once per call instead of
twice (on Windows)
* the status shortening now uses the Unicode-aware version from
internal/ui/termstatus (future-proofing)
This can be used to check how large a backup is or validate exclusions.
It does not actually write any data to the underlying backend. This is
implemented as a simple overlay backend that accepts writes without
forwarding them, passes through reads, and generally does the minimal
necessary to pretend that progress is actually happening.
Fixes#1542
Example usage:
$ restic -vv --dry-run . | grep add
new /changelog/unreleased/issue-1542, saved in 0.000s (350 B added)
modified /cmd/restic/cmd_backup.go, saved in 0.000s (16.543 KiB added)
modified /cmd/restic/global.go, saved in 0.000s (0 B added)
new /internal/backend/dry/dry_backend_test.go, saved in 0.000s (3.866 KiB added)
new /internal/backend/dry/dry_backend.go, saved in 0.000s (3.744 KiB added)
modified /internal/backend/test/tests.go, saved in 0.000s (0 B added)
modified /internal/repository/repository.go, saved in 0.000s (20.707 KiB added)
modified /internal/ui/backup.go, saved in 0.000s (9.110 KiB added)
modified /internal/ui/jsonstatus/status.go, saved in 0.001s (11.055 KiB added)
modified /restic, saved in 0.131s (25.542 MiB added)
Would add to the repo: 25.892 MiB
Allow keeping hourly/daily/weekly/monthly/yearly snapshots for a given time period.
This adds the following flags/parameters to restic forget:
--keep-within-hourly duration
--keep-within-daily duration
--keep-within-weekly duration
--keep-within-monthly duration
--keep-within-yearly duration
Includes following changes:
- Add tests for --keep-within-hourly (and friends)
- Add documentation for --keep-within-hourly (and friends)
- Add changelog for --keep-within-hourly (and friends)
If a pack file is missing try to determine the contained pack ids based
on the repository index. This helps with assessing the damage to a
repository before running `rebuild-index`.
Just passing the list of blobs to packsToBlobs would also work in most
cases, however, it could cause unexpected results when multiple pack
files have the same prefix. Forget found prefixes to prevent this.
Apparently readahead was disabled by default. Enable readahead with the
Linux default size of 128kB. Larger values seem to have no effect.
This can speed up reading from the fuse mount by at least factor 5.
Speedup for a 1G random file stored in a local repository:
(Only one result shown, but times were quite stable, restarted restic
after each command)
$ dd if=/dev/urandom bs=1M count=1024 of=rand
$ shasum -a 256 tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d tmp/rand
before: $ time shasum -a 256 fuse/snapshots/latest/tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d fuse/snapshots/latest/tmp/rand
real 0m18.294s
user 0m4.522s
sys 0m3.305s
before: $ time cat fuse/snapshots/latest/tmp/rand > /dev/null
real 0m14.924s
user 0m0.000s
sys 0m4.625s
after: $ time shasum -a 256 fuse/snapshots/latest/tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d fuse/snapshots/latest/tmp/rand
real 0m6.106s
user 0m3.115s
sys 0m0.182s
after: $ time cat fuse/snapshots/latest/tmp/rand > /dev/null
real 0m3.096s
user 0m0.017s
sys 0m0.241s
This patch adds a `--latest` option to limit snapshots list to the n
last snapshots. It is very similar to the `--last` one but does not
limit to one entry. It also deprecates the `--last` flag usage in
favor of `--latest 1`
Output example:
$ restic snapshots --latest 2
repository 0d3eb989 opened successfully, password is correct
ID Time Host Tags Paths
------------------------------------------------------------
5a33bdcc 2020-12-14 12:30:00 local /home
73887d8e 2020-12-15 12:30:00 local /home
------------------------------------------------------------
2 snapshots
Signed-off-by: Sébastien Gross <seb•ɑƬ•chezwam•ɖɵʈ•org>
Previously the progress bar / status update interval used
stdoutIsTerminal to determine whether it is possible to update the
progress bar or not. However, its implementation differed from the
detection within the backup command which included additional checks to
detect the presence of mintty on Windows. mintty behaves like a terminal
but uses pipes for communication.
This adds stdoutCanUpdateStatus() which calls the same terminal detection
code used by backup. This ensures that all commands consistently switch
between interactive and non-interactive terminal mode.
stdoutIsTerminal() now also returns true whenever stdoutCanUpdateStatus()
does so. This is required to properly handle the special case of mintty.
The `init` and `copy` commands can now use `--repository-file2` flag and
the `$RESTIC_REPOSITORY_FILE2` environment variable.
This also fixes the conflict with the `--repository-file` and `--repo2`
flag.
Tests are added for the initSecondaryGlobalOpts function.
This adds a NOK function to the test helper functions. This NOK tests if
err is not nil, and otherwise fail the test.
With the NOK function a couple of sad paths are tested in the
initSecondaryGlobalOpts function.
In total the tests checks wether the following are passed correct:
- Password
- PasswordFile
- Repo
- RepositoryFile
The following situation must return an error to pass the test:
- no Repo or RepositoryFile defined
- Repo and RepositoryFile defined both
This avoids problems when for some reason the JSON encoding changes.
This also ensures forward compatibility with future restic versions
which might e.g. add new fields to the tree metadata.
This commit changes the error message so that a list of file names is
printed. Before, just the raw map was printed, which is not a great user
interface.
For example `restic find --show-pack-id --blob f78dc991 5b9e4366 ddd8c7d4`
would previously only expand one blob if all of them belong to the same
file.
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
When a file system is mounted at a directory, lstat() returns attributes
of the root node of the mounted file system, including the device ID of
the other file system. The previous code used when --one-file-system is
specified excluded the directory itself because of that.
This commit changes the code so that mountpoints are kept as empty
directories, its attributes set to the root note of the mounted file
system. The behavior mimics `tar`, which does the same.
Note that this fix only solves the statistics problem, if
all duplicates are marked for repacking.
If not all duplicates are marked for repacking, we lack the
information which
The situation that not all duplicates are marked for repacking can occur
when using the `max-repack-size` option
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
Add a callback to the PruneOptions struct which calculates the number of
bytes allowed to be unused after prune is done. This way, the logic is
closer to the option parsing code.
Also, add an explicit option `unlimited` for the use case when storage
does not matter but bandwidth and time do. Internally, this sets the
maximum number of unused bytes to MaxUint64.
Rework the documentation slightly so that no more "packs" are
mentioned and it talks about "files" instead.
Make it clear in the documentation that the percentage given to
`--max-unused` is relative to the whole repository size after pruning is
done. If specified, it must be below 100%, otherwise the repository
would contain 100% of unused data, which is pointless.
I had a hard time coming up with the correct formula to calculate the
maximum number of unused bytes based on the number of used bytes. For a
fraction `p` (0 ≤ p < 1), a repo with `u` bytes used, and the number of
unused bytes `x` the following holds:
x ≤ p * (u+x)
⇔ x ≤ p*u + p*x
⇔ x - p*x ≤ p*u
⇔ x * (1-p) ≤ p*u
⇔ x ≤ p/(1-p) * u
The VSS support works for 32 and 64-bit windows, this includes a check that
the restic version matches the OS architecture as required by VSS. The backup
operation will fail the user has not sufficient permissions to use VSS.
Snapshotting volumes also covers mountpoints but skips UNC paths.
The io.Reader interface does not support contexts, such that it is
necessary to embed the context into the backendReaderAt struct. This has
the problem that a reader might suddenly stop working when it's
contained context is canceled. However, this is now problem here as the
reader instances never escape the calling function.
Now that lockRepo receives a context, it is possible that it is canceled
before a lock was created. Thus `unlockRepo` must be able to handle this
case.
The archiver first called the Select function for a path before checking
whether the Lstat on that path actually worked. The RejectFuncs in
exclude.go worked around this by checking whether they received a nil
os.FileInfo. Checking first is more obvious and requires less code.
This removes the requirement on `restic self-update --output` to point
to a path of an existing file, to overwrite. In case the specified
path does exist we still want to verify that it's a regular file,
rather than a directory or a device, which gets overwritten.
We also want to verify that a path to a new file exists within an
existing directory. The alternative being running into that issue
after the actual download, etc has completed.
While at it I also replace `errors.Errorf` with the more appropriately
verbose `errors.Fatalf`.
Resolves#2491
As an alternative to -r, this allows to read the repository URL
from a file in order to prevent certain types of information leaks,
especially for URLs containing credentials.
Fixes#1458, fixes#2900.
This allows creating multiple repositories with identical chunker
parameters which is required for working deduplication when copying
snapshots between different repositories.
When the diff calculation compares two trees with identical id then no
differences between them can ever show up. Optimize for that case by
simply traversing the tree only once to collect all referenced blobs for
a proper calculation of added and removed blobs.
Just skipping the common subtrees is not possible as this would skew the
results if the added or removed blobs are shared with one of the
subtrees.
There was an issue that prevented the dump command from working
correctly when either:
* `/` contained multiple nodes (e.g. `restic backup /`)
* dumping a file in the first sublevel was attempted (e.g. `/foo`)
The help messages suggested that the `ls` command work without
explicitly passing a snapshot ID. However, this was never the case:
without a snapshot ID the command just failed with the error
`Ignoring "", it is not a snapshot id`.
Fixes#2299
Use the `Original` field of the copied snapshot to store a persistent
snapshot ID. This can either be the ID of the source snapshot if
`Original` was not yet set or the previous value stored in the
`Original` field. In order to still copy snapshots modified using the
tags command the source snapshot is compared to all snapshots in the
destination repository which have the same persistent ID. Snapshots are
only considered equal if all fields except `Original` and `Parent`
match. That way modified snapshots are still copied while avoiding
duplicate copies at the same time.
The standard UNIX-style ordering of command-line arguments places
optional flags before other positional arguments. All of restic's
commands support this ordering, but some of the usage strings showed the
flags after the positional arguments (which restic also parses just
fine). This change updates the doc strings to reflect the standard
ordering.
Because the `restic help` command comes directly from Cobra, there does
not appear to be a way to update the argument ordering in its usage
string, so it maintains the non-standard ordering (positional arguments
before optional flags).
Add a copy command to copy snapshots between repositories. It allows the user
to specify a destination repository, password, password-file, password-command
or key-hint to supply the necessary details to open the destination repository.
You need to supply a list of snapshots to copy, snapshots which already exist
in the destination repository will be skipped.
Note, when using the network this becomes rather slow, as it needs to read the
blocks, decrypt them using the source key, then encrypt them again using the
destination key before finally writing them out to the destination repository.
The old behavior was problematic in the context of rebuild-index as it
could leave old, possibly invalid index files behind without returning a
fatal error.
Prune calls rebuildIndex before removing any data from the repository.
For this use case failing to delete an old index MUST be treated as a
fatal error. Otherwise the index could still contain an old index file
that refers to blobs/packs that were later on deleted by prune. Later
backup runs will assume that the affected blobs already exist in the
repository which results in a backup which misses data.
The previous check only approximately verified whether all required
blobs were found. However, after forgetting a few snapshots the
repository contains lots of unused blobs whose number can be sufficient
to make up for missing packs.
When coupled with a malfunctioning backend that temporarily returns broken
data this could cause restic to regard the corresponding packs as
invalid and thereby delete data that's still in use. This change lets
restic play it safe and refuse to delete anything if data is missing.
Do not lock the repository if --no-lock global flag is set. This allows
to mount repositories which are archived on a read only system.
Signed-off-by: Sébastien Gross <seb•ɑƬ•chezwam•ɖɵʈ•org>
The seen BlobSet always contained a subset of the entries in blobs.
Thus use blobs instead and avoid the memory overhead of the second set.
Suggested-by: Alexander Weiss <alex@weissfam.de>
If a data blob and a tree blob with the same ID (= same content) exist,
then the checker did not report a data or tree blob as unused when the
blob of the other type was still in use.
The backup command used to return a zero exit code as long as a snapshot
could be created successfully, even if some of the source files could not
be read (in which case the snapshot would contain the rest of the files).
This made it hard for automation/scripts to detect failures/incomplete
backups by looking at the exit code. Restic now returns the following exit
codes for the backup command:
- 0 when the command was successful
- 1 when there was a fatal error (no snapshot created)
- 3 when some source data could not be read (incomplete snapshot created)
That site might not have supported https:// when those links were
originally added. It does now.
Also dropping the _spec.html_ ending of the url, there being a `<link
rel="canonical" ...>` tag suggesting that that no longer being the
preferred address.
cmd/restic/globals.go already provides Printf, Println and Warnf wrapper
which get their output streams from the globalOptions object. This
allows for stream replacements when testing.
- The SaveBlob method now checks for duplicates.
- Moves handling of pending blobs to MasterIndex.
-> also cleans up pending index entries when they are saved in the index
-> when using SaveBlob no need to care about index any longer
- Always check for full index and save it when storing packs.
-> removes the need of an index uploader
-> also removes the verbose "uploaded intermediate index" messages
- The Flush method now also saves the index
- Fix race condition when checking and saving full/non-finalized indexes
errors.Fatalf wraps a error and just keeps an error message as a string.
This prevents the `restic.IsAlreadyLocked(err)` check from working as
the error is no longer an ErrAlreadyLocked.
Just add an additional remark to the error using `errors.WithMessage`.