Commit graph

142 commits

Author SHA1 Message Date
Michael Eischer
fbcbd5318c repository: extract LoadTree/SaveTree
The repository has no real idea what a Tree is. So these methods never
belonged there.
2022-07-17 13:11:28 +02:00
Michael Eischer
6f53ecc1ae adapt workers based on whether an operation is CPU or IO-bound
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.

Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
2022-07-03 12:19:26 +02:00
Michael Eischer
120ccc8754 repository: Rework blob saving to use an async pack uploader
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.

Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.

To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
2022-07-02 22:42:34 +02:00
Michael Eischer
5e0f1c3cef check: remove dead code 2022-07-02 19:28:57 +02:00
Michael Eischer
0df022fa6d check: Print full ids
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
2022-07-02 19:28:57 +02:00
Alexander Neumann
99634c0936 Return real size from SaveBlob 2022-07-02 18:55:12 +02:00
MichaelEischer
fdc53a9d32
Merge pull request #3787 from MichaelEischer/refactor-repository
repository: (Mostly) index-related cleanups
2022-07-02 18:54:04 +02:00
Michael Eischer
a77d5c4d11 repository: index saving belongs into the MasterIndex 2022-07-02 18:38:56 +02:00
greatroar
a0fa9c6e9f Revert "restic prune: Merge three loops over the index"
This reverts commit 8bdfcf779f.
Should fix #3809. Also needed to make #3290 apply cleanly.
2022-06-30 15:27:34 +02:00
MichaelEischer
19581dbc18
Merge pull request #3786 from greatroar/prune
restic prune: Merge three loops over the index
2022-06-18 16:54:50 +02:00
greatroar
8bdfcf779f restic prune: Merge three loops over the index
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.

Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
2022-06-18 10:40:33 +02:00
greatroar
f92ecf13c9 all: Move away from pkg/errors, easy cases
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:

* check for io.EOF with a straight ==. That value should not be wrapped,
  and the chunker (whose error is checked in the cases changed) does not
  wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
  when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
  alreadyLockedError to match the stdlib convention that error type
  names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
2022-06-14 08:36:38 +02:00
Michael Eischer
5815f727ee checker: convert error type to use pointer-receivers 2022-05-09 22:31:30 +02:00
Alexander Neumann
8b11b86383 Add option global --compression 2022-04-30 11:34:10 +02:00
Michael Eischer
7b9ae91e04 copy: Load snapshots before indexes 2022-04-09 12:27:25 +02:00
Michael Eischer
3d29083e60 copy/find/ls/recover/stats: Memorize snapshot listing before index
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
2022-04-09 12:26:30 +02:00
Michael Eischer
a773cb6527 pack: cleanup header size calculation 2022-03-28 22:09:49 +02:00
Michael Eischer
6408686973 repository: Simplify Blob equality check 2022-03-28 22:09:49 +02:00
Michael Eischer
f78bd14e28 repository: Remove pack implementation details from MasterIndex 2022-03-28 22:09:49 +02:00
Michael Eischer
4b3dc415ef checker: cleanup header extraction 2022-02-12 20:18:25 +01:00
Michael Eischer
930a00ad54 checker: reuse bufio reader 2022-02-12 20:18:25 +01:00
Michael Eischer
f1e58e7c7f checker: rewrite ReadData to stream packs 2022-02-12 20:18:25 +01:00
Michael Eischer
9aa2eff384 Add plumbing to calculate backend specific file hash for upload
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
2021-08-04 22:17:46 +02:00
Alexander Neumann
04ca69cc78 Address issues reported by golint 2021-01-30 20:45:57 +01:00
Alexander Neumann
3c753c071c errcheck: More error handling 2021-01-30 20:02:37 +01:00
Alexander Neumann
16313bfcc9 errcheck: Add error check for MergeFinalIndexes() 2021-01-30 20:02:37 +01:00
Alexander Neumann
bdfedf1f5b
Merge pull request #3173 from MichaelEischer/unify-index-loading
Unify index loading
2021-01-28 13:50:42 +01:00
Michael Eischer
e2b0072441 check: add progress bar to the tree structure check 2021-01-28 11:10:50 +01:00
Michael Eischer
258ce0c1e5 parallel: report progress for StreamTrees
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
2021-01-28 11:08:43 +01:00
Michael Eischer
6e03f80ca2 check: Split the parallelized tree loader into a reusable component
The actual code change is minimal
2021-01-28 11:08:43 +01:00
Michael Eischer
1d7bb01a6b check: Cleanup tree loading and switch to use errgroup
The helper methods are now wired up in the Structure method.
2021-01-28 11:08:43 +01:00
Alexander Weiss
2a1add7538 check: remove file size counter 2020-12-23 02:34:31 +01:00
Michael Eischer
96904f8972 check: extract parallel index loading 2020-12-22 22:36:18 +01:00
Alexander Weiss
26f85779be Parallelize ForAllSnapshots 2020-12-06 05:09:58 +01:00
Alexander Weiss
aa7a5f19c2 Use BlobHandle in index methods 2020-11-22 20:41:12 +01:00
Alexander Weiss
a851c53cbe Use PackSize in checker 2020-11-21 22:13:54 +01:00
Alexander Weiss
c3ddde9e7d Return hdrSize in ListPack 2020-11-21 22:13:54 +01:00
Michael Eischer
1f43cac12d check: Only track data blobs when unused blobs should be reported
This improves the memory usage of check a lot as it now only has to
track tree blobs when run using the default parameters.
2020-11-15 18:43:07 +01:00
Michael Eischer
6da66c15d8 check: Simplify referenced blob tracking
The result is identical as long as the context in not canceled. However,
in that case the result is incomplete anyways.
2020-11-15 18:42:55 +01:00
Michael Eischer
3500f9490c check: Simplify blob status tracking
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
2020-11-15 18:42:42 +01:00
Michael Eischer
b8c7543a55 check: Merge 'size could not be found' and 'not found in index' errors
By construction these two errors always show up in pairs: 'size could
not be found' is printed when the blob is not found in the repository
index. That blob is also part of the `blobs` array. Later on, check
iterates over that array and checks whether the blob is marked as
existing. Which cannot be the case as that mark is generated by
iterating over the repository index.

The merged warning no longer reports the blob index within a file. That
information could also be derived by printing the affected tree using
`cat` and searching for the blob.
2020-11-15 18:41:50 +01:00
Alexander Weiss
17bb77b1f9 check: Also check blob length and offset 2020-11-14 00:42:49 +01:00
Alexander Weiss
80dcfca191 check: Check sizes computed from index and pack header 2020-11-14 00:42:49 +01:00
MichaelEischer
46d31ab86d
Merge pull request #3058 from greatroar/counter
Replace restic.Progress with new progress.Counter (fixes two race conditions)
2020-11-09 22:19:09 +01:00
Alexander Weiss
239931578c check: check index for packs that are read 2020-11-09 17:28:14 +01:00
greatroar
21b787a4d1 Stop Counters where they're constructed and started 2020-11-09 13:03:31 +01:00
greatroar
ddca699cd2 Replace restic.Progress with new progress.Counter
This fixes two race conditions while cleaning up the code.
2020-11-09 12:12:35 +01:00
Alexander Weiss
b44ecde8b0 Fix setting of ID in DecodeIndex 2020-10-17 09:12:58 +02:00
MichaelEischer
4ba237bb93
Merge pull request #3019 from greatroar/refactor-decodeindex
Refactor index decoding
2020-10-15 23:22:33 +02:00
greatroar
b27375f5ce defer close(ch) outside repository.RunWorkers 2020-10-14 15:50:16 +02:00
greatroar
27db3ec262 Refactor index decoding
Decoding old-format indices no longer requires loading and decrypting
twice.
2020-10-13 20:47:50 +02:00
Michael Eischer
c458e114d4 pass context to Find / FindSnapshot
This allows proper interruption of restic while it searches for
snapshots or key files.
2020-10-09 22:37:56 +02:00
Michael Eischer
4784540f04 repository: Simplify worker group code 2020-09-05 10:07:16 +02:00
aawsome
0fed6a8dfc
Use "pack file" instead of "data file" (#2885)
- changed variable names, especially changed DataFile into PackFile
- changed in some comments
- always use "pack file" in docu
2020-08-16 11:16:38 +02:00
MichaelEischer
34181b13a2
Merge pull request #2328 from MichaelEischer/no-repeated-checks
Fix duplicate tree checks within `restic check`
2020-07-22 22:08:02 +02:00
Alexander Weiss
e388d962a5 Merge final indexes together for faster index access 2020-07-22 21:54:02 +02:00
Michael Eischer
9b0e718852 checker: Test that blob types are not confused 2020-07-20 23:43:47 +02:00
Michael Eischer
ddf0b8cd0b checker: Properly distinguish between data and tree blobs
If a data blob and a tree blob with the same ID (= same content) exist,
then the checker did not report a data or tree blob as unused when the
blob of the other type was still in use.
2020-07-20 22:58:39 +02:00
Michael Eischer
2d0c138c9b checker: Test that check only decodes trees once
The `DuplicateTree` flag is necessary to ensure that failures cannot be
swallowed. The old checker implementation ignores errors from LoadTree
if the corresponding tree was already checked.
2020-07-20 22:51:53 +02:00
Michael Eischer
ef325ffc02 checker: Cleanup error handling code
This change only moves code around but does not result in any change in
behavior.
2020-07-20 22:51:53 +02:00
Michael Eischer
7a165f32a9 checker: Traverse trees in depth-first order
Backups traverse the file tree in depth-first order and saves trees on
the way back up. This results in tree packs filled in a way comparable
to the reverse Polish notation.  In order to check tree blobs in that
order, the treeFilter would have to delay the forwarding of tree nodes
until all children of it are processed which would complicate the
implementation.

Therefore do the next similar thing and traverse the tree in depth-first
order, but process trees already on the way down. The tree blob ids are
added in reverse order to the backlog, which is once again reverted when
removing the ids from the back of the backlog.
2020-07-20 22:51:53 +02:00
Michael Eischer
36c69e3ca7 checker: Unify blobs, processed trees and referenced blobs map
The blobRefs map and the processedTrees IDSet are merged to reduce the
memory usage. The blobRefs map now uses separate flags to track blob
usage as data or tree blob. This prevents skipping of trees whose
content is identical to an already processed data blob. A third flag
tracks whether a blob exists or not, which removes the need for the
blobs IDSet.
2020-07-20 22:51:47 +02:00
Michael Eischer
35d8413639 checker: Remove dead index map 2020-07-20 22:37:31 +02:00
Michael Eischer
c66a0e408c checker: Reduce cost of debug log
Avoid duplicate allocation of the Subtree list.
2020-07-20 22:37:31 +02:00
Michael Eischer
70f4c014ef checker: Decode identical tree nodes only once
Even though the checkTreeWorker skips already processed chunks,
filterTrees did queue the same tree blob on every occurence. This
becomes a serious performance bottleneck for larger number of snapshots
that cover mostly the same directories. Therefore decode a tree blob
exactly once.
2020-07-20 22:37:31 +02:00
Michael Eischer
f0d8710611 Add benchmark for checker scaling with snapshot count 2020-07-20 22:37:31 +02:00
Alexander Bruyako
da48b925ff remove unnecessary error return
I was running "golangci-lint" and found this two warnings

internal/checker/checker.go:135:18: (*Checker).LoadIndex$3 - result 0 (error) is always nil (unparam)
        final := func() error {
                        ^
internal/repository/repository.go:457:18: (*Repository).LoadIndex$3 - result 0 (error) is always nil (unparam)
        final := func() error {
                        ^

It turns out that these functions are used only in "RunWorkers(...)",
which is used only two times in whole project right after this "final"
functions.
And because these "final" functions always return "nil", I've
descided, that it would be better to remove requriments for "final" func
to return error to avoid magick "return nil" at their end.
2020-01-27 18:28:21 +03:00
Alexander Neumann
7304738872 check: Reduce default parallelism from 40 to 5 2019-04-13 13:38:39 +02:00
Alexander Neumann
66efa425bf Reuse buffer in worker functions 2019-04-13 13:38:39 +02:00
Alexander Neumann
e046428c94 Replace FilesInParallel with an errgroup.Group 2019-04-13 13:38:39 +02:00
Chris Howie
1688713400 Add key hinting (#2097) 2018-11-25 09:13:18 -05:00
Alexander Neumann
bfa18ee8ec DownloadAndHash: Check error returned by Load() 2018-10-28 21:28:56 +01:00
Alexander Neumann
754482fe6c checker: Disable size check for now 2018-07-15 21:52:38 +02:00
Alexander Neumann
38926d8576 Use new archiver code in tests 2018-04-25 14:42:45 +02:00
Alexander Neumann
83ca08245b checker: Check metadata size and blob sizes 2018-04-22 11:37:05 +02:00
Alexander Neumann
1c1fede399 Improve error message for orphaned pack files 2018-04-07 10:07:54 +02:00
Alexander Neumann
e68a7fea8a check: Allow filling the cache during check
Closes #1665
2018-04-01 13:59:27 +02:00
Alexander Neumann
99f7fd74e3 backend: Improve Save()
As mentioned in issue [#1560](https://github.com/restic/restic/pull/1560#issuecomment-364689346)
this changes the signature for `backend.Save()`. It now takes a
parameter of interface type `RewindReader`, so that the backend
implementations or our `RetryBackend` middleware can reset the reader to
the beginning and then retry an upload operation.

The `RewindReader` interface also provides a `Length()` method, which is
used in the backend to get the size of the data to be saved. This
removes several ugly hacks we had to do to pull the size back out of the
`io.Reader` passed to `Save()` before. In the `s3` and `rest` backend
this is actively used.
2018-03-03 15:49:44 +01:00
Igor Fedorenko
07d080830e Add --read-data-subset flag to check command
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
2018-02-18 23:31:27 -05:00
Igor Fedorenko
ab040d8811 Introduced repository.DownloadAndHash helper
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
2018-02-16 21:13:11 -05:00
Igor Fedorenko
d58ae43317 Reworked Backend.Load API to retry errors during ongoing download
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
2018-02-16 21:12:14 -05:00
Alexander Neumann
cccb2fc7e7 Merge pull request #1583 from restic/close-open-backend-files
Close backend files in case of errors
2018-01-26 21:57:28 +01:00
Alexander Neumann
909d9273cc Close backend files in case of errors 2018-01-25 21:05:57 +01:00
Alexander Neumann
663c57ab4d debug: Remove manual Str() call Log() 2018-01-25 20:49:41 +01:00
Alexander Neumann
b0c6e53241 Fix calls to repo/backend.List() everywhere 2018-01-21 21:15:09 +01:00
Igor Fedorenko
231076fa4a checker: Optimize checker.Packs()
Use result of single repository.List() to find both missing and
orphaned data packs. For 500GB repository this eliminates ~100K
repository.Test() calls and improves check time by >30M in my
environment (~45min before this change and ~7min after).

Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
2018-01-18 20:50:39 -05:00
Alexander Neumann
931e6ed2ac Use Seal/Open everywhere 2017-11-01 10:30:40 +01:00
George Armhold
bcdebfb84e small cleanup:
- be explicit when discarding returned errors from .Close(), etc.
- remove named return values from funcs when naked return not used
- fix some "err" shadowing when redeclaration not needed
2017-10-25 12:03:55 -04:00
Tobias Klein
087c3fe1dc tests updated 2017-09-09 13:26:35 +02:00
Alexander Neumann
23c903074c Move restic package to internal/restic 2017-07-24 17:43:32 +02:00
Alexander Neumann
6caeff2408 Run goimports 2017-07-23 14:21:03 +02:00
Alexander Neumann
83d1a46526 Moves files 2017-07-23 14:19:13 +02:00