Commit graph

131 commits

Author SHA1 Message Date
Michael Eischer
ab9077bc13 replace usages of backend.Remove() with repository.RemoveUnpacked()
RemoveUnpacked will eventually block removal of all filetypes other than
snapshots. However, getting there requires a major refactor to provide
some components with privileged access.
2024-05-18 21:38:31 +02:00
Michael Eischer
74d90653e0 check: use ReadFull to load pack header in checkPack
This ensures that the pack header is actually read completely.
Previously, for a truncated file it was possible to only read a part of
the header, as backend.Load(...) is not guaranteed to return as many
bytes as requested by the length parameter.
2024-05-18 21:28:54 +02:00
Michael Eischer
8f8d872a68 fix compatibility with go 1.19 2024-05-18 21:28:54 +02:00
Michael Eischer
ff0744b3af check: test checkPack retries 2024-05-18 21:28:54 +02:00
Michael Eischer
3ff063e913 check: verify pack a second time if broken 2024-05-18 21:28:54 +02:00
Michael Eischer
e401af07b2 check: fix error message formatting 2024-05-18 21:28:54 +02:00
Michael Eischer
ffe5439149
Merge pull request #4605 from MichaelEischer/better-restorer-error-handling
Rework repository.StreamPacks & better restorer error handling
2024-05-01 16:37:41 +02:00
Michael Eischer
940a3159b5 let index.Each() and pack.Size() return error on canceled context
This forces a caller to actually check that the function did complete.
2024-04-22 22:39:32 +02:00
Michael Eischer
666a0b0bdb repository: streamPack: replace streaming with chunked download
Due to the interface of streamPack, we cannot guarantee that operations
progress fast enough that the underlying connections remains open. This
introduces partial failures which massively complicate the error
handling.

Switch to a simpler approach that retrieves the pack in chunks of 32MB.
If a blob is larger than this limit, then it is downloaded separately.

To avoid multiple copies in memory, an auxiliary interface
`discardReader` is introduced that allows directly accessing the
downloaded byte slices, while still supporting the streaming used by the
`check` command.
2024-04-22 21:21:23 +02:00
Michael Eischer
7ba5e95a82 check: allow tests to only verify pack&index integrity 2024-04-14 13:45:04 +02:00
Michael Eischer
dc441c57a7 repository: unify repository initialization in tests
Tests should use a helper from internal/repository/testing.go to
construct a Repository object.
2024-03-28 23:17:02 +01:00
Michael Eischer
ed4a4f8748 check: exclude inaccessible files from the repair pack suggestion 2024-02-12 20:25:15 +01:00
Michael Eischer
4073299a7c check: fix missing error if blob is invalid 2024-02-12 20:20:13 +01:00
Michael Eischer
22a3cea1b3 checker: wrap all pack errors in ErrPackData 2024-02-12 20:19:32 +01:00
Alexander Neumann
c0514dd8ba Fix linter errors (except for tests) 2024-02-10 22:58:10 +01:00
Michael Eischer
246559e654 check: cleanup s3 legacy detection 2024-01-27 13:02:04 +01:00
Michael Eischer
bfb56b78e1 replace some usages of restic.Repository with more specific interface
This should eventually make it easier to test the code.
2024-01-27 13:02:02 +01:00
Michael Eischer
22d0c3f8dc check: Use PackBlobIterator instead of StreamPack
To only stream the content of a pack file once, check used StreamPack
with a custom pack load function. This combination was always brittle
and complicates using StreamPack everywhere else. Now that StreamPack
internally uses PackBlobIterator use that primitive instead, which is a
much better fit for what the check command requires.
2024-01-19 21:40:36 +01:00
Andrea Gelmini
241916d55b
Fix typos 2023-12-06 13:11:55 +01:00
Michael Eischer
c7b770eb1f convert MemorizeList to be repository based
Ideally, code that uses a repository shouldn't directly interact with
the underlying backend. Thus, move MemorizeList one layer up.
2023-10-25 23:01:35 +02:00
Michael Eischer
1b8a67fe76 move Backend interface to backend package 2023-10-25 23:00:18 +02:00
Michael Eischer
a28940ea29 check: Suggest usage of restic repair packs for corrupted blobs
For now, the guide is only shown if the blob content does not match its
hash. The main intended usage is to handle data corruption errors when
using maximum compression in restic 0.16.0
2023-10-23 18:36:28 +02:00
Michael Eischer
91aef00df3 check: add index loading progress bar 2023-10-01 19:55:29 +02:00
Michael Eischer
3fd0ad7448 repository: list index files only once 2023-10-01 19:53:26 +02:00
Michael Eischer
c0627dc80d check: Fix flaky TestCheckerModifiedData
The test had a 4% chance of not modified the data read from the
repository, in which case the test would fail. Change the data
manipulation to just modified each read operation.
2023-05-01 17:18:19 +02:00
Michael Eischer
90fb6f70b4
Merge pull request #4089 from greatroar/errors
Clean up error handling further
2022-12-24 10:41:56 +01:00
greatroar
1678392a6d checker: Make ErrLegacyLayout a value, not a type 2022-12-17 09:41:07 +01:00
greatroar
c0b5ec55ab repository: Remove empty cleanup functions in tests
TestRepository and its variants always returned no-op cleanup functions.
If they ever do need to do cleanup, using testing.T.Cleanup is easier
than passing these functions around.
2022-12-11 11:06:25 +01:00
Michael Eischer
ff7ef5007e Replace most usages of ioutil with the underlying function
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.

This commit only mechanically replaces the function calls.
2022-12-02 19:36:43 +01:00
Michael Eischer
a3113c6097 restic: Change FindSnapshot functions to return the snapshot 2022-10-15 13:34:04 +02:00
greatroar
09c14f33c8 internal/checker: Pass Error.Error pointer receiver 2022-10-14 14:13:32 +02:00
Michael Eischer
2e3f1c08c5 repository: split index into a separate package 2022-10-08 21:15:34 +02:00
Michael Eischer
ddcf549eba repository: remove IsMixedPack and add replacement for checker
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.

The checker gets a simpler replacement.
2022-10-03 12:03:59 +02:00
Michael Eischer
1ebd57247a repository: optimize MasterIndex.Each
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.

> name                old time/op  new time/op  delta
> MasterIndexEach-16   6.68s ±24%   0.96s ± 2%  -85.64%  (p=0.008 n=5+5)
2022-09-24 12:21:59 +02:00
Michael Eischer
0a6fa602c8 add option for setting min pack size 2022-08-05 23:47:12 +02:00
Michael Eischer
04e49924fb checker: Fix S3 legacy layout detection 2022-07-23 11:19:32 +02:00
Michael Eischer
fcb3ddf181 check: Complain about usage of s3 legacy layout 2022-07-23 11:19:32 +02:00
Michael Eischer
8b8bd4e8ac check: complain about mixed pack files 2022-07-23 11:19:32 +02:00
Michael Eischer
89d3ce852b repository: extract Load/StoreJSONUnpacked
A Load/Store method for each data type is much clearer. As a result the
repository no longer needs a method to load / store json.
2022-07-17 13:22:00 +02:00
Michael Eischer
fbcbd5318c repository: extract LoadTree/SaveTree
The repository has no real idea what a Tree is. So these methods never
belonged there.
2022-07-17 13:11:28 +02:00
Michael Eischer
6f53ecc1ae adapt workers based on whether an operation is CPU or IO-bound
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.

Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
2022-07-03 12:19:26 +02:00
Michael Eischer
120ccc8754 repository: Rework blob saving to use an async pack uploader
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.

Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.

To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
2022-07-02 22:42:34 +02:00
Michael Eischer
5e0f1c3cef check: remove dead code 2022-07-02 19:28:57 +02:00
Michael Eischer
0df022fa6d check: Print full ids
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
2022-07-02 19:28:57 +02:00
Alexander Neumann
99634c0936 Return real size from SaveBlob 2022-07-02 18:55:12 +02:00
MichaelEischer
fdc53a9d32
Merge pull request #3787 from MichaelEischer/refactor-repository
repository: (Mostly) index-related cleanups
2022-07-02 18:54:04 +02:00
Michael Eischer
a77d5c4d11 repository: index saving belongs into the MasterIndex 2022-07-02 18:38:56 +02:00
greatroar
a0fa9c6e9f Revert "restic prune: Merge three loops over the index"
This reverts commit 8bdfcf779f.
Should fix #3809. Also needed to make #3290 apply cleanly.
2022-06-30 15:27:34 +02:00
MichaelEischer
19581dbc18
Merge pull request #3786 from greatroar/prune
restic prune: Merge three loops over the index
2022-06-18 16:54:50 +02:00
greatroar
8bdfcf779f restic prune: Merge three loops over the index
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.

Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
2022-06-18 10:40:33 +02:00