Commit graph

257 commits

Author SHA1 Message Date
Michael Eischer
7266f07c87 repository: StreamPack in parts if there are too large gaps
For large pack sizes we might be only interested in the first and last
blob of a pack file. Thus stream a pack file in multiple parts if the
gaps between requested blobs grow too large.
2022-08-05 23:48:36 +02:00
Michael Eischer
1b076cda97 rename option to --pack-size 2022-08-05 23:47:43 +02:00
Kyle Brennan
1e3f05c3f1 repository: prevent header overfill 2022-08-05 23:47:12 +02:00
Michael Eischer
0a6fa602c8 add option for setting min pack size 2022-08-05 23:47:12 +02:00
Michael Eischer
73053674d9 repository: Test fallback to existing blobs 2022-07-30 17:37:07 +02:00
Michael Eischer
623770eebb repository: try to recover from invalid blob while repacking
If a blob that should be kept is invalid, Repack will now try to request
the blob using LoadBlob. Only return an error if that fails.
2022-07-30 17:37:07 +02:00
MichaelEischer
443cc49afd
Merge pull request #3830 from MichaelEischer/cleanup-repo
Extract Load/SaveTree/JSONUnpacked from repository
2022-07-23 10:46:13 +02:00
Michael Eischer
9729e6d7ef backend: extract readerat from restic package 2022-07-17 15:29:09 +02:00
Michael Eischer
8c11fc3ec9 crypto: move crypto buffer helpers 2022-07-17 13:42:23 +02:00
Michael Eischer
89d3ce852b repository: extract Load/StoreJSONUnpacked
A Load/Store method for each data type is much clearer. As a result the
repository no longer needs a method to load / store json.
2022-07-17 13:22:00 +02:00
Michael Eischer
fbcbd5318c repository: extract LoadTree/SaveTree
The repository has no real idea what a Tree is. So these methods never
belonged there.
2022-07-17 13:11:28 +02:00
Lorenz Bausch
d6e3c7f28e
Wording: change repo to repository 2022-07-08 20:05:35 +02:00
Michael Eischer
6f53ecc1ae adapt workers based on whether an operation is CPU or IO-bound
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.

Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
2022-07-03 12:19:26 +02:00
Michael Eischer
753e56ee29 repository: Limit to a single pending pack file
Use only a single not completed pack file to keep the number of open and
active pack files low. The main change here is to defer hashing the pack
file to the upload step. This prevents the pack assembly step to become
a bottleneck as the only task is now to write data to the temporary pack
file.

The tests are cleaned up to no longer reimplement packer manager
functions.
2022-07-02 22:42:34 +02:00
Michael Eischer
120ccc8754 repository: Rework blob saving to use an async pack uploader
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.

Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.

To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
2022-07-02 22:42:34 +02:00
Michael Eischer
04c23fa95d rebuild-index: correctly rebuild index for mixed packs
For mixed packs, data and tree blobs were stored in separate index
entries. This results in warning from the check command and maybe other
problems.
2022-07-02 19:24:02 +02:00
Michael Eischer
a6e9e08034 Account for pack header overhead at each entry
This will miss the pack header crypto overhead and the length field,
which only amount to a few bytes per pack file.
2022-07-02 18:55:58 +02:00
Alexander Neumann
99634c0936 Return real size from SaveBlob 2022-07-02 18:55:12 +02:00
MichaelEischer
fdc53a9d32
Merge pull request #3787 from MichaelEischer/refactor-repository
repository: (Mostly) index-related cleanups
2022-07-02 18:54:04 +02:00
Michael Eischer
ec7c9ce88b drop unused repository.Loader interface 2022-07-02 18:39:59 +02:00
Michael Eischer
2cd7e90ad1 repository: cleanup 2022-07-02 18:39:59 +02:00
Michael Eischer
c1a8fa4290 repository: remove unused packIDToIndex field 2022-07-02 18:39:59 +02:00
Michael Eischer
e68c3a4e62 repository: simplify CreateIndexFromPacks 2022-07-02 18:39:59 +02:00
Michael Eischer
1974ad7ce2 repository: hide MasterIndex.FinalizeFullIndexes / FinalizeNotFinalIndexes 2022-07-02 18:39:59 +02:00
Michael Eischer
ef53ca4a5a repository: remove MasterIndex.All() 2022-07-02 18:39:59 +02:00
Michael Eischer
bf81bf0795 repository: Properly set id for finalized index
As MergeFinalIndex and index uploads can occur concurrently, it is
necessary for MergeFinalIndex to check whether the IDs for an index were
already set before merging it. Otherwise, we'd loose the ID of an index
which is set _after_ uploading it.
2022-07-02 18:39:59 +02:00
Michael Eischer
e0a7852b8b repository: remove unused (Master)Index.Count 2022-07-02 18:39:58 +02:00
Michael Eischer
8ef2968f28 repository: remove unused index.ListPack 2022-07-02 18:39:12 +02:00
Michael Eischer
e4f20dea61 repository: inline index.encode 2022-07-02 18:39:12 +02:00
Michael Eischer
fe5a8e137a repository: remove unused index.Store 2022-07-02 18:39:12 +02:00
Michael Eischer
628ae799ca repository: make flushPacks private 2022-07-02 18:39:12 +02:00
Michael Eischer
ed8aa15376 repository: add Save method to MasterIndex interface 2022-07-02 18:38:56 +02:00
Michael Eischer
a77d5c4d11 repository: index saving belongs into the MasterIndex 2022-07-02 18:38:56 +02:00
MichaelEischer
2c893fe43c
Merge pull request #3798 from greatroar/errors
all: Move away from pkg/errors, easy cases
2022-06-17 19:01:40 +02:00
greatroar
f92ecf13c9 all: Move away from pkg/errors, easy cases
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:

* check for io.EOF with a straight ==. That value should not be wrapped,
  and the chunker (whose error is checked in the cases changed) does not
  wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
  when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
  alreadyLockedError to match the stdlib convention that error type
  names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
2022-06-14 08:36:38 +02:00
Jayson Wang
f144920ed5 fix handling of maxKeys in SearchKey 2022-06-12 14:19:06 +02:00
greatroar
c9557b2822 internal/repository: Fix LoadBlob + fuzz test
When given a buf that is big enough for a compressed blob but not its
decompressed contents, the copy at the end of LoadBlob would skip the
last part of the contents.

Fixes #3783.
2022-06-06 17:02:28 +02:00
MichaelEischer
b2a2e5f727
Merge pull request #3753 from greatroar/indexmap-alloc
repository: Re-tune indexmap allocation strategy
2022-05-14 15:44:08 +02:00
greatroar
5141228e0c repository: Re-tune indexmap allocation strategy
fd05037e1a changed the allocation batch
size from 256 to 128 under the assumption that an indexEntry is 60 bytes
on amd64, but it's 64: structs are padded out to a multiple of 8 for
alignment reasons. That means we'd waste no space in malloc even without
the batch allocation, at least on 64-bit machines. While that strategy
cuts the overallocation down dramatically for many small indexes, it also
seems to slow allocation down (Go 1.18, Linux, amd64, -benchtime=2s):

    name                   old time/op    new time/op    delta
    DecodeIndex-8             4.67s ± 5%     4.60s ± 1%      ~     (p=0.953 n=10+5)
    DecodeIndexParallel-8     4.67s ± 3%     4.60s ± 1%      ~     (p=0.953 n=10+5)
    IndexHasUnknown-8        37.8ns ± 8%    36.5ns ±14%      ~     (p=0.841 n=5+5)
    IndexHasKnown-8          38.5ns ±12%    37.7ns ±10%      ~     (p=0.968 n=5+5)
    IndexAlloc-8              615ms ±18%     607ms ± 1%      ~     (p=1.000 n=10+5)
    IndexAllocParallel-8      245ms ±11%     285ms ± 6%   +16.40%  (p=0.001 n=10+5)
    MasterIndexAlloc-8        286ms ± 9%     275ms ± 2%      ~     (p=1.000 n=10+5)
    LoadIndex/v1-8           27.0ms ± 4%    26.8ms ± 1%      ~     (p=0.690 n=5+5)
    LoadIndex/v2-8           22.4ms ± 1%    22.8ms ± 2%    +1.48%  (p=0.016 n=5+5)

    name                   old alloc/op   new alloc/op   delta
    IndexAlloc-8              446MB ± 0%     446MB ± 0%    -0.00%  (p=0.000 n=8+4)
    IndexAllocParallel-8      446MB ± 0%     446MB ± 0%    -0.00%  (p=0.008 n=8+5)
    MasterIndexAlloc-8        213MB ± 0%     159MB ± 0%   -25.47%  (p=0.000 n=10+5)

    name                   old allocs/op  new allocs/op  delta
    IndexAlloc-8               913k ± 0%     2632k ± 0%  +188.19%  (p=0.008 n=5+5)
    IndexAllocParallel-8       913k ± 0%     2632k ± 0%  +188.21%  (p=0.008 n=5+5)
    MasterIndexAlloc-8         318k ± 0%     1172k ± 0%  +267.86%  (p=0.008 n=5+5)

Instead, this patch sets a batch size of 4, which means no space is
wasted by malloc on 64-bit and very little on 32-bit. It still gets very
close to the savings from not allocating in batches, without requiring
special code for bits.UintSize==64. Benchmark results, again for
Linux/amd64:

    name                   old time/op    new time/op    delta
    DecodeIndex-8             4.67s ± 5%     4.83s ± 9%     ~     (p=0.315 n=10+10)
    DecodeIndexParallel-8     4.67s ± 3%     4.68s ± 4%     ~     (p=0.315 n=10+10)
    IndexHasUnknown-8        37.8ns ± 8%    44.5ns ±19%     ~     (p=0.095 n=5+5)
    IndexHasKnown-8          38.5ns ±12%    36.9ns ± 8%     ~     (p=0.690 n=5+5)
    IndexAlloc-8              615ms ±18%     628ms ±18%     ~     (p=0.218 n=10+10)
    IndexAllocParallel-8      245ms ±11%     262ms ± 9%   +7.02%  (p=0.043 n=10+10)
    MasterIndexAlloc-8        286ms ± 9%     287ms ±13%     ~     (p=1.000 n=10+10)
    LoadIndex/v1-8           27.0ms ± 4%    26.8ms ± 0%     ~     (p=1.000 n=5+5)
    LoadIndex/v2-8           22.4ms ± 1%    22.5ms ± 0%     ~     (p=0.056 n=5+5)

    name                   old alloc/op   new alloc/op   delta
    IndexAlloc-8              446MB ± 0%     446MB ± 0%     ~     (p=1.000 n=8+10)
    IndexAllocParallel-8      446MB ± 0%     446MB ± 0%   -0.00%  (p=0.000 n=8+8)
    MasterIndexAlloc-8        213MB ± 0%     160MB ± 0%  -25.02%  (p=0.000 n=10+9)

    name                   old allocs/op  new allocs/op  delta
    IndexAlloc-8               913k ± 0%     1333k ± 0%  +45.94%  (p=0.000 n=8+10)
    IndexAllocParallel-8       913k ± 0%     1333k ± 0%  +45.94%  (p=0.000 n=8+8)
    MasterIndexAlloc-8         318k ± 0%      525k ± 0%  +64.99%  (p=0.000 n=10+10)

The allocation method indexmap.newEntry has also been rewritten in a
form that is a few instructions shorter.
2022-05-11 21:22:14 +02:00
MichaelEischer
df554e5f69
Merge pull request #3748 from greatroar/runworkers
repository: Remove RunWorkers, report ctx.Err()
2022-05-11 19:38:46 +02:00
greatroar
2e0f1f5113 repository: Remove RunWorkers, report ctx.Err()
This removes RunWorkers, which had become mere overhead by successive
refactors. It also ensures that each former user of that function
returns any context error that occurs, so failure to complete an
operation is always reported as an error.
2022-05-10 22:26:00 +02:00
Michael Eischer
ae7e51382a Fix error on temp file deletion on windows
Apparently it can take a moment between closing a tempfile marked as
DELETE_ON_CLOSE and it actually being deleted. During that time the file
is inaccessible. Thus just skip deleting the temp file on windows.
2022-05-09 22:43:26 +02:00
Michael Eischer
cf5cb673fb repository: Use existing method to collect pack ids 2022-04-30 19:14:21 +02:00
Michael Eischer
b335cb6285 repository: Refactor index IDs collection 2022-04-30 19:14:21 +02:00
Michael Eischer
4b01b06f2f repository: Test compressed blobs in StreamPack 2022-04-30 11:34:10 +02:00
Michael Eischer
ec2b25565a repository: test uncompressedLength field and index example 2022-04-30 11:34:10 +02:00
Michael Eischer
9ffb8920f1 repository: run blackbox tests using old and new repo version 2022-04-30 11:34:10 +02:00
Michael Eischer
abe5935693 repository: unify repository version-specific initialization
Mark the master index as compressed also when initializing a new
repository. This is only relevant for testing.
2022-04-30 11:34:10 +02:00
Alexander Neumann
8776031f96 Leave allocating slices to the decompress code 2022-04-30 11:34:10 +02:00
Alexander Neumann
5eb05a0afe Configure zstd encoder/decoder 2022-04-30 11:34:10 +02:00