Commit graph

523 commits

Author SHA1 Message Date
Connor Findlay
d6e76a22a8 backend/azure: Handle Container SAS/SAT
Ignore AuthorizationFailure caused by using a container level SAS/SAT
token when calling GetProperties during the Create() call. This is because the
GetProperties call expects an Account Level token, and the container
level token simply lacks the appropriate permissions. Supressing the
Authorization Failure is OK, because if the token is actually invalid,
this is caught elsewhere when we try to actually use the token to do
work.
2024-10-18 21:59:03 +02:00
Damien Clark
8c1d6a50c1 cache: fix race condition in cache cleanup
Fix multiple restic processes executing concurrently and racing to remove obsolete snapshots.

Co-authored-by: Michael Eischer <michael.eischer@fau.de>
2024-10-18 21:47:59 +02:00
Michael Eischer
1f4c9d2806 cache: correctly ignore files whose filename is no ID
this can for example be the case for temporary files created by the
backend implementation.
2024-08-31 16:50:06 +02:00
Michael Eischer
8828c76f92 rest: improve handling of HTTP2 goaway
The HTTP client can only retry HTTP2 requests after receiving a GOAWAY
response if it can rewind the body. As we use a custom data type,
explicitly provide an implementation of `GetBody`.
2024-08-30 12:46:07 +02:00
Michael Eischer
64d628bd75 make timeout for slow requests configurable 2024-08-30 12:45:20 +02:00
Michael Eischer
8206cd19c8 backend/retry: don't trip circuit breaker if context is canceled
When the context used for a load operation is canceled, then the result
is always an error independent of whether the file could be retrieved
from the backend. Do not false positively trip the circuit breaker in
this case.

The old behavior was problematic when trying to lock a repository. When
`Lock.checkForOtherLocks` listed multiple lock files in parallel and one
of them fails to load, then all other loads were canceled. This
cancelation was remembered by the circuit breaker, such that locking
retries would fail.
2024-08-30 12:45:20 +02:00
Srigovind Nayak
0ca9355bc0 cache: add test for the automated cache clear to cache backend 2024-08-30 12:43:13 +02:00
Srigovind Nayak
0cf1737289 cache: check for context cancellation before clearing cache 2024-08-30 12:43:13 +02:00
Srigovind Nayak
fac1d9fea1 cache: backend add List method and a cache clear functionality
* removes files which are no longer in the repository, including index files, snapshot files and pack files from the cache.

cache: fix ids set initialisation with NewIDSet()
2024-08-30 12:43:13 +02:00
Michael Eischer
61e1f4a916 backend: return correct error on upload/request timeout 2024-08-30 12:37:10 +02:00
Michael Eischer
fa35e72214 backend: tweak timeouts to make watchdog timeout test less flaky 2024-08-10 19:08:03 +02:00
Michael Eischer
4266dca1b6 cache: fix confusing debug log 2024-08-03 18:51:38 +02:00
Michael Eischer
ae1cb889dd Add more checks for canceled contexts 2024-07-31 19:30:47 +02:00
Michael Eischer
94fdca08c4 return exit code 10 if repository does not exist 2024-07-10 21:46:26 +02:00
Michael Eischer
f74e70cc36 s3: forbid anonymous authentication unless explicitly requested 2024-07-10 20:10:27 +02:00
Michael Eischer
4b364940aa s3: use http client with configured timeouts for s3 IAM communication
The default client has no timeouts configured opening network
connections. Thus, if 169.254.169.254 is inaccessible, then the client
would wait for until the operating system gives up, which will take
several minutes.
2024-07-07 11:32:40 +02:00
Michael Eischer
a2a2401a68 s3: prevent repeated credential queries with anonymous authentication 2024-07-07 11:31:04 +02:00
Viktor Szépe
ac00229386 Fix typos 2024-07-03 20:02:06 +02:00
Michael Eischer
3f878aa8e7
Merge pull request #4845 from greatroar/errors
Fix error handling bug + clean up error messages
2024-06-07 17:07:07 +00:00
Michael Eischer
0a70bbcea5
Merge pull request #4844 from MichaelEischer/improve-timeout-error
backend: Improve timeout error message
2024-06-07 19:05:39 +02:00
Leo R. Lundgren
b2bbbe805f azure: Improve error message in azure.Create() 2024-06-03 23:37:17 +02:00
Michael Eischer
660679c2f6
Merge pull request #4835 from MichaelEischer/fix-list-cancel
backend/retry: do not log final error if context was canceled
2024-06-01 19:17:30 +02:00
Michael Eischer
db2398f35b backend: increase request progress timeout to 5 minutes
Apparently, 2 minutes are too short in some cases and can result in
canceled List requests.
2024-06-01 19:01:51 +02:00
Michael Eischer
6bf3d4859f backend: improve error on http request timeout
Now yields a "request timeout" error instead of "context canceled".
2024-06-01 18:52:39 +02:00
greatroar
4a874000b7 gs: Replace some errors.Wrap calls
The first one in Create is already a WithStack error. The rest were
referencing code that hasn't existed for quite some time. Note that
errors from Google SDKs tends to start with "google:" or "googleapi:".

Also, use restic/internal/errors.
2024-06-01 15:11:06 +02:00
Michael Eischer
38654a3bd7 backend/retry: do not log final error if context was canceled
Calls to `List(ctx, ...)` are usually stopped by canceling the context
once no further entries are required by the caller. Thus, don't log the
final error if the used context was canceled.
2024-05-30 18:48:52 +02:00
Srigovind Nayak
de7b418bbe http: allow custom User-Agent for outgoing HTTP requests 2024-05-30 15:38:06 +02:00
Michael Eischer
8e5d7d719c cache: move to backend package 2024-05-24 23:04:06 +02:00
Michael Eischer
860b595a8b backend: increase watchdog test timeout for deflaking 2024-05-24 21:33:17 +02:00
Michael Eischer
e4a48085ae backend/retry: feature flag new retry behavior 2024-05-24 20:24:02 +02:00
Michael Eischer
98709a4372 retry: reduce total number of retries
Retries in restic try to solve two main problems:
- retry a temporarily failed operation
- tolerate temporary network interruptions

The first problem only requires a few retries, whereas the last one benefits
primarily from spreading the requests over a longer duration.

Increasing the default multiplier and the initial interval works for
both cases. The first few retries only take a few seconds, while later
retries quickly reach the maximum interval of one minute. This ensures
that the total number of retries issued by restic will remain at around
21 retries for a 15 minute period. As the concurrency in restic is
bounded, retries drastically reduce the number of requests sent to a
backend. This helps to prevent overloading the backend.
2024-05-24 20:24:02 +02:00
Michael Eischer
512cd6ef07 retry: ensure that there's always at least one retry
Previously, if an operation failed after 15 minutes, then it would never
be retried. This means that large backend requests are more unreliable
than smaller ones.
2024-05-24 20:24:02 +02:00
Michael Eischer
a60ee9b764 retry: limit retries based on elapsed time not count
Depending on how long an operation takes to fail, the total retry
duration can currently vary between 1.5 and 15 minutes. In particular
for temporarily interrupted network connections, the former timeout is
too short. Thus always use a limit of 15 minutes.
2024-05-24 20:24:02 +02:00
Michael Eischer
a3633cad9e retry: explicitly log failed requests
This simplifies finding the request in the log output that cause an
operation to fail.
2024-05-24 20:24:02 +02:00
Michael Eischer
c56ecec9aa azure: deduplicate cli and default credentials case 2024-05-18 22:15:54 +02:00
Maik Riechert
355f520936 Azure: add option to force use of CLI credential 2024-05-18 22:15:54 +02:00
Michael Eischer
0c1ba6d95d backend: remove unused Location method 2024-05-18 21:38:31 +02:00
Michael Eischer
1d6d3656b0 repository: move backend.LoadAll to repository.LoadRaw
LoadRaw also includes improved context cancellation handling similar to the
implementation in repository.LoadUnpacked.

The removed cache backend test will be added again later on.
2024-05-18 21:26:00 +02:00
Michael Eischer
47232bf8b0 backend: move LimitReadCloser to util package
The helper is only intended for usage by backend implementations.
2024-05-18 21:26:00 +02:00
Michael Eischer
53d15bcd1b retry: add circuit breaker to load method
If a file exhausts its retry attempts, then it is likely not accessible
the next time. Thus, immediately fail all load calls for that file to
avoid useless retries.
2024-05-18 19:59:26 +02:00
Michael Eischer
394c8ca3ed rest/rclone/s3/sftp/swift: move short file detection behind feature gate
These backends tend to use a large variety of server implementations.
Some of those implementations might prove problematic with the new
checks.
2024-05-18 19:59:26 +02:00
Michael Eischer
aeb7eb245c retry: do not retry permanent errors
This is currently gated behind a feature flag as some unexpected
interactions might show up in the wild.
2024-05-18 19:59:26 +02:00
Michael Eischer
bf8cc59889 Use generic backend-error-redesign feature flag instead of http-timeouts
An individual flag for each change of the backend error handling would
be too finegrained. Thus, add a generic flag.
2024-05-18 19:54:52 +02:00
Michael Eischer
4740528a0b backend: add tests for IsPermanentError 2024-05-18 19:54:52 +02:00
Michael Eischer
6a85df7297 backend: add IsPermanentError() method to interface 2024-05-18 19:54:52 +02:00
Michael Eischer
cfc420664a mem: stricter handling of out of bounds requests 2024-05-18 19:54:52 +02:00
Michael Eischer
d40f23e716 azure/b2/gs/s3/swift: adapt cloud backend 2024-05-18 19:54:51 +02:00
Michael Eischer
e793c002ec local: stricter handling of short files 2024-05-18 19:54:21 +02:00
Michael Eischer
b4895ebd76 rest: rework error reporting and report too short files 2024-05-18 19:54:21 +02:00
Michael Eischer
eaa3f81d6b sftp: check for truncated files without an extra backend request 2024-05-18 19:54:21 +02:00