Uploading 100 files of each 1 MB took 20 seconds before. With above fix it takes around 2 seconds now.
10x time improvement in line with pacer's sleep reduction from 100ms to 10ms
Before this change when uploading files bigger than 1TiB, the chunk
calculator would work out that the chunk size needed to be bigger than
the default 100 MiB to fit within the 10,000 parts limit.
However the uploader was still using the memory pool for the old chunk
size and this caused errors like
panic: runtime error: slice bounds out of range [:122683392] with capacity 100663296
The fix for this is to make a temporary pool with the larger chunk
size and use it during the upload of the large file.
See: https://forum.rclone.org/t/rclone-cannot-complete-upload-to-b2-restarts-upload-frequently/35617/
Before this change we were sending webdav requests to the go http
FileServer. In go1.20 these (rightly) started returning errors which
caused the tests to fail.
The test has been changed to properly mock up an About query and
response so an end to end test of adding headers is possible.
Passwords for encrypted libraries are kept in memory in the server
and flushed after an hour.
This MR fixes an issue when the library password expires after 1 hour.
rclone sync erroneously deleted folders renamed to a different case on
crypts where directory name encryption was disabled and the underlying
remote was case insensitive.
Example: Renaming the folder Test to tEST before a sync to a crypt having
remote=OneDrive:crypt and directory_name_encryption=false could result in
the folder and all its content being deleted. The following sync would
correctly create the tEST folder and upload all of the content.
Additional tests have revealed other potential issues when using
filename_encryption=off or directory_name_encryption=false on case
insensitive remotes. The documentation has been updated to warn about
potential problems when using these combinations.
This error was caused by rclone supplying an empty
`x-ms-blob-public-access:` header when creating a container for
private access, rather than omitting it completely.
This is a valid way of specifying containers should be private, but if
the storage account has the flag "Blob public access" unset then it
gives "409 Public access is not permitted on this storage account".
This patch fixes the problem by only supplying the header if the
access is set.
Fixes#6645
Storj switched to a single global s3 endpoint backed by a BGP routing.
We want to stop advertizing the former regional endpoints and have the
global one as the only option.
Before this change, when a new object was created s3 returns its
versionID (on a versioned bucket) and rclone recorded it in the
object.
This means that when rclone came to delete the object it would delete
it with the versionID.
However it is common to forbid actions with versionIDs on buckets so
as to preserve the historical record and these operations would fail
whereas they succeeded in pre-v1.60.0 versions.
This patch fixes the problem by not recording versions of objects
supplied by the S3 API on upload unless `--s3-versions` or
`--s3-version-at` is used. This makes rclone behave as it did before
v1.60.0 when version support was introduced.
See: https://forum.rclone.org/t/s3-and-intermittent-403-errors-with-file-renames-and-drag-and-drop-operations-in-windows-explorer/34773
This patch implements --use-server-modtime for the Azureblob backend.
It does this by not reading the time from the metadata if the global
flag is set.
When the SDK was upgraded it started delivering metadata where the
keys were not in lower case as per the old SDK.
Rclone normalises the case of the keys for storage in the Object, but
the directory marker check was being done with the unnormalised keys
as it needs to be done before the Object is created.
This fixes the directory marker check to do a case insensitive compare
of the metadata keys.
Before this change, we were taking the version ID straight from the
XML blob returned by the SDK and thus pinning the XML into memory
which bulked up the average memory per object from about 400 bytes to
4k.
Copying the string fixes the excess memory usage.
This reverts commit 4f386a1ccd.
It turns out that Alibaba OSS does support list v2 and the detection
code was wrong.
This means that users of the gov version of Alibaba will have to add
`list_version 1` to their config files.
See #6600
In this commit
ab849b3613 s3: fix listing loop when using v2 listing on v1 server
The ContinuationToken was tested for existence, but it is the
NextContinuationToken that we are interested in.
See: #6600
Previously it was limited to plain ASCII (0-9, A-Z, a-z).
Implemented by adding \p{L}\p{N} alongside the \w in the regex,
even though these overlap it means we can be sure it is 100%
backwards compatible.
Fixes#6618
This was caused by
a9bd0c8de6 s3: reduce memory consumption for s3 objects
Which assumed that the StorageClass would always be set, but it isn't
set for Versions.
The updates the authentication to include
- Auth from the environment
1. Environment Variables
2. Managed Service Identity Credentials
3. Azure CLI credentials (as used by the az tool)
- Account and Shared Key
- SAS URL
- Service principal with client secret
- Service principal with certificate
- User with username and password
- Managed Service Identity Credentials
And rationalises the auth order.
Normally rclone will check the container exists before uploading if it
hasn't listed the container yet.
Often rclone will be running with a limited set of permissions which
means rclone can't create the container anyway, so this stops the
check.
This will save a transaction.
This commit switches from using the old Azure go modules
github.com/Azure/azure-pipeline-go/pipeline
github.com/Azure/azure-storage-blob-go/azblob
github.com/Azure/go-autorest/autorest/adal
To the new SDK
github.com/Azure/azure-sdk-for-go/
This stops rclone using deprecated code and enables the full range of
authentication with Azure.
See #6132 and #5284
Before this change, rclone would enter a listing loop if it used v2
listing on a v1 server and the list exceeded 1000 items.
This change detects the problem and gives the user a helpful message.
Fixes#6600
Copying the storageClass string instead of using a pointer to the original string.
This prevents the Go garbage collector from keeping large amounts of
XMLNode structs and references in memory, created by xmlutil.XMLToStruct()
from the aws-sdk-go.
This commit uses the MLST command (where available) to get the status
for single files rather than listing the parent directory and looking
for the file. This makes actions such as using `--files-from` much quicker.
* use getEntry to lookup remote files when supported
* findItem now expects the full path directly
It makes the expected argument similar to the getInfo method, the
difference now is that one is returning a FileInfo whereas
the other is returning an ftp Entry.
Fixes#6225
Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
Before this fix a chain compress -> crypt -> s3 was giving errors
BadDigest: The Content-MD5 you specified did not match what we received.
This was because the crypt backend was encrypting the underlying local
object to calculate the hash rather than the contents of the metadata
stream.
It did this because the crypt backend incorrectly identified the
object as a local object.
This fixes the problem by making sure the crypt backend does not
unwrap anything but fs.OverrideRemote objects.
See: https://forum.rclone.org/t/not-encrypting-or-compressing-before-upload/32261/10
Before this change we were putting connections into the connection
pool which had a local context in.
This meant that when the operation had finished the context was
cancelled and the connection became unusable.
See: https://forum.rclone.org/t/failed-to-sync-context-canceled/34017/
Before this change, when using -server-side-across-configs rclone
would direct Move/Copy/DirMove to the destination server.
However this should be directed to the source server. This is a little
unclear in the RFC, but the name of the parameter "Destination:" seems
clear and this is how dCache and Rucio have implemented it.
See: https://forum.rclone.org/t/webdav-copy-request-implemented-incorrectly/34072/
In this commit
8d1fff9a82 local: obey file filters in listing to fix errors on excluded files
We introduced the concept of local backend filters.
Unfortunately the filters were being applied before we had resolved
the symlink to point to a directory. This meant that symlinks pointing
to directories were filtered out when they shouldn't have been.
This was fixed by moving the filter check until after the symlink had
been resolved.
See: https://forum.rclone.org/t/copy-links-not-following-symlinks-on-1-60-0/34073/7
Fish is different from POSIX-based Unix shells such as bash,
and a bracketed variable references like we use for the
auto-detection echo command is not supported. The command
will return with zero exit code but produce no output on
stdout. There is a message on stderr, but we don't log it
due to the zero exit code:
fish: Variables cannot be bracketed. In fish, please use {$ShellId}.
Fixes#6552
Before this change if we copied files of unknown size, then they lost
their metadata.
This was particularly noticeable using --s3-decompress.
This change adds metadata to Rcat and RcatSized and changes Copy to
pass the metadata in when it calls Rcat for an unknown sized input.
Fixes#6546
Before this change, some files were giving this error when downloaded
from Cloudflare and other providers.
ERROR corrupted on transfer: sizes differ NNN vs MMM
This is because these providers auto gzips the object when rclone
wasn't expecting it to. (AWS does not gzip objects without their being
uploaded gzipped).
This patch adds a quirk to for fix the problem and a flag to control
it. The quirk `might_gzip` is set to `true` for all providers except
AWS.
See: https://forum.rclone.org/t/s3-error-corrupted-on-transfer-sizes-differ-nnn-vs-mmm/33694/Fixes: #6533
Before this fix it was impossible to stop rclone generating an
X-Amx-Acl: header which is incompatible with GCS with uniform access
control and is generally deprecated at AWS.
The API endpoint GetBucketLocation requires
top level permission.
If we do an authenticated head request to a bucket, the bucket location will be returned in the HTTP headers.
Fixes#5066
Before this change if --user-server-modtime was in use the ModTime
could change for an object as we receive it accurate to the nearest ms
in listings, but only accurate to the nearest second in HEAD and GET
requests.
Normally AWS returns the milliseconds as .000 in listings, but if
versions are in use it may not. Storj S3 also seems to return
milliseconds.
This patch tries to keep the maximum precision in the last modified
time, so it doesn't update a last modified time with a truncated
version if the times were the same to the nearest second.
See: https://forum.rclone.org/t/cache-fingerprint-miss-behavior-leading-to-false-positive-stalen-cache/33404/
Before this change rclone used statx() to read the metadata for files
from the local filesystem when `-M` was in use.
Unfortunately statx() was only introduced in kernel 4.11 which was
released in April 2017 so there are current systems (eg Centos 7)
still on kernel versions which don't support statx().
This patch checks to see if statx() is available and if it isn't, it
falls back to using fstatat() which was introduced in Linux 2.6.16
which is guaranteed for all Go versions.
See: https://forum.rclone.org/t/metadata-from-linux-local-s3-failed-to-copy-failed-to-read-metadata-from-source-object-function-not-implemented/33233/
In https://github.com/jlaffaye/ftp/commit/212daf295f the upstream FTP
library changed the way adding your own dialer works which meant that
connections when using explicit FTP were failing.
This patch reworks our connection code to bring it into the
expectations of the library.
Before this fix, if an error ocurred reading the metadata, it could be
set as nil and then used, causing a crash.
This fix changes the readMetadata function so it returns an error, and
the error is always set if the metadata returned is nil.
If mkdir fails then before this change it would have thrown an
error.
After this change, if the error indicated that the directory
already exists then the error is not returned to the user.
This fixes a race condition when two rclone threads are trying to
create the same directory.
In this commit
8d1fff9a82 local: obey file filters in listing to fix errors on excluded files
We started using filters in the local backend so the user could short
circuit troublesome files/directories at a low level.
However this caused a number of integration tests to fail. This turned
out to be in backends wrapping the local backend. For example the
combine backend test failed because it changes the paths passed to the
local backend so they no longer match the paths in the current filter.
To fix this, a new feature flag `FilterAware` was added and the
UseFilter context flag is only passed to backends which support it. As
the wrapping backends don't support the flag, this fixes the problems
in the integration tests.
In future the wrapping backends could modify the active filters to
match the path modifications and then they could set the FilterAware
flag.
See #6376
Before this fix, the chunksize calculator was using the previous size
of the object, not the new size of the object to calculate the chunk
sizes.
This meant that uploading a replacement object which needed a new
chunk size would fail, using too many parts.
This fix fixes the calculator to take the size explicitly.
Before this change, if rclone was run with `-M` on a filesystem
without xattr support, it would error out.
This patch makes rclone detect the not supported errors and disable
xattrs from then on. It prints one ERROR level message about this.
See: https://forum.rclone.org/t/metadata-update-local-s3/32277/7
Before this change, if an object compressed with "Content-Encoding:
gzip" was downloaded, a length and hash mismatch would occur since the
go runtime automatically decompressed the object on download.
If --s3-decompress is set, this change erases the length and hash on
compressed objects so they can be downloaded successfully, at the cost
of not being able to check the length or the hash of the downloaded
object.
If --s3-decompress is not set the compressed files will be downloaded
as-is providing compressed objects with intact size and hash
information.
See #2658
Before this fix, the dropbox backend wasn't decoding the file names
received in changenotify events into rclone standard format.
This meant that changenotify events for filenames which had encoded
characters were failing to be decrypted properly if wrapped in crypt.
See: https://forum.rclone.org/t/rclone-vfs-cache-says-file-name-too-long/31535
Before this patch backends could be shutdown when they fell out of the
cache when they were in use with combine. This was particularly
noticeable with the dropbox backend which gave this error when
uploading files after the backend was Shutdown.
Failed to copy: upload failed: batcher is shutting down
This patch gets the combine remote to pin them until it is finished.
See: https://forum.rclone.org/t/rclone-combine-upload-failed-batcher-is-shutting-down/32168
Previously, with standard auth, the username would be stored in config - but only after
entering the non-standard device/mountpoint sequence during config (a feature introduced
with #5926). Regardless of that, rclone always requests the username from the api at
startup (NewFS).
In #6270 (commit 9dbed02329) this was changed to always
store username in config (consistency), and then also use it to avoid the repeated
customer info request in NewFs (performance). But, as reported in #6309, it did not work
with legacy auth, where user enters username manually, if user entered an email address
instead of the internal username required for api requests. This change was therefore
recently reverted.
The current commit takes another step back to not store the username in config during
the non-standard device/mountpoint config sequence (consistentcy). The username will
now only be stored in config when using legacy auth, where it is an input parameter.
Extend the shouldRetry function by also checking for the quotaExceeded
reason, and since this function appeared to be untested, add a test case
for the existing errors and this new one.
Fixes#615
In
22abd785eb s3: implement reading and writing of metadata #111
The reading information of objects was refactored to use the
s3.HeadObjectOutput structure.
Unfortunately the code branch with `--s3-no-head` was not tested
otherwise this panic would have been discovered.
This shows that this is path is not integration tested, so this adds a
new integration test.
Fixes#6322
`FS.cacheExpiry` is accessed through sync/atomic.
According to the documentation, "On ARM, 386, and 32-bit MIPS, it is
the caller's responsibility to arrange for 64-bit alignment of 64-bit
words accessed atomically. The first word in a variable or in an
allocated struct, array, or slice can be relied upon to be 64-bit
aligned."
Before commit 1d2fe0d856 this field was
aligned, but then a new field was added to the structure, causing the
test suite to panic on linux/386.
No other field is used with sync/atomic, so `cacheExpiry` can just be
placed at the beginning of the stuct to ensure it is always aligned.
By default these will be downloaded compressed.
This changes the default of the previous commit
2781f8e2f1 gcs: Fix download of "Content-Encoding: gzip" compressed objects
But will fit in better with the metadata framework when copying
gzip-encoded objects from backend to backend.
Existing version did save username in config, but only when entering the custom
device/mountpoint sequence in config. Regardless of that, it did always look up the
username at startup with an api request.
This commit improves it so that the username will always be stored in config,
and when using standard authentication it picks it from the login token instead of
requesting it from the remote api, and also in fs constructor it picks it from config
instead of requesting it from remote api (again).
The SDK doesn't wrap errors in a Go standard way so they can't be
unwrapped and tested for - eg fatal error.
The code looks for a Serialization or RequestError and returns the
unwrapped underlying error if possible.
This fixes the fs/operations integration tests checking for fatal
errors being returned.
In this commit
e5974ac4b0 s3: use PutObject from the aws SDK to upload single part objects
rclone was made to upload objects to s3 using PUT requests rather than
using signed uploads.
However this change missed the fact that there is a supported way to
do this in the SDK using the SetStreamingBody method on the Request.
This therefore reverts a lot of the previous commit to do with making
an unsigned connection and other complication and uses the SDK
facility.
Before this fix GetFreeSpace returned math.MaxInt64 for remotes which
don't support reading free space, however this is used in various
comparison routines as a too large value, meaning that remotes of size
math.MaxInt64 were never being selected.
This fixes GetFreeSpace to return math.MaxInt64 - 1 so then can be selected.
It also fixes GetUsedSpace the same way however as the default for not
supported was 0 this was very unlikely to have ever caused a problem.
By default, rclone always requests read and write permissions. No matter what settings you configure in the AAD application. This option allows to explicitly request readonly permissions
Migrated read only option to access scope option and set disable_site_permission option to hidden.
Windows shells like cmd and powershell needs to use different quoting/escaping
of strings and paths than the unix shell, and also absolute paths must be fixed
by removing leading slash that the POSIX formatted paths have
(e.g. /C:/Users does not work in shell, it must be converted to C:/Users).
Tries to autodetect shell type (cmd, powershell, unix) on first use.
Implemented default builtin powershell functions for hashsum and about when remote
shell is powershell.
See #5763Fixes#5758
This adjusts
rclone backend drives -o config drive:
So that it also emits a config section called `AllDrives` which uses
the combine backend to make a backend which combines all the shared
drives into one.
It also makes sure that all the shared drive names are valid rclone
config names, deduplicating if necessary.
Fixes#4506
Before this change, if an object compressed with "Content-Encoding:
gzip" was downloaded, a length and hash mismatch would occur since the
as the go runtime automatically decompressed the object on download.
This change erases the length and hash on compressed objects so they
can be downloaded successfully, at the cost of not being able to check
the length or the hash of the downloaded object.
This also adds the --gcs-download-compressed flag to allow the
compressed files to be downloaded as-is providing compressed objects
with intact size and hash information.
Fixes#2658
Before this fix, if uploading to a union consisting of all bucket
based remotes (eg s3), uploads failed with:
Failed to copy: object not found
This was because the union backend was relying on parent directories
being created to work out which files to upload. If all the upstreams
were bucket based backends which can't hold empty directories, no
directories were created and the upload failed.
This fixes the problem by returning the upstreams used when creating
the directory for the upload, rather than searching for them again
after they've been created.
This will also make the union backend a little more efficient.
Fixes#6170
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The "relative" argument was missing when Put'ing a file. This
sets an incorrect object entry in the cache, leading to the file being
unreadable when using mount functionality.
Fixes#6151
Before this change rclone used presigned requests to upload single
part objects. This was because of a limitation in the SDK which didn't
allow non seekable io.Readers to be passed in.
This is incompatible with some S3 backends, and rclone wasn't adding
the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` header which was
incompatible with other S3 backends.
The SDK now allows for this so rclone can use PutObject directly.
This sets the `X-Amz-Content-Sha256: UNSIGNED-PAYLOAD` flag on the PUT
request. However rclone will add a `Content-Md5` header if at all
possible so the body data is still protected.
Note that the old behaviour can still be configured if required with
the `use_presigned_request` config parameter.
Fixes#5422
Uses b2_list_file_versions to retrieve all file versions, and returns
the one that was active at the specified time
This is especially useful in combination with other backup tools, such
as restic, which may use rclone as a backend.
Jottacloud have several different apis and endpoints using a mix of different
timestamp formats. In existing code the list operation (after the recent liststream
implementation) uses format from golang's time.RFC3339 constant. Uploads (using the
allocate api) uses format from a hard coded constant with value identical to golang's
time.RFC3339. And then we have the classic JFS time format, which is similar to RFC3339
but not identical, using a different constant. Also the naming is a bit confusing,
since the term api is used both as a generic term and also as a reference to the
newer format used in the api subdomain where the allocate endpoint is located.
This commit refactors these things a bit.
Now using the utility function for deduplication that was newly implemented to
fix an issue with server-side copy. This function uses the original, and generic,
"jfs" api (and its "cphash" feature), instead of the newer "allocate" api dedicated
for uploads. Both apis support similar deduplication functionaly that we rely on for
the SetModTime operation. One advantage of using the jfs variant is that the allocate
api is specialized for uploads, an initial request performs modtime-only changes and
deduplication if possible but if not possible it creates an incomplete file revision
and returns a special url to be used with a following request to upload missing content.
In the SetModTime function we only sent the first request, using metadata from existing
remote file but different timestamps, which lead to a modtime-only change. If, for some
reason, this should fail it would leave the incomplete revision behind. Probably not
a problem, but the jfs implementation used with this commit is simpler and
a more "standalone" request which either succeeds or fails without expecting additional
requests.
A strange feature (probably bug) in the api used by the server-side copy implementation
in Jottacloud backend is that if the destination file is in trash, the copy request
succeeds but the destination will still be in trash! When this situation occurs in
rclone, the copy command will fail with "Failed to copy: object not found" because
rclone verifies that the file info in the response from the copy request is valid,
and since it is marked as deleted it is treated as invalid.
This commit works around this problem by looking for this situation in the response
from the copy operation, and send an additional request to a built-in deduplication
endpoint that will restore the file from trash.
Fixes#6112
The existing code in rclone set the value "offline_access+openid",
when encoded in body it will become "offline_access%2Bopenid". I think
this is wrong. Probably an artifact of "double urlencoding" mixup -
either in rclone or in the jottacloud cli tool version it was sniffed
from? It does work, though. The token received will have scopes "email
offline_access" in it, and the same is true if I change to only
sending "offline_access" as scope.
If a proper space delimited list of "offline_access openid"
is used in the request, the response also includes openid scope:
"openid email offline_access". I think this is more correct and this
patch implements this.
See: #6107
Adds a configuration option to the GCS backend to allow skipping the
check if a bucket exists before copying an object to it, much like
f406dbb added for S3.
Before this change the cache backend was passing -1 into
rate.NewLimiter to mean unlimited transactions per second.
In a recent update this immediately returns a rate limit error as
might be expected.
This patch uses rate.Inf as indicated by the docs to signal no limits
are required.
Before this change the 206 responses from putio Range requests were being
returned as errors.
This change checks for 200 and 206 in the GET response now.
Before this fix, rclone retries chunks of multipart uploads. However
if they had been partially received dropbox would reply with an
incorrect_offset error which rclone was ignoring.
This patch parses the new offset from the error response and uses it
to adjust the data that rclone sends so it is the same as what dropbox
is expecting.
See: https://forum.rclone.org/t/dropbox-rate-limiting-for-upload/29779
This commit switches Google Cloud Storage from the drive pacer to the
s3 pacer. The main difference between them is that the s3 pacer does
not limit transactions in the non-error case. This is appropriate for
a cloud storage backend where you pay for each transaction.
Before this fix `NewObject` could return a wrapped `fs.Object(nil)`
which caused a crash. This was caused by `wrapObject` returning a
`nil` `*Object` which was cast into an `fs.Object`.
This changes the interface of `wrapObject` so it returns an
`fs.Object` instead of a `*Object` and an error which must be checked.
This forces the callers to return a `nil` object rather than an
`fs.Object(nil)`.
See: https://forum.rclone.org/t/panic-in-hasher-when-mounting-with-vfs-cache-and-not-synced-data-in-the-cache/29697/11
Having a replace directive in go.mod causes "go get
github.com/rclone/rclone" to fail as it discussed in this Go issue:
https://github.com/golang/go/issues/44840
This is apparently how the Go team want go.mod to work, so this commit
hard forks github.com/jlaffaye/ftp into github.com/rclone/ftp so we
can remove the `replace` directive from the go.mod file.
Fixes#5810