docs: add faq section explaining why rclone changes fullwidth characters in file names
This commit is contained in:
parent
33bf9b4923
commit
91cdaffcc1
3 changed files with 121 additions and 22 deletions
|
@ -209,3 +209,21 @@ The most common cause of rclone using lots of memory is a single
|
||||||
directory with thousands or millions of files in. Rclone has to load
|
directory with thousands or millions of files in. Rclone has to load
|
||||||
this entirely into memory as rclone objects. Each rclone object takes
|
this entirely into memory as rclone objects. Each rclone object takes
|
||||||
0.5k-1k of memory.
|
0.5k-1k of memory.
|
||||||
|
|
||||||
|
### Rclone changes fullwidth Unicode punctuation marks in file names
|
||||||
|
|
||||||
|
For example: On a Windows system, you have a file with name `Test:1.jpg`,
|
||||||
|
where `:` is the Unicode fullwidth colon symbol. When using rclone
|
||||||
|
to copy this to your Google Drive, you will notice that the file
|
||||||
|
gets renamed to `Test:1.jpg`, where `:` is the regular (halfwidth) colon.
|
||||||
|
|
||||||
|
The reason for such renames is the way rclone handles different
|
||||||
|
[restricted filenames](/overview/#restricted-filenames) on different
|
||||||
|
cloud storage systems. It tries to avoid ambiguous file names as
|
||||||
|
much and allow moving files between many cloud storage systems
|
||||||
|
transparently, by replacing invalid characters with similar looking
|
||||||
|
Unicode characters when transferring to one storage system, and replacing
|
||||||
|
back again when transferring to a different storage system where the
|
||||||
|
original characters are supported. When the same Unicode characters
|
||||||
|
are intentionally used in file names, this replacement strategy leads
|
||||||
|
to unwanted renames. Read more [here](/overview/#restricted-filenames-caveats).
|
||||||
|
|
|
@ -46,6 +46,10 @@ Local file system at .: Replacing invalid UTF-8 characters in "gro\xdf"
|
||||||
|
|
||||||
#### Restricted characters
|
#### Restricted characters
|
||||||
|
|
||||||
|
With the local backend, restrictions on the characters that are usable in
|
||||||
|
file or directory names depend on the operating system. To check what
|
||||||
|
rclone will replace by default on your system, run `rclone help flags local-encoding`.
|
||||||
|
|
||||||
On non Windows platforms the following characters are replaced when
|
On non Windows platforms the following characters are replaced when
|
||||||
handling file names.
|
handling file names.
|
||||||
|
|
||||||
|
|
|
@ -138,7 +138,8 @@ Some cloud storage systems might have restrictions on the characters
|
||||||
that are usable in file or directory names.
|
that are usable in file or directory names.
|
||||||
When `rclone` detects such a name during a file upload, it will
|
When `rclone` detects such a name during a file upload, it will
|
||||||
transparently replace the restricted characters with similar looking
|
transparently replace the restricted characters with similar looking
|
||||||
Unicode characters.
|
Unicode characters. To handle the different sets of restricted characters
|
||||||
|
for different backends, rclone uses something it calls [encoding](#encoding).
|
||||||
|
|
||||||
This process is designed to avoid ambiguous file names as much as
|
This process is designed to avoid ambiguous file names as much as
|
||||||
possible and allow to move files between many cloud storage systems
|
possible and allow to move files between many cloud storage systems
|
||||||
|
@ -150,14 +151,60 @@ to ensure correct formatting and not necessarily the actual name used
|
||||||
on the cloud storage.
|
on the cloud storage.
|
||||||
|
|
||||||
This transformation is reversed when downloading a file or parsing
|
This transformation is reversed when downloading a file or parsing
|
||||||
`rclone` arguments.
|
`rclone` arguments. For example, when uploading a file named `my file?.txt`
|
||||||
For example, when uploading a file named `my file?.txt` to Onedrive
|
to Onedrive, it will be displayed as `my file?.txt` on the console, but
|
||||||
will be displayed as `my file?.txt` on the console, but stored as
|
stored as `my file?.txt` to Onedrive (the `?` gets replaced by the similar
|
||||||
`my file?.txt` (the `?` gets replaced by the similar looking `?`
|
looking `?` character, the so-called "fullwidth question mark").
|
||||||
character) to Onedrive.
|
The reverse transformation allows to read a file `unusual/name.txt`
|
||||||
The reverse transformation allows to read a file`unusual/name.txt`
|
from Google Drive, by passing the name `unusual/name.txt` on the command line
|
||||||
from Google Drive, by passing the name `unusual/name.txt` (the `/` needs
|
(the `/` needs to be replaced by the similar looking `/` character).
|
||||||
to be replaced by the similar looking `/` character) on the command line.
|
|
||||||
|
#### Caveats {#restricted-filenames-caveats}
|
||||||
|
|
||||||
|
The filename encoding system works well in most cases, at least
|
||||||
|
where file names are written in English or similar languages.
|
||||||
|
You might not even notice it: It just works. In some cases it may
|
||||||
|
lead to issues, though. E.g. when file names are written in Chinese,
|
||||||
|
or Japanese, where it is always the Unicode fullwidth variants of the
|
||||||
|
punctuation marks that are used.
|
||||||
|
|
||||||
|
On Windows, the characters `:`, `*` and `?` are examples of restricted
|
||||||
|
characters. If these are used in filenames on a remote that supports it,
|
||||||
|
Rclone will transparently convert them to their fullwidth Unicode
|
||||||
|
variants `*`, `?` and `:` when downloading to Windows, and back again
|
||||||
|
when uploading. This way files with names that are not allowed on Windows
|
||||||
|
can still be stored.
|
||||||
|
|
||||||
|
However, if you have files on your Windows system originally with these same
|
||||||
|
Unicode characters in their names, they will be included in the same conversion
|
||||||
|
process. E.g. if you create a file in your Windows filesystem with name
|
||||||
|
`Test:1.jpg`, where `:` is the Unicode fullwidth colon symbol, and use
|
||||||
|
rclone to upload it to Google Drive, which supports regular `:` (halfwidth
|
||||||
|
question mark), rclone will replace the fullwidth `:` with the
|
||||||
|
halfwidth `:` and store the file as `Test:1.jpg` in Google Drive. Since
|
||||||
|
both Windows and Google Drive allows the name `Test:1.jpg`, it would
|
||||||
|
probably be better if rclone just kept the name as is in this case.
|
||||||
|
|
||||||
|
With the opposite situation; if you have a file named `Test:1.jpg`,
|
||||||
|
in your Google Drive, e.g. uploaded from a Linux system where `:` is valid
|
||||||
|
in file names. Then later use rclone to copy this file to your Windows
|
||||||
|
computer you will notice that on your local disk it gets renamed
|
||||||
|
to `Test:1.jpg`. The original filename is not legal on Windows, due to
|
||||||
|
the `:`, and rclone therefore renames it to make the copy possible.
|
||||||
|
That is all good. However, this can also lead to an issue: If you already
|
||||||
|
had a *different* file named `Test:1.jpg` on Windows, and then use rclone
|
||||||
|
to copy either way. Rclone will then treat the file originally named
|
||||||
|
`Test:1.jpg` on Google Drive and the file originally named `Test:1.jpg`
|
||||||
|
on Windows as the same file, and replace the contents from one with the other.
|
||||||
|
|
||||||
|
Its virtually impossible to handle all cases like these correctly in all
|
||||||
|
situations, but by customizing the [encoding option](#encoding), changing the
|
||||||
|
set of characters that rclone should convert, you should be able to
|
||||||
|
create a configuration that works well for your specific situation.
|
||||||
|
See also the [example](/overview/#encoding-example-windows) below.
|
||||||
|
|
||||||
|
(Windows was used as an example of a file system with many restricted
|
||||||
|
characters, and Google drive a storage system with few.)
|
||||||
|
|
||||||
#### Default restricted characters {#restricted-characters}
|
#### Default restricted characters {#restricted-characters}
|
||||||
|
|
||||||
|
@ -230,7 +277,7 @@ names in a different encoding than UTF-8 or UTF-16, like latin1. See the
|
||||||
|
|
||||||
#### Encoding option {#encoding}
|
#### Encoding option {#encoding}
|
||||||
|
|
||||||
Most backends have an encoding options, specified as a flag
|
Most backends have an encoding option, specified as a flag
|
||||||
`--backend-encoding` where `backend` is the name of the backend, or as
|
`--backend-encoding` where `backend` is the name of the backend, or as
|
||||||
a config parameter `encoding` (you'll need to select the Advanced
|
a config parameter `encoding` (you'll need to select the Advanced
|
||||||
config in `rclone config` to see it).
|
config in `rclone config` to see it).
|
||||||
|
@ -240,17 +287,17 @@ such a way as to preserve the maximum number of characters (see
|
||||||
above).
|
above).
|
||||||
|
|
||||||
However this can be incorrect in some scenarios, for example if you
|
However this can be incorrect in some scenarios, for example if you
|
||||||
have a Windows file system with characters such as `*` and `?` that
|
have a Windows file system with Unicode fullwidth characters
|
||||||
you want to remain as those characters on the remote rather than being
|
`*`, `?` or `:`, that you want to remain as those characters on the
|
||||||
translated to `*` and `?`.
|
remote rather than being translated to regular (halfwidth) `*`, `?` and `:`.
|
||||||
|
|
||||||
The `--backend-encoding` flags allow you to change that. You can
|
The `--backend-encoding` flags allow you to change that. You can
|
||||||
disable the encoding completely with `--backend-encoding None` or set
|
disable the encoding completely with `--backend-encoding None` or set
|
||||||
`encoding = None` in the config file.
|
`encoding = None` in the config file.
|
||||||
|
|
||||||
Encoding takes a comma separated list of encodings. You can see the
|
Encoding takes a comma separated list of encodings. You can see the
|
||||||
list of all available characters by passing an invalid value to this
|
list of all possible values by passing an invalid value to this
|
||||||
flag, e.g. `--local-encoding "help"` and `rclone help flags encoding`
|
flag, e.g. `--local-encoding "help"`. The command `rclone help flags encoding`
|
||||||
will show you the defaults for the backends.
|
will show you the defaults for the backends.
|
||||||
|
|
||||||
| Encoding | Characters |
|
| Encoding | Characters |
|
||||||
|
@ -263,7 +310,7 @@ will show you the defaults for the backends.
|
||||||
| Ctl | All control characters 0x00-0x1F |
|
| Ctl | All control characters 0x00-0x1F |
|
||||||
| Del | DEL 0x7F |
|
| Del | DEL 0x7F |
|
||||||
| Dollar | `$` |
|
| Dollar | `$` |
|
||||||
| Dot | `.` |
|
| Dot | `.` or `..` as entire string |
|
||||||
| DoubleQuote | `"` |
|
| DoubleQuote | `"` |
|
||||||
| Hash | `#` |
|
| Hash | `#` |
|
||||||
| InvalidUtf8 | An invalid UTF-8 character (e.g. latin1) |
|
| InvalidUtf8 | An invalid UTF-8 character (e.g. latin1) |
|
||||||
|
@ -283,6 +330,8 @@ will show you the defaults for the backends.
|
||||||
| Slash | `/` |
|
| Slash | `/` |
|
||||||
| SquareBracket | `[`, `]` |
|
| SquareBracket | `[`, `]` |
|
||||||
|
|
||||||
|
##### Encoding example: FTP
|
||||||
|
|
||||||
To take a specific example, the FTP backend's default encoding is
|
To take a specific example, the FTP backend's default encoding is
|
||||||
|
|
||||||
--ftp-encoding "Slash,Del,Ctl,RightSpace,Dot"
|
--ftp-encoding "Slash,Del,Ctl,RightSpace,Dot"
|
||||||
|
@ -300,14 +349,42 @@ to the existing ones, giving:
|
||||||
|
|
||||||
This can be specified using the `--ftp-encoding` flag or using an `encoding` parameter in the config file.
|
This can be specified using the `--ftp-encoding` flag or using an `encoding` parameter in the config file.
|
||||||
|
|
||||||
Or let's say you have a Windows server but you want to preserve `*`
|
##### Encoding example: Windows
|
||||||
and `?`, you would then have this as the encoding (the Windows
|
|
||||||
encoding minus `Asterisk` and `Question`).
|
|
||||||
|
|
||||||
Slash,LtGt,DoubleQuote,Colon,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot
|
As a nother example, take a Windows system where there is a file with
|
||||||
|
name `Test:1.jpg`, where `:` is the Unicode fullwidth colon symbol.
|
||||||
|
When using rclone to copy this to a remote which supports `:`,
|
||||||
|
the regular (halfwidth) colon (such as Google Drive), you will notice
|
||||||
|
that the file gets renamed to `Test:1.jpg`.
|
||||||
|
|
||||||
This can be specified using the `--local-encoding` flag or using an
|
To avoid this you can change the set of characters rclone should convert
|
||||||
`encoding` parameter in the config file.
|
for the local filesystem, using command-line argument `--local-encoding`.
|
||||||
|
Rclone's default behavior on Windows corresponds to
|
||||||
|
|
||||||
|
```
|
||||||
|
--local-encoding "Slash,LtGt,DoubleQuote,Colon,Question,Asterisk,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot"
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to use fullwidth characters `:`, `*` and `?` in your filenames
|
||||||
|
without rclone changing them when uploading to a remote, then set the same as
|
||||||
|
the default value but without `Colon,Question,Asterisk`:
|
||||||
|
|
||||||
|
```
|
||||||
|
--local-encoding "Slash,LtGt,DoubleQuote,Pipe,BackSlash,Ctl,RightSpace,RightPeriod,InvalidUtf8,Dot"
|
||||||
|
```
|
||||||
|
|
||||||
|
Alternatively, you can disable the conversion of any characters with `--local-encoding None`.
|
||||||
|
|
||||||
|
Instead of using command-line argument `--local-encoding`, you may also set it
|
||||||
|
as [environment variable](/docs/#environment-variables) `RCLONE_LOCAL_ENCODING`,
|
||||||
|
or [configure](/docs/#configure) a remote of type `local` in your config,
|
||||||
|
and set the `encoding` option there.
|
||||||
|
|
||||||
|
The risk by doing this is that if you have a filename with the regular (halfwidth)
|
||||||
|
`:`, `*` and `?` in your cloud storage, and you try to download
|
||||||
|
it to your Windows filesystem, this will fail. These characters are not
|
||||||
|
valid in filenames on Windows, and you have told rclone not to work around
|
||||||
|
this by converting them to valid fullwidth variants.
|
||||||
|
|
||||||
### MIME Type ###
|
### MIME Type ###
|
||||||
|
|
||||||
|
|
Loading…
Add table
Reference in a new issue