forked from TrueCloudLab/rclone
HDFS (Hadoop Distributed File System) implementation - #42
This includes an HDFS docker image to use with the integration tests.

Co-authored-by: Ivan Andreev <ivandeex@gmail.com>
Co-authored-by: Nick Craig-Wood <nick@craig-wood.com>
parent 768e4c4735
commit 71edc75ca6
26 changed files with 906 additions and 0 deletions
@@ -120,6 +120,7 @@ WebDAV or S3, that work out of the box.)
{{< provider name="Google Cloud Storage" home="https://cloud.google.com/storage/" config="/googlecloudstorage/" >}}
{{< provider name="Google Drive" home="https://www.google.com/drive/" config="/drive/" >}}
{{< provider name="Google Photos" home="https://www.google.com/photos/about/" config="/googlephotos/" >}}
{{< provider name="HDFS" home="https://hadoop.apache.org/" config="/hdfs/" >}}
{{< provider name="HTTP" home="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol" config="/http/" >}}
{{< provider name="Hubic" home="https://hubic.com/" config="/hubic/" >}}
{{< provider name="Jottacloud" home="https://www.jottacloud.com/en/" config="/jottacloud/" >}}

@@ -36,6 +36,7 @@ See the following for detailed instructions for
* [Google Cloud Storage](/googlecloudstorage/)
* [Google Drive](/drive/)
* [Google Photos](/googlephotos/)
* [HDFS](/hdfs/)
* [HTTP](/http/)
* [Hubic](/hubic/)
* [Jottacloud / GetSky.no](/jottacloud/)

199
docs/content/hdfs.md
Normal file
@@ -0,0 +1,199 @@
---
title: "HDFS Remote"
description: "Remote for Hadoop Distributed Filesystem"
---

{{< icon "fa fa-globe" >}} HDFS
-------------------------------------------------

[HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) is a
distributed file-system, part of the [Apache Hadoop](https://hadoop.apache.org/) framework.

Paths are specified as `remote:` or `remote:path/to/dir`.

Here is an example of how to make a remote called `remote`. First run:

    rclone config

This will guide you through an interactive setup process:

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[skip]
XX / Hadoop distributed file system
   \ "hdfs"
[skip]
Storage> hdfs
** See help for hdfs backend at: https://rclone.org/hdfs/ **

hadoop name node and port
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to host namenode at port 8020
   \ "namenode:8020"
namenode> namenode.hadoop:8020
hadoop user name
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to hdfs as root
   \ "root"
username> root
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[remote]
type = hdfs
namenode = namenode.hadoop:8020
username = root
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
remote               hdfs

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
```

This remote is called `remote` and can now be used like this:

See all the top level directories

    rclone lsd remote:

List the contents of a directory

    rclone ls remote:directory

Sync the remote `directory` to `/home/local/directory`, deleting any excess files.

    rclone sync -i remote:directory /home/local/directory
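
The interactive session above ends with a three-line `[remote]` section. As a minimal sketch, the same section could be written to a standalone config file from a script and passed to rclone with its `--config` flag. The helper name `write_hdfs_config` and the temporary file location are assumptions made for illustration; they are not part of rclone.

```shell
#!/bin/sh
# write_hdfs_config is a hypothetical helper, not an rclone command.
# $1 = config file path, $2 = namenode host:port, $3 = hadoop user name
write_hdfs_config() {
    cat > "$1" <<EOF
[remote]
type = hdfs
namenode = $2
username = $3
EOF
}

conf="$(mktemp)"    # throwaway location; any writable path would do
write_hdfs_config "$conf" "namenode.hadoop:8020" "root"
cat "$conf"

# With rclone installed and the name node reachable, the file could be used as:
#   rclone --config "$conf" lsd remote:
```

Keeping the section in its own file leaves the default rclone config untouched, which may be convenient for throwaway test setups.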

### Setting up your own HDFS instance for testing

You may start with a [manual setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)
or use the docker image from the tests.

If you want to build the docker image:

```
git clone https://github.com/rclone/rclone.git
cd rclone/fstest/testserver/images/test-hdfs
docker build --rm -t rclone/test-hdfs .
```

Or you can just use the latest one pushed:

```
docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs
```

**NB** it needs a few seconds to start up.

For this docker image the remote needs to be configured like this:

```
[remote]
type = hdfs
namenode = 127.0.0.1:8020
username = root
```

You can stop this image with `docker kill rclone-hdfs` (**NB** it does not use volumes, so all data
uploaded will be lost.)
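
Because the container needs a few seconds before it is usable, a script that starts it may want to poll before running anything against it. The following is a minimal sketch, assuming the `[remote]` section above is already configured; `wait_for` is a hypothetical helper and is not part of rclone or docker.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if every attempt fails.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (not run here): poll the test container for up to 30 attempts.
#   wait_for 30 1 rclone lsd remote: && echo "HDFS is ready"
```

Polling with a real listing command checks that the remote actually answers, rather than only that the port is open.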

### Modified time

Modification times are stored with an accuracy of 1 second.

### Checksum

No checksums are implemented.

### Usage information

You can use the `rclone about remote:` command, which displays the filesystem size and current usage.

### Restricted filename characters

In addition to the [default restricted characters set](/overview/#restricted-characters)
the following characters are also replaced:

| Character | Value | Replacement |
| --------- |:-----:|:-----------:|
| :         | 0x3A  | ：          |

Invalid UTF-8 bytes will also be [replaced](/overview/#invalid-utf8).
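
To illustrate the mapping in the table, the sketch below replaces each ASCII colon with the fullwidth colon (U+FF1A) and back again. It only mimics the visible effect on a file name; rclone's own encoder is implemented in Go and also handles the other restricted characters. The function names `encode_colon` and `decode_colon` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that mimic the Colon rule from the table above.
# They are an illustration only, not how rclone performs the encoding.
encode_colon() {
    # ':' (0x3A) -> '：' (U+FF1A FULLWIDTH COLON)
    printf '%s\n' "$1" | sed 's/:/：/g'
}

decode_colon() {
    # '：' (U+FF1A FULLWIDTH COLON) -> ':' (0x3A)
    printf '%s\n' "$1" | sed 's/：/:/g'
}

encode_colon "backup:2021-01-01.tar"
decode_colon "$(encode_colon "backup:2021-01-01.tar")"
```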

### Limitations

- No server-side `Move` or `DirMove`.
- Checksums not implemented.

{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hdfs/hdfs.go then run make backenddocs" >}}
### Standard Options

Here are the standard options specific to hdfs (Hadoop distributed file system).

#### --hdfs-namenode

hadoop name node and port

- Config: namenode
- Env Var: RCLONE_HDFS_NAMENODE
- Type: string
- Default: ""
- Examples:
    - "namenode:8020"
        - Connect to host namenode at port 8020

#### --hdfs-username

hadoop user name

- Config: username
- Env Var: RCLONE_HDFS_USERNAME
- Type: string
- Default: ""
- Examples:
    - "root"
        - Connect to hdfs as root

### Advanced Options

Here are the advanced options specific to hdfs (Hadoop distributed file system).

#### --hdfs-encoding

This sets the encoding for the backend.

See: the [encoding section in the overview](/overview/#encoding) for more info.

- Config: encoding
- Env Var: RCLONE_HDFS_ENCODING
- Type: MultiEncoder
- Default: Slash,Colon,Del,Ctl,InvalidUtf8,Dot

{{< rem autogenerated options stop >}}
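
Each option above can also be supplied through its `Env Var`. As a hedged sketch, the two standard options are exported below with the values used for the test docker image (`127.0.0.1:8020` and `root`, both assumptions about your setup). rclone's `:hdfs:` syntax then refers to the backend directly, without a named entry in the config file.

```shell
#!/bin/sh
# Values assumed from the test docker image; adjust for a real cluster.
export RCLONE_HDFS_NAMENODE="127.0.0.1:8020"
export RCLONE_HDFS_USERNAME="root"

# Show what rclone would pick up from the environment.
env | grep '^RCLONE_HDFS_' | sort

# With rclone installed and the name node running (not run here):
#   rclone lsd :hdfs:
```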

@@ -28,6 +28,7 @@ Here is an overview of the major features of each cloud storage system.
| Google Cloud Storage | MD5 | Yes | No | No | R/W |
| Google Drive | MD5 | Yes | No | Yes | R/W |
| Google Photos | - | No | No | Yes | R |
| HDFS | - | Yes | No | No | - |
| HTTP | - | No | No | No | R |
| Hubic | MD5 | Yes | No | No | R/W |
| Jottacloud | MD5 | Yes | Yes | No | R |

@@ -341,6 +342,7 @@ upon backend specific capabilities.
| Google Cloud Storage | Yes | Yes | No | No | No | Yes | Yes | No [#2178](https://github.com/rclone/rclone/issues/2178) | No | No |
| Google Drive | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Google Photos | No | No | No | No | No | No | No | No | No | No |
| HDFS | Yes | No | No | No | No | No | Yes | No | Yes | Yes |
| HTTP | No | No | No | No | No | No | No | No [#2178](https://github.com/rclone/rclone/issues/2178) | No | Yes |
| Hubic | Yes † | Yes | No | No | No | Yes | Yes | No [#2178](https://github.com/rclone/rclone/issues/2178) | Yes | No |
| Jottacloud | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |