backend/internetarchive: add support for Internet Archive

This adds support for Internet Archive (archive.org) Items.
Lesmiscore 2022-04-12 18:38:44 +09:00 committed by Nick Craig-Wood
parent 211dbe9aee
commit 598364ad0f
11 changed files with 1382 additions and 0 deletions

@@ -42,6 +42,7 @@ Rclone *("rsync for cloud storage")* is a command-line program to sync files and
* HDFS (Hadoop Distributed Filesystem) [:page_facing_up:](https://rclone.org/hdfs/)
* HTTP [:page_facing_up:](https://rclone.org/http/)
* Hubic [:page_facing_up:](https://rclone.org/hubic/)
* Internet Archive [:page_facing_up:](https://rclone.org/internetarchive/)
* Jottacloud [:page_facing_up:](https://rclone.org/jottacloud/)
* IBM COS S3 [:page_facing_up:](https://rclone.org/s3/#ibm-cos-s3)
* Koofr [:page_facing_up:](https://rclone.org/koofr/)

@@ -22,6 +22,7 @@ import (
	_ "github.com/rclone/rclone/backend/hdfs"
	_ "github.com/rclone/rclone/backend/http"
	_ "github.com/rclone/rclone/backend/hubic"
	_ "github.com/rclone/rclone/backend/internetarchive"
	_ "github.com/rclone/rclone/backend/jottacloud"
	_ "github.com/rclone/rclone/backend/koofr"
	_ "github.com/rclone/rclone/backend/local"

File diff suppressed because it is too large

@@ -0,0 +1,17 @@
// Test internetarchive filesystem interface
package internetarchive_test

import (
	"testing"

	"github.com/rclone/rclone/backend/internetarchive"
	"github.com/rclone/rclone/fstest/fstests"
)

// TestIntegration runs integration tests against the remote
func TestIntegration(t *testing.T) {
	fstests.Run(t, &fstests.Opt{
		RemoteName: "TestIA:lesmi-rclone-test/",
		NilObject:  (*internetarchive.Object)(nil),
	})
}

@@ -48,6 +48,7 @@ docs = [
    "hdfs.md",
    "http.md",
    "hubic.md",
    "internetarchive.md",
    "jottacloud.md",
    "koofr.md",
    "mailru.md",

@@ -127,6 +127,7 @@ WebDAV or S3, that work out of the box.)
{{< provider name="HDFS" home="https://hadoop.apache.org/" config="/hdfs/" >}}
{{< provider name="HTTP" home="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol" config="/http/" >}}
{{< provider name="Hubic" home="https://hubic.com/" config="/hubic/" >}}
{{< provider name="Internet Archive" home="https://archive.org/" config="/internetarchive/" >}}
{{< provider name="Jottacloud" home="https://www.jottacloud.com/en/" config="/jottacloud/" >}}
{{< provider name="IBM COS S3" home="http://www.ibm.com/cloud/object-storage" config="/s3/#ibm-cos-s3" >}}
{{< provider name="Koofr" home="https://koofr.eu/" config="/koofr/" >}}

@@ -50,6 +50,7 @@ See the following for detailed instructions for
* [HDFS](/hdfs/)
* [HTTP](/http/)
* [Hubic](/hubic/)
* [Internet Archive](/internetarchive/)
* [Jottacloud](/jottacloud/)
* [Koofr](/koofr/)
* [Mail.ru Cloud](/mailru/)

@ -0,0 +1,222 @@
---
title: "Internet Archive"
description: "Rclone docs for Internet Archive"
---
# {{< icon "fa fa-archive" >}} Internet Archive

The Internet Archive backend utilizes Items on [archive.org](https://archive.org/).

Refer to the [IAS3 API documentation](https://archive.org/services/docs/api/ias3.html) for the API this backend uses.

Paths are specified as `remote:item` (or `remote:` for the `lsd`
command). You may put subdirectories in too, e.g. `remote:item/path/to/dir`.
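
To make the path convention concrete, here is a minimal Go sketch that splits the part after `remote:` into an item identifier and a path inside that item. `splitItemPath` is a hypothetical helper written for this illustration; it is not a function from the backend and the backend may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitItemPath splits "item/path/to/dir" into the item identifier and
// the path inside the item. Hypothetical helper, for illustration only.
func splitItemPath(p string) (item, rest string) {
	// Ignore leading and trailing slashes so "item/" and "item" behave alike.
	p = strings.Trim(p, "/")
	// Everything before the first slash is the item, the remainder is the path.
	item, rest, _ = strings.Cut(p, "/")
	return item, rest
}

func main() {
	item, rest := splitItemPath("item/path/to/dir")
	fmt.Printf("item=%q path=%q\n", item, rest)
}
```

With this split, `remote:item` addresses the item itself and anything after the first slash is treated as a path within it.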

Unlike S3, listing all items you have uploaded isn't supported.

Once you have made a remote (see the configuration section below),
you can use it like this:

Make a new item

    rclone mkdir remote:item

List the contents of an item

    rclone ls remote:item

Sync `/home/local/directory` to the remote item, deleting any excess
files in the item.

    rclone sync -i /home/local/directory remote:item

## Notes

Because of Internet Archive's architecture, write operations (and any extra post-processing) are enqueued in a per-item queue. You can check an item's queue at https://catalogd.archive.org/history/item-name-here . As a result, uploads and deletes do not show up immediately and take some time to become available.

The per-item queue is in turn enqueued to another queue, the Item Deriver Queue. [You can check the status of the Item Deriver Queue here.](https://catalogd.archive.org/catalog.php?whereami=1) This queue has a limit, and it may block you from uploading, or even deleting. For better behavior, avoid uploading a lot of small files.

You can optionally wait for the server's processing to finish by setting the `wait_archive` key to a non-zero value.
By making it wait, rclone can do normal file comparison.
Make sure to set a large enough value (e.g. `30m0s` for smaller files), as processing can take a long time depending on the server's queue.

## Configuration

Here is an example of making an internetarchive configuration.
Most of it applies to the other providers as well; any differences are described [below](#providers).

First run

    rclone config

This will guide you through an interactive setup process.

```
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
XX / InternetArchive Items
\ (internetarchive)
Storage> internetarchive
Option access_key_id.
IAS3 Access Key.
Leave blank for anonymous access.
You can find one here: https://archive.org/account/s3.php
Enter a value. Press Enter to leave empty.
access_key_id> XXXX
Option secret_access_key.
IAS3 Secret Key (password).
Leave blank for anonymous access.
Enter a value. Press Enter to leave empty.
secret_access_key> XXXX
Edit advanced config?
y) Yes
n) No (default)
y/n> y
Option endpoint.
IAS3 Endpoint.
Leave blank for default value.
Enter a string value. Press Enter for the default (https://s3.us.archive.org).
endpoint>
Option front_endpoint.
Host of InternetArchive Frontend.
Leave blank for default value.
Enter a string value. Press Enter for the default (https://archive.org).
front_endpoint>
Option disable_checksum.
Don't store MD5 checksum with object metadata.
Normally rclone will calculate the MD5 checksum of the input before
uploading it so it can ask the server to check the object against checksum.
This is great for data integrity checking but can cause long delays for
large files to start uploading.
Enter a boolean value (true or false). Press Enter for the default (true).
disable_checksum> true
Option encoding.
The encoding for the backend.
See the [encoding section in the overview](/overview/#encoding) for more info.
Enter a encoder.MultiEncoder value. Press Enter for the default (Slash,Question,Hash,Percent,Del,Ctl,InvalidUtf8,Dot).
encoding>
Edit advanced config?
y) Yes
n) No (default)
y/n> n
--------------------
[remote]
type = internetarchive
access_key_id = XXXX
secret_access_key = XXXX
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
```
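
Every option can also be supplied through an environment variable. As a rough sketch of the naming pattern visible in the option listings below (the flag name upper-cased, dashes turned into underscores, and an `RCLONE_` prefix), the following Go snippet derives the variable name from a flag. `envVarName` is a hypothetical helper written for this example and is not part of rclone.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName maps a command-line flag such as "--internetarchive-wait-archive"
// to the matching environment variable name.
// Hypothetical helper, for illustration only.
func envVarName(flag string) string {
	name := strings.TrimPrefix(flag, "--")
	name = strings.ReplaceAll(name, "-", "_")
	return "RCLONE_" + strings.ToUpper(name)
}

func main() {
	fmt.Println(envVarName("--internetarchive-wait-archive"))
}
```

For instance, `--internetarchive-wait-archive` corresponds to `RCLONE_INTERNETARCHIVE_WAIT_ARCHIVE`, which matches the Env Var entry listed for that option.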
{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/internetarchive/internetarchive.go then run make backenddocs" >}}

### Standard options

Here are the standard options specific to internetarchive (Internet Archive).

#### --internetarchive-access-key-id

IAS3 Access Key.
Leave blank for anonymous access.
You can find one here: https://archive.org/account/s3.php

Properties:

- Config: access_key_id
- Env Var: RCLONE_INTERNETARCHIVE_ACCESS_KEY_ID
- Type: string
- Required: false

#### --internetarchive-secret-access-key

IAS3 Secret Key (password).
Leave blank for anonymous access.

Properties:

- Config: secret_access_key
- Env Var: RCLONE_INTERNETARCHIVE_SECRET_ACCESS_KEY
- Type: string
- Required: false

### Advanced options

Here are the advanced options specific to internetarchive (Internet Archive).

#### --internetarchive-endpoint

IAS3 Endpoint.
Leave blank for default value.

Properties:

- Config: endpoint
- Env Var: RCLONE_INTERNETARCHIVE_ENDPOINT
- Type: string
- Default: "https://s3.us.archive.org"

#### --internetarchive-front-endpoint

Host of InternetArchive Frontend.
Leave blank for default value.

Properties:

- Config: front_endpoint
- Env Var: RCLONE_INTERNETARCHIVE_FRONT_ENDPOINT
- Type: string
- Default: "https://archive.org"

#### --internetarchive-disable-checksum

Don't ask the server to test against MD5 checksum calculated by rclone.
Normally rclone will calculate the MD5 checksum of the input before
uploading it so it can ask the server to check the object against checksum.
This is great for data integrity checking but can cause long delays for
large files to start uploading.

Properties:

- Config: disable_checksum
- Env Var: RCLONE_INTERNETARCHIVE_DISABLE_CHECKSUM
- Type: bool
- Default: true

#### --internetarchive-wait-archive

Timeout for waiting the server's processing tasks (specifically archive and book_op) to finish.
Only enable if you need to be guaranteed to be reflected after write operations.
0 to disable waiting. No errors to be thrown in case of timeout.

Properties:

- Config: wait_archive
- Env Var: RCLONE_INTERNETARCHIVE_WAIT_ARCHIVE
- Type: Duration
- Default: 0s

#### --internetarchive-encoding

The encoding for the backend.
See the [encoding section in the overview](/overview/#encoding) for more info.

Properties:

- Config: encoding
- Env Var: RCLONE_INTERNETARCHIVE_ENCODING
- Type: MultiEncoder
- Default: Slash,LtGt,CrLf,Del,Ctl,InvalidUtf8,Dot

{{< rem autogenerated options stop >}}

@@ -32,6 +32,7 @@ Here is an overview of the major features of each cloud storage system.
| HDFS | - | Yes | No | No | - |
| HTTP | - | No | No | No | R |
| Hubic | MD5 | Yes | No | No | R/W |
| Internet Archive | MD5, SHA1, CRC32 | Yes | No | No | - |
| Jottacloud | MD5 | Yes | Yes | No | R |
| Koofr | MD5 | No | Yes | No | - |
| Mail.ru Cloud | Mailru ⁶ | Yes | Yes | No | - |
@@ -427,6 +428,7 @@ upon backend-specific capabilities.
| HDFS | Yes | No | Yes | Yes | No | No | Yes | No | Yes | Yes |
| HTTP | No | No | No | No | No | No | No | No | No | Yes |
| Hubic | Yes † | Yes | No | No | No | Yes | Yes | No | Yes | No |
| Internet Archive | No | Yes | No | No | Yes | Yes | No | Yes | Yes | No |
| Jottacloud | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Mail.ru Cloud | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes |
| Mega | Yes | No | Yes | Yes | Yes | No | No | Yes | Yes | Yes |

@@ -73,6 +73,7 @@
<a class="dropdown-item" href="/hdfs/"><i class="fa fa-globe"></i> HDFS (Hadoop Distributed Filesystem)</a>
<a class="dropdown-item" href="/http/"><i class="fa fa-globe"></i> HTTP</a>
<a class="dropdown-item" href="/hubic/"><i class="fa fa-space-shuttle"></i> Hubic</a>
<a class="dropdown-item" href="/internetarchive/"><i class="fa fa-archive"></i> Internet Archive</a>
<a class="dropdown-item" href="/jottacloud/"><i class="fa fa-cloud"></i> Jottacloud</a>
<a class="dropdown-item" href="/koofr/"><i class="fa fa-suitcase"></i> Koofr</a>
<a class="dropdown-item" href="/mailru/"><i class="fa fa-at"></i> Mail.ru Cloud</a>

@@ -133,6 +133,9 @@ backends:
 - backend: "hubic"
   remote: "TestHubic:"
   fastlist: false
 - backend: "internetarchive"
   remote: "TestIA:lesmi-rclone-test/"
   fastlist: true
 - backend: "jottacloud"
   remote: "TestJottacloud:"
   fastlist: true