forked from TrueCloudLab/rclone
810644e873
Before this change, the Path1 version of a file always prevailed during `--resync`. Many users requested options to select the winner automatically based on characteristics such as newer, older, larger, and smaller. This change adds support for such options.

Ideally, this feature would have been implemented by letting the existing `--resync` flag optionally accept string values such as `--resync newer`. However, that would have been a breaking change. The existing flag is a `bool`, and it does not seem possible to have a `string` flag that accepts both `--resync newer` and `--resync` (with no argument). `NoOptDefVal` does not work for this, because it would force an `=`, as in `--resync=newer`.

The best compromise was therefore to add a new `--resync-mode CHOICE` flag that implies `--resync`, while keeping the existing behavior of `--resync` (which implies `--resync-mode path1`). In other words, both flags are now valid, and either can be used without the other.

--resync-mode CHOICE

If a file differs on both sides during a `--resync`, `--resync-mode` controls which version will overwrite the other. The supported options are similar to `--conflict-resolve`. For all of the following options, the version that is kept is called the "winner", and the version that is overwritten (deleted) is called the "loser". The options are named after the "winner":

- `path1` (the default): the version from Path1 is unconditionally considered the winner, regardless of `modtime` and `size`, if any. This can be useful if one side is more trusted or more up-to-date than the other at the time of the `--resync`.
- `path2`: same as `path1`, except the Path2 version is considered the winner.
- `newer`: the newer file (by `modtime`) is considered the winner, regardless of which side it came from. This may result in a mix of some winners from Path1 and some winners from Path2. (The implementation is analogous to running `rclone copy --update` in both directions.)
- `older`: same as `newer`, except the older file is considered the winner, and the newer file is considered the loser.
- `larger`: the larger file (by `size`) is considered the winner, regardless of `modtime`, if any. This can be useful for remotes without `modtime` support, or for kinds of files (such as logs) that tend to grow but not shrink over time.
- `smaller`: the smaller file (by `size`) is considered the winner, regardless of `modtime`, if any.

For all of the above options, note the following:

- If either of the underlying remotes lacks support for the chosen method, the method is ignored and bisync falls back to the default of `path1`. For example, this happens if `--resync-mode newer` is set but one of the paths uses a remote that doesn't support `modtime`.
- If a winner can't be determined because the chosen method's attribute is missing or equal, the method is ignored. Bisync instead tries to determine whether the files differ by looking at the other `--compare` methods in effect. For example, if `--resync-mode newer` is set but the Path1 and Path2 modtimes are identical, bisync will compare the sizes.
  - If bisync concludes that the files differ, preference goes to whichever side is the "source" at that moment. In practice, this gives a slight advantage to Path2, because the 2to1 copy comes before the 1to2 copy.
  - If the files _do not_ differ, nothing is copied, since both sides are already correct.
- These options apply only to files that exist on both sides (with the same name and relative path). Files that exist *only* on one side are *always* copied to the other side during `--resync`. This is one of the main differences between resync and non-resync runs.
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not apply during `--resync`, and unlike with those flags, nothing is renamed during `--resync`. When a file differs on both sides during `--resync`, one version always overwrites the other, much like in `rclone copy`. Consider using `--backup-dir` to retain a backup of the losing version.
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option. More precisely, it is interpreted as "no resync", unless `--resync` has also been specified, in which case it is ignored.
- Winners and losers are decided at the individual file level only. There is currently no option to pick an entire winning directory atomically, although the `path1` and `path2` options typically produce a similar result.
- To maintain backward compatibility, the `--resync` flag implies `--resync-mode path1` unless a different `--resync-mode` is explicitly specified. Similarly, all `--resync-mode` options (except `none`) imply `--resync`. It is therefore not necessary to use both flags together; either one is sufficient on its own.
271 lines
10 KiB
Go
```go
package bisync

import (
	"bytes"
	"context"
	"fmt"
	"strings"

	"github.com/rclone/rclone/backend/crypt"
	"github.com/rclone/rclone/cmd/bisync/bilib"
	"github.com/rclone/rclone/cmd/check"
	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/accounting"
	"github.com/rclone/rclone/fs/filter"
	"github.com/rclone/rclone/fs/hash"
	"github.com/rclone/rclone/fs/operations"
)
```
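```go
// Package-level state set by WhichCheck when it selects CryptCheckFn or
// ReverseCryptCheckFn, and read back by CryptCheckFn.
var hashType hash.Type
var fsrc, fdst fs.Fs
var fcrypt *crypt.Fs
```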
```go
// WhichCheck determines which CheckFn we should use based on the Fs types.
// It is more robust and accurate than Check because
// it will fall back to CryptCheck or DownloadCheck instead of --size-only!
// It returns the *operations.CheckOpt with the CheckFn set.
func WhichCheck(ctx context.Context, opt *operations.CheckOpt) *operations.CheckOpt {
	ci := fs.GetConfig(ctx)
	common := opt.Fsrc.Hashes().Overlap(opt.Fdst.Hashes())

	// note that ci.IgnoreChecksum doesn't change the behavior of Check -- it's just a way to opt-out of cryptcheck/download
	if common.Count() > 0 || ci.SizeOnly || ci.IgnoreChecksum {
		// use normal check
		opt.Check = CheckFn
		return opt
	}

	FsrcCrypt, srcIsCrypt := opt.Fsrc.(*crypt.Fs)
	FdstCrypt, dstIsCrypt := opt.Fdst.(*crypt.Fs)

	if (srcIsCrypt && dstIsCrypt) || (!srcIsCrypt && dstIsCrypt) {
		// if both are crypt or only dst is crypt
		hashType = FdstCrypt.UnWrap().Hashes().GetOne()
		if hashType != hash.None {
			// use cryptcheck
			fsrc = opt.Fsrc
			fdst = opt.Fdst
			fcrypt = FdstCrypt
			fs.Infof(fdst, "Crypt detected! Using cryptcheck instead of check. (Use --size-only or --ignore-checksum to disable)")
			opt.Check = CryptCheckFn
			return opt
		}
	} else if srcIsCrypt && !dstIsCrypt {
		// if only src is crypt
		hashType = FsrcCrypt.UnWrap().Hashes().GetOne()
		if hashType != hash.None {
			// use reverse cryptcheck
			fsrc = opt.Fdst
			fdst = opt.Fsrc
			fcrypt = FsrcCrypt
			fs.Infof(fdst, "Crypt detected! Using cryptcheck instead of check. (Use --size-only or --ignore-checksum to disable)")
			opt.Check = ReverseCryptCheckFn
			return opt
		}
	}

	// if we've gotten this far, neither check nor cryptcheck will work, so use --download
	fs.Infof(fdst, "Can't compare hashes, so using check --download for safety. (Use --size-only or --ignore-checksum to disable)")
	opt.Check = DownloadCheckFn
	return opt
}
```
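The branch order in `WhichCheck` can be modeled as a pure decision function, which makes the precedence easier to see. Note that `(srcIsCrypt && dstIsCrypt) || (!srcIsCrypt && dstIsCrypt)` is logically equivalent to `dstIsCrypt`. A consequence is that when both sides are crypt but the remote wrapped by the destination has no usable hash, the source side is never tried and the download check is chosen.

The sketch below is illustrative only. `checkKind`, `checkInputs`, and `chooseCheck` are hypothetical names, and booleans stand in for the real `hash.Set` and `*crypt.Fs` inspections.

```go
package main

import "fmt"

// checkKind names the check WhichCheck would select (hypothetical labels).
type checkKind string

const (
	normalCheck       checkKind = "check"
	cryptCheck        checkKind = "cryptcheck"
	reverseCryptCheck checkKind = "reverse-cryptcheck"
	downloadCheck     checkKind = "download"
)

// checkInputs is a hypothetical summary of what WhichCheck inspects.
type checkInputs struct {
	commonHashes      int  // hash types supported by both remotes
	sizeOnly          bool // --size-only
	ignoreChecksum    bool // --ignore-checksum
	srcCrypt          bool // src is a crypt remote
	dstCrypt          bool // dst is a crypt remote
	srcUnderlyingHash bool // remote wrapped by a crypt src has a hash
	dstUnderlyingHash bool // remote wrapped by a crypt dst has a hash
}

// chooseCheck mirrors the branch order of WhichCheck.
func chooseCheck(in checkInputs) checkKind {
	if in.commonHashes > 0 || in.sizeOnly || in.ignoreChecksum {
		return normalCheck
	}
	if in.dstCrypt { // equivalent to (src && dst) || (!src && dst)
		if in.dstUnderlyingHash {
			return cryptCheck
		}
	} else if in.srcCrypt {
		if in.srcUnderlyingHash {
			return reverseCryptCheck
		}
	}
	return downloadCheck
}

func main() {
	fmt.Println(chooseCheck(checkInputs{commonHashes: 1}))                         // check
	fmt.Println(chooseCheck(checkInputs{dstCrypt: true, dstUnderlyingHash: true})) // cryptcheck
	fmt.Println(chooseCheck(checkInputs{srcCrypt: true, srcUnderlyingHash: true})) // reverse-cryptcheck
	fmt.Println(chooseCheck(checkInputs{}))                                        // download
}
```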
```go
// CheckFn is a slightly modified version of Check
func CheckFn(ctx context.Context, dst, src fs.Object) (differ bool, noHash bool, err error) {
	same, ht, err := operations.CheckHashes(ctx, src, dst)
	if err != nil {
		return true, false, err
	}
	if ht == hash.None {
		return false, true, nil
	}
	if !same {
		err = fmt.Errorf("%v differ", ht)
		fs.Errorf(src, "%v", err)
		return true, false, nil
	}
	return false, false, nil
}
```
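All of the check functions here share the `(differ, noHash, err)` contract:

- `differ` is true only when a difference was positively found.
- `noHash` signals "could not tell".
- An error is returned separately.

The sketch below reproduces that contract on in-memory byte slices using MD5 from the standard library. The `hashCheck` helper and its `commonHash` flag are hypothetical; the flag stands in for `operations.CheckHashes` returning `hash.None`.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// hashCheck mimics CheckFn's (differ, noHash, err) contract on in-memory data.
// commonHash=false stands in for ht == hash.None: no hash type both sides share.
func hashCheck(src, dst []byte, commonHash bool) (differ bool, noHash bool, err error) {
	if !commonHash {
		return false, true, nil // "don't know", which callers must not read as "equal"
	}
	if md5.Sum(src) != md5.Sum(dst) { // [16]byte arrays are comparable
		return true, false, nil
	}
	return false, false, nil
}

func main() {
	fmt.Println(hashCheck([]byte("abc"), []byte("abc"), true))  // false false <nil>
	fmt.Println(hashCheck([]byte("abc"), []byte("abd"), true))  // true false <nil>
	fmt.Println(hashCheck([]byte("abc"), []byte("abd"), false)) // false true <nil>
}
```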
```go
// CryptCheckFn is a slightly modified version of CryptCheck
func CryptCheckFn(ctx context.Context, dst, src fs.Object) (differ bool, noHash bool, err error) {
	cryptDst := dst.(*crypt.Object)
	underlyingDst := cryptDst.UnWrap()
	underlyingHash, err := underlyingDst.Hash(ctx, hashType)
	if err != nil {
		return true, false, fmt.Errorf("error reading hash from underlying %v: %w", underlyingDst, err)
	}
	if underlyingHash == "" {
		return false, true, nil
	}
	cryptHash, err := fcrypt.ComputeHash(ctx, cryptDst, src, hashType)
	if err != nil {
		return true, false, fmt.Errorf("error computing hash: %w", err)
	}
	if cryptHash == "" {
		return false, true, nil
	}
	if cryptHash != underlyingHash {
		err = fmt.Errorf("hashes differ (%s:%s) %q vs (%s:%s) %q", fdst.Name(), fdst.Root(), cryptHash, fsrc.Name(), fsrc.Root(), underlyingHash)
		// pass the message as an argument, not as the format string, so any '%' in it is printed literally
		fs.Debugf(src, "%v", err)
		// using same error msg as CheckFn so integration tests match
		err = fmt.Errorf("%v differ", hashType)
		fs.Errorf(src, "%v", err)
		return true, false, nil
	}
	return false, false, nil
}
```
```go
// ReverseCryptCheckFn is like CryptCheckFn except src and dst are switched
// result: src is crypt, dst is non-crypt
func ReverseCryptCheckFn(ctx context.Context, dst, src fs.Object) (differ bool, noHash bool, err error) {
	return CryptCheckFn(ctx, src, dst)
}
```
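`ReverseCryptCheckFn` is an argument-swapping adapter. An asymmetric check that expects the special (crypt) side as its first argument can be reused when that side arrives as the second argument.

The sketch below shows the same pattern generically. The `checkFunc` type, the `reverse` helper, and the toy `expectsCryptDst` check (which models "encrypted" as an `enc:` prefix) are all hypothetical.

```go
package main

import "fmt"

// checkFunc is a hypothetical, simplified check signature with argument order (dst, src).
type checkFunc func(dst, src string) (differ bool)

// reverse adapts a check that expects the special side as dst so that it
// can be called when that side is src instead.
func reverse(fn checkFunc) checkFunc {
	return func(dst, src string) bool { return fn(src, dst) }
}

// expectsCryptDst is a toy asymmetric check: it only gives the right answer
// when its first argument is the "encrypted" side.
func expectsCryptDst(dst, src string) bool {
	return dst != "enc:"+src
}

func main() {
	fmt.Println(expectsCryptDst("enc:a", "a"))          // false: same content
	fmt.Println(expectsCryptDst("a", "enc:a"))          // true: wrong argument order
	fmt.Println(reverse(expectsCryptDst)("a", "enc:a")) // false: reversed, same content
}
```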
```go
// DownloadCheckFn is a slightly modified version of Check with --download
func DownloadCheckFn(ctx context.Context, a, b fs.Object) (differ bool, noHash bool, err error) {
	differ, err = operations.CheckIdenticalDownload(ctx, a, b)
	if err != nil {
		return true, true, fmt.Errorf("failed to download: %w", err)
	}
	return differ, false, nil
}
```
```go
// check potential conflicts (to avoid renaming if already identical)
func (b *bisyncRun) checkconflicts(ctxCheck context.Context, filterCheck *filter.Filter, fs1, fs2 fs.Fs) (bilib.Names, error) {
	matches := bilib.Names{}
	if filterCheck.HaveFilesFrom() {
		fs.Debugf(nil, "There are potential conflicts to check.")

		opt, close, checkopterr := check.GetCheckOpt(b.fs1, b.fs2)
		if checkopterr != nil {
			b.critical = true
			b.retryable = true
			fs.Debugf(nil, "GetCheckOpt error: %v", checkopterr)
			return matches, checkopterr
		}
		defer close()

		opt.Match = new(bytes.Buffer)

		opt = WhichCheck(ctxCheck, opt)

		fs.Infof(nil, "Checking potential conflicts...")
		check := operations.CheckFn(ctxCheck, opt)
		fs.Infof(nil, "Finished checking the potential conflicts. %s", check)

		// reset error count, because we don't want to count check errors as bisync errors
		accounting.Stats(ctxCheck).ResetErrors()

		// return the list of identical files to check against later
		if len(fmt.Sprint(opt.Match)) > 0 {
			matches = bilib.ToNames(strings.Split(fmt.Sprint(opt.Match), "\n"))
		}
		if matches.NotEmpty() {
			fs.Debugf(nil, "The following potential conflicts were determined to be identical. %v", matches)
		} else {
			fs.Debugf(nil, "None of the conflicts were determined to be identical.")
		}
	}
	return matches, nil
}
```
```go
// WhichEqual is similar to WhichCheck, but checks a single object.
// Returns true if the objects are equal, false if they differ or if we don't know
func WhichEqual(ctx context.Context, src, dst fs.Object, Fsrc, Fdst fs.Fs) bool {
	opt, close, checkopterr := check.GetCheckOpt(Fsrc, Fdst)
	if checkopterr != nil {
		fs.Debugf(nil, "GetCheckOpt error: %v", checkopterr)
		return false // without a CheckOpt we can't tell, so report "not equal"
	}
	defer close()

	opt = WhichCheck(ctx, opt)
	differ, noHash, err := opt.Check(ctx, dst, src)
	if err != nil {
		fs.Errorf(src, "failed to check: %v", err)
		return false
	}
	if noHash {
		fs.Errorf(src, "failed to check as hash is missing")
		return false
	}
	return !differ
}
```
```go
// EqualFn replaces the standard Equal func with one that also considers checksum.
// Note that it also updates the modtime the same way as Sync.
func (b *bisyncRun) EqualFn(ctx context.Context) context.Context {
	ci := fs.GetConfig(ctx)
	ci.CheckSum = false // force checksum off so modtime is evaluated if needed
	// modtime and size settings should already be set correctly for Equal
	var equalFn operations.EqualFn = func(ctx context.Context, src fs.ObjectInfo, dst fs.Object) bool {
		fs.Debugf(src, "evaluating...")
		equal := false
		logger, _ := operations.GetLogger(ctx)
		// temporarily unset logger, we don't want Equal to duplicate it
		noop := func(ctx context.Context, sigil operations.Sigil, src, dst fs.DirEntry, err error) {
			fs.Debugf(src, "equal skipped")
		}
		ctxNoLogger := operations.WithLogger(ctx, noop)

		timeSizeEqualFn := func() (equal bool, skipHash bool) { return operations.Equal(ctxNoLogger, src, dst), false } // normally use Equal()
		if b.opt.ResyncMode == PreferOlder || b.opt.ResyncMode == PreferLarger || b.opt.ResyncMode == PreferSmaller {
			timeSizeEqualFn = func() (equal bool, skipHash bool) { return b.resyncTimeSizeEqual(ctxNoLogger, src, dst) } // but override for --resync-mode older, larger, smaller
		}
		skipHash := false // (note that we might skip it anyway based on compare/ht settings)
		equal, skipHash = timeSizeEqualFn()
		if equal && !skipHash {
			whichHashType := func(f fs.Info) hash.Type {
				ht := getHashType(f.Name())
				if ht == hash.None && b.opt.Compare.SlowHashSyncOnly && !b.opt.Resync {
					ht = f.Hashes().GetOne()
				}
				return ht
			}
			srcHash, _ := src.Hash(ctx, whichHashType(src.Fs()))
			dstHash, _ := dst.Hash(ctx, whichHashType(dst.Fs()))
			srcHash, _ = tryDownloadHash(ctx, src, srcHash)
			dstHash, _ = tryDownloadHash(ctx, dst, dstHash)
			equal = !hashDiffers(srcHash, dstHash, whichHashType(src.Fs()), whichHashType(dst.Fs()), src.Size(), dst.Size())
		}
		if equal {
			logger(ctx, operations.Match, src, dst, nil)
			fs.Debugf(src, "EqualFn: files are equal")
			return true
		}
		logger(ctx, operations.Differ, src, dst, nil)
		fs.Debugf(src, "EqualFn: files are NOT equal")
		return false
	}
	return operations.WithEqualFn(ctx, equalFn)
}
```
```go
// resyncTimeSizeEqual is used by EqualFn instead of operations.Equal when
// --resync-mode is older, larger or smaller. It reports whether src and dst
// should be treated as equal (so src is not copied over dst) and whether the
// subsequent hash check should be skipped.
func (b *bisyncRun) resyncTimeSizeEqual(ctxNoLogger context.Context, src fs.ObjectInfo, dst fs.Object) (equal bool, skipHash bool) {
	switch b.opt.ResyncMode {
	case PreferLarger, PreferSmaller:
		// note that arg order is path1, path2, regardless of src/dst
		path1, path2 := b.resyncWhichIsWhich(src, dst)
		if sizeDiffers(path1.Size(), path2.Size()) {
			winningPath := b.resolveLargerSmaller(path1.Size(), path2.Size(), path1.Remote(), path2.Remote(), b.opt.ResyncMode)
			// don't need to check/update modtime here, as sizes definitely differ and something will be transferred
			return b.resyncWinningPathToEqual(winningPath), b.resyncWinningPathToEqual(winningPath) // skip hash check if true
		}
		// sizes equal or don't know, so continue to checking time/hash, if applicable
		return operations.Equal(ctxNoLogger, src, dst), false // note we're back to src/dst, not path1/path2
	case PreferOlder:
		// note that arg order is path1, path2, regardless of src/dst
		path1, path2 := b.resyncWhichIsWhich(src, dst)
		if timeDiffers(ctxNoLogger, path1.ModTime(ctxNoLogger), path2.ModTime(ctxNoLogger), path1.Fs(), path2.Fs()) {
			winningPath := b.resolveNewerOlder(path1.ModTime(ctxNoLogger), path2.ModTime(ctxNoLogger), path1.Remote(), path2.Remote(), b.opt.ResyncMode)
			// if src is winner, proceed with equal to check size/hash and possibly just update dest modtime instead of transferring
			if !b.resyncWinningPathToEqual(winningPath) {
				return operations.Equal(ctxNoLogger, src, dst), false // note we're back to src/dst, not path1/path2
			}
			// if dst is winner (and definitely unequal), do not proceed further as we want dst to overwrite src regardless of size difference, and we do not want dest modtime updated
			return true, true
		}
		// times equal or don't know, so continue to checking size/hash, if applicable
	}
	return operations.Equal(ctxNoLogger, src, dst), false // note we're back to src/dst, not path1/path2
}
```
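A resync copies in both directions (2to1, then 1to2), so the winner must be translated into a per-direction answer. Returning `equal` = true in one pass means "skip this copy", which lets the other pass carry the winner across.

The sketch below shows that translation for `larger`/`smaller`. It rests on one assumption, because `resyncWinningPathToEqual` is not shown in this file: that helper returns true when the winner is already the destination of the current pass. `winnerBySize` and `resyncSizeDecision` are hypothetical names. As in the code above, `skipHash` mirrors `equal`.

```go
package main

import "fmt"

// winnerBySize picks "path1" or "path2" for --resync-mode larger/smaller,
// or "" when a size is unknown (-1) or both sizes are equal.
func winnerBySize(size1, size2 int64, mode string) string {
	if size1 < 0 || size2 < 0 || size1 == size2 {
		return ""
	}
	if (size1 > size2) == (mode == "larger") {
		return "path1"
	}
	return "path2"
}

// resyncSizeDecision sketches the answer for one copy direction.
// ASSUMPTION: equal=true when the winner is already the destination.
// decided=false means "fall back to the normal time/size/hash comparison".
func resyncSizeDecision(srcIsPath1 bool, size1, size2 int64, mode string) (equal, skipHash, decided bool) {
	winner := winnerBySize(size1, size2, mode)
	if winner == "" {
		return false, false, false
	}
	dstSide := "path2"
	if !srcIsPath1 {
		dstSide = "path1"
	}
	equal = winner == dstSide
	return equal, equal, true
}

func main() {
	// Path1 has 100 bytes, Path2 has 50, --resync-mode larger.
	e21, _, _ := resyncSizeDecision(false, 100, 50, "larger") // 2to1 pass
	e12, _, _ := resyncSizeDecision(true, 100, 50, "larger")  // 1to2 pass
	fmt.Println("2to1 copy skipped:", e21) // true
	fmt.Println("1to2 copy skipped:", e12) // false
}
```

Exactly one of the two passes performs the copy, so the larger Path1 file overwrites Path2 and not the other way around.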