fuse: Mix inode hashes in a non-symmetric way

Since 0.15 (#4020), inodes are generated as hashes of names, xor'd with
the parent inode. That means that the inode of a/b/b is

	h(a/b/b) = h(a) ^ h(b) ^ h(b) = h(a).

I.e., the grandchild has the same inode as the grandparent. GNU find
trips over this because it thinks it has encountered a loop in the
filesystem, and fails to search a/b/b. This happens more generally when
the same name occurs an even number of times.
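
The cancellation can be reproduced with a minimal sketch. It assumes FNV-1a from the Go standard library as a stand-in for xxhash, and it omits cleanupNodeName and the remapping of inodes 0 and 1. The names hashName, xorInode and xorPathInode are hypothetical and do not appear in restic.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashName stands in for xxhash.Sum64String. FNV-1a is used only to keep
// this sketch free of external dependencies (an assumption, not restic's code).
func hashName(name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return h.Sum64()
}

// xorInode is the combining rule described above: parent XOR hash of name.
func xorInode(parent uint64, name string) uint64 {
	return parent ^ hashName(name)
}

// xorPathInode applies xorInode along a path, starting from inode start.
func xorPathInode(start uint64, names ...string) uint64 {
	ino := start
	for _, n := range names {
		ino = xorInode(ino, n)
	}
	return ino
}

func main() {
	const root = 1 // inode of the mount root

	a := xorPathInode(root, "a")
	ab := xorPathInode(root, "a", "b")
	abb := xorPathInode(root, "a", "b", "b")
	abbbb := xorPathInode(root, "a", "b", "b", "b", "b")

	fmt.Println("a == a/b:      ", a == ab)    // false
	fmt.Println("a == a/b/b:    ", a == abb)   // true: the two h(b) terms cancel
	fmt.Println("a == a/b/b/b/b:", a == abbbb) // true: four h(b) terms cancel
}
```

Because x ^ y ^ y = x for any y, the result does not depend on which hash function is plugged in.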

Fix this by multiplying the parent by a large prime, so the combining
operation is no longer symmetric in its arguments. This is what the FNV
hash does, which we used prior to 0.15. The hash is now

	h(a/b/b) = h(b) ^ p*(h(b) ^ p*h(a))

Note that we already ensure that h(x) is never zero.

Collisions can still occur, but they should be much less likely within
a single path.
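
The new rule can be sketched the same way. The prime is the one used in the patch. The hash is again FNV-1a from the standard library instead of xxhash (an assumption made to keep the example self-contained), and nameHash, mixInode and mixPathInode are hypothetical names. As before, cleanupNodeName and the remapping of inodes 0 and 1 are left out.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const prime = 11400714785074694791 // prime1 from xxhash, as in the patch

// nameHash stands in for xxhash.Sum64String (assumption: FNV-1a is used
// here only to avoid an external dependency).
func nameHash(name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return h.Sum64()
}

// mixInode is the patched combining rule: prime*parent XOR hash of name.
// The multiplication wraps around modulo 2^64.
func mixInode(parent uint64, name string) uint64 {
	return prime*parent ^ nameHash(name)
}

// mixPathInode applies mixInode along a path, starting from inode start.
func mixPathInode(start uint64, names ...string) uint64 {
	ino := start
	for _, n := range names {
		ino = mixInode(ino, n)
	}
	return ino
}

func main() {
	const root = 1 // inode of the mount root

	a := mixPathInode(root, "a")
	abb := mixPathInode(root, "a", "b", "b")

	// The two h(b) terms no longer cancel, because the first one is
	// multiplied by the prime before the second one is mixed in.
	fmt.Println("a == a/b/b:", a == abb)

	// The prime is odd, so parent -> prime*parent is a bijection on uint64:
	// the same name under two different parents always gets different inodes.
	fmt.Println("1/b == 2/b:", mixInode(1, "b") == mixInode(2, "b"))
}
```

The second check holds for every pair of distinct parents. The first one is only a statement about probability: a collision within a path is still possible, just no longer forced by the structure of the rule.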

Fixes #4253.
greatroar 2023-03-21 17:33:18 +01:00
parent f646406822
commit a0885d5d69
3 changed files with 33 additions and 2 deletions

@@ -0,0 +1,18 @@
Bugfix: Mount command should no longer create spurious filesystem loops
When a backup contained a directory that has the same name as its parent,
say, a/b/b, and the GNU find command was run on this backup in a restic
mount, find would refuse to traverse the lowest "b" directory, instead
printing "File system loop detected". This is due to the way the restic
mount command generates inode numbers for directories in the mount
point.
The rule for generating these inode numbers was changed in 0.15.0. It has
now been changed again to avoid this issue. A perfect rule does not exist,
but the probability of this behavior occurring is now extremely small.
When it does occur, the mount point is not broken, and scripts that traverse
the mount point should work as long as they don't rely on inode numbers for
detecting filesystem loops.
https://github.com/restic/restic/issues/4253
https://github.com/restic/restic/pull/4255

@@ -226,6 +226,17 @@ func TestInodeFromNode(t *testing.T) {
ino1 = inodeFromNode(1, node)
ino2 = inodeFromNode(2, node)
rtest.Assert(t, ino1 != ino2, "same inode %d but different parent", ino1)
// Regression test: in a path a/b/b, the grandchild should not get the
// same inode as the grandparent.
a := &restic.Node{Name: "a", Type: "dir", Links: 2}
ab := &restic.Node{Name: "b", Type: "dir", Links: 2}
abb := &restic.Node{Name: "b", Type: "dir", Links: 2}
inoA := inodeFromNode(1, a)
inoAb := inodeFromNode(inoA, ab)
inoAbb := inodeFromNode(inoAb, abb)
rtest.Assert(t, inoA != inoAb, "inode(a/b) = inode(a)")
rtest.Assert(t, inoA != inoAbb, "inode(a/b/b) = inode(a)")
}
var sink uint64

@@ -10,9 +10,11 @@ import (
"github.com/restic/restic/internal/restic"
)
const prime = 11400714785074694791 // prime1 from xxhash.
// inodeFromName generates an inode number for a file in a meta dir.
func inodeFromName(parent uint64, name string) uint64 {
-inode := parent ^ xxhash.Sum64String(cleanupNodeName(name))
+inode := prime*parent ^ xxhash.Sum64String(cleanupNodeName(name))
// Inode 0 is invalid and 1 is the root. Remap those.
if inode < 2 {
@@ -33,7 +35,7 @@ func inodeFromNode(parent uint64, node *restic.Node) (inode uint64) {
} else {
// Else, use the name and the parent inode.
// node.{DeviceID,Inode} may not even be reliable.
-inode = parent ^ xxhash.Sum64String(cleanupNodeName(node.Name))
+inode = prime*parent ^ xxhash.Sum64String(cleanupNodeName(node.Name))
}
// Inode 0 is invalid and 1 is the root. Remap those.