Benchmark results on Linux/amd64, using updated benchmark for old and
new:
name old time/op new time/op delta
HashFile-8 88.6ms ± 1% 88.3ms ± 1% -0.33% (p=0.046 n=19+19)
name old speed new speed delta
HashFile-8 201MB/s ± 1% 202MB/s ± 1% +0.33% (p=0.044 n=19+19)
name old alloc/op new alloc/op delta
HashFile-8 59.4kB ± 0% 46.1kB ± 0% -22.47% (p=0.000 n=14+20)
name old allocs/op new allocs/op delta
HashFile-8 29.0 ± 0% 27.0 ± 0% -6.90% (p=0.000 n=20+20)
Co-authored-by: greatroar <@>
This fixes the change in #6674 where the weak hash became a deciding
factor. Now we again just use it to accept a block, but don't take a
negative as meaning the block is bad.
The folder already knew how to stop properly, but the fs.Walk() didn't
and can potentially take a very long time. This adds context support to
Walk and the underlying scanning stuff, and passes in an appropriate
context from above. The stop channel in model.folder is replaced with a
context for this purpose.
To test I added an infiniteFS that represents a large amount of data
(not actually infinite, but close) and verify that walking it is
properly stopped. For that to be implemented smoothly I moved out the
Walk function to it's own type, as typically the implementer of a new
filesystem type might not need or want to reimplement Walk.
It's somewhat tricky to test that this actually works properly on the
actual sendReceiveFolder and so on, as those are started from inside the
model and the filesystem isn't easily pluggable etc. Instead I've tested
that part manually by adding a huge folder and verifying that pause,
resume and reconfig do the right things by looking at debug output.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4117
Can't do what I did, as the rolling function is not the same as the
non-rolling one. Instead this uses an improved version of the rolling
adler32 to accomplish the same thing. (PR filed on upstream, so should
be able to use that directly in the future.)
The rolling version of adler32 is just a wrapper around the standard
hash/adler32 when used in a non-rolling fashion, but it's inefficient as
it allocates a new hash instance for every Write(). This uses the
default version instead in the block hasher, and adds a test to verify
the result is the same as they were before. It reduces allocations by
88% and increases speed about 5%.
benchmark old ns/op new ns/op delta
BenchmarkHashFile-8 64434698 61303647 -4.86%
benchmark old MB/s new MB/s speedup
BenchmarkHashFile-8 276.65 290.78 1.05x
benchmark old allocs new allocs delta
BenchmarkHashFile-8 1238 150 -87.88%
benchmark old bytes new bytes delta
BenchmarkHashFile-8 17877363 49292 -99.72%
This adds autodetection of the fastest hashing library on startup, thus
handling the performance regression. It also adds an environment
variable to control the selection, STHASHING=standard (Go standard
library version, avoids SIGILL crash when the minio library has bugs on
odd CPUs), STHASHING=minio (to force using the minio version) or unset
for the default autodetection.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3617
Otherwise if the file grows during scanning the block list will be out
of sync with the stated size and things get confused. We could fixup the
size afterwards based on the block list, but then we might see other
inconsistencies as the mtime should have changed to reflect the new size
etc. Better stick to the original state and let the next scan pick up
the change.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3442
By using copyBuffer we avoid a buffer allocation for each block we hash,
and by allocating space for the hashes up front we get one large backing
array instead of a small one for each block. For a 17 MiB file this
makes quite a difference in the amount of memory allocated:
benchmark old ns/op new ns/op delta
BenchmarkHashFile-8 102045110 100459158 -1.55%
benchmark old allocs new allocs delta
BenchmarkHashFile-8 415 144 -65.30%
benchmark old bytes new bytes delta
BenchmarkHashFile-8 4504296 48104 -98.93%