27 Commits

Author SHA1 Message Date
greatroar
d985aa9e4b
lib/scanner: Fix Validate docs (#6776)
Co-authored-by: greatroar <@>
2020-06-21 19:23:06 +01:00
greatroar
9c0825c0d9
lib/scanner: Simplify, optimize and document Validate (#6674) (#6688) 2020-05-27 22:23:12 +02:00
Audrius Butkevicius
fafd30f804 lib/scanner: Use standard adler32 when we don't need rolling (#5556)
* lib/scanner: Use standard adler32 when we don't need rolling

Seems the rolling adler32 implementation is super slow when executed on large blocks, even tho I can't explain why.

BenchmarkFind1MFile-16    				     100	  18991667 ns/op	  55.21 MB/s	  398844 B/op	      20 allocs/op
BenchmarkBlock/adler32-131072/#00-16     		     200	   9726519 ns/op	1078.06 MB/s	 2654936 B/op	     163 allocs/op
BenchmarkBlock/bozo32-131072/#00-16      		      20	  73435540 ns/op	 142.79 MB/s	 2654928 B/op	     163 allocs/op
BenchmarkBlock/buzhash32-131072/#00-16   		      20	  61482005 ns/op	 170.55 MB/s	 2654928 B/op	     163 allocs/op
BenchmarkBlock/buzhash64-131072/#00-16   		      20	  61673660 ns/op	 170.02 MB/s	 2654928 B/op	     163 allocs/op
BenchmarkBlock/vanilla-adler32-131072/#00-16         	     300	   4377307 ns/op	2395.48 MB/s	 2654935 B/op	     163 allocs/op
BenchmarkBlock/adler32-16777216/#00-16               	       2	 544010100 ns/op	  19.27 MB/s	   65624 B/op	       5 allocs/op
BenchmarkBlock/bozo32-16777216/#00-16                	       1	4678108500 ns/op	   2.24 MB/s	51970144 B/op	      24 allocs/op
BenchmarkBlock/buzhash32-16777216/#00-16             	       1	3880370700 ns/op	   2.70 MB/s	51970144 B/op	      24 allocs/op
BenchmarkBlock/buzhash64-16777216/#00-16             	       1	3875911700 ns/op	   2.71 MB/s	51970144 B/op	      24 allocs/op
BenchmarkBlock/vanilla-adler32-16777216/#00-16       	     300	   4010279 ns/op	2614.72 MB/s	   65624 B/op	       5 allocs/op
BenchmarkRoll/adler32-131072/#00-16                  	    2000	    974279 ns/op	 134.53 MB/s	     270 B/op	       0 allocs/op
BenchmarkRoll/bozo32-131072/#00-16                   	    2000	    791770 ns/op	 165.54 MB/s	     270 B/op	       0 allocs/op
BenchmarkRoll/buzhash32-131072/#00-16                	    2000	    917409 ns/op	 142.87 MB/s	     270 B/op	       0 allocs/op
BenchmarkRoll/buzhash64-131072/#00-16                	    2000	    881125 ns/op	 148.76 MB/s	     270 B/op	       0 allocs/op
BenchmarkRoll/adler32-16777216/#00-16                	      10	 124000400 ns/op	 135.30 MB/s	 7548937 B/op	       0 allocs/op
BenchmarkRoll/bozo32-16777216/#00-16                 	      10	 118008080 ns/op	 142.17 MB/s	 7548928 B/op	       0 allocs/op
BenchmarkRoll/buzhash32-16777216/#00-16              	      10	 126794440 ns/op	 132.32 MB/s	 7548928 B/op	       0 allocs/op
BenchmarkRoll/buzhash64-16777216/#00-16              	      10	 126631960 ns/op	 132.49 MB/s	 7548928 B/op	       0 allocs/op

* Update benchmark_test.go

* gofmt

* fixup benchmark
2019-02-25 13:29:31 +04:00
Jakob Borg
c2ddc83509 all: Revert the underscore sillyness 2019-02-02 12:16:27 +01:00
Jakob Borg
0b2cabbc31
all: Even more boring linter fixes (#5501) 2019-02-02 11:45:17 +01:00
Audrius Butkevicius
ef0dcea6a4 lib/model: Verify request content against weak (and possibly strong) hash (#4767) 2018-05-05 10:24:44 +02:00
Lars K.W. Gohlke
89a021609b lib/scanner: Refactoring
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4642
LGTM: imsodin, AudriusButkevicius
2018-01-14 14:30:11 +00:00
Jakob Borg
d6fbfc3545 lib/fs, lib/model, lib/scanner: Make scans cancellable (fixes #3965)
The folder already knew how to stop properly, but the fs.Walk() didn't
and can potentially take a very long time. This adds context support to
Walk and the underlying scanning stuff, and passes in an appropriate
context from above. The stop channel in model.folder is replaced with a
context for this purpose.

To test I added an infiniteFS that represents a large amount of data
(not actually infinite, but close) and verify that walking it is
properly stopped. For that to be implemented smoothly I moved out the
Walk function to it's own type, as typically the implementer of a new
filesystem type might not need or want to reimplement Walk.

It's somewhat tricky to test that this actually works properly on the
actual sendReceiveFolder and so on, as those are started from inside the
model and the filesystem isn't easily pluggable etc. Instead I've tested
that part manually by adding a huge folder and verifying that pause,
resume and reconfig do the right things by looking at debug output.

GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4117
2017-04-26 00:15:23 +00:00
Jakob Borg
f7fc0c1d3e all: Update license url to https (ref #3976) 2017-02-09 08:04:16 +01:00
Audrius Butkevicius
dd78177ae0 scanner: Allow disabling weak hash in scanning (fixes #3891)
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3905
2017-01-23 13:50:32 +00:00
Jakob Borg
68f1c6ccab lib/scanner: Avoid per iteration allocations in Blocks()
Resetting the io.LimitReader is better than creating a new one on every
iteration.
2017-01-18 18:43:00 +01:00
Jakob Borg
bd1c29ee32 lib/scanner, vendor: Fix previous commit
Can't do what I did, as the rolling function is not the same as the
non-rolling one. Instead this uses an improved version of the rolling
adler32 to accomplish the same thing. (PR filed on upstream, so should
be able to use that directly in the future.)
2017-01-18 11:57:01 +01:00
Jakob Borg
9b1c592fb7 lib/scanner: Speed up weak hash
The rolling version of adler32 is just a wrapper around the standard
hash/adler32 when used in a non-rolling fashion, but it's inefficient as
it allocates a new hash instance for every Write(). This uses the
default version instead in the block hasher, and adds a test to verify
the result is the same as they were before. It reduces allocations by
88% and increases speed about 5%.

	benchmark               old ns/op     new ns/op     delta
	BenchmarkHashFile-8     64434698      61303647      -4.86%

	benchmark               old MB/s     new MB/s     speedup
	BenchmarkHashFile-8     276.65       290.78       1.05x

	benchmark               old allocs     new allocs     delta
	BenchmarkHashFile-8     1238           150            -87.88%

	benchmark               old bytes     new bytes     delta
	BenchmarkHashFile-8     17877363      49292         -99.72%
2017-01-18 10:33:17 +01:00
Audrius Butkevicius
29d010ec0e lib/model, lib/weakhash: Hash using adler32, add heuristic in puller
Adler32 is much faster, and the heuristic avoid the obvious cases where it
will not help.

GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3872
2017-01-04 21:04:13 +00:00
Jakob Borg
47f22ff3e5 build: Enable gometalinter "unconvert" check 2016-12-21 14:53:45 +01:00
Audrius Butkevicius
0582836820 lib/model, lib/scanner: Efficient inserts/deletes in the middle of the file
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3527
2016-12-14 23:30:29 +00:00
Jakob Borg
d328e0fb75 cmd/syncthing: Add selectable sha256 package (fixes #3613, fixes #3614)
This adds autodetection of the fastest hashing library on startup, thus
handling the performance regression. It also adds an environment
variable to control the selection, STHASHING=standard (Go standard
library version, avoids SIGILL crash when the minio library has bugs on
odd CPUs), STHASHING=minio (to force using the minio version) or unset
for the default autodetection.

GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3617
2016-09-23 19:33:54 +00:00
Jakob Borg
5e99d38412 all: Use github.com/minio/sha256-simd
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3581
2016-09-09 09:57:51 +00:00
Jakob Borg
7aaa1dd8a3 lib/scanner: Recheck file size and modification time after hashing (ref #3440)
To catch the case where the file changed. Also make sure we never let a
size-vs-blocklist mismatch slip through.

GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3443
2016-07-26 08:51:39 +00:00
Jakob Borg
2a6f164923 lib/scanner: When scanning a file, stick to the size given by Lstat (fixes #3440)
Otherwise if the file grows during scanning the block list will be out
of sync with the stated size and things get confused. We could fixup the
size afterwards based on the block list, but then we might see other
inconsistencies as the mtime should have changed to reflect the new size
etc. Better stick to the original state and let the next scan pick up
the change.

GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3442
2016-07-25 19:16:49 +00:00
Jakob Borg
f5f0e46016 lib: Use bytes.Equal instead of bytes.Compare where possible 2016-03-31 15:12:46 +00:00
Jakob Borg
a8a2192cf9 Show scan rate in web GUI 2015-11-17 21:23:17 +01:00
Jakob Borg
dc32f7f0a3 Reduce allocations in HashFile
By using copyBuffer we avoid a buffer allocation for each block we hash,
and by allocating space for the hashes up front we get one large backing
array instead of a small one for each block. For a 17 MiB file this
makes quite a difference in the amount of memory allocated:

	benchmark               old ns/op     new ns/op     delta
	BenchmarkHashFile-8     102045110     100459158     -1.55%

	benchmark               old allocs     new allocs     delta
	BenchmarkHashFile-8     415            144            -65.30%

	benchmark               old bytes     new bytes     delta
	BenchmarkHashFile-8     4504296       48104         -98.93%
2015-10-27 09:37:27 +01:00
Jakob Borg
4581c57478 Fix import paths 2015-09-22 19:38:46 +02:00
Jakob Borg
bc016e360e Refactor: ints used in arithmetic should be signed 2015-08-27 21:37:12 +02:00
AudriusButkevicius
94c52e3a77 Add scan percentages (fixes #1030) 2015-08-27 19:20:43 +01:00
Jakob Borg
7705a6c1f1 mv internal lib 2015-08-09 09:35:26 +02:00