Most notably, it now detects all-lowercase files and returns these
as-is. The tests have been expanded with two cases and are now used
as a benchmark (admittedly a rather trivial one).
name old time/op new time/op delta
UnicodeLowercaseMaybeChange-8 4.59µs ± 2% 4.57µs ± 1% ~ (p=0.197 n=10+10)
UnicodeLowercaseNoChange-8 3.26µs ± 1% 3.09µs ± 1% -5.27% (p=0.000 n=9+10)
This changes the cache to cache less things, yet retain the required
efficiency for our walk usecase. This uses less memory.
Specifically, instead of keeping result and child caches for each path
level, only keep a single cached child. In practice our operations are
depth-first, or almost depth-first, and then we retain the same hit
ratio for a smaller cache size.
I improved the benchmark so that it counts the Lstat and DirNames
operations performed, and they do not change significantly. The amount
of allocated memory is reduced by 20% and the walk itself is actually
slightly faster.
This also removes the clear based on number of cached names (as that is
not a thing any more) and the timer based clear (which was unused). This
means we'll retain the last cache state forever until it's cleared by a
write operation, but we did that before too and that state is now a lot
smaller...
The overhead compared to not using a casefs, for our typical "double
walk" (walk the tree then stat everything again) is 2x the dirnames we
would otherwise call, and no overhead on the stats (unchanged from old
implementation)
```
name old time/op new time/op delta
WalkCaseFakeFS100k/rawfs-8 306ms ± 1% 305ms ± 2% ~ (p=0.182 n=9+10)
WalkCaseFakeFS100k/casefs-8 579ms ± 5% 557ms ± 1% -3.77% (p=0.000 n=10+10)
name old B/entry new B/entry delta
WalkCaseFakeFS100k/rawfs-8 590 ± 0% 590 ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 1.09k ± 0% 0.87k ± 0% -19.98% (p=0.000 n=10+10)
name old DirNames/entry new DirNames/entry delta
WalkCaseFakeFS100k/rawfs-8 0.51 ± 0% 0.51 ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 1.02 ± 0% 1.02 ± 0% ~ (all equal)
name old DirNames/op new DirNames/op delta
WalkCaseFakeFS100k/rawfs-8 51.2k ± 0% 51.2k ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 102k ± 0% 102k ± 0% ~ (all equal)
name old Lstat/entry new Lstat/entry delta
WalkCaseFakeFS100k/rawfs-8 3.02 ± 0% 3.02 ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 3.02 ± 0% 3.02 ± 0% ~ (all equal)
name old Lstat/op new Lstat/op delta
WalkCaseFakeFS100k/rawfs-8 302k ± 0% 302k ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 302k ± 0% 302k ± 0% ~ (all equal)
name old allocs/entry new allocs/entry delta
WalkCaseFakeFS100k/rawfs-8 15.7 ± 0% 15.7 ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 27.5 ± 0% 26.1 ± 0% -5.09% (p=0.000 n=10+10)
name old ns/entry new ns/entry delta
WalkCaseFakeFS100k/rawfs-8 2.02k ± 1% 2.02k ± 2% ~ (p=0.163 n=9+10)
WalkCaseFakeFS100k/casefs-8 3.83k ± 5% 3.68k ± 1% -3.77% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
WalkCaseFakeFS100k/rawfs-8 89.2MB ± 0% 89.2MB ± 0% ~ (p=0.364 n=9+10)
WalkCaseFakeFS100k/casefs-8 164MB ± 0% 131MB ± 0% -19.97% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
WalkCaseFakeFS100k/rawfs-8 2.38M ± 0% 2.38M ± 0% ~ (all equal)
WalkCaseFakeFS100k/casefs-8 4.16M ± 0% 3.95M ± 0% -5.05% (p=0.000 n=10+10)
```
Since iterators must be released before committing or discarding a
transaction we have the pattern of both deferring a release plus doing
it manually. But we can't release twice because we track this with a
WaitGroup that will panic when there are more Done()s than Add()s. This
just adds a boolean to let an iterator keep track.
We created a new fileset before stopping the folder during restart. When
we create that fileset it loads the current metadata and sequence
numbers from the database. But the folder may have time to update those
before stopping, leaving the new fileset with bad data.
This would cause wrong accounting (forgotten files) and potentially
sequence reuse causing files not sent to other devices.
This change reuses the fileset on restart, skipping the issue entirely.
It also moves the creation of the fileset back under the lock so there
should be no chance of concurrency issues here.
The FileSet.Drop operation in there needs to potentially update a whole lot of global lists, which can take a while (longer than the deadlock interval apparently)
* Add clean up for Simple File Versioning pt.1
created test
* Add clean up for Simple File Versioning pt.2
Passing the test
* stuck on how javascript communicates with backend
* Add trash clean up for Simple File Versioning
Add trash clean up functionality of to allow the user to delete backups
after specified amount of days.
* Fixed html and js style
* Refactored cleanup test cases
Refactored cleanup test cases to one file and deleted duplicated code.
* Added copyright to test file
* Refactor folder cleanout to utility function
* change utility function to package private
* refactored utility function; fixed build errors
* Updated copyright year.
* refactor test and logging
* refactor html and js
* revert style change in html
* reverted changes in html and some js
* checkout origin head version edit...html
* checkout upstream master and correct file
Our authentication is based on device ID (certificate fingerprint) but
we also check the certificate name for ... historical extra security
reasons. (I don't think this adds anything but it is what it is.) Since
that check breaks in Go 1.15 this change does two things:
- Adds a manual check for the peer certificate CommonName, and if they
are equal we are happy and don't call the more advanced
VerifyHostname() function. This allows our old style certificates to
still pass the check.
- Adds the cert name "syncthing" as a DNS SAN when generating the
certificate. This is the correct way nowadays and makes VerifyHostname()
happy in Go 1.15 as well, even without the above patch.
Apparently our Tags field depended on having specific files react to
tags and add themselves there. This, instead, works for all tags.
Also, pass tags to the test command line.
The QUIC package is notorious for being incompatible with either too
old or too new Go releases. Currently it doesn't build with Go 1.15 RC
and I want to test the rest with Go 1.15. With this I can do `go run
build.go --tags noquic` to do that.
With this change we emulate a case sensitive filesystem on top of
insensitive filesystems. This means we correctly pick up case-only renames
and throw a case conflict error when there would be multiple files differing
only in case.
This safety check has a small performance hit (about 20% more filesystem
operations when scanning for changes). The new advanced folder option
`caseSensitiveFS` can be used to disable the safety checks, retaining the
previous behavior on systems known to be fully case sensitive.
Co-authored-by: Jakob Borg <jakob@kastelo.net>
Prompted by https://forum.syncthing.net/t/infinite-filesystem-recursion-detected/15285. In my opinion the filesystem shouldn't throw warnings but pass on errors for the caller to decide what's to be happening with it. Right now in this PR an infinite recursion is a normal scan error, i.e. folder is in failed state and displays failed items, but no warning. I think that's appropriate but if deemed appropriate an additional warning can be thrown in the scanner.
This fixes the change in #6674 where the weak hash became a deciding
factor. Now we again just use it to accept a block, but don't take a
negative as meaning the block is bad.
If the GC finds a key k that it wants to keep, it records that in a
Bloom filter. If a key k' can be removed but its hash collides with k,
it will be kept. Since the old Bloom filter code was completely
deterministic, the next run would encounter the same collision, assuming
k must still be kept.
A randomized hash function that uses all the SHA-256 bits solves this
problem: the second run has a non-zero probability of removing k', as
long as the Bloom filter is not completely full.
* Fix ui, hide report date
* Undo Goland madness
* UR now web scale
* Fix migration
* Fix marshaling, force tick on start
* Fix tests
* Darwin build
* Split "all" build target, add package name as a tag
* Remove pq and sql dep from syncthing, split build targets
* Empty line
* Revert "Empty line"
This reverts commit f74af2b067dadda8a343714123512bd545a643c3.
* Revert "Remove pq and sql dep from syncthing, split build targets"
This reverts commit 8fc295ad007c5bb7886c557f492dacf51be307ad.
* Revert "Split "all" build target, add package name as a tag"
This reverts commit f4dc88995106d2b06042f30bea781a0feb08e55f.
* Normalise contract types
* Fix build add more logging
This is shorter, skips two allocations, makes the function inlineable
and is safer, since the compiler now check whether
DeviceIDLength == sha256.Size.
This changes the error handling in loading ignores slightly:
- There is a new ParseError type that is returned as the error
(somewhere in the chain) when the problem was not an I/O error loading
the file, but some issue with the contents.
- If the file was read successfully but not parsed successfully we still
return the lines read (in addition to nil patterns and a ParseError).
- In the API, if the error IsParseError then we return a successful
HTTP response with the lines and the actual error included in the JSON
object.
- In the GUI, as long as the HTTP call to load the ignores was
successful we can edit the ignores. If there was an error we show this
as a validation error on the dialog.
Also some cleanup on the Javascript side as it for some reason used
jQuery instead of Angular for this editor...
crypto/rand output is cryptographically secure by the Go library
documentation's promise. That, rather than strength (= passes randomness
tests) is the property that Syncthing needs).
This makes Go 1.15 test/vet happy, avoiding "conversion from untyped int
to string yields a string of one rune" warning where we do
string(KeyTypeWhatever) in namespaced.go.
It also clarifies and enforces the currently allowed range of these
numbers so I think it's fine.
This matches the convention of the stdlib and avoids ambiguity: when
customErr{} and &customErr{} both implement error, client code needs to
check for both.
Memory use should remain the same, since storing a non-pointer type in
an interface value still copies the value to the heap.
This adds some env vars to the long version string as if they were build
tags. The purpose is to better understand what code was running or not
in the version output, usage reporting and crash reports. In order to
prevent possible privacy issues the actual value of the variable is not
reported, just the fact that it was set to something non-empty.
Example:
% ./bin/syncthing --version
syncthing v1.6.1+47-g1eb104f3-buildtags "Fermium Flea" (go1.14.3 darwin-amd64) jb@kvin.kastelo.net 2020-06-03 07:25:46 UTC [stnoupgrade, use_badger]
Group the global list of files by version, instead of having one flat list for all devices. This removes lots of duplicate protocol.Vectors.
Co-authored-by: Jakob Borg <jakob@kastelo.net>
This reduces the size of our write batches before we flush them. This
has two effects: reducing the amount of data lost if we crash when
updating the database, and reducing the amount of memory used when we do
large updates without checkpoint (e.g., deleting a folder).
I ran our SyncManyFiles benchmark as it is the one doing most
transactions, however there was no relevant change in any metric (it's
limited by our fsync I expect). This is good as any visible change would
just be a decrease in performance.
I don't have a benchmark on deleting a large folder, taking that part on
trust for now...
If we fail to take the rename shortcut we may crash on a later loop,
because we do trickiness with the indexes but the original buckets[key]
in "range buckets[key]" isn't re-evaluated so i exceeds the max index.
This adds indirection of large version vectors in the same manner as we
already to block lists. The effect is the same: less duplicated data in
some situations.
To mitigate the impact for when this indirection
wouldn't be needed I've added an indirection cutoff for both blocks and
the new version vector stuff: we don't do the indirection at all for
small block lists or small version vectors, instead storing it directly
like we used to do. This is faster for small files and small setups.
Storing assets as []byte requires every compiled-in asset to be copied
into writable memory at program startup. That currently takes up 1.6MB
per syncthing process. Strings stay in the RODATA section and should be
shared between processes running the same binary.
This makes version vector values clock based instead of just incremented
from zero. The effect is that a vector that is created from scratch
(after database reset) will have a higher value for the local device
than what it could have been previously, causing a conflict. That is, if
we are A and we had
{A: 42, B: 12}
in the old scheme, a reset and rescan would give us
{A: 1}
which is a strict ancestor of the older file (this might be wrong). With
the new scheme we would instead have
{A: someClockTime, b: otherClockTime}
and the new version after reset would become
{A: someClockTime+delta}
which is in conflict with the previous entry (better).
In case the clocks are wrong (current time is less than the value in the
vector) we fall back to just simple increment like today.
This scheme is ineffective if we suffer a database reset while at the
same time setting the clock back far into the past. It's however no
worse than what we already do.
This loses the ability to emit the "added" event, as we can't look for
the magic 1 entry any more. That event was however already broken
(#5541).
Another place where we infer meaning from the vector itself is in
receive only folders, but there the only criteria is that the vector is
one item long and includes just ourselves, which remains the case with
this change.
* wip
This changes the build script to build all the things in one go
invocation, instead of one invocation per cmd. This is a lot faster
because it means more things get compiled concurrently. It's especially
a lot faster when things *don't* need to be rebuilt, possibly because it
only needs to build the dependency map and such once instead of once per
binary.
In order for this to work we need to be able to pass the same ldflags to
all the binaries. This means we can't set the program name with an
ldflag.
When it needs to rebuild everything (go clean -cache):
( ./old-build -gocmd go1.14.2 build all 2> /dev/null; ) 65.82s user 11.28s system 574% cpu 13.409 total
( ./new-build -gocmd go1.14.2 build all 2> /dev/null; ) 63.26s user 7.12s system 1220% cpu 5.766 total
On a subsequent run (nothing to build, just link the binaries):
( ./old-build -gocmd go1.14.2 build all 2> /dev/null; ) 26.58s user 7.53s system 582% cpu 5.853 total
( ./new-build -gocmd go1.14.2 build all 2> /dev/null; ) 18.66s user 2.45s system 1090% cpu 1.935 total
This makes the GC runner a service that will stop fairly quickly when
told to.
As a bonus, STTRACE=app will print the service tree on the way out,
including any errors they've flagged.
So, in a funny plot twist, it turns out that WriteFile in Go 1.13
doesn't actually set the read only bit on Windows when called with
permissions 0444 so my test was broken. With an improved test it turns
out that Rename does not, in fact, overwrite a read-only file on
Windows. This adds a fix for that.
(Rename might get improved in Go 1.15...)
This adds the functionality to run a user search with a filter for LDAP
authentication. The search is done after successful bind, as the binding
user. The typical use case is to limit authentication to users who are
member of a group or under a certain OU. For example, to only match
users in the "Syncthing" group in otherwise default Active Directory
set up for example.com:
<searchBaseDN>CN=Users,DC=example,DC=com</searchBaseDN>
<searchFilter>(&(sAMAccountName=%s)(memberOf=CN=Syncthing,CN=Users,DC=example,DC=com))</searchFilter>
The search filter is an "and" of two criteria (with the ampersand being
XML quoted),
- "(sAMAccountName=%s)" matches the user logging in
- "(memberOf=CN=Syncthing,CN=Users,DC=example,DC=com)" matches members
of the group in question.
Authentication will only proceed if the search filter matches precisely
one user.
The previous implementation was very generic; its tests didn't cover the
actual alphabet for device IDs.
Benchmark results on amd64:
name old time/op new time/op delta
Luhnify-8 1.00µs ± 1% 0.28µs ± 4% -72.38% (p=0.000 n=9+10)
Unluhnify-8 992ns ± 2% 274ns ± 1% -72.39% (p=0.000 n=10+9)