* lib/db: Deduplicate block lists in database (fixes#5898)
This moves the block list in the database out from being just a field on
the FileInfo to being an object of its own. When putting a FileInfo we
marshal the block list separately and store it keyed by the sha256 of
the marshalled block list. When getting, if we are not doing a
"truncated" get, we do an extra read and unmarshal for the block list.
Old block lists are cleared out by a periodic GC sweep. The alternative
would be to use refcounting, but:
- There is a larger risk of getting that wrong and either dropping a
block list in error or keeping them around forever.
- It's tricky with our current database, as we don't have dirty reads.
This means that if we update two FileInfos with identical block lists in
the same transaction we can't just do read/modify/write for the ref
counters as we wouldn't see our own first update. See above about
tracking this and risks about getting it wrong.
GC uses a bloom filter for keys to avoid heavy RAM usage. GC can't run
concurrently with FileInfo updates so there is a new lock around those
operation at the lowlevel.
The end result is a much more compact database, especially for setups
with many peers where files get duplicated many times.
This is per-key-class stats for a large database I'm currently working
with, under the current schema:
```
0x00: 9138161 items, 870876 KB keys + 7397482 KB data, 95 B + 809 B avg, 1637651 B max
0x01: 185656 items, 10388 KB keys + 1790909 KB data, 55 B + 9646 B avg, 924525 B max
0x02: 916890 items, 84795 KB keys + 3667 KB data, 92 B + 4 B avg, 192 B max
0x03: 384 items, 27 KB keys + 5 KB data, 72 B + 15 B avg, 87 B max
0x04: 1109 items, 17 KB keys + 17 KB data, 15 B + 15 B avg, 69 B max
0x06: 383 items, 3 KB keys + 0 KB data, 9 B + 2 B avg, 18 B max
0x07: 510 items, 4 KB keys + 12 KB data, 9 B + 24 B avg, 41 B max
0x08: 1349 items, 12 KB keys + 10 KB data, 9 B + 8 B avg, 17 B max
0x09: 194 items, 0 KB keys + 123 KB data, 5 B + 634 B avg, 11484 B max
0x0a: 3 items, 0 KB keys + 0 KB data, 14 B + 7 B avg, 30 B max
0x0b: 181836 items, 2363 KB keys + 10694 KB data, 13 B + 58 B avg, 173 B max
Total 10426475 items, 968490 KB keys + 9202925 KB data.
```
Note 7.4 GB of data in class 00, total size 9.2 GB. After running the
migration we get this instead:
```
0x00: 9138161 items, 870876 KB keys + 2611392 KB data, 95 B + 285 B avg, 4788 B max
0x01: 185656 items, 10388 KB keys + 1790909 KB data, 55 B + 9646 B avg, 924525 B max
0x02: 916890 items, 84795 KB keys + 3667 KB data, 92 B + 4 B avg, 192 B max
0x03: 384 items, 27 KB keys + 5 KB data, 72 B + 15 B avg, 87 B max
0x04: 1109 items, 17 KB keys + 17 KB data, 15 B + 15 B avg, 69 B max
0x06: 383 items, 3 KB keys + 0 KB data, 9 B + 2 B avg, 18 B max
0x07: 510 items, 4 KB keys + 12 KB data, 9 B + 24 B avg, 41 B max
0x09: 194 items, 0 KB keys + 123 KB data, 5 B + 634 B avg, 11484 B max
0x0a: 3 items, 0 KB keys + 0 KB data, 14 B + 17 B avg, 51 B max
0x0b: 181836 items, 2363 KB keys + 10694 KB data, 13 B + 58 B avg, 173 B max
0x0d: 44282 items, 1461 KB keys + 61081 KB data, 33 B + 1379 B avg, 1637399 B max
Total 10469408 items, 969939 KB keys + 4477905 KB data.
```
Class 00 is now down to 2.6 GB, with just 61 MB added in class 0d.
There will be some additional reads in some cases which theoretically
hurts performance, but this will be more than compensated for by smaller
writes and better compaction.
On my own home setup which just has three devices and a handful of
folders the difference is smaller in absolute numbers of course, but
still less than half the old size:
```
0x00: 297122 items, 20894 KB keys + 306860 KB data, 70 B + 1032 B avg, 103237 B max
0x01: 115299 items, 7738 KB keys + 17542 KB data, 67 B + 152 B avg, 419 B max
0x02: 1430537 items, 121223 KB keys + 5722 KB data, 84 B + 4 B avg, 253 B max
...
Total 1947412 items, 151268 KB keys + 337485 KB data.
```
to:
```
0x00: 297122 items, 20894 KB keys + 37038 KB data, 70 B + 124 B avg, 520 B max
0x01: 115299 items, 7738 KB keys + 17542 KB data, 67 B + 152 B avg, 419 B max
0x02: 1430537 items, 121223 KB keys + 5722 KB data, 84 B + 4 B avg, 253 B max
...
0x0d: 18041 items, 595 KB keys + 71964 KB data, 33 B + 3988 B avg, 101109 B max
Total 1965447 items, 151863 KB keys + 139628 KB data.
```
* wip
* wip
* wip
* wip
This adds error returns to model methods called by the protocol layer.
Returning an error will cause the connection to be torn down as the
message couldn't be handled. Using this to signal that a folder isn't
currently available will then cause a reconnection a few moments later,
when it'll hopefully work better.
Tested manually by running with STRECHECKDBEVERY=0 on a nontrivially
sized setup. This panics reliably before this patch, but just causes a
disconnect/reconnect now.
As foretold by the prophecy, "once the database refactor is merged, then
shall appear a request to propagate errors from the store known
throughout the land as the NamedspacedKV, and it shall be good".
* lib/versioner: Reduce surface area
This is a refactor while I was anyway rooting around in the versioner.
Instead of exporting every possible implementation and the factory and
letting the caller do whatever, this now encapsulates all that and
exposes a New() that takes a config.VersioningConfiguration.
Given that and that we don't know (from the outside) how a versioner
works or what state it keeps, we now just construct it once per folder
and keep it around. Previously it was recreated for each restore
request.
* unparam
* wip
The check in ClusterConfig() when iterating through announced devices
in a folder explicitly skips entries without a non-zero IndexID.
Therefore, the check for IndexID == 0 just below will never be true
and the intended cleanup of local index data will not happen.
Plainly remove that check to make the intended case distinction work.
* lib/protocol: Wait for reader/writer loops on close (fixes#4170)
* waitgroup
* lib/model: Don't hold lock while closing connection
* fix comments
* review (lock once, func argument) and naming
* cmd/syncthing, lib/gui: Separate gui into own package (ref #4085)
* fix tests
* Don't use main as interface name (make old go happy)
* gui->api
* don't leak state via locations and use in-tree config
* let api (un-)subscribe to config
* interface naming and exporting
* lib/ur
* fix tests and lib/foldersummary
* shorter URVersion and ur debug fix
* review
* model.JsonCompletion(FolderCompletion) -> FolderCompletion.Map()
* rename debug facility https -> api
* folder summaries in model
* disassociate unrelated constants
* fix merge fail
* missing id assignement
* lib/tlsutil: Enable TLS 1.3 when available, on test builds (fixes#5065)
This enables TLS 1.3 negotiation on Go 1.12 by setting the GODEBUG
variable. For now, this just gets enabled on test versions (those with a
dash in the version number).
Users wishing to enable this on production builds can set GODEBUG
manually.
The string representation of connections now includes the TLS version
and cipher suite. This becomes part of the log output on connections.
That is, when talking to an old client:
Established secure connection .../TLS1.2-TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
and now potentially:
Established secure connection .../TLS1.3-TLS_AES_128_GCM_SHA256
(The cipher suite was there previously in the log output, but not the
TLS version.)
I also added this info as a new Crypto() method on the connection, and
propagate this out to the API and GUI, where it can be seen in the
connection address hover (although with bad word wrapping sometimes).
* wip
* wip
* cleanup Fatal in lib/config/config.go
* cleanup Fatal in lib/config/folderconfiguration.go
* cleanup Fatal in lib/model/model.go
* cleanup Fatal in cmd/syncthing/monitor.go
* cleanup Fatal in cmd/syncthing/main.go
* cleanup Fatal in lib/api
* remove Fatal methods from logger
* lowercase in errors.Wrap
* one less channel
I'm working through linter complaints, these are some fixes. Broad
categories:
1) Ignore errors where we can ignore errors: add "_ = ..." construct.
you can argue that this is annoying noise, but apart from silencing the
linter it *does* serve the purpose of highlighting that an error is
being ignored. I think this is OK, because the linter highlighted some
error cases I wasn't aware of (starting CPU profiles, for example).
2) Untyped constants where we though we had set the type.
3) A real bug where we ineffectually assigned to a shadowed err.
4) Some dead code removed.
There'll be more of these, because not all packages are fixed, but the
diff was already large enough.
This avoids waiting until next ping and timeout until the connection is actually
closed both by notifying the peer of the disconnect and by immediately closing
the local end of the connection after that. As a nice side effect, info level
logging about dropped connections now have the actual reason in it, not a generic
timeout error which looks like a real problem with the connection.
This adds a thin type that holds the state associated with the
leveldb.DB, leaving the huge Instance type more or less stateless. Also
moves some keying stuff into the DB package so that other packages need
not know the keying specifics.
(This does not, yet, fix the cmd/stindex program, in order to keep the
diff size down. Hence the keying constants are still exported.)
* lib/model, cmd/syncthing: Wait for folder restarts to complete (fixes#5233)
This is the somewhat ugly - but on the other hand clear - fix for what
is really a somewhat thorny issue. To avoid zombie folder runners a new
mutex is introduced that protects the RestartFolder operation. I hate
adding more mutexes but the alternatives I can think of are worse.
The other part of it is that the POST /rest/system/config operation now
waits for the config commit to complete. The point of this is that until
the commit has completed we should not accept another config commit. If
we did, we could end up with two separate RestartFolders queued in the
background. While they are both correct, and will run without
interfering with each other, we can't guarantee the order in which they
will run. Thus it could happen that the newer config got committed
first, and the older config commited after that, leaving us with the
wrong config running.
* test
* wip
* hax
* hax
* unflake test
* per folder mutexes
* paranoia
* race
The problem here is that we would update the sequence index before
updating the FileInfos, which would result in a high sequence number
pointing to a low-sequence FileInfo. The index sender would pick up the
high sequence number, send the old file, and think everything was good.
On the receiving side the old file is a no-op and ignored. The file
remains out of sync until another update for it happens.
This fixes that by correcting the order of operations in the database
update: first we remove old sequence index entries, then we update the
FileInfos (which now don't have anything pointing to them) and then we
add the sequence indexes (which the index sender can see).
The other option is to add "proper" transactions where required at the
database layer. I actually have a branch for that, but it's literally
thousands of lines of diff and I'm putting that off for another day as
this solves the problem...
The problem here is that we would update the sequence index before
updating the FileInfos, which would result in a high sequence number
pointing to a low-sequence FileInfo. The index sender would pick up the
high sequence number, send the old file, and think everything was good.
On the receiving side the old file is a no-op and ignored. The file
remains out of sync until another update for it happens.
This fixes that by correcting the order of operations in the database
update: first we remove old sequence index entries, then we update the
FileInfos (which now don't have anything pointing to them) and then we
add the sequence indexes (which the index sender can see).
The other option is to add "proper" transactions where required at the
database layer. I actually have a branch for that, but it's literally
thousands of lines of diff and I'm putting that off for another day as
this solves the problem...
Adds a receive only folder type that does not send changes, and where the user can optionally revert local changes. Also changes some of the icons to make the three folder types distinguishable.
We have the invalid bit to indicate that a file isn't good. That's enough for remote devices. For ourselves, it would be good to know sometimes why the file isn't good - because it's an unsupported type, because it matches an ignore pattern, or because we detected the data is bad and we need to rescan it.
Or, and this is the main future reason for the PR, because it's a change detected on a receive only device. We will want something like the invalid flag for those changes, but marking them as invalid today means the scanner will rehash them. Hence something more fine grained is required.
This introduces a LocalFlags fields to the FileInfo where we can stash things that we care about locally. For example,
FlagLocalUnsupported = 1 << 0 // The kind is unsupported, e.g. symlinks on Windows
FlagLocalIgnored = 1 << 1 // Matches local ignore patterns
FlagLocalMustRescan = 1 << 2 // Doesn't match content on disk, must be rechecked fully
The LocalFlags fields isn't sent over the wire; instead the Invalid attribute is calculated based on the flags at index sending time. It's on the FileInfo anyway because that's what we serialize to database etc.
The actual Invalid flag should after this just be considered when building the global state and figuring out availability for remote devices. It is not used for local file index entries.
I'm trying to slowly clean this up a bit, and moving functionality out
into the folder types and having those methods not reach into model is
part of it. That can mean takign some odd arguments in the meantime,
some of those should probably become interfaces or properties on folder
in the long term.
Instead of walking and unmarshalling the entire db and sorting the resulting
file infos by sequence, add store device keys by sequence number in the
database. Thus only the required file infos need be unmarshalled and are already
sorted by index.
When scanner.Walk detects a change, it now returns the new file info as well as the old file info. It also finds deleted and ignored files while scanning.
Also directory deletions are now always committed to db after their children to prevent temporary failure on remote due to non-empty directory.
Since #4340 pulls aren't happening every 10s anymore and may be delayed up to 1h.
This means that no folder error event reaches the web UI for a long time, thus no
failed items will show up for a long time. Now errors are populated when the
web UI is opened.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4650
LGTM: AudriusButkevicius
This no longer pokes at model internals, and only touches the config.
As a result, model handles this in CommitConfiguration, which restarts
the folders if things change, which repopulate m.folderDevice, m.deviceFolder
and other interal mappings.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4639
Fix the folder restart behavior (ignore Label), improve the API for that
(imho).
Also removes the tab switch animation in the settings modal, because
annoying.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4577
This should address issue as described in https://forum.syncthing.net/t/stun-nig-party-with-paused-devices/10942/13
Essentially the model and the connection service goes out of sync in terms of thinking if we are connected or not.
Resort to model as being the ultimate source of truth.
I can't immediately pin down how this happens, yet some ideas.
ConfigSaved happens in separate routine, so it's possbile that we have some sort of device removed yet connection comes in parallel kind of thing.
However, in this case the connection exists in the model, and does not exist in the connection service and the only way for the connection to be removed
in the connection service is device removal from the config.
Given the subject, this might also be related to the device being paused.
Also, adds more info to the logs
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4533
We need to reset prevSeq so that we force a full check when someone
reconnects - the sequence number may not have changed due to the
reconnect. (This is a regression; we did this before f6ea2a7.)
Also add an optimization: we schedule a pull after scanning, but there
is no need to do so if no changes were detected. This matters now
because the scheduled pull actually traverses the database which is
expensive.
This, however, makes the pull not happen on initial scan if there were
no changes during the initial scan. Compensate by always scheduling a
pull after initial scan in the rwfolder itself.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4508
LGTM: imsodin, AudriusButkevicius
This removes a significant, complex chunk of database code. The
"replace" operation walked both the old and new in lockstep and made the
relevant changes to make the new situation correct. But since delta
indexes we pretty much never need this - we just used replace to drop
the existing data and start over.
This makes that explicit and removes the complexity.
(This is one of those things that would be annoying to make case
insensitive, while the actual "drop and then insert" that we do is
easier.)
This is fairly well unit tested...
The one change to the tests is to cover the fact that previously replace
with something identical didn't bump the sequence number, while
obviously removing everything and re-inserting does. This is not
behavior we depend on anywhere.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4500
LGTM: imsodin, AudriusButkevicius
The folder marker conversion forgot to hide the .stfolder. This adds
that, for those who have not yet been converted.
Also adds Hide() calls to the folder start, to mend historical
unhidedness. (I'm sure this will upset someone who is manually managing
their .stignores in the other direction...)
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4384
This solves the erratic test failures on model.TestIgnores by ensuring
that the ignore patterns are reloaded even in the face of unchanged
timestamps.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4208
The folder already knew how to stop properly, but the fs.Walk() didn't
and can potentially take a very long time. This adds context support to
Walk and the underlying scanning stuff, and passes in an appropriate
context from above. The stop channel in model.folder is replaced with a
context for this purpose.
To test I added an infiniteFS that represents a large amount of data
(not actually infinite, but close) and verify that walking it is
properly stopped. For that to be implemented smoothly I moved out the
Walk function to it's own type, as typically the implementer of a new
filesystem type might not need or want to reimplement Walk.
It's somewhat tricky to test that this actually works properly on the
actual sendReceiveFolder and so on, as those are started from inside the
model and the filesystem isn't easily pluggable etc. Instead I've tested
that part manually by adding a huge folder and verifying that pause,
resume and reconfig do the right things by looking at debug output.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4117
Harmonize how we use batches in the model, using ProtoSize() to judge
the actual weight of the entire batch instead of estimating. Use smaller
batches in the block map - I think we might have though that batch.Len()
in the leveldb was the batch size in bytes, but it's actually number of
operations.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4114
This deprecates the current minDiskFreePct setting and introduces
minDiskFree. The latter is, in it's serialized form, a string with a
unit. We accept percentages ("2.35%") and absolute values ("250 k", "12.5
Gi"). Common suffixes are understood. The config editor lets the user
enter the string, and validates it.
We still default to "1 %", but the user can change that to an absolute
value at will.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4087
LGTM: AudriusButkevicius, imsodin
This adds a new config AllowedNetworks per device, which when set should
contain a list of network prefixes (192.168.0.0/126 etc) that are
allowed for the given device. The connection service will not attempt
connections to addresses outside of the given networks and incoming
connections will be rejected as well.
I've added the config to the normal device editor and shown it (when
set) in the device summary on the main screen.
There's a unit test for the IsAllowedNetwork method, I've done some
manual sanity testing on top of that.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4073
One more step on the path of the great refactoring. Touches rwfolder a
little bit since it uses the Lstat from fs as well, but mostly this is
just on the scanner as rwfolder is scheduled for a later refactor.
There are a couple of usages of fs.DefaultFilesystem that will in the
end become a filesystem injected from the top, but that comes later.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4070
LGTM: AudriusButkevicius, imsodin
Basically, if we don't care about the sync status of the file we should
not tag someone else out of sync because they don't have the latest
version. This solves *my* "Syncing - 100%" scenario at least.
The reason this happens seems to be like this, in my situation. I have
three devices, connected in a "line": A-B-C. A is a Mac and litters
.DS_Store files everywhere. I've ignored these, but some escaped into
the folders before I did so. I've also ignored them on B and C but at
different stages. B was flagging C as out of sync, because at the point
the ignores were introduced C had a lower version of .DS_Store than A.
Now none of them are sending updates about it any more since it's
ignored...
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3981
After this change,
- Symlinks on Windows are always unsupported. Sorry.
- Symlinks are always enabled on other platforms. They are just a small
file like anything else. There is no need to special case them. If you
don't want to sync some symlinks, ignore them.
- The protocol doesn't differentiate between different "types" of
symlinks. If that distinction ever does become relevant the individual
devices can figure it out by looking at the destination when they
create the link.
It's backwards compatible in that all the old symlink types are still
understood to be symlinks, and the new SYMLINK type is equivalent to the
old SYMLINK_UNKNOWN which was always a valid way to do it.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3962
LGTM: AudriusButkevicius
Syncthing adds some hidden files when a folder is added, but there is currently
no equivalent cleanup procedure. This change is conservative as not to
accidentally cause data loss.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3874
Instead, trust (and test) that the temp file has appropriate permissions
from the start. The only place where this changes our behavior is for
ignores which go from 0644 to 0600. I'm OK with that.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/3756