Since #4340 pulls aren't happening every 10s anymore and may be delayed up to 1h.
This means that no folder error event reaches the web UI for a long time, thus no
failed items will show up for a long time. Now errors are populated when the
web UI is opened.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4650
LGTM: AudriusButkevicius
This is a new revision of the discovery server. Relevant changes and
non-changes:
- Protocol towards clients is unchanged.
- Recommended large scale design is still to be deployed nehind nginx (I
tested, and it's still a lot faster at terminating TLS).
- Database backend is leveldb again, only. It scales enough, is easy to
setup, and we don't need any backend to take care of.
- Server supports replication. This is a simple TCP channel - protect it
with a firewall when deploying over the internet. (We deploy this within
the same datacenter, and with firewall.) Any incoming client announces
are sent over the replication channel(s) to other peer discosrvs.
Incoming replication changes are applied to the database as if they came
from clients, but without the TLS/certificate overhead.
- Metrics are exposed using the prometheus library, when enabled.
- The database values and replication protocol is protobuf, because JSON
was quite CPU intensive when I tried that and benchmarked it.
- The "Retry-After" value for failed lookups gets slowly increased from
a default of 120 seconds, by 5 seconds for each failed lookup,
independently by each discosrv. This lowers the query load over time for
clients that are never seen. The Retry-After maxes out at 3600 after a
couple of weeks of this increase. The number of failed lookups is
stored in the database, now and then (avoiding making each lookup a
database put).
All in all this means clients can be pointed towards a cluster using
just multiple A / AAAA records to gain both load sharing and redundancy
(if one is down, clients will talk to the remaining ones).
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4648
This keeps the data we need about sequence numbers and object counts
persistently in the database. The sizeTracker is expanded into a
metadataTracker than handled multiple folders, and the Counts struct is
made protobuf serializable. It gains a Sequence field to assist in
tracking that as well, and a collection of Counts become a CountsSet
(for serialization purposes).
The initial database scan is also a consistency check of the global
entries. This shouldn't strictly be necessary. Nonetheless I added a
created timestamp to the metadata and set a variable to compare against
that. When the time since the metadata creation is old enough, we drop
the metadata and rebuild from scratch like we used to, while also
consistency checking.
A new environment variable STCHECKDBEVERY can override this interval,
and for example be set to zero to force the check immediately.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4547
LGTM: imsodin
So STDEADLOCK seems to do the same thing as STDEADLOCKTIMEOUT, except in
the other package. Consolidate?
STDEADLOCKTHRESHOLD is actually called STLOCKTHRESHOLD, correct the help
text.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4598
Two issues since the filesystem migration;
- filepath.Base() doesn't do what the code expected when the path ends
in a slash, and we make sure the path ends in a slash.
- the return should be fully qualified, not just the last component.
With this change it works as before, and passes the new test for it.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4591
LGTM: imsodin
This should address issue as described in https://forum.syncthing.net/t/stun-nig-party-with-paused-devices/10942/13
Essentially the model and the connection service goes out of sync in terms of thinking if we are connected or not.
Resort to model as being the ultimate source of truth.
I can't immediately pin down how this happens, yet some ideas.
ConfigSaved happens in separate routine, so it's possbile that we have some sort of device removed yet connection comes in parallel kind of thing.
However, in this case the connection exists in the model, and does not exist in the connection service and the only way for the connection to be removed
in the connection service is device removal from the config.
Given the subject, this might also be related to the device being paused.
Also, adds more info to the logs
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4533
When STHASHING is set, don't benchmark as it's already decided. If weak
hashing isn't set to "auto", don't benchmark that either.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4349
Starting stuff from init() is an antipattern, and the innerProcess
variable isn't 100% reliable. We should sort out the other uses of it as
well in due time.
Also removing the hack on innerProcess as I happened to see it and the
affected versions are now <1% users.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4185
The folder already knew how to stop properly, but the fs.Walk() didn't
and can potentially take a very long time. This adds context support to
Walk and the underlying scanning stuff, and passes in an appropriate
context from above. The stop channel in model.folder is replaced with a
context for this purpose.
To test I added an infiniteFS that represents a large amount of data
(not actually infinite, but close) and verify that walking it is
properly stopped. For that to be implemented smoothly I moved out the
Walk function to it's own type, as typically the implementer of a new
filesystem type might not need or want to reimplement Walk.
It's somewhat tricky to test that this actually works properly on the
actual sendReceiveFolder and so on, as those are started from inside the
model and the filesystem isn't easily pluggable etc. Instead I've tested
that part manually by adding a huge folder and verifying that pause,
resume and reconfig do the right things by looking at debug output.
GitHub-Pull-Request: https://github.com/syncthing/syncthing/pull/4117