syncthing

mirror of https://github.com/octoleo/syncthing.git synced 2024-12-22 10:58:57 +00:00

Author	SHA1	Message	Date
Jakob Borg	77970d5113	refactor: use modern Protobuf encoder (#9817 ) At a high level, this is what I've done and why: - I'm moving the protobuf generation for the `protocol`, `discovery` and `db` packages to the modern alternatives, and using `buf` to generate because it's nice and simple. - After trying various approaches on how to integrate the new types with the existing code, I opted for splitting off our own data model types from the on-the-wire generated types. This means we can have a `FileInfo` type with nicer ergonomics and lots of methods, while the protobuf generated type stays clean and close to the wire protocol. It does mean copying between the two when required, which certainly adds a small amount of inefficiency. If we want to walk this back in the future and use the raw generated type throughout, that's possible, this however makes the refactor smaller (!) as it doesn't change everything about the type for everyone at the same time. - I have simply removed in cold blood a significant number of old database migrations. These depended on previous generations of generated messages of various kinds and were annoying to support in the new fashion. The oldest supported database version now is the one from Syncthing 1.9.0 from Sep 7, 2020. - I changed config structs to be regular manually defined structs. For the sake of discussion, some things I tried that turned out not to work... ### Embedding / wrapping Embedding the protobuf generated structs in our existing types as a data container and keeping our methods and stuff: ``` package protocol type FileInfo struct { generated.FileInfo } ``` This generates a lot of problems because the internal shape of the generated struct is quite different (different names, different types, more pointers), because initializing it doesn't work like you'd expect (i.e., you end up with an embedded nil pointer and a panic), and because the types of child types don't get wrapped. That is, even if we also have a similar wrapper around a `Vector`, that's not the type you get when accessing `someFileInfo.Version`, you get the `generated.Vector` that doesn't have methods, etc. ### Aliasing ``` package protocol type FileInfo = generated.FileInfo ``` Doesn't help because you can't attach methods to it, plus all the above. ### Generating the types into the target package like we do now and attaching methods This fails because of the different shape of the generated type (as in the embedding case above) plus the generated struct already has a bunch of methods that we can't necessarily override properly (like `String()` and a bunch of getters). ### Methods to functions I considered just moving all the methods we attach to functions in a specific package, so that for example ``` package protocol func (f FileInfo) Equal(other FileInfo) bool ``` would become ``` package fileinfos func Equal(a, b generated.FileInfo) bool ``` and this would mostly work, but becomes quite verbose and cumbersome, and somewhat limits discoverability (you can't see what methods are available on the type in auto completions, etc). In the end I did this in some cases, like in the database layer where a lot of things like `func (fv FileVersion) IsEmpty() bool` becomes `func fvIsEmpty(fv *generated.FileVersion)` because they were anyway just internal methods. Fixes #8247	2024-12-01 16:50:17 +01:00
Jakob Borg	6d64daaba3	chore(db): process "unchanged" files anyway (#9755 ) Skipping these makes the sequence numbering inconsistent; we've received a file and supposedly added it to the database, but if you check the sequence number afterwards it didn't increase, i.e., we trigger [this failure condition](`47f48faed7/lib/model/indexhandler.go (L447-L459)`) and, similarly, a future update will look like there was a hole in the numbering. I propose to at least temporarily remove this optimisation in order for things to make more sense. Is there a reason to keep this beyond saving some database operations?	2024-10-04 19:47:57 +00:00
Gusted	356c5055ad	lib/sha256: Remove it (#9643 ) ### Purpose Remove the `lib/sha256` package, because it's no longer necessary. Go's standard library has been on par with `sha256-simd` in performance [since Go 1.21](`1a64574f42`). Therefore using `sha256-simd` has no benefits anymore. ARM already has optimized sha256 assembly code since `7b8a7f8272`; `sha256-simd` published their results before that optimized assembly was implemented, `f941fedda8`. The assembly looks very similar and the benchmarks in the Go commit match that of `sha256-simd`. This patch removes all of the related code of `lib/sha256` and makes `crypto/sha256` the 'default'. Benchmark of `sha256-simd` and `crypto/sha256`: <details> ``` cpu: AMD Ryzen 5 3600X 6-Core Processor │ simd.txt │ go.txt │ │ sec/op │ sec/op vs base │ Hash/8Bytes-12 63.25n ± 1% 73.38n ± 1% +16.02% (p=0.002 n=6) Hash/64Bytes-12 98.73n ± 1% 105.30n ± 1% +6.65% (p=0.002 n=6) Hash/1K-12 567.2n ± 1% 572.8n ± 1% +0.99% (p=0.002 n=6) Hash/8K-12 4.062µ ± 1% 4.062µ ± 1% ~ (p=0.396 n=6) Hash/1M-12 512.1µ ± 0% 510.6µ ± 1% ~ (p=0.485 n=6) Hash/5M-12 2.556m ± 1% 2.564m ± 0% ~ (p=0.093 n=6) Hash/10M-12 5.112m ± 0% 5.127m ± 0% ~ (p=0.093 n=6) geomean 13.82µ 14.27µ +3.28% │ simd.txt │ go.txt │ │ B/s │ B/s vs base │ Hash/8Bytes-12 120.6Mi ± 1% 104.0Mi ± 1% -13.81% (p=0.002 n=6) Hash/64Bytes-12 618.2Mi ± 1% 579.8Mi ± 1% -6.22% (p=0.002 n=6) Hash/1K-12 1.682Gi ± 1% 1.665Gi ± 1% -0.98% (p=0.002 n=6) Hash/8K-12 1.878Gi ± 1% 1.878Gi ± 1% ~ (p=0.310 n=6) Hash/1M-12 1.907Gi ± 0% 1.913Gi ± 1% ~ (p=0.485 n=6) Hash/5M-12 1.911Gi ± 1% 1.904Gi ± 0% ~ (p=0.093 n=6) Hash/10M-12 1.910Gi ± 0% 1.905Gi ± 0% ~ (p=0.093 n=6) geomean 1.066Gi 1.032Gi -3.18% ``` </details> ### Testing Compiled and tested on Linux. ### Documentation https://github.com/syncthing/docs/pull/874	2024-08-10 12:58:20 +01:00
Jakob Borg	61b94b9ea5	lib/db: Drop indexes for outgoing data to force refresh (ref #9496 ) (#9502 ) ### Purpose Resend our indexes since we fixed that index-sending issue. I made a new thing to only drop the non-local-device index IDs, i.e., those for other devices. This means we will see a mismatch and resend all indexes, but they will not. This is somewhat cleaner as it avoids resending everything twice when two devices are upgraded, and in any case, we have no reason to force a resend of incoming indexes here. ### Testing It happens on my computer...	2024-04-08 11:14:27 +02:00
Jakob Borg	acd767b30b	all: Remove lib/util package (#9049 ) Grab-bag packages are nasty; this cleans it up a little by splitting it into the topical packages semaphore, netutil, stringutil, structutil.	2023-08-21 19:44:33 +02:00
Simon Frei	591e4d8af1	gui, lib: Fix tracking deleted locally-changed on encrypted (fixes #7715 ) (#7726 )	2021-11-10 09:46:21 +01:00
greatroar	f96c211198	lib/db: Replace SipHash with hash/maphash (#7962 )	2021-09-24 21:26:07 +02:00
greatroar	7fa141ea39	all: Unused args, retvals, assignments (#7926 )	2021-09-08 00:11:16 +02:00
Simon Frei	aeca1fb575	lib/db: Check if sequences change when repairing metadata (#7770 )	2021-06-17 13:53:39 +02:00
Jakob Borg	ce65aea0ab	lib/db: Use a more concurrent GC (fixes #7722 ) (#7750 ) This changes the GC mechanism so that the first pass (which reads all FileInfos to populate bloom filters with block & version hashes) can happen concurrently with normal database operations. The big gcMut still exists, and we grab it temporarily to block all other modifications while we set up the bloom filters. We then release the lock and let other things happen, with those other things also updating the bloom filters as required. Once the first phase is done we again grab the gcMut, knowing that we are the sole modifier of the database, and do the cleanup. I also removed the final compaction step.	2021-06-07 10:52:06 +02:00
greatroar	95c9561e97	lib/db: Clean up Timer and wait for logging before return in GC (#7720 )	2021-05-31 09:50:21 +02:00
Simon Frei	58592e3ef1	lib/db: Add logging for GC (#7707 )	2021-05-22 21:36:43 +02:00
Simon Frei	0d054f9b64	lib/model: Don't use empty folder cfg for index sender (fixes #7649 ) (#7671 )	2021-05-15 11:13:39 +02:00
Simon Frei	f30f9c50f8	lib/db: Handle indirection error repairing sequences (fixes #7026 ) (#7525 )	2021-04-05 10:24:16 +02:00
Jakob Borg	6e5514419d	lib/db: Fix some omitted error checks, unused variable (#7489 )	2021-03-17 21:41:07 +01:00
Simon Frei	273ee09925	lib/db, lib/model: Allow needing invalid files (fixes #7474 ) (#7476 )	2021-03-15 07:58:01 +01:00
Simon Frei	27a34609a1	all: Failure reporting fixes (#7331 )	2021-02-05 11:21:14 +01:00
Simon Frei	070bf3b776	lib/db: Report number of repaired items from checkGlobal (#7329 )	2021-02-04 14:42:46 +01:00
Simon Frei	a20a5f61f0	lib/ur: Send unreported failures on shutdown (#7164 )	2020-12-22 20:17:14 +01:00
Simon Frei	78bd0341a8	all: Handle errors opening db/creating file-set (ref #5907 ) (#7150 )	2020-12-21 12:59:22 +01:00
Simon Frei	4a787986cd	lib/db: Prevent IndexID creation race (#7211 )	2020-12-21 11:32:59 +01:00
Simon Frei	bd0c9913cf	lib/db: Remove index ids when dropping folder (#7200 )	2020-12-21 11:10:59 +01:00
Simon Frei	fa40ccece1	lib: Consistently set suture logging (#7202 )	2020-12-18 19:44:00 +01:00
Simon Frei	9524b51708	all: Implement suture v4-api (#6947 )	2020-11-17 13:19:04 +01:00
Simon Frei	01a7ef3b0f	lib/db: Undo adding user info to panic msgs (ref #7029 ) (#7040 )	2020-10-19 08:40:37 +02:00
Simon Frei	23c935b05a	lib/db: Ignore not found on delete in recalcGlobal (ref #7026 ) (#7041 )	2020-10-19 08:28:53 +02:00
Simon Frei	d828adb648	cmd/stcrashreceiver, lib/db: Improve panic message handling (#7029 )	2020-10-08 17:37:45 +02:00
Simon Frei	14ae330eff	lib/db: Improve error handling checking db (ref #7026 ) (#7027 )	2020-10-06 20:14:09 +02:00
Simon Frei	08bebbe59b	lib/db, lib/syncthing: Don't repair DB on upgrade, but on error (fixes #6917 ) (#6971 )	2020-09-10 10:54:41 +02:00
Simon Frei	7b821d2550	lib/db: Add local need check&repair (#6950 )	2020-09-04 14:01:46 +02:00
Simon Frei	cc1f6e4d4a	lib/db, lib/model: Cover exec-paths with debug logging (#6918 )	2020-08-20 16:11:20 +02:00
Simon Frei	8f5215878b	lib/db: Don't put truncated files (ref #6855 , ref #6501 ) (#6888 )	2020-08-18 09:20:12 +02:00
Simon Frei	424d1b1608	lib/db: Commit meta when dropping device (#6862 )	2020-07-28 16:46:42 +02:00
Simon Frei	1b9e5c0937	lib/db: Include blocks in db check (ref #6855 ) (#6861 )	2020-07-28 16:25:07 +02:00
Audrius Butkevicius	55147f5901	lib/db: Rework flush hooks (#6838 )	2020-07-19 08:55:27 +02:00
greatroar	9f92f8c609	lib/db: Use SipHash to deal with hash collision in GC (#6826 ) If the GC finds a key k that it wants to keep, it records that in a Bloom filter. If a key k' can be removed but its hash collides with k, it will be kept. Since the old Bloom filter code was completely deterministic, the next run would encounter the same collision, assuming k must still be kept. A randomized hash function that uses all the SHA-256 bits solves this problem: the second run has a non-zero probability of removing k', as long as the Bloom filter is not completely full.	2020-07-11 09:36:09 +02:00
Simon Frei	8cf9d91ed4	lib: Print nicely rounded durations (#6756 )	2020-06-18 10:55:41 +02:00
Simon Frei	cbe0d2fffc	lib/db: Improve error message on meta inconsistency (#6751 )	2020-06-17 10:03:39 +02:00
André Colomb	46536509d7	lib/protocol: Avoid panic in DeviceIDFromBytes (#6714 )	2020-06-07 10:31:12 +02:00
Simon Frei	1f8e6c55f6	lib/db: Refactor to use global list by version (fixes #6372 ) (#6638 ) Group the global list of files by version, instead of having one flat list for all devices. This removes lots of duplicate protocol.Vectors. Co-authored-by: Jakob Borg <jakob@kastelo.net>	2020-05-30 09:50:23 +02:00
Jakob Borg	94beed5c10	lib/db: Add Badger backend (fixes #5910 ) (#6250 )	2020-05-29 13:43:02 +02:00
Audrius Butkevicius	a1c5b44c74	lib/model: Fix rename handling (ref #6650 ) (#6652 )	2020-05-16 14:39:27 +02:00
Simon Frei	974551375e	lib/db: Don't add symlinks to blocks map (fixes #6637 ) (#6639 )	2020-05-13 20:38:21 +02:00
Jakob Borg	531ceb2b0f	Add indirection for large version vectors. (#6376 ) This adds indirection of large version vectors in the same manner as we already do for block lists. The effect is the same: less duplicated data in some situations. To mitigate the impact for when this indirection wouldn't be needed I've added an indirection cutoff for both blocks and the new version vector stuff: we don't do the indirection at all for small block lists or small version vectors, instead storing it directly like we used to do. This is faster for small files and small setups.	2020-05-13 14:28:42 +02:00
Audrius Butkevicius	decb967969	all: Reorder sequences for better rename detection (#6574 )	2020-05-11 20:15:11 +02:00
Simon Frei	a94951becd	lib/db, lib/model: Keep need stats in metadata (ref #5899 ) (#6413 )	2020-05-11 15:07:06 +02:00
Jakob Borg	c0c18a568c	lib/db: Hold the bloom filter the right way (fixes #6614 ) (#6617 )	2020-05-08 14:18:00 +02:00
greatroar	0e5ba3ca05	lib/db: Upgrade to Blobloom v0.1.1 (#6553 ) Now faster and Apache-licensed.	2020-04-20 14:23:36 +02:00
greatroar	44b0f0b456	lib/db: Switch to faster blobloom Bloom filter pkg (#6537 )	2020-04-20 09:02:33 +02:00
Jakob Borg	0e67c036bb	lib/db: Make database GC a service, stop on Stop() (#6518 ) This makes the GC runner a service that will stop fairly quickly when told to. As a bonus, STTRACE=app will print the service tree on the way out, including any errors they've flagged.	2020-04-12 10:26:57 +02:00
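
The approach described in 77970d5113 (a hand-written data model type kept separate from the generated on-the-wire type, with copying between the two) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `wireFileInfo` is a hypothetical stand-in for a protobuf-generated struct, and the field set, `ToWire` and `FileInfoFromWire` are invented names, not Syncthing's actual API.

```go
package main

import "fmt"

// wireFileInfo is a hypothetical stand-in for a protobuf-generated message.
// A real generated type would come from the code generator, not be hand-written.
type wireFileInfo struct {
	Name    string
	Size    int64
	Deleted bool
}

// FileInfo is the hand-written data model type that carries the methods.
// The fields here are illustrative only.
type FileInfo struct {
	Name    string
	Size    int64
	Deleted bool
}

// Equal compares two model values field by field.
func (f FileInfo) Equal(other FileInfo) bool {
	return f == other
}

// ToWire copies the model value into a fresh wire message (hypothetical helper).
func (f FileInfo) ToWire() *wireFileInfo {
	return &wireFileInfo{Name: f.Name, Size: f.Size, Deleted: f.Deleted}
}

// FileInfoFromWire copies a wire message into the model type (hypothetical helper).
// A nil message yields the zero value instead of a nil dereference.
func FileInfoFromWire(w *wireFileInfo) FileInfo {
	if w == nil {
		return FileInfo{}
	}
	return FileInfo{Name: w.Name, Size: w.Size, Deleted: w.Deleted}
}

func main() {
	orig := FileInfo{Name: "docs/readme.txt", Size: 1234}
	back := FileInfoFromWire(orig.ToWire())
	fmt.Println(back.Equal(orig))
}
```

The copy in each direction is the "small amount of inefficiency" the commit message mentions; in exchange, methods such as `Equal` live on a type whose shape the project controls.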


78 Commits
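
The reasoning in 9f92f8c609 (a deterministic Bloom filter repeats the same false positive on every GC run, while a randomly seeded hash gives a later run a chance to avoid it) can be illustrated with a minimal sketch. It uses `hash/maphash` from the standard library, which f96c211198 later adopted in place of SipHash. The `seededBloom` type, its two-probe scheme and its sizes are assumptions made for this example and are not the blobloom package's API or Syncthing's GC code.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// seededBloom is a toy Bloom filter whose hash is seeded randomly per instance,
// so two instances generally disagree on which keys collide.
type seededBloom struct {
	seed maphash.Seed
	bits []uint64
}

// newSeededBloom allocates a filter with at least nbits bits; nbits must be > 0.
func newSeededBloom(nbits int) *seededBloom {
	return &seededBloom{
		seed: maphash.MakeSeed(), // fresh random seed for every filter (every GC run)
		bits: make([]uint64, (nbits+63)/64),
	}
}

// positions derives two bit positions from one seeded 64-bit hash of the key.
func (b *seededBloom) positions(key []byte) [2]uint64 {
	var h maphash.Hash
	h.SetSeed(b.seed)
	h.Write(key)
	sum := h.Sum64()
	n := uint64(len(b.bits)) * 64
	return [2]uint64{sum % n, (sum >> 32) % n}
}

// add records a key that must be kept.
func (b *seededBloom) add(key []byte) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// has reports whether the key may have been added (false positives are possible,
// false negatives are not).
func (b *seededBloom) has(key []byte) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	keep := []byte("hash-of-block-still-in-use")
	filter := newSeededBloom(1 << 10)
	filter.add(keep)
	fmt.Println("kept key found:", filter.has(keep))
}
```

Because each filter draws its own seed, a removable key that happens to collide with a kept key in one run is unlikely to collide with it again in the next, which is the property the commit message relies on.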