Commit Graph

78 Commits

Author SHA1 Message Date
Jakob Borg
77970d5113
refactor: use modern Protobuf encoder (#9817)
At a high level, this is what I've done and why:

- I'm moving the protobuf generation for the `protocol`, `discovery` and
`db` packages to the modern alternatives, and using `buf` to generate
because it's nice and simple.
- After trying various approaches on how to integrate the new types with
the existing code, I opted for splitting off our own data model types
from the on-the-wire generated types. This means we can have a
`FileInfo` type with nicer ergonomics and lots of methods, while the
protobuf generated type stays clean and close to the wire protocol. It
does mean copying between the two when required, which certainly adds a
small amount of inefficiency. If we want to walk this back in the future
and use the raw generated type throughout, that's possible, this however
makes the refactor smaller (!) as it doesn't change everything about the
type for everyone at the same time.
- I have simply removed in cold blood a significant number of old
database migrations. These depended on previous generations of generated
messages of various kinds and were annoying to support in the new
fashion. The oldest supported database version now is the one from
Syncthing 1.9.0 from Sep 7, 2020.
- I changed config structs to be regular manually defined structs.

For the sake of discussion, some things I tried that turned out not to
work...

### Embedding / wrapping

Embedding the protobuf generated structs in our existing types as a data
container and keeping our methods and stuff:

```
package protocol

type FileInfo struct {
  *generated.FileInfo
}
```

This generates a lot of problems because the internal shape of the
generated struct is quite different (different names, different types,
more pointers), because initializing it doesn't work like you'd expect
(i.e., you end up with an embedded nil pointer and a panic), and because
the types of child types don't get wrapped. That is, even if we also
have a similar wrapper around a `Vector`, that's not the type you get
when accessing `someFileInfo.Version`, you get the `*generated.Vector`
that doesn't have methods, etc.

### Aliasing

```
package protocol

type FileInfo = generated.FileInfo
```

Doesn't help because you can't attach methods to it, plus all the above.

### Generating the types into the target package like we do now and
attaching methods

This fails because of the different shape of the generated type (as in
the embedding case above) plus the generated struct already has a bunch
of methods that we can't necessarily override properly (like `String()`
and a bunch of getters).

### Methods to functions

I considered just moving all the methods we attach to functions in a
specific package, so that for example

```
package protocol

func (f FileInfo) Equal(other FileInfo) bool
```

would become

```
package fileinfos

func Equal(a, b *generated.FileInfo) bool
```

and this would mostly work, but becomes quite verbose and cumbersome,
and somewhat limits discoverability (you can't see what methods are
available on the type in auto completions, etc). In the end I did this
in some cases, like in the database layer where a lot of things like
`func (fv *FileVersion) IsEmpty() bool` becomes `func fvIsEmpty(fv
*generated.FileVersion)` because they were anyway just internal methods.

Fixes #8247
2024-12-01 16:50:17 +01:00
Jakob Borg
6d64daaba3
chore(db): process "unchanged" files anyway (#9755)
Skipping these makes the sequence numbering inconcistent; we've received
a file and suppsedly added it to the database, but if you check the
sequence number afterwards it didn't increase, i.e., we trigger [this
failure
condition](47f48faed7/lib/model/indexhandler.go (L447-L459))
and, similarly, a future update will look like there was a hole in the
numbering.

I propose to at least temporarily remove this optimisation in order for
things to make more sense. Is there a reason to keep this beyond saving
some database operations?
2024-10-04 19:47:57 +00:00
Gusted
356c5055ad
lib/sha256: Remove it (#9643)
### Purpose

Remove the `lib/sha256` package, because it's no longer necessary. Go's
standard library now has the same performance and is on par with
`sha256-simd` since [Since Go
1.21](1a64574f42).
Therefore using `sha256-simd` has no benefits anymore.

ARM already has optimized sha256 assembly code since
7b8a7f8272,
`sha256-simd` published their results before that optimized assembly was
implemented,
f941fedda8.
The assembly looks very similar and the benchmarks in the Go commit
match that of `sha256-simd`.

This patch removes all of the related code of `lib/sha256` and makes
`crypto/sha256` the 'default'.

Benchmark of `sha256-simd` and `crypto/sha256`:
<details>

```
cpu: AMD Ryzen 5 3600X 6-Core Processor
                │  simd.txt   │               go.txt                │
                │   sec/op    │    sec/op     vs base               │
Hash/8Bytes-12    63.25n ± 1%    73.38n ± 1%  +16.02% (p=0.002 n=6)
Hash/64Bytes-12   98.73n ± 1%   105.30n ± 1%   +6.65% (p=0.002 n=6)
Hash/1K-12        567.2n ± 1%    572.8n ± 1%   +0.99% (p=0.002 n=6)
Hash/8K-12        4.062µ ± 1%    4.062µ ± 1%        ~ (p=0.396 n=6)
Hash/1M-12        512.1µ ± 0%    510.6µ ± 1%        ~ (p=0.485 n=6)
Hash/5M-12        2.556m ± 1%    2.564m ± 0%        ~ (p=0.093 n=6)
Hash/10M-12       5.112m ± 0%    5.127m ± 0%        ~ (p=0.093 n=6)
geomean           13.82µ         14.27µ        +3.28%

                │   simd.txt   │               go.txt                │
                │     B/s      │     B/s       vs base               │
Hash/8Bytes-12    120.6Mi ± 1%   104.0Mi ± 1%  -13.81% (p=0.002 n=6)
Hash/64Bytes-12   618.2Mi ± 1%   579.8Mi ± 1%   -6.22% (p=0.002 n=6)
Hash/1K-12        1.682Gi ± 1%   1.665Gi ± 1%   -0.98% (p=0.002 n=6)
Hash/8K-12        1.878Gi ± 1%   1.878Gi ± 1%        ~ (p=0.310 n=6)
Hash/1M-12        1.907Gi ± 0%   1.913Gi ± 1%        ~ (p=0.485 n=6)
Hash/5M-12        1.911Gi ± 1%   1.904Gi ± 0%        ~ (p=0.093 n=6)
Hash/10M-12       1.910Gi ± 0%   1.905Gi ± 0%        ~ (p=0.093 n=6)
geomean           1.066Gi        1.032Gi        -3.18%
```

</details>


### Testing

Compiled and tested on Linux.

### Documentation

https://github.com/syncthing/docs/pull/874
2024-08-10 12:58:20 +01:00
Jakob Borg
61b94b9ea5
lib/db: Drop indexes for outgoing data to force refresh (ref #9496) (#9502)
### Purpose

Resend our indexes since we fixed that index-sending issue.

I made a new thing to only drop the non-local-device index IDs, i.e.,
those for other devices. This means we will see a mismatch and resend
all indexes, but they will not. This is somewhat cleaner as it avoids
resending everything twice when two devices are upgraded, and in any
case, we have no reason to force a resend of incoming indexes here.

### Testing

It happens on my computer...
2024-04-08 11:14:27 +02:00
Jakob Borg
acd767b30b
all: Remove lib/util package (#9049)
Grab-bag packages are nasty, this cleans it up a little by splitting it
into topical packages sempahore, netutil, stringutil, structutil.
2023-08-21 19:44:33 +02:00
Simon Frei
591e4d8af1
gui, lib: Fix tracking deleted locally-changed on encrypted (fixes #7715) (#7726) 2021-11-10 09:46:21 +01:00
greatroar
f96c211198
lib/db: Replace SipHash with hash/maphash (#7962) 2021-09-24 21:26:07 +02:00
greatroar
7fa141ea39
all: Unused args, retvals, assignments (#7926) 2021-09-08 00:11:16 +02:00
Simon Frei
aeca1fb575
lib/db: Check if sequences change when repairing metadata (#7770) 2021-06-17 13:53:39 +02:00
Jakob Borg
ce65aea0ab
lib/db: Use a more concurrent GC (fixes #7722) (#7750)
This changes the GC mechanism so that the first pass (which reads all
FileInfos to populate bloom filters with block & version hashes) can
happen concurrently with normal database operations.

The big gcMut still exists, and we grab it temporarily to block all
other modifications while we set up the bloom filters. We then release
the lock and let other things happen, with those other things also
updating the bloom filters as required. Once the first phase is done we
again grab the gcMut, knowing that we are the sole modifier of the
database, and do the cleanup.

I also removed the final compaction step.
2021-06-07 10:52:06 +02:00
greatroar
95c9561e97
lib/db: Clean up Timer and wait for logging before return in GC (#7720) 2021-05-31 09:50:21 +02:00
Simon Frei
58592e3ef1
lib/db: Add logging for GC (#7707) 2021-05-22 21:36:43 +02:00
Simon Frei
0d054f9b64
lib/model: Don't use empty folder cfg for index sender (fixes #7649) (#7671) 2021-05-15 11:13:39 +02:00
Simon Frei
f30f9c50f8
lib/db: Handle indirection error repairing sequences (fixes #7026) (#7525) 2021-04-05 10:24:16 +02:00
Jakob Borg
6e5514419d
lib/db: Fix some omitted error checks, unused variable (#7489) 2021-03-17 21:41:07 +01:00
Simon Frei
273ee09925
lib/db, lib/model: Allow needing invalid files (fixes #7474) (#7476) 2021-03-15 07:58:01 +01:00
Simon Frei
27a34609a1
all: Failure reporting fixes (#7331) 2021-02-05 11:21:14 +01:00
Simon Frei
070bf3b776
lib/db: Report number of repaired items from checkGlobal (#7329) 2021-02-04 14:42:46 +01:00
Simon Frei
a20a5f61f0
lib/ur: Send unreported failures on shutdown (#7164) 2020-12-22 20:17:14 +01:00
Simon Frei
78bd0341a8
all: Handle errors opening db/creating file-set (ref #5907) (#7150) 2020-12-21 12:59:22 +01:00
Simon Frei
4a787986cd
lib/db: Prevent IndexID creation race (#7211) 2020-12-21 11:32:59 +01:00
Simon Frei
bd0c9913cf
lib/db: Remove index ids when dropping folder (#7200) 2020-12-21 11:10:59 +01:00
Simon Frei
fa40ccece1
lib: Consistently set suture logging (#7202) 2020-12-18 19:44:00 +01:00
Simon Frei
9524b51708
all: Implement suture v4-api (#6947) 2020-11-17 13:19:04 +01:00
Simon Frei
01a7ef3b0f
lib/db: Undo adding user info to panic msgs (ref #7029) (#7040) 2020-10-19 08:40:37 +02:00
Simon Frei
23c935b05a
lib/db: Ignore not found on delete in recalcGlobal (ref #7026) (#7041) 2020-10-19 08:28:53 +02:00
Simon Frei
d828adb648
cmd/stcrashreceiver, lib/db: Improve panic message handling (#7029) 2020-10-08 17:37:45 +02:00
Simon Frei
14ae330eff
lib/db: Improve error handling checking db (ref #7026) (#7027) 2020-10-06 20:14:09 +02:00
Simon Frei
08bebbe59b
lib/db, lib/syncthing: Don't repair DB on upgrade, but on error (fixes #6917) (#6971) 2020-09-10 10:54:41 +02:00
Simon Frei
7b821d2550
lib/db: Add local need check&repair (#6950) 2020-09-04 14:01:46 +02:00
Simon Frei
cc1f6e4d4a
lib/db, lib/model: Cover exec-paths with debug logging (#6918) 2020-08-20 16:11:20 +02:00
Simon Frei
8f5215878b
lib/db: Don't put truncated files (ref #6855, ref #6501) (#6888) 2020-08-18 09:20:12 +02:00
Simon Frei
424d1b1608
lib/db: Commit meta when dropping device (#6862) 2020-07-28 16:46:42 +02:00
Simon Frei
1b9e5c0937
lib/db: Include blocks in db check (ref #6855) (#6861) 2020-07-28 16:25:07 +02:00
Audrius Butkevicius
55147f5901
lib/db: Rework flush hooks (#6838) 2020-07-19 08:55:27 +02:00
greatroar
9f92f8c609
lib/db: Use SipHash to deal with hash collision in GC (#6826)
If the GC finds a key k that it wants to keep, it records that in a
Bloom filter. If a key k' can be removed but its hash collides with k,
it will be kept. Since the old Bloom filter code was completely
deterministic, the next run would encounter the same collision, assuming
k must still be kept.

A randomized hash function that uses all the SHA-256 bits solves this
problem: the second run has a non-zero probability of removing k', as
long as the Bloom filter is not completely full.
2020-07-11 09:36:09 +02:00
Simon Frei
8cf9d91ed4
lib: Print nicely rounded durations (#6756) 2020-06-18 10:55:41 +02:00
Simon Frei
cbe0d2fffc
lib/db: Improve error message on meta inconsistency (#6751) 2020-06-17 10:03:39 +02:00
André Colomb
46536509d7
lib/protocol: Avoid panic in DeviceIDFromBytes (#6714) 2020-06-07 10:31:12 +02:00
Simon Frei
1f8e6c55f6
lib/db: Refactor to use global list by version (fixes #6372) (#6638)
Group the global list of files by version, instead of having one flat list for all devices. This removes lots of duplicate protocol.Vectors.

Co-authored-by: Jakob Borg <jakob@kastelo.net>
2020-05-30 09:50:23 +02:00
Jakob Borg
94beed5c10
lib/db: Add Badger backend (fixes #5910) (#6250) 2020-05-29 13:43:02 +02:00
Audrius Butkevicius
a1c5b44c74
lib/model: Fix rename handling (ref #6650) (#6652) 2020-05-16 14:39:27 +02:00
Simon Frei
974551375e
lib/db: Dont add symlinks to blocks map (fixes #6637) (#6639) 2020-05-13 20:38:21 +02:00
Jakob Borg
531ceb2b0f
Add indirection for large version vectors. (#6376)
This adds indirection of large version vectors in the same manner as we
already to block lists. The effect is the same: less duplicated data in
some situations.

To mitigate the impact for when this indirection
wouldn't be needed I've added an indirection cutoff for both blocks and
the new version vector stuff: we don't do the indirection at all for
small block lists or small version vectors, instead storing it directly
like we used to do. This is faster for small files and small setups.
2020-05-13 14:28:42 +02:00
Audrius Butkevicius
decb967969
all: Reorder sequences for better rename detection (#6574) 2020-05-11 20:15:11 +02:00
Simon Frei
a94951becd
lib/db, lib/model: Keep need stats in metadata (ref #5899) (#6413) 2020-05-11 15:07:06 +02:00
Jakob Borg
c0c18a568c
lib/db: Hold the bloom filter the right way (fixes #6614) (#6617) 2020-05-08 14:18:00 +02:00
greatroar
0e5ba3ca05
lib/db: Upgrade to Blobloom v0.1.1 (#6553)
Now faster and Apache-licensed.
2020-04-20 14:23:36 +02:00
greatroar
44b0f0b456
lib/db: Switch to faster blobloom Bloom filter pkg (#6537) 2020-04-20 09:02:33 +02:00
Jakob Borg
0e67c036bb
lib/db: Make database GC a service, stop on Stop() (#6518)
This makes the GC runner a service that will stop fairly quickly when
told to.

As a bonus, STTRACE=app will print the service tree on the way out,
including any errors they've flagged.
2020-04-12 10:26:57 +02:00