Commit Graph

13 Commits

Author SHA1 Message Date
Vincent Bernat 2ee07ded2b filter: ability to use negative patterns
This is quite similar to gitignore. If a pattern is suffixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with `--exclude-file` to cancel the exclusion of some files.

Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep `*.c` in some directory should not use:

    ~/work
    !~/work/*.c

But:

    ~/work/*
    !~/work/*.c

I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:

    $HOME/**/*
    !$HOME/Documents
    !$HOME/code
    !$HOME/.emacs.d
    !$HOME/games
    # [...]
    node_modules
    *~
    *.o
    *.lo
    *.pyc
    # [...]
    $HOME/code/linux/*
    !$HOME/code/linux/.git
    # [...]

There are some limitations for this change:

 - Patterns are not mixed accross methods: patterns from file are
   handled first and if a file is excluded with this method, it's not
   possible to reinclude it with `--exclude !something`.

 - Patterns starting with `!` are now interpreted as a negative
   pattern. I don't think anyone was relying on that.

 - The whole list of patterns is walked for each match. We may
   optimize later by exiting early if we know no pattern is starting
   with `!`.

Fix #233
2022-03-20 13:33:08 +01:00
Vincent Bernat 13c40d4199 filter: additional tests for filter.List() 2022-03-20 13:33:08 +01:00
Michael Eischer 55bea6e7a6 filter: Fix crash for '**' pattern 2021-05-14 23:50:31 +02:00
Michael Eischer 54a124de3b filter: explicitly test separate ListWithChild function 2020-10-07 20:47:52 +02:00
Michael Eischer bcc3bddcf4 filter: only check whether a child path could match when necessary
When checking excludes there is no need to test whether a child path
could also match the pattern, as it is by definition excluded.
Previously childMayMatch was calculated but then discarded. For simple
absolute paths this can account for half the time spent for checking
pattern matches.

name                          old time/op    new time/op    delta
FilterPatterns/Relative-4       23.3ms ± 9%    21.7ms ± 6%   -6.68%  (p=0.004 n=10+10)
FilterPatterns/Absolute-4       13.9ms ± 7%    10.0ms ± 5%  -27.61%  (p=0.000 n=10+10)
FilterPatterns/Wildcard-4       51.4ms ± 7%    47.0ms ± 7%   -8.51%  (p=0.001 n=9+9)
FilterPatterns/ManyNoMatch-4     551ms ± 9%     190ms ± 1%  -65.41%  (p=0.000 n=10+8)

name                          old alloc/op   new alloc/op   delta
FilterPatterns/Relative-4       3.57MB ± 0%    3.57MB ± 0%     ~     (p=0.665 n=10+9)
FilterPatterns/Absolute-4       3.57MB ± 0%    3.57MB ± 0%     ~     (p=0.480 n=9+10)
FilterPatterns/Wildcard-4       14.3MB ± 0%    14.3MB ± 0%     ~     (p=0.431 n=9+10)
FilterPatterns/ManyNoMatch-4    3.57MB ± 0%    3.57MB ± 0%     ~     (all equal)

name                          old allocs/op  new allocs/op  delta
FilterPatterns/Relative-4        22.2k ± 0%     22.2k ± 0%     ~     (all equal)
FilterPatterns/Absolute-4        22.2k ± 0%     22.2k ± 0%     ~     (all equal)
FilterPatterns/Wildcard-4        88.7k ± 0%     88.7k ± 0%     ~     (all equal)
FilterPatterns/ManyNoMatch-4     22.2k ± 0%     22.2k ± 0%     ~     (all equal)
2020-10-07 20:47:52 +02:00
Michael Eischer 375c2a56de filter: Parse filter patterns only once
name                          old time/op    new time/op    delta
FilterPatterns/Relative-4       30.3ms ±10%    23.6ms ±20%  -22.12%  (p=0.000 n=10+10)
FilterPatterns/Absolute-4       49.0ms ± 3%    32.3ms ± 8%  -33.94%  (p=0.000 n=8+10)
FilterPatterns/Wildcard-4        345ms ± 9%     334ms ±17%     ~     (p=0.315 n=10+10)
FilterPatterns/ManyNoMatch-4     3.93s ± 2%     0.71s ± 7%  -81.98%  (p=0.000 n=9+10)

name                          old alloc/op   new alloc/op   delta
FilterPatterns/Relative-4       4.63MB ± 0%    3.57MB ± 0%  -22.98%  (p=0.000 n=9+9)
FilterPatterns/Absolute-4       8.54MB ± 0%    3.57MB ± 0%  -58.20%  (p=0.000 n=10+10)
FilterPatterns/Wildcard-4        146MB ± 0%     141MB ± 0%   -2.93%  (p=0.000 n=9+9)
FilterPatterns/ManyNoMatch-4     907MB ± 0%       4MB ± 0%  -99.61%  (p=0.000 n=9+9)

name                          old allocs/op  new allocs/op  delta
FilterPatterns/Relative-4        66.6k ± 0%     22.2k ± 0%  -66.67%  (p=0.000 n=10+10)
FilterPatterns/Absolute-4        88.7k ± 0%     22.2k ± 0%  -75.00%  (p=0.000 n=10+10)
FilterPatterns/Wildcard-4        1.70M ± 0%     1.63M ± 0%   -3.92%  (p=0.000 n=10+10)
FilterPatterns/ManyNoMatch-4     4.46M ± 0%     0.02M ± 0%  -99.50%  (p=0.000 n=10+10)
2020-10-07 20:47:27 +02:00
Michael Eischer e73c281142 filter: Benchmark absolute paths, wildcards and long filter lists 2020-10-07 17:54:36 +02:00
Michael Eischer fdd3b14db3 filter: test some corner cases 2020-10-07 17:09:44 +02:00
Alexander Neumann 14aead94b3 filter: Allow double wildcard in ChildMatch 2018-06-09 23:18:13 +02:00
Michael Pratt 92eb1cbffd filter: document recursive wildcards
Match/ChildMatch accept a ** pattern which is not noted in the doc
string, nor do any of the docs or tests specify whether the match is
greedy (i.e., can 'foo/**/bar' match paths with additional intermediate
bar directories?).

Add a note to the doc string and add test cases for greedy matches.
2017-09-04 14:38:48 -07:00
Loic Nageleisen 4a36993c19 Smarter filter when children won't match
This improves restore performance by several orders of magniture by not
going through the whole tree recursively when we can anticipate that no
match will ever occur.
2017-08-16 15:25:02 +02:00
Alexander Neumann 6caeff2408 Run goimports 2017-07-23 14:21:03 +02:00
Alexander Neumann 83d1a46526 Moves files 2017-07-23 14:19:13 +02:00