In the best case (all ascii), this reduces the memory footprint by 60%
and the response time by 15% to 20%. In the worst case (every line has
non-ascii characters), 3 to 4% overhead is observed.
Based on the patch by Matt Westcott (@mjwestcott).
But with a more conservative approach:
- Does not use linearly increasing penalties; It is agreed upon that we
should prefer matching characters at the beginnings of the words, but
it's not always clear that the relevance is inversely proportional to
the distance from the beginning.
- The approach here is more conservative in that the bonus is never
large enough to override the matchlen, so it can be thought of as the
first implicit tiebreak criterion.
- One may argue the change breaks the contract of --tiebreak, but the
judgement depends on the definition of "tie".
This change improves sort ordering for aligned tabular input.
Given the following input:
apple juice 100
apple pie 200
fzf --nth=2 will now prefer the one with pie. Before this change fzf
compared "juice " and "pie ", both of which have the same length.
I profiled fzf and it turned out that it was spending significant amount
of time repeatedly converting character arrays into Unicode codepoints.
This commit greatly improves search performance after the initial scan
by memoizing the converted results.
This commit also addresses the problem of unbounded memory usage of fzf.
fzf is a short-lived process that usually processes small input, so it
was implemented to cache the intermediate results very aggressively with
no notion of cache expiration/eviction. I still think a proper
implementation of caching scheme is definitely an overkill. Instead this
commit introduces limits to the maximum size (or minimum selectivity) of
the intermediate results that can be cached.