When --with-nth is used, fzf used to preprocess each line and store the
result as rune array, which was wasteful if the line only contains ascii
characters.
Make sure to consistently calculate tiebreak scores based on the
original line.
This change may not be preferable if you filter aligned tabular input on
a subset of columns using --nth. However, if we calculate length
tiebreak only on the matched components instead of the entire line, the
result can be very confusing when multiple --nth components are
specified, so let's keep it simple and consistent.
Close#926
In the best case (all ascii), this reduces the memory footprint by 60%
and the response time by 15% to 20%. In the worst case (every line has
non-ascii characters), 3 to 4% overhead is observed.
This change improves sort ordering for aligned tabular input.
Given the following input:
apple juice 100
apple pie 200
fzf --nth=2 will now prefer the one with pie. Before this change fzf
compared "juice " and "pie ", both of which have the same length.
I profiled fzf and it turned out that it was spending significant amount
of time repeatedly converting character arrays into Unicode codepoints.
This commit greatly improves search performance after the initial scan
by memoizing the converted results.
This commit also addresses the problem of unbounded memory usage of fzf.
fzf is a short-lived process that usually processes small input, so it
was implemented to cache the intermediate results very aggressively with
no notion of cache expiration/eviction. I still think a proper
implementation of caching scheme is definitely an overkill. Instead this
commit introduces limits to the maximum size (or minimum selectivity) of
the intermediate results that can be cached.