How fuzzy search is implemented for file paths

by Lance Pollard   Last Updated August 27, 2018 00:05 AM

Say I have file paths like this:

my/long/directory/structure/index.js
my/long/directory/structure2/index.js
my/long/directory/structure3/index.js
my/long/directory/structure.../index.js
my/long/directory/structuren/index.js
my/long/directory/index.js
my/long/directory2/index.js
my/long/directory.../index.js
my/long/directoryn/index.js
my/long/index.js
my/index.js
...

I search for d <n> i (e.g. d 2 i or d 3 i), and it gives this:

my/long/irectory/structure/ndex.js
my/long/irectory/ndex.js

The questions are 2:

  1. How it actually matches the strings.
  2. How it returns the results as syntax highlighted.

For (1), there is the data structure and the algorithm. In terms of data structure, I have seen a lot of autocomplete-like functionality recommended to be implemented as a trie. That makes sense for keywords, but I don't see how that would work for a file path. As a start, it seems you could break it into individual directories/filenames:

my
long
directory
structuren
index.js

And have a trie for each level. That would allow at least for exact prefix match of the file path. I don't see how it would work for fuzzy match, as in a d <n> i search. That search stretches over several layers of this multi-trie system. The only way I could seeing it return fuzzy results is by checking every file path from start to end, thus no better than simply doing a regexp match against each file path in its entirety. So I'm wondering what sort of data structure is actually used for fuzzy matching in this case.

For (2), most fuzzy matching implementations I've seen are based on the levenshtein distance, and simply return a yes/no match, not a parse tree. Wondering if there are implementations that return the parse tree, or there are alternative ways to accomplish this. That way you can easily syntax highlight without having to resort to a hack that duplicates some of the work of finding the matching parts of the file text from the frontend.



Related Questions



Search algorithm on sequential data model

Updated July 30, 2015 14:02 PM

Data Structure for "Intuitive" Text Matching

Updated July 20, 2018 13:05 PM