| author | 2019-10-18 10:39:27 +0700 | |
|---|---|---|
| committer | 2019-10-18 10:39:27 +0700 | |
| commit | 8c2871f89846f2c34f52061323dc7503855266ea (patch) | |
| tree | 34ef3457570340c1dcc3d0aad7855ae029bcba79 /tests/helpers.py | |
| parent | Fix rule alias. (#537) (diff) | |
Make it easier for user to search for tags
#### Closes #231
Applies the `Needles and Haystack` algorithm to match a searched tag name against the existing tags.

This only applies when the searched tag_name is longer than 3 characters; a tag matches when at least 80% of the letters in the search are found in it, from left to right.
There are 3 levels of checking, stopping at the first level that finds a match:
- Exact name (case insensitive), an O(1) lookup in a dictionary `Dict[str, Tag]`
- All tags that match 100% via the algorithm
- All tags that match >= 80%
If there is more than one hit, the hits are shown as suggestions.
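
The three levels above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `match_percent` and `find_tags` are hypothetical names, tags are assumed to be stored in a plain dict keyed by lowercase name, and the matcher is a simplified variant that stops at the first missing letter.

```python
from typing import Dict, List


def match_percent(search: str, target: str) -> float:
    """Percentage of letters in `search` found in `target`, left to right.

    Simplified, hypothetical matcher: stops counting at the first miss.
    """
    needle = search.lower().replace(' ', '')
    haystack = target.lower().replace(' ', '')
    if not needle:
        return 0.0
    found = 0
    position = 0
    for letter in needle:
        hit = haystack.find(letter, position)
        if hit == -1:
            break
        found += 1
        position = hit + 1  # continue searching after the matched letter
    return found / len(needle) * 100


def find_tags(query: str, tags: Dict[str, str], threshold: float = 80.0) -> List[str]:
    """Return matching tag names, stopping at the first level with a hit."""
    # Level 1: exact, case-insensitive dictionary lookup (keys assumed lowercase).
    key = query.lower()
    if key in tags:
        return [key]
    # Fuzzy matching only applies to searches longer than 3 characters.
    if len(query) <= 3:
        return []
    scores = {name: match_percent(query, name) for name in tags}
    # Level 2: every tag with a 100% match.
    full = [name for name, score in scores.items() if score == 100]
    if full:
        return full
    # Level 3: every tag at or above the threshold (80% by default).
    return [name for name, score in scores.items() if score >= threshold]
```

When the returned list holds more than one name, a caller would present the names as suggestions instead of picking one.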

To avoid calling the API repeatedly, I've implemented a cache that only refreshes itself when more than 5 minutes have passed since the last API call that fetched all tags.
Editing, adding, or deleting tags also modifies the cache directly.
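
A cache with this behaviour might look like the sketch below. It is only an illustration under assumptions: `TagCache`, `fetch_all`, `max_age`, and `clock` are hypothetical names, and the real implementation may store and refresh tags differently.

```python
import time
from typing import Callable, Dict, Optional


class TagCache:
    """Hypothetical tag cache that refetches only after `max_age` seconds."""

    def __init__(
        self,
        fetch_all: Callable[[], Dict[str, str]],
        max_age: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_all = fetch_all  # stands in for the API call that gets all tags
        self._max_age = max_age
        self._clock = clock
        self._tags: Dict[str, str] = {}
        self._last_fetch: Optional[float] = None

    def get_all(self) -> Dict[str, str]:
        """Return cached tags, refetching when the cache is empty or stale."""
        now = self._clock()
        if self._last_fetch is None or now - self._last_fetch > self._max_age:
            fetched = self._fetch_all()
            self._tags = {name.lower(): content for name, content in fetched.items()}
            self._last_fetch = now
        return self._tags

    def set(self, name: str, content: str) -> None:
        """Mirror an add or edit in the cache without calling the API."""
        self._tags[name.lower()] = content

    def delete(self, name: str) -> None:
        """Mirror a delete in the cache without calling the API."""
        self._tags.pop(name.lower(), None)
```

Passing the clock in as a parameter is a design choice made here so the expiry logic can be exercised without waiting five minutes.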
##### What about other solutions like fuzzywuzzy?
fuzzywuzzy was considered, but in testing it gave much lower scores than expected:
Code used to test:
```py
from fuzzywuzzy import fuzz


def _fuzzy_search(search: str, target: str) -> tuple:
    """Return (needle score, fuzz.ratio, fuzz.partial_ratio) for comparison."""
    found = 0
    index = 0
    # Compare case-insensitively and ignore spaces.
    _search = search.lower().replace(' ', '')
    _target = target.lower().replace(' ', '')
    for letter in _search:
        # Look for each letter from the position of the previous match.
        index = _target.find(letter, index)
        if index == -1:
            break
        found += index > 0
    # return found / len(_search) * 100
    return (
        found / len(_search) * 100,
        fuzz.ratio(search, target),
        fuzz.partial_ratio(search, target)
    )


tests = (
    'this-is-gonna-be-fun',
    'this-too-will-be-fun'
)
for test in tests:
    print(test, '->', _fuzzy_search('this too fun', test))
```
Result from test:
```py
this-is-gonna-be-fun -> (30.0, 50, 50)
this-too-will-be-fun -> (90.0, 62, 58)
```