Jul 29, 2020 at 01:26 PM

Fuzzy search typo: O instead of 0


Dear readers,

I'm currently trying to find duplicate invoices in the system based on the reference number (field xblnr).

The following dataset is used:
BELNR XBLNR
88669911 1230678
88669923 123O678
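
For anyone who wants to reproduce this, the test table can be set up roughly as follows. This is only a minimal sketch: the column-store table type and the NVARCHAR lengths are assumptions for illustration, not the definitions from the real system.

```sql
-- Assumed definition of the test table; column types and lengths are guesses.
CREATE COLUMN TABLE "FUZZYTEST" (
    BELNR NVARCHAR(10),  -- document number
    XBLNR NVARCHAR(16)   -- reference number
);

-- The two rows of the dataset shown above.
INSERT INTO "FUZZYTEST" VALUES ('88669911', '1230678');  -- digit zero in 4th position
INSERT INTO "FUZZYTEST" VALUES ('88669923', '123O678');  -- capital letter O in 4th position
```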

To find the duplicates based on field XBLNR, I'm using the following SQL:
select belnr, xblnr, score() as score
from "FUZZYTEST"
where contains (xblnr, '1230678', fuzzy(0.70))

This gives me the following output:
BELNR XBLNR SCORE
88669911 1230678 1
88669923 123O678 0.8911111

I know that the second record has a typo: the letter "O" was entered instead of the digit 0 (zero). In my opinion, the score of 0.8911111 is much too low. It is really important that the fuzzy search delivers a higher score for this kind of typo. I tried many things, such as adding fuzzy search parameters, but nothing worked. Does anyone have an idea how to get a higher score when such a typo has been made?