algorithm - Normalizing the edit distance


I have a question: can we normalize the Levenshtein edit distance by dividing the edit distance value by the length of the two strings? I am asking because, if we compare two strings of unequal length, the difference between their lengths is counted as well. E.g.: ed('has a', 'has ball') = 4 and ed('has a', 'has ball round') = 15. If we increase the length of a string, the edit distance increases even though the strings are similar. Therefore, I cannot set a value for what a good edit distance should be.

Yes, normalizing the edit distance is one way to put the differences between strings on a single scale from "identical" to "nothing in common".

A few things to consider:

  1. Whether or not the normalized distance is a better measure of similarity between strings depends on the application. If the question is "how likely is this word to be a misspelling of that word?", normalization is the way to go. If it's "how much has this document changed since the last version?", the raw edit distance may be a better option.
  2. If you want the result to be in the range [0, 1], you need to divide the distance by the maximum possible distance between two strings of the given lengths. That is, length(str1)+length(str2) for the LCS distance, and max(length(str1), length(str2)) for the Levenshtein distance.
  3. The normalized distance is not a metric, as it violates the triangle inequality.
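
To make points 2 and 3 concrete, here is a minimal sketch in Python. The function names `levenshtein` and `normalized_levenshtein` are my own, not from any library, and the strings "ab", "abc" and "bc" are just one counterexample I picked to show the triangle inequality failing after normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Divide by max(len(a), len(b)), the largest possible Levenshtein distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


# Raw distances: 1, 1 and 2, so the raw triangle inequality holds (2 <= 1 + 1).
d_xy = normalized_levenshtein("ab", "abc")  # 1 / 3
d_yz = normalized_levenshtein("abc", "bc")  # 1 / 3
d_xz = normalized_levenshtein("ab", "bc")   # 2 / 2

# After normalization the direct path is longer than the detour through "abc".
print(d_xz > d_xy + d_yz)  # True
```

So the normalized value is fine as a similarity score with a fixed threshold, but it should not be fed to algorithms that assume a true metric.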
