ruby - Why do Postgres full text search and Elasticsearch rank results differently?


I am wondering if anyone with experience implementing full text search can shed light on some strange results I get when comparing Postgres's full text search with Elasticsearch.

I use a pair of Rails apps to test them, each with the same model (but different gems: 'textacular' for the PG test, 'searchkick' for the ES test) and the same test data:

    # seeds.rb

    # Build the attributes for one post; only the body varies.
    def make_post(body)
      {
        title: 'a post fruits',
        body: body,
        num_likes: 0
      }
    end

    Post.destroy_all

    Post.create([
      make_post('i apples.'),
      make_post('i bananas.'),
      make_post('i apples , bananas.'),
      make_post('i oranges.'),
      make_post('i like.')
    ])

But when I run a bunch of searches on them, the results sometimes make more sense in Postgres, sometimes make more sense in Elasticsearch, and the two contradict each other in behavior. In the following results, I list the top 2 posts returned for each search term, or 1 post or 0 if that is all that was returned:

search for:

'apples':

pg: 1. 'i apples.' 2. 'i apples , bananas.'

es: 1. 'i apples , bananas.' 2. 'i apples.'

'bananas':

pg: 1. 'i bananas.' 2. 'i apples , bananas.'

es: 1. 'i bananas.' 2. 'i apples , bananas.'

'apples and':

pg: 1. 'i apples.' 2. 'i apples , bananas.'

es: 1. 'i apples , bananas.'

'apples , bananas':

pg: 1. 'i apples , bananas.'

es: 1. 'i apples , bananas.'

'i apples.':

pg: 1. 'i apples.' 2. 'i apples , bananas.'

es: 1. 'i apples , bananas.' 2. 'i apples.'

'app':

pg: no results

es: 1. 'i apples , bananas.' 2. 'i apples.'

'appl':

pg: 1. 'i apples.' 2. 'i apples , bananas.'

es: 1. 'i apples , bananas.' 2. 'i apples.'

I have to admit, I used the default settings and did no tuning or custom query syntax (AND vs. OR, etc.).

You are getting weird results from Elasticsearch because its statistics are computed per shard, not across the entire index. That is normally fine, because document collections are usually large, but when there are only a few documents in a shard the statistics don't make a lot of sense. In this case I think the problematic statistic is avgFieldLength, which contributes to the tfNorm part of the score. Try creating a new index with 1 shard:

    PUT /testindex
    {
      "settings": {
        "index": {
          "number_of_shards": 1
        }
      }
    }

    POST /testindex/doc/1
    {
      "body": "i apples."
    }

    POST /testindex/doc/2
    {
      "body": "i apples , bananas."
    }

Then query it:

    POST /testindex/_search
    {
      "query": {
        "query_string": {
          "query": "apples"
        }
      }
    }

Then you should see this ranking:

  1. i apples.
  2. i apples , bananas.
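
To illustrate why shard-local statistics can flip the order, here is a rough Ruby sketch of a BM25-style score computed separately per shard. It is not Elasticsearch's implementation: the constants `k1 = 1.2` and `b = 0.75`, the `\w+` tokenizer, and above all the way the five posts are split across shards are assumptions made for illustration. Lucene also stores field lengths in a lossy encoding, which this ignores.

```ruby
# Simplified BM25 for a single-term query. Statistics (document count,
# document frequency, average field length) come from ONE shard only.
K1 = 1.2   # assumed term-frequency saturation parameter
B  = 0.75  # assumed length-normalization parameter

# Naive tokenizer (an assumption; real analyzers differ).
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Score every matching document in a shard using that shard's statistics.
def bm25_scores(shard_docs, term)
  lengths   = shard_docs.map { |doc| tokenize(doc).length }
  avg_len   = lengths.sum.to_f / lengths.length
  doc_count = shard_docs.length
  doc_freq  = shard_docs.count { |doc| tokenize(doc).include?(term) }
  idf       = Math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

  shard_docs.each_with_object({}) do |doc, scores|
    tokens = tokenize(doc)
    tf = tokens.count(term)
    next if tf.zero?

    tf_norm = (tf * (K1 + 1)) /
              (tf + K1 * (1 - B + B * tokens.length / avg_len))
    scores[doc] = idf * tf_norm
  end
end

# Merge per-shard scores and sort best-first, as a coordinating node would.
def ranked(shards, term)
  shards.map { |docs| bm25_scores(docs, term) }
        .reduce({}, :merge)
        .sort_by { |_doc, score| -score }
        .map(&:first)
end

ALL_DOCS = ['i apples.', 'i bananas.', 'i apples , bananas.',
            'i oranges.', 'i like.'].freeze

# One shard: every document shares the same statistics.
ONE_SHARD = [ALL_DOCS].freeze

# Hypothetical split across three shards (real routing hashes document IDs).
SPLIT_SHARDS = [
  ['i apples.'],
  ['i apples , bananas.', 'i oranges.', 'i like.'],
  ['i bananas.']
].freeze

puts ranked(ONE_SHARD, 'apples').inspect
puts ranked(SPLIT_SHARDS, 'apples').inspect
```

With everything in one shard the shorter post scores higher. In the hypothetical split, the longer post sits in a shard where "apples" looks rarer, so its inverse document frequency is larger and it overtakes the shorter post.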

In case you want to figure out what's going on with the ranking, you can use explain:

    POST /testindex/_search
    {
      "explain": true,
      "query": {
        "query_string": {
          "query": "apples"
        }
      }
    }
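
If you would rather run the explain query from Ruby than from a console, something like the following should work using only the standard library. The helper name `explain_request` and the host `http://localhost:9200` are assumptions for illustration. The network call is left commented out, since it needs a running Elasticsearch node.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a search request with "explain": true.
# `explain_request` is a hypothetical helper, not part of any gem.
def explain_request(index, query, host: 'http://localhost:9200')
  uri = URI("#{host}/#{index}/_search")
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  req.body = JSON.generate(
    explain: true,
    query: { query_string: { query: query } }
  )
  [uri, req]
end

uri, req = explain_request('testindex', 'apples')
puts req.body

# To actually send it (assumes Elasticsearch is listening on the host above):
# res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
# JSON.parse(res.body)['hits']['hits'].each do |hit|
#   puts JSON.pretty_generate(hit['_explanation'])
# end
```

Each hit's `_explanation` breaks the score into its components, which is where shard-level values such as avgFieldLength show up.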

All that being said, you should not expect the Postgres search ranking to match the Elasticsearch ranking. Elasticsearch uses a normalized TF-IDF score, while Postgres does not consider document frequency and, by default, does not consider document length either. See this question for more information: Does PostgreSQL use tf-idf?

