ElasticSearch Analyzer and Tokenizer for Emails


I could not find a perfect solution for the following situation, either on Google or in the Elasticsearch documentation, so I hope to find help here.

Suppose there are 5 documents with email addresses stored under the field "email":

1. {"email": "john.doe@gmail.com"}
2. {"email": "john.doe@gmail.com, john.doe@outlook.com"}
3. {"email": "hello-john.doe@outlook.com"}
4. {"email": "john.doe@outlook.com"}
5. {"email": "john@yahoo.com"}

I want to fulfill the following search scenarios:

[search -> receive]

"john.doe@gmail.com" -> 1,2

"john.doe@outlook.com" -> 2,4

"john@yahoo.com" -> 5

"john.doe" -> 1,2,3,4

"john" -> 1,2,3,4,5

"gmail.com" -> 1,2

"outlook.com" -> 2,3,4

The first 3 matches are a must; for the rest, the more precise the better. I have tried different combinations of index/search analyzers, tokenizers, and filters. I have also tried working with the conditions of match queries, but did not find an ideal solution. Any thoughts are welcome, and there is no limit on the mappings, analyzers, or kind of query to use. Thanks.
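To make the requirement easy to test against any candidate setup, the documents and scenarios above can be written down as a small table. This is only a sketch: `failing_scenarios` and `substring_search` are hypothetical helper names, and the plain substring search stands in for a real Elasticsearch query just to show which scenario is the hard one.

```python
# Documents from the question, keyed by their number in the list above.
DOCS = {
    1: "john.doe@gmail.com",
    2: "john.doe@gmail.com, john.doe@outlook.com",
    3: "hello-john.doe@outlook.com",
    4: "john.doe@outlook.com",
    5: "john@yahoo.com",
}

# [search -> receive] scenarios, in the same order as in the question.
EXPECTED = {
    "john.doe@gmail.com": {1, 2},
    "john.doe@outlook.com": {2, 4},
    "john@yahoo.com": {5},
    "john.doe": {1, 2, 3, 4},
    "john": {1, 2, 3, 4, 5},
    "gmail.com": {1, 2},
    "outlook.com": {2, 3, 4},
}


def failing_scenarios(search):
    """Return the queries for which search(query) differs from EXPECTED."""
    return [q for q, want in EXPECTED.items() if set(search(q)) != want]


def substring_search(query):
    """Naive baseline (not Elasticsearch): plain substring containment."""
    return {doc_id for doc_id, value in DOCS.items() if query in value}


print(failing_scenarios(substring_search))  # -> ['john.doe@outlook.com']
```

The naive baseline fails only on "john.doe@outlook.com", because document 3 ("hello-john.doe@outlook.com") contains that string but must not be returned. Any real solution has to treat a full address as a whole token rather than as a substring.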

Mapping:

    PUT /test
    {
      "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": 1,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
      "mappings": {
        "emails": {
          "properties": {
            "email": {
              "type": "string",
              "analyzer": "email"
            }
          }
        }
      }
    }
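As a rough way to reason about which terms this analyzer would put in the index, the filter chain can be approximated in plain Python. This is a sketch under several assumptions, not the actual Lucene implementation: the `uax_url_email` tokenizer is assumed to emit each address as a single token (emulated here by splitting on commas and whitespace), `[^\W\d_]+` is used as a stand-in for `\p{L}+` because Python's `re` module has no `\p{L}`, and the function names `tokenize` and `analyze` are made up for illustration. The `_analyze` API on a real cluster remains the authoritative check.

```python
import re

# Capture patterns from the "email" filter in the mapping.
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",  # approximation of (\p{L}+): runs of letters
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def tokenize(value):
    """Assumed tokenizer behaviour: one token per email address."""
    return [t for t in re.split(r"[,\s]+", value) if t]


def analyze(value):
    """Approximate pattern_capture (keeping the original), lowercase, unique."""
    tokens = []
    for token in tokenize(value):
        emitted = [token]  # preserve_original keeps the whole address
        for pattern in PATTERNS:
            emitted.extend(m.group(1) for m in re.finditer(pattern, token))
        for term in emitted:
            term = term.lower()       # "lowercase" filter
            if term not in tokens:    # "unique" filter
                tokens.append(term)
    return tokens


print(analyze("john@yahoo.com"))
# -> ['john@yahoo.com', 'john', 'yahoo.com', 'yahoo', 'com']
```

Under these assumptions, "hello-john.doe@outlook.com" yields "john.doe" and "outlook.com" as terms but not "john.doe@outlook.com", which is what the second and fourth scenarios need. It also yields the very generic term "com", which every document shares.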

Test data:

    POST /test/emails/_bulk
    {"index":{"_id":"1"}}
    {"email": "john.doe@gmail.com"}
    {"index":{"_id":"2"}}
    {"email": "john.doe@gmail.com, john.doe@outlook.com"}
    {"index":{"_id":"3"}}
    {"email": "hello-john.doe@outlook.com"}
    {"index":{"_id":"4"}}
    {"email": "john.doe@outlook.com"}
    {"index":{"_id":"5"}}
    {"email": "john@yahoo.com"}

Query used:

    GET /test/emails/_search
    {
      "query": {
        "term": {
          "email": "john.doe@gmail.com"
        }
      }
    }
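For context on why the choice between term and match matters here: a term query looks up the query string verbatim in the index, while a match query first analyzes the query string (with the field's analyzer unless a separate search analyzer is configured) and by default combines the resulting terms with OR. The toy model below illustrates the difference. The token sets are hypothetical, written by hand for documents 1 and 5 only, and `term_query` and `match_query` are illustrative names, not Elasticsearch APIs.

```python
# Hypothetical indexed terms for two of the documents (illustration only).
INDEX = {
    "1": {"john.doe@gmail.com", "john.doe", "gmail.com",
          "john", "doe", "gmail", "com"},
    "5": {"john@yahoo.com", "john", "yahoo.com", "yahoo", "com"},
}


def term_query(value):
    """Exact lookup: the value is compared verbatim with indexed terms."""
    return sorted(doc_id for doc_id, terms in INDEX.items() if value in terms)


def match_query(query_terms):
    """OR semantics: a document matches if it shares any analyzed term."""
    wanted = set(query_terms)
    return sorted(doc_id for doc_id, terms in INDEX.items() if terms & wanted)


print(term_query("john.doe@gmail.com"))  # -> ['1']

# If the query string were analyzed into the same terms as document 1,
# document 5 would also match through the shared terms "john" and "com".
print(match_query(INDEX["1"]))           # -> ['1', '5']
```

In this model the loss of precision comes from analyzing the query, not from the indexed terms themselves.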
