CopyPastor


Score: 0.8176968693733215; Reported for: String similarity

Possible Plagiarism

Plagiarized on 2022-04-25
by ESCoder

Original Post

Original - Posted on 2020-03-16
by jaspreet chahal



            
Present in both answers; Present only in the new answer; Present only in the old answer;

If you are using Elasticsearch's default dynamic mapping, you can use a [term query][1] on the `text.keyword` field.
The term query returns documents that contain the exact search term; on a `keyword` field, the whole field value must equal it.
Adding the `term` query to the `should` clause of a `bool` query raises the score of exactly matching documents relative to the other documents.

```
{
  "query": {
    "bool": {
      "should": [
        { "match": { "text": "tonight" } },
        { "term": { "text.keyword": "tonight" } }
      ]
    }
  }
}
```

Search results will be:
```
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 5, "relation": "eq" },
    "max_score": 1.5025805,
    "hits": [
      { "_index": "stof", "_id": "1", "_score": 1.5025805, "_source": { "id": 1, "text": "tonight" } },
      { "_index": "stof", "_id": "3", "_score": 0.13236837, "_source": { "id": 3, "text": "tonight tonight tonight" } },
      { "_index": "stof", "_id": "2", "_score": 0.12794474, "_source": { "id": 2, "text": "tonight tonight" } },
      { "_index": "stof", "_id": "5", "_score": 0.08185939, "_source": { "id": 5, "text": "tonight and you" } },
      { "_index": "stof", "_id": "4", "_score": 0.07130444, "_source": { "id": 4, "text": "tonight and something else" } }
    ]
  }
}
```
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
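As a rough illustration, the Python sketch below builds the same request body and mimics which of the sample documents the `term` clause on `text.keyword` can match. The helper names `exact_boost_query` and `keyword_term_hits` are hypothetical, and the sketch does not reproduce Elasticsearch scoring; it only shows why a keyword sub-field, which indexes the whole value as one unanalyzed token, is satisfied by document 1 alone.

```python
import json


def exact_boost_query(field: str, value: str) -> dict:
    """Hypothetical helper: full-text `match` on `field`, plus a `term` clause
    on the `<field>.keyword` sub-field that only whole-value matches satisfy."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: value}},
                    {"term": {f"{field}.keyword": value}},
                ]
            }
        }
    }


def keyword_term_hits(docs: list[dict], field: str, value: str) -> list[int]:
    """Hypothetical helper mimicking the `term` clause on a keyword field:
    the whole string is a single token, so only exactly equal values match."""
    return [doc["id"] for doc in docs if doc[field] == value]


if __name__ == "__main__":
    # The five sample documents from the search results above.
    docs = [
        {"id": 1, "text": "tonight"},
        {"id": 2, "text": "tonight tonight"},
        {"id": 3, "text": "tonight tonight tonight"},
        {"id": 4, "text": "tonight and something else"},
        {"id": 5, "text": "tonight and you"},
    ]
    print(json.dumps(exact_boost_query("text", "tonight"), indent=2))
    print(keyword_term_hits(docs, "text", "tonight"))  # [1]
```

Every document still matches the `match` clause, so none are dropped; the extra `term` clause only adds score for document 1, which is why it ranks far above the rest.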
There is no direct way to do this in Elasticsearch. The closest option is the [multi term vectors API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html).
Query (the `ids` array lists the `_id` of every document to inspect):
````
POST /index51/_mtermvectors
{
  "ids": ["1", "2"],
  "parameters": {
    "fields": ["text"],
    "term_statistics": true
  }
}
````
It returns every requested document along with statistics for each term in the field.
Result:
````
{
  "docs": [
    {
      "_index": "index51",
      "_type": "_doc",
      "_id": "1",
      "_version": 2,
      "found": true,
      "took": 3,
      "term_vectors": {
        "text": {
          "field_statistics": { "sum_doc_freq": 7, "doc_count": 3, "sum_ttf": 7 },
          "terms": {
            "another": {
              "doc_freq": 2, "ttf": 2, "term_freq": 1,
              "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 7 } ]
            },
            "test": {
              "doc_freq": 3, "ttf": 3, "term_freq": 1,
              "tokens": [ { "position": 2, "start_offset": 16, "end_offset": 20 } ]
            },
            "twitter": {
              "doc_freq": 2, "ttf": 2, "term_freq": 1,
              "tokens": [ { "position": 1, "start_offset": 8, "end_offset": 15 } ]
            }
          }
        }
      }
    },
    {
      "_index": "index51",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 2,
      "term_vectors": {
        "text": {
          "field_statistics": { "sum_doc_freq": 7, "doc_count": 3, "sum_ttf": 7 },
          "terms": {
            "test": {
              "doc_freq": 3, "ttf": 3, "term_freq": 1,
              "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 4 } ]
            }
          }
        }
      }
    }
  ]
}
````
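To turn a multi term vectors response into per-document word counts, a minimal Python sketch could walk the `docs` array and read each term's `term_freq`. `term_freqs_by_doc` is a hypothetical helper, and the sample input is a trimmed copy of the result above containing only the keys the helper reads.

```python
def term_freqs_by_doc(response: dict, field: str) -> dict[str, dict[str, int]]:
    """Hypothetical helper: map each document _id to {term: term_freq}
    for `field`, given a parsed _mtermvectors response body."""
    result: dict[str, dict[str, int]] = {}
    for doc in response.get("docs", []):
        if not doc.get("found", False):
            continue  # the requested _id does not exist in the index
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        result[doc["_id"]] = {term: stats["term_freq"] for term, stats in terms.items()}
    return result


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    response = {
        "docs": [
            {"_id": "1", "found": True, "term_vectors": {"text": {"terms": {
                "another": {"term_freq": 1},
                "test": {"term_freq": 1},
                "twitter": {"term_freq": 1},
            }}}},
            {"_id": "2", "found": True, "term_vectors": {"text": {"terms": {
                "test": {"term_freq": 1},
            }}}},
        ]
    }
    print(term_freqs_by_doc(response, "text"))
    # {'1': {'another': 1, 'test': 1, 'twitter': 1}, '2': {'test': 1}}
```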
The IDs of all documents can be fetched with the [scroll API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll).
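A minimal sketch of that scroll loop in Python, assuming a hypothetical `send(method, path, body)` callable that performs the HTTP request and returns the parsed JSON (swap in whichever HTTP or Elasticsearch client you use). It opens a scroll with `POST /<index>/_search?scroll=1m`, keeps calling `POST /_search/scroll` until a page comes back empty, clears the scroll context, and the equally hypothetical `mtermvectors_body` packs the collected IDs into the `_mtermvectors` request body shown earlier.

```python
from typing import Callable, Iterator

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_ids(send: Send, index: str, page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[str]:
    """Yield the _id of every document in `index` via the scroll API."""
    page = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        # No _source needed, only IDs; sorting by _doc is the cheapest scroll order.
        {"size": page_size, "_source": False, "sort": ["_doc"],
         "query": {"match_all": {}}},
    )
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context held open on the cluster.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def mtermvectors_body(ids: list[str], fields: list[str]) -> dict:
    """Hypothetical helper: body for POST /<index>/_mtermvectors."""
    return {"ids": ids, "parameters": {"fields": fields, "term_statistics": True}}
```

For example, `mtermvectors_body(list(scroll_ids(send, "index51")), ["text"])` produces the same shape of body as the query above. On a large index, splitting the IDs into batches keeps each `_mtermvectors` request small.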
