CopyPastor

Detecting plagiarism made easy.

Score: 1; Reported for: Exact paragraph match

Possible Plagiarism

Plagiarized on 2019-09-13
by Harshal Parekh

Original Post

Original - Posted on 2017-09-09
by nitheism



            
Present in both answers; Present only in the new answer; Present only in the old answer;

You want to look into the [Phrases][1] class in gensim for this. A sample would look something like this:
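```python
from gensim.models.phrases import Phrases, Phraser
from gensim.models.word2vec import Text8Corpus
from gensim.test.utils import datapath

# read the test corpus that ships with gensim
sentences = Text8Corpus(datapath('testcorpus.txt'))
phrases = Phrases(sentences, min_count=1, threshold=1)
bigram = Phraser(phrases)

# look for phrases in "sent"
sent = [u'trees', u'graph', u'minors']
print(bigram[sent])  # output: [u'trees_graph', u'minors']
```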
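To see roughly what `min_count` and `threshold` control, here is a minimal pure-Python sketch of the default ("original", word2vec-style) phrase score described in the gensim documentation: `(count(a b) - min_count) * vocab_size / (count(a) * count(b))`, with a pair treated as a phrase when the score exceeds `threshold`. It does not call gensim; `score_bigrams` and the toy corpus are hypothetical, and taking `vocab_size` as the number of distinct unigrams is a simplifying assumption (gensim's own bookkeeping differs in details).

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def score_bigrams(sentences: List[List[str]], min_count: int = 1) -> Dict[Pair, float]:
    """Score every adjacent token pair with the word2vec-style phrase formula.

    Simplification (assumption): vocab_size = number of distinct unigrams.
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)
    return {
        (a, b): (count - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        for (a, b), count in bigrams.items()
    }


# Hypothetical toy corpus: "new york" recurs, every other pair is a one-off.
corpus = [
    ["the", "mayor", "of", "new", "york"],
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
]

scores = score_bigrams(corpus, min_count=1)
threshold = 1.0
# keep only pairs whose score is strictly above the threshold
phrases = {pair for pair, s in scores.items() if s > threshold}
print(phrases)  # {('new', 'york')}
```

Pairs seen only once score 0 here because `min_count` is subtracted from the pair count, which is why a low `threshold` such as 1 is needed before a tiny corpus yields any phrases at all.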
To get trigrams and longer phrases, apply Phrases again to the output of the bigram model you already have, and repeat as needed. Example:
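```python
from gensim.models.phrases import Phrases

# unigram_sentences: a list of tokenized sentences (lists of str), built beforehand
# to create the bigrams
bigram_model = Phrases(unigram_sentences)

# apply the trained model to every sentence; keep each result as a token
# list, since Phrases is trained on token lists, not joined strings
bigram_sentences = [bigram_model[unigram_sentence] for unigram_sentence in unigram_sentences]

# get a trigram model out of the bigram-transformed sentences
trigram_model = Phrases(bigram_sentences)
```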
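To make the chaining idea concrete without depending on gensim, here is a pure-Python sketch. `train_pairs` and `apply_pairs` are hypothetical helpers standing in for training `Phrases` and for `bigram_model[sentence]`; they use the same word2vec-style score and a greedy left-to-right merge, which approximates the idea but is not gensim's code. The toy corpus is also made up for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Sentence = List[str]
Pair = Tuple[str, str]


def train_pairs(sentences: List[Sentence], min_count: int = 1, threshold: float = 1.0) -> Set[Pair]:
    """Return adjacent pairs whose word2vec-style score exceeds threshold."""
    unigrams = Counter(tok for s in sentences for tok in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    vocab_size = len(unigrams)  # simplification: distinct unigrams only
    keep = set()
    for (a, b), count in bigrams.items():
        score = (count - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            keep.add((a, b))
    return keep


def apply_pairs(sentence: Sentence, pairs: Set[Pair], delimiter: str = "_") -> Sentence:
    """Greedily merge known pairs, scanning left to right."""
    out: Sentence = []
    i = 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in pairs:
            out.append(sentence[i] + delimiter + sentence[i + 1])
            i += 2  # both tokens consumed by the merge
        else:
            out.append(sentence[i])
            i += 1
    return out


# Hypothetical corpus in which "new york city" always appears together.
unigram_sentences = [
    ["a", "new", "york", "city", "b"],
    ["c", "new", "york", "city", "d"],
    ["e", "new", "york", "city", "f"],
]

# First pass: learn and apply bigrams.
bigram_pairs = train_pairs(unigram_sentences)
bigram_sentences = [apply_pairs(s, bigram_pairs) for s in unigram_sentences]

# Second pass on the already-merged output: "new_york" + "city" becomes a trigram.
trigram_pairs = train_pairs(bigram_sentences)
trigram_sentences = [apply_pairs(s, trigram_pairs) for s in bigram_sentences]

print(bigram_sentences[0])   # ['a', 'new_york', 'city', 'b']
print(trigram_sentences[0])  # ['a', 'new_york_city', 'b']
```

The second pass works because, after the first merge, `new_york` is just another token whose co-occurrence with `city` can be scored like any other pair.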
Hope this helps, but next time please give us more information about what you are using, what you have tried, etc.
Leave any further questions in the comments. Good luck.
[1]: https://radimrehurek.com/gensim/models/phrases.html
First of all, you should use gensim's [Phrases](https://radimrehurek.com/gensim/models/phrases.html) class to get bigrams; it works as shown in the docs:
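```python
>>> from gensim.models.phrases import Phraser
>>> # phrases: a Phrases model already trained on a corpus
>>> bigram = Phraser(phrases)
>>> sent = [u'the', u'mayor', u'of', u'new', u'york', u'was', u'there']
>>> print(bigram[sent])
[u'the', u'mayor', u'of', u'new_york', u'was', u'there']
```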
To get trigrams and longer phrases, apply Phrases again to the output of the bigram model you already have, and repeat as needed. Example:
```python
trigram_model = Phrases(bigram_sentences)
```
There is also a good notebook and video that explain how to use it: [the notebook](https://github.com/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb), [the video](https://www.youtube.com/watch?v=6zm9NC9uRkk&list=PLw5BhoADa9fUtHRGND_rrdeZlv1XFM8In&index=14&t=3339s).
The most important part is how to apply it to real-life sentences, which works as follows:
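```python
from gensim.models.phrases import Phrases

# unigram_sentences: a list of tokenized sentences (lists of str), built beforehand
# to create the bigrams
bigram_model = Phrases(unigram_sentences)

# apply the trained model to every sentence; keep each result as a token
# list, since Phrases is trained on token lists, not joined strings
bigram_sentences = [bigram_model[unigram_sentence] for unigram_sentence in unigram_sentences]

# get a trigram model out of the bigram-transformed sentences
trigram_model = Phrases(bigram_sentences)
```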

Hope this helps, but next time please give us more information about what you are using.
P.S.: Now that you have edited the question: you are not doing anything to get bigrams, you are just splitting the text. You have to use Phrases to get words like "New York" as bigrams.

        