* Online-Utility.org
Utilities for Online Operating System

 

Online text analysis tool algorithm

Allows you to find the most frequent phrases and the word frequencies in a text. Non-English texts are supported.

- we use a variant of the "shingling" algorithm
- instead of the words themselves, we work with their hash values
- each word gets its own hash value
- for phrases, we combine the hash values of the words using the << operator: hash(phrase) = (hash(word1)<<7) + (hash(word2)<<14) + ...
- we create several HashTables of (hashvalue, occurrences) pairs, read all hash values from the text, and update the occurrences
- we extract the data from these HashTables, find the hash values with the most occurrences, and then find their original values in the text
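
The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the word hash (CRC32 here), the tokenizer, and the names word_hash, phrase_hash and top_phrases are all hypothetical, and one table per phrase length is only one possible reading of "several HashTables". Hash collisions between different phrases are possible with this scheme and are ignored in the sketch.

```python
import re
import zlib
from collections import Counter


def word_hash(word):
    # Assumption: CRC32 stands in for the tool's unspecified word hash.
    return zlib.crc32(word.encode("utf-8"))


def phrase_hash(words):
    # hash(phrase) = (hash(word1) << 7) + (hash(word2) << 14) + ...
    total = 0
    for position, word in enumerate(words, start=1):
        total += word_hash(word) << (7 * position)
    return total


def top_phrases(text, max_len=3, top_n=3):
    # Assumption: a simple lowercase \w+ tokenizer; Unicode letters match too.
    words = re.findall(r"\w+", text.lower())

    # One table of (hash value -> occurrences) per phrase length.
    tables = {n: Counter() for n in range(1, max_len + 1)}
    for n, table in tables.items():
        for i in range(len(words) - n + 1):
            table[phrase_hash(words[i:i + n])] += 1

    result = {}
    for n, table in tables.items():
        wanted = dict(table.most_common(top_n))
        found = {}
        # Second pass over the text: map the winning hash values
        # back to the phrases that produced them.
        for i in range(len(words) - n + 1):
            h = phrase_hash(words[i:i + n])
            if h in wanted and h not in found:
                found[h] = " ".join(words[i:i + n])
        # Most frequent first; ties are ordered alphabetically.
        result[n] = sorted(
            ((found[h], count) for h, count in wanted.items()),
            key=lambda item: (-item[1], item[0]),
        )
    return result
```

For example, top_phrases("to be or not to be that is the question to be", max_len=2, top_n=1) reports "to be" as the most frequent two-word phrase, with 3 occurrences. Only hash values are stored while counting, so the original phrases are recovered by a second scan of the text, as the last step in the list describes.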