Products :: Key Phrase Extractor :: Demo
In this demo, we use news article content from Reuters' Technology and
Entertainment feeds. We check for new content every few minutes.
- Collocations are word sequences (i.e. phrases) whose words appear together more often than you would expect from how frequently each individual word occurs in the same text. (see Collocations on Wikipedia)
- Statistically Improbable Phrases (SIPs) are phrases that
appear in a text more often than you would expect given how often they
appear in another text. In this demo we extract SIPs by comparing
texts from two different time periods. The new (or "current") period
covers the last 7 days, from now back to 7 days ago. The old (or
"past") period covers the 7 days before that. This may be easier to
visualize:
now <==== new text ====> (now - 7 days) <==== old text ====> (now - 14 days)
(see SIPs on Wikipedia)
- Terms are simply the most popular terms or words from the new/current time period (see the SIPs definition above for how that period is defined). Note that the text from the old/past period is not involved in computing popular terms.
- New Terms are like SIPs, but for individual words rather than phrases. Text from both the old/past and new/current periods is used to extract New Terms.
- Key Phrases are a hybrid of Collocations and SIPs. To extract top Key Phrases, both the "strength" of a Collocation and the "informativeness/freshness" of a SIP are considered. This affects the selection and ordering of Key Phrases. The influence of strength vs. informativeness is controlled by the caller.
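
The definitions above can be illustrated with a small Python sketch. It is not the product's implementation: the function names (split_periods, collocation_strength, freshness, key_phrases) and the weight parameter are hypothetical, phrases are limited to two-word sequences, and tokenization is plain whitespace splitting. Collocation strength is approximated with pointwise mutual information and SIP freshness with a smoothed log frequency ratio between the two periods; both are assumed stand-ins for whatever statistics the extractor actually uses.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def split_periods(docs, now, days=7):
    """Split (timestamp, text) pairs into new-period and old-period texts."""
    new_start = now - timedelta(days=days)
    old_start = now - timedelta(days=2 * days)
    new = [text for ts, text in docs if new_start < ts <= now]
    old = [text for ts, text in docs if old_start < ts <= new_start]
    return new, old


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


def collocation_strength(texts):
    """Pointwise mutual information of each two-word phrase (assumed measure)."""
    words, pairs = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        words.update(tokens)
        pairs.update(bigrams(tokens))
    n_words, n_pairs = sum(words.values()), sum(pairs.values())
    return {
        (a, b): math.log((count / n_pairs)
                         / ((words[a] / n_words) * (words[b] / n_words)))
        for (a, b), count in pairs.items()
    }


def freshness(new_texts, old_texts, smoothing=1.0):
    """Log ratio of smoothed phrase frequency, new period vs. old period."""
    new_counts, old_counts = Counter(), Counter()
    for text in new_texts:
        new_counts.update(bigrams(tokenize(text)))
    for text in old_texts:
        old_counts.update(bigrams(tokenize(text)))
    vocab = len(set(new_counts) | set(old_counts))
    n_new = sum(new_counts.values()) + smoothing * vocab
    n_old = sum(old_counts.values()) + smoothing * vocab
    return {
        pair: math.log(((count + smoothing) / n_new)
                       / ((old_counts[pair] + smoothing) / n_old))
        for pair, count in new_counts.items()
    }


def key_phrases(new_texts, old_texts, weight=0.5, top=5):
    """Rank phrases by a caller-controlled mix of strength and freshness.

    weight=1.0 uses collocation strength only; weight=0.0 uses freshness only.
    """
    strength = collocation_strength(new_texts)
    fresh = freshness(new_texts, old_texts)
    scored = {p: weight * strength[p] + (1 - weight) * fresh[p] for p in fresh}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [" ".join(pair) for pair in ranked[:top]]


if __name__ == "__main__":
    now = datetime(2024, 1, 15)
    docs = [
        (now - timedelta(days=1), "quantum chip ships today"),
        (now - timedelta(days=2), "quantum chip beats rivals"),
        (now - timedelta(days=9), "old phone ships today"),
    ]
    new_texts, old_texts = split_periods(docs, now)
    print(key_phrases(new_texts, old_texts, weight=0.0, top=3))
```

Pointwise mutual information tends to reward rare word pairs, so a real extractor would likely add a minimum-count filter or use a different association measure. The weight argument mirrors the caller-controlled balance described for Key Phrases.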