Also known as: Keyword Extractor, Key Word Extractor, Concept Extractor, Collocation Extractor, SIP Extractor
Key Phrase Extractor is a toolkit for extracting key terms (key words) and phrases from text. It is designed to be used in two main modes:
Mode 1: Extractor of common (frequently occurring) phrases. These phrases are known as Collocations.
In this mode the Key Phrase Extractor identifies the terms and phrases that occur most frequently in the input text. For example, if Key Phrase Extractor were to analyze the content of Lucene in Action, it would find terms like "Lucene" and "search", as well as phrases such as "inverted index", "information retrieval", "query parser", and so on.
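To make the idea of frequency-based extraction concrete, here is a minimal sketch that counts adjacent word pairs (bigrams) and returns the most frequent ones. It is not KPE's implementation: the class and method names (CollocationSketch, countBigrams, topBigrams) are hypothetical, and plain bigram counting stands in for whatever scoring the product actually uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Key Phrase Extractor API.
class CollocationSketch {

    // Count how often each pair of adjacent words occurs in the text.
    // Simplification: pairs may span sentence boundaries.
    static Map<String, Integer> countBigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String tok : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (tok.isEmpty()) continue; // split() can yield a leading empty token
            if (prev != null) counts.merge(prev + " " + tok, 1, Integer::sum);
            prev = tok;
        }
        return counts;
    }

    // Return up to n bigrams, most frequent first; ties are broken alphabetically.
    static List<String> topBigrams(String text, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countBigrams(text).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) top.add(entries.get(i).getKey());
        return top;
    }

    public static void main(String[] args) {
        String text = "An inverted index maps terms to documents. "
                + "The inverted index is the core of information retrieval.";
        System.out.println(topBigrams(text, 3));
    }
}
```

A real extractor would also need to discard pairs made up of stopwords and handle phrases longer than two words; the sketch leaves both out to keep the counting step visible.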
Mode 2: Extractor of phrases based on the comparison of two sets of documents (also known as background and foreground corpora). These phrases are known as Statistically Improbable Phrases or SIPs.
In this mode the Key Phrase Extractor finds the phrases that best differentiate one document set from another. For example, when given news articles from the last 7 days (the background) and articles from the last 24 hours (the foreground), the Key Phrase Extractor will identify the terms and phrases that stand out in the news from the last 24 hours. These may be names of people, such as "Steve Jobs" or "Warren Buffett", or phrases such as "Swine Flu" or "Somali Pirates", thus identifying people and concepts that are mentioned more often in the last 24 hours than in the days before.
Used in this mode, the Key Phrase Extractor is an excellent tool for extraction of popular terms and phrases from a text data stream, such as from news and social media (e.g. blogs, tweets, feeds)!
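The foreground/background comparison can be sketched as follows. This is an assumption-laden illustration, not KPE's scoring: it scores each foreground bigram by the ratio of its add-one-smoothed relative frequency in the foreground to that in the background, and the names SipSketch, score, and topPhrases are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Key Phrase Extractor API.
class SipSketch {

    // Count adjacent word pairs across a set of documents.
    static Map<String, Integer> countBigrams(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            String prev = null; // reset so pairs never span two documents
            for (String tok : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (tok.isEmpty()) continue;
                if (prev != null) counts.merge(prev + " " + tok, 1, Integer::sum);
                prev = tok;
            }
        }
        return counts;
    }

    // Score each foreground phrase by (smoothed foreground rate) / (smoothed background rate).
    // A score well above 1 means the phrase is unusually common in the foreground.
    static Map<String, Double> score(List<String> foreground, List<String> background) {
        Map<String, Integer> fg = countBigrams(foreground);
        Map<String, Integer> bg = countBigrams(background);
        double fgTotal = 0;
        for (int c : fg.values()) fgTotal += c;
        double bgTotal = 0;
        for (int c : bg.values()) bgTotal += c;

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : fg.entrySet()) {
            double fgRate = (e.getValue() + 1) / (fgTotal + 1);
            double bgRate = (bg.getOrDefault(e.getKey(), 0) + 1) / (bgTotal + 1);
            scores.put(e.getKey(), fgRate / bgRate);
        }
        return scores;
    }

    // Return up to n phrases, highest score first; ties are broken alphabetically.
    static List<String> topPhrases(List<String> foreground, List<String> background, int n) {
        List<Map.Entry<String, Double>> entries =
                new ArrayList<>(score(foreground, background).entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) top.add(entries.get(i).getKey());
        return top;
    }

    public static void main(String[] args) {
        List<String> last24Hours = Arrays.asList("Swine flu spreads fast", "New swine flu cases");
        List<String> last7Days = Arrays.asList("Stock markets fall", "Stock markets rise", "Flu spreads fast");
        System.out.println(topPhrases(last24Hours, last7Days, 3));
    }
}
```

Smoothing keeps phrases that never appear in the background from causing a division by zero, at the cost of slightly understating how unusual they are.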
Key Phrase Extractor exposes a simple Java API, as well as an HTTP API. Given a piece of text, it returns a list of phrases ordered by their computed score. The API lets you filter the returned phrases, and the KPE package includes several useful filters. The filter API is simple and extensible, so you can write and plug in your own filters, too.
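As a rough picture of what a pluggable phrase filter could look like, the sketch below defines a filter contract and two small filters. Everything here is assumed for illustration: the names ScoredPhrase, PhraseFilter, MinScoreFilter, StopwordFilter, and apply are hypothetical and are not taken from the actual KPE Java API, whose interfaces may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these types are NOT the real KPE API.
class FilterSketch {

    // A phrase together with its computed score (assumed shape).
    static final class ScoredPhrase {
        final String text;
        final double score;
        ScoredPhrase(String text, double score) { this.text = text; this.score = score; }
        @Override public String toString() { return text + " (" + score + ")"; }
    }

    // Assumed filter contract: return true to keep the phrase.
    interface PhraseFilter {
        boolean accept(ScoredPhrase phrase);
    }

    // Keeps only phrases whose score reaches a threshold.
    static final class MinScoreFilter implements PhraseFilter {
        private final double minScore;
        MinScoreFilter(double minScore) { this.minScore = minScore; }
        public boolean accept(ScoredPhrase phrase) { return phrase.score >= minScore; }
    }

    // Rejects phrases that start or end with a stopword.
    static final class StopwordFilter implements PhraseFilter {
        private final Set<String> stopwords;
        StopwordFilter(Set<String> stopwords) { this.stopwords = stopwords; }
        public boolean accept(ScoredPhrase phrase) {
            String[] words = phrase.text.toLowerCase().split(" ");
            return !stopwords.contains(words[0]) && !stopwords.contains(words[words.length - 1]);
        }
    }

    // Keep phrases accepted by every filter, ordered by score, highest first.
    static List<ScoredPhrase> apply(List<ScoredPhrase> phrases, List<PhraseFilter> filters) {
        List<ScoredPhrase> kept = new ArrayList<>();
        for (ScoredPhrase p : phrases) {
            boolean ok = true;
            for (PhraseFilter f : filters) {
                if (!f.accept(p)) { ok = false; break; }
            }
            if (ok) kept.add(p);
        }
        kept.sort((a, b) -> Double.compare(b.score, a.score));
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredPhrase> phrases = Arrays.asList(
                new ScoredPhrase("of the", 9.0),
                new ScoredPhrase("query parser", 4.0),
                new ScoredPhrase("inverted index", 7.5),
                new ScoredPhrase("rare term", 0.2));
        List<PhraseFilter> filters = Arrays.asList(
                new MinScoreFilter(1.0),
                new StopwordFilter(new HashSet<>(Arrays.asList("of", "the"))));
        System.out.println(apply(phrases, filters));
    }
}
```

Keeping each filter to a single accept decision makes filters easy to chain and to test in isolation, which is one plausible reason to design a filter API this way.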
None - ask us!