Key Phrase Extractor
a.k.a. Keyword Extractor, Key Word Extractor, Concept Extractor, Collocation Extractor, SIP Extractor
Key Phrase Extractor is a toolkit for extracting key terms (key words)
and phrases from text. It is designed to be used in two
main modes:
Mode 1: Extractor of common (frequently occurring) phrases. These
phrases are known as Collocations.
In this mode the Key Phrase Extractor identifies key phrases in
the input text. For example, if Key Phrase Extractor were to
analyze the content of Lucene in Action, it would find terms like
"Lucene" and "search", as well as phrases such as "inverted index",
"information retrieval", "query parser", and so on.
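To make the idea of frequently occurring phrases concrete, here is a minimal Python sketch that counts one- and two-word phrases and keeps those seen at least twice. It is only an illustration of frequency-based extraction, not the Key Phrase Extractor's actual algorithm or API; the function names, the tiny stopword list, and the thresholds are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch);
# a real system would use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequent_phrases(text, max_len=2, min_count=2):
    """Count word n-grams (n = 1..max_len) and return those seen at
    least min_count times, most frequent first."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    # Sort by count (descending), then alphabetically for a stable order.
    return sorted(frequent, key=lambda pc: (-pc[1], pc[0]))
```

Note that this sketch ignores sentence boundaries and ranks purely by raw counts; a production extractor would typically use a stronger association measure to separate true collocations from words that merely happen to be common.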
Mode 2: Extractor of phrases based on the comparison of two sets of
documents (also known as background and foreground corpora). These
phrases are known as Statistically Improbable Phrases or SIPs.
In this mode the Key Phrase Extractor finds the most
differentiating phrases between two document sets. For example,
when given news articles from the last 7 days (the background) and
articles from the last 24 hours (the foreground), the Key Phrase
Extractor will identify the terms and phrases that stand out in
news from the last 24 hours. These may end up being names of
people such as "Steve Jobs" or "Warren Buffett", as well as
phrases such as "Swine Flu" or "Somali Pirates", thus identifying
people and concepts that have more mentions today than they had
over the preceding days. Used in this mode, the Key Phrase
Extractor is well suited to extracting popular terms and phrases
from a text data stream, such as news and social media (e.g. blogs,
tweets, feeds).
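The foreground/background comparison can be sketched in a few lines of Python. The example below scores each foreground phrase by the log ratio of its smoothed relative frequency in the foreground to that in the background, so phrases that are unusually common in the foreground rank first. This is one simple way to approximate the idea and is an assumption of this sketch, not a description of how the Key Phrase Extractor computes its scores; the function names and the add-one smoothing are illustrative choices.

```python
import math
from collections import Counter


def phrase_counts(tokens, max_len=2):
    """Count all word n-grams of length 1..max_len in a token list."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def improbable_phrases(foreground_tokens, background_tokens, top_n=5):
    """Rank foreground phrases by how much more likely they are in the
    foreground than in the background (log ratio of smoothed frequencies)."""
    fg = phrase_counts(foreground_tokens)
    bg = phrase_counts(background_tokens)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = len(set(fg) | set(bg))
    scores = {}
    for phrase, count in fg.items():
        # Add-one smoothing so phrases missing from the background
        # do not cause a division by zero.
        p_fg = (count + 1) / (fg_total + vocab)
        p_bg = (bg.get(phrase, 0) + 1) / (bg_total + vocab)
        scores[phrase] = math.log(p_fg / p_bg)
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

In the news scenario above, the tokens from the last 24 hours would be passed as the foreground and the tokens from the last 7 days as the background.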
Applications of Key Phrases
- News & Media: Phrase and term extraction from a continuous
content stream
- Content enrichment: Content tagging (auto-tagging)
- Search Results Relevance: Key Phrases can be indexed in
separate fields whose matches are weighted higher than matches
in other indexed fields, thus increasing the quality of search
results.
- Search Experience: Key Phrases can be used to power AutoComplete
functionality, which helps people search faster, reduces
misspellings and typos and thus improves the overall search
experience for the end user.
- Search Experience: Key Phrases can be used to populate
fields used for faceted search, thus increasing the findability
and browsability of content and improving the overall search
experience.
Business Value / Benefits
- Extracts key concepts from content
- Extracts key concepts from multiple pieces of content based on how their content differs
- Identifies key terms and phrases useful for describing main concepts from a larger piece of text
- Finds key terms and phrases for search results enhancement by providing additional navigational meta-data
Integration
Key Phrase Extractor exposes a simple Java API, as well as an HTTP
API. Given a piece of text, it returns a list of phrases ordered
by their computed score. The API includes the ability to filter
the returned phrases, and the Key Phrase Extractor (KPE) package
includes several useful filters. The simple, extensible filter API
lets you write and plug in your own filters, too.
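As a rough illustration of the filtering idea, the Python sketch below applies a chain of pluggable filters to a list of scored phrases and returns the survivors ordered by score. It does not reproduce the product's Java or HTTP API: every name here (`min_score_filter`, `stopword_filter`, `apply_filters`) is hypothetical and chosen only for this example.

```python
def min_score_filter(threshold):
    """Hypothetical filter: keep phrases scoring at least `threshold`."""
    def keep(phrase, score):
        return score >= threshold
    return keep


def stopword_filter(stopwords):
    """Hypothetical filter: drop phrases that are themselves stopwords."""
    blocked = {w.lower() for w in stopwords}

    def keep(phrase, score):
        return phrase.lower() not in blocked
    return keep


def apply_filters(scored_phrases, filters):
    """Keep (phrase, score) pairs accepted by every filter and return
    them ordered by score, highest first."""
    kept = [(p, s) for p, s in scored_phrases
            if all(f(p, s) for f in filters)]
    return sorted(kept, key=lambda ps: ps[1], reverse=True)
```

Because each filter in this sketch is just a function taking a phrase and its score, adding a custom filter means writing one more function and appending it to the list passed to `apply_filters`.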
FAQ
None - ask us!