Some of our products are available for evaluation. Please contact us to inquire.

Contact Sales:
+1 347-480-1610
info@sematext.com

Morphological Analyzer

a.k.a. Morpho Analyzer

Morphological Analyzer is a software component that detects morphemes in text. English text is commonly pre-processed before it is indexed, and one common pre-processing step is stemming: removing suffixes so that words are reduced to their stems. For example, the word "caring" might be stemmed to "care". Stemming rules for English are relatively simple. Several algorithms, such as the Porter stemmer, have been published, and their implementations are freely available.

Many other languages have richer morphosyntactic characteristics (for example, a single word can take different suffixes or prefixes depending on tense, gender, number, case, etc.), so stemming them requires more complex rules. In most cases there is no published algorithm and no stemming product available. Our Morphological Analyzer uses statistical Natural Language Processing (NLP) to learn the morphosyntactic structure of a language and then uses that knowledge to detect morphemes. It works exceptionally well for highly inflected languages, whose words tend to carry many affixes, such as Polish, Czech, Slovak, Croatian, and Serbian.
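One way to see what "learning morphology from data" can mean is successor variety, a classic unsupervised heuristic going back to Zellig Harris: for each prefix of a word, count how many different letters follow it across a corpus, and cut the word where that count peaks. The Python sketch below is an illustration chosen for this page, not the algorithm Morphological Analyzer uses; the function names, the `min_stem` parameter, the single-peak decision rule, and the tiny Polish word list are all assumptions.

```python
# Toy illustration of successor variety; NOT the Morphological Analyzer's algorithm.
END = "#"  # pseudo-character marking the end of a word

# Hypothetical training corpus: inflected Polish forms of "telefon" and "balkon".
CORPUS = [
    "telefon", "telefonu", "telefonem", "telefonie",
    "telefony", "telefonom", "telefonami", "telefonach",
    "balkon", "balkonu", "balkonem", "balkonie",
    "balkony", "balkonom", "balkonami", "balkonach",
]


def successor_variety(prefix, corpus):
    """Count the distinct characters (or end-of-word) that follow `prefix` in the corpus."""
    followers = set()
    for word in corpus:
        if word.startswith(prefix):
            followers.add(word[len(prefix)] if len(word) > len(prefix) else END)
    return len(followers)


def split_stem(word, corpus, min_stem=3):
    """Split `word` into (stem, ending) at the prefix with the highest successor variety.

    Prefixes shorter than `min_stem` are ignored, ties go to the shorter prefix,
    and a word whose prefixes never occur in the corpus is returned unsplit.
    """
    best_len, best_sv = len(word), 0
    for i in range(min_stem, len(word) + 1):
        sv = successor_variety(word[:i], corpus)
        if sv > best_sv:
            best_len, best_sv = i, sv
    return word[:best_len], word[best_len:]


print(split_stem("telefonami", CORPUS))  # ('telefon', 'ami')
print(split_stem("telefonowi", CORPUS))  # ('telefon', 'owi'), a form absent from the corpus
```

On this tiny corpus the cut lands after "telefon" because seven different continuations follow that prefix. A single-peak rule like this is easily fooled by real data (for example, stems with vowel alternations such as Polish "samochód"/"samochodu"), which is why production systems need much more training text and richer models.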

Business Value / Benefits

Do You Need It?

How do you determine whether Morphological Analyzer is right for you?

Integration

Morphological Analyzer integrates tightly with Lucene and Solr. It exposes the standard Analyzer and Filter APIs for Lucene, plus a FilterFactory for Solr. Before it can detect morphemes in a given language, Morphological Analyzer must be trained on text in that language; we have already done this for all supported languages.
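In Lucene, an Analyzer builds a chain in which a tokenizer produces tokens and each token filter wraps the stream produced before it. The Python sketch below models only that composition; it does not use Lucene's API, and the function names and the hard-coded `MODEL` table (a stand-in for what training on a language's text would produce) are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# A "token filter" here is any function that takes a token stream and yields a new one.
TokenFilter = Callable[[Iterable[str]], Iterator[str]]

# Hypothetical stand-in for a trained model: inflected form -> stem.
MODEL: Dict[str, str] = {
    "telefonami": "telefon",
    "telefonu": "telefon",
    "balkonem": "balkon",
}


def tokenize(text: str) -> Iterator[str]:
    """Split text into word tokens (runs of Unicode word characters)."""
    return (m.group(0) for m in re.finditer(r"\w+", text))


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Lowercase every token."""
    for token in tokens:
        yield token.lower()


def make_stem_filter(model: Dict[str, str]) -> TokenFilter:
    """Wrap a form->stem table as a token filter; unknown forms pass through unchanged."""
    def stem_filter(tokens: Iterable[str]) -> Iterator[str]:
        for token in tokens:
            yield model.get(token, token)
    return stem_filter


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Run the tokenizer, then wrap its stream with each filter in order."""
    stream: Iterable[str] = tokenize(text)
    for f in filters:
        stream = f(stream)
    return list(stream)


chain = [lowercase_filter, make_stem_filter(MODEL)]
print(analyze("Telefonami i TELEFONU", chain))  # ['telefon', 'i', 'telefon']
```

Filter order matters here just as in a Lucene chain: without the lowercase filter in front, "TELEFONU" would not match the table and would pass through unchanged.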

FAQ

Q: Which languages can Morphological Analyzer handle?
A: It is best suited to highly inflected languages, such as the Slavic languages.
Q: How accurate is the Morphological Analyzer?
A: Accuracy depends on the quality and size of the training set. In our experiments, we achieved results that matched state-of-the-art precision and recall.
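Precision and recall for a morphological analyzer can be defined in several ways; one common choice for segmentation is to score predicted morpheme boundaries against a hand-annotated gold standard. The Python sketch below assumes that boundary-level, micro-averaged definition (the answer above does not say which evaluation was used), and its segmentations are made up for illustration, not results from our experiments.

```python
from typing import List, Sequence, Set, Tuple


def boundaries(segmentation: Sequence[str]) -> Set[int]:
    """Character offsets of morpheme boundaries, e.g. ['telefon', 'ami'] -> {7}."""
    cuts, pos = set(), 0
    for morph in segmentation[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts


def boundary_precision_recall(
    predicted: List[Sequence[str]], gold: List[Sequence[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of predicted boundaries against gold ones."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same words")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pb, gb = boundaries(pred), boundaries(ref)
        tp += len(pb & gb)  # boundaries found in both
        fp += len(pb - gb)  # predicted but not in the gold standard
        fn += len(gb - pb)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up segmentations for illustration only.
predicted = [["telefon", "ami"], ["balko", "nem"], ["telefonu"]]
gold = [["telefon", "ami"], ["balkon", "em"], ["telefon", "u"]]
print(boundary_precision_recall(predicted, gold))  # (0.5, 0.3333333333333333)
```

Here one boundary is correct, one is misplaced (counted as both a false positive and a false negative), and one is missed, giving precision 1/2 and recall 1/3.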

See also