sematext


Products :: Multilingual Indexer

Also known as: Language-aware Indexer

Description

Multilingual Indexer is a Solr component that handles content in multiple languages and analyzes it according to its language. It relies on Language Identifier to determine the primary language of each document, then processes the content with the Analyzer configured for that language.
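
The flow described above (identify the primary language, then apply the Analyzer configured for it) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the product's code: `identify_language`, `make_analyzer`, `ANALYZERS`, and the stop-word lists are hypothetical stand-ins for Language Identifier and for real Solr Analyzers.

```python
import re

# Hypothetical stop-word lists; they stand in for a real language identifier
# and for real per-language analysis chains.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "de": {"der", "und", "ist", "das"},
}

def identify_language(text, default="en"):
    """Toy identifier: pick the language whose stop words occur most often."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default when no stop word matched at all.
    return best if scores[best] > 0 else default

def make_analyzer(lang):
    """Toy analyzer: lowercase, tokenize, drop that language's stop words."""
    stop = STOPWORDS[lang]
    def analyze(text):
        return [t for t in re.findall(r"\w+", text.lower()) if t not in stop]
    return analyze

# One analyzer per configured language, selected by the identified language.
ANALYZERS = {lang: make_analyzer(lang) for lang in STOPWORDS}

def index_terms(text):
    """Return the identified language and the terms its analyzer produces."""
    lang = identify_language(text)
    return lang, ANALYZERS[lang](text)
```

Calling `index_terms` on an English sentence and on a German sentence selects a different analyzer for each, so each text is filtered against its own stop-word list. A real deployment would replace both toy pieces with Language Identifier and the Analyzers configured in Solr.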

Business Value / Benefits

  • Makes content in various languages indexable and searchable
  • Provides a single component that handles all the languages your content is in

Do You Need It?

How do you determine if Multilingual Indexer is for you?
  • You need to index and search content in multiple languages
  • Analyzers exist for the languages you need to handle, whether they come from you, from us, from Lucene, or from Solr

Integration

Multilingual Indexer integrates tightly with Solr through a custom UpdateRequestProcessor.

FAQ

Q: Which languages are supported?
A: Any language that has adequate Analyzers (Tokenizers and Filters) can be handled. Solr comes with support for: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Japanese, Korean, Norwegian, Persian / Farsi, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish. Our Morphological Analyzer also integrates smoothly and supports additional languages.
Q: Can documents with multiple languages in a single document be handled?
A: Yes, every field can be in a different language. Each field is identified separately and analyzed according to its own language.
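
As a rough illustration of per-field identification, the sketch below guesses a language for each field of a document independently. It is a minimal Python sketch under stated assumptions: `guess_language`, `languages_per_field`, and the marker-word lists are hypothetical and far cruder than a real language identifier.

```python
# Hypothetical marker words per language, for illustration only.
MARKERS = {
    "en": {"the", "and", "with"},
    "fr": {"le", "la", "et", "avec"},
}

def guess_language(text, default="en"):
    """Toy identifier: count marker words and return the best-scoring language."""
    words = text.lower().split()
    counts = {lang: sum(w in marks for w in words)
              for lang, marks in MARKERS.items()}
    lang = max(counts, key=counts.get)
    # Fall back to the default when no marker word was found.
    return lang if counts[lang] > 0 else default

def languages_per_field(doc):
    """Identify each field on its own, so one document can mix languages."""
    return {field: guess_language(value) for field, value in doc.items()}
```

For a document with an English title and a French body, `languages_per_field` returns a different language for each field, and each field could then be handed to the Analyzer for its own language.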