Products :: Multilingual Indexer
a.k.a. Language-aware Indexer
Multilingual Indexer is a Solr component capable of handling
content in multiple languages and analyzing it appropriately,
based on the language. It relies on Language Identifier
to figure out the primary language of the document, and
processes the content using an Analyzer configured for the
identified language.
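The identify-then-analyze flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the product's code: the class name AnalyzerDispatchSketch, the toy whitespace-and-lowercase analyzers, and the fallback behavior are all hypothetical, and the language code is assumed to come from a separate identification step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: choose an analysis function by language code.
final class AnalyzerDispatchSketch {
    // Language code (e.g. "tr") -> analysis function producing tokens.
    private final Map<String, Function<String, List<String>>> analyzers = new HashMap<>();
    // Used when no analyzer is registered for the identified language.
    private final Function<String, List<String>> fallback;

    AnalyzerDispatchSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String languageCode, Function<String, List<String>> analyzer) {
        analyzers.put(languageCode, analyzer);
    }

    // Runs the analyzer registered for the language, or the fallback.
    List<String> analyze(String languageCode, String text) {
        return analyzers.getOrDefault(languageCode, fallback).apply(text);
    }

    // Toy analyzer (assumption): split on whitespace, lowercase per locale.
    static Function<String, List<String>> lowercasing(Locale locale) {
        return text -> Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(locale))
                .collect(Collectors.toList());
    }
}
```

Even this toy version shows why the language matters: lowercasing a capital "I" with Turkish rules yields a dotless "ı", while language-neutral rules yield "i", so the same text produces different index terms depending on the analyzer chosen.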
Business Value / Benefits
- Makes content in various languages indexable and searchable
- Provides a single component capable of handling all languages your content is in
Do You Need It?
How do you determine if Multilingual Indexer is for you?
- You need to index and search content in multiple languages
- You, we, Lucene, or Solr have Analyzers for the languages you need to handle
Integration
Multilingual Indexer integrates tightly with Solr through a
custom UpdateRequestProcessor.
FAQ
Q: Which languages are supported?
A: Any language that has adequate Analyzers (Tokenizers and
Filters) can be handled. Solr comes with support for: Arabic,
Chinese, Danish, Dutch, English, Finnish, French, German,
Hungarian, Italian, Japanese, Korean, Norwegian, Persian / Farsi,
Portuguese, Romanian, Russian, Spanish, Swedish, Turkish. And, of
course, our Morphological Analyzer integrates smoothly and
supports additional languages.
Q: Can a single document that contains multiple languages
be handled?
A: Yes. Each field can be in a different language; every field
is identified separately and analyzed according to its own
language.
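As a rough sketch of what per-field identification could look like, the Java below assigns a language code to every field of a document independently. Everything here is an assumption made for illustration: the class PerFieldLanguageSketch, the field names in the usage note, and especially toyIdentifier, which is a crude stand-in for a real language identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: identify the language of every field independently.
final class PerFieldLanguageSketch {
    // Returns field name -> language code, preserving the field order.
    static Map<String, String> identifyFields(Map<String, String> document,
                                              Function<String, String> identifier) {
        Map<String, String> languages = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : document.entrySet()) {
            languages.put(field.getKey(), identifier.apply(field.getValue()));
        }
        return languages;
    }

    // Toy stand-in for a real identifier (assumption): answers "ru" if any
    // character is in the Cyrillic Unicode block, otherwise "en".
    static String toyIdentifier(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.UnicodeBlock.of(text.charAt(i)) == Character.UnicodeBlock.CYRILLIC) {
                return "ru";
            }
        }
        return "en";
    }
}
```

Calling identifyFields on a document with, say, an English title field and a Russian title field would return one language code per field, and each code could then be used to pick the Analyzer for that field alone.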