On 11/13/2017 12:33 PM, Shamik Bandopadhyay wrote:
You haven't defined in *ANY* way exactly what a "source" is or how that
data actually gets into Solr. Without that information, it'll be
difficult to even understand your requirements.
If I make one assumption, that the config and schema are going to be
identical for all of the data sources, then I can give you this information:
If you set up each source as a collection in your SolrCloud, you can
create collection aliases that let you query multiple collections with
one query. Whether this works correctly depends on a few factors, but
most of all on whether all the data is using the same (or extremely
similar) Solr config/schema.
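
As a rough sketch of the alias idea, the snippet below builds a Collections API CREATEALIAS request. The host and port, the alias name "all_sources", and the collection names are all made up for illustration, and the helper create_alias_url is not part of Solr; substitute whatever your cluster actually uses.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS URL (hypothetical helper)."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # The alias points at a comma-separated list of collections.
        "collections": ",".join(collections),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{solr_base}/admin/collections?{urlencode(params, safe=',')}"


url = create_alias_url(
    "http://localhost:8983/solr",          # assumed host and port
    "all_sources",                         # hypothetical alias name
    ["source_a", "source_b", "source_c"],  # hypothetical collections
)
print(url)
```

Sending that URL to Solr with a plain HTTP GET should create the alias, after which a query to /solr/all_sources/select is fanned out to every collection behind it.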
10 million documents producing 60GB of index data means that the
documents are relatively large, but aren't super huge -- or that the
data in them is duplicated several times. For contrast, I have an index
where each shard has about 30 million docs, and each of those shards is
36GB in size. The entire index has six of these large shards and one
tiny hot shard.
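
To put rough numbers on that comparison, here is a quick back-of-the-envelope calculation. It assumes decimal gigabytes and treats index size divided by document count as the "average size per document", which ignores how much of the index is stored fields versus other structures.

```python
GB = 10**9  # assumption: decimal gigabytes


def avg_bytes_per_doc(index_gb, doc_count):
    """Average on-disk index bytes per document."""
    return index_gb * GB / doc_count


yours = avg_bytes_per_doc(60, 10_000_000)  # 60GB, 10 million docs
mine = avg_bytes_per_doc(36, 30_000_000)   # 36GB, 30 million docs per shard

print(f"{yours:.0f} bytes/doc vs {mine:.0f} bytes/doc ({yours / mine:.1f}x)")
# prints "6000 bytes/doc vs 1200 bytes/doc (5.0x)"
```

So each of your documents takes up roughly five times as much index space as one of mine.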
I always get a little anxious when somebody wants best practice
information about Solr configurations and hardware. Any recommendation
that we make will be COMPLETELY wrong for some use cases, indexes,
and/or query patterns. Solr configurations and hardware must be
tailored specifically for the use case, index data, and query patterns
that actually exist. Typically, this means that you have to actually
set up a full system and try it to make any determinations about how
much hardware you need.
https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
Regarding your hardware sizing, the only general advice I can give you
is this: Good performance usually ends up requiring significantly more
RAM than users plan on.