Hi all, we have a search engine service built with Lucene 4.7, and it seems that Lucene consumes too much memory. We have approximately 10 million documents, and the index size on disk is approximately 750 GB. My question is: why do the FST$Arc objects consume so much memory? Please refer to the following jmap -histo output. I hope somebody can give me some suggestions.
Hi Uwe, thanks for your timely reply. Yes, those documents are huge texts. We have ten indices, each approximately 75 GB on disk. Following is the directory listing of one of the indices.
The terms dictionary uses the "tim" and "tip" files. Their size should be roughly the same order of magnitude as the FST.
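To get the numbers for this rule of thumb, the following plain-JDK sketch sums the on-disk size of the .tip and .tim files in one index directory. It assumes the default Lucene 4.x (BlockTree) terms dictionary, where .tip holds the terms index FST that is loaded onto the heap and .tim holds the term blocks that stay on disk. The class name TermsFileSizes is hypothetical, and the result is a ballpark, not an exact heap figure.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TermsFileSizes {

    /** Sums the sizes in bytes of all regular files in indexDir whose name ends with suffix. */
    static long totalSize(Path indexDir, String suffix) throws IOException {
        long total = 0;
        // Glob filter, e.g. "*.tip", matches every segment's file with that extension.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*" + suffix)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    total += Files.size(f);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of one index directory; defaults to the current directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("terms index (.tip): %,d bytes%n", totalSize(indexDir, ".tip"));
        System.out.printf("terms dict  (.tim): %,d bytes%n", totalSize(indexDir, ".tim"));
    }
}
```

Run it once per index; with ten indices in one JVM, the .tip totals add up.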
Do you have all the indexes running in the same JVM, or are there 10 servers? If they all run in one JVM, the numbers look correct. If you really want to host such a large index on a single machine using a single JVM, you should plan for more heap space. I'd start with 12 GiB of heap to run this index.
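A minimal sketch for checking at startup whether the JVM actually received roughly the heap suggested above (for example via java -Xmx12g ...). The class name HeapCheck and the 10% tolerance are assumptions for illustration: Runtime.maxMemory() can report somewhat less than the -Xmx value with some garbage collectors, so an exact comparison could warn spuriously.

```java
public class HeapCheck {

    /** 12 GiB: the starting point suggested in the thread, not a measured requirement. */
    static final long SUGGESTED_HEAP_BYTES = 12L * 1024 * 1024 * 1024;

    /** True if maxHeapBytes is at least 90% of targetBytes (tolerance is an assumption). */
    static boolean roughlyAtLeast(long maxHeapBytes, long targetBytes) {
        return maxHeapBytes >= targetBytes - targetBytes / 10;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory(); // Long.MAX_VALUE if no limit is set
        System.out.printf("max heap: %.1f GiB%n", max / (1024.0 * 1024 * 1024));
        if (!roughlyAtLeast(max, SUGGESTED_HEAP_BYTES)) {
            System.out.println("Heap looks small for this index; consider starting the JVM with -Xmx12g");
        }
    }
}
```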
A last recommendation: if you update your index at runtime, make sure you correctly close outdated IndexReaders/IndexSearchers (e.g. using SearcherManager). Otherwise orphaned IndexReader instances keep consuming heap space and disk space, because the index files can't be fully deleted while those readers are open!
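A minimal sketch of the SearcherManager pattern, assuming Lucene 4.7 (lucene-core plus lucene-analyzers-common) on the classpath. Each search borrows a searcher with acquire() and hands it back with release() in a finally block; after an index update, maybeRefresh() swaps in a new reader, and the old one is closed once its last user releases it. RAMDirectory, the class name and the helper methods are hypothetical stand-ins; a real service would open an FSDirectory on its index path.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SearcherManagerSketch {

    /** Runs a query with a borrowed searcher and always returns it to the manager. */
    static int countAllDocs(SearcherManager manager) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.search(new MatchAllDocsQuery(), 1).totalHits;
        } finally {
            // Decrements the reader's ref count; a reader already replaced by a
            // refresh is closed when its count drops to zero.
            manager.release(searcher);
        }
    }

    static void addDoc(IndexWriter writer, String id) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory stand-in for the real index directory
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)));
        addDoc(writer, "1");
        writer.commit(); // SearcherManager(Directory, ...) needs an existing commit

        SearcherManager manager = new SearcherManager(dir, null); // null: no custom SearcherFactory
        System.out.println(countAllDocs(manager)); // prints 1

        addDoc(writer, "2"); // simulate a runtime index update
        writer.commit();
        manager.maybeRefresh(); // opens a new reader; the old one is released
        System.out.println(countAllDocs(manager)); // prints 2

        manager.close();
        writer.close();
        dir.close();
    }
}
```

In a long-running service, maybeRefresh() is usually called periodically or after commits from a background thread, and manager.close() on shutdown.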