Hi Team,

We are facing heap issues on all the data nodes of our Elasticsearch 1.7.3 cluster. Please find below the steps we have implemented in our cluster.

Step-1: We have 16 data nodes in total, and each node runs 3 instances (data1, data2 and data3), so we have 48 data instances, plus 3 master nodes and 16 separate ingest (search) nodes. All the data nodes are bare metal, and each node has a 7.1TB disk.
Filesystem                                         Size  Used Avail Use% Mounted on
/dev/sdi2                                          132G   16G  110G  13% /
devtmpfs                                           252G     0  252G   0% /dev
tmpfs                                              252G     0  252G   0% /dev/shm
tmpfs                                              252G   26M  252G   1% /run
tmpfs                                              252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/Source--ES--eph-volume--367978823--14  7.0T  642G  6.4T   9% /app
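
For reference, this is how heap usage per instance can be checked (assuming the HTTP port 9202 from our settings in Step-3; the host would be adjusted per instance):

# Heap usage and max heap per node via the cat nodes API
curl -s 'localhost:9202/_cat/nodes?v&h=host,name,heap.percent,heap.max'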

Step-2: Please find the ES process configuration below.

elastic+ 13580     1 48 Dec05 ?        11:42:26 /bin/java -Xms30g -Xmx30g -Djava.awt.headless=true -XX:+UseG1GC -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:MaxGCPauseMillis=200 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=20 -XX:+UseStringDeduplication -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=3335 -Des.max-open-files=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/share/elasticsearch/logs/heapdump.hprof -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Delasticsearch -Des.foreground=yes -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.7.3.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.pidfile=/var/run/elasticsearch/10.37.38.124-data1/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/usr/local/var/log/elasticsearch/10.37.38.124-data1 -Des.default.path.data=/app/data/elasticsearch/10.37.38.124-data1 -Des.default.path.conf=/etc/elasticsearch/data1 org.elasticsearch.bootstrap.Elasticsearch

ES_HEAP_SIZE=30g
MAX_LOCKED_MEMORY=unlimited
# Additional Java OPTS
# es_java_opts: "$ES_JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.port=<jmx port> -Des.max-open-files=true",
ES_GC_OPTS="-XX:+UseG1GC -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:MaxGCPauseMillis=200 -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=20 -XX:+UseStringDeduplication -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.port=3335 -Des.max-open-files=true"
export ES_GC_OPTS
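
To watch GC behaviour on an instance, jstat from the JDK can be pointed at the Java PID (13580 is just the PID from the process listing above; run as the elasticsearch user):

# Print G1 generation occupancy percentages and GC counts/times every 5 seconds
jstat -gcutil 13580 5000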

Step-3: Please find the elasticsearch.yml settings.

action.auto_create_index: true
action.destructive_requires_name: true
action.disable_delete_all_indices: true
bootstrap.mlockall: true
cluster.name: Cluster_name
cluster.routing.allocation.same_shard.host: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: master_node1:9301,master_node2:9301,master_node3:9301
http.port: 9202
index.mapper.dynamic: true
index.merge.policy.use_compound_file: false
index.number_of_replicas: 0
index.number_of_shards: 1
index.query.bool.max_clause_count: 10000
index.refresh_interval: 1000s
indices.fielddata.cache.size: 10%
indices.recovery.max_bytes_per_sec: 60mb
network.host: 0.0.0.0
node.data: true
node.master: false
script.inline: false
script.stored: false
script.file: false
script.groovy.sandbox.enabled: false
threadpool.bulk.queue_size: 300
threadpool.index.queue_size: 300
transport.tcp.port: 9302
threadpool.bulk.size: 60
threadpool.bulk.type: fixed
threadpool.index.size: 60
threadpool.index.type: fixed
threadpool.search.queue_size: 400
threadpool.search.size: 60
threadpool.search.type: fixed
discovery.zen.fd.ping_timeout: 180s
discovery.zen.fd.ping_interval: 60s
discovery.zen.fd.ping_retries: 3
indices.cluster.send_refresh_mapping: false
index.merge.policy.max_merge_at_once: 10
index.merge.policy.reclaim_deletes_weight: 2.0
index.merge.policy.max_merged_segment: 5GB
index.merge.policy.expunge_deletes_allowed: 10
index.merge.policy.segments_per_tier: 10
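
To see where the heap is actually going under these settings, the per-node fielddata and overall index memory stats can be pulled, for example:

# Fielddata memory held per node
curl -s 'localhost:9202/_cat/fielddata?v'
# Full per-node index stats, including segment and filter cache memory
curl -s 'localhost:9202/_nodes/stats/indices?pretty'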

Step-4: Limits configuration settings under /etc/security/limits.conf

# End of file
*              soft    nproc      65535
*              hard    nproc      65535
*              soft    nofile     65535
*              hard    nofile     65535
elasticsearch  soft    memlock    unlimited
elasticsearch  hard    memlock    unlimited
elasticsearch  soft    nproc      65535
elasticsearch  hard    nproc      65535
elasticsearch  soft    nofile     65535
elasticsearch  hard    nofile     65535
app            soft    nofile     16384
app            hard    nofile     16384
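
To confirm these limits (and mlockall) are actually in effect for a running instance, the kernel-side limits and the node process info can be checked, for example:

# Limits applied to the running data1 instance (PID taken from its pidfile)
cat /proc/$(cat /var/run/elasticsearch/10.37.38.124-data1/elasticsearch.pid)/limits
# Shows mlockall status and max_file_descriptors as Elasticsearch sees them
curl -s 'localhost:9202/_nodes/process?pretty'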

We have checked sestatus; SELinux is already disabled on all the nodes (we are using CentOS 7).
sestatus
SELinux status:                 disabled

Free memory:

free -g
              total        used        free      shared  buff/cache   available
Mem:            503          91         410           0           1         410
Swap:             0           0           0

We are also dropping the page cache every 5 minutes via cron:
#Drop the page cache
*/5 * * * * sync; echo 1 > /proc/sys/vm/drop_caches

We have implemented all of the steps above, but the data nodes are still using 24 to 25GB out of the 30GB heap (over 80%) all the time; GC is not releasing the memory, the cluster becomes red, and nodes are going down.
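
If it helps with diagnosis, a live heap histogram can be captured from a node while the heap is high (jmap ships with the JDK; note that -histo:live forces a full GC, so it causes a pause):

# Top object classes on the live heap by bytes and instance count
jmap -histo:live 13580 | head -n 25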

Please suggest any settings or configurations we might have missed to fix this heap issue.