> All of our indexers send their logs via the ingest nodes

I believe this is your issue.

Ingest nodes work from the bulk/write queue, with a number of threads equal to the number of processors pulling from that queue [1]. So when you push all ingest traffic through only your ingest nodes, in your configuration you are pushing all ingest-related work through 6 cores (3 nodes * 2 cores). You can think of those 6 cores as the concurrent slots available to handle ingest. In your case, each of those 6 slots must either a) forward index requests to the data nodes and wait for the responses before answering the client with successes/failures ... or b) pre-process the data via an ingest pipeline, then forward the index requests and wait for the responses before answering the client. Everything beyond 6 gets queued. In both cases you have shrunk your "slot" bandwidth for ingestion from 48 (12 nodes * 4 cores) down to 6, AND added extra work on top.
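
If you want to confirm this, the `_cat/thread_pool` API shows the pool size, queue depth, and rejections per node (on 5.6 the relevant pool is named `bulk`; in later versions it is `write`), something like:

```
GET _cat/thread_pool/bulk?v&h=node_name,name,size,queue,queue_size,rejected
```

If the ingest nodes show rejections or a consistently full queue while the data nodes sit mostly idle, that points at exactly this bottleneck.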

In this case (based on the information in the original comment), I would not suggest using dedicated ingest nodes. Instead, spend the additional capacity on 1-2 more data nodes, allow all data nodes to be ingest-capable, and make sure your clients are sending to all of the data nodes. This should help prevent bottlenecks, increase the maximum CPU capacity available for ingestion, and absorb the extra overhead that running ingest pipelines adds.
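
As a rough sketch (adjust the master setting to your actual topology), each data node's elasticsearch.yml would simply keep the data and ingest roles enabled, which is the 5.x default:

```yaml
# elasticsearch.yml on each data node (5.x defaults shown explicitly)
node.master: false   # assumption: you run dedicated master nodes; adjust if not
node.data: true
node.ingest: true    # lets the node run ingest pipelines locally
```

Then point your clients (Beats/Logstash/etc.) at the full list of data nodes, or at a load balancer in front of them, so bulk requests and the pipeline work they trigger spread across all 48 cores instead of funneling through 6.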
[1] https://www.elastic.co/guide/en/elasticsearch/reference/5.6/modules-threadpool.html

---