> Does this mean that, putting pipelines to one side, we should "never" ingest via coordinator nodes?
For clarity, coordinator-only nodes (as defined by the docs) do not pre-process data via ingest pipelines.
Ingesting via coordinator-only nodes is just fine; you just need to give those nodes enough resources that they don't bottleneck ingestion. Adding the ingest role to a coordinator-only node (which makes it no longer coordinator-only) changes the math of how many resources it needs.
> would enabling ingest node features add much overhead? Assuming the pipeline config isn't complex...
The overhead depends on a lot of factors. You can look at our nightly benchmarks: https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/30d
That benchmark runs a grok pipeline, which reduces throughput by ~30% (YMMV). You can also run your own pipelines with your data in your environment using Rally; the http track is a good example to start from.
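
As a rough illustration of how that kind of overhead feeds into sizing, here is a back-of-the-envelope sketch in Python. It is not an official formula: the function name `nodes_needed`, the per-node throughput numbers, and the idea of applying the benchmark's ~30% figure uniformly per node are all assumptions for illustration. Measure your own numbers with Rally before relying on anything like this.

```python
import math


def nodes_needed(target_docs_per_sec: float,
                 docs_per_sec_per_node: float,
                 pipeline_overhead: float = 0.30) -> int:
    """Estimate how many nodes are needed to sustain a target ingest rate.

    pipeline_overhead is the fraction of throughput lost to ingest
    pipelines (0.30 mirrors the ~30% grok figure mentioned above; this
    is an assumption, your pipelines will differ).
    """
    if not 0.0 <= pipeline_overhead < 1.0:
        raise ValueError("pipeline_overhead must be in [0, 1)")
    if target_docs_per_sec < 0 or docs_per_sec_per_node <= 0:
        raise ValueError("rates must be positive")

    # Throughput one node can sustain once pipeline cost is paid.
    effective_per_node = docs_per_sec_per_node * (1.0 - pipeline_overhead)

    # Round up: a fraction of a node still means one more node.
    return math.ceil(target_docs_per_sec / effective_per_node)


# Hypothetical numbers: 100k docs/s target, 20k docs/s per node.
without_pipeline = nodes_needed(100_000, 20_000, pipeline_overhead=0.0)
with_grok = nodes_needed(100_000, 20_000, pipeline_overhead=0.30)
```

With these made-up inputs, the same target needs more nodes once the pipeline overhead is included, which is the "math" that changes when you add the ingest role.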