If you’re thinking of running OpenSearch on Kubernetes, you should check out the OpenSearch Kubernetes Operator. It’s by far the easiest way to get going: you can configure pretty much everything, and it comes with nice functionality such as rolling upgrades and draining nodes before shutting them down.
Also, keep in mind that Sematext offers services for OpenSearch consulting, support, and training. Check out this short video below to learn more:
Let’s get going 🙂
Minimal Installation
You need three commands to start. First, add the Helm repository:
helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
Then install the Operator:
helm install opensearch-operator opensearch-operator/opensearch-operator
Finally, install your first OpenSearch cluster:
helm install example-opensearch opensearch-operator/opensearch-cluster
If you’re on an empty Kubernetes cluster and have the luxury of running kubectl get all and grokking the output, you’ll notice the following (after a few minutes):
- A StatefulSet where the OpenSearch nodes run, with 3 OpenSearch pods by default.
- A Deployment for OpenSearch Dashboards, with one pod by default.
- A Deployment for the Operator itself.
- A Service to connect to OpenSearch (by default, my-cluster).
- A Service to connect to OpenSearch Dashboards (by default, my-cluster-dashboards).
- A couple of “side” pods that take care of bootstrapping the cluster and applying the security configuration via securityadmin.sh.
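If kubectl get all is too much to parse, you can also look at each resource type separately. Nothing fancy here, just standard kubectl:

# the OpenSearch nodes, plus the Dashboards and Operator deployments
kubectl get statefulsets
kubectl get deployments
# the services you'll port-forward to
kubectl get svc
# everything else, including the bootstrap/securityconfig "side" pods
kubectl get pods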
Once your cluster is up (e.g. your StatefulSet is 3/3), you should be able to connect to OpenSearch via port forwarding:
kubectl port-forward svc/my-cluster 9200:9200
And then test it via:
curl -k -u 'admin:admin' 'https://localhost:9200'
And/or you can port forward the OpenSearch Dashboards port:
kubectl port-forward svc/my-cluster-dashboards 5601:5601
After that, you can connect to http://localhost:5601 and log in with the user admin and the password admin. Don’t worry, we’ll change the default credentials in a bit. But until then, let’s tackle two immediate issues:
- If the curl above worked, you might notice that you’re running an old version of OpenSearch. At the time of writing this, with OpenSearch Operator 2.4, the latest version of OpenSearch is 2.9, yet you’ll see OpenSearch 2.3.
- The installation may fail, because by default the cluster asks for a lot of resources. In my case, it didn’t fit the local Docker Desktop with 8GB of RAM: one OpenSearch pod failed, and kubectl describe told me there weren’t enough resources (see the sketch after this list). But surely 8GB of RAM should be enough for a test OpenSearch cluster.
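If you hit the resource issue, a quick way to confirm it is to describe the pod that’s stuck in Pending. A minimal sketch; the pod name below is an assumption (the Operator derives it from the release and node pool names), so use whatever kubectl get pods shows:

kubectl get pods
# look at the Events section at the bottom; with the resource issue you'll
# typically see a FailedScheduling event mentioning insufficient memory or CPU
kubectl describe pod example-opensearch-nodes-0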
Basic Changes: Version, Resources
To change the default parameters, we’ll need a values file. You could also override values on the command line, but there are quite a few parameters to change. Here’s a sample values file; let’s call it values.yaml:
opensearchCluster:
  enabled: true
  general:
    httpPort: "9200"
    version: 2.9.0
    serviceName: "my-cluster"
    drainDataNodes: true
    setVMMaxMapCount: true
  nodePools:
    - component: nodes
      replicas: 3
      roles:
        - "cluster_manager"
        - "data"
      jvm: -Xmx256M -Xms256M
      resources:
        requests:
          memory: "500Mi"
          cpu: "500m"
  security:
    tls:
      transport:
        generate: true
      http:
        generate: true
  dashboards:
    enable: true
    replicas: 1
    version: 2.9.0
    resources:
      requests:
        memory: "500Mi"
        cpu: "500m"
You’d apply it via:
helm install -f values.yaml example-opensearch opensearch-operator/opensearch-cluster
If you previously installed an OpenSearch cluster, you’ll want to remove it first via helm uninstall example-opensearch. You may also want to remove the persistent volumes of old nodes (check kubectl get pvc for details).
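For reference, a minimal cleanup sketch (the PVC names vary, so take them from the kubectl get pvc output rather than guessing):

helm uninstall example-opensearch
# see which persistent volume claims the old nodes left behind
kubectl get pvc
# then delete the ones that belonged to the old cluster
kubectl delete pvc <old-pvc-name>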
Once installation completes, you should have an OpenSearch 2.9 cluster asking for less memory than before (we’ll verify the version in a moment). Let’s take a closer look at the parameters from values.yaml:
- The general section has to do with your OpenSearch cluster as a whole:
  - The HTTP port to listen on.
  - Version 2.9.
  - The name of the Kubernetes service. It’s different from the cluster name; the cluster name is the Helm release name (example-opensearch in our case).
  - Whether to drain data nodes on shutdown. We want that, otherwise we can lose data.
  - Whether to set vm.max_map_count in the sysctl of the host (required for OpenSearch to allocate lots of virtual memory when memory-mapping files).
- nodePools defines the different types of nodes, for example cluster managers vs data nodes; here our nodes have both roles. For each pool you can also change the number of pods (nodes) as well as its resource requests and limits. By default, the heap size will be half of the requested memory, but you can override Xms and Xmx and other Java parameters via the jvm option.
- The security section allows you to configure TLS for inter-node communication (transport) as well as HTTP, and whether you’d have the Operator generate certificates for you (vs supplying your own, which is probably what you’ll do in production).
- Finally, you may enable OpenSearch Dashboards (it’s enabled by default), where you can also set the number of replicas (pods), version, resource requests, etc. Dashboards requests 1GB of RAM by default; we can reduce that for a local test, too.
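Once the new cluster is up, you can confirm the version bump with the same port forward and curl as before; the version number in the JSON response should now read 2.9.0:

kubectl port-forward svc/my-cluster 9200:9200
curl -k -u 'admin:admin' 'https://localhost:9200'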
At this point, we have the simplest of clusters: three nodes holding all roles. In production, we’ll want dedicated cluster managers, or maybe a more complex layout, such as hot-cold. The Operator supports all of this, so let’s have a look at how.
Cluster Layout
So far, all nodes are the same because they’re part of the same nodePool. But we can have more. Here’s an example cluster with three dedicated cluster manager nodes and two data nodes:
nodePools:
  - component: cluster-managers
    diskSize: "1Gi"
    replicas: 3
    roles:
      - "cluster_manager"
    resources:
      requests:
        memory: "500Mi"
        cpu: "500m"
  - component: data-nodes
    diskSize: "30Gi"
    replicas: 2
    roles:
      - "data"
    resources:
      requests:
        memory: "500Mi"
        cpu: "500m"
This will create two StatefulSets: one called example-opensearch-cluster-managers for the three dedicated cluster managers, and one for the data nodes with a similarly constructed name: Helm release + node pool name.
Note the diskSize option: it dictates the size of the Persistent Volume that the Operator creates for OpenSearch to store data.
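You can check the resulting layout with plain kubectl: expect one StatefulSet per node pool and one PersistentVolumeClaim per data-bearing pod, sized according to diskSize:

kubectl get statefulsets
kubectl get pvc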
Security
By default, OpenSearch Dashboards listens on plain HTTP. If you want it to listen on HTTPS, the dashboards section of your values file can look like this:
dashboards:
  enable: true
  tls:
    enable: true
    generate: true
  replicas: 1
  version: 2.9.0
Last but not least, you might want to change the default admin user credentials, or any other security aspect, such as roles or roles mapping. To do that, you’ll need a secret that holds your security configuration. You can find an example here. In that secret, you provide the contents of all the security YAML files that normally live in config/opensearch-security/ within a tar.gz installation.
For changing the default admin user, the essential part is this one:
apiVersion: v1
kind: Secret
metadata:
  name: securityconfig-secret
type: Opaque
stringData:
  internal_users.yml: |-
    _meta:
      type: "internalusers"
      config_version: 2
    admin:
      hash: "$2y$12$B1MZUbsRd6AbhUXiSw2GOejrlPrnqgoDwgPm/LqH0VTlF8xM2.leO"
      reserved: true
      backend_roles:
        - "admin"
      description: "Demo admin user"
  action_groups.yml: |-
    …
In this case, I’m keeping the username as admin, but I’m changing the password to admin123, as the Operator documentation also exemplifies. OpenSearch validates the password via the hash, which we have to compute. If you have the tar.gz archive unpacked somewhere, you can compute the hash of a password via:
plugins/opensearch-security/tools/hash.sh -p admin123
If not, you can try using Python, as the Operator documentation suggests:
python -c 'import bcrypt; print(bcrypt.hashpw("admin123".encode("utf-8"), bcrypt.gensalt(12, prefix=b"2a")).decode("utf-8"))'
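Note that this one-liner needs the bcrypt package, so if Python complains about a missing module, install it first:

pip install bcrypt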
You’d apply the secret via:
kubectl create -f securityconfig-secret.yaml
Now OpenSearch can validate our new password once security is initialized. But in order to initialize security, we need to give the Operator the default admin credentials. To do that, we’ll use another secret, where we base64-encode both the username and the password of our admin user. Let’s call it admin-credentials-secret:
apiVersion: v1
kind: Secret
metadata:
  name: admin-credentials-secret
type: Opaque
data:
  # admin
  username: YWRtaW4=
  # admin123
  password: YWRtaW4xMjM=
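If you want to produce the base64 values yourself, a quick shell sketch (the -n matters, otherwise you’d encode a trailing newline):

echo -n 'admin' | base64      # YWRtaW4=
echo -n 'admin123' | base64   # YWRtaW4xMjM=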
Finally, we’ll reference both secrets in our Helm values file. The security section would look like this:
security:
  config:
    adminCredentialsSecret:  # these are the admin credentials for the Operator to use
      name: admin-credentials-secret
    securityConfigSecret:    # this is the whole security configuration for OpenSearch
      name: securityconfig-secret
  tls:
    transport:
      generate: true
    http:
      generate: true
If you have OpenSearch Dashboards enabled, it also needs to communicate with OpenSearch (internally, separately from the user you log in with when you open Dashboards). You’ll provide that username and password with a secret as well, just like we did with adminCredentialsSecret.
To keep things simple, we’ll use the same default admin user (i.e. the same admin-credentials-secret), so the dashboards section becomes:
dashboards:
  enable: true
  tls:
    enable: true
    generate: true
  opensearchCredentialsSecret:  # doesn't have to be the same as adminCredentialsSecret
    name: admin-credentials-secret
  replicas: 1
  version: 2.9.0
And that’s it! With your new secrets and values file, if you reinstall your opensearch-cluster release, you should have your own admin credentials, and Dashboards will work over HTTPS.
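To double-check, port-forward again and authenticate with the new password; Dashboards should now answer on HTTPS as well:

kubectl port-forward svc/my-cluster 9200:9200
curl -k -u 'admin:admin123' 'https://localhost:9200'
# and for Dashboards, now at https://localhost:5601
kubectl port-forward svc/my-cluster-dashboards 5601:5601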
Cool! What’s Next?
That depends on what you’d like to do 🙂 Here’s an FAQ:
Other Options for the Operator?
Yes, the OpenSearch Operator Documentation explains most of them quite nicely. I haven’t yet found one that doesn’t work with Helm, but you can check the chart template here.
Anything Exciting Coming Up?
Yes, I’m looking forward to autoscaling. Check out the roadmap for more. Solr can already do that, and there’s another operator, working with both Elasticsearch and OpenSearch, that supports autoscaling: you’ll find a tutorial focused on the logs use case here.
How Do I Troubleshoot?
I usually run kubectl get statefulsets, kubectl get pods, and the like to see what’s going on. If something is wrong, kubectl describe and kubectl logs are my best friends.
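A typical loop looks something like the sketch below; the pod and deployment names are placeholders, so substitute whatever kubectl get pods and kubectl get deployments show in your cluster:

# is everything scheduled and running?
kubectl get statefulsets
kubectl get pods
# if a pod is stuck, the Events section usually explains why
kubectl describe pod <pod-name>
# logs of an OpenSearch node vs logs of the Operator
kubectl logs <opensearch-pod-name>
kubectl logs deployment/<operator-deployment-name>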
Though at some point I got sick of going back and forth between the logs of the Operator, those of the OpenSearch pods, and those of the “helper” pods. So I just set up Sematext Logs: a helm repo add, a helm install, then I chose which logs to ship via log discovery. In 5 minutes I could filter out all the “can’t connect via TLS” noise (of course TLS doesn’t work yet, it hasn’t been set up at that point) and focus on the important messages.
If Sematext Logs sounds interesting, click on the big orange button at the end of this post to start your trial. Or check out the OpenSearch performance monitoring and OpenSearch log monitoring integrations.
Where Do I Learn More About OpenSearch or Get Help?
Here. We offer OpenSearch training, consulting and production support. In other words, we can help you get things done or help you get the skills to get things done yourself. Or both!
Any Interesting Jobs with OpenSearch and Kubernetes?
Yes, I just happen to know a company that’s hiring worldwide.