Setting up an OpenSearch cluster on AKS

Asish M Madhu · Published in DevOps for you · Apr 2, 2024

“The search is wide open…”

opensearch on aks (devopsforyou.com)

In this article, I aim to share the insights I gained while implementing OpenSearch on AKS (Azure Kubernetes Service). OpenSearch, an open-source search and analytics engine licensed under Apache 2.0, is a fork of Elasticsearch 7.x.

We gain the following advantages by using OpenSearch:

  1. It provides native API endpoints and client libraries, consistent with Elasticsearch.
  2. We can overcome licensing issues associated with Elasticsearch.
  3. It comes with additional capabilities such as Machine Learning, Alerting, Cross-Cluster Replication, and more.
4. It is cost-effective.

Opensearch on Kubernetes provides several advantages:

  1. Better horizontal and vertical scaling capabilities through Kubernetes.
  2. Deployment and upgrades work like any other Kubernetes application.
  3. Improved observability and management functionality.
  4. Running Kubernetes on a public cloud platform lets us offload VM and cluster management responsibilities.

Components

There are three types of nodes in an OpenSearch cluster:

1. Master Node:

  • The master node is responsible for cluster-wide management operations and maintaining the overall cluster state.
  • It handles tasks such as creating or deleting indices, adding or removing nodes from the cluster, and managing the allocation of shards across nodes.
  • A cluster has only one elected master node at a time; other master-eligible nodes can be elected if the current master fails.

2. Data Node:

  • Data nodes store and manage the actual data within indices. They handle tasks related to indexing, searching, and serving data.
  • Data nodes hold shards, which are the basic units of data storage and retrieval. These shards can be primary or replica shards, contributing to the distribution and redundancy of data.
  • The more data nodes you have, the more data your cluster can manage, and the better it can distribute and parallelize search and indexing operations.

3. Client Node:

  • Client nodes are optional and serve as an interface for client applications to interact with the OpenSearch cluster.
  • They are not responsible for storing data but act as a gateway for routing client requests to the appropriate data nodes and coordinating responses.
  • Client nodes can help distribute the load of incoming requests and improve the performance of search and indexing operations.
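Once the cluster is up, you can confirm which roles each node actually carries via the _cat/nodes API. A quick check, assuming you have port-forwarded a node to localhost:9200 and are using the chart's default self-signed TLS and admin credentials (adjust for your install):

curl -sk -u admin:admin "https://localhost:9200/_cat/nodes?v&h=name,node.role,master"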

Installing OpenSearch on AKS

Steps

  1. Set up an AKS cluster with custom VM settings
  2. Add a node pool to the cluster
  3. Install OpenSearch components using Helm charts

Detailed steps

OpenSearch, like Elasticsearch, uses memory-mapped files (mmapfs) to store its indices. The maximum map count check verifies that the kernel allows a process to have at least 262,144 memory-mapped areas; it is enforced on Linux only. To pass this check, you must configure vm.max_map_count via sysctl to be at least 262144.

sysctl -w vm.max_map_count=262144

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/_maximum_map_count_check.html
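Since vm.max_map_count is not a namespaced sysctl, any pod on a node sees the node-wide value, so one quick way to confirm it is a throwaway debug pod (the node name is a placeholder):

kubectl debug node/<node-name> -it --image=busybox -- cat /proc/sys/vm/max_map_count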

In AKS we can implement this either by creating a custom node image with these values baked in and using it in our worker pools, or by setting custom sysctl values while creating the cluster itself.

Create the cluster as below;

az aks create --name opensearch-cluster1 --resource-group opensearch-dev --kubelet-config linuxkubeletconfig.json --linux-os-config linuxosconfig.json

Here, linuxkubeletconfig.json lists the kernel controls the kubelet permits, and linuxosconfig.json passes the sysctl settings.
Ref: https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration?tabs=linux-node-pools
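A minimal sketch of the two files, assuming only vm.max_map_count needs to be raised; the key names follow the Microsoft custom node configuration docs linked above:

linuxosconfig.json

{
  "sysctls": {
    "vmMaxMapCount": 262144
  }
}

linuxkubeletconfig.json (the allowed unsafe sysctls list below is illustrative; vm.max_map_count itself is set node-wide via linuxosconfig.json)

{
  "allowedUnsafeSysctls": [
    "kernel.msg*",
    "net.core.somaxconn"
  ]
}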

Then add node pools as below;

az aks nodepool add --name opensrch1 --cluster-name opensearch-cluster1 --resource-group opensearch-dev --kubelet-config linuxkubeletconfig.json --linux-os-config linuxosconfig.json

Clone the OpenSearch Helm charts from the OpenSearch repo:
https://github.com/opensearch-project/helm-charts.git
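One way to get the chart locally:

git clone https://github.com/opensearch-project/helm-charts.git
cd helm-charts/charts/opensearch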

Inside helm-charts/charts/opensearch you will find values.yaml. Create three new values files, one per component, as below.

master.yaml (top 20 lines)

---
clusterName: "opensearch-cluster"
nodeGroup: "master"
# If discovery.type in the opensearch configuration is set to "single-node",
# this should be set to "true"
# If "true", replicas will be forced to 1
singleNode: false
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: "opensearch-cluster-master"
# OpenSearch roles that will be applied to this nodeGroup
# These will be set as environment variable "node.roles". E.g. node.roles=master,ingest,data,remote_cluster_client
roles:
- master
#- ingest
#- data
#- remote_cluster_client

data.yaml (top 20 lines)

---
clusterName: "opensearch-cluster"
nodeGroup: "data"
# If discovery.type in the opensearch configuration is set to "single-node",
# this should be set to "true"
# If "true", replicas will be forced to 1
singleNode: false
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: "opensearch-cluster-master"
# OpenSearch roles that will be applied to this nodeGroup
# These will be set as environment variable "node.roles". E.g. node.roles=master,ingest,data,remote_cluster_client
roles:
#- master
- ingest
- data
#- remote_cluster_client

client.yaml (top 20 lines)

---
clusterName: "opensearch-cluster"
nodeGroup: "client"
# If discovery.type in the opensearch configuration is set to "single-node",
# this should be set to "true"
# If "true", replicas will be forced to 1
singleNode: false
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: "opensearch-cluster-master"
# OpenSearch roles that will be applied to this nodeGroup
# These will be set as environment variable "node.roles". E.g. node.roles=master,ingest,data,remote_cluster_client
# A coordinating-only (client) node carries no roles, so pass an empty list
# explicitly; leaving "roles:" bare would be parsed as null.
roles: []
#- master
#- ingest
#- data
#- remote_cluster_client

Create the opensearch namespace and install the three components:

kubectl create namespace opensearch
helm repo add opensearch https://opensearch-project.github.io/helm-charts/
helm repo update
helm install opensearch-master opensearch/opensearch -n opensearch -f master.yaml
helm install opensearch-data opensearch/opensearch -n opensearch -f data.yaml
helm install opensearch-client opensearch/opensearch -n opensearch -f client.yaml

Verify the pods are up in the opensearch namespace.
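A quick sanity check, assuming the default chart names:

kubectl get pods -n opensearch
kubectl get svc -n opensearch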

If you have also installed the opensearch-dashboards chart, you can port-forward to the Dashboards pod and reach the UI on http://localhost:5601:

kubectl port-forward <opensearch-dashboard-pod> -n opensearch 5601 &

Conclusion

In conclusion, embracing OpenSearch within Kubernetes environments not only unlocks advanced search and analytics capabilities but also gives organizations enhanced scalability, streamlined management, and cost-effective operations.

I hope this article was helpful and adds value to your journey towards implementing centralized logging. If you liked my article, you can follow my publication for future articles, which gives me the motivation to write more: https://devopsforyou.com/

If we need to increase disk space due to a hike in data ingestion, we can patch the PVC, which will seamlessly expand the volume (provided the underlying StorageClass has allowVolumeExpansion enabled), as below.

kubectl get pvc -n opensearch
kubectl patch pvc -n opensearch <pvc-name> --type=json -p '[{"op": "replace", "path": "/spec/resources/requests/storage", "value": "1000Gi"}]'
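To confirm the resize went through (the PVC name below is a placeholder), watch the PVC status and its events:

kubectl get pvc -n opensearch -w
kubectl describe pvc <pvc-name> -n opensearch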

Next steps will be to create an ingress/load balancer setup for the OpenSearch and Elasticsearch services and to configure Filebeat to push logs to the cluster.


I enjoy exploring various open-source tools, technologies, and ideas related to cloud computing, DevOps, and SRE, and sharing my experience and understanding of these subjects.