Elasticsearch is a widely used search engine, and its other use cases include log analytics, full-text search, security intelligence and business analytics. It’s open source, so you can set it up as a cluster on your own servers. In this article, we will discuss the basics of Elasticsearch and its use cases, and how to set up a three-node Elasticsearch cluster on CentOS servers.
A little bit of history
Shay Banon is the founder of Elasticsearch. The first version of Elasticsearch was released in February 2010. Here are a few words from Wikipedia:
While thinking about the third version of Compass, he realized that it would be necessary to rewrite big parts of Compass to "create a scalable search solution". So he created "a solution built from the ground up to be distributed" and used a common interface, JSON over HTTP, suitable for programming languages other than Java as well. Shay Banon released the first version of Elasticsearch in February 2010.
Since its release in 2010, Elasticsearch has quickly become one of the most popular search engines.
What is Elasticsearch?
Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. You can use it in many areas of your infrastructure: apart from search, it’s a good option for analytics, and it’s a core component of the ELK stack (Elasticsearch, Logstash, Kibana), where the Elasticsearch cluster serves as the data store for logs and metrics.
We won’t discuss these use cases in detail here. Instead, this article explains how to set up and configure a three-node Elasticsearch cluster on CentOS.
Prerequisites
1. Three CentOS servers for the Elasticsearch cluster. A cluster should have at least three master-eligible nodes, so that it can still elect a master when one node fails.
2. If possible, attach a separate disk for data storage.
3. Memory: use at least 2 GB. The more heap available to Elasticsearch, the more memory it can use for its internal caches, but the less memory it leaves for the operating system’s filesystem cache. Refer to the official documentation: Setting the heap size.
4. Don’t expose the Elasticsearch process to the public internet. Use a private network for communication between the nodes, which need to talk to each other to form a cluster.
5. Open ports 9200 (HTTP API) and 9300 (node-to-node transport) on every node for the other nodes in the cluster.
6. Java: install Java on all the servers.
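On CentOS 7, one way to keep ports 9200 and 9300 reachable only from the other cluster nodes is firewalld rich rules. The sketch below only prints the firewall-cmd commands so you can review them before running anything. The peer list reuses the example IPs from step 8.5 and is an assumption to replace with your own private addresses; firewall_rules is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print firewalld commands that allow ports 9200 (HTTP) and 9300 (transport)
# only from the listed peer IPs. Nothing changes until you run the output.
# Assumption: these are the example node IPs from this article; replace them.
PEERS=("10.22.28.112" "10.22.28.113" "10.22.28.114")

firewall_rules() {
  local ip port
  for ip in "$@"; do
    for port in 9200 9300; do
      printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$ip" "$port"
    done
  done
  # Permanent rules only take effect after a reload.
  echo "firewall-cmd --reload"
}

firewall_rules "${PEERS[@]}"
```

Once the printed commands look right, save the script and pipe its output to sudo bash on each node. Restricting by source address keeps the ports closed to everything except the cluster members.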
That’s it. You’re ready to set up the three-node Elasticsearch cluster.
Steps to set up a three-node Elasticsearch cluster on CentOS 7
Step 1: Install Java
As mentioned in the prerequisites, Elasticsearch needs Java, so install Java first. On CentOS, run the following command:
yum install java-1.8.0-openjdk
Run “java -version” and make sure Java is installed correctly.
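If you want to script this check, the sketch below parses the version string printed by java -version. It assumes the usual OpenJDK output format, where Java 8 reports itself as 1.8.0_xxx and later releases as 11.0.x and so on; java_major is a hypothetical helper name. Elasticsearch 6.x needs Java 8 or later.

```shell
#!/usr/bin/env bash
# Extract the Java major version from "java -version" output.
# Java 8 reports "1.8.0_xxx"; Java 9 and later report e.g. "11.0.2".
java_major() {
  local v
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' <<<"$1" | head -n 1)
  [[ $v == 1.* ]] && v=${v#1.}   # "1.8.0_212" -> "8.0_212"
  echo "${v%%[._-]*}"            # keep everything before the first . _ or -
}

# "java -version" writes to stderr, hence 2>&1.
major=$(java_major "$(java -version 2>&1)")
if [[ -n $major ]] && (( major >= 8 )); then
  echo "Java $major found"
else
  echo "Java 8 or later not found" >&2
fi
```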
Step 2: Download the Elasticsearch RPM
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.rpm
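You can download the latest version from the Download Elasticsearch page, which lists all the packages (RPM, DEB, etc.). Note that this article uses Elasticsearch 6.x settings: Elasticsearch 7.0 replaced the discovery.zen.* settings used in Step 8 with discovery.seed_hosts and cluster.initial_master_nodes.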
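Elastic also publishes a SHA-512 checksum next to each package (the download URL with .sha512 appended), so you can verify the RPM before installing it. The helper name verify_sha512 is hypothetical; it simply wraps sha512sum -c, which expects the .sha512 file to sit in the same directory as the package.

```shell
#!/usr/bin/env bash
# Verify FILE against FILE.sha512 (format: "<hash>  <filename>").
# sha512sum -c resolves the filename relative to the current directory,
# so we cd into the package's directory inside a subshell.
verify_sha512() {
  local file="$1"
  ( cd "$(dirname "$file")" && sha512sum -c "$(basename "$file").sha512" )
}

# Example usage (downloads the checksum published alongside the RPM):
#   curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.rpm.sha512
#   verify_sha512 elasticsearch-6.7.2.rpm && echo "checksum OK"
```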
Step 3: Install using RPM
rpm -i elasticsearch-6.7.2.rpm
Step 4: Start / Enable service
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl start elasticsearch.service
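Elasticsearch can take several seconds to start. The sketch below polls the HTTP port until it answers. It assumes curl is installed and, because the node only listens on localhost until the cluster is configured in Step 8, it defaults to http://localhost:9200; wait_for_es is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wait_for_es [URL] [TRIES]
# Return 0 as soon as URL answers an HTTP request, 1 after TRIES attempts.
wait_for_es() {
  local url="${1:-http://localhost:9200}" tries="${2:-30}" i
  for (( i = 1; i <= tries; i++ )); do
    # Any HTTP response counts as "up"; connection errors and timeouts do not.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 2
  done
  return 1
}

# Example usage:
#   wait_for_es && curl -s http://localhost:9200
```

The root endpoint returns the node name, cluster name and version as JSON. If the node never answers, check journalctl -u elasticsearch and the logs under /var/log/elasticsearch/.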
The installation part is done. Once Elasticsearch is installed on all three servers, you can edit the configuration to form a cluster from the three nodes.
The Elasticsearch configuration file is located here: /etc/elasticsearch/elasticsearch.yml
Before changing the configuration, make sure ports 9200 and 9300 are open between the nodes in the cluster, and add firewall rules accordingly. Use telnet or nc to confirm that the nodes can connect to each other.
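If telnet or nc is not installed, bash’s built-in /dev/tcp redirection can do the same check. A port only reports OK when something is listening on it, so remote checks of 9200 and 9300 will fail until Elasticsearch is bound to the private IP in step 8.4; run the script again after that to confirm the firewall. port_open and check_peers are hypothetical helper names, and the script assumes the timeout command from coreutils.

```shell
#!/usr/bin/env bash
# Check TCP reachability of ports 9200 and 9300 using bash's /dev/tcp,
# so neither telnet nor nc has to be installed.
port_open() {
  local host="$1" port="$2"
  # Succeeds only if a TCP connection can be opened within 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

check_peers() {
  local host port failed=0
  for host in "$@"; do
    for port in 9200 9300; do
      if port_open "$host" "$port"; then
        echo "OK      ${host}:${port}"
      else
        echo "FAILED  ${host}:${port}"
        failed=1
      fi
    done
  done
  return "$failed"
}

# Example usage (the article's example node IPs):
#   check_peers 10.22.28.112 10.22.28.113 10.22.28.114
```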
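Step 5: Set the JVM heap size
The JVM heap size is set by the -Xms (initial size) and -Xmx (maximum size) lines in /etc/elasticsearch/jvm.options. If your servers have little memory, lower these values.
Choose values based on the memory available on your servers, for example -Xms512m and -Xmx512m, or -Xms1g and -Xmx1g. Always set both to the same value: once a node binds to a non-loopback address (step 8.4), Elasticsearch runs its bootstrap checks and refuses to start if the initial and maximum heap sizes differ.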
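As a starting point, the official guidance is to give the heap no more than half of the physical RAM and to stay below the roughly 32 GB compressed-oops threshold. The sketch below turns that rule into a matching pair of -Xms/-Xmx values; the 31 GB cap is an assumed safety margin, and heap_flags is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# heap_flags [TOTAL_MB]
# Suggest matching -Xms/-Xmx values: half of physical RAM, capped at 31 GB
# (an assumed margin below the ~32 GB compressed-oops threshold).
heap_flags() {
  local total_mb="${1:-}" kb heap_mb cap_mb=$(( 31 * 1024 ))
  if [[ -z $total_mb ]]; then
    # MemTotal in /proc/meminfo is reported in kB.
    kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
    [[ -n $kb ]] || { echo "cannot read /proc/meminfo" >&2; return 1; }
    total_mb=$(( kb / 1024 ))
  fi
  heap_mb=$(( total_mb / 2 ))
  if (( heap_mb > cap_mb )); then
    heap_mb=$cap_mb
  fi
  printf -- '-Xms%sm -Xmx%sm\n' "$heap_mb" "$heap_mb"
}

heap_flags
```

For example, heap_flags 4096 prints -Xms2048m -Xmx2048m. Put the two values on separate lines in jvm.options in place of the existing -Xms and -Xmx lines.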
Step 6: Create a Data Directory for Elasticsearch (optional)
It’s better to attach a separate disk for Elasticsearch data, but if you have enough space on your primary disk, you can use that instead. Create a new directory and set the appropriate ownership and permissions on it:
mkdir -p /var/lib/elasticsearch/data
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/data
chmod -R 775 /var/lib/elasticsearch/data
Step 7: Set Data Directory
We already created a directory for the Elasticsearch data; now set it as path.data in /etc/elasticsearch/elasticsearch.yml.
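In the 6.x packages, the default elasticsearch.yml typically already contains an uncommented path.data line, and Elasticsearch refuses to start when a key appears twice, so it is safer to replace that line than to append a new one. The hypothetical helper set_es_setting below replaces the first line that sets a key (commented out or not), drops later uncommented duplicates, and appends the key if it is missing. It writes the result back with cat so the file keeps its ownership and permissions.

```shell
#!/usr/bin/env bash
# set_es_setting KEY VALUE FILE  (hypothetical helper)
# - replaces the first line that sets KEY, commented out ("#KEY:") or not
# - drops any later uncommented "KEY:" lines so the key appears only once
# - appends "KEY: VALUE" if the key is not present at all
set_es_setting() {
  local key="$1" value="$2" file="$3" tmp rc
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    { line = $0; sub(/^#[ \t]*/, "", line) }
    !done && index(line, k ":") == 1 { print k ": " v; done = 1; next }
    done && index($0, k ":") == 1 { next }
    { print }
    END { if (!done) print k ": " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

ES_YML=/etc/elasticsearch/elasticsearch.yml
if [[ -w $ES_YML ]]; then
  set_es_setting path.data /var/lib/elasticsearch/data "$ES_YML"
else
  echo "run as root on an Elasticsearch node to edit $ES_YML" >&2
fi
```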
Step 8: Configure Elasticsearch cluster
As mentioned, the cluster settings go in /etc/elasticsearch/elasticsearch.yml. Make the following changes to set up the cluster.
8.1: Stop Elasticsearch, if it’s running.
systemctl stop elasticsearch.service
8.2: On all nodes, set a cluster name:
Open the configuration file on all three servers and set cluster.name to the same value on each. A node only joins a cluster whose name matches its own.
8.3: Set a unique node name (node.name) on each node
8.4: Bind Elasticsearch to the private IP
By default, Elasticsearch binds only to loopback addresses (localhost), so the other nodes cannot reach it. Set network.host to the node’s private IP address.
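If you are not sure which address is the private one, the sketch below picks the first RFC 1918 address (10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16) from the output of hostname -I. first_private_ip is a hypothetical helper, and it assumes your cluster network uses one of those ranges; check the result before putting it into network.host.

```shell
#!/usr/bin/env bash
# Print the first RFC 1918 (private) IPv4 address from a whitespace-separated
# list, such as the output of "hostname -I". Return 1 if there is none.
first_private_ip() {
  local ip
  local re='^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)'
  for ip in $1; do   # unquoted on purpose: split the list into addresses
    if [[ $ip =~ $re ]]; then
      printf '%s\n' "$ip"
      return 0
    fi
  done
  return 1
}

if ip=$(first_private_ip "$(hostname -I 2>/dev/null)"); then
  echo "network.host: $ip"
else
  echo "no private IPv4 address found" >&2
fi
```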
8.5: Configure discovery by listing the IP addresses of all nodes (add this on all nodes)
discovery.zen.ping.unicast.hosts: ["10.22.28.112", "10.22.28.113", "10.22.28.114"]
8.6: Set the minimum number of master-eligible nodes, discovery.zen.minimum_master_nodes (add this on all nodes). With three master-eligible nodes, set it to 2.
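The value comes from the quorum formula: a majority of the master-eligible nodes, floor(N / 2) + 1. Requiring a majority means that after a network split at most one side can elect a master, which prevents a split brain. The small sketch below (quorum is a hypothetical helper) prints the value for a few cluster sizes.

```shell
#!/usr/bin/env bash
# quorum N: majority of N master-eligible nodes, floor(N / 2) + 1.
# Bash integer division already rounds down.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3 4 5; do
  echo "master-eligible nodes: $n -> discovery.zen.minimum_master_nodes: $(quorum "$n")"
done
```

For two nodes the quorum is 2, so losing either node stops master election; three nodes (quorum 2) can survive one failure, which is why three is the practical minimum.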
8.7: Define Data & Master nodes
Set node.master and node.data according to your requirements (both default to true). In this setup, all three nodes are both master-eligible and data nodes.
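Putting steps 8.2 to 8.7 together, the sketch below prints the elasticsearch.yml fragment for one node. The cluster name my-es-cluster and the node names are hypothetical placeholders; the IPs are the example addresses from step 8.5. Run it once per node with that node’s name and private IP.

```shell
#!/usr/bin/env bash
# Print an elasticsearch.yml fragment (Elasticsearch 6.x settings) for one node.
# Hypothetical values: CLUSTER_NAME and the node names are placeholders;
# NODE_IPS are the example addresses used in this article.
CLUSTER_NAME="my-es-cluster"
NODE_IPS=("10.22.28.112" "10.22.28.113" "10.22.28.114")

node_config() {
  local node_name="$1" node_ip="$2" hosts="" ip
  # Build the list for discovery.zen.ping.unicast.hosts: "a", "b", "c"
  for ip in "${NODE_IPS[@]}"; do
    hosts+="${hosts:+, }\"${ip}\""
  done
  # All nodes are master-eligible here, so the quorum is floor(N/2) + 1.
  cat <<EOF
cluster.name: ${CLUSTER_NAME}
node.name: ${node_name}
network.host: ${node_ip}
discovery.zen.ping.unicast.hosts: [${hosts}]
discovery.zen.minimum_master_nodes: $(( ${#NODE_IPS[@]} / 2 + 1 ))
node.master: true
node.data: true
EOF
}

# Example: fragment for the first node.
node_config es-node-1 10.22.28.112
```

Review the output, then append it with node_config es-node-1 10.22.28.112 | sudo tee -a /etc/elasticsearch/elasticsearch.yml, but only if you have not already set these keys by hand, because Elasticsearch refuses to start when a key appears twice.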
8.8: Start Elasticsearch
systemctl start elasticsearch.service
That’s it, your cluster is configured. Now check the cluster health and make sure the cluster is ready for production use.
Query the cluster health API (_cluster/health) on any node and make sure the cluster status is green.
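The cluster health API returns a JSON document whose status field is green, yellow or red. The sketch below queries it and extracts the status with grep and sed so that jq is not required. health_status is a hypothetical helper, and the URL defaults to the first example node from step 8.5 (override it with ES_URL).

```shell
#!/usr/bin/env bash
# Read a _cluster/health JSON response on stdin and print its "status"
# field (green, yellow or red). Plain grep/sed is used so jq is not needed.
health_status() {
  grep -o '"status" *: *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Any node's private IP works; 10.22.28.112 is the article's example node.
ES_URL="${ES_URL:-http://10.22.28.112:9200}"
status=$(curl -s --max-time 5 "${ES_URL}/_cluster/health" | health_status)
echo "cluster status: ${status:-unknown (no response)}"
```

For a human-readable view, curl -s 'http://10.22.28.112:9200/_cluster/health?pretty' shows the full response, where number_of_nodes should be 3, and curl -s 'http://10.22.28.112:9200/_cat/nodes?v' lists the nodes and marks the elected master with an asterisk.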
If the status is green, your cluster is ready to use. I will cover the basic Elasticsearch commands (API calls) in a separate article.