Tuesday, 21 July 2015

Setup Elasticsearch cluster on CentOS


Please note this was done from a developer's perspective. I am not an administrator, and since I am opening ports and exposing ES, a security review MUST be done before production use.
Since this is my first post, I would like to add a small background. I started with Elasticsearch in 2013 and noticed that it is easy to install and maintain, has a variety of plugins, speaks JSON, and detects other nodes on the network to build a cluster. That is what I will show how to set up.

Setup Java

ES requires Java to be installed, and for ES version 1.2 and later they recommend Java 8 update 20 or later, or Java 7 update 55 or later.
Let's download Java first:
wget --no-check-certificate --no-cookies --header 'Cookie: oraclelicense=accept-securebackup-cookie' http://download.oracle.com/otn-pub/java/jdk/8u25-b17/jre-8u25-linux-x64.rpm
Install it and set JAVA_HOME:
rpm -ivh jre-8u25-linux-x64.rpm
export JAVA_HOME=/usr/java/jre1.8.0_25
export PATH=$PATH:$JAVA_HOME/bin
To verify that Java is installed properly, run
java -version
The output should look something like this:
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Now we are done with Java setup. 
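A quick sanity check never hurts here; a minimal sketch that confirms java is reachable before we try to start ES (the init script expects it on the PATH):

```shell
# Sketch: confirm java is reachable on PATH before starting ES.
if command -v java >/dev/null 2>&1; then
  java_status="found at $(command -v java)"
else
  java_status="missing - re-check the JAVA_HOME and PATH exports above"
fi
echo "java ${java_status}"
```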

Elasticsearch Setup

Download the public signing key:
rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch
Then add the ES repository by creating the file /etc/yum.repos.d/elasticsearch.repo with the following content:
[elasticsearch-1.2]
name=Elasticsearch repository for 1.2.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.2/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
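If you prefer to script this step, the file can be written straight from the shell; a sketch (REPO_FILE is a placeholder variable I introduced so the snippet can be tried without root - the real target is /etc/yum.repos.d/elasticsearch.repo):

```shell
# Sketch: write the repo definition to REPO_FILE (placeholder;
# point it at /etc/yum.repos.d/elasticsearch.repo when running as root).
REPO_FILE="${REPO_FILE:-./elasticsearch.repo}"
cat > "$REPO_FILE" <<'EOF'
[elasticsearch-1.2]
name=Elasticsearch repository for 1.2.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.2/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
EOF
echo "wrote $REPO_FILE"
```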
Now we are ready to install
yum install elasticsearch
After installation the service can be started manually, but it will not start automatically after a reboot, which is wrong, so we need to configure auto start using chkconfig:
sudo /sbin/chkconfig --add elasticsearch
sudo service elasticsearch start

Configuration & Run

We are done with the setup, but our service is not started yet, and since the firewall blocks the ES ports by default, other nodes will not be able to reach it. To fix that we need to open the firewall. By default CentOS ships with the firewalld service, which for me, as a .NET developer, is a bit hard to maintain, so I found out how to disable it and switch to iptables.
systemctl stop firewalld 
systemctl mask firewalld
yum install iptables-services
systemctl enable iptables
Now open port 9200 (the HTTP API) and port 9300, which ES uses for node-to-node transport:
iptables -A INPUT -p tcp -m tcp --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 9300 -j ACCEPT
iptables-save | sudo tee /etc/sysconfig/iptables
service iptables restart
Set the maximum amount of memory which ES can use. The recommendation is half of the server's memory, and no more than 32 GB. To set it, open the file /etc/sysconfig/elasticsearch and change
ES_HEAP_SIZE=4g
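The half-of-RAM rule can be computed on the box itself; a minimal sketch (reads /proc/meminfo, so Linux only; capping at 31g keeps the heap safely under the 32 GB limit where the JVM loses compressed object pointers):

```shell
# Sketch: suggest a heap size of half the machine's RAM, capped at 31g.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
half_gb=$(( mem_kb / 1024 / 1024 / 2 ))
if [ "$half_gb" -lt 1 ]; then half_gb=1; fi    # at least 1g on tiny boxes
if [ "$half_gb" -gt 31 ]; then half_gb=31; fi  # stay under the 32 GB compressed-oops limit
echo "ES_HEAP_SIZE=${half_gb}g"
```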
We are almost done. The last step is a recommendation and is not required. To debug and investigate I am using the head plugin; it makes it easy to manage my indexes and documents. I can run any query and see what was returned, so it is up to you whether you want to install it or not. If you decide to install it, run this simple command:
/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head
Please note that this is the absolute path to the plugin tool, and it could be different depending on your install; see the Elasticsearch documentation for more info about paths.
Now we are ready to start!!!
service elasticsearch start
These steps should be repeated for each server you want to add to your cluster.

Testing

I have indexed some data into the supercustomer index: around 1.15 GB of data and 458 documents.
As you can see, we now have one server, NSES1, which contains one index with 5 shards. Let's turn on a second machine and see what happens.
After we start the new server it is automatically detected, and ES is "preparing" shards 0 and 1 to be copied to the new server.
As you can see, the shards have now been copied, so we are going to add one more server to our cluster.
Same here, copying started.
And finished. We now have a cluster of 3 servers, and our shards are split across them. But there is a problem: if any of those servers goes down, we will be in trouble, since not all data will be available. To fix that, let's change the number of replicas.
People usually think that a replica is only a copy of the data, but it is more than a copy: it is a server that can handle additional requests. If your system is under high load, more replicas mean faster responses, since more than one server will be processing your requests.
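The fault-tolerance side of this is simple arithmetic; a small sketch (replicas=1 is just an illustrative value matching this post):

```shell
# Sketch: with number_of_replicas=N, every shard exists in N+1 copies,
# so the cluster can lose up to N nodes and still serve all data.
replicas=1
copies=$(( replicas + 1 ))
echo "replicas=${replicas}: ${copies} copies per shard, survives ${replicas} node failure(s)"
```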
To change the number of replicas, send a settings update to ES, for example (assuming ES runs on localhost and the index is called supercustomer, as above):
curl -XPUT 'http://localhost:9200/supercustomer/_settings' -d '
{
  "index": {
    "number_of_replicas": 1
  }
}'
ES will start copying data across the servers.


Now our cluster is ready: we have 5 shards and one replica, so even if one server goes down, the cluster will still respond to requests.

Volodymyr Bilyachat Web Developer