Cassandra Install on CentOS 6.4 (DataStax Edition)

I have been playing around with the DataStax version of Cassandra, and I have boiled the whole process down to a few quick steps to get a cluster of two or more nodes operational.

After Deploying a New CentOS VM

  1. Add a DNS record in DC01
  2. Configure the network adapter with the correct IP address to get on the network
  3. Open a terminal prompt
  4. vi /etc/hosts and add the IP address and hostname for DNS
  5. vi /etc/sysconfig/network
  6. Edit HOSTNAME to be the name of the box (not the FQDN); see the example below
  7. /etc/init.d/network restart
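As a minimal sketch of steps 4 through 7, assuming a hypothetical node named node0 at 10.0.104.11 (substitute your own hostname and address):

echo "10.0.104.11   node0.example.local   node0" >> /etc/hosts    # hypothetical IP and names
vi /etc/sysconfig/network    # change the HOSTNAME line to: HOSTNAME=node0
/etc/init.d/network restart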

Installing JNA

Java Native Access (JNA) is required for production installations. Installing JNA improves Cassandra's memory usage: with JNA installed and configured, Linux does not swap out the JVM, which avoids the related performance problems.

Install with the following command:

yum update

then

yum install jna
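To confirm the package is in place, you can ask yum directly:

yum list installed jna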

Steps to install Cassandra

Open a new repo file for editing

vi /etc/yum.repos.d/datastax.repo

Go to insert mode (press i) in vi and paste the following into the new repo file (e.g., over your PuTTY SSH session):

[opscenter]
name=DataStax Repository
baseurl=http://rpm.datastax.com/community
enabled=1
gpgcheck=0

Exit vi (:wq), then install the Cassandra community package:

yum install dsc12
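If you would rather not paste into vi, a heredoc writes the same repo file in one step (contents identical to the block above), and the install command stays the same:

cat > /etc/yum.repos.d/datastax.repo <<'EOF'
[opscenter]
name=DataStax Repository
baseurl=http://rpm.datastax.com/community
enabled=1
gpgcheck=0
EOF
yum install dsc12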

Do the above steps on all nodes you want to join to the cluster. Once they are all deployed, do the following.

Configure the Multi-Node Cassandra Cluster

Make sure that cassandra is not running

service cassandra status

Set it to start on reboot

chkconfig cassandra on
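To confirm the on-boot setting took, list the runlevels chkconfig has it enabled for:

chkconfig --list cassandra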

If the service is running (it probably is not, but check anyway), then run

service cassandra stop

Make the following updates to the cassandra.yaml file

vi /etc/cassandra/default.conf/cassandra.yaml

Modify the following property settings in the cassandra.yaml file for each node:

On node 0 (zero)

cluster_name: '[YOURCLUSTERNAME]'

num_tokens: 256

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "[Node Zero IP],[SECOND SEED NODE (not always used)]"

listen_address: [Local IP OF MACHINE]

rpc_address: 0.0.0.0

endpoint_snitch: RackInferringSnitch

On nodes 1 to 5

The properties for these nodes are the same as on node zero except for listen_address, which must be each node's own IP; see the sketch below.
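Since that is the only line that differs, a one-liner can stamp it on each node. A minimal sketch, assuming a hypothetical NODE_IP variable set to that node's own address:

NODE_IP=10.0.104.12    # this node's IP (example value, change per node)
sed -i "s/^listen_address:.*/listen_address: $NODE_IP/" /etc/cassandra/default.conf/cassandra.yaml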

Start the Cassandra Cluster

Start the seed node first, then the other nodes, using the following command

service cassandra start

Check to see if the ring is populated after a few minutes by using the nodetool command

nodetool status

If everything works correctly, you will see something like the following

Datacenter: 10
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  IPADDRESS  41.7 KB   256     49.6%             fef0c114-205d-4750-bd13-9b7d6dc9059e  104
UN  IPADDRESS  65.51 KB  256     50.4%             4d4309f0-d850-4c00-ba2a-c772abd1a3a1  104

UN means the node's status is Up and its state is Normal; here two nodes are sharing the load almost evenly. (With RackInferringSnitch, the data center and rack are inferred from the second and third octets of each node's IP address, which is why the output shows Datacenter: 10 and rack 104.)

If Adding a New Node to an Existing Cluster

If you add a single new node to your n-node cluster without changing any tokens, you will end up with one node owning a much higher percentage of the keys and the other nodes each owning much less. Instead, when you bring the new node online, specify its initial token in cassandra.yaml.
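On the new node, that amounts to one line in cassandra.yaml. A minimal sketch, assuming a single-token (non-vnode) setup and using node 2's token from the example below:

initial_token: 113427455640312814857969558651062452224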

Then use nodetool move to relocate node 1's token, followed by nodetool cleanup to remove any data the nodes no longer own.

For example, in a two node cluster with a new third node you should use the following tokens:

Node 0: 0 (should already be set)

Node 1: 56713727820156407428984779325531226112 (this node will require the move)

Node 2: 113427455640312814857969558651062452224 (this node will be set using initial_token)
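And a sketch of the move itself, using the token values above (nodetool move takes the new token as its argument):

nodetool move 56713727820156407428984779325531226112    # run on node 1
nodetool cleanup    # then run on each node that gave up part of its range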