Adding nodes to an existing Cassandra cluster

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html

Virtual nodes (vnodes) greatly simplify adding nodes to an existing cluster:

  • Calculating tokens and assigning them to each node is no longer required.
  • Rebalancing a cluster is no longer necessary because a node joining the cluster assumes responsibility for an even portion of the data.

For a detailed explanation of how vnodes work, see Virtual nodes.

Note: If you do not use vnodes, see Adding single-token nodes to a cluster.
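This small simulation shows why a joining node takes roughly an even share of the data without any manual rebalancing. It is a Python sketch under stated assumptions. Each node gets 256 tokens chosen uniformly at random on the Murmur3Partitioner range (-2^63 to 2^63 - 1). Ownership counts only primary ranges, so replication is ignored. The helper names add_node and ownership are hypothetical. Cassandra's own token allocator, for example when allocate_tokens_for_local_replication_factor is set, chooses tokens more deliberately than pure random selection.

```python
import random
from collections import defaultdict

# Murmur3Partitioner token range (assumed for this sketch).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64


def add_node(ring, node, num_tokens, rng):
    """Hypothetical helper: give `node` num_tokens random, unused tokens.

    Tokens already on the ring are left untouched, mirroring how existing
    nodes keep their tokens when a new node joins.
    """
    placed = 0
    while placed < num_tokens:
        token = rng.randint(MIN_TOKEN, MAX_TOKEN)
        if token not in ring:
            ring[token] = node
            placed += 1


def ownership(ring):
    """Fraction of the ring each node owns (primary ranges only, no replication).

    Each token owns the range (previous token, token], wrapping around the ring.
    """
    tokens = sorted(ring)
    if len(tokens) == 1:
        return {ring[tokens[0]]: 1.0}
    owned = defaultdict(int)
    for i, token in enumerate(tokens):
        previous = tokens[i - 1]  # for i == 0 this is the last token, closing the ring
        owned[ring[token]] += (token - previous) % RING_SIZE
    return {node: width / RING_SIZE for node, width in owned.items()}


rng = random.Random(42)
ring = {}
for name in ["node1", "node2", "node3", "node4"]:
    add_node(ring, name, num_tokens=256, rng=rng)
before = ownership(ring)

# A fifth node joins; no existing token moves.
add_node(ring, "node5", num_tokens=256, rng=rng)
after = ownership(ring)

for name in sorted(after):
    print(f"{name}: {before.get(name, 0.0):.3f} -> {after[name]:.3f}")
```

Under these assumptions, each of the five nodes ends up owning close to 20% of the ring. The four original nodes each give up small slices to the new node, and none of their tokens has to be reassigned.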

Procedure

Be sure to use the same version of Cassandra on all nodes in the cluster. See Installing earlier releases of Apache Cassandra 3.0.

  1. Install Cassandra on the new nodes, but do not start Cassandra. If your Cassandra installation on Debian starts automatically, you must stop the node and clear the data.
  2. Depending on the snitch used in the cluster, set the properties in either the cassandra-topology.properties file (used by PropertyFileSnitch) or the cassandra-rackdc.properties file (used by GossipingPropertyFileSnitch, among others).
  3. Set the following properties in the cassandra.yaml file:
    1. auto_bootstrap: If this option has been set to false, you must set it to true. This option is not listed in the default cassandra.yaml configuration file and defaults to true.
    2. cluster_name: The name of the cluster the new node is joining.
    3. listen_address/broadcast_address: Can usually be left blank. Otherwise, use the IP address or host name that other Cassandra nodes use to connect to the new node.
    4. endpoint_snitch: The snitch Cassandra uses for locating nodes and routing requests.
    5. num_tokens: The number of vnodes to assign to the node. Normally, use the same number of tokens as the other nodes in the datacenter. Token ranges are distributed in proportion to the number of tokens, so if hardware capabilities vary, assign more tokens to the systems with higher capacity and better performance.
    6. allocate_tokens_for_local_replication_factor: Specify the replication factor (RF) of the keyspaces in the datacenter. If you plan to increase the RF after adding the node, use the (new) higher value. When the RF varies between keyspaces and you are adding a single node, use the highest RF value. When adding multiple nodes, alternate between the RFs of the most data-intensive keyspaces. For example, in a datacenter with keyspace cycling at RF=3, keyspace hockey at RF=2, and keyspace basketball at RF=1, set this option to 3 on the first node you add and to 2 on the second node.
    7. seed_provider: Make sure that the new node lists at least one node in the existing cluster. The seeds list determines which nodes the new node contacts to learn about the cluster and establish the gossip process. For details, see Internode communications (gossip).
       Note: Seed nodes cannot bootstrap. Make sure the new node is not listed in the seeds list. Do not make all nodes seed nodes.
    8. Other non-default settings: Apply any other non-default settings from your existing cluster's cassandra.yaml and cassandra-topology.properties or cassandra-rackdc.properties files to the new node. Use the diff command to find and merge any differences between existing and new nodes.
  4. Start Cassandra on the new node so that it bootstraps into the cluster.
  5. Use nodetool status to verify that the new node has finished bootstrapping and that all nodes are Up and Normal (UN), not in any other state such as UJ (Up/Joining) or DN (Down/Normal).
  6. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running nodetool cleanup on the next node. Cleanup can safely be postponed to low-usage hours.
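The cassandra.yaml rules in step 3 can be checked before the new node is started. The Python sketch below is a hypothetical pre-flight check. It works on plain dictionaries shaped like a parsed cassandra.yaml, so you would need to load the files with a YAML parser of your choice. The function names, the addresses, and the assumption that seeds use the default SimpleSeedProvider layout are all illustrative.

```python
def seeds_from_config(config):
    """Collect seed addresses from a parsed cassandra.yaml.

    Assumes the SimpleSeedProvider layout:
        seed_provider:
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
              - seeds: "10.0.0.1,10.0.0.2"
    """
    seeds = []
    for provider in config.get("seed_provider") or []:
        for params in provider.get("parameters") or []:
            raw = params.get("seeds") or ""
            seeds.extend(s.strip() for s in raw.split(",") if s.strip())
    return seeds


def preflight_issues(new_cfg, existing_cfg, new_address, existing_addresses):
    """Hypothetical pre-start check of a new node's settings; returns a list of issues."""
    issues = []
    # auto_bootstrap is absent from the default file and defaults to true.
    if new_cfg.get("auto_bootstrap", True) is not True:
        issues.append("auto_bootstrap must be true (or left unset)")
    for key in ("cluster_name", "endpoint_snitch"):
        if new_cfg.get(key) != existing_cfg.get(key):
            issues.append(f"{key} differs from the existing nodes")
    if new_cfg.get("num_tokens") != existing_cfg.get("num_tokens"):
        # Not always wrong: larger machines may deliberately get more tokens.
        issues.append("num_tokens differs from the existing nodes (intentional?)")
    seeds = seeds_from_config(new_cfg)
    if not any(seed in existing_addresses for seed in seeds):
        issues.append("seeds must include at least one existing node")
    if new_address in seeds:
        issues.append("new node is in its own seeds list; seed nodes cannot bootstrap")
    return issues


# Illustrative settings and addresses.
existing = {
    "cluster_name": "Test Cluster",
    "endpoint_snitch": "GossipingPropertyFileSnitch",
    "num_tokens": 256,
}
new = dict(existing, seed_provider=[{
    "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
    "parameters": [{"seeds": "10.0.0.1,10.0.0.2"}],
}])
print(preflight_issues(new, existing, "10.0.0.5", {"10.0.0.1", "10.0.0.2", "10.0.0.3"}))  # prints []
```

The num_tokens check reports a difference without treating it as an error, because the text allows more tokens on more capable hardware.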
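The allocate_tokens_for_local_replication_factor guidance in step 3 can be read as a simple rule. A single node uses the highest RF. Several nodes alternate between the RFs of the keyspaces that hold the most data. The Python sketch below encodes that reading. The function name and the idea of passing RFs ordered by data volume are assumptions for illustration. They are not part of Cassandra.

```python
from itertools import cycle, islice


def allocation_rfs(num_new_nodes, rfs_by_data_volume):
    """Hypothetical helper: pick allocate_tokens_for_local_replication_factor per new node.

    rfs_by_data_volume lists the (planned) RFs of the keyspaces in this
    datacenter, ordered from the keyspace holding the most data to the least;
    pass only the keyspaces you consider data-intensive.
    """
    if num_new_nodes < 1 or not rfs_by_data_volume:
        raise ValueError("need at least one new node and one keyspace RF")
    if num_new_nodes == 1:
        # A single node uses the highest RF.
        return [max(rfs_by_data_volume)]
    # Several nodes alternate between the distinct RFs, keeping the data-volume order.
    distinct = list(dict.fromkeys(rfs_by_data_volume))
    return list(islice(cycle(distinct), num_new_nodes))


# Example from the text: cycling (RF=3), hockey (RF=2), basketball (RF=1).
print(allocation_rfs(2, [3, 2, 1]))  # prints [3, 2]
```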
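For the "Other non-default settings" item in step 3, the text recommends the diff command. As a stand-in, the Python sketch below uses the standard library's difflib to compare two configuration files after dropping comments and blank lines. The function names and the file contents are illustrative. Some per-node settings, such as listen_address, are expected to differ.

```python
import difflib


def significant_lines(text):
    """Return the non-blank, non-comment lines of a config file."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def config_diff(existing_text, new_text, existing_name="existing", new_name="new"):
    """Unified diff of the active settings in two config files (empty if they match)."""
    return list(
        difflib.unified_diff(
            significant_lines(existing_text),
            significant_lines(new_text),
            fromfile=existing_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Illustrative file contents.
existing_yaml = """cluster_name: 'Test Cluster'
# Tuned for this hardware
concurrent_reads: 64
num_tokens: 256
"""
new_yaml = """cluster_name: 'Test Cluster'
concurrent_reads: 32
num_tokens: 256
"""
for line in config_diff(existing_yaml, new_yaml, "node1/cassandra.yaml", "node5/cassandra.yaml"):
    print(line)
```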
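The check in step 5 can be scripted by parsing the output of nodetool status. The Python sketch below assumes the usual output layout. In that layout, each node line begins with a two-letter code: U or D for status, then N, L, J, or M for state, followed by the node's address. The function names are hypothetical.

```python
import re
import subprocess

# Node lines start with a status letter (Up/Down) and a state letter
# (Normal/Leaving/Joining/Moving), e.g. "UN  10.0.0.1  ...".
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def run_nodetool_status():
    """Return the text printed by `nodetool status` on this machine."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def nodes_not_up_normal(status_output):
    """List (address, "Status/State") for every node that is not UN."""
    not_ready = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            status, state, address = match.groups()
            if status + state != "UN":
                not_ready.append((address, f"{STATUS[status]}/{STATE[state]}"))
    return not_ready
```

On the new node, you could poll nodes_not_up_normal(run_nodetool_status()) until it returns an empty list. A node that is still bootstrapping shows up as UJ, which this helper reports as "Up/Joining".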
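Step 6 runs cleanup one node at a time, and that ordering is easy to express as a loop. The Python sketch below assumes that nodetool can reach each node remotely with its -h option, which requires remote JMX access. If it cannot, run the same command locally on each node, for example over SSH. The function name is hypothetical.

```python
import subprocess


def cleanup_sequentially(existing_hosts, run=subprocess.run):
    """Run `nodetool cleanup` on each previously existing node, one node at a time.

    Each call blocks until nodetool exits, which for cleanup is when the
    operation has finished on that node, so the next node starts only after
    the previous one is done. check=True stops the loop if a cleanup fails.
    """
    for host in existing_hosts:
        print(f"Running nodetool cleanup on {host}")
        run(["nodetool", "-h", host, "cleanup"], check=True)
```

Pass only the nodes that were in the cluster before the expansion, for example cleanup_sequentially(["10.0.0.1", "10.0.0.2", "10.0.0.3"]) with illustrative addresses. The run parameter exists so the loop can be exercised without a live cluster.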
