Setting Up a ScyllaDB Cluster on AWS Using Terraform

In this article, I present an example of a simple and quick installation of ScyllaDB in the AWS cloud using Terraform.

Initially, I intended to build a ScyllaDB AMI with HashiCorp Packer. However, I later discovered that official images are available, allowing ScyllaDB to be easily configured during instance initialization via user data.

In fact, user data can define all parameters supported in scylla.yaml. Additional options and examples can be found in the scylla-machine-image GitHub repository.
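
For instance, a user-data document along these lines would name the cluster and enable password authentication. This is only a sketch based on the scylla-machine-image format shown later in this article; `authenticator` is just one example of a scylla.yaml option, and the repository is the authoritative reference for the schema:

```yaml
scylla_yaml:
  cluster_name: test-cluster
  # Any other scylla.yaml option can go here, for example:
  authenticator: PasswordAuthenticator
start_scylla_on_first_boot: true
```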

What else should you know? For ScyllaDB to automatically configure and start, supported instance types must be used. A list of such instance types can be found here: ScyllaDB System Requirements for AWS. In our example, we’ll use the i4i.large type as it is the cheapest among supported types.

Assumptions

  • A single seed node is sufficient for the setup.
  • Hosts are publicly accessible, with access restricted to a single source IP address (so you need a static public IP).

Terraform Configuration Example

Following best practices, the Terraform code is divided into multiple files in a single directory.
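
The resulting layout looks like this (the directory name is arbitrary; terraform.tfvars is an optional overrides file discussed below):

```text
scylladb-cluster/
├── variables.tf       # input variable declarations
├── main.tf            # provider, AMI lookup, security group, instances
├── outputs.tf         # public IPs printed after apply
└── terraform.tfvars   # optional: your overrides for the variables
```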

Variables File (variables.tf)

Plain Text

variable "scylladb_version" {
  type        = string
  default     = "6.2.1"
  description = "The version of ScyllaDB to install."
}

variable "your_public_network" {
  type        = string
  default     = "0.0.0.0/0"
  description = "Your public static IP address or your provider network."
}

variable "instance_type" {
  type        = string
  default     = "i4i.large"
  description = "The AWS instance type."
}

variable "number_of_regular_hosts" {
  type        = number
  default     = 2
  description = "The number of regular (non-seed) hosts in the cluster."
}

variable "ssh_key_name" {
  type        = string
  default     = "my_ssh_key"
  description = "The name of your public SSH key uploaded to AWS."
}

This file defines the variables used by the code in main.tf. We’ll discuss them later.

Main Configuration File (main.tf)

Plain Text

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "eu-west-1"
}

data "aws_ami" "scylladb_ami" {
  filter {
    name = "name"
    values = ["ScyllaDB ${var.scylladb_version}"]
  }
}

resource "aws_security_group" "scylladb_all" {
  name        = "scylladb_all"
  description = "Will allow all inbound traffic from your public IP"

  tags = {
    Name = "ScyllaDB"
  }
}

resource "aws_vpc_security_group_ingress_rule" "allow_all_inbound_traffic_ipv4" {
  security_group_id = aws_security_group.scylladb_all.id
  cidr_ipv4         = var.your_public_network
  ip_protocol       = "-1" # semantically equivalent to all ports
}

resource "aws_vpc_security_group_ingress_rule" "allow_all_internal_traffic_ipv4" {
  security_group_id            = aws_security_group.scylladb_all.id
  referenced_security_group_id = aws_security_group.scylladb_all.id
  ip_protocol                  = "-1" # semantically equivalent to all ports
}

resource "aws_vpc_security_group_egress_rule" "allow_all_traffic_ipv4" {
  security_group_id = aws_security_group.scylladb_all.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1" # semantically equivalent to all ports
}

resource "aws_instance" "scylladb_seed" {
  ami           = data.aws_ami.scylladb_ami.id
  instance_type = var.instance_type
  vpc_security_group_ids = [aws_security_group.scylladb_all.id]
  key_name      = var.ssh_key_name

  user_data = <<EOF
scylla_yaml:
  cluster_name: test-cluster
  experimental: true
start_scylla_on_first_boot: true
EOF

  tags = {
    Name = "ScyllaDB seed"
  }
}

resource "aws_instance" "scylladb_host" {
  ami           = data.aws_ami.scylladb_ami.id
  instance_type = var.instance_type
  vpc_security_group_ids = [aws_security_group.scylladb_all.id]
  key_name      = var.ssh_key_name

  user_data = <<EOF
scylla_yaml:
  cluster_name: test-cluster
  experimental: true
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: ${aws_instance.scylladb_seed.private_ip}
start_scylla_on_first_boot: true
EOF

  tags = {
    Name = "ScyllaDB host"
  }

  count = var.number_of_regular_hosts
}

The main.tf file describes the infrastructure resources to be created.

File Describing Outputs (outputs.tf)

Plain Text

output "scylladb_seed_public_ip" {
  value       = aws_instance.scylladb_seed.public_ip
  description = "Public IP address of the ScyllaDB seed host."
}

output "scylladb_host_public_ip" {
  value       = aws_instance.scylladb_host[*].public_ip
  description = "Public IP addresses of ScyllaDB regular hosts."
}

This file specifies the data to be output at the end. In our case, we want to know the IP addresses of the hosts so we can connect to them.
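
If you also want the seed host’s private address (the value the regular hosts use as their seed), a hypothetical extra output could look like this; it is not part of the original files:

```hcl
output "scylladb_seed_private_ip" {
  value       = aws_instance.scylladb_seed.private_ip
  description = "Private IP address of the ScyllaDB seed host."
}
```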

You can also find this code on GitHub: ScyllaDB Terraform Example.

How to Use This Terraform Configuration File

First, you need to install Terraform and AWS CLI.

Terraform installation differs across operating systems. Details can be found in the official documentation: Terraform Installation Guide.

AWS CLI version 1 is a Python package that can be installed via pip in the same way on any operating system where Python is available (version 2 ships as platform-specific installers instead). Detailed instructions are available in the official documentation: AWS CLI on PyPI.

The next step is to set up security credentials for AWS CLI. Security credentials can be created using the IAM service in AWS. We assume that you already have them.

To enable AWS CLI and, consequently, the AWS provider for Terraform to use your credentials, you need to configure them using the following command:

Shell

aws configure

There are other ways to pass credentials to Terraform. More details can be found here: AWS Provider Authentication.
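
One common alternative is the standard AWS environment variables, which both the AWS CLI and the Terraform AWS provider read. A minimal sketch; the key values below are the placeholder credentials from the AWS documentation, not real ones:

```shell
# Placeholder credentials from the AWS docs -- substitute the real
# values generated for your user in the IAM console.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRkiCzEXAMPLEKEY"
# Region fallback; note that the provider block in main.tf already
# pins the region to eu-west-1 explicitly.
export AWS_DEFAULT_REGION="eu-west-1"
```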

Understanding the Variables

Here’s a breakdown of all the variables:

  • scylladb_version: The version of ScyllaDB, used in the image name to search for the AMI.
  • your_public_network: The external IP address from which access to hosts will be allowed. It should be in CIDR format (e.g., /32 for a single address).
  • instance_type: The type of AWS instance. You must use one of the recommended types mentioned above.
  • number_of_regular_hosts: The number of hosts in the cluster, excluding the seed host.
  • ssh_key_name: The name of the preloaded public SSH key that will be added to the hosts.

Although variables can be overridden directly in the variables.tf file, it’s better to use a separate file for this purpose. This can be any file with a .tfvars extension, such as terraform.tfvars, located in the same directory as the Terraform configuration file.

In such a file, variables are written in the format <NAME> = <VALUE>. For example:

Plain Text

ssh_key_name = "KEYNAME"
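
A fuller terraform.tfvars might look like this. All values are illustrative; 203.0.113.0/24 is a documentation-only network, so substitute your own address:

```hcl
ssh_key_name            = "my_ssh_key"
your_public_network     = "203.0.113.10/32" # your static public IP in CIDR notation
number_of_regular_hosts = 2
scylladb_version        = "6.2.1"
```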

How to Apply a Terraform Configuration

To create the cluster, navigate to the directory containing the code and run the following commands:

Initialize the AWS provider:

Shell

terraform init

Example output:

Plain Text

Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.82.2...
- Installed hashicorp/aws v5.82.2 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

Apply the configuration:

Shell

terraform apply

The command output shows the execution plan: which values come directly from your configuration, and which will only be known once the changes are applied. Confirm the run by typing yes.

After Terraform completes its work, it will output the public IP addresses of the hosts, which you can use to connect to ScyllaDB.

Verifying Cluster Deployment

To verify that the ScyllaDB cluster was successfully deployed, connect to it via SSH using the following command:

Shell

ssh scyllaadm@<ip-address>

Once connected, you’ll immediately see the list of hosts in the cluster. Alternatively, you can run the following command:

Shell

nodetool status

Example output:

Plain Text

Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address       Load      Tokens Owns Host ID                              Rack
UN 172.31.39.205 489.02 KB 256    ?    ac814131-bac5-488b-b7f8-b7201a8dbb23 1b  
UN 172.31.42.145 466.77 KB 256    ?    0bd8a16f-26d3-4665-878c-74b992b91a70 1b  
UN 172.31.46.42  526.42 KB 256    ?    3eb8966e-b42b-48c3-9938-7f24b1a6b097 1b  

All hosts must have UN (Up Normal) in the first column.

Adding Hosts to the Cluster

ScyllaDB makes it easy to add hosts to a cluster (removing them requires decommissioning nodes, which this setup does not automate). Terraform, in turn, saves the state of the previous run and remembers, for example, the IP address of the seed host. Therefore, you can simply increase the number of hosts in the variable and run Terraform again. The new host will automatically join the cluster.

Add the following line to your variables file:

Plain Text

number_of_regular_hosts = 3

In this example, it adds one more host to the cluster, but you can set the variable to any number greater than the current value.

Run terraform apply again. Then, log in to the seed host and verify that the list of hosts has increased.

Managing Multiple Clusters

You can deploy multiple clusters using a single Terraform configuration by using workspaces.

Create a new workspace:

Shell

terraform workspace new cluster_2

cluster_2 is just an example name for a workspace. It can be anything.

Deploy the new cluster:

Shell

terraform apply

The original cluster will remain in the workspace named default.

List workspaces:

Shell

terraform workspace list

Switch between workspaces:

Shell

terraform workspace select default

Delete a workspace:

Shell

terraform workspace delete cluster_2

Destroying the Cluster

To delete a ScyllaDB cluster and all associated entities, use the following command in the desired workspace:

Shell

terraform destroy

This will clean up all the resources created by Terraform for that cluster.

Conclusion

With this guide, you can confidently set up, manage, and expand ScyllaDB clusters on AWS using Terraform. The step-by-step instructions provided ensure a seamless deployment experience, allowing you to focus on your application’s performance and scalability. 

Additionally, the flexibility of Terraform empowers you to easily adapt and scale your cluster as needed, whether it’s adding new hosts or managing multiple clusters with workspaces. For further details and advanced configurations, consult the official documentation for Terraform and ScyllaDB, which offer a wealth of resources to help you maximize the potential of your infrastructure.

Source:
https://dzone.com/articles/setting-up-a-scylladb-cluster-on-aws-using-terraform