How An EC2 Instance Became My Go-To Remote Development Solution Using Terraform
The Story
Strict network security policies, enforced by tools like ZScaler, caused endless installation failures and security warnings for Python dependencies on my local development machine (work laptop). The constant need to reconfigure settings and whitelist packages to bypass these restrictions not only caused disruption but also posed potential risks to my work laptop's security. Faced with these challenges, and with the alternative being to carry two laptops, I was left with a bitter taste in my mouth. I initially considered an alternative solution: working in a development container. However, even with containers, the security warnings persisted. Finally, I decided to shift my development work to a remote environment using an EC2 instance, which offered a more stable and secure setup without constant security warnings.
A remote development environment, such as an EC2 instance, offers several advantages.
- First, it provides enhanced security and isolation. By separating development activities from your local machine, you reduce the risk of security breaches and maintain a controlled environment for managing dependencies.
- Also, remote environments allow for precise resource management. You can allocate resources based on your project's specific needs, which is particularly beneficial for resource-intensive tasks that might otherwise strain your local hardware, such as running multiple Docker containers (Airflow, PySpark, etc.).
- Lastly, remote environments ensure consistency across different workstations, since the infrastructure is defined in code. This uniformity helps prevent the common "it works on my machine" issue, making it easier for teams to collaborate effectively.
In this post, I'll share how I set up an EC2 instance as a remote development environment using Terraform. This setup has been a key part of the end-to-end Data Engineering project I have been working on and of my ongoing learning journey.
Special thanks to Ayanda Shiba and Theo Mamoswa, whom I had the pleasure of mentoring. Their collaboration and insights were invaluable throughout this process, and their contributions were instrumental in refining the setup and shaping the final solution detailed below.
The How
Prerequisite
- AWS account: Account needs to have the appropriate permissions to create and manage resources.
- AWS CLI: The AWS command-line interface tool allows us to interact with AWS services.
- Terraform: This infrastructure-as-code tool helps define and manage the cloud resources.
- An SSH Client
This post assumes familiarity with these tools and their installation.
The Walk-through
Architectural Diagram
To better understand the setup, refer to the architectural diagram below:
The diagram above illustrates an AWS setup for a remote development environment consisting of an EC2 instance with automatic cost optimization. The instance is located in a public subnet and is periodically monitored by a Lambda function triggered by CloudWatch Events. If the instance has had no recent SSH activity within the interval defined during provisioning, the Lambda function sends a stop-instance command via SSM.
Setting Up Your EC2 Instance with Terraform
Terraform manages our infrastructure on AWS, with all configuration defined in code. This approach ensures a clean, consistent, and repeatable provisioning process, with dependencies between resources handled by Terraform itself.
Directory Structure
The directory structure below gives a clear view of how the Terraform configuration is organized:
$ tree -L 4
.
├── elastic_ip.tf
├── iam_role.tf
├── instance.tf
├── internet_gateway.tf
├── lambda.tf
├── lambda_function
│ └── lambda_function.py
├── lambda_function_payload.zip
├── outputs.tf
├── provider.tf
├── routing_table.tf
├── scripts
│ └── ec2-manager.py
├── security_groups.tf
├── ssh_key_pairs.tf
├── subnets.tf
├── terraform.tfvars
├── userdata.sh.tpl
├── variables.tf
└── vpcs.tf
File Breakdown
Here’s a breakdown of the Terraform configuration files illustrated above.
Providers:
- provider.tf: Configures the provider (e.g., AWS, Azure, or GCP) and its settings. See https://developer.hashicorp.com/terraform/language/providers
$ cat provider.tf
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # https://registry.terraform.io/providers/hashicorp/aws/5.37.0
      version = "~> 5.37.0"
    }
  }
  required_version = ">= 1.2.0"
}

provider "aws" {
  region  = var.region
  profile = var.profile
}
The AWS provider is pinned to version ~> 5.37.0, and the AWS region and profile are derived from the variables var.region and var.profile.
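Once terraform init has been run (covered in the deployment section below), you can confirm how this version constraint resolves; terraform providers is a built-in command that lists the providers the configuration requires:

$ terraform providers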
Network Configurations:
- vpcs.tf: Defines Virtual Private Cloud (VPC) settings.
- subnets.tf: Configures subnets within the VPC.
- routing_table.tf: Sets up routing tables for network traffic.
- internet_gateway.tf: Configures the Internet Gateway for VPC connectivity.
- elastic_ip.tf: Manages Elastic IP addresses for public accessibility.
$ cat vpcs.tf subnets.tf routing_table.tf internet_gateway.tf elastic_ip.tf
# ------------ VPC ------------
resource "aws_vpc" "dev_instance_vpc" {
  cidr_block           = var.network_cidr
  enable_dns_hostnames = true
  tags = {
    Name = "${var.tag_name} VPC"
  }
}

# ------------ Subnets ------------
# Subnets with routes to the internet
resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.dev_instance_vpc.id
  cidr_block        = var.public_subnet_cidr_block
  availability_zone = var.availability_zone
  # cidr_block = cidrsubnet(aws_vpc.dev_instance_vpc.cidr_block, 4, 2)
  # Double check this by going through the docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/subnet
  tags = {
    Name = "${var.tag_name}: Public Subnet"
  }
}

# ------------ Routing Table ------------
# Route table with a route to the internet
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.dev_instance_vpc.id
  tags = {
    Name = "Public Subnet Route Table"
  }
}

resource "aws_route" "public_internet_gateway" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.dev_instance_igw.id
}

# Associate public route table with the public subnets
resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# ------------ Internet Gateway ------------
# Internet gateway to reach the internet
resource "aws_internet_gateway" "dev_instance_igw" {
  vpc_id = aws_vpc.dev_instance_vpc.id
}

# ------------ Elastic IP ------------
resource "aws_eip" "dev_instance_eip" {
  instance   = aws_instance.Remote_Dev_Instance.id
  domain     = "vpc"
  depends_on = [aws_internet_gateway.dev_instance_igw, aws_instance.Remote_Dev_Instance]
  tags = {
    Name = "${var.tag_name}: EIP"
  }
}

resource "aws_eip_association" "eip_assoc" {
  instance_id   = aws_instance.Remote_Dev_Instance.id
  allocation_id = aws_eip.dev_instance_eip.id
}
The config above sets up a VPC with DNS hostnames enabled, creating the base network layer. A public subnet is created within this VPC. We then configure the route table and a route to the internet via an Internet Gateway, ensuring that instances in the public subnet can reach the internet. Finally, we provision a static public (Elastic) IP address and associate it with the EC2 instance for consistent public accessibility.
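After terraform apply (covered in the deployment section below), a quick way to sanity-check this network layer is with the AWS CLI. The tag values below assume the default var.tag_name of "Remote Dev Instance"; adjust them if you changed the variable:

$ aws ec2 describe-vpcs --filters "Name=tag:Name,Values=Remote Dev Instance VPC"
$ aws ec2 describe-subnets --filters "Name=tag:Name,Values=Remote Dev Instance: Public Subnet"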
Security and Access:
- security_groups.tf: Defines security groups for controlling inbound and outbound traffic.

$ cat security_groups.tf
# ------------ Security Groups ------------
# Create a security group for our instance
resource "aws_security_group" "dev_instance_security_group" {
  name   = var.dev_instance_security_group
  vpc_id = aws_vpc.dev_instance_vpc.id

  # Incoming traffic
  ingress {
    description = "Allow SSH access to the instance"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    # cidr_blocks = [var.network_cidr]
    # This is not secure for production environments.
    # Use a bastion host with more restrictions
    cidr_blocks = ["0.0.0.0/0"] # Allow traffic from any IP address
  }

  ingress {
    description = "Allow HTTP access to the instance"
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Allow traffic from any IP address
  }

  # Outgoing traffic
  egress {
    from_port   = 0
    protocol    = "-1"
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"] # Allow all outbound traffic
  }

  tags = {
    Name = "allow network traffic"
  }
}
The security_groups.tf file above defines the security groups that control inbound and outbound traffic for our EC2 instance. We create a security group with rules that allow SSH access on port 22 and HTTP access on port 8080 (for the Airflow UI) from any IP address. Note: This open access is primarily for project development, where SSH security concerns are minimal since access is managed through key pairs. For future implementations, we recommend restricting SSH access to specific IP addresses (as sketched below) or using a bastion host to enhance security.
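As a minimal hardening sketch, you can look up your current public IP using AWS's checkip service and use it to narrow the SSH ingress rule, e.g. cidr_blocks = ["203.0.113.10/32"] (an illustrative address) instead of 0.0.0.0/0:

$ curl -s https://checkip.amazonaws.com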
- ssh_key_pairs.tf: Manages SSH key pairs for secure access to EC2 instances.

$ cat ssh_key_pairs.tf
# ------------ Key-Pair ------------
resource "tls_private_key" "dev_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "generated_key" {
  key_name   = var.generated_key_name
  public_key = tls_private_key.dev_key.public_key_openssh
  tags = {
    Name = "${var.tag_name}: Key Pairs"
  }
}

# Not recommended
resource "local_file" "ssh_key" {
  filename        = "${aws_key_pair.generated_key.key_name}.pem"
  content         = tls_private_key.dev_key.private_key_pem
  file_permission = "0400"
}
The ssh_key_pairs.tf file above manages the creation of SSH key pairs for secure access to the EC2 instance. We generate a new RSA key pair and use it to create an AWS key pair resource; the private key is saved locally for easy access (a quick validity check is shown below). Note: For future implementations, it is crucial to manage private keys securely, following best practices (MFA, key rotation, and secure storage).
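As a quick sanity check after terraform apply, ssh-keygen can re-derive the public key from the generated .pem file; the filename below assumes the default var.generated_key_name. The command fails if the key file or its permissions are broken:

$ ssh-keygen -y -f remote-dev-instance-tf-key-pair.pem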
- iam_role.tf: Configures IAM roles and policies for permissions and access control.

$ cat iam_role.tf
# ------------ IAM ------------
# IAM Role for EC2 with SSM SendCommand permission
resource "aws_iam_role" "ec2_ssm_role" {
  name = "ec2-ssm-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# IAM Role Policy for SSM SendCommand
resource "aws_iam_role_policy" "ssm_send_command_policy" {
  name       = "ssm-send-command-policy"
  role       = aws_iam_role.ec2_ssm_role.id
  depends_on = [aws_vpc.dev_instance_vpc]
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssmmessages:SendCommand"
      ],
      "Resource": [
        "arn:aws:ssm:${var.region}:${aws_vpc.dev_instance_vpc.owner_id}:instance/*"
      ]
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "ssm_send_command_policy" {
  role       = aws_iam_role.ec2_ssm_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Create IAM Instance Profile
resource "aws_iam_instance_profile" "ec2_ssm_profile" {
  name = "ec2-ssm-instance-profile"
  role = aws_iam_role.ec2_ssm_role.name
}

# ---------------------------------------------------------------
# IAM Role for Lambda
resource "aws_iam_role" "lambda_role" {
  name = "lambda-role-ssm"
  path = "/service-role/"
  assume_role_policy = jsonencode({
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      },
    ]
    Version = "2012-10-17"
  })
}

# ---------------------------------------------------------------
# IAM Role for Lambda logging
resource "aws_iam_policy" "lambda_logging" {
  name        = "lambda_logging"
  path        = "/"
  description = "IAM policy for logging from a lambda"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:logs:*:*:*"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = aws_iam_policy.lambda_logging.arn
}

resource "aws_cloudwatch_log_group" "lambda_log_group" {
  name              = "/aws/lambda/${var.lambda_function_name}"
  retention_in_days = 3
}

# IAM additional policies for Lambda
data "aws_iam_policy" "ssm_access" {
  arn = "arn:aws:iam::aws:policy/AmazonSSMFullAccess"
}

data "aws_iam_policy" "ec2_readonly_access" {
  arn = "arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess"
}

data "aws_iam_policy" "lambda_execution_acces" {
  arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_iam_role_policy_attachment" "ssm_access_role_attachment" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = data.aws_iam_policy.ssm_access.arn
}

resource "aws_iam_role_policy_attachment" "ec2_readonly_access_role_attachment" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = data.aws_iam_policy.ec2_readonly_access.arn
}

resource "aws_iam_role_policy_attachment" "lambda_execution_acces_role_attachment" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = data.aws_iam_policy.lambda_execution_acces.arn
}
The iam_role.tf file above configures the IAM roles and policies required for permissions and access control. We set up IAM roles for our EC2 instance and Lambda function, attaching the policies needed to grant permissions for SSM (AWS Systems Manager) commands and for logging (via CloudWatch). The EC2 role is assigned permissions to interact with SSM, while the Lambda role has permissions for logging and for accessing EC2 and SSM resources.
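After apply, the AWS CLI can confirm that the roles exist and that the expected policies are attached; the role names below are the ones defined in iam_role.tf:

$ aws iam get-role --role-name ec2-ssm-role
$ aws iam list-attached-role-policies --role-name lambda-role-ssm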
EC2 Instance Configuration:
- instance.tf: Defines the EC2 instance, including its type and configurations.

$ cat instance.tf
# ------------ EC2 Instance ------------
data "template_file" "userdata" {
  template = file("${path.module}/userdata.sh.tpl")
  vars = {
    github_auth_token  = var.github_auth_token,
    github_repo_branch = var.github_repo_branch,
    github_repo_name   = var.github_repo_name
  }
}

resource "aws_instance" "Remote_Dev_Instance" {
  ami           = var.ami
  instance_type = var.instance_type

  root_block_device {
    volume_size = var.root_storage_size
    volume_type = var.root_storage_type
  }

  subnet_id                   = aws_subnet.public.id
  vpc_security_group_ids      = [aws_security_group.dev_instance_security_group.id]
  associate_public_ip_address = true
  user_data                   = data.template_file.userdata.rendered
  key_name                    = var.generated_key_name

  # This approach guarantees that the key pair is generated and
  # available before the instance launch, preventing potential
  # errors due to missing key pairs.
  depends_on = [
    aws_key_pair.generated_key,
    aws_iam_instance_profile.ec2_ssm_profile
  ]

  iam_instance_profile = aws_iam_instance_profile.ec2_ssm_profile.name

  tags = {
    Name = "${var.tag_name}"
  }
}

# Terraform module to configure AWS SSM Default Host Management
# Read more: https://docs.aws.amazon.com/systems-manager/latest/userguide/managed-instances-default-host-management.html
module "ssm_default_host_management_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "5.20.0"

  create_role           = true
  trusted_role_services = ["ssm.amazonaws.com"]
  role_name             = "AWSSystemsManagerDefaultEC2InstanceManagementRole"
  role_requires_mfa     = false
  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedEC2InstanceDefaultPolicy",
  ]
}

resource "aws_ssm_service_setting" "default_host_management" {
  setting_id    = "arn:aws:ssm:${var.region}:${aws_vpc.dev_instance_vpc.owner_id}:servicesetting/ssm/managed-instance/default-ec2-instance-management-role"
  setting_value = "service-role/AWSSystemsManagerDefaultEC2InstanceManagementRole"
  depends_on    = [aws_vpc.dev_instance_vpc]
}
The configuration above defines an EC2 instance with specific settings, including the AMI, instance type, and storage. The instance is configured to run a user-data script during initialization, provided by the userdata.sh.tpl file detailed below. The instance is also associated with a security group and the IAM instance profile defined above, and it is integrated with AWS Systems Manager so that it can be managed efficiently.

- userdata.sh.tpl: Provides a template script for initializing the instance (userdata).

$ cat userdata.sh.tpl
#!/usr/bin/env bash

# Log file to track the execution of the user data script
touch /tmp/userdata_script.log
echo "Starting user data script execution: $(date)" >> /tmp/userdata_script.log

# Update package list and install necessary packages
apt-get update -qq
apt-get install -y \
  build-essential \
  ca-certificates \
  curl \
  glibc-source \
  libc6 \
  libstdc++6 \
  lsb-release \
  net-tools \
  gnupg \
  python3-pip \
  python3-venv \
  tar

# Set up Docker repository and install Docker
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update -qq
apt-get install -y \
  containerd.io \
  docker-buildx-plugin \
  docker-ce \
  docker-ce-cli \
  docker-compose \
  docker-compose-plugin

# Set up Docker group and start Docker service
su - ubuntu
groupadd docker
sudo usermod -aG docker ubuntu
sudo systemctl enable docker
sudo service docker start

# Set up Python environment and project repository
cd /home/ubuntu
python3 -m venv /home/ubuntu/.pyenv
.pyenv/bin/python -m pip install -U pip
git clone -b ${github_repo_branch} "https://oauth2:${github_auth_token}@github.com/${github_repo_name}" /home/ubuntu/Data-Engineering-Project
.pyenv/bin/python -m pip install -r /home/ubuntu/Data-Engineering-Project/requirements.out

# Set permissions and ownership for project files
sudo chown -R ubuntu:ubuntu /home/ubuntu/Data-Engineering-Project
sudo chmod -R a+rwx /home/ubuntu/.pyenv

# Add Python virtual environment activation to .bashrc for convenience
echo 'source /home/ubuntu/.pyenv/bin/activate' >> /home/ubuntu/.bashrc

# Create a script to check for active SSH connections on port 22
echo '#!/usr/bin/env bash' > /home/ubuntu/check_ssh_connections.sh
echo 'netstat -an | grep "ESTABLISHED" | grep ":22 "' >> /home/ubuntu/check_ssh_connections.sh
chmod a+x /home/ubuntu/check_ssh_connections.sh

# TODO: Parameterize should I not want to also run Airflow on startup
# Create directories for Airflow configuration and set permissions
mkdir -p /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/{dags,logs,plugins,config}
chmod -R 777 /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/{dags,logs,plugins,config}

# Configure and run Airflow using Docker Compose
echo -e "AIRFLOW_UID=$(id ubuntu -u)" >> /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/.env
echo -e "AIRFLOW_GID=0" >> /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/.env
chmod -R 777 /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/{dags,logs,plugins,config}
docker-compose -f /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/airflow-docker-compose.yaml up airflow-init
docker-compose -f /home/ubuntu/Data-Engineering-Project/004_Orchestration/AirFlow/airflow-docker-compose.yaml up -d

# Perform system upgrade and clean up
sudo apt upgrade -y
sudo apt-get autoclean
sudo apt-get autoremove

# Log completion of the user data script
echo "User data script execution complete: $(date)" >> /tmp/userdata_script.log
The userdata.sh.tpl file provides the script that sets up the EC2 instance on launch. It installs all the necessary packages, sets up Docker, configures the Python environment, clones the provided project repository, and installs the project-specific Python packages. The script also configures and starts Airflow using Docker Compose and performs system maintenance tasks.
The combination of instance.tf and userdata.sh.tpl ensures that the EC2 instance is properly configured and ready for use as a development environment: all package dependencies are installed, the repository is cloned, and Airflow is configured during instance initialization.
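Because the template writes its own progress log to /tmp/userdata_script.log, a quick remote check (using the SSH connection parameters from outputs.tf, described below) confirms that the bootstrap ran to completion:

$ ssh $(terraform output -raw instance_connection_parameters) 'cat /tmp/userdata_script.log'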
Lambda Function:
- lambda.tf: Configures the Lambda function and its triggers.

$ cat lambda.tf
# ------------ Lambda ------------
data "archive_file" "lambda" {
  type        = "zip"
  source_file = "lambda_function/lambda_function.py"
  output_path = "lambda_function_payload.zip"
}

resource "aws_lambda_function" "stop_instance" {
  architectures    = ["x86_64"]
  description      = "An AWS Lambda function that automatically stops an EC2 instance if there hasn't been an SSH connection to it in over x minutes."
  function_name    = var.lambda_function_name
  handler          = "lambda_function.lambda_handler"
  memory_size      = 128
  package_type     = "Zip"
  role             = aws_iam_role.lambda_role.arn
  filename         = "lambda_function_payload.zip"
  runtime          = "python3.10"
  skip_destroy     = false
  source_code_hash = data.archive_file.lambda.output_base64sha256
  tags = {
    "lambda-console:blueprint" = "stop-instance-lambda"
  }
  timeout = 300

  ephemeral_storage {
    size = 512
  }

  logging_config {
    log_format = "Text"
  }

  depends_on = [
    aws_iam_role_policy_attachment.lambda_logs,
    aws_cloudwatch_log_group.lambda_log_group,
  ]
}

# CloudWatch Events rule to trigger an AWS Lambda function at regular intervals
resource "aws_cloudwatch_event_rule" "stop_instance" {
  description         = "Trigger an AWS Lambda function at regular intervals"
  name                = "stop_instance"
  schedule_expression = "rate(${var.lambda_event_rate})"
}

resource "aws_cloudwatch_event_target" "check_instance_every_x_minutes" {
  rule = aws_cloudwatch_event_rule.stop_instance.name
  arn  = aws_lambda_function.stop_instance.arn

  # Pass the event data as a JSON string
  input = jsonencode({
    "tag_key" : "Name",
    "tag_value" : aws_instance.Remote_Dev_Instance.tags.Name
  })

  depends_on = [
    aws_instance.Remote_Dev_Instance,
    aws_cloudwatch_event_rule.stop_instance,
    aws_lambda_function.stop_instance,
  ]
}

resource "aws_lambda_permission" "allow_cloudwatch_to_call_stop_instance" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.stop_instance.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.stop_instance.arn
}
The lambda.tf file defines the Lambda function and its event-rule trigger. It packages the Lambda code into a zip file, configures the Lambda function with the required properties, and sets up a CloudWatch Events rule to trigger the function at regular intervals. Permissions are also configured to allow CloudWatch to invoke the Lambda function.
- lambda_function/lambda_function.py: Contains the Lambda function, written in Python.

$ cat lambda_function/lambda_function.py
import logging
import time

import boto3

# Initialize AWS clients
ec2_client = boto3.client("ec2")
ssm_client = boto3.client("ssm")

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()


def wait_for_command_invocation(command_id, instance_id, max_retries=5, wait_interval=4):
    for _ in range(max_retries):
        try:
            get_cmd_invocation = ssm_client.get_command_invocation(
                CommandId=command_id, InstanceId=instance_id
            )
            if get_cmd_invocation["Status"] in ["Success", "Failed"]:
                return get_cmd_invocation
            time.sleep(wait_interval)
        except Exception as e:
            logger.error(f"Could not get command invocation: {str(e)}")
            time.sleep(wait_interval)
    raise RuntimeError(f"Command {command_id} failed after {max_retries} retries.")


def check_recent_ssh_connection(instance_id):
    try:
        response = ssm_client.send_command(
            InstanceIds=[instance_id],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": ["bash /home/ubuntu/check_ssh_connections.sh"]},
        )
        command_id = response["Command"]["CommandId"]
        invocation_response = wait_for_command_invocation(command_id, instance_id)
        standard_output_content = invocation_response.get("StandardOutputContent", "")
        return "tcp" in standard_output_content
    except Exception as e:
        logger.error(f"Failed to check SSH connection: {str(e)}")
        return False


def get_instance_id(reservations):
    return [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation["Instances"]
        if instance["State"]["Name"] == "running"
    ]


def lambda_handler(event, context):
    tag_key = event.get("tag_key")
    tag_value = event.get("tag_value")
    if not tag_key or not tag_value:
        logger.error("Missing tag_key or tag_value in the event")
        return

    custom_filter = [{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    try:
        reservations = ec2_client.describe_instances(Filters=custom_filter)["Reservations"]
        instance_ids = get_instance_id(reservations)
        if len(instance_ids) != 1:
            logger.info("No or multiple instances found. Skipping stop.")
            return
        instance_id = instance_ids[0]
        if not check_recent_ssh_connection(instance_id):
            logger.info(f"No SSH connection detected on instance {instance_id}. Stopping instance.")
            ec2_client.stop_instances(InstanceIds=[instance_id])
        else:
            logger.info(f"Instance {instance_id} has had an SSH connection. Skipping stop.")
    except Exception as e:
        logger.error(f"Error in lambda_handler: {str(e)}")
The Lambda function above checks whether an EC2 instance with a specific tag has had recent SSH connections by executing the check_ssh_connections.sh script via SSM. It retrieves instance IDs based on tags, checks SSH activity, and stops the instance if no recent connections are detected. The function handles edge cases where no or multiple instances are found, and manages retries for command execution. This setup helps automate the management of EC2 instances to optimize resource usage and costs; a manual test is shown below.
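To test the function without waiting for the schedule, you can invoke it manually with a payload mirroring what the CloudWatch Events target sends. The function name and tag value below assume the defaults from variables.tf, and --cli-binary-format raw-in-base64-out is required for inline JSON payloads with AWS CLI v2:

$ aws lambda invoke \
    --function-name stop_instance \
    --cli-binary-format raw-in-base64-out \
    --payload '{"tag_key": "Name", "tag_value": "Remote Dev Instance"}' \
    response.json
$ cat response.json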
Scripts:
- scripts/ec2-manager.py: A custom script for managing the EC2 instance.

$ cat scripts/ec2-manager.py
import argparse
import logging

import boto3


def parse_arguments():
    """Parses command-line arguments using argparse."""
    parser = argparse.ArgumentParser(description="Manage EC2 instances based on tags")
    parser.add_argument(
        "action",
        choices=["start", "stop"],
        help="Action to perform (start or stop)",
    )
    parser.add_argument(
        "--tag-name", default="Name", help="Tag name to match (Default: %(default)s)"
    )
    parser.add_argument(
        "--tag-value",
        default="Remote Dev Instance",
        help="Tag value to match (Default: %(default)s)",
    )
    parser.add_argument(
        "--log-level",
        choices=["debug", "info", "warning", "error", "critical"],
        default="info",
        help="Logging level (Default: %(default)s)",
    )
    return parser.parse_args()


def get_instances(client, tag_name, tag_value):
    """Retrieves EC2 instances matching the specified tag."""
    filters = [{"Name": f"tag:{tag_name}", "Values": [tag_value]}]
    reservations = client.describe_instances(Filters=filters)["Reservations"]
    return reservations


def manage_instances(client, action, instance_id, state):
    """Starts or stops an EC2 instance based on action and current state."""
    if (state == "running" and action == "stop") or (
        state == "stopped" and action == "start"
    ):
        if action == "start":
            response = client.start_instances(InstanceIds=[instance_id])
            logging.info(f"Starting instance {instance_id}")
        else:
            response = client.stop_instances(InstanceIds=[instance_id])
            logging.info(f"Stopping instance {instance_id}")
        return response
    else:
        logging.info(f"Instance {instance_id} is already {state}")
        return None


def main():
    args = parse_arguments()
    logging.basicConfig(level=getattr(logging, args.log_level.upper()))
    client = boto3.client("ec2")
    reservations = get_instances(client, args.tag_name, args.tag_value)
    if not reservations:
        logging.info("No instances found with matching tag.")
        return
    for reservation in reservations:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            state = instance["State"]["Name"]
            manage_instances(client, args.action, instance_id, state)


if __name__ == "__main__":
    main()
The script above is run manually, for example once the development instance is no longer needed for the day. It lets you start or stop your EC2 instance based on tags, which is useful for managing your development environment and further ensures that resources are managed effectively (usage examples appear in the Managing the EC2 Instance section below).
Outputs:
- outputs.tf: Defines outputs to provide information about the deployed resources.

$ cat outputs.tf
# ------------ Outputs ------------
output "public_dns" {
  description = "Public DNS name of the EC2 instance"
  value       = aws_instance.Remote_Dev_Instance.public_dns
}

output "public_ip" {
  description = "Public IP address of the EC2 instance"
  value       = aws_instance.Remote_Dev_Instance.public_ip
}

output "instance_connection_parameters" {
  description = "SSH connection parameters for the EC2 instance"
  value       = "-i ${aws_key_pair.generated_key.key_name}.pem ubuntu@${aws_instance.Remote_Dev_Instance.public_dns}"
}
The outputs.tf config above defines outputs that are printed to STDOUT once the EC2 instance is deployed. It includes the public DNS name and public IP address, which are essential for remote access. It also provides the SSH connection parameter string, detailing the key pair and username needed to connect to the instance over SSH. Usage:

ssh $(terraform output -raw instance_connection_parameters)

This will connect you to the EC2 instance.
Variables:
- variables.tf: Defines variables used across the Terraform configuration.

$ cat variables.tf
# ------------ Variables ------------
variable "generated_key_name" {
  type        = string
  default     = "remote-dev-instance-tf-key-pair"
  description = "Key-pair generated by Terraform"
}

variable "ami" {
  type = string
  # List of available AMIs can be found here: https://cloud-images.ubuntu.com/locator/ec2/
  default     = "ami-0e95d283a666c6ea0"
  description = "AMI ID for Ubuntu 22.04 LTS"
}

variable "instance_type" {
  type        = string
  default     = "t2.medium"
  description = "EC2 instance type"
}

variable "region" {
  type    = string
  default = "eu-west-1"
}

variable "network_cidr" {
  default     = "10.0.0.0/16"
  description = "CIDR block for the VPC"
}

variable "public_subnet_cidr_block" {
  default     = "10.0.1.0/24"
  description = "CIDR block for the public subnet"
}

variable "availability_zone" {
  default     = "eu-west-1a"
  description = "Availability zone to deploy the resources"
}

variable "tag_name" {
  description = "Tag name for all resources"
  default     = "Remote Dev Instance"
}

variable "dev_instance_security_group" {
  type        = string
  default     = "terraform-security-group"
  description = "Allow HTTP/SSH traffic from interweb"
}

variable "profile" {
  type        = string
  default     = "Terraform_credentials"
  description = "profile for terraform credentials"
}

variable "root_storage_size" {
  type        = number
  default     = 20
  description = "AWS root storage size"
}

variable "root_storage_type" {
  type        = string
  default     = "gp2"
  description = "root storage type"
}

variable "lambda_function_name" {
  type        = string
  default     = "stop_instance"
  description = "Lambda function that stops the instance if it's running with no SSH connection"
}

variable "lambda_event_rate" {
  type        = string
  default     = "30 minutes"
  description = "Lambda event rate (interpolated into the rate() schedule expression)"
}

variable "github_auth_token" {
  type        = string
  description = "GitHub token used for authentication"
  sensitive   = true
}

variable "github_repo_branch" {
  type        = string
  description = "GitHub repository branch"
  default     = "main"
}

variable "github_repo_name" {
  type        = string
  description = "GitHub repository name"
  default     = ""
}
The variables.tf config above defines the variables that customize the deployment of our EC2 instance and related resources. These variables enable flexibility and reusability of the Terraform scripts.
- terraform.tfvars: Contains values for the variables defined in variables.tf. This file behaves like a .env file: it is where you define or store sensitive values that you do not wish to version-control, or per-environment overrides you want to apply without changing the variables.tf file. A minimal example is sketched below.
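As a minimal sketch (the values below are placeholders, not the project's real settings), a terraform.tfvars might be created like this:

cat > terraform.tfvars <<'EOF'
# Placeholder values - replace with your own
region             = "eu-west-1"
profile            = "Terraform_credentials"
github_auth_token  = "<your-github-token>"
github_repo_branch = "main"
github_repo_name   = "<your-user>/<your-repo>"
EOF

Since this file typically carries secrets such as the GitHub token, keep it out of version control (e.g. via .gitignore).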
Deploying the EC2 instance
Once we have defined the configuration and resources required for our EC2 instance, we can initialize Terraform and then deploy the instance.
Initialize Terraform
- Initialize Terraform to download the required provider plugins:

terraform init
- Review the terraform.tfvars file and update the variables with your desired values. You can customize parameters such as the region, instance type, storage size, and more in this file. See ./variables.tf for examples of variables you can set manually.
- Review the changes that will be made:

terraform plan
- When satisfied with the resources to be deployed, apply the Terraform config to create them:

terraform apply

You will be prompted to confirm the resource creation; type yes and hit Enter.
- Once the resources are created, Terraform will display the following, as per the outputs.tf file:
  - Public DNS name of the EC2 instance
  - Public IP address of the EC2 instance
  - SSH connection parameters for the EC2 instance
- Navigate to your AWS console to check that your instance is running, or run the following command to check its status:

aws ec2 describe-instance-status --include-all-instances
- To test that the instance is running, connect to the new public DNS and verify your config:

ssh $(terraform output -raw instance_connection_parameters)
- Once logged in, you will need to configure Git:
git config --global user.email "<your email>@gmail.com"
git config --global user.name "Your name"
Check whether the user data script ran successfully during initialization
You can verify this as follows. Check the logs of your user data script with:

less /var/log/cloud-init.log
less /var/log/cloud-init-output.log

These show all the logs of your user data script; cloud-init also creates the /etc/cloud folder.
Connecting to the Instance with VS Code
- Configure SSH Access:
- Ensure your EC2 instance has SSH access enabled on port 22 for your security group.
- If you haven’t already, create an SSH key pair on your local machine using the ssh-keygen command.
- Install the Remote SSH Extension:
- Open VS Code and navigate to the Extensions tab (Ctrl+Shift+X).
  - Search for the "Remote - SSH" extension and install it.
- Configure VS Code Remote Settings:
- Open the Command Palette (Ctrl+Shift+P).
  - Search for "Remote-SSH: Open SSH Configuration File" and select it.
  - Choose an option to create a new SSH configuration file (usually the default is recommended). This file will store your EC2 instance connection details.
- Add Your EC2 Instance Configuration:
- The configuration file will open in your VS Code editor.
- Add a new configuration for your EC2 instance following the format below, replacing placeholders with your actual values:
Host "your_instance_hostname", // Replace with your EC2 instance hostname or IP address HostName "your_instance_hostname", // Replace with your EC2 instance hostname or IP address IdentityFile "~/.ssh/your_key_pair.pem" // Replace with the path to your private key file User "your_username", // Replace with your username on the EC2 instance (e.g., ubuntu)
- Connect to Your Remote Environment:
- Click on the Remote Status bar indicator (bottom-left corner of VS Code) or open the Command Palette again.
- Search for “Remote-SSH: Connect to Host” and select it.
- Choose the name you assigned to your EC2 instance configuration in the previous step.
Once connected, your VS Code workspace will switch to the remote environment on your EC2 instance. You should see the hostname of the EC2 instance in the status bar.
For more details on Remote Development using SSH, read: https://code.visualstudio.com/docs/remote/ssh
Managing the EC2 Instance
Start/Stop Instance Manually
When you are done using the instance, ensure that it is stopped to avoid incurring costs while it is not in use. Run the provided script with the desired action and optional arguments:
python scripts/ec2-manager.py start # Start instances with the matching tag.
python scripts/ec2-manager.py stop # Stop instances with the matching tag.
# Optionally specify tag details:
python scripts/ec2-manager.py stop --tag-name MyCustomTag --tag-value my-instance
# Optionally control logging verbosity:
python scripts/ec2-manager.py start --log-level debug
Automatic Stop Instance (Cost Management)
Every 30 minutes, a CloudWatch Events rule triggers an AWS Lambda function, which performs the following steps:
- Runs a command via AWS Systems Manager (SSM) on the EC2 instance to check for active SSH connections.
- Parses the command output to determine if there has been any SSH activity within the last 30 minutes.
- If there hasn’t been any SSH activity, the EC2 instance is stopped.
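To observe this behaviour, you can tail the function's CloudWatch logs; the log group name comes from iam_role.tf and assumes the default var.lambda_function_name (aws logs tail requires AWS CLI v2):

$ aws logs tail /aws/lambda/stop_instance --since 1h --follow

You should see the "Stopping instance" or "Skipping stop" log lines emitted by lambda_function.py.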
Clean Up
To avoid incurring charges, ensure that you destroy your infrastructure after use, should you no longer wish to use it:
terraform destroy
Conclusion
Setting up a remote development instance using Terraform not only simplifies the provisioning of resources but also enhances the scalability and manageability of your entire infrastructure. By leveraging infrastructure as code (IaC), you ensure consistent, deterministic deployments, minimize ClickOps, and make your environment reproducible.
In this blog post, we explored the process of setting up an EC2 instance for development purposes: from defining the various resources, to provisioning them via Terraform, to managing them efficiently. We also integrated services such as AWS Lambda, SSM, and CloudWatch Events for cost management, ensuring that our instance is stopped when not in use.
As you continue to refine your environment, also consider integrating additional AWS services or automating more aspects of your infrastructure management including cost optimization and management.