Sergii Vershynskyi
Creator of this blog.
Jul 26, 2023 17 min read 3483 words

TEMPLATE FOR APPLICATION GROUP

Abstract: A natural evolution of the growing platform is splitting it into almost isolated pieces of the infrastructure deployed from separate TF repositories. These TF repositories contain a lot of similar code, and having a template makes their creation easy and fast, as well as provides additional benefits.

Introduction

Having an infrastructure monorepo for the whole platform works well in the early stages of its life but usually creates challenges as it grows. To name a few: TF code becomes harder to read and understand as its complexity increases over time; TF releases become slow due to the large number of resources to manage; teams start to compete heavily for access to the repository; and the monorepo is shared between teams with different skill levels and agreed practices. In addition, a monorepo makes it challenging to release infrastructure in parallel, which is important in a DR scenario. On top of that, such big shared repositories end up in a “dirty” state from time to time. Because of these and other reasons, as well as the increased risk of breaking infrastructure during a bad deployment (a big “blast radius”), the monorepo is usually split into smaller, relatively isolated pieces of infrastructure (application groups), each of which lives in a separate repository and is managed by a dedicated team.

Note: A dirty state is when the current developer sees unapplied changes in the TF plan that were introduced by developers who worked on the repository before.

TF repositories of different application groups contain a significant portion of the same or similar code, and in the past, the creation of a new application group involved copying the code of an existing one and modifying it. This process takes time and is error-prone. Moreover, repositories begin to diverge over time, each accumulating different improvements. At the same time, a configurable template containing all the existing improvements can significantly simplify and speed up the creation of new repositories.

This article will describe the template for creating the application group code.

Design considerations

The application group is split into two layers: the base layer and the services layer. The base layer contains the ECS cluster, networking and shared resources, such as RDS clusters, etc. The services layer contains all other resources, including ECS services, Lambda functions and whatever else is needed. A separate base layer reduces the “blast radius” and simplifies code in the services layer. In addition to this, the base layer changes very rarely and is usually taken care of by infrastructure teams, while services layers are frequently changed by application developer teams. Each layer lives in its own repository.

Let’s define the requirements for the template as follows:

  • The template should contain feature flags to enable the generation of desired code. For example, if the service uses SQS queues, we can enable the SQS feature flag to add the required code.
  • Both base and services layers should be generated at once using the same configuration file since both require common parameters and having one config file reduces the possibility of misconfiguration and amount of copy-pasting.
  • The template should generate code in a way that makes adding more applications to it an easy task.
  • The common functionality of each application should be grouped into its own module instead of having all ECS services, Lambda functions and other resources (Elasticsearch clusters, etc.) be direct children of the application parent module and grouped by type. For example, suppose the AWS Lambda payment-processor is part of the payment application functionality. In that case, it should be a child of the application TF module instead of being defined inside the file lambdas.tf in the parent TF module applications, together with Lambda modules from all the other applications. This improves clarity by reflecting the infrastructure composition in the TF code, increases code quality, and opens the opportunity to simplify removing unused AWS resources from environments: a single count on the whole service module declaration replaces separate conditional expressions in many places for each of the related modules.

Template internals

I chose cookiecutter as a templating engine. It requires a specific directory structure, which for our template is the following:

hooks
  pre_gen_project.py
  post_gen_project.py
{{cookiecutter.parent_dir}}
  {{cookiecutter.base_repo_name}}
  {{cookiecutter.services_repo_name}}
cookiecutter.json

The template files of the base layer and services layer live inside {{cookiecutter.base_repo_name}} and {{cookiecutter.services_repo_name}} directories, respectively.

cookiecutter.json is a configuration file for the template containing feature flags and a common set of parameters. These parameters are validated by the pre_gen_project.py script. For example, the short_name parameter is validated using the following code:

import re
import sys
from collections import OrderedDict

def check_context(contexts):
    # Each entry maps a parameter name to a (rendered value, regex) pair.
    for context, rendered in contexts.items():
        ctx_rendered = rendered[0]
        regex = rendered[1]

        if not re.match(regex, ctx_rendered):
            print("ERROR: {0} is not a valid {1}".format(ctx_rendered, context))
            sys.exit(1)

def main():
    contexts = OrderedDict()

    ...

    contexts["short_name"] = (
        "{{cookiecutter.short_name}}",
        r"^([a-z]+-?[a-z]+){2,20}$"
    )

    check_context(contexts)
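To make the validation concrete, here is how the short_name regex from the snippet above behaves on a few sample inputs (the helper function name and the sample names are mine):

```python
import re

# Regex copied from pre_gen_project.py: 2-20 repetitions of
# "letters, optional single hyphen, letters".
SHORT_NAME_RE = r"^([a-z]+-?[a-z]+){2,20}$"

def is_valid_short_name(name):
    """Apply the same check as check_context() does for short_name."""
    return re.match(SHORT_NAME_RE, name) is not None

# Lowercase kebab-case names pass:
assert is_valid_short_name("int-pay")
# Uppercase letters, underscores and leading hyphens fail:
assert not is_valid_short_name("Int-Pay")
assert not is_valid_short_name("int_pay")
assert not is_valid_short_name("-int-pay")
```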

After validation, the templating engine generates the code and places it inside the output directory, in child directories named after cookiecutter.base_repo_name and cookiecutter.services_repo_name. These are variables whose values are derived from user inputs and are defined as follows:

{{ cookiecutter.update({"process_name_underscored": cookiecutter.process_name | replace('-', '_')}) }}

{{ cookiecutter.update({"base_repo_name": cookiecutter.process_name_underscored + "_tf"}) }}
{{ cookiecutter.update({"services_repo_name": cookiecutter.process_name_underscored + "_services_tf"}) }}

The derived values are put inside pre_gen_project.py so as not to pollute cookiecutter.json, as well as to calculate them once and reuse them wherever they are needed.
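In plain Python, the derivation above is equivalent to the following (the function name is mine; the process name is an example value):

```python
def derive_repo_names(process_name):
    """Mimic the cookiecutter.update() calls from pre_gen_project.py."""
    # Same transformation as the Jinja `replace('-', '_')` filter.
    underscored = process_name.replace("-", "_")
    return {
        "process_name_underscored": underscored,
        "base_repo_name": underscored + "_tf",
        "services_repo_name": underscored + "_services_tf",
    }

names = derive_repo_names("int-payments")
assert names["base_repo_name"] == "int_payments_tf"
assert names["services_repo_name"] == "int_payments_services_tf"
```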

The last step is the post-processing of the output directory, performed by the post_gen_project.py script.

Note: You might wonder about the shared TF modules’ source format. It is determined by the TinyCache global TF cache used in the template. You can read about it in great detail here.

How do feature flags work

Let me describe how feature flags work using is_sqs as an example. In variables.tf, you can find the following conditional code:

{% if cookiecutter.is_sqs == "true" %}

  sqs_queues = FILL_ME{% endif %}

This way, sqs_queues = FILL_ME will be added to the output directory if the is_sqs feature flag in cookiecutter.json is set to true. Throughout the template, I use the FILL_ME placeholder for places where input is needed but a default value is not applicable, since the value should always be custom. Indeed, SQS queue names differ between applications, and providing sample default queue names could lead to creating unnecessary infrastructure if editing sqs_queues is overlooked. At the same time, running TF after code generation with FILL_ME placeholders left in place produces syntax errors and draws engineers’ attention to fill them with the desired values. We are not using {} as a default either, since it makes no sense: filling sqs_queues could be overlooked, resulting in the sqs module being declared and queue-access policies added to the application, but no queues created, which is similar to dead code, creates unused infrastructure and is at the very least confusing. The idea is simple: if someone wants queues created, is_sqs should be set to true and sqs_queues filled with the queue names; otherwise, is_sqs should be set to false.

The other solution would be to add parameters for every FILL_ME to the cookiecutter.json file. I have chosen not to go this path since it leads to:

  • unnecessary pollution of cookiecutter.json with values used only when certain feature flags are set;
  • possible formatting issues when the templating engine places the contents of variables (some of them multi-line) into the actual code; it is better to edit them where they are used;
  • increased documentation size, since the developer will require additional context to understand what each parameter is for and what its expected values might be.

There is another piece of code related to the SQS functionality for the application, which is stored in the separate file messaging.tf. This is done on purpose to:

  • create a better code structure by placing related functionality into separate *.tf files;
  • reduce the amount of conditional statements in the *.tf files inside the template. In our example, it is done by simply deleting file messaging.tf from the output directory if is_sqs is false using the following code in the post_gen_project.py script:
import os

def del_file_in_service(services_repo_dir, is_needed, filename):
    # Delete the feature-specific *.tf file when its feature flag is off.
    if is_needed == "false":
        os.remove(os.path.join(services_repo_dir, "modules", "apps", "{{ cookiecutter.service_name }}", filename))

def main():
    cwd = os.getcwd()
    services_repo_dir = os.path.join(cwd, "{{ cookiecutter.services_repo_name }}")
    del_file_in_service(services_repo_dir, "{{ cookiecutter.is_sqs }}", "messaging.tf")

Notes on formatting

The template contains code like this:

service = merge(local._service, var.service){% if cookiecutter.is_ecs_migration == "true" %}

ecs_migration = merge(var.ecs_migration, {
  enabled = lookup(var.ecs_migration, "app_name", "") == local.names.kebak_case
}){% endif %}

secrets = [
  {
    name = "JAVA_OPTS",
    arn  = module.parameters.parameters["java_opts"]["arn"]
  }
]

Cookiecutter’s conditional expressions look pretty weird and are harder to read, although not prohibitively so, since the template primarily uses basic conditional statements. This is done purposefully: here, we prioritize generated code formatting over template formatting. Indeed, if is_ecs_migration is true, then we’ll get

service = merge(local._service, var.service)

ecs_migration = merge(var.ecs_migration, {
  enabled = lookup(var.ecs_migration, "app_name", "") == local.names.kebak_case
})

secrets = [
  {
    name = "JAVA_OPTS",
    arn  = module.parameters.parameters["java_opts"]["arn"]
  }
]

otherwise

service = merge(local._service, var.service)

secrets = [
  {
    name = "JAVA_OPTS",
    arn  = module.parameters.parameters["java_opts"]["arn"]
  }
]

which is perfectly formatted code.

The other possibility would be to prioritize template formatting over generated code formatting and run the terraform fmt command after the templating engine has done its work. While this can be a good option, it sometimes produces an undesired format, and thus the previous option was chosen.

Environments directories handling

Environment-specific directories env-{ENV} are used as root TF modules for provisioning infrastructure for the specific environment and contain constant configuration files (which are symlinks to the files in the root of the repository), as well as unique configuration files.

Unique configuration files include main.auto.tfvars files used to set cpu & memory for the ECS tasks for each of the services, RDS instance sizes, etc. It is worth mentioning that each service can have the following structure of its parameters:

payment_analysis = {
  service = {
    task_count = 0
  }

  task = {
    cpu    = 512
    memory = 1024
  }

  rds = ...
  es  = ...
  ec  = ...
}

service and task maps are not merged into a single map service because they represent separate input variables for the service TF module:

module "service" {
  source = "../../../.modules_tf/ecs-service-13.0.1//modules/v0.12/ecs-service"

  service_config = local.service
  task_config    = var.task
  ...
}

This way, it becomes possible to add other parameters to these maps by simply adding them to main.auto.tfvars of the specific environment, and they will be automatically passed to the module itself. The same applies to rds, es and ec maps.
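The effect of Terraform's merge() here is the same as a shallow dict merge in Python: any key added to the environment's map overrides or extends the defaults without touching the module code. A rough sketch (the values are illustrative, not template defaults):

```python
# Defaults defined once in the service module (illustrative values).
default_task = {"cpu": 256, "memory": 512}

# Environment-specific overrides from main.auto.tfvars.
env_task = {"cpu": 512, "memory": 1024, "ephemeral_storage": 30}

# Terraform's merge(default_task, env_task): later maps win on conflicts,
# and keys absent from the defaults are passed through automatically.
merged = {**default_task, **env_task}

assert merged["cpu"] == 512
assert merged["ephemeral_storage"] == 30
```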

Let’s review how each of these categories is handled inside the template.

Handling unique configuration files

Template environment-specific directories contain the following unique configuration files:

  • backend.tf for running TF on secure bastion or CI/CD pipeline;
  • backend-local.hcl and local-dev.tfvars for running TF on a local machine;
  • main.auto.tfvars for expressing environment-specific services configuration.

While in the example template’s code the contents of the backend.tf, backend-local.hcl and local-dev.tfvars files are very similar, in real environments they might differ for historical reasons. The contents of main.auto.tfvars will always differ, except for very simple, trivial services. For these reasons, all these configuration files are placed inside the env-{ENV} directories in the template for easy customization and are not generated by the template.

When certain environments are not needed, their directories are simply deleted in the post_gen_project.py script:

import os
import shutil

ALL_ENVS = [
    "alpha",
    "dev",
    "staging",
    "prod",
    "dr"
]

def get_envs():
    envs = "{{ cookiecutter.envs }}"

    if envs == "All":
        envs = ALL_ENVS
    else:
        envs = envs.split("|")
        wrong_envs = list(set(envs) - set(ALL_ENVS))

        if wrong_envs:
            raise ValueError("Invalid env(s) specified in cookiecutter.json: {0}".format(wrong_envs))

    return envs

def remove_all_envs(envs, base_repo_dir, services_repo_dir):
    # Remove the directories of every environment the user did not request.
    envs_to_remove = list(set(ALL_ENVS) - set(envs))

    remove_envs(base_repo_dir, envs_to_remove)
    remove_envs(services_repo_dir, envs_to_remove)

def remove_envs(repo_dir, envs_to_remove):
    for env_ in envs_to_remove:
        shutil.rmtree(os.path.join(repo_dir, "env-" + env_))

def main():
    cwd = os.getcwd()
    base_repo_dir = os.path.join(cwd, "{{ cookiecutter.base_repo_name }}")
    services_repo_dir = os.path.join(cwd, "{{ cookiecutter.services_repo_name }}")

    envs = get_envs()

    remove_all_envs(envs, base_repo_dir, services_repo_dir)
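To make the selection logic concrete, here is a self-contained sketch of the same set arithmetic (the function name is mine):

```python
ALL_ENVS = ["alpha", "dev", "staging", "prod", "dr"]

def envs_to_remove(envs_setting):
    """Reproduce the get_envs() + remove_all_envs() selection logic."""
    envs = ALL_ENVS if envs_setting == "All" else envs_setting.split("|")
    wrong = set(envs) - set(ALL_ENVS)
    if wrong:
        raise ValueError("Invalid env(s) specified in cookiecutter.json: {0}".format(wrong))
    # Everything the user did not request gets deleted.
    return sorted(set(ALL_ENVS) - set(envs))

# With envs = "alpha|prod", the other three env-{ENV} directories go away:
assert envs_to_remove("alpha|prod") == ["dev", "dr", "staging"]
# With envs = "All", nothing is removed:
assert envs_to_remove("All") == []
```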

Handling constant configuration files

We create symlinks for the files initialize.tf, main.tf and variables.tf in each of the env-{ENV} directories. This guarantees that these configuration files have the same contents in all env-{ENV} directories. In my experience, situations where these files need environment-specific content are very rare and usually indicate that the code might need improvements and a design re-evaluation. Using symlinks also means we can change one file and have its contents change in all the environments. In addition, this saves some disk space :)

Unfortunately, Cookiecutter does not handle symlinks well and replaces them with file contents when it writes to the output directory. To fix it, we create symlinks in the post_gen_project.py using the following code:

import os

def create_all_symlinks(envs, base_repo_dir, services_repo_dir):
    files = [
        "initialize.tf",
        "main.tf",
        "variables.tf"
    ]

    create_symlinks(envs, base_repo_dir, files + ["outputs.tf"])
    create_symlinks(envs, services_repo_dir, files)

def create_symlinks(envs, repo_dir, files):
    for env_ in envs:
        os.chdir(os.path.join(repo_dir, "env-" + env_))

        # Create the links relative to the env directory ("../<file>")
        # so they survive moving or cloning the repository.
        for file_ in files:
            os.symlink(os.path.join("..", file_), file_)

        os.chdir(repo_dir)

def main():
    ...
    create_all_symlinks(envs, base_repo_dir, services_repo_dir)
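Here is a minimal, self-contained check of the same relative-symlink layout, using a temporary directory in place of the generated repositories (the file and directory names are illustrative):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as repo_dir:
    # Shared file in the repository root, env directory next to it.
    with open(os.path.join(repo_dir, "main.tf"), "w") as f:
        f.write("# shared contents\n")
    env_dir = os.path.join(repo_dir, "env-dev")
    os.mkdir(env_dir)

    # Same relative target as in create_symlinks(): "../main.tf".
    link = os.path.join(env_dir, "main.tf")
    os.symlink(os.path.join("..", "main.tf"), link)

    assert os.path.islink(link)
    # Reading through the link yields the shared contents.
    with open(link) as f:
        assert f.read() == "# shared contents\n"
```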

How to use template

Typical workflow

The typical workflow for generating code using the existing template is the following:

  1. Edit cookiecutter.json.
  2. Delete the output folder (if you have already run Cookiecutter).
  3. Run Cookiecutter to generate base and services layers: cookiecutter --no-input ./.
  4. Open the output directory and edit generated files as needed.
  5. Verify generated code by running TF init/plan/apply.

Note: The template does not include sources for shared modules for obvious reasons. Thus, you won’t be able to perform step 5 with the provided template code. At the same time, the code will allow you to explore the solution.

Description of input parameters

All input parameters live inside cookiecutter.json. Below you can find their descriptions:

  • process_name - the name of the process. Should use lowercase letters “a-z” and “-” as a delimiter, for example: “int-payments”.
  • short_name - needed in the base layer for some resources that have restrictions on name length. Should use lowercase letters “a-z” and “-” as a delimiter, for example: “int-pay”.
  • service_name - the name of the ECS service to create. Should use lowercase letters “a-z” and “-” as a delimiter, for example: “api-retention”.
  • ECS service feature flags (valid options: “true” or “false”):
    • is_es: whether the service needs OpenSearch;
    • is_api: whether the service is API;
    • is_sqs: whether the service needs SQS queues;
    • is_base_rds: whether the service will share RDS cluster with other service(s);
    • is_service_rds: whether the service needs its own dedicated RDS cluster;
    • is_rds_secret: whether the service needs Secrets Manager secret for RDS;
    • is_apigw: whether the service needs API GW;
    • is_dynamo: whether the service needs Dynamo DB;
    • is_ecs_migration: if you need to migrate existing service from another application group;
    • is_autoscaling: whether the service uses autoscaling.
  • Elasticache parameters, which add the ability to create a Redis or Memcached cluster on the base and/or services layer:
    • ec_base:
      • create: set this value to “true” to create the elasticache on the base layer;
      • engine: the engine for the elasticache (“redis” or “memcached”).
    • ec_service:
      • create: set this value to “true” to create the elasticache on the service layer;
      • engine: the engine for the elasticache (“redis” or “memcached”).
  • envs: environments to create the configuration for. Possible values are:
    • All: creates configurations for all environments, i.e. alpha, dev, staging, prod, dr;
    • Custom list of environments, delimited by |, for example: “alpha|prod”. All possible values are: “alpha|dev|staging|prod|dr”.
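For reference, a complete configuration for a minimal API service might look like the dictionary below. The key names follow the parameter list above, but the exact file layout (in particular, whether ec_base/ec_service are nested objects) is my assumption, and all values are illustrative:

```python
import json

# Illustrative cookiecutter.json for a single API service with SQS queues
# and a dedicated RDS cluster; values are example choices, not defaults.
config = {
    "parent_dir": "output",
    "process_name": "int-payments",
    "short_name": "int-pay",
    "service_name": "api-retention",
    "is_es": "false",
    "is_api": "true",
    "is_sqs": "true",
    "is_base_rds": "false",
    "is_service_rds": "true",
    "is_rds_secret": "true",
    "is_apigw": "false",
    "is_dynamo": "false",
    "is_ecs_migration": "false",
    "is_autoscaling": "true",
    "ec_base": {"create": "false", "engine": "redis"},
    "ec_service": {"create": "true", "engine": "redis"},
    "envs": "alpha|dev|prod",
}

print(json.dumps(config, indent=2))
```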

How to evolve the template

When we would like to add new functionality to the template or improve its code, we might be tempted to modify it directly. However, since the code inside {{cookiecutter.parent_dir}} contains a lot of Cookiecutter statements, in many places it can’t be properly parsed by the IDE, which makes adding significant portions of new code inconvenient. I use the following workflow to overcome this:

  • generate code and back it up;
  • change the generated code using the IDE and test it;
  • compare the new code with the backup using a tool for automatic directory comparison, and transfer the new code to the template;
  • test code generation of the new feature;
  • submit PR, address suggestions and merge.

When adding a new feature, it is important to differentiate between unique functionality, which will always be used inside the application TF module, and functionality which might be needed by other applications, and to place each in the appropriate TF module. Let me elaborate with an example. In the template, you can see that the {{cookiecutter.services_repo_name}}/modules/apps/data.tf file contains TF modules whose data is passed to the service TF module {{cookiecutter.services_repo_name}}/modules/apps/{{cookiecutter.service_name}}. The question is: why is the file data.tf not placed inside the service TF module? The answer is that application groups tend to grow and rarely contain only one application, and when other applications are added, they will likely require the same data. If we placed all these modules inside one application, we would eventually find ourselves either extracting them to the parent of the service module or duplicating modules for new services by copy-pasting. At the same time, we place module "tags" inside modules/apps/{{cookiecutter.service_name}} since tags are unique for each application. In short, the code is optimized for multiple services in the application group rather than for one service.

Due to this, there is an intermediate TF module, applications, which is used to declare all shared code, services, resources and data structures. For the same reason, main.auto.tfvars files in the services layer have the following structure:

services = {
  {{cookiecutter.service_name_underscored}} = {
    ...
  }
}

rather than

{{cookiecutter.service_name_underscored}} = {
  ...
}

which makes it easy and natural to add configurations for other services:

services = {
  payments = {
    ...
  }

  customers_retention = {
    ...
  }
}

Why not use a new repository for each service?

Another approach is to have an individual repository for each service instead of using the services layer to house multiple related applications. While this is supported by the template out of the box and can be viewed as a good option, since all the code of each application lives in its own repository, the cons, in my opinion, far outweigh the pros, especially when the platform is growing, as this approach:

  • Adds additional overhead of creating/deleting application repositories, CI/CD pipelines for them, etc.
  • Makes copy-pasting practice standard: each of the repositories will contain a lot of common code similar to the one which you can find in the template, for example: variables, common environment configuration, data from data modules and specific values/data structures to pass to TF submodules. Instead of reusing the common code, it needs to be duplicated for each service since each requires a separate codebase.
  • Makes the platform too granular. This isn’t good due to the following reasons:
    • Harder and more time-consuming to add changes to common functionality or to update shared modules. Indeed, let’s say we need to add a new environment, update the shared module data_only_global_vpc to a new version ASAP for security reasons, or even bump the AWS provider/Terraform versions due to a recently found critical vulnerability. It is much easier to do this for 10 repositories with 7 applications each than for 70 repositories with 1 application each.
    • Significantly increases the work required to add code improvements to the IaC repositories.
    • Due to the large number of repositories, their code tends to diverge a lot over time, which hurts code standardization and developers’ convenience (imagine having to browse a big portion of a repository to figure out how to make the necessary changes each time, compared to having all repositories standardized, so that switching to another repository makes everything feel familiar).
    • Makes it harder to organize access to the shared functionality.
  • Makes IaC repositories not represent the platform well anymore:
    • All ECS services live in a shared ECS cluster and can use shared resources (RDS and Redis clusters, etc.) but are defined in separate repositories as if they were completely independent and self-contained, which is not the case.
    • Forces you to define all shared resources in the base layer, even if logic says they should live in the services layer: for example, the modules api_integration and api_routes, a shared S3 bucket for applications’ JVM heap dumps, etc.

Conclusion

In this article, we discussed a solution for the rapid and easy creation of base and services repositories for an application group. It allows you to standardize the code in the repositories and promotes best practices. The template includes an extensive selection of feature flags, which let you conditionally add code for various features, such as an RDS cluster, DynamoDB, SQS queues, API GW, etc. Using the provided code, you can create a customized solution for your own real-life needs.

I hope you enjoyed this article and that you will find it helpful.

Happy coding!

Disclaimer: Code and article content are provided ‘as-is’ without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of code or article content.

You can find the sample source for the template here and the example of the generated code here.