Sergii Vershynskyi
Creator of this blog.
Jul 13, 2023 14 min read 2884 words

GLOBAL TERRAFORM CACHE

Abstract: A global infrastructure platform built on shared modules needs a global Terraform cache to save disk space and developers’ time, which translates directly into cost savings. Terraform does not provide such a cache out of the box, so a custom global cache was developed. It significantly shortens the TF init process and dramatically reduces TF cache size by sharing the cache across all TF repositories and environments. In addition, it makes it easy to navigate TF module sources in the IDE.

Motivation

Note: This article assumes you have solid experience in Terraform and bash scripting.

Note: In this article, TF abbreviation stands for Terraform.

A TF codebase grows together with the company platform and is usually split into numerous TF repositories owned by different teams. To promote code standardization and reuse, it is common to create shared TF modules, which are used in all TF repositories. During development with Terraform, we run the tf init command a lot. This process creates or refreshes the local cache of TF providers and shared TF module sources. The current TF AWS provider is larger than 350 MB, while the size of the shared module sources varies with, for example, the number of references to shared modules in a particular repository.

When the platform began to be used in several subcontinents, it made sense to have a dedicated staging environment for each production environment. For 5 subcontinents, we can have up to 12 environments per TF repository: 5 production and 5 staging environments, plus shared alpha and dev environments. With current disk sizes, each cache might seem small, but small things add up, and the TF cache ends up occupying a colossal amount of disk space. With 200 TF repositories, a hard requirement to store all TF modules in a mono repo of about 20 MB, and an average of 30 shared TF module references per repository, the estimated TF cache size for the whole platform is: (30 modules * 20 MB + 350 MB TF AWS provider) * 12 environments * 200 repos ~ 2.3 TB.

Keep in mind that the shared TF modules cache usually consists of a vast number of small *.tf files, so the space it actually occupies on disk is even larger. Because of this, it is typical to purge the cache on the local machine or TF bastion host to save money or disk space. This trades disk space for developer time: Terraform cannot be used without the TF cache, so the cache has to be rebuilt after every purge.
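The estimate can be reproduced with plain shell arithmetic. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the numbers are the ones quoted above.

```shell
# Inputs taken from the estimate above (sizes in MB).
modules_per_repo=30
mono_repo_size_mb=20
aws_provider_size_mb=350
environments=12
repositories=200

# Cache size of a single environment directory.
per_env_mb=$(( modules_per_repo * mono_repo_size_mb + aws_provider_size_mb ))

# Cache size for all environments in all repositories.
total_mb=$(( per_env_mb * environments * repositories ))

echo "per environment: ${per_env_mb} MB"
echo "whole platform:  ${total_mb} MB ($(( total_mb / 1000 )) GB)"
```

Changing any of the inputs shows how quickly the total grows with the number of environments and repositories.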

The problem of slow, frequent TF inits and huge TF cache sizes can be solved with a global TF cache shared across all environments and repositories. Indeed, the contents of the TF cache for each of the 12 environments of the same repository are precisely the same but are stored 12 times (!). Moreover, all other TF repositories can reuse the TF module sources downloaded for one repository. In addition, if a repository contains 200 TF module declarations with the same source version, which can be the case for some specialized repositories, TF downloads the source 200 times. A global cache stores each module source only once and reuses it everywhere.

Currently, Terraform does not provide a global cache for module sources out of the box. This article shows how to build such a global TF cache, which has been named TinyCache.

Requirements

Let’s come up with the requirements for the global cache. It should:

  • be immutable, as:
    • it is shared across all TF repositories, and we don’t want to introduce infrastructure drift/errors when someone updates TF module source;
    • we don’t want to re-download all TF modules each time when we do TF init;
    • using immutable shared TF modules is generally considered a good practice.
  • be easy to use;
  • be fast;
  • be transparent (do not require the developer to run any additional commands);
  • not require changes to repository sources once the conversion is finished.

It would also be nice if the cache allowed us to browse TF sources on our machines easily.

Global TF cache internals

How shared module sources are referenced

A natural candidate for allowing developers to browse shared TF modules internals would be to use local paths as their sources. Many modern IDEs with installed TF plugins allow you to navigate local TF modules easily and conveniently. And the good news is that we can use them as a foundation for building a global cache!

The cache should live in one specific path to be shareable across all of the repositories, and we want the local paths of modules sources in all repositories to be independent of the user’s decision of where to place TF repositories on their machines. We will use relative paths for it.

Let’s find out how to use relative paths for these purposes. For example, imagine our shared TF modules live in the git repository modules_tf and are released using git tags. Thus, each module release is immutable, which satisfies the immutability requirement. If the cache lives in /Users/Shared/tf_cache/.modules_tf, and we clone the same repository repository_1_tf to the following paths

/Users/John/Tickets/1084/repository_1_tf
/Users/John/Hotfixes/repository_1_tf

then the relative paths for the module with the source

git::ssh://git@bitbucket.org/<workspace_ID>/modules_tf.git//modules/v0.12/ecs/service?ref=ecs-service-1.0.0

referenced from the file ./repository_1_tf/modules/application-a/app.tf would be

../../../../../../Shared/tf_cache/.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service
../../../../../Shared/tf_cache/.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service

The committed repository_1_tf sources cannot contain both path variants for the same module, not to mention that the Shared/tf_cache portion looks out of place. To solve this, we can use symlinks. Indeed, creating the symlinks

/Users/John/Tickets/1084/repository_1_tf/.modules_tf -> /Users/Shared/tf_cache/.modules_tf
/Users/John/Hotfixes/repository_1_tf/.modules_tf -> /Users/Shared/tf_cache/.modules_tf

gives us the same module paths in both repositories:

../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service
../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service

The number of ../ segments in the module source path varies with the location of the referencing file, which makes it easy to see how deeply that file is nested in the repository.

Here we expect the source of the modules_tf git repository at tag ecs-service-1.0.0 to be in the directory /Users/Shared/tf_cache/.modules_tf/ecs-service-1.0.0.
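The symlink idea can be tried out in a throwaway directory. The sketch below is illustrative only: it uses a temporary sandbox in place of /Users, and an empty main.tf stands in for the cloned module source. It checks that the same relative source path resolves from both clones.

```shell
# Throwaway sandbox standing in for /Users (illustrative layout only).
sandbox=$(mktemp -d)
cache_dir="$sandbox/Shared/tf_cache/.modules_tf"
module_rel='.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service'

# Pretend the tagged module source has already been cloned into the cache.
mkdir -p "$cache_dir/ecs-service-1.0.0/modules/v0.12/ecs/service"
touch "$cache_dir/ecs-service-1.0.0/modules/v0.12/ecs/service/main.tf"

resolved=0
for clone in "$sandbox/John/Tickets/1084/repository_1_tf" \
             "$sandbox/John/Hotfixes/repository_1_tf"; do
  mkdir -p "$clone/modules/application-a"
  # Symlink at the repository root pointing to the shared cache.
  ln -s "$cache_dir" "$clone/.modules_tf"
  # The same relative source path is used from both clones.
  if [ -f "$clone/modules/application-a/../../$module_rel/main.tf" ]; then
    resolved=$(( resolved + 1 ))
  fi
done

echo "clones that resolve the module: $resolved"
rm -rf "$sandbox"
```

Both clones reach the single cached copy, which is what lets the relative path be committed to the repository.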

Populating cache automatically

The example above expects the directory /Users/Shared/tf_cache/.modules_tf to be populated automatically. The bash script fill_tf_cache.sh was written for this purpose. Here is how it works.

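First, we create $TF_MODULES_CACHE_DIR and the repository_1_tf/.modules_tf symlink pointing to it:

modules_dir_name='.modules_tf'

# Fall back to the default location unless the caller has set the variable.
if [[ -z "${TF_MODULES_CACHE_DIR}" ]]; then
  TF_MODULES_CACHE_DIR="/Users/Shared/tf_cache/$modules_dir_name"
fi
mkdir -p "$TF_MODULES_CACHE_DIR"

# The script runs inside env-{ENV_NAME}, so '../' is the repository root.
ln -sf "$TF_MODULES_CACHE_DIR" '../'
cd "../$modules_dir_name"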

This code allows us to override the default TF modules cache directory without editing the script by defining it in an environment variable, for example: export TF_MODULES_CACHE_DIR="/home/tf/tf_cache/.modules_tf".

Second, we search all sources in the repository for lines that reference .modules_tf:

search_result=$(grep -rh "/$modules_dir_name/" .. --include \*.tf --exclude-dir=.terraform --exclude-dir=$modules_dir_name | grep -v '^ *#' | grep -vE '^ *(\/)')

Note: We don’t want to process commented-out module sources here, so we added grep -v '^ *#' | grep -vE '^ *(\/)', which drops lines starting with # or /. Cache directories should not be processed either, which is achieved with the command-line switches --exclude-dir=.terraform --exclude-dir=$modules_dir_name.

Such lines will come from module declarations, for example:

module "ecs_cluster" {
  source = "../../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service"

  ...
}
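To see what the filter keeps, the search can be run against a small throwaway repository. This is a sketch with made-up file contents (the 0.8.0 and 0.9.0 tags are hypothetical); only the grep pipeline is taken from the script above.

```shell
# Throwaway repository with one active and two commented-out sources.
repo=$(mktemp -d)
mkdir -p "$repo/modules/application-a" "$repo/env-dev/.terraform"
cat > "$repo/modules/application-a/app.tf" <<'EOF'
module "ecs_cluster" {
  # source = "../../.modules_tf/ecs-service-0.9.0/modules/v0.12/ecs/service"
  //source = "../../.modules_tf/ecs-service-0.8.0/modules/v0.12/ecs/service"
  source = "../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service"
}
EOF
# A copy inside .terraform must be ignored as well.
cp "$repo/modules/application-a/app.tf" "$repo/env-dev/.terraform/copy.tf"

# Run the search from an environment directory, as the script does.
search_result=$(cd "$repo/env-dev" && grep -rh "/.modules_tf/" .. --include '*.tf' \
  --exclude-dir=.terraform --exclude-dir=.modules_tf \
  | grep -v '^ *#' | grep -vE '^ *(\/)')

printf '%s\n' "$search_result"
rm -rf "$repo"
```

Only the active source line survives; both comment styles and the copy under .terraform are filtered out.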

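Finally, we populate the cache:

while IFS= read -r line ; do
  # The first path segment after /.modules_tf/ is the git tag.
  module_git_tag=$(echo "${line##*/$modules_dir_name/}" | cut -d'/' -f 1)

  # Nothing to do when grep found no references.
  [[ -z "$module_git_tag" ]] && continue

  # Clone only tags that are not in the cache yet.
  if [ ! -d "$module_git_tag" ]; then
    git clone --depth=1 -c advice.detachedHead=false --branch "$module_git_tag" 'git@bitbucket.org:<workspace_ID>/modules_tf.git' "$module_git_tag"
    echo "---"
  fi
done <<< "$search_result"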

Note: Here we are using --depth=1 to reduce the size of .git directory by truncating history to 1 commit.
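The tag extraction can also be looked at in isolation. The sketch below is a variation, not the original script: it works on sample lines (the rds-2.3.1 tag is hypothetical) and de-duplicates the tags up front, so a tag referenced many times is handled once.

```shell
# Sample lines as they could come out of the grep step (illustrative data).
search_result='  source = "../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service"
  source = "../../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service"
  source = "../../.modules_tf/rds-2.3.1/modules/v0.12/rds"'

# Drop everything up to and including "/.modules_tf/", keep the first path
# segment (the git tag), and de-duplicate the result.
unique_tags=$(printf '%s\n' "$search_result" \
  | sed 's#.*/\.modules_tf/##' \
  | cut -d'/' -f 1 \
  | sort -u)

printf '%s\n' "$unique_tags"
```

The original loop reaches the same end state through its directory check; de-duplicating first only avoids repeating that check for every reference.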

Populating cache transparently

As mentioned in the Motivation section, each repository can have up to 12 environments. Environment-specific configuration is stored in env-{ENV_NAME} directories, and the main TF code is located inside the modules directory. Thus, a typical directory structure in repository_1_tf can look as follows:

env-alpha
env-dev
env-stagus
... [other staging environments here]
env-produs
... [other production environments here]
modules

When developers provision infrastructure, they run the terraform init, terraform plan and terraform apply commands in one of the env-{ENV_NAME} directories. Because the cache must be transparent and easy to use, the terraform init command needs to be extended to run the fill_tf_cache.sh script. This is easy to do with the help of the TF CLI helper script described here. This script can be used on the bastion host to provision infrastructure, as well as on local machines. We can add the following code to the helper script:

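if [[ ${@:1:1} == 'init' ]]; then
  # Only converted repositories carry the marker file at their root.
  if [[ -f ../is_tiny_cache_repo ]]; then
    # fill_tf_cache.sh uses bash-only syntax ([[ ]], <<<), so run it with bash.
    bash fill_tf_cache.sh
    #place code for TF plugin cache here
  fi
fi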
 
$binary "$@"

This way, every time the developer runs the tf init command, the fill_tf_cache.sh script is invoked first to populate the TF cache, and then TF init itself runs, finalizing the process.

If you have many TF repositories, the transition to the global cache will take time, and repositories that are not yet converted must remain usable in the meantime. For this reason, we place an empty is_tiny_cache_repo marker file at the root of every converted TF repository. The marker file avoids scanning *.tf files inside the repository for signs that it is already converted, and it also lets developers see at a glance whether the repository already uses TinyCache.
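During the transition, the same marker check can be used to list which repositories are already converted. The following is a sketch on a throwaway workspace; uses_tiny_cache and repository_2_tf are hypothetical names introduced only for this example.

```shell
# Sandbox with two repositories; only the first one is converted.
workspace=$(mktemp -d)
mkdir -p "$workspace/repository_1_tf/env-dev" "$workspace/repository_2_tf/env-dev"
touch "$workspace/repository_1_tf/is_tiny_cache_repo"

# Same check the wrapper performs: it runs inside env-{ENV_NAME},
# so the marker is expected one level up, at the repository root.
uses_tiny_cache() {
  [ -f "$1/../is_tiny_cache_repo" ]
}

converted=''
for env_dir in "$workspace"/*/env-dev; do
  if uses_tiny_cache "$env_dir"; then
    converted="$converted $(basename "$(dirname "$env_dir")")"
  fi
done

echo "converted:$converted"
rm -rf "$workspace"
```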

TF plugin cache

TF natively supports downloading providers to a shared directory and reusing them instead of downloading them every time they are needed. It makes perfect sense to use this cache since the current TF AWS provider is larger than 350 MB, which adds up to more than 4 GB for just one repository with 12 environments. We can set up this cache by adding the following code right after the fill_tf_cache.sh call:

# Fall back to the default location unless the caller has set the variable.
if [[ -z "${TF_PLUGIN_CACHE_DIR}" ]]; then
  TF_PLUGIN_CACHE_DIR="/Users/Shared/tf_cache/.providers"
fi

mkdir -p "$TF_PLUGIN_CACHE_DIR"
export TF_PLUGIN_CACHE_DIR

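This code allows you to override the default TF plugin cache directory without editing the script. The mkdir -p call matters because Terraform expects the plugin cache directory to exist and does not create it itself.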

Conversion

Let’s imagine that you are using git as a source for your shared TF modules in your repositories. Then modules sources will look something like this:

module "ecs_cluster" {
  source = "git::ssh://git@bitbucket.org/<workspace_ID>/modules_tf.git//modules/v0.12/ecs/service?ref=ecs-service-1.0.0"

  ...
}

While it is possible to rewrite all git references to TinyCache format manually, let’s help teams do the conversion by automating this process.

First, let’s find all git sources in the repository:

search_result=`grep -r 'git::ssh://git@bitbucket.org/<workspace_ID>/modules_tf.git' . --include \*.tf --exclude-dir=.terraform --exclude-dir=.modules_tf`

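Second, let’s parse the results, transform each git source into the TinyCache format, and update the *.tf files in the repository to use the new format:

# Note: realpath --relative-to is a GNU coreutils option, while sed -i '' is the
# BSD/macOS form; with GNU sed, use sed -i without the empty argument.
while IFS= read -r line ; do
  file_path=$(echo "$line" | cut -d':' -f 1)

  # The module source is the first double-quoted string on the line.
  module_source=$(echo "$line" | cut -d'"' -f 2)
  module_git_tag=$(echo "$module_source" | cut -d'=' -f 2 | cut -d'&' -f 1)
  # Path inside modules_tf: drop the query string and the leading slash.
  module_path=$(echo "${module_source##*/modules_tf.git}" | cut -d'?' -f 1 | tr -s '/' | sed 's#^/##')

  file_dir=$(dirname "$file_path")
  relative_path_to_modules_dir=$(realpath -m --relative-to="$file_dir" ./.modules_tf)
  new_module_path="$relative_path_to_modules_dir/$module_git_tag/$module_path"

  sed -i '' "s#$module_source#$new_module_path#g" "$file_path"
done <<< "$search_result"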

Finally, let’s create TinyCache marker file in the root of the repository:

touch is_tiny_cache_repo

Note: The conversion script should be run from the root of your TF repository.

If you use git for your repositories, it makes sense to add .modules_tf to .gitignore.
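To make the string handling in the conversion loop concrete, here is a walk-through on a single sample line. It is a sketch, not the conversion script itself: it assumes the git:: source form matched by the grep above, and it builds the relative path by counting directory levels instead of calling realpath, so it also runs where the GNU option is unavailable.

```shell
# One line as grep -r would report it (illustrative sample).
line='./modules/application-a/app.tf:  source = "git::ssh://git@bitbucket.org/<workspace_ID>/modules_tf.git//modules/v0.12/ecs/service?ref=ecs-service-1.0.0"'

file_path=${line%%:*}                          # text before the first colon
module_source=$(printf '%s\n' "$line" | cut -d'"' -f 2)
module_git_tag=${module_source##*ref=}         # text after "ref="
module_git_tag=${module_git_tag%%&*}           # drop further query parameters
module_path=${module_source##*modules_tf.git}  # path inside the mono repo
module_path=${module_path%%\?*}                # drop the query string
module_path=$(printf '%s\n' "$module_path" | sed 's#^/*##')  # drop leading slashes

# Build one "../" per directory level between the file and the repository root.
file_dir=$(dirname "$file_path")
file_dir=${file_dir#./}
up=''
if [ "$file_dir" != '.' ]; then
  depth=$(printf '%s\n' "$file_dir" | awk -F'/' '{ print NF }')
  i=0
  while [ "$i" -lt "$depth" ]; do
    up="../$up"
    i=$(( i + 1 ))
  done
fi

new_module_path="${up}.modules_tf/$module_git_tag/$module_path"
echo "$new_module_path"
```

For this sample, the result is the same ../../.modules_tf/... path shown earlier for modules/application-a/app.tf.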

During conversion, the shared module sources should be made path-agnostic: absolute paths on the local machine, bastion host, and CI/CD pipeline will most likely differ, and this generates drift. In my case, there was only one such place in the whole modules_tf:

resource "aws_s3_bucket_object" "lambda_source" {
  source = format("%s/source.zip", path.module)
  ...
}

After changing the code to:

resource "aws_s3_bucket_object" "lambda_source" {
  content_base64 = data.local_file.source_stub.content_base64
  ...
}

data "local_file" "source_stub" {
  filename = "${path.module}/source.zip"
  ...
}

and applying the change, the drift went away.

How to run TinyCache

Follow these steps to run TinyCache locally:

  • Place the scripts tf and fill_tf_cache.sh into the ~/bin directory (or any other directory in your PATH environment variable). Make the tf shell script executable (chmod +x ./tf);
  • Download the TF CLI, rename it (for TF v1.5.1 the name should be tf1.5.1), and place it into ~/bin;
  • Clone a TF repository that is configured to use TinyCache, open zsh in an environment folder of your repository (e.g., env-dev) and run tf init. The first TF init will take some time, as TinyCache creates and populates the cache. You can run tf plan to verify that init worked;
  • Navigate to another environment folder (e.g., env-stagus) and run tf init again. Usually this finishes in less than 20 seconds;
  • Open your IDE with a TF plugin installed, and use it to navigate the code of the shared TF modules downloaded from modules_tf by TinyCache.

To run TinyCache on a secure bastion or in a TF CI/CD pipeline, you can use the same scripts or modify them depending on the architecture and implementation details of those systems. For example, on CI/CD pipeline runners you can use a directory mounted as a shared cache volume to share the cache between all runners.

Procedure to develop new shared modules

When developing a new shared TF module version, we create a new git branch, test it, and submit a PR, after which we might need to change the code and test again. TinyCache is immutable, just like the TF cache inside the .terraform directory, so the old module source has to be deleted from the cache before an updated branch can be picked up. Also, the development branch is deleted when the PR is merged, so caching it in TinyCache would pollute the cache with source that will never be used again. For these reasons, we usually switch to a git source during development and change back to TinyCache in consumer TF repositories once the new module version is released in modules_tf.

For example, let’s imagine that we are using the ecs service TF module in repository_1_tf, and we are tasked with adding new functionality that requires modifying the module source. To do this, we create a new branch named JIRA-1234 in the git repository modules_tf, where we implement the required features. To test our changes, we switch the ecs service TF module source in repository_1_tf to the code from our branch by commenting out the old source and adding a git branch reference like this:

module "ecs_cluster" {
  //source = "../../../.modules_tf/ecs-service-1.0.0/modules/v0.12/ecs/service"
  source = "git::ssh://git@bitbucket.org/<workspace_ID>/modules_tf.git//modules/v0.12/ecs/service?ref=JIRA-1234"

  ...
}

Let’s say we found some issues during testing. In this case, we fix the code in our branch, retest, and submit the PR. During the peer review process, we might need to modify the code in our branch to address suggestions and retest. After the peer review is concluded, we release the new ecs module version in modules_tf with the new git tag ecs-service-1.1.0. Finally, we switch the module source in the target TF repository repository_1_tf back to TinyCache, now pointing at the new tag:

module "ecs_cluster" {
  source = "../../../.modules_tf/ecs-service-1.1.0/modules/v0.12/ecs/service"

  ...
}
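If a development ref does end up in TinyCache, it has to be removed by hand before it can be cloned again, because cached entries are never refreshed. A minimal sketch of such a cleanup follows; evict_cached_ref is a hypothetical helper that is not part of the original scripts, and the demo runs against a temporary directory rather than the real cache.

```shell
# Remove one cached ref so that the next `tf init` clones it again.
# evict_cached_ref is a hypothetical helper, not part of the original scripts.
evict_cached_ref() {
  cache_dir=$1
  ref=$2
  # Refuse empty or path-like refs so a typo cannot delete more than one entry.
  case "$ref" in
    ''|*/*|.|..) echo "refusing to evict '$ref'" >&2; return 1 ;;
  esac
  # ${cache_dir:?} aborts if the cache directory argument is empty.
  rm -rf "${cache_dir:?}/$ref"
}

# Demo on a throwaway cache directory.
demo_cache=$(mktemp -d)
mkdir -p "$demo_cache/JIRA-1234" "$demo_cache/ecs-service-1.0.0"
evict_cached_ref "$demo_cache" 'JIRA-1234'
remaining=$(ls "$demo_cache")
echo "$remaining"
rm -rf "$demo_cache"
```

Against the real cache, the first argument would be the directory held in TF_MODULES_CACHE_DIR.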

Results and discussion

After switching from git to TinyCache, TF init became almost instant (tens of seconds vs tens of minutes with the old cache) regardless of the size of the TF repository and the number of module references in it, and the modules cache shrank significantly (on the order of hundreds of megabytes vs tens of gigabytes per repository). The time savings are especially important because the developer needs to populate or refresh the TF cache for all environments (12 in this article), even to release a small change.

TinyCache helps developers in their day-to-day activities and can also be a big advantage in DR scenarios, where it can speed up the restore process. It works on a laptop, on a secure bastion host, and in TF CI/CD pipelines.

On a laptop, there is the additional advantage of no longer spending time on cleaning up the old TF cache to free disk space; that cache was scattered across many TF repositories, in every env-* directory inside them. Another huge benefit is that shared TF module sources are easy to navigate in the IDE after running tf init. I have found this feature very useful: when working with many TF repositories that use different versions of shared modules, I frequently need to browse the code of shared TF modules, and having to perform separate operations just for that distracts from the actual work.

TinyCache is flexible and can work with almost any service that is able to store TF module sources, not just git. It is as simple as replacing git clone in fill_tf_cache.sh with another command. For example, you can switch to S3, or even to a service that TF does not natively support.
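One way to picture this is a small dispatch function in place of the git clone line. This is only a sketch under stated assumptions: fetch_module_source, the backend names, and the TF_MODULES_BUCKET and TF_MODULES_SOURCE_DIR variables are all hypothetical, and the S3 branch assumes each release was uploaded under a prefix named after its tag. The demo exercises only the local dir backend.

```shell
# fetch_module_source <backend> <ref> <target_dir>
# Hypothetical helper: backend names and variables are assumptions.
fetch_module_source() {
  backend=$1
  ref=$2
  target=$3
  case "$backend" in
    git)
      git clone --depth=1 -c advice.detachedHead=false --branch "$ref" \
        'git@bitbucket.org:<workspace_ID>/modules_tf.git' "$target" ;;
    s3)
      # Assumes each release lives under s3://<bucket>/<ref>/.
      aws s3 cp "s3://${TF_MODULES_BUCKET:?}/$ref/" "$target" --recursive ;;
    dir)
      # Plain directory copy, handy for trying the flow without network access.
      cp -R "${TF_MODULES_SOURCE_DIR:?}/$ref" "$target" ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1 ;;
  esac
}

# Try the flow offline with the "dir" backend and throwaway directories.
TF_MODULES_SOURCE_DIR=$(mktemp -d)
demo_cache=$(mktemp -d)
mkdir -p "$TF_MODULES_SOURCE_DIR/ecs-service-1.0.0/modules/v0.12/ecs/service"
fetch_module_source dir 'ecs-service-1.0.0' "$demo_cache/ecs-service-1.0.0"
fetched=$(ls "$demo_cache")
echo "$fetched"
rm -rf "$TF_MODULES_SOURCE_DIR" "$demo_cache"
```

Whatever the backend, the contract stays the same: after the call, the directory named after the ref contains that release of the module sources.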

Another interesting property of TinyCache is that if code in modules_tf references other modules using git, this works as well, but those modules are cached inside the .terraform directory, i.e., per repository and per environment. This happens because we call TF init after the fill_tf_cache.sh script, and TF then processes all sources natively; thus, any source type supported by TF, or a mix of them, will work.

In summary, TinyCache saves a lot of time, lets teams move faster, and frees developers to focus on other activities. And as we all know, time is money.

Conclusion

In this article, we discussed a solution for building a global TF cache, which speeds up the developer’s work by making the TF init process almost instant while also hugely reducing TF cache size. Using the provided code, you can build a customized solution for your own real-life needs.

I hope you enjoyed this article and that you will consider starting to use TinyCache for your infrastructure platform.

Happy coding!

Disclaimer: Code and article content are provided ‘as-is’ without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of code or article content.

You can find sample sources for building the solution here.