Sergii Vershynskyi
Creator of this blog.
Aug 13, 2023 8 min read 1536 words

DYNAMIC RUNBOOK FOR COMPLEX INFRASTRUCTURE DEPLOYMENTS

Abstract: Working with an established infrastructure platform includes performing complex deployments requiring careful planning and execution. Static documents, in this case, are unsuitable and can easily result in production incidents. This challenge can be solved by using a custom interactive runbook solution.

Introduction

After the initial creation of the infrastructure in the production environments, all subsequent work is devoted to supporting the platform evolution by performing iterative changes. The requirement to do it with minimum downtime can lead to complex multistep deployment procedures, which require thorough planning to avoid production incidents and unexpected platform downtime. The use of static documents to describe deployment steps, in this case, is inconvenient as:

  • Resource names are frequently environment-specific, which can be handled by creating a static document per environment. This is a manual copy-paste process, which is time-consuming and error-prone. For example, if the deployment spans 3 infrastructure repositories with 12 environments in each, and something changes in the platform after all documents are created but before the release, the steps may need to be re-evaluated and changed in all or some of the 36 static runbooks.
  • Sometimes steps depend on other steps in a complex way, which can become wordy in a static document.

I tried several existing solutions to this problem but found all of them unsuitable for my purpose: using familiar technologies that let me easily create a flexible, highly customized solution with a UI for building dynamic runbooks for complex infrastructure deployments.

Dynamic runbook internals

I decided to use HTML + JS to build the solution described below. It is basically a one-page static website that guides developers through a series of steps. First, it gathers the required input information, and then it provides the deployment steps. The page is designed so that it:

  • is self-contained;
  • is simple to use and modify;
  • can be easily reused.

Each step in the runbook can be created using the following HTML:

<div class="section" id="section_1">
</div>

where id contains the unique section number.

A “Next step” button at the bottom of the page moves the developer to the next step by calling the following code:

let currentStep = 1;

function showNextStep(button) {
  switch(currentStep) {
    case 1:
      //do something here
      break;

    case 2:
      //do something here
      break;

    ...   
  }

  getById('section_' + currentStep).style.display = 'none';
  getById('section_' + (currentStep + 1)).style.display = 'block';

  currentStep++;
}

function getById(id) {
  return document.getElementById(id);
}

Since the logic for each dynamic runbook is different and custom, the developer is expected to express it with JS code and call it inside the switch statement above for each section.
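
As a minimal sketch of how per-step logic might be organised, the snippet below models the same flow without the DOM. The names stepHandlers and advance, as well as the sample state fields, are hypothetical and not part of the runbook's code. The point is only that a handler that throws keeps the runbook on the current step.

```javascript
// Hypothetical per-step handlers; in the runbook this logic sits in the switch cases.
const stepHandlers = {
  1: (state) => {
    if (!state.appName) {
      throw new Error('Input(s) should not be empty');
    }
  },
  2: (state) => {
    state.commandsPrepared = true;
  },
};

// Runs the handler for the current step and returns the number of the step to show next.
// If the handler throws, the exception propagates and the step number is not advanced.
function advance(currentStep, state) {
  const handler = stepHandlers[currentStep];
  if (handler) {
    handler(state);
  }
  return currentStep + 1;
}

const state = { appName: 'fraud-analysis' };
let currentStep = 1;
currentStep = advance(currentStep, state); // step 1 passes validation
currentStep = advance(currentStep, state); // step 2 prepares commands
console.log(currentStep, state.commandsPrepared);
```

Whether a lookup table or the switch statement reads better is a matter of taste; the switch keeps all step logic in one place, which suits a small self-contained page.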

A section can contain different elements. Let’s review their functionality below.

Text input fields

Text input fields are used to obtain information from the user, for example:

<div class="input_div"><label>Enter old repository name: </label><input type="text" id="old_repo_name" value="customer_apps_services_tf"></div>

When using the runbook, it is crucial to validate that the developer has at least filled in the required information. When a runbook step contains more than 2 text inputs, the validation code checks all of them at once and reports every error immediately, so the developer does not have to fix the first error, press the “Next step” button only to get the next error, and so on. This is a small thing, but it adds convenience. Here is how it works: let’s say we have 6 inputs in the runbook step. Then we can write the following code to read values from them:

resetValidation();
state.appName = readInput('app_name');
state.oldRepoName = readInput('old_repo_name');
state.modulePaths.old = readInput('old_module_path');
state.modulePaths.new = readInput('new_module_path');
state.oldClusterName = readInput('old_cluster_name');
state.taskDefinitionArn = readInput('task_definition_arn');
checkValidationOkOrThrow();

The function readInput reads the value from the input with the id equal to inputId and performs basic validation.

Let’s say we forgot to fill in the old_module_path and old_cluster_name inputs. In this case, the function colours both inputs red, leaves the cursor in the old_cluster_name input (the last empty input that was read), and sets the global variable state.allInputsValid to false:

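function readInput(inputId) {
  const input = getById(inputId);
  const value = input.value;

  if (value === '') {
    // Mark the empty input and remember that validation failed
    input.focus();
    input.className = 'error';
    state.allInputsValid = false;
  } else {
    // Clear the error style left by a previous failed attempt
    input.className = '';
  }

  return value;
}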

After reading all inputs, we call the function checkValidationOkOrThrow, which stops the execution of the script by throwing the exception:

function checkValidationOkOrThrow() {
  if (!state.allInputsValid) {
    throw new Error("Input(s) should not be empty");
  }
}

Then we fill in empty inputs and press the “Next step” button again. Since there were errors, we need to reset the state of state.allInputsValid before reading inputs again, which is done by the function resetValidation:

function resetValidation() {
  state.allInputsValid = true;
}
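
The same check-everything-at-once idea can be sketched without the DOM. The helper findEmptyInputs and the sample values below are hypothetical. Unlike readInput, this sketch also treats whitespace-only values as empty, which is an assumption rather than the runbook's behaviour. It only illustrates collecting every empty field in one pass so that the error message can name all of them.

```javascript
// Returns the ids of all inputs whose value is empty, in the order they were given.
function findEmptyInputs(values) {
  return Object.entries(values)
    .filter(([, value]) => value.trim() === '')
    .map(([id]) => id);
}

// Sample values as they might be read from the six inputs of a step (hypothetical data).
const values = {
  app_name: 'fraud-analysis',
  old_repo_name: 'customer_apps_services_tf',
  old_module_path: '',
  new_module_path: 'module.applications',
  old_cluster_name: '',
  task_definition_arn: 'example-arn',
};

const empty = findEmptyInputs(values);
if (empty.length > 0) {
  console.log('Input(s) should not be empty: ' + empty.join(', '));
}
```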

Select inputs

Select inputs (drop-down lists) are used to obtain user input from a set of predefined values, for example:

Select environment: <select id="environment">
  <option>alpha</option>
  <option>dev</option>
  <option>stag1</option>
  <option>stag2</option>
  <option>stag3</option>
  <option>stag4</option>
  <option>stag5</option>
  <option>stag6</option>
  <option>stag7</option>
  <option>prod1</option>
  <option>prod2</option>
  <option>prod3</option>
  <option>prod4</option>
  <option>prod5</option>
  <option>prod6</option>
  <option>prod7</option>
</select>

Its value can be obtained in the following way:

state.env = getById("environment").value;
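
When the list of environments is long and regular, the options could also be generated instead of typed by hand. The sketch below is one possible way to do that, assuming the select element starts out empty. The helpers buildEnvironmentNames and fillSelect are hypothetical names, not part of the runbook's code.

```javascript
// Builds the environment names shown in the list above: alpha, dev, stag1..stag7, prod1..prod7.
function buildEnvironmentNames(count = 7) {
  const numbered = (prefix) =>
    Array.from({ length: count }, (_, i) => prefix + (i + 1));
  return ['alpha', 'dev', ...numbered('stag'), ...numbered('prod')];
}

// Fills a <select> element with one <option> per environment name.
function fillSelect(select, names) {
  for (const name of names) {
    const option = document.createElement('option');
    option.textContent = name;
    select.appendChild(option);
  }
}

const environments = buildEnvironmentNames();

// Only touch the page when running in a browser that has the select element.
if (typeof document !== 'undefined' && document.getElementById('environment')) {
  fillSelect(document.getElementById('environment'), environments);
}
```

A hand-written list remains the simpler choice when environment names do not follow a pattern.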

DIVs with commands

These are used to store commands to execute, for example:

<div class="copyable">tf init</div>

They have class copyable, which automatically adds “Copy” buttons using the following code:

function addCopyButtons() {
  for (const el of document.getElementsByClassName('copyable')) {
    el.appendChild(createCopyButton());
  };
}

function createCopyButton() {
  const button = document.createElement("BUTTON");
  button.innerHTML = "Copy";
  button.className = 'copy';
  button.contentEditable = false;

  button.onclick = function() {
    copy(this);
  };

  return button;
}

function copy(button) {
  let element = button.parentElement;
  let text = element.firstChild.nodeValue;

  navigator.clipboard.writeText(text);
}

This is done for convenience during the deployment: the developer does not have to select and copy the commands to execute but can just click the button instead. It also simplifies writing the deployment steps, since JavaScript code is not mixed with HTML. All you have to do is add the class to the div element, and the button is added to it automatically.
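
Since navigator.clipboard.writeText returns a Promise that can be rejected (for example, when clipboard permission is denied), it may be worth reporting a failed copy. The sketch below is one way to do that. The function copyCommand, the trimming of the command text, and the stand-in fakeClipboard object are assumptions made for illustration and are not part of the runbook's code.

```javascript
// Writes the trimmed command text to the given clipboard object and reports the outcome.
// `clipboard` is expected to look like navigator.clipboard: a writeText method returning a Promise.
function copyCommand(text, clipboard) {
  return clipboard.writeText(text.trim()).then(
    () => true,
    (error) => {
      console.error('Copy failed:', error);
      return false;
    }
  );
}

// In the page this could be called as:
//   copyCommand(element.firstChild.nodeValue, navigator.clipboard)
// A stand-in clipboard is used here so the sketch also runs outside a browser.
const fakeClipboard = {
  written: null,
  writeText(text) {
    this.written = text;
    return Promise.resolve();
  },
};

copyCommand('  tf init\n', fakeClipboard).then((ok) => {
  console.log(ok, fakeClipboard.written);
});
```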

Commands frequently depend on user input or environment. To automatically change them, I use simple text templates, for example:

<div id="mvStatements" class='copyable'>...
tf state mv -state=../../../[OLD_REPO_NAME]/[ENV]/env-[ENV]/state.tfstate -state-out=state.tfstate '[OLD_MODULE_PATH].module.sqs' '[NEW_MODULE_PATH].module.sqs'
</div>

Then I modify their contents before showing to the developer using the following code:

// The template text is the first child (a text node) of the div
const mvTemplate = getById("mvStatements").firstChild;

mvTemplate.nodeValue = mvTemplate.nodeValue
  .replaceAll('[ENV]', state.env)
  .replaceAll('[OLD_REPO_NAME]', state.oldRepoName)
  .replaceAll('[OLD_MODULE_PATH]', state.modulePaths.old)
  .replaceAll('[NEW_MODULE_PATH]', state.modulePaths.new);
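
The substitution itself can be sketched as a small DOM-free function. The helper fillTemplate and the sample module paths below are hypothetical. The sketch assumes placeholders are upper-case names in square brackets, as in the template above, and it throws when a placeholder has no value so that an unfilled command is never shown.

```javascript
// Replaces every [PLACEHOLDER] in the template with the matching entry from `values`.
// Throws if a placeholder has no value.
function fillTemplate(template, values) {
  return template.replace(/\[([A-Z_]+)\]/g, (match, key) => {
    if (!(key in values)) {
      throw new Error('No value for placeholder ' + match);
    }
    return values[key];
  });
}

const template =
  "tf state mv -state=../../../[OLD_REPO_NAME]/[ENV]/env-[ENV]/state.tfstate " +
  "-state-out=state.tfstate '[OLD_MODULE_PATH].module.sqs' '[NEW_MODULE_PATH].module.sqs'";

// Sample values; the module paths are made up for illustration.
const command = fillTemplate(template, {
  ENV: 'dev',
  OLD_REPO_NAME: 'customer_apps_services_tf',
  OLD_MODULE_PATH: 'module.old_apps',
  NEW_MODULE_PATH: 'module.applications',
});
console.log(command);
```

Because the pattern only matches upper-case letters and underscores, index expressions such as host_rule[0] in Terraform addresses are left untouched.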

DIVs with TF plans

Another convenient feature is hidden divs with TF plans, which become visible when you click the “Show typical TF plan” button and can be hidden again by clicking the “Hide typical TF plan” button. These are useful when you would like to compare the TF plan from the environment with the expected TF plan from the runbook. Below you can find an example of how to add this functionality:

<div class="tfplan">Terraform will perform the following actions:

  # module.applications.module.fraud_analysis.module.service.aws_alb_listener_rule.host_rule[0] will be created
  + resource "aws_alb_listener_rule" "host_rule" {
      + arn          = (known after apply)
      + id           = (known after apply)
      + listener_arn = "{LISTENER_ARN}"
      + priority     = 8802
      + tags_all     = (known after apply)

      + action {
          + order            = (known after apply)
          + target_group_arn = (known after apply)
          + type             = "forward"
        }

      + condition {
          + host_header {
              + values = [
                  + "fraud-analysis.app.*",
                ]
            }
        }
    }

    ...
</div>

As you can see from the example above, the div has class tfplan, and the JavaScript code below automatically adds all the other necessary functionality for your convenience:

function addToggleButtons() {
  for (const el of document.getElementsByClassName('tfplan')) {
    const button = createToggleButton();
    insertAfter(el, button);
    toggle(button);
  };
}

function createToggleButton() {
  const button = document.createElement('BUTTON');

  button.onclick = function() {
    toggle(this);
  };

  return button;
}

function insertAfter(referenceNode, newNode) {
  referenceNode.parentNode.insertBefore(newNode, referenceNode.nextSibling);
}

function toggle(button) {
  let div = button.previousElementSibling;

  if (div.style.display === "none") {
    div.style.display = "block";
    button.innerHTML = "Hide typical TF plan";
  } else {
    div.style.display = "none";
    button.innerHTML = "Show typical TF plan";
  }
}
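
The reason addToggleButtons calls toggle once during setup can be seen in a DOM-free sketch of the same decision. The function nextToggleState is a hypothetical name used only for illustration: a div without an inline style reports an empty display value, so the first call takes the else branch and hides the plan.

```javascript
// Given the current display value of a plan div, returns its next display value and button label.
function nextToggleState(display) {
  return display === 'none'
    ? { display: 'block', label: 'Hide typical TF plan' }
    : { display: 'none', label: 'Show typical TF plan' };
}

// A div with no inline style reports display '' at first, so the initial call hides it.
const afterSetup = nextToggleState('');
const afterFirstClick = nextToggleState(afterSetup.display);
console.log(afterSetup, afterFirstClick);
```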

How to use the solution

The text above describes the functionality available in the runbook but does not provide the full picture of how to create one. For your convenience, I have included a simplified example of a runbook, which was used to perform a complex deployment, at the end of this article.

The code can also be helpful when you start creating your own runbook. Each dynamic runbook is a custom solution made to facilitate a particular complex infrastructure release process, and because of this, it requires coding. At the same time, having all the required functionality already in place streamlines this process a lot.

Conclusion

This article describes a solution you can use to code a dynamic runbook for complex infrastructure deployments and avoid writing wordy static documents for every environment. It has a convenient UI, simple self-contained code, and the building blocks required for creating a multi-step deployment procedure with commands that depend on the environment and/or user input. Using the provided code, you can create a customized solution for your own real-life needs.

I hope you enjoyed this article and that you will find it useful.

Happy coding!

Disclaimer: Code and article content are provided ‘as-is’ without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of code or article content.

You can find the full source for building this solution here.