
Is Your Terragrunt Truly DRY? A Smarter Approach to Infrastructure Management
Infrastructure as code, most often implemented with Terraform, has become the standard way of managing modern cloud environments. But let’s face it: as your infrastructure grows, so does its complexity. When you’re juggling multiple environments and modules, keeping your Terraform or OpenTofu code clean, consistent, and manageable often feels like an impossible balancing act. This is where Terragrunt comes in: a thin wrapper around Terraform meant to help make your infrastructure DRY (Don’t Repeat Yourself) and maintainable. But here’s the catch: does it really? While Terragrunt boldly promises to eliminate redundancy and streamline your codebase, many users find themselves falling back into the familiar pattern of copy-pasting, with folders multiplying as more environments are added and configurations quietly drifting apart. So, is Terragrunt as “DRY” as it claims to be?
In this article, we’ll dig into the common pitfalls of a typical Terragrunt project setup, challenge its weaknesses, and explore a smarter, more scalable architecture. Throughout, we’ll demonstrate key points using example code. You can access the full code examples here. Let’s start by looking at a common structure for a Terragrunt project:
.
├── envs
│   ├── _envcommon
│   │   ├── ec2.hcl
│   │   └── vpc.hcl
│   ├── dev
│   │   ├── ec2
│   │   │   └── terragrunt.hcl
│   │   ├── vpc
│   │   │   └── terragrunt.hcl
│   │   └── env.hcl
│   ├── staging
│   │   ├── ec2
│   │   │   └── terragrunt.hcl
│   │   ├── vpc
│   │   │   └── terragrunt.hcl
│   │   └── env.hcl
│   └── prod
│       ├── ec2
│       │   └── terragrunt.hcl
│       ├── vpc
│       │   └── terragrunt.hcl
│       └── env.hcl
├── modules
│   ├── ec2
│   │   └── main.tf
│   └── vpc
│       └── main.tf
└── root.hcl
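Each leaf folder under envs/ carries its own terragrunt.hcl. To make the critique concrete, here is a rough sketch of what such a file typically looks like (the include paths and inputs are illustrative, not taken from the example repo):

```hcl
# envs/dev/vpc/terragrunt.hcl (illustrative sketch)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

# Pull in the component-wide defaults from _envcommon
include "envcommon" {
  path = "${dirname(find_in_parent_folders("root.hcl"))}/envs/_envcommon/vpc.hcl"
}

terraform {
  source = "../../../modules/vpc"
}

# Per-env values, repeated with small tweaks in staging and prod
inputs = {
  vpc_name = "dev-vpc"
  vpc_cidr = "10.1.0.0/16"
}
```

Near-identical copies of this file live in staging and prod, differing only in a handful of input values.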
What are the weak points of this architecture?
- Too Many Folders and Files: Each environment requires its own folder with sub-folders for every stack component. While manageable for small setups, this structure quickly becomes bulky and hard to navigate as the number of environments and components grows.
- Duplicated terragrunt.hcl Files: Even with a shared root.hcl, each component still needs its own terragrunt.hcl. This violates the DRY principle and increases maintenance, leading to inevitable configuration drift as changes are not made to all files.
- Scattered Input Variables: Input variables are spread across multiple files like env.hcl, _envcommon.hcl, and terragrunt.hcl. Inconsistent definitions using inputs or locals blocks make tracking inputs or comparing configurations between environments difficult.
- Limited Flexibility for Shared Variables: There’s no easy way to define variables for subsets of environments. You’re limited to global or environment-specific variables, making it hard to manage shared inputs for all non-prod or dev environments.
So, the solution we are looking for is a cleaner architecture that reduces clutter, eliminates duplication, and simplifies variable management across environments while staying true to the DRY principle. Let’s see how to achieve it.
As a first step, instead of a folder for each env, we will create a global template that will be used for all envs. A bit like a Terraform module, we want one Terragrunt folder that can be referenced multiple times to create different environments. That way, adding a new env will be very simple: just use the existing template! No need to copy-paste. And when you want to change or add something to the terragrunt.hcl, you only need to do it in this one file. No need to go into each environment’s terragrunt.hcl separately.
Let’s understand what I mean. Our new folder structure will look like this:
.
├── environment
│   ├── _envcommon
│   │   ├── dev.tfvars
│   │   ├── prod.tfvars
│   │   └── staging.tfvars
│   ├── ec2
│   │   ├── terragrunt.hcl
│   │   └── tfvars
│   │       ├── default.tfvars
│   │       ├── non-prod.tfvars
│   │       └── prod.tfvars
│   ├── vpc
│   │   ├── terragrunt.hcl
│   │   └── tfvars
│   │       └── default.tfvars
│   └── envs.yaml
├── modules
│   ├── ec2
│   │   └── main.tf
│   └── vpc
│       └── main.tf
└── root.hcl
The changes are:
- Instead of a folder for each env, there is one environment folder that is used for all Terragrunt runs. In it are the same sub-folders for the different components of our infra stack.
- In each component’s folder there is a folder called tfvars, in which all the configuration variables for that component are stored. Wherever possible, inputs are declared in .tfvars files, and not directly in inputs blocks or in locals blocks.
- In each environment/*/tfvars/ folder there is a default.tfvars that stores global variables shared among all environments.
- Other than the default.tfvars, there can be tfvars for specific envs (e.g. dev.tfvars, staging.tfvars), or for some combination of envs (e.g. non-prod.tfvars, qa.tfvars).
- Variables that are specific to one environment but shared among all the env’s components are stored inside _envcommon/.
- In the inputs block in each component’s terragrunt.hcl we keep only the inputs that come from a dependency. Any other inputs are declared in the relevant .tfvars file.
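To illustrate the tfvars layering (the variable names and values here are hypothetical, not from the example repo), environment/ec2/tfvars/ could look like this:

```hcl
# environment/ec2/tfvars/default.tfvars (baseline shared by every env)
instance_type = "t3.micro"
monitoring    = false

# environment/ec2/tfvars/prod.tfvars (prod-only overrides)
instance_type = "m5.large"
monitoring    = true
```

Because the files are passed to Terraform in order, a value set in prod.tfvars wins over the same variable in default.tfvars.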
OK, so we have our new structure, and we have put all the variables into tfvars files in an orderly fashion. But how do we tell Terragrunt how to run?
For that, we have the two main files: root.hcl and envs.yaml.
Let’s look at the envs.yaml:
dev:
  aws_region: "us-east-1"
  vpc:
    inputs:
      vpc_name: "dev-vpc"
      vpc_cidr: "10.1.0.0/16"
    tfvar_files:
      - default.tfvars
      - dev.tfvars
  ec2:
    inputs:
      instance_name: "dev-instance"
    tfvar_files:
      - default.tfvars
      - non-prod.tfvars
prod:
  aws_region: "us-east-1"
  vpc:
    inputs:
      vpc_name: "prod-vpc"
      vpc_cidr: "10.2.0.0/16"
    tfvar_files:
      - default.tfvars
      - prod.tfvars
  ec2:
    inputs:
      instance_name: "prod-instance"
    tfvar_files:
      - default.tfvars
      - prod.tfvars
The structure is pretty simple: for each env we want Terragrunt to deploy, we create a new top-level key and give it the environment’s name. Nested inside are all the configurations Terragrunt needs in order to deploy the resources: the AWS region to run in, the components to create, and the inputs to pass. You can see that in addition to passing the tfvars files to use, I can also put direct inputs under the inputs key. This is a personal decision; you can configure the yaml file to your needs and conventions.
What I like about this file is how easy it is to read. I can see at a glance what envs I have, what components each one has, and where all the values passed to the environment come from. Also, adding a new environment is as simple as adding a new top-level block, calling it qa, and assigning it the right values files. No need to create a whole new folder and copy the terragrunt.hcl files into it.
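For instance, a hypothetical qa environment could be added with nothing more than a new top-level block (the values here are illustrative):

```yaml
qa:
  aws_region: "us-east-1"
  vpc:
    inputs:
      vpc_name: "qa-vpc"
      vpc_cidr: "10.3.0.0/16"
    tfvar_files:
      - default.tfvars
      - non-prod.tfvars
  ec2:
    inputs:
      instance_name: "qa-instance"
    tfvar_files:
      - default.tfvars
      - non-prod.tfvars
```

No new folders, no copied terragrunt.hcl files: the existing template picks this block up automatically.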
We still haven’t seen how Terragrunt actually runs all of this, since we configured everything using yaml and not hcl. All the pieces fall into place when we look at the root.hcl file. This is where the real power of Terragrunt shines.
The first thing Terragrunt needs to know when we run any command like terragrunt apply is the context it is running in. The context is the env we want to update, or more specifically, the top-level key in the envs.yaml that Terragrunt should use in this specific run. We will use an environment variable that we export in advance and access in the locals block of the root.hcl file.
locals {
  env = get_env("TF_VAR_ENV")
}
We can now use local.env and some basic Terragrunt functions to extract the configuration for this run from the envs.yaml.
locals {
  env = get_env("TF_VAR_ENV")
  # Extract the top-level block with the env name as the key
  env_config = yamldecode(file("${get_terragrunt_dir()}/../envs.yaml"))[local.env]
  # Extract the aws_region from the env_config
  aws_region = local.env_config["aws_region"]
  # Extract the config for the current folder (component) we are in
  component        = basename("${get_original_terragrunt_dir()}")
  component_config = local.env_config[local.component]
  # Extract the direct inputs from the component_config
  inputs = try(local.component_config["inputs"], {})
  # Build the path to the env tfvars file
  env_tfvars_file = "${get_terragrunt_dir()}/../_envcommon/${local.env}.tfvars"
  # Build the paths to the additional tfvars files from the component_config
  tfvars_files = try([for file in local.component_config["tfvar_files"] : "${get_terragrunt_dir()}/tfvars/${file}"], [])
}
Example:
# Set the current context before running terragrunt:
cd environment/vpc
export TF_VAR_ENV=dev
# The local values for this terragrunt run will be:
local.env = "dev"
local.env_config = {
  aws_region: "us-east-1",
  vpc: {
    inputs: {
      vpc_name: "dev-vpc",
      vpc_cidr: "10.1.0.0/16"
    },
    tfvar_files: ["default.tfvars", "dev.tfvars"]
  },
  ec2: {
    inputs: {
      instance_name: "dev-instance"
    },
    tfvar_files: ["default.tfvars", "non-prod.tfvars"]
  }
}
local.aws_region = "us-east-1"
local.component = "vpc"
local.component_config = {
  inputs: {
    vpc_name: "dev-vpc",
    vpc_cidr: "10.1.0.0/16"
  },
  tfvar_files: ["default.tfvars", "dev.tfvars"]
}
local.inputs = {
  vpc_name: "dev-vpc",
  vpc_cidr: "10.1.0.0/16"
}
To tell Terragrunt to use the local.tfvars_files, we will add them to the terraform block:
terraform {
  extra_arguments "tfvars" {
    commands           = get_terraform_commands_that_need_vars()
    optional_var_files = concat([local.env_tfvars_file], local.tfvars_files)
  }
}
Using optional_var_files (rather than required_var_files) allows Terragrunt to run even if one of the listed files doesn’t exist.
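Under the hood, each entry in the list becomes a -var-file flag, so a dev run of the vpc component translates to roughly the following Terraform invocation (Terragrunt actually passes absolute paths; these are shortened for readability):

```shell
terraform plan \
  -var-file=environment/_envcommon/dev.tfvars \
  -var-file=environment/vpc/tfvars/default.tfvars \
  -var-file=environment/vpc/tfvars/dev.tfvars
```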
I also added some terminal output that will echo the context Terragrunt is running in, which can be useful for making sure we are not making changes to the wrong environment:
terraform {
  …
  before_hook "before_hook" {
    commands = ["apply", "plan", "init"]
    execute  = ["echo", "-e", "\\n\\033[1;33m====================================\\n🚀 STARTING TERRAGRUNT IN ENV: ${local.env}\\n====================================\\033[0m\\n"]
  }
  after_hook "after_hook" {
    commands     = ["apply", "plan", "init"]
    execute      = ["echo", "-e", "\\n\\033[1;32m====================================\\n✅ FINISHED TERRAGRUNT IN ENV: ${local.env}\\n====================================\\033[0m\\n"]
    run_on_error = true
  }
}
The output is a highlighted banner in the terminal announcing the env before and after every run.
The remote state will also use local.aws_region. In addition, we must update the way Terragrunt stores the tfstate files. In common Terragrunt usage, the folder structure serves as the path for storing the tfstate:
key = "${path_relative_to_include()}/terraform.tfstate"
The resulting bucket keys, for example, are dev/ec2/terraform.tfstate and staging/ec2/terraform.tfstate.
If we use this in the new structure, the key will always resolve to environment/ec2/terraform.tfstate regardless of the env, which will cause conflicts between environments and result in terraform destroying our resources.
To solve the problem of running different environments from the same folder path, we must change the way we construct the key:
remote_state {
  backend = "s3"
  generate = {
    path = "backend.tf"
  }
  config = {
    bucket         = "terraform-state"
    key            = "${local.env}/${local.component}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
This will now result in the same keys as we had before using ${path_relative_to_include()}.
And to wrap things up, let’s not forget to pass the local.inputs to the module:
inputs = merge(local.inputs, {})
And there you have it! A truly DRY and maintainable Terragrunt project. To run it, all you need to do is cd into any component folder in the environment dir, export the TF_VAR_ENV variable, and run terragrunt apply. Then you can change the value of TF_VAR_ENV and run again to deploy a different env.
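Putting it all together, a full session looks like this (assuming Terragrunt is installed and AWS credentials are configured):

```shell
# Deploy the VPC to dev, then to staging, from the same folder
cd environment/vpc

export TF_VAR_ENV=dev
terragrunt apply

export TF_VAR_ENV=staging
terragrunt apply
```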
This is a simple setup, and the options on top of it are endless. Once you understand the relationship between the terragrunt.hcl, root.hcl, and envs.yaml, you can template and automate almost anything.
I’ll give you one last small example, if you made it this far with me. If you look at the envs.yaml file above, we are currently passing the vpc_name and instance_name as inputs for each environment:
dev:
  vpc:
    inputs:
      vpc_name: "dev-vpc"
  ec2:
    inputs:
      instance_name: "dev-instance"
prod:
  vpc:
    inputs:
      vpc_name: "prod-vpc"
  ec2:
    inputs:
      instance_name: "prod-instance"
The names follow a known pattern: they always start with the env name and continue the same way. So, instead of hardcoding the names, we can template them inside the terragrunt.hcl files, like this:
# environment/vpc/terragrunt.hcl
inputs = {
  vpc_name = "${include.root.locals.env}-vpc"
}

# environment/ec2/terragrunt.hcl
inputs = {
  vpc_id            = dependency.vpc.outputs.vpc_id
  subnet_id         = dependency.vpc.outputs.private_subnets[0]
  security_group_id = dependency.vpc.outputs.security_group_id
  instance_name     = "${include.root.locals.env}-instance"
}
The include.root.locals.env expression references the locals block of the root.hcl file, where we get the env name from the TF_VAR_ENV variable:
locals {
  env = get_env("TF_VAR_ENV")
}
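One detail worth noting: include.root.locals is only readable from a child file if its include block exposes the parent, so each component’s terragrunt.hcl should include root.hcl along these lines:

```hcl
# environment/vpc/terragrunt.hcl
include "root" {
  path   = find_in_parent_folders("root.hcl")
  expose = true # makes include.root.locals available in this file
}
```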
With this small update, we reduced redundancy and automated conventions, which can help keep our cloud environment clean, well-organized, and free of human error.
By restructuring your Terragrunt project with this DRY and modular approach, you’ll simplify your environment management and reduce maintenance overhead and the risk of configuration drift over time. This setup ensures clarity, scalability, and ease of use. With Terragrunt, the possibilities for automation and optimization are virtually limitless—your infrastructure can finally work smarter, not harder.