From Monolithic to Microstack: Revolutionizing Pulumi for Scalable Infrastructure

Infrastructure as Code (IaC) is crucial for managing modern, scalable infrastructure, but managing infrastructure in a dynamic and growing organization is no easy feat. Pulumi is a powerful IaC tool that lets you write infrastructure code in familiar programming languages like TypeScript, Python, Go, and more. However, as projects grow and teams expand, how you organize Pulumi stacks becomes essential to avoiding bottlenecks, conflicts, and inefficiencies. In this article, we present our solution for managing Pulumi stacks in a team-based environment to address the challenges of scaling and maintaining infrastructure. Whether you’re considering Pulumi or already facing similar infrastructure management issues, the practical lessons below should apply.

Challenges using Pulumi

We chose to use Pulumi for Infrastructure as Code (IaC) and adopted TypeScript as our programming language. Our project was organized with a clear folder structure:

Components Folder: Contains all the common models for all projects.
Admin Folder: Manages all the Pulumi-related administration tasks.
Utils Folder: Houses various utility scripts.
Projects Folder: Contains all the infrastructure code for each individual project.

Initially, for each project, we had three stacks: dev, stg, and prod. However, as the project grew, we encountered increasing difficulties in managing these stacks. These challenges were not unique to Pulumi; we had faced similar issues with other IaC tools as well.

Scalability Issues: As projects expand, state files grow with them, and every operation that has to load the state slows down.
Collaboration Conflicts: Concurrent work on shared state files often leads to integration conflicts and delays, hindering overall progress.
Troubleshooting Complexities: Managing large, complex state files complicates troubleshooting, increasing downtime and team stress.

Many teams, especially those managing large and growing projects, face challenges similar to ours. Whether you’re currently managing or planning to handle substantial resources, recognizing and addressing these challenges is crucial for maintaining efficiency and reliability.

Our Experience and the Difficulties with Using Monostack

Concurrent Work Delays: Working on the same stack concurrently meant we had to wait for one run to finish before adding new resources, causing delays and reducing productivity.
Large and Cumbersome State File: As the state file grew larger, it became cumbersome to manage. Loading the state file took a lot of time, slowing down our workflow.
Diverse Resource Update Frequencies: We had resources that changed frequently and others that rarely needed updates. Running a stack that included both types of resources was inefficient.
Difficult Troubleshooting: When problems occurred, it was challenging to address them with everything in the same stack. Troubleshooting was complicated and time-consuming due to the intertwined nature of the resources.

Understanding these challenges is pivotal for creating a more manageable and scalable infrastructure. By learning from our experiences, you can:

Gain Insightful Strategies: Avoid common pitfalls and adopt effective strategies for managing large-scale projects.
Implement Relevant Solutions: Apply solutions directly applicable to your infrastructure challenges, ensuring efficiency and scalability.
Enhance Collaboration: Improve team collaboration by addressing issues arising from shared state files, fostering a more productive work environment.

Managing a growing project with a single Pulumi state

Initially, our team managed all resources under a single Pulumi project, divided into the typical development (dev), staging (stg), and production (prod) environments. At first, this approach seemed straightforward and organized, as everything related to a particular project was kept under one umbrella. However, as the project grew, so did the number of resources and the size of the state file. This approach quickly became cumbersome, making it difficult for multiple team members to work on the project simultaneously and complicating the troubleshooting process.

The Monolithic stack challenges we faced with Pulumi

1. Increasing State File Size

Problem: As the project expanded, the state file grew larger and larger. This increase in size made the state file unwieldy and time-consuming to manage.
Impact: Large state files slowed down deployments and made it challenging to track changes or roll back to previous states. This complexity also increased the risk of errors during deployments and updates.

2. Concurrent Work on the Same State

Problem: With multiple team members working on the same environment and state file, conflicts became frequent. This concurrent work on the same state led to integration issues and potential overwrites, causing delays and increasing the risk of deployment failures.
Impact: This made it difficult for team members to work independently without affecting each other’s progress, leading to a bottleneck in the development process and slowing down overall project velocity.

3. Complicated Troubleshooting

Problem: When something broke, pinpointing the issue became a complex task due to the intertwined nature of the resources and the large state file. This complexity was especially challenging in production environments where quick resolution is critical.
Impact: Debugging and resolving issues required sifting through a massive state file and understanding the interdependencies between various resources, which significantly increased the time to resolution and impacted system reliability.

The solution: Splitting stacks by resource and environment

To overcome these challenges, we decided to reorganize our stack structure to better support team-based development and scalable infrastructure management.

1. Stack Organization: `org -> project -> resource-environment`

Instead of managing all resources under a single stack, we adopted a more modular approach:

Organization: The top-level entity representing the company or a major team.
Project: Represents an application or department within the organization.
Resource-Environment: Each stack is dedicated to a specific resource type and environment. For example:


my_org/my_project/eks-dev
my_org/my_project/eks-prod
my_org/my_project/db-dev
my_org/my_project/db-prod
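
To make the convention concrete, here is a minimal sketch of a helper that composes a fully qualified stack name from its parts. The function name `qualifiedStackName` and the `Environment` type are hypothetical and are not part of Pulumi or of the code shown later in this article; the only assumption is that "/" separates the organization, project, and stack.

```typescript
// Hypothetical helper illustrating the org/project/resource-env convention.
type Environment = 'dev' | 'stg' | 'prod';

function qualifiedStackName(
  org: string,
  project: string,
  resource: string,
  env: Environment,
): string {
  // "/" separates org, project, and stack, so it cannot appear inside a part.
  for (const part of [org, project, resource]) {
    if (part === '' || part.indexOf('/') !== -1) {
      throw new Error(`Invalid stack name part: "${part}"`);
    }
  }
  return `${org}/${project}/${resource}-${env}`;
}

console.log(qualifiedStackName('my_org', 'my_project', 'eks', 'dev'));
// prints my_org/my_project/eks-dev
```

Keeping the composition in one place means a typo in an environment name is caught by the type checker instead of silently creating a new stack.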

2. Isolated State Files

Each stack now has its own state file, isolating the state of different resources and environments. This reduces the size of each state file, improves deployment speed, and makes it easier to manage changes.

3. Independent Resource Management

By segregating resources, each team member can work on a specific resource or group without affecting others. This allows for parallel development and reduces conflicts.

Implementing the new stack structure

To understand our approach, it’s important to first grasp the basic steps required to run Pulumi with TypeScript. You start by creating an index.ts file, which serves as the main entry point for your Pulumi program. Then, for each stack, you create a YAML file named Pulumi.<stack-name>.yaml to define stack-specific variables. Additionally, you have a separate YAML file named Pulumi.yaml for project-level variables. When you run pulumi up, the program first executes the index.ts file.

1. Centralized Resource Management

We created a central index.ts file to manage all resources within the project. This file reads the stack configuration and applies the necessary settings based on the stack name. The index.ts file primarily calls a function named runFactory, passing in the project name and a map of resources. The runFactory function, located in a utility file, ensures that only the relevant resources for the specific stack are executed.

Index File Example (TypeScript)


// index.ts
import { runFactory } from './utils/utils';

import * as api from './api';
import * as queue from './queue';
import * as db from './db';
import * as schedule from './schedule';
import * as repository from './repository';

type TResources = 'api' | 'queue' | 'db' | 'schedule' | 'repository';

export const data = runFactory('my_project', {
  schedule: schedule,
  db: db,
  api: api,
  queue: queue,
  repository: repository,
});

2. Utility Mechanism for Stack Management

We developed a utility mechanism to manage our stacks efficiently. Here’s how it works. When runFactory is executed, it first calls the fetchStack function, which reads the stack name and splits it into two parts: the resource and the environment. The function returns these as a pair, [resource, env]. With the resource extracted from the stack name, runFactory then selects and runs only the code for that resource and environment. This ensures that only the resources belonging to the specific stack are deployed, avoiding unnecessary work.

Stack Naming Convention: We adopted a naming convention that includes the organization, project, resource, and environment, e.g., my_org/my_project/my_resource-env.

Utility Script: We created a utility script to parse and handle stack names, ensuring consistency across our infrastructure.

Utility Script Example (TypeScript)


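// utils/utils.ts
import * as pulumi from '@pulumi/pulumi';

// The start of this listing was truncated in the source. The type below and the
// first lines of fetchStack are reconstructed from the description above and may
// differ from the original code.
type TResourceFactory = Record<string, { projectName?: string; [env: string]: any }>;

export const fetchStack = () => {
  // The stack name has the form <resource>-<env>, e.g. "repository-dev".
  const params = pulumi.getStack().split('-');
  const env = params.pop() ?? '';

  return [params.join('-'), env] as const;
};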

export const runFactory = (projectName: string, factory: TResourceFactory) => {
  const [resource, env] = fetchStack();

  const factoryObject = factory[resource];

  const $projectName = factoryObject.projectName || `${projectName}-${resource}`;

  return factoryObject[env](`${$projectName}-${env}`);
};
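
The same selection logic can be exercised without a Pulumi engine, which is handy for unit tests. The following is a minimal sketch rather than our production code: `parseStackName`, `resolveRunner`, and the `ResourceModule` type are hypothetical names, and the sketch assumes the environment is always the last hyphen-separated segment of the stack name, so a resource name may itself contain hyphens. Unlike the listing above, it fails with a descriptive error when the stack name points at an unknown resource or environment.

```typescript
// Hypothetical, dependency-free sketch of the selection logic in runFactory.
type EnvRunner = (projectId: string) => unknown;

// A resource module exposes one function per environment and, optionally,
// a projectName string that overrides the default naming.
type ResourceModule = { [key: string]: EnvRunner | string | undefined };

function parseStackName(stackName: string): readonly [string, string] {
  const params = stackName.split('-');
  if (params.length < 2 || params.some((part) => part === '')) {
    throw new Error(`Stack name "${stackName}" does not match <resource>-<env>`);
  }
  const env = params.pop() as string;
  return [params.join('-'), env];
}

function resolveRunner(
  projectName: string,
  stackName: string,
  factory: Record<string, ResourceModule>,
): () => unknown {
  const [resource, env] = parseStackName(stackName);

  const resourceModule = factory[resource];
  if (resourceModule === undefined) {
    throw new Error(`No resource module registered for "${resource}"`);
  }

  const runner = resourceModule[env];
  if (typeof runner !== 'function') {
    throw new Error(`Resource "${resource}" has no "${env}" environment`);
  }

  const override = resourceModule.projectName;
  const name =
    typeof override === 'string' && override !== '' ? override : `${projectName}-${resource}`;

  // Defer execution so callers decide when the resources are declared.
  return () => runner(`${name}-${env}`);
}

// Usage with a stand-in module that simply returns the id it receives.
const factory = {
  'job-queue': { dev: (id: string) => id, prod: (id: string) => id },
};
console.log(resolveRunner('my_project', 'job-queue-dev', factory)());
// prints my_project-job-queue-dev
```

Failing early on a mistyped stack name is a design choice: the alternative is a less readable "cannot read properties of undefined" error raised in the middle of a deployment.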

3. Environment-Specific Resource Scripts

To further enhance flexibility and separation, we created environment-specific scripts for resource management. These scripts ensure that common resource configurations are shared across environments while allowing for environment-specific customizations.

Environment-Specific Scripts Example (TypeScript)



// repository.ts
import { FledgedRepository } from './components/ecr/ecr';

function common(projectId: string) {
    new FledgedRepository(projectId);
}

// Development environment-specific resources
export function dev(projectId: string) {
    common(projectId);
    // Additional dev-specific resources can be added here
}

// Production environment-specific resources
export function prod(projectId: string) {
    common(projectId);
    // Additional prod-specific resources can be added here
}

By splitting the resource management into environment-specific scripts, we can handle common configurations in a shared function and apply environment-specific settings as needed. This approach reduces duplication and ensures consistency across environments.
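
One way to keep that split readable once environments start to diverge is to let the shared function accept a small options object. The sketch below is illustrative only: `RepositoryOptions`, its fields, and the values chosen for each environment are assumptions, and the functions return plain objects instead of creating a `FledgedRepository` so that the example runs without Pulumi.

```typescript
// Hypothetical options; the real FledgedRepository component may accept different ones.
interface RepositoryOptions {
  imageTagMutable: boolean;
  maxImages: number;
}

interface RepositorySpec extends RepositoryOptions {
  name: string;
}

// Defaults shared by every environment.
const defaults: RepositoryOptions = { imageTagMutable: true, maxImages: 10 };

function common(projectId: string, overrides: Partial<RepositoryOptions> = {}): RepositorySpec {
  // In a real program this is where the component would be instantiated.
  return { name: projectId, ...defaults, ...overrides };
}

// Development keeps the shared defaults.
function dev(projectId: string): RepositorySpec {
  return common(projectId);
}

// Production overrides only what differs from the defaults.
function prod(projectId: string): RepositorySpec {
  return common(projectId, { imageTagMutable: false, maxImages: 50 });
}

console.log(dev('my_project-repository-dev'));
console.log(prod('my_project-repository-prod'));
```

Each environment function then states only its differences, which makes drift between dev and prod easy to spot in review.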

4. Stack-Specific Configuration Files

Each stack has a dedicated configuration file named Pulumi.<stack-name>.yaml. This file stores the configuration specific to the stack, ensuring that changes are isolated and easy to manage.

Configuration File Example


# Pulumi.repository-dev.yaml
config:
  aws:defaultTags:
    tags:
      Environment: Development
      Project: My_project:Repository
  aws:region: us-east-1

5. Managing Common Changes Across Multiple Stacks

While this modular approach brings many benefits, it also introduces a challenge: applying common changes across multiple stacks can be cumbersome. If you need to update a shared configuration or resource definition, you must do so individually for each stack. There is no command to automate updates across all stacks, requiring manual intervention to ensure consistency.
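
Until such tooling exists, a small script can at least enumerate the stacks and the command to run for each one. The sketch below only builds the command strings and does not execute anything; `upCommands` and the resource and environment lists are hypothetical, and it assumes every resource has a stack in every listed environment. `pulumi up --stack <name> --yes` is the usual CLI invocation for a non-interactive update of a named stack.

```typescript
// Hypothetical helper: lists one `pulumi up` command per resource/environment stack.
function upCommands(
  org: string,
  project: string,
  resources: string[],
  envs: string[],
): string[] {
  const commands: string[] = [];
  for (const resource of resources) {
    for (const env of envs) {
      const stack = `${org}/${project}/${resource}-${env}`;
      commands.push(`pulumi up --stack ${stack} --yes`);
    }
  }
  return commands;
}

// Roll a shared change out to dev first; promote to prod in a separate run.
for (const command of upCommands('my_org', 'my_project', ['eks', 'db'], ['dev'])) {
  console.log(command);
}
```

Feeding these lines to a shell or a CI job, one stack at a time, preserves the isolation between stacks while removing most of the manual bookkeeping.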

Benefits of the Microstack

1. Improved Scalability

The modular stack structure allows our infrastructure to scale more easily. Adding new resources or environments doesn’t require modifying large state files or complex configurations.

2. Enhanced Collaboration

With isolated stacks, team members can work on different parts of the infrastructure simultaneously without causing conflicts or integration issues.

3. Simplified Troubleshooting

Isolating resources into specific stacks makes it easier to identify and resolve issues. Each stack is smaller and more focused, making debugging and troubleshooting more straightforward.

Conclusion

Managing Pulumi stacks by organizing them into org -> project -> resource-environment has significantly improved our ability to scale and maintain our infrastructure. This approach allows for better team collaboration, reduces the complexity of managing state files, and simplifies the process of deploying and managing resources. However, this method’s limitation is the lack of a command to simultaneously apply changes across all stacks. Future enhancements may include developing a custom tool or script to automate updates across stacks, further streamlining the process and ensuring consistency. By adopting a structured approach and leveraging Pulumi’s capabilities, we have created a more efficient and scalable infrastructure management system that meets the needs of our growing project.

Additional Resources

Pulumi Documentation
