Why Secrets Rotation in Kubernetes Clusters is Essential and How to Do It Right

January 20, 2025

Rotating secrets is not just a recommendation; it is a critical requirement for maintaining the security and resilience of modern systems. Kubernetes clusters, often at the core of organizational infrastructure, rely on sensitive credentials like API keys, access tokens, and passwords to ensure secure communication between services, users, and external systems. Over time, these secrets can become vulnerable due to accidental exposure, human error, or malicious activities. Without regular rotation, a leaked or forgotten credential stays valid indefinitely, so the window for unauthorized access keeps growing the longer the secret lives.

Imagine a scenario where an exposed API key is exploited—attackers could gain unrestricted access to your system, compromising sensitive data and potentially causing irreparable damage to your reputation. By implementing a robust secrets rotation process, you limit the lifetime of these credentials, ensuring that even if a secret is leaked, its usefulness to an attacker is minimal.

If you’re part of a startup or company, this article is for you. Whether your product is already in production or still in development, implementing secret rotation now is a decision that will save you from potential headaches down the road. Why wait until you’re in production to secure your system? Start early, adopt best practices, and ensure your environment is prepared for the challenges of scaling securely. Taking action today guarantees that your organization is not only secure but also ready for the future.

In this article, I will share how we designed and implemented a seamless secret rotation process, the tools and methods we utilized, and how we achieved automation without compromising on simplicity, flexibility, or security. By the end, you’ll understand why secret rotation is not only essential but also how to implement it effectively in your Kubernetes clusters.

How we built an efficient and precise rotation process

Our rotation process runs every three months and is structured to ensure maximum security and reliability. The process follows a clear sequence: first, new keys are created, then securely updated in Vault. We wait two minutes to ensure the updated values propagate across the secrets in the cluster and the pods that depend on them. Following this, automated tests verify the keys’ functionality in the clusters. Only after successful validation are the old keys deleted. If any step fails, the rotation process halts and does not proceed to the next step. Instead, an email and a Slack alert are sent immediately to notify the team of the failure.

Inside our rotation process: steps, tools, and strategies for success

  1. Creating new keys: New keys are generated for platforms such as MongoDB, Kafka, and others through API calls to the respective services. This ensures the keys are securely created and ready for use.
  2. Updating keys in HashiCorp Vault: The newly created keys are updated in HashiCorp Vault, a centralized secret management tool. Vault offers a secure and controlled environment for storing secrets, ensuring they remain protected and compliant with security standards. With Vault, we gain full visibility into secret access, reducing the risk of exposure and providing confidence in managing secrets across distributed systems.
  3. Waiting two minutes: After updating the keys in Vault, we wait two minutes to allow for the propagation of new values across services. We utilize External Secrets, which integrates seamlessly with Vault and Kubernetes. External Secrets automatically updates secrets across clusters at set intervals (every minute in our setup), ensuring consistency and eliminating the need for manual updates. This tool guarantees that services always use the latest secrets, maintaining consistency across development, staging, and production environments.

Automated pod updates with Reloader: To ensure pods use the updated secrets without manual intervention, we rely on Reloader. This tool detects changes in secrets and automatically restarts the relevant pods. Reloader significantly reduces operational overhead, especially in large-scale environments with 40 to 50 clusters, by ensuring services always run with up-to-date configurations.

  4. Key testing: Automated tests are conducted to verify that the new keys are functioning correctly in the clusters. This step ensures the rotation process is successful and the new keys are effective.
  5. Deleting old keys: Once the new keys have been validated and confirmed to be functional, the old keys are securely deleted. This step is critical to eliminate any potential vulnerabilities associated with outdated keys and ensures that only the latest, securely managed keys are in use.
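
The one-minute sync described in step 3 can be sketched as an External Secrets manifest. This is a minimal sketch, not our actual manifest: the store name `vault-backend`, the namespace, the secret names, and the Vault path are all hypothetical, and the `apiVersion` depends on which External Secrets Operator release is installed (newer releases also serve `external-secrets.io/v1`).

```yaml
# Hypothetical names throughout; adjust to your own store and paths.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mongo-credentials
  namespace: my-app
spec:
  refreshInterval: 1m            # re-read Vault every minute, matching the interval in the article
  secretStoreRef:
    name: vault-backend          # assumed ClusterSecretStore that points at Vault
    kind: ClusterSecretStore
  target:
    name: mongo-credentials      # Kubernetes Secret that the operator creates and keeps in sync
  data:
    - secretKey: MONGO_PASSWORD  # key inside the Kubernetes Secret
      remoteRef:
        key: apps/my-app/mongo   # assumed path in Vault
        property: password       # field within that Vault secret
```

With a one-minute refresh interval, a two-minute wait after the Vault update leaves room for at least one full sync plus the pod restart that follows.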

This combination of tools—Vault for secure management, External Secrets for seamless synchronization, and Reloader for automated updates—ensures our rotation process is robust, efficient, and scalable. Each tool contributes unique advantages, allowing us to manage secrets rotation confidently.
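
To illustrate the Reloader part, here is a minimal sketch of a Deployment that opts in to automatic restarts. The Deployment name, image, and Secret name are hypothetical; the `reloader.stakater.com/auto` annotation is the one Stakater Reloader documents for watching every Secret and ConfigMap a workload references.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                            # hypothetical workload
  annotations:
    reloader.stakater.com/auto: "true"    # roll the pods when a referenced Secret changes
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0.0   # placeholder image
          envFrom:
            - secretRef:
                name: mongo-credentials     # hypothetical Secret kept in sync from Vault
```

Because the restart is a normal rolling update, the usual readiness probes and rollout settings decide how disruptive a rotation is for the service.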


Rotation schedule

The rotation process runs every three months and targets the environments one after another. In a rotation month, it runs in dev during the first week, in stg during the second week, and in prd during the third week. This staggered approach allows us to validate the process in non-critical environments before moving to production, minimizing risks and maintaining system stability.
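
A staggered schedule like this can be sketched with one cron entry per environment. The hour and the months (February, May, August, November) follow the dev cron shown later in this article; picking a single day per week (the 1st, 8th, and 15th) and the job and step names are assumptions for illustration. `github.event.schedule` holds the cron string that triggered the run, which lets one workflow tell the three slots apart.

```yaml
name: Rotation schedule sketch
on:
  schedule:
    # minute hour day-of-month month day-of-week, evaluated in UTC
    - cron: "0 8 1 2,5,8,11 *"    # dev: first week of the rotation month
    - cron: "0 8 8 2,5,8,11 *"    # stg: second week
    - cron: "0 8 15 2,5,8,11 *"   # prd: third week
jobs:
  resolve-environment:
    runs-on: ubuntu-latest
    steps:
      - name: Map the triggering cron to an environment
        env:
          SCHEDULE: ${{ github.event.schedule }}
        run: |
          # The quoted patterns are matched literally, so they must equal the cron strings above.
          case "$SCHEDULE" in
            "0 8 1 2,5,8,11 *")  echo "ENVIRONMENT=dev" >> "$GITHUB_ENV" ;;
            "0 8 8 2,5,8,11 *")  echo "ENVIRONMENT=stg" >> "$GITHUB_ENV" ;;
            "0 8 15 2,5,8,11 *") echo "ENVIRONMENT=prd" >> "$GITHUB_ENV" ;;
            *) echo "Unknown schedule: $SCHEDULE" >&2; exit 1 ;;
          esac
      - name: Show the result
        run: echo "Rotating secrets in $ENVIRONMENT"
```

A one-week gap between environments gives the team time to notice a problem in dev or stg before the same rotation reaches prd.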

Let’s dive into the code

Our rotation process runs as a GitHub Actions workflow. We currently rotate 12 secrets, and that number keeps growing as our infrastructure scales. Early on, we noticed a recurring pattern: every rotation shared the same fixed base. Our initial implementation, however, had a significant flaw: the entire rotation process was written as one long, monolithic block of code. That made failures hard to debug and the workflow unnecessarily hard to maintain and update. To fix this, we restructured the codebase into modular, well-defined steps, which improved readability, maintainability, and reliability. In the following sections, I'll share the techniques and tools we used to get there and why they can work for your setup too.


Eliminating code duplication with actions in GitHub workflows

Actions in GitHub Workflows are reusable, customizable components that automate tasks in workflows, making them ideal for streamlining processes like secrets rotation. By centralizing the logic and automating repetitive tasks, we simplified the process and improved error handling. Each Action performs a specific task, like sending notifications or fetching credentials from Vault, which reduces redundancy and keeps workflows clean. For example, our Email and Slack Notification Action alerts stakeholders if there’s an issue during the rotation process:


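name: 'Send Email and Slack Alert'
description: 'Notify the team by email and Slack'
inputs:
  subject:
    description: 'Subject line of the alert'
    required: true
  body:
    description: 'Body text of the alert'
    required: true
runs:
  using: "composite"   # required for an Action that is built from steps
  steps:
    - name: Send Email
      shell: bash      # run steps in a composite Action must declare a shell
      run: echo "Sending notification..."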

This reusable Action ensures that notifications are consistent across workflows without duplicating code. Here’s how it’s used in a workflow:


- name: Send Email and Slack Alert
  if: ${{ failure() }}
  uses: ./.github/actions/send-email-and-slack-alert
  with:
    subject: "Job failed in ${{ env.ENVIRONMENT }}"
    body: "Check the logs for more details."

The modularity of this approach helps with:

  • Error management: Immediate alerts let us respond to failures quickly.
  • Scalability: Actions can be reused across workflows, keeping everything consistent.
  • Security: Sensitive data like tokens are fetched securely at runtime, never hardcoded.

Thus, we transformed secrets rotation into a streamlined and reliable process. This approach keeps us efficient, secure, and ready to handle even the most complex operations.
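
The notification Action above only echoes a placeholder. As a minimal sketch of what the Slack half could look like, the composite Action below posts to a Slack incoming webhook. The Action name and the `slack-webhook-url` input are assumptions, not part of our real Action, and the sketch relies on `jq` and `curl` being available on the runner (they are on GitHub-hosted Ubuntu runners).

```yaml
# Hypothetical file: .github/actions/send-slack-alert/action.yml
name: 'Send Slack Alert'
description: 'Post a failure message to a Slack incoming webhook'
inputs:
  subject:
    description: 'Short headline of the alert'
    required: true
  body:
    description: 'Details of the alert'
    required: true
  slack-webhook-url:
    description: 'Slack incoming webhook URL (assumed input, pass it from a secret)'
    required: true
runs:
  using: "composite"
  steps:
    - name: Post to Slack
      shell: bash
      env:
        # Passing inputs through env keeps them out of the script text itself.
        SUBJECT: ${{ inputs.subject }}
        BODY: ${{ inputs.body }}
        SLACK_WEBHOOK_URL: ${{ inputs.slack-webhook-url }}
      run: |
        set -euo pipefail
        # Build the JSON with jq so quotes in the message cannot break the payload.
        payload=$(jq -n --arg text "$SUBJECT: $BODY" '{text: $text}')
        curl --fail --silent --show-error \
          --request POST \
          --header "Content-Type: application/json" \
          --data "$payload" \
          "$SLACK_WEBHOOK_URL"
```

The `--fail` flag makes the step fail when Slack rejects the request, so a broken alert channel does not go unnoticed.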

Creating a common workflow for credential rotation

To simplify and standardize the key rotation process, I created a common workflow: a wrapper that every rotation can call. It combines the main steps (key creation, Vault updates, waiting, and key deletion) into one centralized process. While the specific implementation (e.g., MongoDB, Kafka) may differ, the overall flow remains the same. The common workflow serves as a template for all rotations, organizing the process, environments, and related tasks, which makes every rotation easier to manage.

Advantages of using a common workflow

  • Consistency: All rotations follow the same basic process.
  • Flexibility: Easy to add/remove features like notifications and tests.
  • Centralized maintenance: Changes propagate across all workflows that use this common template.

How to create a workflow that wraps all workflows

The common workflow defines shared tasks such as creating keys, updating Vault, running tests, and deleting old keys. These steps are passed dynamically through parameters.

Here’s the minimalistic version of the common workflow:


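name: Common Workflow
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      create-keys-command:
        required: true
        type: string
      update-vault:            # optional: the Vault step runs only when this is set
        required: false
        type: string
        default: ""
      run-tests-command:       # optional: skipped when empty
        required: false
        type: string
        default: ""
      delete-keys-command:     # optional: skipped when empty
        required: false
        type: string
        default: ""
jobs:
  common-tasks:
    runs-on: ubuntu-latest
    env:
      ENVIRONMENT: ${{ inputs.environment }}   # lets the passed-in commands read $ENVIRONMENT
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Create New Keys
        run: ${{ inputs.create-keys-command }}
      - name: Update Vault
        if: ${{ inputs.update-vault != '' }}
        uses: ./.github/actions/push-to-vault
        with:
          json_file: updated-vault.json
      - name: Wait for Propagation
        run: sleep 120   # the two-minute wait that lets External Secrets and Reloader catch up
      - name: Run Tests
        if: ${{ inputs.run-tests-command != '' }}
        run: ${{ inputs.run-tests-command }}
      - name: Delete Old Keys
        if: ${{ inputs.delete-keys-command != '' }}
        run: ${{ inputs.delete-keys-command }}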
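
The `push-to-vault` Action referenced in the workflow is not shown in this article. As a rough sketch only, it could wrap a call to Vault's KV version 2 HTTP API. Everything besides `json_file` is an assumption here: the `vault_addr`, `vault_token`, and `secret_path` inputs are hypothetical (the call above passes only `json_file`, so the real Action gets its connection details some other way), and the KV engine is assumed to be mounted at `secret/`.

```yaml
# Hypothetical contents of .github/actions/push-to-vault/action.yml
name: 'Push to Vault'
description: 'Write a JSON file to a Vault KV v2 path'
inputs:
  json_file:
    description: 'File whose JSON object becomes the secret data'
    required: true
  vault_addr:
    description: 'Vault base URL (assumed input)'
    required: true
  vault_token:
    description: 'Vault token (assumed input, pass it from a secret)'
    required: true
  secret_path:
    description: 'Path under the KV v2 mount (assumed input)'
    required: true
runs:
  using: "composite"
  steps:
    - name: Write the secret
      shell: bash
      env:
        JSON_FILE: ${{ inputs.json_file }}
        VAULT_ADDR: ${{ inputs.vault_addr }}
        VAULT_TOKEN: ${{ inputs.vault_token }}
        SECRET_PATH: ${{ inputs.secret_path }}
      run: |
        set -euo pipefail
        # KV v2 expects the secret wrapped as {"data": {...}}; jq does the wrapping
        # and curl reads the result from stdin via --data @-.
        jq '{data: .}' "$JSON_FILE" |
          curl --fail --silent --show-error \
            --request POST \
            --header "X-Vault-Token: $VAULT_TOKEN" \
            --header "Content-Type: application/json" \
            --data @- \
            "$VAULT_ADDR/v1/secret/data/$SECRET_PATH"
```

Keeping the Vault write in its own Action means a change in how we authenticate to Vault touches one file instead of every rotation workflow.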

How to implement the common workflow

The common workflow can be invoked from another workflow like this:



name: Mongo Credentials Rotation
on:
  schedule:
    - cron: "0 8 1-7 2,5,8,11 *" # dev
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        required: true
        options: [dev, stg, prd]
jobs:
  rotation:
    uses: ./.github/workflows/common-rotation.yaml
    with:
      environment: ${{ github.event.inputs.environment || 'dev' }}  # scheduled runs carry no inputs; the cron above is the dev slot
      create-keys-command: echo "Creating Mongo Keys for $ENVIRONMENT"
      update-vault: "{...}"
      run-tests-command: echo "Running tests for $ENVIRONMENT"

This streamlined approach ensures a consistent process while allowing flexibility for changes and future expansions.
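
To show how little changes between rotations, here is a sketch of a second caller for a hypothetical Kafka rotation that also passes a delete command. The `echo` commands are placeholders, and the sketch assumes the common workflow declares `run-tests-command` and `delete-keys-command` as inputs and exposes `ENVIRONMENT` to the commands it runs.

```yaml
name: Kafka Credentials Rotation
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        required: true
        options: [dev, stg, prd]
jobs:
  rotation:
    uses: ./.github/workflows/common-rotation.yaml
    secrets: inherit   # hand the caller's secrets to the common workflow
    with:
      environment: ${{ github.event.inputs.environment }}
      create-keys-command: echo "Creating Kafka keys for $ENVIRONMENT"
      run-tests-command: echo "Testing Kafka keys in $ENVIRONMENT"
      delete-keys-command: echo "Deleting old Kafka keys in $ENVIRONMENT"
```

Only the three commands differ from the Mongo workflow; the order of the steps stays in the common workflow.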

Managing parallel processes with matrix strategy

One powerful feature I used is the Matrix strategy in GitHub Actions, which allows for the parallel execution of tasks across multiple parameters, such as environments, clusters, or keys. This is particularly useful for rotating secrets across multiple Kubernetes clusters or other systems simultaneously.

In my case, I used the Matrix strategy within the rotation job to manage the rotation of kubeconfig files across various clusters. The strategy takes a list of clusters, which is an output of an earlier job, set-environment, and runs the rotation separately for each cluster. With fail-fast: false, the process continues for the other clusters even if one fails, so a single failure does not stop the entire rotation.

This approach is especially effective for managing multi-cluster systems or multiple keys, where the process needs to remain consistent across all elements. It allows for faster, controlled updates across different environments (such as production and QA), while maintaining overall process reliability and control.
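
The set-environment job itself is not shown in this article. A minimal sketch of a job that produces the two outputs the rotation job reads might look like the following. The cluster names are made up, and defaulting to dev when no input is given is an assumption. The key detail is that `clusters` is written as a single-line JSON array, which is what `fromJson` expects.

```yaml
jobs:
  set-environment:
    runs-on: ubuntu-latest
    outputs:
      ENVIRONMENT: ${{ steps.pick.outputs.ENVIRONMENT }}
      clusters: ${{ steps.pick.outputs.clusters }}
    steps:
      - name: Pick environment and cluster list
        id: pick
        env:
          # Assumption: fall back to dev when the run was not started manually.
          ENVIRONMENT: ${{ github.event.inputs.environment || 'dev' }}
        run: |
          # Hypothetical cluster names, one JSON array per environment.
          case "$ENVIRONMENT" in
            dev) CLUSTERS='["dev-cluster-1","dev-cluster-2"]' ;;
            stg) CLUSTERS='["stg-cluster-1"]' ;;
            prd) CLUSTERS='["prd-cluster-1","prd-cluster-2"]' ;;
            *) echo "Unknown environment: $ENVIRONMENT" >&2; exit 1 ;;
          esac
          echo "ENVIRONMENT=$ENVIRONMENT" >> "$GITHUB_OUTPUT"
          echo "clusters=$CLUSTERS" >> "$GITHUB_OUTPUT"
```

The rotation job that consumes these outputs through `needs.set-environment.outputs` follows.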


rotation:
   needs: set-environment
   secrets: inherit
   strategy:
     fail-fast: false
     matrix:
       cluster: ${{ fromJson(needs.set-environment.outputs.clusters) }}
   uses: ./.github/workflows/common-rotation.yaml
   with:
     environment: ${{ needs.set-environment.outputs.ENVIRONMENT }}

And it runs like this:



Conclusion

Automating secret rotation is crucial in securing your system. By ensuring that secrets are updated regularly and efficiently, you minimize the risk of unauthorized access and enhance the overall resilience of your infrastructure. Don’t wait until it’s too late – start implementing this process today and protect your environment for the future.