Why Don’t Leading Observability Tools Like Datadog or New Relic Develop Optimization Tools?

November 05, 2024

I was recently asked why prominent observability tools like Datadog or New Relic haven’t ventured into the optimization space. These companies already have stellar reputations in observability, so wouldn’t it be easier for them to conquer the optimization market, especially since they already provide recommendations based on their monitoring data? They would also have an advantage over startups, which need to prove themselves from the ground up.

The answer lies in the nature of the challenge

Optimization is several steps beyond observability. To make optimization possible, we first monitor systems, then build recommendations based on what we observe, and finally apply those recommendations. The risk profiles of these steps are vastly different.

It’s important to understand that even though tools like Datadog and New Relic can provide excellent recommendations, there’s a significant leap from offering insights to actively making changes. For example, Datadog might alert you to underutilized resources or performance bottlenecks, but taking that next step—shutting down resources or optimizing processes automatically—is where the real risk lies. And that’s where the challenge comes in.
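
To make that leap concrete, here is a minimal sketch of the boundary an observability vendor would have to cross. All names and metrics below are hypothetical: producing a recommendation is a read-only step, while applying it mutates live infrastructure and therefore needs guardrails such as a dry-run default.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    instance_id: str
    action: str  # e.g. "shut_down"
    reason: str

def recommend(cpu_by_instance: dict[str, float], threshold: float = 5.0) -> list[Recommendation]:
    """Read-only step: turn observed utilization into advice. A bug here costs an insight."""
    return [
        Recommendation(iid, "shut_down", f"avg CPU {cpu:.1f}% is below {threshold}%")
        for iid, cpu in cpu_by_instance.items()
        if cpu < threshold
    ]

def apply(recommendations: list[Recommendation], dry_run: bool = True) -> None:
    """Mutating step: a bug here breaks workloads or leaks cost, so it is gated behind dry_run."""
    for rec in recommendations:
        if dry_run:
            print(f"[dry-run] would {rec.action} {rec.instance_id} ({rec.reason})")
        else:
            # A real optimizer would call the cloud provider's API here.
            print(f"stopping {rec.instance_id} ({rec.reason})")

if __name__ == "__main__":
    observed = {"i-0abc": 2.3, "i-0def": 41.7, "i-0123": 4.9}  # fabricated sample metrics
    apply(recommend(observed))  # defaults to dry-run: advice without risk
```

The dry_run default is the whole point: the moment a vendor flips that flag on a customer’s behalf, its failure mode changes from a missed alert to broken production.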

If there’s a critical bug in your monitoring system, the worst that happens is you miss insights or alerts—but your infrastructure itself remains unaffected. You simply didn’t know something went wrong, but nothing was broken as a result.

However, when it comes to optimization, you’re actively modifying the customer’s infrastructure. A single critical bug could have devastating consequences. Let’s imagine a few of them:

  • Shutting down machines that should have been left running
  • Failing to terminate resources that should have been removed
  • Overlooking a corner case that leads to a massive spike in resource usage 

These issues can compromise stability or lead to massive cost overruns—where one missed bug could negate all the savings you’ve achieved.

The risks in optimization are significant

When optimization tools fail, they don’t just cause a missed alert; they can cause real financial damage. If an optimization tool incorrectly scales down resources, it can trigger a service outage, harming the client’s business or, in extreme cases, leading to contractual penalties. If it fails to scale down in time, you can end up with runaway resource costs. One minor bug could mean thousands, or even millions, of dollars in wasted expenses.
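
A back-of-the-envelope calculation shows how quickly the second failure mode adds up. The node count and hourly rate below are assumptions, roughly in line with a mid-size on-demand cloud instance:

```python
# Hypothetical: a scale-down path silently stops firing and leaves idle nodes running.
idle_nodes = 40
hourly_rate_usd = 0.40      # assumed on-demand price for a mid-size instance
hours_per_month = 24 * 30

wasted_per_month = idle_nodes * hourly_rate_usd * hours_per_month
print(f"~${wasted_per_month:,.0f} of pure waste per month")  # ~$11,520
```

Scale the same mistake across a fleet, or leave it unnoticed for longer, and the bill moves from thousands toward the millions mentioned above.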

This is why the big observability players hesitate. They fear that a failure in optimization could damage their hard-earned reputations in observability. One misstep in optimization could destroy the trust built over the years in monitoring.

It’s not just about technical challenges. Reputation and trust are key to their business models. If a major optimization failure happens, users could start questioning whether they can trust the observability insights as well. Any breach of that trust would be a severe blow to tools built on the premise of reliability.

You might wonder, isn’t reliability always important? Absolutely, but it’s especially critical for monitoring tools, which track the health and performance of other systems. If a monitoring tool itself isn’t dependable, it undermines its purpose. After all, how can you trust a monitoring system that struggles to maintain its own reliability?

Why Specialized Optimization Tools Succeed

Now, you may ask: why do specialized tools succeed in optimization where these giants fear to tread? The answer lies in focus and expertise. While observability companies are generalists in monitoring infrastructure, optimization requires deep domain expertise and advanced algorithms. Companies like CAST AI and PerfectScale specialize in fine-tuning infrastructure management and resource scaling, relying on extensive testing and machine learning to minimize risk.

Proven Tools for Kubernetes Optimization:

  • Horizontal Autoscaling: KEDA is an open-source tool that handles event-driven autoscaling efficiently. It’s easy to implement and delivers results quickly; a minimal sketch follows this list. For a managed solution, consider Kedify.io, founded by the original maintainer of KEDA, which includes a free tier.
  • Vertical Workload Autoscaling: PerfectScale stands out for its reliability, improving stability while reducing costs with minimal effort. It also covers all critical edge cases.
  • Cluster Autoscaling: CAST AI offers intelligent upscaling and leads in downscaling capabilities. It’s highly configurable, easy to start with, and comes with excellent support.
  • Full-Stack Optimization: Turbonomic remains a comprehensive solution for enterprise-grade, full-stack optimization.
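
As promised above, here is a minimal sketch of what adopting KEDA looks like: a single ScaledObject, created here through the official Kubernetes Python client. The Deployment name, namespace, Prometheus address, and query are placeholders, and it assumes KEDA is already installed in the cluster.

```python
from kubernetes import client, config

# Assumes KEDA is installed and a Deployment named "orders-api" exists in namespace "prod".
config.load_kube_config()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "orders-api-scaler", "namespace": "prod"},
    "spec": {
        "scaleTargetRef": {"name": "orders-api"},
        "minReplicaCount": 2,
        "maxReplicaCount": 20,
        "triggers": [
            {
                # Scale on a Prometheus metric (placeholder address and query).
                "type": "prometheus",
                "metadata": {
                    "serverAddress": "http://prometheus.monitoring:9090",
                    "query": "sum(rabbitmq_queue_messages{queue='orders'})",
                    "threshold": "100",
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="prod",
    plural="scaledobjects",
    body=scaled_object,
)
```

In practice most teams apply the equivalent YAML with kubectl or a GitOps tool; the point is that the scaling policy itself is a small, declarative object, while the hard, risk-laden logic lives inside the tool that has specialized in it.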

These tools are built with optimization as their core function. They’re designed to handle the intricate edge cases and the complex algorithms that make scaling efficient and safe, with thorough testing that covers all possibilities.

Cost optimization is deeply specialized. The algorithms and models that make these tools work are not easily scaled across different infrastructures, and this specialization is what allows startups to succeed.

The takeaway? Trust the companies brave enough to step into optimization. If they’re not afraid to tackle this complex challenge, that’s a good sign.

We’re Hiring!
Develeap is looking for talented DevOps engineers who want to make a difference in the world.