Production Readiness as a Product: How to Scale Reliability Frameworks Across Teams in Six Steps

In my role at Walmart I've been focusing on Production Readiness over the past few months, which has got me thinking about how episodic this process tends to be. Historically, teams have treated Operational Readiness Reviews (ORRs) or Production Readiness Reviews (PRRs) as a one-time gate: a static checklist for being permitted into production, often run late in the development cycle, and often done hastily at that.
In a world of continuous delivery and increasingly autonomous systems, how does this readiness model scale? It doesn't, which is why we need to start thinking of PRRs as a productized framework rather than a point-in-time process: a tool that is scalable, consistent, repeatable, and embedded throughout the development lifecycle. With that in mind, I have outlined some thoughts on how to treat readiness like a product. This article will cover how to:
- Define your readiness product scope;
- Modularize the framework;
- Develop a self-serve delivery model;
- Gamify adoption with badges;
- Drive continuous improvement.
Step 1: Define Your Readiness Product Scope
Let's start by asking ourselves: what does production readiness mean in your org?
For me, when I began scoping the program, I landed on the following areas to cover:
- CI/CD Readiness (e.g., deployment automation, rollback, code coverage)
- Observability Readiness (e.g., golden signals, alerting, dashboards, runbooks)
- Reliability Practices (e.g., autoscaling, failover, chaos simulation)
- Security & Compliance (e.g., secrets management, data encryption, RBAC)
- Incident Preparedness (e.g., paging setup, on-call rotation, escalation paths)
- Data Governance (e.g., secure storage and handling of private data in production and lower environments)
Each of these becomes a module in your readiness product in its own right.
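To make that concrete, here is a rough sketch of what the top-level scope could look like once written down as a simple YAML manifest; the file name, module IDs, owners, and version are purely illustrative.

```yaml
# readiness-scope.yaml (hypothetical): the readiness "product" and its modules.
# All names, owners, and the version below are examples, not a prescribed standard.
product: production-readiness
version: 1.0.0
modules:
  - id: cicd
    name: CI/CD Readiness
    owner: platform-engineering
  - id: observability
    name: Observability Readiness
    owner: sre
  - id: reliability
    name: Reliability Practices
    owner: sre
  - id: security-compliance
    name: Security & Compliance
    owner: infosec
  - id: incident-preparedness
    name: Incident Preparedness
    owner: sre
  - id: data-governance
    name: Data Governance
    owner: data-platform
```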
Step 2: Modularize the Framework
Next, encapsulate each readiness category as a feature of your product. I create something like an API-reference-style page for each module, clearly outlining:
- Clear objectives (e.g., “services should emit golden signal metrics to a shared dashboard”)
- Standardized checklists
- Scoring guidance (1–5 maturity scale or pass/fail)
- Linked documentation
I provide examples and sample data (if it's an update to a YAML file, for instance) in an easily discoverable Confluence page.
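As a minimal sketch of what one module could look like behind that page, assuming you keep the source of truth in a YAML file: the field names, checklist items, and wiki link below are all illustrative.

```yaml
# observability.yaml (hypothetical): one readiness module expressed as data.
# Field names, checklist items, and the docs link are examples only.
module: observability-readiness
objective: "Services emit golden signal metrics to a shared dashboard"
scoring: maturity-1-to-5        # or pass/fail, depending on the module
checklist:
  - id: golden-signals
    description: "Latency, traffic, errors, and saturation are emitted for every service"
    evidence: "Link to the team dashboard"
  - id: alerting
    description: "Alerts exist for SLO burn and page the owning team"
    evidence: "Link to the alert policy"
  - id: runbooks
    description: "Every alert links to a runbook with triage steps"
    evidence: "Link to the runbook index"
docs:
  - https://wiki.example.com/observability-standards   # placeholder
```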
Step 3: Make it Self-Serve
I must admit I am not quite there yet, with this program still in its infancy for me, but my goal is to eventually shift from a centralized review committee to team-owned assessments. We simply can't scale through a single point of entry.
- Build templated playbooks: Engineers can run a readiness check without a meeting.
- Create a scorecard tool: Use Google Forms, Jira workflows, or even a GPT-based checklist reviewer.
- Integrate into CI/CD: Trigger a readiness assessment for each major launch or milestone (a sketch follows below).
- Use versioning: Treat your readiness criteria like a product with changelogs and version control.
This reduces bottlenecks while encouraging ownership and accountability.
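As an example of the CI/CD integration point, here is a sketch using GitHub Actions; the workflow, the scripts/readiness_check.py scorer, and its flags are hypothetical, so treat this as a pattern rather than a drop-in file.

```yaml
# .github/workflows/readiness-check.yml (hypothetical CI hook)
# Assumes a scripts/readiness_check.py that scores the checklists and exits
# non-zero below a threshold; adapt to your own CI system and tooling.
name: production-readiness-check
on:
  release:
    types: [published]     # run on each major launch or milestone
  workflow_dispatch: {}     # let teams run it self-serve, no meeting required
jobs:
  readiness:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Score readiness checklists
        run: python scripts/readiness_check.py --all-modules --min-score 3
```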
Step 4: Add Badging and Gamification
I started this really early, just to entice teams to participate: a series of badges to gamify the program, starting with Bronze and going all the way up to Platinum and Diamond. Teams love progress, and you can incentivize adoption by introducing readiness badges. The badges range from basic implementation of features, such as code living in GitHub, all the way to the most advanced and aspirational ones, such as AI-driven observability and self-healing.
The badges can be displayed in internal dashboards, shared in team channels, or even used as a launch gate or compliance artifact.
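For illustration, the tiers themselves can live in version-controlled config alongside the checklists. The intermediate tier names (Silver, Gold) and every threshold below are my own examples, not a fixed standard.

```yaml
# badges.yaml (illustrative): badge tiers mapped to readiness maturity.
# Silver and Gold are example intermediate tiers; thresholds are placeholders.
tiers:
  - name: Bronze
    requires: "Code in GitHub, a CI pipeline, and basic health checks"
    min_average_score: 2
  - name: Silver
    requires: "Golden signals, alerting, runbooks, and tested rollback"
    min_average_score: 3
  - name: Gold
    requires: "Autoscaling, failover, and chaos exercises completed"
    min_average_score: 4
  - name: Platinum
    requires: "All modules at maturity 4+, with recent incident drills"
    min_average_score: 4.5
  - name: Diamond
    requires: "AI-driven observability and self-healing in place"
    min_average_score: 5
```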
Step 5: Build Feedback Loops
A product mindset means your readiness program should evolve. I began this exercise early, as we onboarded the first few applications, by:
- Collecting feedback from teams early on: What’s unclear? What’s overkill?
- Tracking completion vs. incidents: Are PRRs reducing MTTR or false alerts?
- Monitoring badge adoption across orgs: Are some teams lagging?
Then iterate: Add clarity, remove friction, and improve enablement materials.
Step 6: Evangelize with TPMs and EMs
Scaling this model depends on embedding it into the ways of working. I needed buy-in from teams and from technical allies, be they other TPMs, engineering managers, or directors. Work with them to evangelize the program early on and advocate for its benefits so that they, in turn, champion it for their teams. Have them include readiness reviews as part of quarterly planning and launch retros.
For my part, I embed PRRs in weekly and monthly operational reviews, not just to show progress but also to give kudos to teams that have shown initiative.
Your readiness framework isn’t a side process—it’s part of the delivery engine.
Final Two Cents
Production readiness isn’t a one-time checklist. It’s a product—and like any good product, it should be modular, self-serve, iterative, and value-driven. By approaching readiness this way, you reduce launch risk, boost engineering confidence, and build a culture of reliability at scale.
I am still working my way through these steps and will certainly share updates here as I progress, but in the spirit of treating this as a product, I wanted to ship it early and solicit feedback from you all.
Let me know what you think, and if you have done something similar in the past.