How to Plan Work for an Infrastructure Team

Chris Riccomini on June 13, 2022

Building tooling and systems for other engineers at a company can be tough. Project planning, in particular, presents a challenge. Infrastructure teams have several unique traits that hinder predictable planning.

Infrastructure teams are less likely to have product managers to help with planning. Infrastructure engineering managers are left to do prioritization, requirements gathering, and quarterly/yearly planning on their own. Managers don’t always know how to do this effectively.

Such teams also sit directly next to their customers (or at least they used to before the pandemic). Close proximity to the customer is a double-edged sword. Teams are able to gather feedback rapidly and have close daily contact with their users (the other engineers in the company).

But infrastructure teams are so easy to access that they are interrupted frequently. Imagine if customers could walk directly up to an application engineer and demand a product change. Infrastructure engineers live this experience. Managing these interruptions is challenging.

Despite close customer contact, infrastructure teams are often disconnected from product planning itself. Other teams don’t inform the infrastructure team that a new feature depends on its work until it’s too late. Engineers are left scrambling and frustrated.

A few years ago, an infrastructure team at WePay was having trouble planning its work. The team was handling constant interrupts from other teams, both operational issues and urgent requests. The manager threw up their hands and declared it impossible to reliably plan and deliver projects of their own; the team was purely reactive.

At the time, I was running WePay’s data infrastructure team. I heard the manager’s frustration, and shared some of the practices I’d found helpful in planning for my own team.

The rest of this post is the (edited) email that I sent the manager. Everything here can (and should) be adjusted, but the structure presented below is a reasonable starting point. If it resonates with you and you want more, I highly recommend Will Larson’s book, An Elegant Puzzle.

Context: My data infrastructure team was about 6 people, including 2 site reliability engineers (SREs). The company had roughly 300 people. We were using JIRA as our ticketing system and a loose Agile/Scrum planning methodology. We had 2-week sprints and both quarterly and yearly planning.


Going to share how I run data infrastructure (DI). Goal is to:

  1. Clarify how DI runs. Mostly just FYI.
  2. Give (the team manager) some thoughts on how he might get INFRA planning working properly.

I have iterated on this for about a year. I’m not done yet (never will be), but feel like we’ve hit a very good spot. My Q1 planned vs. executed was very good.

Some things I’m doing. Hope it’s useful. Take it or leave it.

Axioms

Sprint Planning

Quarterly Planning

Yearly Planning

On-call

One final note: I reject your premise that you can’t estimate or plan for surprises. (The engineering manager had this comment on Slack.) I have several devices to manage this:

  1. 80% capacity
  2. empty RTB at the start of each quarter
  3. two people on-call
  4. yearly planning

Cheers,
Chris

P.S. Here’s how I calculated the team’s capacity this quarter.

Inputs:

  Sprints in Q2               6
  Devs                        4
  SREs                        2
  Dev holiday (sprints)       2
  Dev on-call (sprints)       6
  SRE holiday (sprints)       1
  SRE on-call (sprints)       6

Calculation:

  Dev capacity (sprints)      16 = 4 devs * 6 sprints - 6 on-call sprints - 2 holiday sprints
  SRE capacity (sprints)      5  = 2 SREs * 6 sprints - 6 on-call sprints - 1 holiday sprint
  80% dev capacity            12 = 16 dev sprints * .8 (12.8, rounded down)
  80% SRE capacity            4  = 5 SRE sprints * .8
  Final dev capacity          12 sprints
  Final SRE capacity          4 sprints
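
If you want to rerun this arithmetic each quarter, here is a minimal Python sketch of the calculation above. The names (RoleInputs, raw_capacity, planned_capacity) are made up for illustration and are not from any tool we used. Rounding down to whole sprints is an assumption inferred from the 12.8 -> 12 step in the dev numbers.

```python
from dataclasses import dataclass


@dataclass
class RoleInputs:
    """Per-role inputs for one quarter, all measured in person-sprints."""
    people: int
    holiday_sprints: int
    on_call_sprints: int


def raw_capacity(role: RoleInputs, sprints_in_quarter: int) -> int:
    """Person-sprints left after subtracting on-call and holiday time."""
    return (role.people * sprints_in_quarter
            - role.on_call_sprints
            - role.holiday_sprints)


def planned_capacity(role: RoleInputs, sprints_in_quarter: int,
                     utilization_percent: int = 80) -> int:
    """Capacity to plan against: a percentage of raw capacity.

    Integer math avoids floating-point surprises; the result is rounded
    down to whole sprints (an assumption, see the note above).
    """
    return raw_capacity(role, sprints_in_quarter) * utilization_percent // 100


# Inputs from the Q2 numbers above.
SPRINTS_IN_Q2 = 6
devs = RoleInputs(people=4, holiday_sprints=2, on_call_sprints=6)
sres = RoleInputs(people=2, holiday_sprints=1, on_call_sprints=6)

if __name__ == "__main__":
    for name, role in (("dev", devs), ("SRE", sres)):
        raw = raw_capacity(role, SPRINTS_IN_Q2)
        planned = planned_capacity(role, SPRINTS_IN_Q2)
        print(f"{name}: raw={raw} sprints, planned={planned} sprints")
```

With the inputs above, this reproduces the final capacities of 12 dev sprints and 4 SRE sprints. Change utilization_percent if 80% is not the right buffer for your team.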
