The 10 Practices of Healthy Engineering Teams – Part 2

Andrew Hao

In Part 1 of this series, we introduced a high-performing engineering team at SuperStartupCorp that had automated repetitive tasks, codified its engineering practices, and adopted a learning mindset, resulting in happy engineers and happy stakeholders. Read on to learn more traits and practices that make this team so successful, and how they keep their bus factor high. (If you’re feeling extra adventurous, you can head on over to Part 3.)

4. They spread knowledge around

SuperStartupCorp’s engineers were well-rounded, each familiar with multiple aspects of the codebase. The team was familiar with the “bus factor,” the slightly morbid term for the number of team members who could be taken out by stray buses before the project, and perhaps the company, grinds to a halt. Put another way, the bus factor is the number of people your team can sustainably keep working without, whether they are out attending to a personal emergency or simply on vacation.

How does a healthy team keep a high bus factor – making sure their teams are able to sustain unforeseen people emergencies?

Pair programming

Pair programming allows two developers to reason about the same codebase, working together to navigate tricky, open-ended problems (Carbon Five engineers are strong pair programming advocates!). Nothing spreads knowledge like the mind-meld of two developers, discussing and looking at the same code.

Do you know how to pair? Here are some practices that have worked for this team:

  • “Ping-pong” pair programming keeps both parties engaged, switching off test-writing and implementation responsibilities between programmers in the TDD loop.
  • Pairs will rotate frequently – the more you switch partners, the more knowledge gets spread around. Some folks even keep track of pair frequencies in a pair matrix.
  • Pairs often split into “Driver” and “Navigator” roles. The Driver is in command of the keyboard, writing the code, whereas the Navigator may be occupied with keeping the greater context of the objective in mind, pulling the team out of time-sucking rabbit holes, and doing research or documentation diving on the side.
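The pair matrix mentioned above can be as simple as a count of how often each two people have worked together. Here is a minimal sketch in Python, assuming pairing sessions are logged as plain name tuples; the function names and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter
from itertools import combinations


def build_pair_matrix(sessions):
    """Count how often each pair of engineers has worked together."""
    counts = Counter()
    for a, b in sessions:
        # frozenset makes ("Jim", "Susan") and ("Susan", "Jim") the same pair.
        counts[frozenset((a, b))] += 1
    return counts


def least_paired(team, counts):
    """Return the pairs that have worked together the fewest times."""
    all_pairs = [frozenset(p) for p in combinations(sorted(team), 2)]
    fewest = min(counts[p] for p in all_pairs)  # missing pairs count as 0
    return [tuple(sorted(p)) for p in all_pairs if counts[p] == fewest]


# Hypothetical session log for a three-person team.
team = ["Jim", "Susan", "Cheryl"]
sessions = [("Jim", "Susan"), ("Susan", "Jim"), ("Jim", "Cheryl")]

matrix = build_pair_matrix(sessions)
suggestions = least_paired(team, matrix)
print(suggestions)
```

A team could glance at the least-paired list at standup when deciding who pairs next; a spreadsheet or a whiteboard grid serves the same purpose.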

Code reviews

Code reviews ensure that the team is aware of what all other members of the team are doing. Much like pair-programming, the code review is a place where the team can discuss new architectures, or share knowledge about the code that the original implementer may not have noticed.

Engineers at SuperStartupCorp practiced code reviews in many forms, including:

  • Feature-complete code reviews – a GitHub Pull Request is opened when the developer is done with the feature on their branch, inviting comments on the code changes.
  • Design-focused code reviews – the team agreed that their reviews would focus on overall architecture, design approaches and domain modeling concerns, rather than on syntax alone. They chose to automate syntax checking with a code linter, freeing reviewers to concentrate on design choices.
  • Early code reviews – feedback was often solicited early on in feature development. A pair would often put up a “WIP” (work in progress) pull request, requesting early feedback on architecture or domain approaches.

Pull-based development

In the old days at SuperStartupCorp, tasks were preassigned to the team: “In this sprint,” the project manager might have said, “Jim is committing to building the Create Session API. Susan is committing to implementing the mobile login screen.” After all, Jim works on the backend and Susan works on the mobile client. However, pre-assigning developers crippled the organic cross-pollination that could have happened had they worked on the backend and front-end together. As a result, Susan would subtly feel disempowered to work on a Session API story with Jim, and likewise, Jim with Susan on the mobile client.

Giving developers the ability to pull from the backlog empowers them with the freedom to work on systems they want to enhance. Together with product management, the engineers decided to change their process. This time around, the team would be empowered and encouraged to “pull” and choose work items in the backlog of their own volition.

On his own, Jim chooses to work on the Create Session API, joined by Susan as a pair because they both know they need to implement the mobile login screen together. By working together, they have a smoother development process because they are continuously in dialogue about the code and the integration requirements. Susan walks away that week with backend API knowledge, and Jim walks away with mobile client experience.

Antipatterns

Signs your team may have a poor bus factor:

  • You have a clear expert on your team who “is the only one who knows about the database”, or who “is the only one who understands the e-commerce code”. If you lose this person, your team is sunk.
  • Developers complain about not being able to fix a subsystem, because they have no visibility into it or permission to modify it.
  • Your team has heuristics for who works on what system: “Cheryl works on ecommerce. Brandon does the client. Jennifer does the database.” While roles and clear responsibilities are important in teams, be sure that these roles do not create silos, and engineers feel empowered to cross-pollinate and spread knowledge around.
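One rough way to spot the first and third antipatterns is to write down who knows each subsystem and flag anything with a single name next to it. The sketch below assumes that mapping is maintained by hand; the `find_silos` helper and the sample data (including Susan as a second person on the client) are hypothetical.

```python
def find_silos(knowledge):
    """Return subsystems that only one engineer knows, mapped to that engineer."""
    return {
        subsystem: next(iter(people))
        for subsystem, people in knowledge.items()
        if len(people) == 1
    }


# Hypothetical map of subsystem -> engineers comfortable working in it.
knowledge = {
    "ecommerce": {"Cheryl"},
    "client": {"Brandon", "Susan"},
    "database": {"Jennifer"},
}

silos = find_silos(knowledge)
for subsystem, expert in sorted(silos.items()):
    print(f"{subsystem}: only {expert} knows this area")
```

Each flagged subsystem is a candidate for the next pairing session or a deliberately shared story.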

5. They make feature development and technical debt a package deal

The hallways of SuperStartupCorp often reverberated with the anguished cries of pained engineers: “This code is a mess!” Many a frustrated engineer had quit their job because the codebase only ever got messier and more complex with more feature work as the company rushed its product to market.

Healthy engineering teams address technical debt as first-class work. One effective way SuperStartupCorp’s team accomplished this was to couple technical debt work with feature development: technical tasks like refactoring were scoped within the implementation of a new feature.

The team – product owner and engineers alike – understood that the checkout code was a mess of repeated code, bloated models and a dearth of test coverage. The team agreed that the next feature within the checkout code would include the work to refactor the existing code and add test coverage. When the product owner presented the next phase of work – implementing a Stripe-based checkout flow – the team assigned higher point estimates to those features and discussed the specific types of refactoring that would have to take place before they proceeded with the feature.

The net result? The codebase becomes cleaner, more expressive and reliable with every new feature release, resulting in a higher overall throughput for the team.

What if a feature needs to ship ASAP, or else the company risks bankruptcy? Then the cleanup cannot ride along with the feature. In cases where technical debt must be paid down outside the context of building a feature, an explicit contract with product ownership may be necessary. Healthy teams will communicate the risk/reward of taking on technical debt tasks in the code.

Three more ways your team can pay down your technical debt:

  • Agree on a specific composition of velocity points to divide among tasks. Every iteration, the team agrees to do 70% feature work and 30% technical debt.
  • Consider whether technical debt is so endemic in the system that the organization requires a larger initiative behind it, with its own dedicated team & resources. In these cases, the team will need to seek a sponsor in the organization who has the ability to allocate more engineers and set up a project devoted to the technical debt.
  • Communicate clearly with product owners the types of technical debt in the system and how they affect overall product metrics: technical debt may slow down team velocity, raise the rate of regressions, or negatively affect system performance. Use this type of language when communicating with product owners and business stakeholders so they can understand the true cost of ignoring debt.
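The 70/30 split in the first item above is simple arithmetic, sketched here in Python. The `split_velocity` helper and the 40-point iteration are hypothetical; the 30% share is the example figure from the list.

```python
def split_velocity(total_points, debt_share=0.30):
    """Split an iteration's points into (feature, technical debt) budgets."""
    if not 0 <= debt_share <= 1:
        raise ValueError("debt_share must be between 0 and 1")
    debt_points = round(total_points * debt_share)
    # Whatever is not reserved for debt goes to feature work.
    return total_points - debt_points, debt_points


# Hypothetical iteration with a velocity of 40 points.
feature_points, debt_points = split_velocity(40)
print(f"features: {feature_points}, technical debt: {debt_points}")
```

The exact ratio matters less than agreeing on it up front, so that the debt budget is not renegotiated every iteration.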

6. They learn to love slack time

Recently, SuperStartupCorp began introducing “slack time” into their Scrum development process. Prior to this, the company had a history of running its development teams at 100% capacity developing features and fixing bugs. If a developer finished work ahead of schedule, they were given another task to begin. If the developer was running behind schedule, the developer would feel the pressure from the team to deliver, burning the candle at both ends to catch up. The company found that teams consistently over-promised and under-delivered on their sprint commitments, leading to declining morale and burnout. The high-pressure, commitment-driven nature of their process had caused the team to make commitments that they could not deliver on.

Slack is a concept taken from operations theory: efficient systems cannot always run at 100% capacity. They need slack – capacity wiggle room – to function optimally. For example, if a team is always scrambling and working long hours to reach its sprint goals, a critical bug that appears out of the blue will cause more chaos in the project than it would if the team had been operating at 80%. Slack can take the form of making operational room for paying down technical debt, training an employee, or simply giving employees ample time to rest and work reasonable hours.

SuperStartupCorp’s engineering teams explicitly built slack time into their sprints by consciously making smaller Scrum commitments in each sprint. Stakeholders understood that, in this new reality, teams would be empowered to commit to less, with the understanding that the freed-up capacity would be devoted to working on the system and improving processes. Teams planned to use their extra capacity to work on localized refactoring efforts, learn new languages, tools and techniques, speed up their CI builds, or improve test coverage.

The result? Teams began hitting their commitments. Unforeseen critical production bugs were attended to with fresh minds. Engineers were happier. The product team was impressed by the consistency of teams hitting their commitments.

A side effect of slack time is the freedom it introduces into the system for producers to manage their energy, take time to think about problems, and communicate clearly across teams. In other words, once engineers directly recognized slack time as a necessary part of their workflow, the pressure suddenly lifted. Engineers were doing more energized work because they realized they were empowered actors in the production process, and not just drones on an assembly line.

Other ways you can introduce slack time:

  • Institute a policy of a half-to-full day per iteration for developers to spend on their own professional development, to do refactors in the system, or to perform exploratory work that may not be codified in the backlog yet is beneficial to the business. Google famously implemented this as “20% time” – the policy may not stand in the same way today, but the concepts are similar.
  • With a product manager, plan and devote an entire sprint/iteration working on team-directed chores or tasks. These may be technical debt-oriented, or focused on developer-initiated bug fixes. Make this type of sprint or iteration a consistent commitment.
  • As this team did before, build slack into your team’s commitments. Give the team the explicit permission to commit to less than they would normally commit.
  • Consider implementing ideas from Lean visualization techniques like Kanban, which institutes a WIP (work in progress) limit to prevent the team from starting new work before work already in progress is finished. If a team hits its WIP limit, have engineers either help unblock existing efforts in progress, or let some of the blocked engineers’ energy flow toward refactoring or technical debt efforts.
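To make the WIP limit in the last item concrete, here is a minimal sketch of a board column that refuses new work once it is full. The `KanbanColumn` class and the story names are hypothetical; real Kanban tools expose this as a per-column setting rather than code.

```python
class KanbanColumn:
    """A board column that enforces a work-in-progress (WIP) limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = []

    def can_pull(self):
        """True if the column has room for another item."""
        return len(self.items) < self.wip_limit

    def pull(self, item):
        """Start work on an item, or refuse if the limit is reached."""
        if not self.can_pull():
            raise RuntimeError(
                f"WIP limit of {self.wip_limit} reached in '{self.name}': "
                "help finish or unblock existing work first"
            )
        self.items.append(item)

    def finish(self, item):
        """Complete an item, freeing a slot in the column."""
        self.items.remove(item)


# Hypothetical column with room for two stories at a time.
in_progress = KanbanColumn("In progress", wip_limit=2)
in_progress.pull("Create Session API")
in_progress.pull("Mobile login screen")
print(in_progress.can_pull())  # the column is full

in_progress.finish("Create Session API")
in_progress.pull("Stripe checkout flow")  # a slot opened up
```

When `pull` is refused, that is the cue described above: swarm on unblocking existing work, or spend the time on refactoring and technical debt.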

Let’s continue the conversation.

In Part 3, we’ll discuss how successful engineering teams work well with their product owners. Additionally, we’ll introduce some lightweight approaches that help teams quickly build and review architecture designs. Finally, we’ll walk through an oft-overlooked piece of software development: deployment & monitoring.

In the meantime, we’d love to hear what’s worked for you and your teams. Leave your thoughts in the comments below!

Andrew Hao

Andrew is a design-minded developer who loves making applications that matter.