Devopsing a Legacy World
Devops has been picking up steam since around 2009. (It could be argued that it was around in other forms for longer, but for argument's sake let's go with 2009.) Today it is mainstream, and there are whole conferences built around devops. But not everyone is doing the devops thing.
Some companies, large and small, have legacy infrastructure, code, and culture that take longer to pivot. Some have tried to change but have run into problems that slowed or further delayed the changes. Blockers happen and no one is at fault.
Over the next few… pages? Scrolls? We’ll set a shared understanding of what devops is and what it isn’t. Then we’ll dive into how to change your legacy environment and what challenges you might run into.
But who am I, and what qualifies me to talk about migrating legacy clients to devops? I have been working on and with computers for twenty-plus years, and I started out professionally in IT as a Systems Administrator over ten years ago. I worked in a small IT team alongside the developers, not knowing what devops was but actually practicing it.
From there I moved to Metal Toad, where they were quickly adopting devops and putting it into practice. It started by merging the hosting company, Copper Frog, into the development company, Metal Toad. (Hurray for tearing down walls.) Since then we have helped numerous clients, both large and small, by working alongside them and helping them with the transition.
What is devops?
There are numerous sites and blogs on what devops is. Most are very similar. I’m partial to the one listed at https://theagileadmin.com/what-is-devops/.
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
In short, devops is developers and operations working together, but I don’t like to stop there. What about the QA team that's constantly saying the code is broken? Or Security, which wants no one to use the code because then it can’t be hacked?
If we take devops as just operations and development, we leave barriers between teams. And since DevQAOpsSec is unpronounceable, and I probably missed some teams, we need to expand our definition.
DevOps is the practice of all teams on a project participating together in the entire service lifecycle, from design through production support.
What devops isn’t
This is another concept that has been beaten to death, so I will only mention it briefly. DevOps is not:
- Continuous Integration
- Automated Testing
- Infrastructure as Code
- NoOps
- A new department
This list could be expanded, but suffice it to say that these are useful and are natural extensions of devops, but they are NOT devops themselves. I will touch on some of these later.
Getting Started
I’m a huge fan of devops. It makes teams happier and more productive. But it’s very easy to start implementing it with the best of intentions and still not get the results you desire. That’s why the rest of this blog is focused on how to get your devops journey started on the right foot.
Culture not Tech
A coworker once told me that technology is easy, people and culture are hard. Not only is this statement true, it is also the foundation of why DevOps is a cultural movement and not a technology one.
To start, why is technology easy? It’s predictable and repeatable. If you push button ‘A’ you will always get response ‘B’. Maybe there is input that can change the output of button ‘A’, but even that would be repeatable. That repeatability has spurred blogs, forums, and libraries all dedicated to explaining how to solve technical problems more easily and faster.
People are not easy. If you tell a joke to one person they may laugh, while another person scowls. And to make it worse the response could depend on how they are feeling in that specific moment.
Culture evolves around rules and procedures and how people interact with them. These rules could be unwritten and informal, or codified in handbooks and processes.
The uniqueness of culture and unpredictability of people make setting a goal and getting there a complicated task, filled with tweaking hundreds of little parameters, then course correcting as you go.
Tech is easy, so let’s start with getting the new technology in place. That’s straightforward, right? We’ll reap the immediate benefits and then work on the culture. Except that’s not true.
The theory of constraints:
The core concept of the Theory of Constraints is that every process has a single constraint and that total process throughput can only be improved when the constraint is improved.
https://www.leanproduction.com/theory-of-constraints.html
What does this mean? If your constraint is communication lag between groups, or missed requirements, then implementing continuous integration (CI) won’t make an improvement. Or worse, it may allow more work to pile up behind the constraint. This leads to people trying to go around the constraint, causing confusion and errors.
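To put rough numbers on that, here is a minimal Python sketch. It assumes work flows through the stages one after another, and the stage names and weekly capacities are made up for illustration, not taken from a real project.

```python
def throughput(stage_rates):
    """Tickets per week a serial process can finish: the slowest stage sets the pace."""
    return min(stage_rates.values())


def constraint(stage_rates):
    """Name of the stage that limits the whole process."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical weekly capacity of each stage (tickets per week).
stages = {"requirements": 5, "development": 12, "build": 8, "qa": 10}

# Adding CI makes the build stage much faster, but it is not the constraint.
with_ci = {**stages, "build": 40}

# Fixing the communication lag around requirements improves the constraint itself.
with_better_requirements = {**stages, "requirements": 9}

print(constraint(stages), throughput(stages))  # requirements 5
print(throughput(with_ci))                     # 5
print(throughput(with_better_requirements))    # 8
```

Speeding up the build leaves throughput at 5 tickets a week because requirements are still the constraint. Only improving the constraint moves the number, and once it does, the constraint shifts to the next slowest stage.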
Meetings that count
So it’s a culture problem. What do you do? You start with changing it. The easiest way to get people to work together is to have them meet together.
Before we get into types of meetings, there is one rule for all meetings. Meetings need an agenda. It should be published beforehand, preferably in the meeting invite. Some of these agendas will repeat (daily standups are one example), others can change. The point is the agenda keeps everyone focused and lets them come prepared. Without one you might just sit around talking for an hour, then leave without actually solving the problem.
Now onto the meetings. I recommend starting with cross-functional meetings. A cross-functional meeting is one where there is a representative from every team on a project. Depending on the size of your organization, maybe all the skills are embedded in one small team. Maybe all the functions are isolated by department. Whatever the situation, you need to meet together. I have seen this done as a special meeting with clients, developers, and operations staff, or as just an internal scrum.
Cross-functional meetings normally come in two basic types.
Planning meetings: In a planning meeting, everyone should go over all of the tickets and their requirements. Members from each team should feel empowered to ask questions and point out potential pain points.
But remember, your goal is to plan out the next segment of work and become aware of problems, not to solve all the problems. If it’s clear that a ticket has lots of issues or uncertainty, it should be clarified before being committed to. If waiting for the ticket to be clarified isn’t an option (this can happen with tight deadlines), it should be taken offline to work out with a small team. Most importantly, the outcome should be communicated back to the team.
*Note: One thing that is often overlooked: while planning meetings should plan out a set segment of work, like a sprint, you should also review one or two segments of work into the future. If a future sprint depends on something, such as a new server or module of code, that dependency should be noted in the current segment of work.
Status meetings: The status meeting (also known as a scrum) is an ideal meeting to start getting everyone involved because it should be short: thirty minutes or less, and ideally two to three minutes per person. Each person should give a quick update on what they have worked on since the last meeting, and any blockers they have. If someone has a solution for a blocker, don’t try to solve it in the meeting. Instead, take note and meet up afterward.
Process
Now that we’re meeting and identifying problems, how do we fix them? Or go about completing tasks? That depends on your process. I know people just cringed reading this. That’s because most people do process wrong. It’s a handbook that gets passed down, and might be updated annually by HR or legal and handed out again. Or it's not written down at all and no one knows what it is.
How is process done right? There are a few criteria.
- Process should be visible to everyone, or better yet documented and updated regularly.
- Everyone should be encouraged to participate.
- Process should be fluid, changing as needed.
For example, at Metal Toad, our development process map is printed and posted on a wall by the dev pit. All of our handbooks are in Google Docs shared with the whole company. (Process is visible.) Everyone is encouraged to put Post-It notes on the process map or add comments to the Google Docs at any time. (Participation.) Quarterly, the comments are reviewed by the leadership team and the process is updated. (Fluid.)
There is no one right way to do process, but these criteria allow everyone to know the process, encourage buy-in from the teams, and allow change.
Post Mortems
When a doctor performs a post mortem, they are only looking for what caused the person to die. Whether it was a murder or an accident doesn’t really matter. Their job is to gather cause and evidence of what happened, regardless of who is at fault.
Done wrong, post mortems quickly turn into witch hunts, with everyone pointing fingers. This breeds a cover-your-ass culture, which breaks trust.
Post mortem done wrong
Facilitator: The site went down for approximately 15 minutes. What happened?
Ops: The database server was overloaded. This caused connections to pile up and the site gave errors.
Facilitator: What caused the database to get overloaded?
Dev: The database server needs more power.
Ops: The database server has always been fine. It’s probably some new query.
Blameless post mortem
Facilitator: The site went down for approximately 15 minutes. What happened?
Ops: The database server was overloaded. This caused connections to pile up and the site gave errors.
Facilitator: What caused the database to get overloaded?
Dev: I don’t think it’s the code because we haven’t deployed any new code recently. Could the database server need more power?
Ops: Nothing on the database server has changed. But the disks do look like they had more usage than normal. Were we getting more traffic than normal?
Do you see the difference between the two examples above? When done wrong, the team spoke in absolutes and was accusatory. When the post mortem was done correctly, people stated facts: “We haven’t deployed any new code recently” and “Nothing on the database server has changed.” Both are important for gathering information about the root cause. The fact statements are coupled with inquiries that people think of. They are stated as general questions without pointing fingers. The conversation follows the evidence until the root cause is found. Once found, the root cause is documented for the future. It can lead to a process change to prevent the error from happening again, or maybe the problem was a fluke and no long-lasting change is needed.
You might have noticed in the example that I used the term facilitator. There are three people you need at every post mortem.
- First, you need a facilitator. This is the person who will guide the conversation to the needed resolution. If it starts to shift away from a blameless post mortem, it’s up to the facilitator to get it back on track.
- Second, you need a client representative. This could be an internal client or an external client. It could even be an internal employee representing the external client. They have two roles. They identify the business problem. Then they decide between different solutions, given the business constraints.
- Lastly, you need a representative from every team. (I lied, it’s more than three.) Nothing will halt a post mortem faster than having to wait for feedback because someone critical to the process didn’t show up. Once it’s scheduled, ensure someone knowledgeable from every team is present.
Once blameless post mortems are fully in place, they will allow you to pivot your culture to one of collaboration and trust.
Tools, sort of
Earlier I mentioned that devops isn’t tools. That’s true. However, there are some technologies you will want to start looking into as you build your devops culture. These technologies will help build teams, knowledge, understanding, and trust in each other.
- Infrastructure as code: There are lots of tools that facilitate infrastructure as code. Pick whichever one works best for your organization. What's important is that infrastructure as code is one of the first things I recommend ops teams start looking at. It not only streamlines the setup and maintenance of environments but, more importantly, it gets your ops team writing code. This leads to using other dev tools, like version control, CI/CD, and pull requests. These shared tools and knowledge build mutual understanding of each other's tools and skills.
- Automated testing: If you have CI/CD and you’re not automating your testing, you will release bugs. It’s a simple law of numbers: if you can deploy multiple times a day but can only test once a day, then you won’t be able to test all changes. More importantly, this gets devs thinking about writing tests and starting down the path of test-driven development, building shared skills with the QA department.
- Continuous Integration (CI): Normally you see Continuous Integration and Continuous Delivery together. They are actually different tools that are often coupled together. Start with CI. CI means that every time a change is merged into a branch, it gets compiled and built. That's it. It ensures the code is in a state that can be compiled and potentially deployed.
- Continuous Delivery (CD): This is the other side. Once CI builds the code, CD automatically pushes it out to an environment. It can be terrifying to know that as soon as code is merged, it’s compiled and deployed, and that this can happen multiple times a day. This makes pull requests and QA even more important.
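How these pieces fit together can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not any particular CI/CD product: `run_pipeline` and the lambdas standing in for the build, test, and deploy steps are hypothetical. The idea is that a merged change is built first (CI), the automated tests act as the gate, and only then is it pushed out (CD).

```python
from typing import Callable, List


def run_pipeline(
    build: Callable[[], bool],
    tests: List[Callable[[], bool]],
    deploy: Callable[[], None],
) -> str:
    """Run one merged change through build, tests, and deploy, stopping at the first failure."""
    if not build():
        return "build failed"  # CI: the change does not even compile
    if not all(test() for test in tests):
        return "tests failed"  # automated testing: the gate before delivery
    deploy()  # CD: push the built change out to an environment
    return "deployed"


deployed = []  # stands in for an environment that receives changes

good = run_pipeline(
    build=lambda: True,
    tests=[lambda: 1 + 1 == 2],
    deploy=lambda: deployed.append("change-1"),
)

bad = run_pipeline(
    build=lambda: True,
    tests=[lambda: 1 + 1 == 3],  # a failing automated test
    deploy=lambda: deployed.append("change-2"),
)

print(good, bad, deployed)  # deployed tests failed ['change-1']
```

Without the test step in the middle, every change that compiles would go straight out the door, which is exactly the situation where bugs get released.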
Challenges
If you haven’t figured it out, I don’t like being negative. The whole devops philosophy is built around optimism. Even with that you will run into challenges and setbacks along your devops journey. Here are a few that I have encountered.
Cowboys/Rockstars
Everyone who has been in technology has worked with or known rockstars. They are the ones who have lots of tech skills but are absolutely toxic to devops teams. In the rockstar's head, they are the best. Whether they are or not is beside the point. This leads them to talk down to other team members, and makes blameless post mortems almost impossible.
Lack of Leadership
While some practices can be tweaked to make improvements, a holistic change of a company culture requires leadership to buy in and help set the course. Leading devops can be a paradigm shift. Some leaders don’t know how to change, others don’t want to.
If your leader doesn’t want to change, there isn’t much you can do. Going over someone's head is risky. In my opinion it's better to start with a lateral peer who can provide constructive feedback and nudge them along the right path. The same works for leaders who are unsure how to change. Help your leader identify sources and mentors that can set them on the right path.
Poor Communication
Devops is about breaking down walls and building a team. The more people on a team, the more channels of communication develop. This requires not only the right tools for communication but also teams that are educated on how to communicate with each other. When communication breaks down, tensions start to run high, requirements are missed, and people feel left out.
Quick updates should go into an interoffice chat, such as Slack, HipChat, or Skype. This allows quick communication within the team, whether remote or in the same office.
Details on a task or project may be discussed offline or in chat, but are never official until they appear in a ticket. This keeps everyone aligned on the goals.
Email is for communications that are not time critical, such as questions and announcements. Nothing time critical should go out by email.
Meetings should be used to solve a specific problem (remember the agenda) and should be scheduled as far in advance as possible. That could be five minutes or it could be five months. Please respect everyone's time and calendar. No one likes opening their calendar to see double bookings.
Fear
Frank Herbert said it best. “...Fear is the mind-killer. Fear is the little-death that brings total obliteration …” No, we’re not on Dune, but nothing kills teams quite like fear. If someone on the team is afraid, it’s up to the whole team to swarm around the problem and help alleviate it.
Success
You now have fully functional teams. Bug rates are down. You’re delivering code daily. When something goes wrong you can flawlessly execute a post mortem. Enjoy it, because changing culture is like an infant taking their first steps towards their parent. It’s wobbly and hard, and just as they get the hang of it their parent takes a few steps back. Why is that? Because people are hard. Your organization is going to go through changes: onboarding new people, slow quarters, crazy growth quarters. And that's just work, not the personal stuff that people have happening.
You can relax though. Culture will always change. In the years I have been at Metal Toad the culture has shifted. It would change even if you weren’t driving the change. By building the right process and habits with the people in your organization, you can set the course for your company's culture, knowing that it may drift but is always correctable.