Transitioning from team-centric to service-centric infrastructure


Show Notes

In our first episode, we sit down with Timur Bublik, Senior DevOps Engineer at TIER Mobility.

As the engineering team at TIER Mobility started to grow, their approach to provisioning infrastructure became outdated. Listen to Timur describe the issues they encountered with a team-centric approach and how they successfully built a service-centric approach to replace it.

Chapters

0:00 Intro

2:10 Company background – TIER Mobility

3:30 TIER’s Infrastructure Landscape (image)

7:07 Downsides of a team-centric infrastructure

8:46 Service-centric approach (image)

11:25 Service-centric approach in practice

23:05 Summary

Transcript

Mike

Hello everyone, and welcome to the first episode of Mission Control. Today I’m joined by Timur, Senior DevOps Engineer on the core infrastructure team at TIER Mobility. Timur, how’s it going?

Timur

Hey, thanks for having me. How are you doing? Good.

Mike

So Timur, can you please tell us a little bit about yourself? How’d you get into engineering, and just some background for all of us.

Timur

Right. So like many other engineers, probably, I started with playing video games, and that sparked some interest in computers and stuff like that.

Eventually that led me to studying for a bachelor’s in engineering. It’s not strictly computer science, but it was tightly coupled with it. After that I continued with a master’s degree with even more computer science involved, and during my master’s I was also learning about programming by myself at home, doing online courses and some pet projects. Then I took an internship in a startup and got some solid experience with DevOps, with AWS, with Docker, with Kubernetes, and stuff like that. After that, when I finished my studies, I started working in various AWS cloud consulting companies here in Germany.

So I was helping different customers to build, develop, and deliver their cloud solutions. After some time I decided I wanted to do something with a more positive impact on society and the environment. And that’s how I landed my job at TIER, where I have worked for almost one year already.

Mike

That’s awesome. And TIER is a really interesting company in general. In the US you’ve probably heard of Bird or Lime; that’s kind of the best analogy to what TIER is doing over in Europe. They are a micromobility provider. They’re trying to help people have accessible and affordable means of transportation.

Um, but I think there’s some other cool stats, like kind of background stats that you have, Timur, to share. 

Timur

Right. So like Mike said, we run different vehicles like e-mopeds, e-scooters, and e-bikes, and we have over 135,000 vehicles on the streets worldwide. Our customers have made more than 80 million trips and travelled a distance of more than 160 million kilometers.

We operate in 18 countries, in more than 170 cities. We have more than 1000 employees. And from an IT perspective, we run 300 Kubernetes nodes in production. All our workload runs in Kubernetes. We also have more than 131 terabytes of data stored in S3, and we also have different storage backends.

Mike

Awesome. How is the infrastructure landscape at TIER laid out? I think you had something to share visually for us. So if you’re listening on the podcast, we record these on video as well, so you can head over to our website and watch it.

I’ll also include the link in the show notes. 

Timur

Sure. Let me share my screen and show you the slide. Here is the diagram that tries to show our infrastructure landscape in a simplified way. I will go real quick from left to right, from top to bottom.


We start with GitHub. This is the central place where all our service source code and all our infrastructure code resides. Then, going from the top in the next column, there is Okta. Okta is our central authentication and authorization provider. We provide access to our developers and other team members, for example, to Vault, which is our main secrets engine. We give people access to Kubernetes via kubectl.

Also with the Okta credentials, we give people access to AWS accounts. We also give access to third-party solutions such as Datadog and so on. So we use Okta to authenticate humans against our tools and services. Then down below we have CircleCI, which is our CI/CD tool for building Docker images for our services that we eventually run in our Kubernetes clusters. The next block is Spacelift, our CI/CD provider for Terraform.

So our infrastructure code is deployed with Spacelift. We deploy Vault resources, Kubernetes resources, AWS resources, and many more. Then we have our cloud, which is currently AWS, the only cloud we use right now. We run our Vault clusters inside of AWS. We run our Kubernetes cluster in EKS.

We also use many native AWS services such as ElastiCache, RDS, DynamoDB, S3, Parameter Store, DocumentDB, and so on. Besides that, we also have a set of solutions to give us more control over what’s happening with our services. We use Datadog for fetching metrics and logs, Sentry for tracing, Grafana for drawing the dashboards, Prometheus, and a lot of different stuff.

And then eventually, when something happens, based on the data we collect from this observability stack, we create alerts that land in certain Slack channels and also notify our on-call team that something is wrong.

Mike

That’s awesome. Originally, when you guys had set up your approach to provisioning infrastructure, it was a team-centric approach, right?

Timur

Exactly. We have had some sort of a team-centric approach from the beginning. With the team-centric approach, we believed that our teams are going to be there. They’re going to be immutable. They are going to own services. Services might come and go, but there’s always going to be one team that owns a service, and it’s going to be the same service all the time.

So we built our infrastructure in such a way that we had a repository with infrastructure dedicated to this team and its services, while the service repository itself had only the source code, some Helm YAML files, CircleCI pipeline definitions, and stuff like that. It’s a pretty good approach. It probably works for many companies that are more stable from the team’s perspective. But during COVID time, we started to change. We are also a startup that is growing pretty fast. We are reshaping our teams quite often, and we also change the ownership of the services. This brings some trouble when you try to move a service from one team repository to another team repository. You can do that, but it takes quite some time.

That’s when we decided we needed to change our team-centric approach to something else. Last year we started to develop our service-centric approach, where the service is self-contained. It still has its source code. It still has its Kubernetes stuff. And it also has its own infrastructure defined in the same repository.

So eventually the service does not care who owns it. And even if the team that was owning it doesn’t exist anymore, it can be easily reassigned to another team.

 


With a service-centric approach, it’s easy to reassign a service to another team, or to give extra access to a team that doesn’t own it. For example, we had a team B that was split into two other teams, and we gave them, let’s say, guest access to services A and B. We can also easily remove team A’s access to, or ownership of, service C without any of these services ever knowing that this happened.

To achieve that, we developed a solution, which is a central repository to register services and manage teams’ access to those services. That’s what we call a service registry.
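A minimal sketch of this idea in Python, not TIER’s actual implementation: a registry keeps the mapping between services and teams, so ownership can change without the service repository being touched. The class name `ServiceRegistry`, the `owner` and `guest` roles, and the team and service names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Toy registry: maps each service to the teams that may access it."""

    # service name -> {team name -> role}; the roles "owner" and "guest"
    # are assumptions made for this illustration.
    grants: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, service: str, owner: str) -> None:
        """Register a new service with its initial owning team."""
        self.grants[service] = {owner: "owner"}

    def grant_guest(self, service: str, team: str) -> None:
        """Give a non-owning team access; an existing role is left as-is."""
        self.grants[service].setdefault(team, "guest")

    def reassign(self, service: str, new_owner: str) -> None:
        """Move ownership to another team; guest grants are kept."""
        teams = self.grants[service]
        for team, role in list(teams.items()):
            if role == "owner":
                del teams[team]
        teams[new_owner] = "owner"

    def owner_of(self, service: str) -> str:
        """Return the team that currently owns the service."""
        return next(t for t, r in self.grants[service].items() if r == "owner")


registry = ServiceRegistry()
registry.register("service-c", owner="team-a")
registry.grant_guest("service-c", "team-b")
# Ownership moves from team-a to team-d; nothing about the service changes.
registry.reassign("service-c", new_owner="team-d")
```

Because the mapping lives only in the registry, the reassignment is a change to registry data alone, which mirrors the point that the service never knows its owner changed.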

The service registry manages two aspects of our infrastructure: the service aspect and the human aspect. From the service aspect, we register a Spacelift pipeline, or Spacelift stack, that will deploy the Terraform infrastructure code stored in the service repository.

It does not deploy the service infrastructure itself, but only the pipeline for it. So it will track the service repository to deploy those changes. 

And then from the human perspective, we create teams in Okta and give those teams access to the services that these teams are supposed to own, or to get access to.

We also manage observability and responsibility of a team regarding a service by defining the PagerDuty alerts, the Slack alerts and stuff like that. 
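The two aspects described above can be sketched as follows, under stated assumptions: one registry entry expands into a pipeline definition on the service side, and into team, access, and alerting definitions on the human side. The `ServiceEntry` fields, the `plan_resources` function, the example repository, and the resource labels are made up for illustration; they are not real Terraform resource types or TIER’s code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """One hypothetical registry entry."""

    name: str           # service name
    repository: str     # repository holding the service's Terraform code
    owner_team: str     # team that owns the service
    slack_channel: str  # channel where alerts for this service land


def plan_resources(entry: ServiceEntry) -> dict[str, list[str]]:
    """Return illustrative labels for what a registry would create, by aspect."""
    service_aspect = [
        # Only the pipeline is created here. The pipeline itself later
        # deploys the infrastructure defined in the service repository.
        f"spacelift_stack:{entry.name} (tracks {entry.repository})",
    ]
    human_aspect = [
        f"okta_group:{entry.owner_team}",
        f"access:{entry.owner_team}->{entry.name}",
        f"pagerduty_alerts:{entry.name}->{entry.owner_team}",
        f"slack_alerts:{entry.name}->{entry.slack_channel}",
    ]
    return {"service": service_aspect, "human": human_aspect}


plan = plan_resources(
    ServiceEntry(
        name="payments",
        repository="github.com/example/payments",
        owner_team="team-a",
        slack_channel="#team-a-alerts",
    )
)
for aspect, resources in plan.items():
    print(aspect, resources)
```

The service side contains a single item on purpose: in this model the registry never lists the service’s own infrastructure, only the pipeline that tracks the service repository.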

I have prepared a simplified version of our service registry to show you what it looks like in real life.

Let’s start our review from main.tf.

{video}

Mike

I mean, at the very least, I think this will be great inspiration for others who want to see what TIER has done, how you guys have approached this, and your new philosophy of a service-centric approach.

I think it’s great inspiration for others to take it and run with it. But I think we’re coming up on time here. So why don’t we do a quick summary of what we’ve talked about and learned today, and then let us know how people can stay up to date on what TIER is working on.

Timur

We switched to a service-centric approach because it better fits our needs. In a service-centric approach, services are self-contained. We use the service registry to register a service via Spacelift, and we use the service registry to control a team’s access, or human access, to the service, its secrets, and its underlying infrastructure.

The service registry is a central place for us. It’s a repository that creates teams in Okta. It creates AWS roles. It creates Kubernetes resources and access in Vault for a service that is deployed. It also creates an infrastructure pipeline that will deploy the infrastructure from the service repository via Terraform. The service registry basically only deploys the pipeline, and then the pipeline itself deploys the infrastructure belonging to the service. At the same time, we also have CircleCI to deploy the service itself into this infrastructure.

Mike

If you guys want to stay up to date on everything TIER is working on you can check out their engineering blog. It’s at tier.engineer.