What is a Service Mesh? Key Features, Benefits & Demo

Service Mesh: Intro, Demo & Glimpse into the Future

This blog post provides an overview of service mesh technology and its benefits. Service mesh is rapidly gaining popularity in the cloud-native computing world. With security becoming an increasing priority, capabilities such as observability, reliability, security management, and zero trust are becoming critical components of any modern system, which is making the service mesh a standard part of cloud-native platforms.

We will cover:

  1. What is a service mesh?
  2. Service mesh benefits
  3. Key components of a service mesh
  4. Service mesh landscape
  5. Service mesh quick start demo
  6. What’s next for the service mesh technology?

What is a Service Mesh?

A service mesh is a technology that adds reliability, security, and observability features to a platform by creating a dedicated infrastructure layer. This layer aims to simplify, secure, and facilitate the communication between microservices when dealing with complex, distributed applications at scale. 

It provides a way to manage and secure communication between services and to monitor and observe the traffic flowing through the system. Until recently, a service mesh was typically implemented as a set of proxies deployed alongside the applications, intercepting and managing the traffic between services. In modern microservices architectures, these lightweight proxies are implemented as sidecar containers that sit next to your application containers.

The collection of all these network proxies that handle and monitor traffic in our platform is called the data plane. To orchestrate, manage and coordinate the data plane, the service mesh maintains a centralized component, the control plane. Before we analyze its different parts in detail, let’s discuss the various benefits of a service mesh.

Service Mesh Benefits

In the modern world of distributed, cloud-native microservices, we often end up with systems that suffer from complexity, reduced explainability, and a lack of operational agility. Architecture topologies are often hard to understand, visibility and troubleshooting become a burden, the attack surface keeps growing, and network resiliency is challenging.

To alleviate some of the pains described above, we can implement a service mesh at the platform level. Some of the most common capabilities it provides include fine-grained traffic management, load balancing, network resiliency, service discovery, security policy, rich observability, centralized control, authentication, and authorization.

Service Discovery

One of a service mesh’s key functions is the ability to discover and connect to other services in the mesh. This is usually done through the control plane via a service registry, which tracks information about the different services.

Traffic Management & Resiliency

Another useful functionality is a service mesh’s advanced traffic management and control capabilities. By leveraging this technology, we can quickly implement fine-grained traffic control inside our systems, load balancing, canary deployments, retries, failover mechanisms, circuit breaking, and routing based on the custom criteria we define.

Observability

Most service mesh solutions implement built-in observability tools that track metrics, logs, and traces and can provide a holistic observability view of a distributed system. The out-of-the-box visibility enables developers and operators to monitor the health of systems and related services, improve the understandability of environments, and troubleshoot bottlenecks and performance issues.

Security

Since the service mesh intercepts and manages all the traffic inside the system, it is also responsible for securing the communication within the mesh. The provided functionalities in this domain include enforcing encryption and mutual Transport Layer Security (mTLS), managing certificates, and providing tooling for fine-grained policies and access control.

All these concepts and functionalities are not new. They have been around and implemented by software development teams for years.
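As an illustration of what “implemented by software development teams” has looked like in practice, here is a minimal sketch of a hand-rolled retry with exponential backoff, the kind of logic a mesh proxy can take over. The `retry` helper and the `RETRY_DELAY` variable are hypothetical names chosen for this example and are not part of any service mesh tooling.

```shell
# Hypothetical helper: run a command up to N times, doubling the wait
# between attempts. RETRY_DELAY (seconds, default 1) is an assumed knob.
retry() {
  max_attempts="$1"
  shift
  attempt=1
  delay="${RETRY_DELAY:-1}"
  while :; do
    # Stop as soon as the wrapped command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is spent.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

For example, `retry 3 curl -fsS http://localhost:8080` would attempt the request up to three times. Every service written in every language would need its own copy of logic like this, which is the duplication a mesh removes.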

The most exciting point of the service mesh is that it abstracts these responsibilities and capabilities from the application layer into a common platform layer that sits at the infrastructure level. This allows these capabilities to be centralized, simplified, and uniform across all applications and tooling, regardless of the underlying technology, programming language, and framework used, and completely decoupled from the application code.

Key Components of a Service Mesh

A typical service mesh solution involves several key components that together make up the full implementation. Although the elements below apply to most cases, the exact details depend on the particular service mesh technology picked.

Later, we will go through some of the most common service mesh implementations and discuss the future of service meshes, which might change some key components below. For now, here’s what you need to know to get an overview of how a service mesh operates:

Data Plane

As mentioned, the data plane is the set of all the userspace proxies that are placed next to the different services and that handle and monitor traffic in our platform. The data plane intercepts all the traffic between services in order to enforce security best practices and to provide observability and network resiliency.

Control Plane

The control plane is a centralized set of management processes that orchestrate and coordinate the data plane. You can consider it the service mesh’s central “brain”: it controls the behavior of the network proxies and provides an API layer to interact with.

Sidecar Proxy

In most implementations so far, an instrumental component has been the sidecar proxy. A sidecar proxy container is deployed next to each service in a system and effectively handles all of that service’s inbound and outbound traffic. The collection of all these proxies forms the data plane. Lately, there have been efforts to move towards sidecarless implementations that remove the need to manage and maintain sidecar proxies.

API Layer

Each service mesh implementation provides an API layer for the operators and developers to manipulate or interact with. Typically, this is used for automation in terms of configuration, custom tooling, integration with other systems, and maintenance.

The Service Mesh Landscape

Since service mesh technology has grown in popularity over the last few years, there has been a boom of implementations in the cloud-native landscape. We won’t review all the solutions but will mention some of the most prominent ones according to my experience and taste. If your favorite service mesh is missing from the list, feel free to comment below and tell us more about it!

Istio

Istio is one of the most popular implementations out there. It provides application-aware networking by leveraging the powerful Envoy proxy. It is one of the most elaborate and feature-complete implementations, and it works with both Kubernetes and traditional environments. Some of its capabilities include features that enable universal traffic management, telemetry, and security.

Linkerd

Linkerd is another popular service mesh, developed primarily for Kubernetes setups, that prides itself on being extremely lightweight, fast, and simple; its data plane proxy is written in Rust. It is a graduated CNCF project. One of its main promises is a focus on simplicity, with out-of-the-box functionality for the security, observability, and reliability of Kubernetes workloads.

Consul Connect

Consul is a multi-cloud service mesh solution that enables automatic service-to-service encryption and identity-based authorization and that works with container orchestrators such as Kubernetes and HashiCorp’s Nomad. Consul ships with a built-in proxy but also supports Envoy.

Finally, if you operate within a cloud provider such as AWS, you can consider leveraging its native service mesh implementation, AWS App Mesh, which integrates natively with the rest of the AWS services.

Service Mesh Quick Start Demo

Alright, now that we are aware of the basics and we have a good understanding of the service mesh technology, let’s have a walkthrough and a quick demo. We will install Linkerd on a Kubernetes cluster for this demo and explore its capabilities. 

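To run this demo with me, you would need a running Kubernetes cluster (this walkthrough uses the local cluster bundled with Docker Desktop), kubectl configured to talk to it, and the Linkerd CLI.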
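A small preflight sketch can confirm that the two CLIs used throughout this walkthrough, `kubectl` and `linkerd`, are on your PATH and that the cluster responds. The `require_cmd` and `preflight` names are hypothetical helpers written for this example, not part of either tool.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

# Hypothetical wrapper: check both CLIs, then confirm the cluster answers.
preflight() {
  require_cmd kubectl || return 1
  require_cmd linkerd || return 1
  kubectl cluster-info >/dev/null || return 1
  echo "preflight ok"
}
```

Running `preflight` before the steps below fails early with a readable message instead of partway through the installation.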

After satisfying these requirements, go ahead and run:

linkerd version

The output should look like this (depending on your version):

Client version: stable-2.13.3
Server version: unavailable

The next step is to validate that our Kubernetes cluster is configured correctly to install the Linkerd control plane.

Execute:

linkerd check --pre

This command will run different checks in your Kubernetes cluster, and if you are good to go, your output will be similar to this:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √

Moving on, let’s install the Linkerd control plane. Execute these two commands:

linkerd install --crds | kubectl apply -f -
linkerd install --set proxyInit.runAsRoot=true | kubectl apply -f -

First, we install the necessary custom resource definitions and then the Linkerd control plane. Since we are using the local Kubernetes cluster of Docker Desktop and the Docker container runtime, we have to run the proxy-init container as root.

Depending on your Kubernetes cluster, you can omit the --set flag from the second command.
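One way to make that decision automatically is to read the container runtime that the node reports. The sketch below assumes a single-node cluster (it only looks at the first node) and that a runtime string starting with `docker://` is the case that needs the flag; `proxy_init_flags` and `install_linkerd` are hypothetical helper names.

```shell
# Hypothetical helper: print the extra install flag only when the node's
# runtime string (for example "docker://24.0.2") indicates Docker.
proxy_init_flags() {
  case "$1" in
    docker://*) echo "--set proxyInit.runAsRoot=true" ;;
    *)          echo "" ;;
  esac
}

# Hypothetical wrapper around the two install commands from this walkthrough.
install_linkerd() {
  # Assumption: the first node is representative of the whole cluster.
  runtime=$(kubectl get nodes \
    -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}')
  linkerd install --crds | kubectl apply -f -
  # Left unquoted on purpose so an empty result adds no argument.
  linkerd install $(proxy_init_flags "$runtime") | kubectl apply -f -
}
```

Calling `install_linkerd` then behaves like the two commands above on Docker Desktop and drops the flag elsewhere.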

To verify the installation, run:

linkerd check

and you should see:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match

Status check results are √

If you are curious about the different control plane components, you can take a look at them by running:

kubectl get all -n linkerd

So far, so good.

For the needs of this demo, we will install a demo application provided by Linkerd, emojivoto. You can find more details on the Emoji.voto GitHub repository.

Basically, it’s a microservices application that allows users to vote for their favorite emoji, and tracks votes received on a leaderboard. 

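The application is composed of 3 microservices (web, emoji, and voting), accompanied by vote-bot, a traffic generator that casts votes automatically.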

To install the demo application, run:

curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/emojivoto.yml \
  | kubectl apply -f -

Then, let’s take a look at the components that we deployed in our cluster. Execute:

kubectl get all -n emojivoto

And you should see the demo application’s deployments, pods, and services up and running. 

NAME                            READY   STATUS    RESTARTS   AGE
pod/emoji-78594cb998-cfhdz      1/1     Running   0          31s
pod/vote-bot-786d75cf45-l8vrc   1/1     Running   0          31s
pod/voting-5f5b555dff-pfkgs     1/1     Running   0          31s
pod/web-68cc8bc689-22ggw        1/1     Running   0          31s

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/emoji-svc    ClusterIP   10.98.210.142    <none>        8080/TCP,8801/TCP   31s
service/voting-svc   ClusterIP   10.110.117.148   <none>        8080/TCP,8801/TCP   31s
service/web-svc      ClusterIP   10.99.109.57     <none>        80/TCP              31s

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/emoji      1/1     1            1           31s
deployment.apps/vote-bot   1/1     1            1           31s
deployment.apps/voting     1/1     1            1           31s
deployment.apps/web        1/1     1            1           31s

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/emoji-78594cb998      1         1         1       31s
replicaset.apps/vote-bot-786d75cf45   1         1         1       31s
replicaset.apps/voting-5f5b555dff     1         1         1       31s
replicaset.apps/web-68cc8bc689        1         1         1       31s

To explore the sample application, we can use the port-forward command to expose the web-svc on a local port:

kubectl -n emojivoto port-forward svc/web-svc 8080:80

Head to http://localhost:8080 and play around with the sample application. Keep the port-forward command running on this terminal tab.

Next, open another terminal tab. Let’s apply the Linkerd service mesh by deploying the data plane proxies with a rolling deployment. 
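A sketch of this step, assuming the pipeline from Linkerd’s getting-started guide: `linkerd inject` annotates each Deployment manifest, and re-applying the manifests triggers the rolling update. The `mesh_namespace` wrapper is a hypothetical name used only to parameterize the namespace.

```shell
# Hypothetical wrapper: re-apply every Deployment in a namespace with the
# Linkerd proxy injection annotation added.
mesh_namespace() {
  ns="$1"
  kubectl get -n "$ns" deploy -o yaml \
    | linkerd inject - \
    | kubectl apply -f -
}
```

For this demo, the call would be `mesh_namespace emojivoto`.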

This command injects the Linkerd proxies as sidecar containers into the pods of every deployment in the emojivoto application. If you run the kubectl describe command on any application pod and check its Containers section, you will see that two containers are running: one for the specific service and a linkerd-proxy.
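To check this across all pods at once instead of describing them one by one, here is a small sketch that lists the container names per pod. The `pod_containers` wrapper is a hypothetical name; the `jsonpath` template uses standard `kubectl` output formatting.

```shell
# Hypothetical wrapper: print one line per pod with its container names.
pod_containers() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
}
```

After the injection, each line printed by `pod_containers emojivoto` should list `linkerd-proxy` alongside the service’s own container.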

To verify that you have deployed everything successfully, run the checks for the data plane this time:

linkerd -n emojivoto check --proxy

To get a visual representation of what Linkerd is actually doing, let’s install the viz extension with an on-cluster metric stack and dashboard:

linkerd viz install | kubectl apply -f -

And access the dashboard with:

linkerd viz dashboard &

This command will open a new browser tab with the monitoring dashboard. Select the emojivoto namespace and you should be able to see information about the network topology, metrics, and more.

Play around and explore the service mesh implementation. 
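Much of the same data is available from the terminal. A minimal sketch, assuming the viz extension from the previous step is installed; `mesh_report` is a hypothetical wrapper name around two viz subcommands.

```shell
# Hypothetical wrapper: print per-deployment traffic metrics and the
# connections between deployments for one namespace.
mesh_report() {
  ns="$1"
  # Success rate, request rate, and latency percentiles per deployment.
  linkerd viz stat deploy -n "$ns"
  # Which deployments talk to which, and whether the link is secured.
  linkerd viz edges deployment -n "$ns"
}
```

Running `mesh_report emojivoto` while vote-bot is generating traffic gives a quick text view of what the dashboard shows graphically.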

What’s Next for the Service Mesh Technology?

We discussed many benefits and functionalities that a service mesh offers, but there are also many nuances, shortcomings, and complexity that come with designing, building, maintaining, and operating a service mesh solution. 

By introducing a proxy in front of every service, we add extra components and processes that consume resources, and we add latency that might impact performance.

Although these costs might be negligible in some cases, they can be a real factor in others. More importantly, by introducing a service mesh, we add one more critical component that handles all our traffic and that could fail, introduce security risks, or disrupt our running systems.

The added complexity and operational burden that come with adopting a service mesh are something that all the players in the market acknowledge. Looking at industry trends and the projects that most vendors are currently working on, there is one common theme: making service mesh technology more straightforward and boring. There is a consensus that the future of service mesh should focus on simplicity and maintainability, to the point where we don’t have to think about it.

If you are interested in the subject, check out the recent KubeCon EU session “Future of Service Mesh – Sidecar or Sidecarless or Proxyless?”, where a few industry experts discuss the topic. Towards that common goal, different vendors have taken different approaches. We have seen new projects that leverage eBPF and place proxies at different levels, per node or per service account.

Istio has been working on its ambient mesh solution, a sidecar-less data plane approach that leverages per-node proxies deployed as DaemonSets for layer 3 and 4 network functions, such as mTLS, and on-demand per-service-account proxies for layer 7 primitives. The core promise of a sidecar-less service mesh is that it is easier to operate and retains the same functionality at a lower operational cost and complexity.

Another prominent player in the space, Cilium, has introduced its eBPF-based and Kubernetes-native Cilium Service Mesh, which uses only one proxy per node and provides sidecar-free mTLS-based authentication. This implementation promises simplified operations, ease of maintenance, and improved network performance without sacrificing security. Cilium attempts to combine the best of both worlds, leveraging eBPF when possible and falling back to operations via the proxy when necessary.

On the other hand, Linkerd sticks to its sidecar model, relying on its resource-efficient, purpose-built proxy, and has openly questioned the sidecar-less approach, the security isolation of shared proxies, and the suitability of eBPF for some service mesh features. Check out the session “Life Without Sidecars – Is eBPF’s Promise Too Good to Be True?” for more information on the debate. Only time will tell if any of the newer approaches will shift the basics of the traditional service mesh architecture.

Another exciting development is that the enhancement proposal to natively integrate sidecar containers into Kubernetes was recently accepted. This means that a future Kubernetes release will introduce this feature to simplify the implementation of the sidecar pattern, make sidecars first-class citizens, and remediate some of the issues observed over the years. It will be interesting to see how the service mesh implementations will use and integrate this new functionality to their benefit.

Key Points

We dived deep into the world of service mesh and explored its concepts and components. We analyzed its basic parts and went over its main benefits, but also its shortcomings. We also walked through a demo of a service mesh installation with Linkerd and saw how easy it is to get started. Finally, we discussed the approaches of some of the most prominent service mesh implementations and providers out there and had a glimpse into the future of service mesh.

You can also take a look at how Kubernetes is integrated into Spacelift. Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change. If you aren’t already using Spacelift, sign up for a free trial.

Thanks for reading, and I hope you enjoyed this as much as I did.
