In this article, we will delve into load balancing in Kubernetes and explain the various available types, before showing how to configure them with example configuration files. We will then discuss the different available load balancing strategies, when best to use them, and some general best practices for handling load balancing in K8S!
A Kubernetes load balancer is a component that distributes network traffic across multiple instances of an application running in a K8S cluster. It acts as a traffic manager, ensuring that incoming requests are evenly distributed among the available instances to optimize performance and prevent overload on any single instance, providing high availability and scalability.
Load balancers in K8S can be implemented using a cloud provider-specific load balancer, such as Azure Load Balancer, AWS Network Load Balancer (NLB), or Elastic Load Balancer (ELB), that operates at Layer 4 (the transport layer) of the OSI model.
Cloud-specific Ingress controllers that operate at Layer 7 (the application layer) include Application Gateway on Azure and ELB or Application Load Balancer (ALB) on AWS. To use Ingress, an Ingress controller must be installed on the cluster, as one is not included out of the box with K8S.
In addition, many different ingress controllers exist that can be installed into the K8S cluster. Each provides different features and can be configured to perform different load-balancing distribution strategies, such as round-robin, least connection, session affinity, source IP hash, or even custom strategies depending on the specific requirements of the application.
Popular ingress controllers include NGINX, HAProxy, Istio Ingress, and Traefik. You can find a list of available Ingress controllers on the official docs page.
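For comparison with the layer 4 Services used in the rest of this article, layer 7 routing through an Ingress controller is declared with an Ingress resource. The following is a minimal sketch, assuming an NGINX Ingress controller is already installed and registered under the class name nginx. The resource name, the host webapp.example.com, and the backend Service name webapp-service are illustrative placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp-ingress            # hypothetical name
spec:
  ingressClassName: nginx         # assumes an installed controller exposes this class
  rules:
    - host: webapp.example.com    # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix      # match every path under /
            backend:
              service:
                name: webapp-service   # placeholder backend Service
                port:
                  number: 80
```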
This article will largely focus on layer 4 load balancing.
There are two types of load balancers in Kubernetes: internal and external.
- Internal load balancer — routes traffic only within the cluster and does not allow any external traffic.
- External load balancer — exposes the application to external users or services outside the cluster.
To use a Kubernetes load balancer, we first need a deployment to place the load balancer in front of.
The example manifest below configures a simple deployment with five replicas, which we will load balance traffic between.
Notice each replica is labeled with app: webapp.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp-container
          image: webapp-image:latest
          ports:
            - containerPort: 8080
We can add our load balancer configuration:
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
The selector matches the deployment's pod label app: webapp; this is how K8S links the load balancer to the deployment. The load balancer will listen on port 80 and forward traffic to container port 8080.
To view the IP address of the load balancer, you can use the kubectl get services command. The load balancer will be provisioned automatically by the cloud environment (for example, Azure load balancer if you are running in Azure Kubernetes Service). However, for a local or on-premises cluster, you may need to set up a separate load balancer infrastructure or use an ingress controller.
To create an internal load balancer that routes traffic only within the cluster and does not allow any external traffic, we need to amend the configuration file:
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: webapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
Internal load balancer on AWS cloud
Notice that we added this line, specific to the AWS cloud:
annotations:
  service.beta.kubernetes.io/aws-load-balancer-internal: "true"
…which ensures that only the nodes within the cluster handle traffic for the service.
Internal load balancer on Google Cloud (GCP)
To create an internal load balancer on Google Cloud (GCP), use the GCP-specific annotation for the internal load balancer instead and remove the
annotations:
  cloud.google.com/load-balancer-type: "Internal"
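Put together, a Service using the GCP annotation might look like the sketch below. It reuses the webapp-service name, selector, and ports from the earlier examples. As an assumption, it carries over no AWS-specific settings and leaves externalTrafficPolicy at its default.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    # GCP-specific annotation taken from the text above
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  # externalTrafficPolicy is omitted here (an assumption), so the default applies
  selector:
    app: webapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```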
Internal load balancer on Azure
The example below shows a configuration file for an internal load balancer on Azure, which also specifies loadBalancerIP (the internal IP address for the load balancer) and loadBalancerSourceRanges (the internal IP ranges allowed to access the load balancer).
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    # The annotation needs a value; "true" requests an internal load balancer
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.1
  loadBalancerSourceRanges:
    - 192.168.2.0/24
  selector:
    app: webapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
Note that you can also generate a load balancer using kubectl on the command line.
kubectl create service loadbalancer NAME [--tcp=port:targetPort] [--dry-run=server|client|none] [options]
To generate the YAML required for your configurations, you can use the --dry-run=client -o yaml options and modify it from there.
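As an illustration, running the command with the name webapp-service and --tcp=80:8080 (both chosen here as examples) together with --dry-run=client -o yaml should print a manifest along the following lines. Exact fields can vary between kubectl versions, so treat this as a sketch rather than guaranteed output.

```yaml
# Approximate output of (example arguments, not from the original article):
#   kubectl create service loadbalancer webapp-service --tcp=80:8080 --dry-run=client -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: webapp-service
  name: webapp-service
spec:
  ports:
    - name: 80-8080
      port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    app: webapp-service   # defaults to the Service name; edit to match your pod labels
  type: LoadBalancer
status:
  loadBalancer: {}
```

Because the generated selector is derived from the Service name, it would need to be changed to app: webapp before it could target the deployment shown earlier.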
Before discussing load-balancing strategies, it’s important to note that the availability and features of load-balancing strategies may depend on the underlying infrastructure and the type of load balancer being used, whether it’s a cloud provider load balancer or an ingress controller.
AWS, GCP, and Azure load balancers each support different features depending on the type used. Again, note that these are layer 4 only and do not provide layer 7 (application layer) capabilities.
For example, Azure load balancer supports:
- Round Robin
Distributes traffic in a round-robin fashion across the healthy pods in the AKS cluster. Each new request is forwarded to the next available pod.
- Source IP Affinity
Also known as client IP affinity, sticky sessions, or session affinity, this is used when you need to ensure that requests from the same client IP are routed to the same pod, maintaining session state if necessary. Also referred to as ‘Source IP Hash’, where the routing is based on a hash of the source IP address.
- Session Persistence
Building on source IP affinity, a timeout value can be configured to ensure that the same client IP is routed to the same pod for a specified duration only.
- Port-Based Load Balancing
Used to distribute traffic when services require traffic on different ports.
In addition to the above, AWS Network Load Balancer (NLB) supports target group-level load balancing, which lets you define target groups that group multiple pods together.
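Independently of the cloud load balancer's own settings, a Kubernetes Service exposes client IP affinity and an affinity timeout through its sessionAffinity and sessionAffinityConfig fields, and it can publish several ports at once. The following is a minimal sketch of the source IP affinity, session persistence, and port-based items listed above. It reuses the app: webapp label from earlier; the Service name, the second port, and the 1800-second timeout are illustrative choices.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-sticky-service     # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: webapp
  sessionAffinity: ClientIP       # route a given client IP to the same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 1800        # affinity expires after 30 minutes (example value)
  ports:
    # Each port must be named when a Service exposes more than one
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080
    - name: admin
      protocol: TCP
      port: 8443
      targetPort: 8443            # hypothetical second container port
```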
Other K8S load-balancing strategies include:
- Least Connection
Routes traffic to the pod with the fewest active connections. It is based on the idea that the pod with fewer active connections can handle more traffic.
- Custom Load Balancing
Using external load balancers or ingress controllers can provide more advanced load-balancing algorithms and traffic shaping where required.
- Carefully consider your requirements. Is a layer 4 load balancer sufficient for your needs, or do you require the option for application layer 7 routing or more advanced features such as SSL termination? Is session affinity required, and if so, which mechanism is best for your application?
- Have your cloud provider provision a load balancer automatically by setting type: LoadBalancer in the Service manifest.
- Implement readiness and liveness probes to check the health of your pods, enabling the load balancer to distribute traffic only to healthy instances.
- Consult cloud provider documentation, as Load Balancer Annotations will be different for each specific option used. For example, there might be specific annotations available to utilize features like session affinity and timeouts.
- Enable Connection Draining where supported. Connection draining ensures that existing connections are gracefully handled when a pod or instance is being terminated or scaled down.
- Configure horizontal pod autoscaling (HPA) to automatically scale the number of pods based on resource utilization or custom metrics.
- Regularly monitor and analyze load balancer metrics, such as request rates, latency, error rates, and backend server health. Prometheus and Grafana are popular choices for this.
- Apply security best practices, such as enabling SSL/TLS termination on the load balancer, and ensure proper access controls (IAM) are in place to prevent unauthorized access to the load balancer or backend services.
- Simulating failure scenarios and thoroughly testing your configuration can help validate the load balancer behavior. Testing can help you spot flaws in your configuration and give you pointers on where to add more resilience.
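Two of the practices above, health probes and horizontal pod autoscaling, can be sketched directly in manifests. The example below extends the earlier webapp deployment. The probe paths /ready and /healthz, the timing values, the CPU request, and the autoscaler name and thresholds are all assumptions made for illustration, not values from a real application.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp-container
          image: webapp-image:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m              # a CPU request is needed for utilization-based scaling
          readinessProbe:            # pod receives Service traffic only while this passes
            httpGet:
              path: /ready           # hypothetical endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:             # container is restarted if this keeps failing
            httpGet:
              path: /healthz         # hypothetical endpoint
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa                   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp-deployment
  minReplicas: 5
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # scale out above 70% average CPU (example threshold)
```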
Load balancer services in K8S are linked to a deployment using labels. They specify the port the load balancer will listen on and the port it will target. A K8S load balancer can be internal to the cluster or allow external traffic into the cluster. Load balancers operate at the transport layer of the OSI model (layer 4). For more advanced routing, such as routing based on hostnames or URL paths, use a Layer 7 device such as an application gateway on Azure, or an Ingress controller, such as NGINX.
Load balancing in K8S is flexible! Depending on your platform, you can implement a multitude of load-balancing strategies and use various types of load balancers or ingress controllers based on your requirements.
We encourage you to also check out how Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. Anything that can be run via kubectl can be run within a Spacelift stack. Find out more about how Spacelift works with Kubernetes, and get started on your journey by creating a free trial account.