What is Kubernetes Load Balancer? Configuration Example

21 Jun 2023·15 min read

Reviewed by: Flavius Dinu

In this article, we will explore load balancing in Kubernetes and show how to configure the various types with example configuration files. We will then discuss various load-balancing strategies and when to use them, plus some general best practices for handling load balancing in K8S.

What is a Kubernetes load balancer?

A Kubernetes load balancer service is a component that distributes network traffic across multiple instances of an application running in a K8S cluster. It acts as a traffic manager, ensuring that incoming requests are evenly distributed among the available instances to optimize performance and prevent overload on any single instance, providing high availability and scalability.

Load balancers in K8S can be implemented using a cloud provider-specific load balancer, such as an Azure Load Balancer, AWS Network Load Balancer (NLB), or Elastic Load Balancer (ELB), that operates at layer 4 (the transport layer) of the OSI model.

Cloud-specific ingress controllers that operate at layer 7 (the application layer) include Application Gateway on Azure and ELB or Application Load Balancer (ALB) on AWS. To use Ingress, an ingress controller must be installed on the cluster, as one is not included out of the box with K8S.

You can also choose from a range of different ingress controllers that can be installed in the K8S cluster. Each provides different features and can be configured to perform different load-balancing distribution strategies, such as round-robin, least connection, session affinity, source IP hash, or even custom strategies depending on the application’s specific requirements.

Popular ingress controllers include NGINX, HAProxy, Istio Ingress, and Traefik. The official docs page lists the available ingress controllers.

This article will largely focus on layer 4 load balancing.

Types of load balancers available in Kubernetes

Each type of load balancer serves a specific purpose and functions at a different layer of the networking stack. An internal load balancer is reachable only from inside the cluster or its private network and does not accept external traffic, whereas an external load balancer exposes the application to users or services outside the cluster.

Below are the main types of load balancers available in Kubernetes:

Load balancer type	Layer	External or internal	Use case
LoadBalancer	Layer 4	External	Expose services to the external network using cloud provider’s LB
NodePort	Layer 4	External	Expose services on node’s IP addresses and static ports
ClusterIP	Layer 4	Internal	Default internal load balancing within the cluster
Ingress Controller	Layer 7	External	HTTP/HTTPS routing with SSL and path-based routing
IPVS	Layer 4	Internal	Advanced load balancing algorithms for internal cluster traffic
MetalLB	Layer 2/4	External	External load balancing for bare-metal Kubernetes environments
Custom (Envoy, NGINX)	Layer 4/7	External/Internal	Custom traffic routing or advanced load balancing
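
As a rough illustration of the ClusterIP and NodePort rows, the sketch below exposes the same application in both ways. The Service names and the nodePort value 30080 are assumptions made for this example; the app: webapp label and the ports match the example deployment used later in this article.

```yaml
# ClusterIP (the default type): reachable only from inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: webapp-internal        # hypothetical name
spec:
  type: ClusterIP
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
---
# NodePort: additionally opens the same static port on every node's IP address.
apiVersion: v1
kind: Service
metadata:
  name: webapp-nodeport        # hypothetical name
spec:
  type: NodePort
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
    nodePort: 30080            # assumed value; must be in the NodePort range (30000-32767 by default)
```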

How to configure a load balancer in Kubernetes

To set up a Kubernetes load balancer, we first need a deployment to place the load balancer in front of.

The example manifest below configures a simple deployment with five replicas, which we will load balance traffic across.

Notice each replica is labeled with app: webapp.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp-container
        image: webapp-image:latest
        ports:
        - containerPort: 8080

We can add our load balancer configuration:

apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Notice the selector matches the pod label app: webapp set in the deployment. This is how K8S links the load balancer to the pods created by the deployment. The load balancer will listen on port 80 and forward traffic to container port 8080.

To view the IP address of the load balancer, you can use the kubectl get services command. The load balancer will be provisioned automatically by the cloud environment (for example, Azure load balancer if you are running in Azure Kubernetes Service). However, for a local or on-premises cluster, you may need to set up a separate load balancer infrastructure or use an ingress controller.
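
For a bare-metal cluster, one option listed in the table above is MetalLB. The sketch below assumes MetalLB is already installed in the metallb-system namespace and running in layer 2 mode. The resource names and the address range are placeholders that you would replace with free addresses on your own network.

```yaml
# Pool of addresses MetalLB may hand out to Services of type LoadBalancer.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: webapp-pool                  # hypothetical name
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.240-192.168.10.250    # placeholder range
---
# Announce addresses from that pool on the local network (layer 2 mode).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: webapp-l2                    # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
  - webapp-pool
```

With a pool like this in place, a Service of type: LoadBalancer should be assigned an external IP from the range instead of staying in a pending state.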

To create an internal load balancer that accepts traffic only from within the private network (for example, the cluster's VPC) and does not allow any external traffic, we need to amend the configuration file:

apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Internal load balancer on AWS cloud

Notice that we added this line, specific to the AWS cloud:

annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"

And specified:

externalTrafficPolicy: Local

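…which routes incoming traffic only to pods running on the node that received it, avoiding an extra hop between nodes and preserving the client source IP. This setting is optional: it is the annotation, not externalTrafficPolicy, that makes the load balancer internal.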

Internal load balancer on Google Cloud (GCP)

To create an internal load balancer on Google Cloud (GCP), use the GCP-specific annotation for the internal load balancer instead and remove the externalTrafficPolicy line:

annotations:
    cloud.google.com/load-balancer-type: "Internal"
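
Put together, the full Service manifest for GCP might look like the sketch below. It reuses the names from the earlier examples. Newer GKE versions also document a networking.gke.io/load-balancer-type annotation, so check the documentation for your cluster version.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    # Annotation taken from the snippet above; requests an internal load balancer.
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```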

Internal load balancer on Azure

The example below shows a configuration file for an internal load balancer on Azure, which also specifies the loadBalancerIP (Internal IP address for the load balancer) and the loadBalancerSourceRanges (Internal IP ranges allowed to access the load balancer).

apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.1
  loadBalancerSourceRanges:
  - 192.168.2.0/24
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Note that you can also generate a load balancer using kubectl on the command line.

kubectl create service loadbalancer NAME [--tcp=port:targetPort] [--dry-run=server|client|none] [options]

To generate the YAML required for your configurations, you can use the --dry-run=client -o yaml options and modify it from there.
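
For example, running the command with the name webapp-service and --tcp=80:8080 should produce a manifest roughly like the one below. This is a trimmed sketch with empty fields removed, and the exact output may vary between kubectl versions. Note that the generated selector defaults to app: NAME, so it would need changing to app: webapp to match the example deployment.

```yaml
# Approximate output of:
#   kubectl create service loadbalancer webapp-service --tcp=80:8080 --dry-run=client -o yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: webapp-service
  name: webapp-service
spec:
  ports:
  - name: 80-8080
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: webapp-service      # change to app: webapp to match the deployment
  type: LoadBalancer
```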

Load balancer traffic distribution strategies

Before discussing load-balancing strategies, remember that their availability and features may depend on the underlying infrastructure and the type of load balancer being used, whether it’s a cloud provider load balancer or an ingress controller.

The table below summarizes common load balancer traffic distribution strategies.

Strategy	Key feature	Best use case
Round Robin	Sequential request distribution	Similar server capacities
Least Connections	Directs to the server with the fewest connections	Uneven traffic loads
IP Hash	Consistent server for the same client IP	Session persistence
Weighted Round Robin	Balances load based on server weights	Different server capabilities
Random	Random selection of servers	Simple setups
Geographic Routing	Directs traffic based on client location	Lowering latency in global systems
URL Path-Based	Traffic routed based on URL patterns	Microservices or content separation
Session Persistence	Keeps clients on the same server	Web applications needing session
Failover	Backup servers used when primary fails	High availability
Priority-Based	Routes to high-priority servers first	Resource prioritization
Cloud-Native	Dynamic auto-scaling based on load	Cloud-hosted applications
Service Mesh	Load balancing at microservice level	Microservices architectures
Anycast Routing	Same IP used for multiple distributed servers	Reducing latency for global requests
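
Where a strategy is configured depends on the component doing the balancing. As one hedged example, when kube-proxy runs in IPVS mode (the IPVS row in the earlier table), the algorithm used for in-cluster Service traffic is selected in the kube-proxy configuration. The fragment below is a sketch: how this configuration is supplied (for instance, through a ConfigMap) depends on how your cluster was built, and the IPVS kernel modules are assumed to be available on the nodes.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  # Common schedulers: rr = round robin, lc = least connections, sh = source hashing
  scheduler: "lc"
```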

Example: Traffic distribution strategies for cloud provider load balancers

AWS, GCP, and Azure load balancers each support different features depending on the type used. Note again that the load balancers discussed here operate at layer 4 only and do not provide layer 7 (application) capabilities.

For example, Azure load balancer supports:

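Hash-Based Distribution (default): Distributes new connections across the healthy backends in the AKS cluster using a five-tuple hash (source IP, source port, destination IP, destination port, and protocol). In practice, this spreads traffic much like round robin, but connections from the same client can land on different pods.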

Source IP Affinity: Also known as client IP affinity, sticky sessions, or session affinity, this is used when you need to ensure that requests from the same client IP are routed to the same pod, maintaining session state if necessary. Also referred to as ‘Source IP Hash’, where the routing is based on a hash of the source IP address.

Session Persistence: Building on source IP affinity, a timeout value can be configured to ensure that the same client IP is routed to the same pod for a specified duration only.

Port-Based Load Balancing: Used to distribute traffic when services require traffic on different ports.

AWS Network Load Balancer (NLB) also allows you to utilize Target Group-Level Load Balancing to define target groups that group multiple pods together.
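
Independently of the cloud load balancer, Kubernetes itself offers a basic form of source IP affinity on the Service. The sketch below adds it to the example Service. The 3600-second timeout is an arbitrary value chosen for illustration (the Kubernetes default is 10800 seconds). Whether the cloud load balancer in front also keeps a client on the same node is provider-specific and is usually controlled by annotations.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  type: LoadBalancer
  # Route connections from the same client IP to the same pod...
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600   # ...for up to one hour (illustrative value)
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```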

Best practices for handling a Kubernetes load balancer

Let’s look at some of the best practices for handling Kubernetes load balancers:

Carefully consider your requirements. Is a layer 4 load balancer sufficient for your needs, or do you require the option for application layer 7 routing or more advanced features such as SSL termination? Is session affinity required, and if so, which mechanism is best for your application?
Utilize your cloud provider to provision a load balancer automatically using the type: LoadBalancer in the Service manifest.
Implement readiness and liveness probes to check the health of your pods, enabling the load balancer to distribute traffic only to healthy instances.
Consult cloud provider documentation, as Load Balancer Annotations will be different for each specific option used. For example, specific annotations may be available to utilize features like session affinity and timeouts.
Enable connection draining where supported. Connection draining ensures that existing connections are gracefully handled when a pod or instance is being terminated or scaled down.
Configure Kubernetes Horizontal Pod Autoscaling (HPA) to automatically scale the number of pods based on resource utilization or custom metrics.
Regularly monitor and analyze load balancer metrics, such as request rates, latency, error rates, and backend server health. Prometheus and Grafana are popular choices for this.
Apply security best practices, such as enabling SSL/TLS termination on the load balancer, and ensure proper access controls (IAM) are in place to prevent unauthorized access to the load balancer or backend services.
Simulate failure scenarios and thoroughly test your configuration to validate the load balancer behavior. Testing can help you spot flaws in your configuration and show you where to add more resilience.
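
Several of these practices can be expressed directly in the manifests. The sketch below extends the example deployment with readiness and liveness probes, a short preStop delay to approximate connection draining, and a Horizontal Pod Autoscaler. The /healthz path, the timings, the CPU request, and the scaling thresholds are all assumptions for illustration and would need tuning for a real application.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  # replicas is omitted so that the autoscaler below owns the replica count
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: webapp-container
        image: webapp-image:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m                  # a CPU request is needed for CPU-based autoscaling
        readinessProbe:                # pod only receives traffic while this passes
          httpGet:
            path: /healthz             # assumed health endpoint
            port: 8080
          periodSeconds: 5
        livenessProbe:                 # container is restarted if this keeps failing
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"] # assumes the image has a sleep binary; lets in-flight requests finish
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa                     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp-deployment
  minReplicas: 5
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```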

Kubernetes ingress vs load balancer

Kubernetes Ingress provides centralized L7 routing (e.g., path or domain-based) for multiple services via a single IP, with features like SSL termination. LoadBalancer offers L4 access with a dedicated IP per service, supporting basic traffic routing. Ingress can be used for advanced, cost-effective routing, and LoadBalancer for simple, direct service exposure.

	Ingress	LoadBalancer
Layer	Application layer (L7, HTTP/HTTPS)	Network layer (L4, TCP/UDP)
Use case	Centralized routing for multiple services	Direct exposure for individual services
External IPs	Shares a single external IP	Allocates a unique external IP per service
Features	Advanced routing, SSL termination	Basic load balancing
Cost	More cost-effective (shared IP)	Can be expensive for many services
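
To make the comparison concrete, here is a minimal Ingress sketch that routes two paths on one hostname to two Services behind a single external IP. It assumes an NGINX ingress controller is installed with the ingress class name nginx. The hostname, the TLS secret webapp-tls, and the second Service api-service are hypothetical names used only for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp-ingress           # hypothetical name
spec:
  ingressClassName: nginx        # assumes an NGINX ingress controller
  tls:
  - hosts:
    - example.com
    secretName: webapp-tls       # hypothetical TLS secret; SSL is terminated at the ingress
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service    # hypothetical second Service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp-service
            port:
              number: 80
```

With this approach, the backend Services can usually be of type ClusterIP, since only the ingress controller itself needs to be exposed externally.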

How to manage Kubernetes with Spacelift?

If you need assistance managing your Kubernetes projects, look at Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change.

To take this one step further, you could add custom policies to reinforce the security and reliability of your configurations and deployments. Spacelift provides different types of policies and workflows that are easily customizable to fit every use case. For instance, you could add plan policies to restrict or warn about security or compliance violations or approval policies to add an approval step during deployments.

You can try Spacelift for free by creating a trial account or booking a demo with one of our engineers.

Key points

Load balancer services in K8S are linked to a deployment's pods using labels. They specify the port the load balancer will listen on and the port it will target. A K8S load balancer can be internal, reachable only from the private network, or external, allowing traffic into the cluster from outside. Load balancers operate at layer 4 (the transport layer) of the OSI model. For more advanced host- or path-based routing, use a layer 7 device such as an Application Gateway on Azure, or an ingress controller such as NGINX.

Load balancing in K8s is flexible. Depending on your platform, you can implement many load-balancing strategies and use various types of load balancers or ingress controllers based on your requirements.

Written by

Jack Roper

Jack Roper is a highly experienced IT professional with close to 20 years of experience, focused on cloud and DevOps technologies. He specializes in Terraform, Azure, Azure DevOps, and Kubernetes and holds multiple certifications from Microsoft, Amazon, and Hashicorp. Jack enjoys writing technical articles for well-regarded websites.