Tuesday, 20 January 2026

What is the Kube-Scheduler?

kube-scheduler is a watchman. Its primary job is to monitor the API Server for newly created pods that have no nodeName assigned (the "Pending" state). Once it finds one, it evaluates every node in your cluster to find the best possible home based on resources, policies, and hardware constraints.
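
As a rough sketch of that watch, this is how you could ask the API server for the same set of pods with client-go (this assumes in-cluster credentials and does a one-off list; the real scheduler keeps a continuous watch through an informer):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the program runs inside the cluster; use clientcmd with a
	// kubeconfig if you run it from your laptop instead.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Pods with an empty spec.nodeName are exactly the ones the scheduler still has to place.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("unscheduled: %s/%s\n", p.Namespace, p.Name)
	}
}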

3-Step Core Workflow
1. Scheduling Queue:
Whenever a pod is created, it enters a Pending state and is added to the Scheduling Queue. This isn't a simple FIFO (First-In-First-Out) line; it’s a Priority Queue where pods are sorted based on their PriorityClass. High-priority pods, such as system-critical components, jump to the front of the line to be processed first, while lower-priority pods wait their turn. The scheduler then pulls these pods from the queue one by one to begin the placement process.
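
To make "priority queue, not FIFO" concrete, here is a toy version of that ordering in Go (illustrative only; the real active queue also breaks ties by how long a pod has been waiting, and the priority values come from the pod's resolved PriorityClass):

package main

import (
	"container/heap"
	"fmt"
)

// queuedPod is a simplified stand-in for a pending pod.
type queuedPod struct {
	name     string
	priority int32 // value resolved from the pod's PriorityClass
}

// podQueue pops the highest-priority pod first.
type podQueue []queuedPod

func (q podQueue) Len() int           { return len(q) }
func (q podQueue) Less(i, j int) bool { return q[i].priority > q[j].priority }
func (q podQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

func (q *podQueue) Push(x interface{}) { *q = append(*q, x.(queuedPod)) }
func (q *podQueue) Pop() interface{} {
	old := *q
	p := old[len(old)-1]
	*q = old[:len(old)-1]
	return p
}

func main() {
	q := &podQueue{}
	heap.Push(q, queuedPod{"web-app", 0})
	heap.Push(q, queuedPod{"coredns", 2000000000}) // system-cluster-critical
	heap.Push(q, queuedPod{"batch-job", -10})

	for q.Len() > 0 {
		fmt.Println(heap.Pop(q).(queuedPod).name) // coredns, web-app, batch-job
	}
}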

2. Filtering:
In this phase, the scheduler runs a series of checks known as "Predicates." If a node fails even one of these checks, it is disqualified; a simplified sketch of the logic follows the list below.
Resource Check (PodFitsResources): Does the node have enough free CPU and Memory to meet the Pod’s requests?
Port Check (PodFitsHostPorts): If a pod requires a specific port on the host (HostPort), is that port already taken by another pod on this node?
Taint/Toleration Check: Nodes can have Taints (repellents). Unless the pod has a matching Toleration, it cannot be scheduled there.
Node Selection: Does the node carry the labels required by the Pod's nodeSelector, or satisfy its nodeAffinity rules?
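
Here is a deliberately simplified sketch of that filtering loop (the structs are stand-ins, not the real Pod and Node types, and nodeAffinity, host ports, and many other predicates are left out):

package main

import "fmt"

// pod and node carry only the fields the sketch needs.
type pod struct {
	cpuMilli, memMiB int64             // resource requests
	nodeSelector     map[string]string // labels the node must have
	tolerations      map[string]bool   // taint keys this pod tolerates
}

type node struct {
	name                     string
	freeCPUMilli, freeMemMiB int64
	labels                   map[string]string
	taints                   []string // NoSchedule taint keys
}

// filter disqualifies a node the moment any single predicate fails.
func filter(p pod, n node) (bool, string) {
	// PodFitsResources: enough free CPU and memory for the requests.
	if p.cpuMilli > n.freeCPUMilli || p.memMiB > n.freeMemMiB {
		return false, "insufficient resources"
	}
	// Taints/Tolerations: every taint must be tolerated.
	for _, t := range n.taints {
		if !p.tolerations[t] {
			return false, "untolerated taint: " + t
		}
	}
	// nodeSelector: every required label must be present and equal.
	for k, v := range p.nodeSelector {
		if n.labels[k] != v {
			return false, "missing label: " + k
		}
	}
	return true, ""
}

func main() {
	p := pod{cpuMilli: 500, memMiB: 256, nodeSelector: map[string]string{"disktype": "ssd"}}
	nodes := []node{
		{name: "node-1", freeCPUMilli: 200, freeMemMiB: 4096},
		{name: "node-2", freeCPUMilli: 4000, freeMemMiB: 8192, labels: map[string]string{"disktype": "ssd"}},
		{name: "node-3", freeCPUMilli: 4000, freeMemMiB: 8192, taints: []string{"dedicated"}},
	}
	for _, n := range nodes {
		ok, reason := filter(p, n)
		fmt.Printf("%s feasible=%v %s\n", n.name, ok, reason)
	}
}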

3. Scoring:
After filtering, we might have five nodes that could run the pod. The Scoring phase determines which one should run it. Each node is given a score (usually 0–100) based on several factors:
Least Requested: Favors nodes with more free resources to balance the cluster.
Image Locality: Favors nodes that already have the container image downloaded (speeding up start times).
Affinity/Anti-Affinity: Soft preferences, like "I'd prefer not to be on the same node as other pods from this app for high availability."

The node with the highest score is selected as the "Winner."
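
For a feel of how "Least Requested" turns free capacity into a 0–100 score, here is a back-of-the-envelope version (roughly the averaged free-fraction formula the least-requested strategy uses; real scoring also weighs the other factors listed above):

package main

import "fmt"

// leastRequestedScore: the larger the share of the node that would remain
// free after placing the pod, the higher the score (0-100).
func leastRequestedScore(podCPU, podMem, usedCPU, usedMem, capCPU, capMem int64) int64 {
	cpuScore := (capCPU - usedCPU - podCPU) * 100 / capCPU
	memScore := (capMem - usedMem - podMem) * 100 / capMem
	if cpuScore < 0 {
		cpuScore = 0
	}
	if memScore < 0 {
		memScore = 0
	}
	return (cpuScore + memScore) / 2
}

func main() {
	// Two feasible nodes with the same capacity; the emptier one wins.
	fmt.Println("busy node:", leastRequestedScore(500, 512, 3000, 6144, 4000, 8192)) // 15
	fmt.Println("idle node:", leastRequestedScore(500, 512, 500, 1024, 4000, 8192))  // 78
}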

Binding: Updating the Cluster State
Once the winner is selected, the scheduler doesn't actually "start" the pod. Instead, it issues a Binding request, which unfolds in three steps (a small sketch of the API call follows them):
Request to API Server: The scheduler sends a "Binding" object to the kube-apiserver.
API Server Updates etcd: The API Server receives this request, validates it, and updates the Pod's definition in etcd (the cluster's database), setting the nodeName field to the winner's name.
Kubelet Takes Over: The Kubelet (the agent on the worker node) is also watching the API Server. It sees that a pod has been assigned to its node, pulls the image, and starts the container.
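
Roughly, this is what that Binding request looks like when issued through client-go (a sketch: the pod and node names are made up, and error handling is minimal):

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// bindPod sends the Binding object from step 1. The API server validates it
// and persists the nodeName in etcd; the kubelet on that node then notices
// the assignment, pulls the image, and starts the container.
func bindPod(ctx context.Context, cs *kubernetes.Clientset, namespace, podName, nodeName string) error {
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
		Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
	}
	return cs.CoreV1().Pods(namespace).Bind(ctx, binding, metav1.CreateOptions{})
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	// "my-pod" and "node-2" are hypothetical names for illustration.
	if err := bindPod(context.TODO(), cs, "default", "my-pod", "node-2"); err != nil {
		panic(err)
	}
}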