Tuesday, 20 January 2026

What is the Kube-Scheduler?

The kube-scheduler is the cluster's watchman. Its primary job is to monitor the API Server for newly created pods that have no nodeName assigned (the "Pending" state). Once it finds one, it evaluates every node in the cluster to find the best possible home based on resources, policies, and hardware constraints.

3-Step Core Workflow
1. Scheduling Queue:
Whenever a pod is created, it enters a Pending state and is added to the Scheduling Queue. This isn't a simple FIFO (First-In-First-Out) line; it’s a Priority Queue where pods are sorted based on their PriorityClass. High-priority pods, such as system-critical components, jump to the front of the line to be processed first, while lower-priority pods wait their turn. The scheduler then pulls these pods from the queue one by one to begin the placement process.
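A minimal PriorityClass and a pod that references it might look like this (the names and value are illustrative, not prescriptive):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority          # illustrative name
value: 1000000                 # higher value = processed earlier in the queue
globalDefault: false
description: "For critical workloads that should jump the queue."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: high-priority   # pulls this pod ahead of lower-priority pods
  containers:
  - name: app
    image: nginx
```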

2. Filtering:
In this phase, the scheduler runs a series of "Predicates." If a node fails even one of these checks, it is disqualified.
Resource Check (PodFitsResources): Does the node have enough free CPU and Memory to meet the Pod’s requests?
Port Check (PodFitsHostPorts): If a pod requires a specific port on the host (HostPort), is that port already taken by another pod on this node?
Taint/Toleration Check: Nodes can have Taints (repellants). Unless the pod has a matching Toleration, it cannot be scheduled there.
Node Selection: Does the node match the nodeSelector or nodeAffinity labels defined in the Pod spec?
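Several of these predicates map directly to fields in the Pod spec. A sketch (the labels, taint key, and image are assumptions for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  nodeSelector:
    disktype: ssd              # Node Selection: node must carry this label
  tolerations:
  - key: "dedicated"           # Taint/Toleration: tolerate a matching taint
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"            # Resource Check: node needs this much free CPU
        memory: "256Mi"        # ...and this much free memory
    ports:
    - containerPort: 80
      hostPort: 8080           # Port Check: 8080 must be free on the node
```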

3. Scoring:
After filtering, we might have five nodes that could run the pod. The Scoring phase determines which one should run it. Each node is given a score (usually 0–100) based on several factors:
Least Requested: Favors nodes with more free resources to balance the cluster.
Image Locality: Favors nodes that already have the container image downloaded (speeding up start times).
Affinity/Anti-Affinity: Soft preferences, like "I'd prefer not to be on the same node as other pods from this app for high availability."
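A soft anti-affinity preference like the one described above is expressed in the Pod spec with preferredDuringSchedulingIgnoredDuringExecution (the app label is an assumption):

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                      # contributes to the node's score, not a hard rule
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web                   # prefer nodes NOT already running this app
          topologyKey: kubernetes.io/hostname
```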

The node with the highest score is selected as the "Winner."

Binding: Updating the Cluster State
Once the winner is selected, the scheduler doesn't actually "start" the pod. Instead, it issues a Binding request:
Request to API Server: The scheduler sends a "Binding" object to the kube-apiserver.
API Server Updates etcd: The API Server receives this request, validates it, and updates the Pod's definition in etcd (the cluster's database), setting the nodeName field to the winner's name.
Kubelet Takes Over: The Kubelet (the agent on the worker node) is also watching the API Server. It sees that a pod has been assigned to its node, pulls the image, and starts the container.
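The Binding object the scheduler sends is itself a core API object; a hand-written equivalent would look roughly like this (pod and node names are illustrative):

```yaml
apiVersion: v1
kind: Binding
metadata:
  name: my-pod          # the Pending pod being placed
target:
  apiVersion: v1
  kind: Node
  name: node01          # the winning node
```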

Saturday, 17 January 2026

Kubernetes Authorization Modes

In Kubernetes, security is a multi-layered journey. Once a user or service proves their identity—a process known as Authentication—they face a second, more granular challenge: Authorization.

If Authentication asks, "Who are you?", Authorization asks, "What exactly are you allowed to do here?"
In this post, we’ll break down the mechanisms Kubernetes uses to control access and ensure your cluster remains a "Zero Trust" environment.  

1. Node Authorization: 
Node Authorization is a specialized, fixed-purpose authorizer designed specifically for Kubelets. It implements a graph-based check to ensure that a worker node only has access to the resources it absolutely needs to function.

Target: Requests coming from nodes (identified by the system:nodes group and system:node:<nodeName> username).
Technical Logic: It limits a Kubelet's ability to read Secrets, ConfigMaps, and PersistentVolumes. A Kubelet can only access these objects if they are associated with a Pod currently scheduled on that specific node.
Security Impact: This prevents a compromised node from "lateral movement"—it cannot reach out and steal secrets belonging to workloads on other nodes.

2. RBAC: 
Role-Based Access Control (RBAC) is the most common and recommended authorization mechanism. It allows for dynamic, API-driven permission management without requiring an API server restart.

Objects:
Roles/ClusterRoles: Pure sets of permissions (Verbs + Resources + API Groups).
RoleBindings/ClusterRoleBindings: Mapping objects that attach a Subject (User/Group/ServiceAccount) to a Role.
Technical Nuance: RBAC is additive-only. There are no "Deny" rules in RBAC; if no rule grants access, the request is denied by default. It also supports Aggregation, allowing you to combine multiple ClusterRoles into a single "super-role" dynamically.
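A minimal Role and RoleBinding pair might look like this (the names, namespace, and user are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]              # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: alice                  # the Subject being granted access
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader             # the permission set being attached
  apiGroup: rbac.authorization.k8s.io
```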

3. ABAC: Policy-Driven
Attribute-Based Access Control (ABAC) grants access based on a combination of attributes (user, resource, and environment).

Implementation: Unlike RBAC, ABAC policies are defined in a local JSON file on the master node.

Technical Logic: Each line in the policy file is a "Policy Object." For example:
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "alice", "namespace": "dev", "resource": "pods", "readonly": true}}

Downside: ABAC is difficult to manage at scale because any change requires a manual update to the file and a restart of the Kube-API server, making it less agile than RBAC.
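ABAC is wired up through API server flags, roughly like so (the file path is illustrative):

```
kube-apiserver \
  --authorization-mode=ABAC \
  --authorization-policy-file=/etc/kubernetes/abac-policy.jsonl
```

Any edit to that file takes effect only after the API server is restarted, which is exactly the agility problem described above.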

4. Webhook Authorization: 
Webhook authorization allows Kubernetes to delegate the "Yes/No" decision to a remote HTTP service. This is the ultimate tool for integrating Kubernetes with enterprise-wide security policies.

Flow: When a request arrives, the API server sends a SubjectAccessReview (a JSON-serialized object) to an external REST endpoint.
Technical Payload: The payload includes the username, groups, and the specific resource/verb requested. The remote service responds with an allowed: true or false status.

Use Cases: Integrating with Open Policy Agent (OPA) for complex logic.
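The SubjectAccessReview the API server sends looks roughly like this (the user, group, and resource values are illustrative):

```json
{
  "apiVersion": "authorization.k8s.io/v1",
  "kind": "SubjectAccessReview",
  "spec": {
    "user": "alice",
    "groups": ["developers"],
    "resourceAttributes": {
      "namespace": "dev",
      "verb": "get",
      "resource": "pods"
    }
  }
}
```

The remote service replies with the same object, filling in a status field such as {"status": {"allowed": true}}.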

5. AlwaysAllow:
As the name suggests, the AlwaysAllow mode grants every request, regardless of who is asking or what they are trying to do. It completely bypasses all security checks.

Technical Logic: It returns allowed: true for every single API call.
Use Cases:
Local Development: Used in very restricted, single-node local environments (like early-stage minikube setups) where security isn't a concern.
Unit Testing: Used by developers testing API server extensions where they want to isolate the logic from authorization interference.
Risk: Enabling this in a production cluster is a critical security failure. It effectively turns off the cluster's "immune system," allowing any unauthenticated "system:anonymous" user to delete the entire cluster.

6. AlwaysDeny:
The AlwaysDeny mode does exactly the opposite: it rejects every single request.

Technical Logic: It returns allowed: false for everything.
Use Cases:
Security Hardening: It is often used at the very end of a list of authorization modes. If the request doesn't match a Node rule, an RBAC rule, or a Webhook rule, it hits the "final wall" and is rejected.
Emergency Lockdown: In extreme scenarios, an administrator could theoretically set this to prevent any further changes to the cluster state during an active breach investigation.
Nuance: Even with AlwaysDeny, the API server may still allow certain "discovery" endpoints (like /healthz) depending on the version and configuration, but for all intents and purposes, the cluster becomes a "read-only/no-access" vault.
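Authorization modes are chained via a single API server flag and consulted in the order listed; each module can allow, deny, or pass the request on to the next. A common hardened configuration might look like:

```
kube-apiserver --authorization-mode=Node,RBAC,Webhook,AlwaysDeny ...
```

Here a Kubelet request is handled by the Node authorizer, normal users fall through to RBAC and then the webhook, and anything still undecided hits AlwaysDeny.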

Saturday, 10 January 2026

Kubernetes API: Understanding apiVersion

The apiVersion field is the first line of every manifest. While it may seem like a static piece of boilerplate, it is actually the most important instruction you give to the API server. It defines the schema, the validation rules, and the stability of the resource you are about to create.

Kubernetes organizes its thousands of parameters into API Groups. The apiVersion string tells the cluster which "folder" and "version" of the API to look in.

There are two distinct patterns for these values:
1. The Core Group
These are the foundational objects of Kubernetes. Because they have existed since the beginning, they do not belong to a named group.
    Format: v1
    Resources: Pod, Service, Namespace, Node, ConfigMap, Secret.
    Example: 
    YAML
    apiVersion: v1
    kind: Service

2. Named Groups
As Kubernetes evolved, new functionality was added via specialized groups. These follow a "Group/Version" structure.
    Format: <group>/<version> (e.g., apps/v1 or networking.k8s.io/v1)
    Resources: Deployment, Ingress, CronJob.
    Example:
    YAML
    apiVersion: apps/v1
    kind: Deployment

The Stability Lifecycle
Version     Stability           Description
v1alpha1    Experimental        May contain bugs. Can be dropped in future releases without warning.
v1beta1     Prerelease          Feature-complete and tested. Safe for non-critical environments.
v1          Stable              Production-ready.   

Many resources have migrated from "Beta" to "Stable" over the last few years. Here is the current standard mapping for common resources:
    Workloads: apps/v1 (Deployment, StatefulSet, DaemonSet)
    Batch: batch/v1 (Job, CronJob)
    Networking: networking.k8s.io/v1 (Ingress, NetworkPolicy)
    RBAC: rbac.authorization.k8s.io/v1 (Role, ClusterRole)
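For instance, a CronJob now uses batch/v1 (it previously sat in batch/v1beta1); the schedule and image below are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure # required for Job pods
          containers:
          - name: report
            image: busybox
            command: ["sh", "-c", "echo done"]
```

Applying the old beta apiVersion against a current cluster is rejected at validation time, which is why the mapping above matters.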

# See all resources and their associated API versions
kubectl api-resources

# List all enabled API versions on the server
kubectl api-versions