Introduction
In this blog, we’ll cover some of the most commonly asked Kubernetes interview questions, breaking down complex topics into simple, digestible answers. Whether you’re a beginner looking to establish a strong foundation or an experienced professional aiming to refine your knowledge, these insights will help you prepare with confidence.
What is the Role of Control Plane Components in Kubernetes?
The control plane is like the brain of Kubernetes, managing the cluster's overall state. Here's what each part does:
kube-apiserver: The cluster's front door. It handles requests (like creating or deleting pods).
etcd: A database that stores all the cluster's data. Think of it as Kubernetes' memory.
kube-scheduler: Decides which node should run your workload based on resource availability.
kube-controller-manager: Ensures everything is running as it should (e.g., maintaining the right number of pods).
cloud-controller-manager: Works with your cloud provider to manage things like load balancers.
Example:
When you deploy an app, the kube-apiserver takes your request and stores it in etcd. The kube-scheduler picks a node to run the app, while the kube-controller-manager ensures the pods keep running. If you're in the cloud, the cloud-controller-manager might create a load balancer for your app.
Pro Tip: These components' behavior might change depending on your Kubernetes distribution or customizations.
How to Design a High-Availability Kubernetes Cluster?
A highly available cluster makes sure your apps stay online, even if something fails.
Use multiple master nodes in different zones for redundancy.
Spread etcd nodes across zones for data safety.
Add load balancers to evenly distribute traffic to your master nodes.
Set up auto-repair for nodes to fix them when they fail.
Example:
For an e-commerce app, we used 3 master nodes and distributed etcd nodes across availability zones. A load balancer directed traffic to healthy masters, and auto-repair ensured failed nodes were quickly replaced.
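As a rough sketch of that idea (assuming kubeadm and a hypothetical load balancer at lb.example.com), the control-plane endpoint can be pointed at the load balancer so any healthy master can serve API traffic:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# All API traffic goes through the load balancer instead of a single master node
controlPlaneEndpoint: "lb.example.com:6443"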
Pro Tip: High availability adds complexity and cost. Choose this setup only if your app needs near-zero downtime.
Can you walk us through the steps you would take to deploy an application in a Kubernetes cluster?
Package the Application
Begin by containerizing your application using Docker. This involves creating a Docker image that includes the application code, its dependencies, and any necessary configuration. The goal is to have a self-contained image that can run consistently in any environment.
Upload to a Container Registry
Push the Docker image to a container registry, such as Docker Hub, ECR, or an internal/private registry. This makes the image accessible to your Kubernetes cluster for deployment.
Define the Deployment Configuration
Describe the application’s desired state using a configuration file (see the example manifest after these steps). This includes:
Specifying the container image to use.
Setting the number of replicas for scaling and high availability.
Defining any required resources, such as CPU or memory limits, to ensure proper scheduling.
Deploy the Application
Use Kubernetes to deploy the application based on the configuration. This involves creating deployment objects that instruct Kubernetes to manage the application lifecycle, ensuring that the specified number of replicas are always running.
Expose the Application
To make the application accessible, configure a service in Kubernetes. This could be a ClusterIP, NodePort, or LoadBalancer service, depending on how you want users to access the application. For external access, you might use a LoadBalancer or an Ingress controller.
Monitor and Verify
Ensure that the application is running as expected by checking the status of the deployment, replicas, and pods. Monitor logs and metrics to confirm that the application is healthy and performing as intended.
Iterate and Update
As needed, update the application by modifying the deployment configuration. Kubernetes will handle rolling updates to ensure minimal disruption.
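A minimal sketch of the configuration and exposure steps above (the name, image, ports, and resource values are placeholders, not from any particular application):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                       # number of pod copies for availability
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0.0   # placeholder image in a registry
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP                   # internal-only; use LoadBalancer or an Ingress for external access
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080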
How to Achieve Zero-Downtime Deployments in Kubernetes?
Zero-downtime means users won't notice when you're updating apps. Use these techniques:
Rolling Updates: Gradually replace old pods with new ones.
Blue-Green Deployments: Run two environments (old and new) and switch traffic to the new one.
Canary Releases: Roll out changes to a small group of users first.
Example:
For a payment service, we first sent 10% of user traffic to the new version (canary release). After monitoring for issues, we gradually increased traffic until the update was complete.
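For the rolling-update approach specifically, a Deployment can be tuned so no old pod is removed before a new one is ready. The service name, image, and probe path below are illustrative assumptions:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # add at most one extra pod during the rollout
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
      - name: payment-service
        image: registry.example.com/payment-service:2.0.0   # placeholder image
        readinessProbe:                                      # traffic only reaches pods that report ready
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10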
Pro Tip: Always test updates in staging and monitor traffic during deployment to catch issues early.
Can you discuss your experience with using Kubernetes network plugins such as Calico, Flannel, or Weave Net?
Kubernetes network plugins, such as Calico, Flannel, and Weave Net, are essential components of a Kubernetes cluster that enable communication between the various components of the cluster, such as pods, nodes, and services.
Calico provides network segmentation and network policy enforcement. For example, you can define policies that allow only the payment service to communicate with the database service while blocking traffic from other services.
Additionally, Calico's high-performance routing ensures minimal latency for critical applications.
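A sketch of such a policy using the standard Kubernetes NetworkPolicy API, which Calico enforces (the namespace, labels, and port are hypothetical):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-payment-to-database
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: database          # the policy protects the database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: payment       # only payment pods may connect
    ports:
    - protocol: TCP
      port: 5432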
Flannel provides a simple overlay network with minimal configuration. It supports various backends like VXLAN for encapsulating traffic, and is a good fit when you don’t need advanced features like network policies.
Weave Net offers encryption to secure data in transit between pods. It also supports multiple networking modes like overlay and VLAN, giving you flexibility depending on your infrastructure.
How to Handle Logging and Monitoring in Kubernetes?
Use centralized tools for visibility:
Logging: Use solutions like the ELK stack to collect logs from all pods.
Monitoring: Use Prometheus and Grafana for metrics and alerts.
Example:
We set up the ELK stack to centralize logs and Grafana dashboards for real-time cluster performance metrics.
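One common (but not universal) convention is to annotate pods so a suitably configured Prometheus can discover and scrape them; whether these annotations are honored depends entirely on your Prometheus scrape configuration. A hypothetical sketch:
apiVersion: v1
kind: Pod
metadata:
  name: my-app                       # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"     # only used if your Prometheus scrape config looks for these annotations
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: my-app
    image: registry.example.com/my-app:1.0.0   # placeholder image exposing /metrics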
Pro Tip: Tune your logs and metrics collection to avoid performance overhead.
How to Optimize Resource Usage in Kubernetes?
Make sure you're not wasting cluster resources:
Set requests and limits so each pod gets the right amount of CPU and memory.
Use Horizontal Pod Autoscalers to scale based on traffic.
Use Vertical Pod Autoscalers to adjust pod sizes based on their needs.
Monitor usage with tools like Prometheus and Grafana.
Example:
"We used Prometheus metrics to set up autoscaling for our frontend pods, ensuring we scaled up during traffic spikes and down during quiet times."
Pro Tip: Avoid setting requests or limits too high; it wastes resources and increases costs.
What is the Role of Service Meshes in Kubernetes?
Service meshes like Istio or Linkerd make managing microservices easier:
Provide security with mTLS (encryption).
Improve observability with metrics and traces.
Enable traffic control for canary deployments.
Add resilience with circuit breakers and retries.
Example:
We used Istio to secure service communication with mTLS and monitored traffic during canary deployments to ensure stability.
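A hedged sketch of that kind of canary traffic split with an Istio VirtualService, assuming a DestinationRule already defines the stable and canary subsets (names and weights are illustrative):
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments
spec:
  hosts:
  - payments                # hypothetical service name
  http:
  - route:
    - destination:
        host: payments
        subset: stable      # subsets defined in a separate DestinationRule
      weight: 90
    - destination:
        host: payments
        subset: canary
      weight: 10            # send 10% of traffic to the new version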
Pro Tip: Service meshes add complexity. Use them only if your app has many microservices.
How Do You Manage Stateful Applications in Kubernetes?
Stateful apps, like databases, need special care to handle data and identity.
Use StatefulSets for stable pod identities and ordered operations.
Attach storage with Persistent Volumes (PV) and Persistent Volume Claims (PVC).
Use Headless Services to give pods consistent network names.
Example:
We deployed PostgreSQL using StatefulSets for unique pod names and PVCs to keep data safe. A Headless Service ensured pods could talk to each other easily.
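A trimmed-down sketch of that setup (image, storage size, and names are illustrative; a real PostgreSQL deployment also needs credentials, e.g. from a Secret, plus backup tooling):
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None            # headless service: each pod gets a stable DNS name
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres      # ties pod DNS names to the headless service
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PVC per pod, so data follows the pod identity
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi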
Pro Tip: Stateful apps need regular backups and disaster recovery plans since Kubernetes doesn’t handle that automatically.
How do you handle rolling updates in a Kubernetes cluster? Can you discuss the strategies you have used to manage application updates?
Rolling updates in a Kubernetes cluster refer to the process of updating an application without disrupting its availability to users. This is typically achieved by gradually updating the application pods one by one, rather than updating all of them at once.
There are several strategies that can be used to manage rolling updates in a Kubernetes cluster:
RollingUpdate: This is the default update strategy in Kubernetes, and involves updating the pods one-by-one, starting with a single pod, and proceeding to the next one only after the first one has successfully been updated.
Recreate: In this strategy, all pods are deleted and recreated at once, which results in a brief disruption in service availability.
Blue-Green Deployment: This involves running two versions of the application in parallel, with one version being updated while the other remains unchanged. When the update is complete, traffic is redirected to the updated version.
Canary Deployment: This involves gradually rolling out the updated application to a small subset of users, and gradually increasing the percentage of users who receive the update over time. This allows for testing and validation of the updated application before it is rolled out to all users.
How to Secure a Kubernetes Cluster?
Securing Kubernetes involves multiple steps:
Network Policies: Control which pods can communicate.
RBAC: Limit user access to cluster resources.
Pod Security: Enforce security rules (e.g., no root users).
Encrypt data with TLS and store sensitive data in Secrets.
Example:
We used RBAC to restrict access, network policies to isolate services, and an external secrets manager for secure credentials handling.
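As one illustration of the RBAC piece, a namespaced Role and RoleBinding granting read-only access to pods (the namespace and user name are hypothetical):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]   # read-only access to pods in this namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane                        # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io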
Pro Tip: Regularly audit your cluster and keep Kubernetes versions up to date for security patches.
How to Ensure High Availability for etcd?
etcd is critical for Kubernetes, so make it reliable:
Deploy multiple etcd nodes across zones for fault tolerance.
Use dedicated hardware for best performance.
Take regular backups for disaster recovery.
Monitor etcd health with tools like Prometheus.
Example:
We set up a 3-node etcd cluster across availability zones and automated snapshot backups every 6 hours.
Pro Tip: Keep etcd latency low. High latency can slow down the entire cluster.
What is GitOps, and How Does It Help in Kubernetes?
GitOps uses Git as the source of truth for your cluster configuration.
Makes deployments easier and more predictable.
Simplifies rollbacks using Git history.
Enhances security by reviewing changes in Git.
Example:
We used Argo CD to sync our Kubernetes cluster with manifests stored in Git. Any changes to the app were made in Git and automatically deployed.
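A representative Argo CD Application manifest for that workflow (the repository URL, path, and namespaces are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/k8s-manifests.git   # placeholder Git repo
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true       # remove resources deleted from Git
      selfHeal: true    # revert manual drift back to the Git state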
Pro Tip: Protect your Git repo—it's the single source of truth.
A pod is unable to communicate with another pod on a different node. How would you troubleshoot this?
Check Pod IP Addresses: Verify the IP addresses assigned to the pods using kubectl get pods -o wide. Ensure the pods have unique IPs within the cluster.
Inspect the Network Plugin (CNI): Confirm the CNI plugin (e.g., Calico, Flannel, Cilium) is running correctly. Use kubectl get pods -n kube-system to check the status of network-related pods.
Ping Across Pods: Use ping or curl from one pod to the other’s IP to check connectivity.
Example: kubectl exec -it <pod-name> -- ping <target-pod-ip>
Review Node Network Configuration: Check if nodes can communicate with each other. Use tools like telnet or traceroute to test inter-node connectivity.
Inspect Firewall Rules: Verify that no firewall rules or security groups are blocking traffic between nodes.
Check kube-proxy Configuration: Ensure kube-proxy is running correctly and the iptables rules are set up properly.
iptables -L -t nat | grep KUBE
How would you troubleshoot DNS issues within a Kubernetes cluster?
Verify CoreDNS Pods: Check if the CoreDNS pods are running and healthy.
kubectl get pods -n kube-system -l k8s-app=kube-dns
Test DNS Resolution: Use nslookup or dig within a pod to test DNS resolution.
kubectl exec -it <pod-name> -- nslookup <service-name>
Inspect DNS Configurations: Review the resolv.conf file in the pod to ensure it points to the cluster DNS.
kubectl exec -it <pod-name> -- cat /etc/resolv.conf
Typical entries should include:
nameserver <cluster-dns-ip>
search <namespace>.svc.cluster.local svc.cluster.local
Examine CoreDNS Logs: Check the logs for errors or misconfigurations in CoreDNS.
kubectl logs -n kube-system <coredns-pod>
Verify Service and Endpoints: Ensure the kube-dns service and endpoints are properly configured.
kubectl get svc,ep -n kube-system
Check Network Policies: Ensure network policies are not blocking DNS traffic.
A service is not reachable from another namespace within the cluster. What could be causing this issue?
Check the Service Name and Namespace: Ensure you’re using the correct service name and namespace in your application or while testing connectivity.
Validate DNS Configuration: Use nslookup or dig inside the cluster to verify DNS resolution.
nslookup <service-name>.<namespace>.svc.cluster.local
Inspect Network Policies: Kubernetes network policies may restrict inter-namespace communication. Use kubectl describe networkpolicy to check for policies blocking traffic between namespaces.
Cross-Check Labels and Selectors: Ensure that the service selectors match the labels of the intended pods. Misaligned selectors often lead to connectivity failures.
Verify Service Type and Ports: Confirm that the service is of an appropriate type (ClusterIP for internal communication). Check if the correct ports are exposed and no firewall rules block traffic.
If a service is experiencing high latency, what steps would you take to diagnose the problem?
Check Pod Resource Utilization: Use kubectl top pod to monitor CPU and memory usage. High resource utilization can cause latency.
kubectl top pod -n <namespace>
Analyze Pod Logs: Inspect pod logs for errors or warnings using kubectl logs.
Examine Network Latency: Test network latency between pods using tools like ping or curl.
Inspect Readiness and Liveness Probes: Misconfigured probes can cause unnecessary restarts or delays.
kubectl describe pod <pod-name>
Monitor with APM Tools: Use tools like Prometheus, Grafana, or Jaeger to get insights into latency trends and trace issues.
Review Cluster Autoscaling and Load Distribution: Check if the cluster has sufficient resources or if horizontal pod autoscaling (HPA) is misconfigured.
What would you do if a NodePort service is not accessible from outside the cluster?
Verify NodePort Range: NodePorts are assigned from the range 30000–32767 by default. Ensure the client is accessing the correct port.
Check Service and Pod Status: Confirm the service and backend pods are running and healthy.
Inspect Firewall Rules: Ensure firewall rules allow traffic to the NodePort range. On cloud providers, check security groups or network ACLs.
Examine Cluster Networking: Confirm that the nodes’ external IPs are accessible and not blocked by NAT or routing issues.
Test Connectivity to NodePort: Use curl or telnet to test the service:
curl http://<node-ip>:<node-port>
Review Node Configuration: On some platforms, specific flags (--nodeport-addresses) restrict NodePort binding. Ensure it is correctly configured.
Fallback to ClusterIP Testing: Verify internal connectivity using the ClusterIP. If this fails, debug the pod, service, or networking stack.
Describe Kubernetes affinity and anti-affinity rules, and explain how they affect Pod scheduling.
Kubernetes affinity and anti-affinity rules are used to influence Pod scheduling.
Affinity: Specifies where Pods are preferred to be scheduled. This can be achieved through node affinity (based on node labels) or pod affinity (based on labels of other Pods). For example, you may want to schedule certain Pods on nodes with specific hardware or a specific version of the operating system.
Anti-Affinity: Specifies where Pods should not be scheduled. This is typically used to ensure high availability among Pods. For example, you can set anti-affinity rules to prevent Pods of the same service from being scheduled on the same node, thus preventing service interruption in case of node failures.
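For instance, a node-affinity rule requiring nodes that carry a hypothetical hardware=gpu label might look like this (label key, value, and image are assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement, not a preference
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware          # hypothetical node label
            operator: In
            values: ["gpu"]
  containers:
  - name: app
    image: registry.example.com/gpu-app:1.0.0   # placeholder image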
What is Ingress in Kubernetes and how does it work?
Ingress is a Kubernetes API object used to manage external access to services within the cluster over HTTP and HTTPS. It provides a way to route external URLs to services inside the cluster based on rules such as domain and path.
An Ingress controller is responsible for implementing the routing rules defined by the Ingress object. When an Ingress object is created, the Ingress controller reads its configuration and sets up the routing rules accordingly. Common Ingress controllers include Nginx Ingress Controller, Traefik, among others. These controllers translate Ingress rules into configurations for Nginx, HAProxy, or other load balancers to enable HTTP and HTTPS routing.
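A simple illustrative Ingress (the hostname, service name, and ingress class are assumptions, and an Ingress controller must already be installed):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx          # must match an installed controller
  rules:
  - host: shop.example.com         # placeholder domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web              # hypothetical backend service
            port:
              number: 80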
What are Taints and Tolerations in Kubernetes, and how do they affect Pod scheduling?
Taints are key-value pairs attached to nodes in Kubernetes, used to represent certain attributes or conditions on the node that might prevent Pods from running on that node. Tolerations are fields in the Pod specification that indicate which Taints the Pod can tolerate. When the scheduler attempts to schedule a Pod to a node, it checks the Taints on the node and the Tolerations on the Pod to ensure that the Pod can tolerate all the Taints on the node. This allows administrators to control Pod scheduling more finely, such as restricting certain types of Pods to nodes with specific hardware or software configurations.
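For example, if a node were tainted with kubectl taint nodes <node> dedicated=gpu:NoSchedule, a Pod would need a matching toleration along these lines (the key, value, and image are hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  tolerations:
  - key: "dedicated"        # must match the taint key on the node
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: app
    image: registry.example.com/gpu-app:1.0.0   # placeholder image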
What are Pod affinity and anti-affinity in Kubernetes, and what is their role in scheduling?
Pod Affinity ensures that certain Pods are placed on nodes near or with other Pods. It’s useful when Pods need to be scheduled together for performance or dependency reasons.
Pod Anti-Affinity ensures that certain Pods are not placed on the same nodes as others. This is useful for high availability and fault tolerance by spreading Pods across nodes.
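An illustrative anti-affinity rule that spreads replicas of a hypothetical web Deployment across nodes (names and image are assumptions):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web                       # avoid co-locating pods with the same label
            topologyKey: kubernetes.io/hostname   # one replica per node
      containers:
      - name: web
        image: registry.example.com/web:1.0.0    # placeholder image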
What is a Service Account in Kubernetes, and how does it differ from a User Account?
A Service Account is a Kubernetes resource that provides an identity to a Pod (or set of Pods). This identity can be used for:
Authentication: When Pods need to interact with the Kubernetes API, they use a Service Account to authenticate.
Authorization: The Service Account determines what actions a Pod can perform (e.g., accessing secrets, interacting with other resources).
Each Pod in Kubernetes is associated with a Service Account. If no specific Service Account is mentioned, Kubernetes automatically uses the default Service Account in the namespace.
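A minimal sketch of a dedicated Service Account assigned to a Pod (names are hypothetical; granting it permissions would still require an RBAC Role and RoleBinding):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
  namespace: dev
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: dev
spec:
  serviceAccountName: app-reader    # overrides the namespace's default Service Account
  containers:
  - name: app
    image: registry.example.com/my-app:1.0.0   # placeholder image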
What is PodDisruptionBudget in Kubernetes, and how does it help ensure high availability of applications?
A PodDisruptionBudget (PDB) is a Kubernetes resource that defines the minimum availability of Pods for a deployment, StatefulSet, or ReplicaSet during voluntary disruptions. Voluntary disruptions are actions like node drain (e.g., for maintenance or upgrades) or manual pod eviction, where the system intentionally removes or evicts Pods.
A PDB helps to ensure that a certain number or percentage of Pods remain available during these disruptions, which is essential for maintaining high availability and service reliability.
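An illustrative PodDisruptionBudget that keeps at least two pods of a hypothetical web app available during voluntary disruptions (the label and threshold are assumptions):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # evictions are blocked if they would drop availability below 2 pods
  selector:
    matchLabels:
      app: web             # hypothetical label on the protected pods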