Isolating (Cordoning) a Misbehaving Pod

TL;DR: You can remove a misbehaving pod from a service without deleting it. Use kubectl label pod ... cyberdyne-service- ... to remove one or more labels (a trailing - on a key removes that label). Once the labels the service selects on are gone, the pod is removed from the Kubernetes service that routes traffic to pods.

When a Kubernetes node is misbehaving, it's common to cordon that node (mark it unschedulable) via

kubectl cordon ip-10-10-9-94.ec2.internal

It turns out it's not as simple to do this for a misbehaving pod: there is no pod equivalent of kubectl cordon. To get the same effect, we remove the labels that two selectors rely on: the selector of the Kubernetes service that routes traffic to pods, and the selector of the Kubernetes deployment that manages pod creation, scaling, etc.
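Both of these selectors are equality-based: a pod is selected only if it carries every key=value pair in the selector, so removing any one key that a selector uses is enough to drop the pod from it. A minimal sketch of that matching rule in shell, assuming labels and selectors are written as comma-separated key=value lists. The selector_matches helper is hypothetical (it is not part of kubectl), and the label sets are abridged from the transcript below.

```shell
# Hypothetical helper (not a kubectl feature) illustrating equality-based
# label selection: succeeds (exit 0) only if every key=value pair in the
# selector also appears among the pod's labels. Both arguments are
# comma-separated lists such as "app=web,env=prod".
selector_matches() {
  local selector="$1" labels="$2" pair
  local IFS=','
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;   # this pair is present; keep checking
      *) return 1 ;;    # a single missing pair means no match
    esac
  done
  return 0              # every pair was found
}

# Abridged label sets for the pod before and after the kubectl label step.
before="cyberdyne-role=service-instance,cyberdyne-service=dhermes-echo,cyberdyne-service-env=sandbox,cyberdyne-deploy=bcxsl9xf"
after="frozen=dhermes-experiment,cyberdyne-deploy=bcxsl9xf"
selector_matches "cyberdyne-service=dhermes-echo" "$before" && echo "before: selected"
selector_matches "cyberdyne-service=dhermes-echo" "$after" || echo "after: not selected"
```

The commas are added around both lists before matching so that a selector pair like a=1 does not accidentally match a label a=10.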

Per recommendations from kubernetes-dev, first determine the pod labels that keep it in the deployment (i.e. the labels the deployment selects on)

$ kubectl get pods --namespace cyberdyne --selector cyberdyne-service=dhermes-echo
NAME                            READY   STATUS    RESTARTS   AGE
dhermes-echo-6c69bf4f49-6zbmx   3/3     Running   0          79m
$
$ kubectl get deployment \
>   --namespace cyberdyne \
>   dhermes-echo \
>   --output go-template='{{ range $k, $v := .spec.selector.matchLabels }}{{ $k }} -> {{ $v }}{{ "\n" }}{{ end }}'
cyberdyne-role -> service-instance
cyberdyne-service -> dhermes-echo
cyberdyne-service-env -> sandbox
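The service has its own selector, which isn't guaranteed to be identical to the deployment's. A sketch that prints both so you can see which labels each one relies on, assuming the service shares the deployment's name (dhermes-echo here is a guess; the service name isn't shown above). The show_selectors function is hypothetical; .spec.selector (Service) and .spec.selector.matchLabels (Deployment) are the standard API fields.

```shell
# Hypothetical helper: print a Service's and a Deployment's label
# selectors. ASSUMES both objects have the same name.
show_selectors() {
  local namespace="$1" name="$2"
  echo "service/$name selector:"
  # A Service's selector is a plain map at .spec.selector.
  kubectl get service --namespace "$namespace" "$name" \
    --output go-template='{{ range $k, $v := .spec.selector }}{{ $k }} -> {{ $v }}{{ "\n" }}{{ end }}'
  echo "deployment/$name selector:"
  # A Deployment's equality-based selector lives under matchLabels.
  kubectl get deployment --namespace "$namespace" "$name" \
    --output go-template='{{ range $k, $v := .spec.selector.matchLabels }}{{ $k }} -> {{ $v }}{{ "\n" }}{{ end }}'
}
```

Usage: show_selectors cyberdyne dhermes-echo. Any key printed under the service is one the pod must lose to stop receiving traffic.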

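then remove one (or all) of those labels. The deployment's ReplicaSet no longer counts the pod as one of its own, so it brings up a new pod while the old one keeps running (the extra frozen=dhermes-experiment label is just a marker)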

$ kubectl label pod \
>   --namespace cyberdyne \
>   dhermes-echo-6c69bf4f49-6zbmx \
>   frozen=dhermes-experiment \
>   cyberdyne-role- \
>   cyberdyne-service- \
>   cyberdyne-service-env-
pod/dhermes-echo-6c69bf4f49-6zbmx labeled
$
$ kubectl get pods --namespace cyberdyne --selector cyberdyne-service=dhermes-echo
NAME                            READY   STATUS    RESTARTS   AGE
dhermes-echo-6c69bf4f49-8tmrf   3/3     Running   0          20s
$
$ kubectl get pods --namespace cyberdyne --selector cyberdyne-deploy=bcxsl9xf
NAME                            READY   STATUS    RESTARTS   AGE
dhermes-echo-6c69bf4f49-6zbmx   3/3     Running   0          86m
dhermes-echo-6c69bf4f49-8tmrf   3/3     Running   0          51s
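The label step can be scripted by reading the keys of the deployment's matchLabels and turning each key into a key- removal argument. A minimal sketch; freeze_pod and removal_args are hypothetical names, and the marker argument plays the role of frozen=dhermes-experiment above.

```shell
# Hypothetical helper: turn label keys (one per line on stdin) into
# kubectl label removal arguments, i.e. "key" becomes "key-".
removal_args() {
  while IFS= read -r key || [ -n "$key" ]; do
    if [ -n "$key" ]; then
      printf '%s-\n' "$key"
    fi
  done
}

# Hypothetical wrapper: strip every deployment-selector label from a pod
# and add a marker label so the pod is easy to find later.
freeze_pod() {
  local namespace="$1" deployment="$2" pod="$3" marker="$4"
  local keys
  # One selector key per line, using the same go-template idea as above.
  keys="$(kubectl get deployment --namespace "$namespace" "$deployment" \
    --output go-template='{{ range $k, $v := .spec.selector.matchLabels }}{{ $k }}{{ "\n" }}{{ end }}')" || return
  # Word splitting of the key- list is intentional (label keys contain
  # no whitespace).
  # shellcheck disable=SC2046
  kubectl label pod --namespace "$namespace" "$pod" "$marker" \
    $(printf '%s\n' "$keys" | removal_args)
}
```

Usage: freeze_pod cyberdyne dhermes-echo dhermes-echo-6c69bf4f49-6zbmx frozen=dhermes-experiment. This only removes the deployment's selector keys; if the service also selects on a label outside that set, remove it by hand.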

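The old pod keeps any labels that weren't removed, such as cyberdyne-deploy (which isn't part of the deployment's selector), so both pods still show up in that last query. Meanwhile, the service doesn't skip a beat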

$ curl https://dhermes-echo.sandbox.k8s.invalid/headers
{"Accept":["*/*"],"Accept-Encoding":["gzip"],"User-Agent":["curl/7.64.1"],"X-Forwarded-For":["10.131.12.77"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"]}
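To confirm the frozen pod really is out of rotation, you could compare its IP with the ready addresses behind the service. A sketch assuming the service is named dhermes-echo and that the (older) Endpoints API is being served; newer clusters also expose EndpointSlices. pod_in_endpoints is a hypothetical name.

```shell
# Hypothetical check: succeeds if the pod's IP is still listed among the
# ready addresses (.subsets[].addresses) of the service's Endpoints.
pod_in_endpoints() {
  local namespace="$1" service="$2" pod="$3" pod_ip
  pod_ip="$(kubectl get pod --namespace "$namespace" "$pod" \
    --output go-template='{{ .status.podIP }}')" || return
  [ -n "$pod_ip" ] || return 1   # no IP assigned, so it can't be served
  kubectl get endpoints --namespace "$namespace" "$service" \
    --output go-template='{{ range .subsets }}{{ range .addresses }}{{ .ip }}{{ "\n" }}{{ end }}{{ end }}' \
    | grep -Fqx -- "$pod_ip"     # exact, whole-line IP match
}
```

Usage: pod_in_endpoints cyberdyne dhermes-echo dhermes-echo-6c69bf4f49-6zbmx || echo "pod is out of rotation"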

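Additionally, you may want to remove the liveness and readiness probes from any containers that have them, so a failing liveness probe doesn't get a container restarted and the bad behavior can be left alone. Be aware that most of a running pod's spec is immutable (container images are one of the few fields that can change in place), so the API server will likely reject an edit that removes probes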

$ kubectl edit pod --namespace cyberdyne dhermes-echo-6c69bf4f49-6zbmx

Tip: I usually set KUBE_EDITOR=emacs or KUBE_EDITOR='code --wait' when running kubectl edit. Otherwise kubectl falls back to EDITOR and then to vi (notepad on Windows), which likely won't be what you want.
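Whether or not the edit goes through, it helps to see which containers actually define probes and to keep an eye on restart counts, since a climbing count on the frozen pod suggests a liveness probe is restarting it. A sketch; probe_report is a hypothetical name, while livenessProbe, readinessProbe and restartCount are standard pod fields.

```shell
# Hypothetical helper: print which containers in a pod define
# liveness/readiness probes, then each container's restart count.
probe_report() {
  local namespace="$1" pod="$2"
  kubectl get pod --namespace "$namespace" "$pod" \
    --output go-template='{{ range .spec.containers }}{{ .name }}: liveness={{ if .livenessProbe }}yes{{ else }}no{{ end }} readiness={{ if .readinessProbe }}yes{{ else }}no{{ end }}{{ "\n" }}{{ end }}' \
    || return
  kubectl get pod --namespace "$namespace" "$pod" \
    --output go-template='{{ range .status.containerStatuses }}{{ .name }}: restarts={{ .restartCount }}{{ "\n" }}{{ end }}'
}
```

Usage: probe_report cyberdyne dhermes-echo-6c69bf4f49-6zbmx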

Don't forget to clean up the pod when done debugging. The deployment no longer owns it, so scaling or deleting the deployment won't remove it

$ kubectl delete pod --namespace cyberdyne dhermes-echo-6c69bf4f49-6zbmx
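Because the pod was tagged with frozen=dhermes-experiment when its other labels were removed, that marker can also be used to find and delete every pod frozen for the experiment, even if you've lost track of the exact pod name. A sketch; cleanup_frozen is a hypothetical name. It lists the matching pods first so there's a record of what gets deleted.

```shell
# Hypothetical cleanup helper: list, then delete, every pod in the
# namespace carrying the given marker label (e.g. frozen=dhermes-experiment).
cleanup_frozen() {
  local namespace="$1" marker="$2"
  kubectl get pods --namespace "$namespace" --selector "$marker" || return
  kubectl delete pods --namespace "$namespace" --selector "$marker"
}
```

Usage: cleanup_frozen cyberdyne frozen=dhermes-experiment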

Bossy Lobster

A blog by Danny Hermes; musing on tech, mathematics, etc.
