Triaging Image Pull Errors

Problem identification

To identify the affected resources, use kubectl to filter for pods in the ErrImagePull or ImagePullBackOff state. ErrImagePull means a pull attempt failed. ImagePullBackOff appears after repeated failures, while the kubelet waits for an increasing back-off interval before retrying the pull.

$ kubectl get all --all-namespaces | grep -E 'ErrImagePull|ImagePullBackOff'
NAMESPACE NAME READY STATUS RESTARTS AGE
mynamespace pod/mypod-1 0/1 ErrImagePull 0 8m53s
mynamespace pod/mypod-2 0/1 ImagePullBackOff 0 8h
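
The grep above matches the error names anywhere on a line. As a minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces` (STATUS in the fourth column), the filter below keeps the header row and matches on the STATUS column only. The function name `filter_pull_errors` is hypothetical, and the pattern also accepts the `Init:` prefix that kubectl shows when an init container fails to pull.

```shell
# Keep the header row plus any row whose STATUS column is a pull error.
# Reads `kubectl get pods --all-namespaces` output on stdin.
# Assumption: STATUS is the fourth whitespace-separated column.
filter_pull_errors() {
  awk 'NR == 1 || $4 ~ /(^|:)(ErrImagePull|ImagePullBackOff)$/'
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces | filter_pull_errors
```

Matching on a single column avoids false positives from pod names or namespaces that happen to contain one of the error strings.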

Hypotheses

  • The Docker image may not exist, or it may not be available for download.
  • An authentication issue may be preventing a successful pull.

Debugging

Reviewing the Cluster Metrics Dashboard

This dashboard shows the scope of the impact: whether the error is a single metric data point or many, and whether there are trends or other factors to consider.

  • Log into the AWS Console
  • Navigate to CloudWatch
  • Browse the Dashboards Section
  • Locate the cluster metrics dashboard
  • Locate the dashboard widget showing the ErrImagePull metrics
  • Identify how many errors there are and what trends they show

Are there any pod logs or events to help identify the issue?

$ kubectl -n mynamespace logs pod/mypod-1
Error from server (BadRequest): container "mycontainer" in pod "mypod-1" is waiting to start: trying and failing to pull image
$ kubectl -n mynamespace describe pod/mypod-1
State: Waiting
Reason: ImagePullBackOff
...
Warning Failed 3m57s (x4 over 5m28s) kubelet Error: ErrImagePull
Warning Failed 3m42s (x6 over 5m28s) kubelet Error: ImagePullBackOff
Normal BackOff 18s (x20 over 5m28s) kubelet Back-off pulling image "failed-image"
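
The describe output is long, and only the event rows matter for this step. The sketch below is one way to isolate them, assuming event rows begin with `Warning` or `Normal` as in the sample output above. The function name `pull_events` is hypothetical.

```shell
# Print only the event rows related to image pulls.
# Reads `kubectl describe pod` output on stdin.
# Assumption: event rows start with "Warning" or "Normal" (optionally indented).
pull_events() {
  grep -E '^[[:space:]]*(Warning|Normal)[[:space:]].*(ErrImagePull|ImagePullBackOff|Back-off pulling image|Failed to pull image)'
}

# Usage (requires cluster access):
#   kubectl -n mynamespace describe pod/mypod-1 | pull_events
# The same events can also be listed directly:
#   kubectl -n mynamespace get events --field-selector involvedObject.name=mypod-1
```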

What is the docker image reference?

To identify the image references, look at the spec.

$ kubectl -n mynamespace get pod/mypod-1 -o json | jq -r '.spec.containers[].image'
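
The registry checks that follow need the repository and the tag as separate values. As a rough sketch, not a full reference parser, the hypothetical helper `split_image_ref` below splits a reference under two stated assumptions: digest references (`@sha256:...`) are not handled, and a missing tag means `latest`. The `library/` prefix applies to Docker Hub official images only.

```shell
# Split an image reference into "<repository> <tag>".
# Assumptions: no digest (@sha256:...) support; a missing tag means "latest".
# The body runs in a subshell so its variables do not leak to the caller.
split_image_ref() (
  ref=$1
  last=${ref##*/}                 # final path component, e.g. "app:tag_1"
  case $last in
    *:*) tag=${last##*:}; repo=${ref%:*} ;;
    *)   tag=latest;      repo=$ref ;;
  esac
  # Docker Hub official images live under the "library/" namespace.
  case $repo in
    */*) ;;
    *)   repo=library/$repo ;;
  esac
  printf '%s %s\n' "$repo" "$tag"
)

# Usage:
#   split_image_ref myorg/app:tag_1
```

Taking the tag from the final path component keeps a registry port such as `localhost:5000` from being mistaken for a tag.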

Is the docker image available for download, or does it not exist?

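Make sure the image exists. The image reference may be incorrect or invalid, or the image or the specific tag may not exist. For images hosted on Docker Hub, list the available tags, setting IMAGE_REF to the repository name without the tag (for example, library/nginx for an official image).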

$ curl -sL "https://registry.hub.docker.com/v2/repositories/${IMAGE_REF}/tags/" | jq -r '.results[].name' | sort
latest
tag_1
tag_2
tag_3
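
The tag list endpoint is paginated, so a tag can exist without appearing on the first page of results. A hedged alternative is to query one tag directly and inspect the HTTP status. The sketch below assumes the single-tag endpoint is served on the same host and path prefix as the listing above. The helper name `hub_tag_url` is hypothetical.

```shell
# Build the Docker Hub URL for a single tag of a repository.
# Assumption: the single-tag endpoint shares the host used for the tag listing.
# $1 = repository (e.g. library/nginx), $2 = tag (e.g. latest)
hub_tag_url() {
  printf 'https://registry.hub.docker.com/v2/repositories/%s/tags/%s/\n' "$1" "$2"
}

# Usage (requires network access):
#   code=$(curl -sL -o /dev/null -w '%{http_code}' "$(hub_tag_url library/nginx latest)")
#   if [ "$code" = 200 ]; then echo "tag exists"; else echo "tag not found (HTTP $code)"; fi
```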

Is there an issue with authentication, which is preventing a successful pull?

Check the registry credentials and the API key. An authorization failure also shows up in the pod events when you describe the pod.

$ kubectl -n mynamespace describe pod/mypod-1
State: Waiting
Reason: ImagePullBackOff
...
Warning Failed 3m57s (x4 over 5m28s) kubelet Error: ErrImagePull
Warning Failed 3m42s (x6 over 5m28s) kubelet Error: ImagePullBackOff
Normal BackOff 18s (x20 over 5m28s) kubelet Back-off pulling image "authorization failed"
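
To decide which hypothesis a given event message supports, a rough keyword heuristic can help. The sketch below is an assumption-laden classifier, not an authoritative list of registry error strings: the keywords are common examples, and some messages (such as "pull access denied") are ambiguous between a missing repository and missing credentials. The function name `classify_pull_error` is hypothetical.

```shell
# Classify a pull error message as "auth", "missing", or "unknown".
# Assumption: the keyword lists are illustrative, not exhaustive.
# The body runs in a subshell so its variables do not leak to the caller.
classify_pull_error() (
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $msg in
    *unauthorized*|*authorization*|*"authentication required"*|*denied*|*forbidden*)
      echo auth ;;
    *"not found"*|*"manifest unknown"*)
      echo missing ;;
    *)
      echo unknown ;;
  esac
)

# Usage:
#   classify_pull_error 'authorization failed'
# To see which pull secrets the pod references (requires cluster access):
#   kubectl -n mynamespace get pod/mypod-1 -o jsonpath='{.spec.imagePullSecrets[*].name}'
```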

Mitigations

  • Can the Development Team rebuild the project to make the Docker images available?
  • Can the Skpr Platform Team fix the authentication issue with a token, credentials, or IAM/ECR policy updates?