Triaging Image Pull Errors

Problem identification

To identify the affected resources, filter for the ErrImagePull or ImagePullBackOff status. ErrImagePull means a pull attempt has failed; ImagePullBackOff means the kubelet is waiting, with an increasing back-off delay, before it retries the pull. Both can be found using kubectl.

$ kubectl get all --all-namespaces | grep -E 'ErrImagePull|ImagePullBackOff'
NAMESPACE    NAME          READY   STATUS             RESTARTS   AGE
mynamespace  pod/mypod-1   0/1     ErrImagePull       0          8m53s
mynamespace  pod/mypod-2   0/1     ImagePullBackOff   0          8h
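
Matching on the whole line can also catch unrelated resources whose names happen to contain these strings. As a rough sketch, the STATUS column can be matched directly instead. `filter_pull_errors` is a hypothetical helper name, and the column position assumes the default layout of `kubectl get pods --all-namespaces --no-headers`.

```shell
# Hypothetical helper: print namespace/name for pods whose STATUS column
# ends in ErrImagePull or ImagePullBackOff (this also catches the
# Init:-prefixed variants reported for init containers).
# Assumes input in the default column layout of
# `kubectl get pods --all-namespaces --no-headers`:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
filter_pull_errors() {
  awk '$4 ~ /(ErrImagePull|ImagePullBackOff)$/ { print $1 "/" $2 }'
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces --no-headers | filter_pull_errors
```

Because the helper only reads standard input, it can be tried against saved output before it is pointed at a live cluster.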

Hypothesis

  • The docker image may not be available for download, or it may not exist.
  • There may be an issue with authentication, which is preventing a successful pull.

Debugging

Reviewing the Cluster Metrics Dashboard

Reviewing this dashboard shows the scope of the impact: is it a single metric data point or many, and are there trends or other factors to consider?

  • Log into the AWS Console
  • Navigate to CloudWatch
  • Browse the Dashboards Section
  • Locate the cluster metrics dashboard
  • Locate the dashboard widget showing the ErrImagePull metrics
  • Identify how many errors there are, and what trends are showing.

Are there any pod logs or events to help identify the issue?

$ kubectl -n mynamespace logs pod/mypod-1
Error from server (BadRequest): container "mycontainer" in pod "mypod-1" is waiting to start: trying and failing to pull image
$ kubectl -n mynamespace describe pod/mypod-1
State:          Waiting
Reason:       ImagePullBackOff
...
Warning  Failed     3m57s (x4 over 5m28s)  kubelet            Error: ErrImagePull
Warning  Failed     3m42s (x6 over 5m28s)  kubelet            Error: ImagePullBackOff
Normal   BackOff    18s (x20 over 5m28s)   kubelet            Back-off pulling image "failed-image"

What is the docker image reference?

To identify the image references, look at the spec.

$ kubectl -n mynamespace get pod/mypod-1 -o json | jq -r '.spec.containers[].image'

Is the docker image available for download, or does it not exist?

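Make sure the image exists. The image reference may be incorrect or invalid, or the image or the specific tag may not exist. The example below lists the tags published on Docker Hub, where IMAGE_REF is the repository path (for example myorg/myapp) without the tag.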

$ curl -sL "https://registry.hub.docker.com/v2/repositories/${IMAGE_REF}/tags/" | jq -r '.results[].name' | sort
latest
tag_1
tag_2
tag_3
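
The pod spec usually reports the image as a repository plus a tag, while the Docker Hub query above takes only the repository path. The following is a minimal sketch of splitting one from the other. `split_image_ref` is a hypothetical helper, it ignores any digest suffix, and it assumes that a missing tag means `latest` and that a bare name such as `nginx` lives under Docker Hub's `library/` namespace.

```shell
# Hypothetical helper: split "repo[:tag][@digest]" into "repo tag".
# The body runs in a subshell, so its variables do not leak to the caller.
split_image_ref() (
  ref=$1
  ref=${ref%%@*}            # drop a digest suffix, if present
  last=${ref##*/}           # final path component, where a tag may appear
  case $last in
    *:*) tag=${last##*:}; repo=${ref%:*} ;;
    *)   tag=latest;      repo=$ref ;;     # assumption: no tag means latest
  esac
  case $repo in
    */*) ;;                                # already namespaced
    *)   repo=library/$repo ;;             # Docker Hub official images
  esac
  printf '%s %s\n' "$repo" "$tag"
)

# Usage:
#   set -- $(split_image_ref "myorg/myapp:tag_1")
#   IMAGE_REF=$1 IMAGE_TAG=$2
```

Looking for the tag only in the final path component keeps a registry port, as in `localhost:5000/app`, from being mistaken for a tag.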

Is there an issue with authentication, which is preventing a successful pull?

Check the registry credentials and the API key used for the pull. The events shown when describing the pod will also indicate an authorization failure.

$ kubectl -n mynamespace describe pod/mypod-1
State:          Waiting
Reason:       ImagePullBackOff
...
Warning  Failed     3m57s (x4 over 5m28s)  kubelet            Error: ErrImagePull
Warning  Failed     3m42s (x6 over 5m28s)  kubelet            Error: ImagePullBackOff
Normal   BackOff    18s (x20 over 5m28s)   kubelet            Back-off pulling image "authorization failed"
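
The two hypotheses can often be told apart from the event text alone. The sketch below is one rough way to do that. `classify_pull_failure` is a hypothetical helper, and apart from "authorization failed", which comes from the example above, the matched phrases are assumptions: the exact wording depends on the container runtime and the registry, so treat the result as a hint only.

```shell
# Hypothetical helper: guess the failure category from event or describe text.
# Prints "auth", "missing", or "unknown".
classify_pull_failure() {
  case $1 in
    # Assumed authentication-related phrases (runtime and registry dependent).
    *"authorization failed"*|*unauthorized*|*"authentication required"*|*"no basic auth credentials"*)
      echo auth ;;
    # Assumed phrases for a missing image or tag.
    *"manifest unknown"*|*"not found"*)
      echo missing ;;
    *)
      echo unknown ;;
  esac
}

# Usage (requires cluster access):
#   classify_pull_failure "$(kubectl -n mynamespace describe pod/mypod-1)"
```

Some registries answer a request for a private or nonexistent repository with the same denial message, so an "auth" result does not rule out a wrong image reference.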

Mitigations

  • Can the Development Team rebuild the project to make the docker images available?
  • Can the Skpr Platform Team fix the authentication issue using a token, credentials, or IAM/ECR policy updates?