Failed to pull Sitecore images on AKS
Last week I tried to spin up my first Sitecore instance on AKS using the latest Container Deployment Package provided by Sitecore. Along the way, the init jobs failed to pull their images from Sitecore’s container registry. Both pods (mssql-init and solr-init) logged the following error:
Failed to pull image “scr.sitecore.com/sxp/sitecore-xp1-solr-init:10.0-ltsc2019”: rpc error: code = Unknown desc = Error response from daemon: Get https://scr.sitecore.com/v2/: x509: certificate signed by unknown authority
On a local machine, this message normally means that Docker is running in Linux container mode, and switching to Windows container mode fixes the problem. If you see this error on AKS, it probably means that the init jobs were scheduled on a Linux node. There are two ways to fix this.
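Before picking one, it can help to confirm where the init pods actually landed. This is a minimal sketch, assuming kubectl is already configured for the AKS cluster and that the nodes carry the standard kubernetes.io/os label. The function name is only for illustration.

```shell
# Show which node each pod was scheduled on and which OS every node runs.
# Assumes kubectl already points at the AKS cluster.
show_pod_placement() {
  # The NODE column shows where each pod (e.g. mssql-init, solr-init) was placed
  kubectl get pods -o wide
  # -L adds a column with the value of the given node label
  kubectl get nodes -L kubernetes.io/os
}
```

Calling `show_pod_placement` and matching the NODE column of the failing pods against the node list should show whether they ended up on a Linux node.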
Option 1: Use a node selector
Add a node selector to the init YAML files (mssql-init.yaml and solr-init.yaml). The node selector ensures that those pods are only created on Windows nodes.
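A sketch of what such a selector could look like, assuming the init files are regular Kubernetes Job manifests and the Windows nodes carry the well-known kubernetes.io/os label. All other fields of the manifest are left out here, so merge this into the existing structure rather than replacing it.

```yaml
# Fragment of mssql-init.yaml / solr-init.yaml (other fields omitted)
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: windows   # only schedule the job's pod on Windows nodes
```

The selector belongs in the pod template (spec.template.spec), not at the top level of the Job spec.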
Make sure you re-apply the init jobs afterwards, as described in the “Installation Guide for Production Environment with Kubernetes”.
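Re-applying could look roughly like the sketch below. It assumes the two init files live together in one folder (`./init` is a guess, so pass your own path) and that this folder contains nothing else. Because the pod template of an existing Job cannot be changed in place, the sketch deletes the old jobs before applying the edited files. The function name is hypothetical.

```shell
# Re-create the init jobs from a folder containing mssql-init.yaml and solr-init.yaml.
# The default folder path is an assumption; pass the location from your own package.
reapply_init_jobs() {
  init_dir="${1:-./init}"
  # A Job's pod template is immutable, so remove the old jobs first
  kubectl delete -f "$init_dir" --ignore-not-found
  # Create the jobs again from the edited manifests
  kubectl apply -f "$init_dir"
}
```

Usage: `reapply_init_jobs ./init`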
Thanks to Mihály Árvai for pointing out this solution.
Option 2: Drain Linux node
If the first option doesn’t work for you, try draining the Linux node before running the init jobs. Draining marks the node as unschedulable, so no new pods are created on it.
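A minimal sketch of the drain step, assuming you look up the Linux node's name first (for example with `kubectl get nodes`). The function name is made up for this example.

```shell
# Drain a node so that no new pods are scheduled on it.
# $1 is the node name as shown by "kubectl get nodes".
drain_linux_node() {
  # DaemonSet-managed pods cannot be evicted, so tell drain to skip them
  kubectl drain "$1" --ignore-daemonsets
}
```

Usage: `drain_linux_node <linux-node-name>`. Depending on what else runs on that node, kubectl may refuse to evict some pods and ask for additional flags. Note that draining also evicts the pods already running there.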
Afterwards, apply the init jobs again as described in the “Installation Guide for Production Environment with Kubernetes”.
Once both pods, mssql-init and solr-init, are in state “Completed”, you can make the Linux node schedulable again. Several pods may be created before that happens, because failed attempts are retried.
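This last step could be sketched as follows. It assumes the Job objects are named mssql-init and solr-init like their pods, and the 30-minute timeout is an arbitrary choice. The function name is hypothetical.

```shell
# Wait for both init jobs to finish, then allow scheduling on the node again.
# $1 is the name of the Linux node that was drained earlier.
resume_linux_node() {
  # Job names are assumed to match the pod names; the timeout is arbitrary.
  # Only uncordon if both jobs actually completed.
  kubectl wait --for=condition=complete job/mssql-init job/solr-init --timeout=30m &&
    kubectl uncordon "$1"
}
```

Usage: `resume_linux_node <linux-node-name>`. If you prefer to check by hand, `kubectl get pods` shows the “Completed” status, and `kubectl uncordon` on its own brings the node back.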