How to do a canary upgrade of an existing customised Istio setup
Requirement:
- We have an existing customised setup of Istio 1.7.3 (installed with istioctl, with no revision set) on AKS 1.18.14.
- We now need to upgrade to Istio 1.8 with no downtime, or as little as possible.
- The upgrade must be safe and must not break our prod environment in any way.
How we installed the current customised Istio environment:
1) Created the manifest.
istioctl manifest generate --set profile=default -f /manifests/overlay/overlay.yaml > $HOME/generated-manifest.yaml
2) Installed Istio.
istioctl install --set profile=default -f /manifests/overlay/overlay.yaml
3) Verified istio against the deployed manifest.
istioctl verify-install -f $HOME/generated-manifest.yaml
Planned upgrade process (Reference):
1) Precheck for upgrade.
istioctl x precheck
2) Export the Istio configuration currently in use to a YAML file with the command below.
kubectl -n istio-system get iop installed-state-install -o yaml > /tmp/iop.yaml
3) Download the Istio 1.8 release, extract it, and change to the directory that contains the 1.8 istioctl binary.
cd istio1.8/istioctl1.8
4) From the new version's directory, create a new Istio 1.8 control plane with a revision set, using the previously exported installed state "iop.yaml".
./istioctl1.8 install --set revision=1-8 --set profile=default -f /tmp/iop.yaml
I expect this to create a new control plane with the existing customised configuration, so that we then have two control plane deployments and services running side by side:
$ kubectl get pods -n istio-system -l app=istiod
NAME READY STATUS RESTARTS AGE
istiod-786779888b-p9s5n 1/1 Running 0 114m
istiod-1-8-6956db645c-vwhsk 1/1 Running 0 1m
5) After this, we need to change the labels on every namespace in the cluster where the Istio proxy containers should be injected: remove the old "istio-injection" label and add the istio.io/rev label pointing to the canary revision 1-8.
$ kubectl label namespace test-ns istio-injection- istio.io/rev=1-8
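A minimal sketch of how step 5 could be scripted so that only namespaces that already have injection enabled are touched. It assumes injection was enabled with the label istio-injection=enabled; the function names and the REVISION variable are made up for illustration.

```shell
# Sketch only: REVISION and the function names below are illustrative assumptions.
REVISION="${REVISION:-1-8}"

# Print the names of namespaces that currently carry istio-injection=enabled.
injected_namespaces() {
  kubectl get namespaces -l istio-injection=enabled \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Switch one namespace from the old injection label to the revision label.
relabel_namespace() {
  kubectl label namespace "$1" istio-injection- "istio.io/rev=${REVISION}"
}

# Relabel every namespace that had injection enabled; all others are left as they are.
relabel_all() {
  for ns in $(injected_namespaces); do
    relabel_namespace "$ns" || return 1
  done
}
```

Running injected_namespaces on its own first gives a list to review before relabel_all changes anything. Relabelling does not affect running pods; they keep their current sidecar until they are restarted.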
I expect the environment to remain stable on the old Istio configuration at this point, so we can decide which application pods to restart onto the new control plane according to our downtime windows. As I understand it, it is allowed to run some apps against the old control plane and others against the new control plane at this point.
e.g. kubectl rollout restart deployment -n test-ns (first)
kubectl rollout restart deployment -n test-ns2 (later)
kubectl rollout restart deployment -n test-ns3 (some time after that)
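The staged restarts above could be wrapped in a small helper that restarts one namespace and waits for every deployment to finish rolling out before the next namespace is touched. This is a sketch under assumptions: the function name and the ROLLOUT_TIMEOUT value are hypothetical and should be tuned to the workloads.

```shell
# Sketch only: ROLLOUT_TIMEOUT is an assumed value, not a recommendation.
ROLLOUT_TIMEOUT="${ROLLOUT_TIMEOUT:-300s}"

# Restart all deployments in one namespace, then wait for each rollout to complete.
# Returns non-zero as soon as a restart or a rollout fails or times out.
restart_namespace() {
  ns="$1"
  kubectl rollout restart deployment -n "$ns" || return 1
  for d in $(kubectl get deployment -n "$ns" -o name); do
    kubectl rollout status "$d" -n "$ns" --timeout="$ROLLOUT_TIMEOUT" || return 1
  done
}
```

For example, restart_namespace test-ns could be run in the first window and restart_namespace test-ns2 in a later one, stopping if the first call returns an error.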
6) Once we have planned the downtime and restarted the deployments as decided, confirm that all pods are now using the 1.8 data plane proxy only.
kubectl get pods -n test-ns -l istio.io/rev=1-8
7) Verify that the new pods in the test-ns namespace are using the istiod-1-8 service corresponding to the canary revision.
istioctl proxy-status | grep ${pod_name} | awk '{print $7}'
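As a cross-check that does not depend on the column layout of istioctl proxy-status, the sidecar image of each pod can be inspected directly. This sketch assumes the injected container is named istio-proxy and that its image tag starts with the Istio version (for example proxyv2:1.8.0); the function names are hypothetical.

```shell
# Print "<pod><TAB><istio-proxy image>" for every pod in the given namespace.
proxy_images() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[?(@.name=="istio-proxy")].image}{"\n"}{end}'
}

# Print only the pods whose sidecar image tag does not start with the wanted version.
# Pods without an istio-proxy container are skipped.
stale_proxies() {
  ns="$1"; want="$2"
  proxy_images "$ns" | awk -F'\t' -v want="$want" '
    $2 != "" {
      n = split($2, parts, ":")   # the tag is the text after the last colon
      if (index(parts[n], want) != 1) print $1 "\t" $2
    }'
}
```

Calling stale_proxies test-ns 1.8 should print nothing once every injected pod in the namespace has been restarted onto the new sidecar.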
8) After upgrading both the control plane and the data plane, uninstall the old control plane.
istioctl x uninstall -f /tmp/iop.yaml
I need to clear up the points below before the upgrade.
- Are all the steps above good enough to proceed with on a heavily used prod environment?
- Is exporting the installed-state iop enough to carry all our customisations into the canary upgrade, or is there any chance of breaking the upgrade or missing some settings?
- Will step 4 create the 1.8 Istio control plane with all the customisations we already have, without breaking or missing anything?
- After step 4, do we need any extra configuration for the istiod service? The document we followed is not clear about that.
- For step 5, how can we identify all the namespaces that have istio-injection enabled, so that we modify only those namespaces and leave the others as they were?
- For step 8, how do we ensure that we uninstall only the old control plane? Do we have to get the binary for the old control plane (1.7 in my case) and use it with the same exported /tmp/iop.yaml?
- I have no idea how to roll back if any issue happens along the way, either before or after the old control plane is deleted.
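For the rollback question, a rough sketch of what reverting might look like while the old control plane is still installed: point the namespace back at the old injector, restart its workloads, and remove only the canary by revision. The function names and the REVISION and ISTIOCTL variables are assumptions for illustration, and this does not cover the case where the old control plane has already been deleted.

```shell
# Sketch only: REVISION, ISTIOCTL and the function names are illustrative assumptions.
REVISION="${REVISION:-1-8}"
ISTIOCTL="${ISTIOCTL:-istioctl}"   # assumed to resolve to the 1.8 istioctl binary

# Move one namespace back to the old (unrevisioned) control plane and restart it
# so that its pods are re-injected with the old sidecar.
rollback_namespace() {
  ns="$1"
  kubectl label namespace "$ns" istio.io/rev- istio-injection=enabled || return 1
  kubectl rollout restart deployment -n "$ns"
}

# Remove only the canary control plane, selected by its revision.
remove_canary() {
  "$ISTIOCTL" x uninstall --revision "$REVISION"
}
```

Selecting by revision in remove_canary is what keeps the unrevisioned 1.7.3 control plane untouched. It should only be run after every namespace has been rolled back and no proxies are still connected to the 1.8 istiod.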