---
title: Azure Machine Pool Machines
authors:
- @devigned
reviewers:
- @CecileRobertMichon
- @nader-ziada
creation-date: 2021-02-22
last-updated: 2021-02-22
status: implementable
see-also:
- https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/819
- https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1067
---
- Summary
- Motivation
- Proposal
- Available Options
- Conclusions
- Additional Details
- Implementation History
Azure MachinePool currently embeds the state of each of the instances in the MachinePool within the status of the Azure MachinePool. MachinePool instances should be their own resources to enable individual lifecycles.
By giving each AzureMachinePoolMachine an individual lifecycle, a user would be able to inform CAPZ of the specific instance to delete and then have the AzureMachinePoolMachine controller cordon and drain the node prior to deleting the underlying infrastructure.
- Be able to delete specific AzureMachinePool instances
- Rolling updates with max unavailable and max surge
  - MaxUnavailable is the maximum number of machines that are allowed to be unavailable at any time
  - MaxSurge is the number of machines added on top of the current replica count during an upgrade of the VMSS model
- Safely update by cordoning and draining nodes prior to deleting the underlying infrastructure
- Be able to take advantage of Azure's Virtual Machine Scale Set Update Instance API to in-place update a VMSS instance rather than delete and recreate the infrastructure, which would result in a much quicker upgrade.
- Create a CAPI Machine owner for each AzureMachinePoolMachine
- Implementing different roll out and scale down strategies
- Adopting individual Machine instances to be managed by the MachinePool
- Create or use an on-instance agent to cordon and drain in response to Azure Virtual Machine Scale Set terminate notifications
Azure Virtual Machine Scale Sets provide terminate notifications, which would be helpful for informing Kubernetes when a node is going to be deleted. Unfortunately, terminate notifications are not sent when an instance is Updated. In this case, "Updated" means the instance is reimaged to match the updated VMSS model by using the Update Instance API. If a VMSS instance is reimaged rather than deleted and recreated, the instance will not receive a notification. Because of this limitation, the CAPZ controller needs to alert Kubernetes when an instance is being Updated. Without some way to inform Kubernetes of the specific instance that is to be updated, the underlying infrastructure may be removed before workloads can be safely migrated from the machine / node. By managing the lifecycle from CAPZ, we are able to safely delete / upgrade machines / nodes.
In the future, it would be useful to integrate awesomenix/drainsafe or something similar to handle scenarios when Azure will delete or migrate a VMSS instance. Two scenarios come to mind.
- VMSS is configured to use Spot instances and Azure must evict an instance.
- Azure must perform maintenance on an instance.
Alex is an engineer in a large organization which has a MachinePool running 1.18.x and would like to upgrade the MachinePool to 1.19.x. It is important to Alex that the MachinePool doesn't experience downtime during the upgrade. Alex has set the MaxUnavailable and MaxSurge values on the AzureMachinePool to limit the number of machines that will be unavailable during the upgrade and the number of extra machines VMSS will add during the upgrade. The MachinePool upgrades each machine in the pool by first cordoning and draining, then replacing the machine in the pool.
Alex is an engineer in a large organization which has a MachinePool running. Alex has too many nodes running on the cluster and would like to reduce the replicas. It is important to Alex that the MachinePool doesn't experience downtime. Alex decreases the replica count of the MachinePool by 2. The MachinePool deletes 2 machines from the pool by first cordoning and draining, then deleting the underlying infrastructure.
Alex is an engineer in a large organization which has a MachinePool running with 5 replicas. Alex would like to delete a specific MachinePool machine. It is important to Alex that the MachinePool doesn't experience downtime while deleting the individual machine. Alex uses kubectl to delete the specific MachinePool machine resource. The MachinePool machine is cordoned and drained, then the underlying infrastructure is deleted. The MachinePool still has a replica count of 5, but only has 4 running replicas. The MachinePool creates a new machine to take the place of the deleted instance.
FR1. CAPZ MUST support deleting an individual Virtual Machine Scale Set instance.
FR2. CAPZ SHOULD support cordoning and draining workloads from a Virtual Machine Scale Set instance.
FR3. CAPZ SHOULD support updating an instance in-place using the Virtual Machine Scale Set Update Instance API.
NFR1. CAPZ SHOULD provide resource status updates as the Azure resources are provisioned
NFR2. CAPZ SHOULD NOT exceed Azure API request limits and SHOULD rate limit reconciliation cycles
NFR3. Unit tests MUST exist for upgrade and delete instance selection
NFR4. e2e tests MUST exist for MachinePool upgrade, scale up / down, and instance delete scenarios
The current implementation of CAPZ AzureMachinePool embeds the state of each of the instances in the Scale Set within the status of the AzureMachinePool.
// AzureMachinePoolStatus defines the observed state of AzureMachinePool
type AzureMachinePoolStatus struct {
/*
Other fields omitted for brevity
*/
// Instances is the VM instance status for each VM in the VMSS
// +optional
Instances []*AzureMachinePoolInstanceStatus `json:"instances,omitempty"`
}
// AzureMachinePoolInstanceStatus provides status information for each instance in the VMSS
type AzureMachinePoolInstanceStatus struct {
// Version defines the Kubernetes version for the VM Instance
// +optional
Version string `json:"version"`
// ProvisioningState is the provisioning state of the Azure virtual machine instance.
// +optional
ProvisioningState *infrav1.VMState `json:"provisioningState"`
// ProviderID is the provider identification of the VMSS Instance
// +optional
ProviderID string `json:"providerID"`
// InstanceID is the identification of the Machine Instance within the VMSS
// +optional
InstanceID string `json:"instanceID"`
// InstanceName is the name of the Machine Instance within the VMSS
// +optional
InstanceName string `json:"instanceName"`
// LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
// the image version the VM is running. If the instance is not running the latest model, it means the instance
// may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
LatestModelApplied bool `json:"latestModelApplied"`
}
These are included here to provide a description of the structures as they exist in CAPI and will be leveraged to extend AzureMachinePool. There are no changes to these structures. They are simply for reference.
// MachineDeploymentStrategy describes how to replace existing machines with new ones.
type MachineDeploymentStrategy struct {
// Type of deployment. Currently the only supported strategy is
// "RollingUpdate".
// Default is RollingUpdate.
// +optional
Type MachineDeploymentStrategyType `json:"type,omitempty"`
// Rolling update config params. Present only if
// MachineDeploymentStrategyType = RollingUpdate.
// +optional
RollingUpdate *MachineRollingUpdateDeployment `json:"rollingUpdate,omitempty"`
}
// MachineRollingUpdateDeployment is used to control the desired behavior of rolling update.
type MachineRollingUpdateDeployment struct {
// The maximum number of machines that can be unavailable during the update.
// Value can be an absolute number (ex: 5) or a percentage of desired
// machines (ex: 10%).
// Absolute number is calculated from percentage by rounding down.
// This can not be 0 if MaxSurge is 0.
// Defaults to 0.
// Example: when this is set to 30%, the old MachineSet can be scaled
// down to 70% of desired machines immediately when the rolling update
// starts. Once new machines are ready, old MachineSet can be scaled
// down further, followed by scaling up the new MachineSet, ensuring
// that the total number of machines available at all times
// during the update is at least 70% of desired machines.
// +optional
MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
// The maximum number of machines that can be scheduled above the
// desired number of machines.
// Value can be an absolute number (ex: 5) or a percentage of
// desired machines (ex: 10%).
// This can not be 0 if MaxUnavailable is 0.
// Absolute number is calculated from percentage by rounding up.
// Defaults to 1.
// Example: when this is set to 30%, the new MachineSet can be scaled
// up immediately when the rolling update starts, such that the total
// number of old and new machines do not exceed 130% of desired
// machines. Once old machines have been killed, new MachineSet can
// be scaled up further, ensuring that total number of machines running
// at any time during the update is at most 130% of desired machines.
// +optional
MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
// DeletePolicy defines the policy used by the MachineDeployment to identify nodes to delete when downscaling.
// Valid values are "Random", "Newest", "Oldest"
// When no value is supplied, the default DeletePolicy of MachineSet is used
// +kubebuilder:validation:Enum=Random;Newest;Oldest
// +optional
DeletePolicy *string `json:"deletePolicy,omitempty"`
}
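The rounding rules in the comments above (MaxUnavailable percentages round down, MaxSurge percentages round up, and the two cannot both be 0) can be sketched as follows. This is a minimal illustration, not the CAPI implementation: `resolve` and `rollingBounds` are hypothetical helpers, and plain strings stand in for `intstr.IntOrString` so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve converts an absolute number ("5") or a percentage ("30%") of
// desired into a machine count. Percentages round up when roundUp is true
// and round down otherwise. Hypothetical stand-in for IntOrString handling.
func resolve(value string, desired int, roundUp bool) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", value, err)
		}
		if roundUp {
			return (pct*desired + 99) / 100, nil
		}
		return pct * desired / 100, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid count %q: %w", value, err)
	}
	return n, nil
}

// rollingBounds returns the minimum number of available machines and the
// maximum total number of machines permitted during a rolling update.
func rollingBounds(desired int, maxUnavailable, maxSurge string) (minAvailable, maxTotal int, err error) {
	unavailable, err := resolve(maxUnavailable, desired, false)
	if err != nil {
		return 0, 0, err
	}
	surge, err := resolve(maxSurge, desired, true)
	if err != nil {
		return 0, 0, err
	}
	if unavailable == 0 && surge == 0 {
		return 0, 0, fmt.Errorf("maxUnavailable and maxSurge cannot both be 0")
	}
	if unavailable > desired {
		unavailable = desired // never report a negative availability floor
	}
	return desired - unavailable, desired + surge, nil
}

func main() {
	minAvailable, maxTotal, err := rollingBounds(10, "30%", "25%")
	if err != nil {
		panic(err)
	}
	fmt.Println("min available:", minAvailable, "max total:", maxTotal)
}
```

With 10 desired machines, a MaxUnavailable of 30% resolves to 3 and a MaxSurge of 25% resolves to 3 (2.5 rounded up), so the pool stays between 7 available and 13 total machines during the update.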
The proposed changes below show the CAPZ AzureMachinePool and AzureMachinePoolMachine.
const azureMachinePoolUpdateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance"
type AzureMachinePoolSpec struct {
// The deployment strategy to use to replace existing machines with
// new ones.
// +optional
Strategy MachineDeploymentStrategy `json:"strategy,omitempty"`
// NodeDrainTimeout is the total amount of time that the controller will spend on draining a node.
// The default value is 0, meaning that the node can be drained without any time limitations.
// NOTE: NodeDrainTimeout is different from `kubectl drain --timeout`
// +optional
NodeDrainTimeout *metav1.Duration `json:"nodeDrainTimeout,omitempty"`
}
// AzureMachinePoolMachineSpec defines the desired state of AzureMachinePoolMachine
type AzureMachinePoolMachineSpec struct {
// ProviderID is the identification ID of the Virtual Machine Scale Set instance
ProviderID string `json:"providerID"`
}
// AzureMachinePoolMachineStatus defines the observed state of AzureMachinePoolMachine
type AzureMachinePoolMachineStatus struct {
// NodeRef will point to the corresponding Node if it exists.
// +optional
NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`
// Version defines the Kubernetes version for the VM Instance
// +optional
Version string `json:"version"`
// ProvisioningState is the provisioning state of the Azure virtual machine instance.
// +optional
ProvisioningState *infrav1.VMState `json:"provisioningState"`
// InstanceID is the identification of the Machine Instance within the VMSS
InstanceID string `json:"instanceID"`
// InstanceName is the name of the Machine Instance within the VMSS
// +optional
InstanceName string `json:"instanceName"`
// FailureReason will be set in the event that there is a terminal problem
// reconciling the MachinePool machine and will contain a succinct value suitable
// for machine interpretation.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureReason *string `json:"failureReason,omitempty"`
// FailureMessage will be set in the event that there is a terminal problem
// reconciling the MachinePool and will contain a more verbose string suitable
// for logging and human consumption.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureMessage *string `json:"failureMessage,omitempty"`
// Conditions defines current service state of the AzureMachinePoolMachine.
// +optional
Conditions clusterv1.Conditions `json:"conditions,omitempty"`
// LongRunningOperationState saves the state for an Azure long-running operation so it can be continued on the
// next reconciliation loop.
// +optional
LongRunningOperationState *infrav1.Future `json:"longRunningOperationState,omitempty"`
// LatestModelApplied indicates the instance is running the most up-to-date VMSS model. A VMSS model describes
// the image version the VM is running. If the instance is not running the latest model, it means the instance
// may not be running the version of Kubernetes the Machine Pool has specified and needs to be updated.
LatestModelApplied bool `json:"latestModelApplied"`
// Ready is true when the provider resource is ready.
// +optional
Ready bool `json:"ready"`
}
- Create a new AzureMachinePoolMachine controller.
- Remove VMSS instance status tracking logic from the AzureMachinePool controller and move it to the AzureMachinePoolMachine controller.
- Introduce rate limiting behavior to AzureMachinePool* controllers to ensure Azure API limits are not exceeded.
Currently in CAPZ, the AzureMachinePool controller is responsible for both the Virtual Machine Scale Set (VMSS) and the instances created by the VMSS. The proposed change would separate the responsibility of managing the state of the VMSS and the instances created by the VMSS. This would introduce a new AzureMachinePoolMachine controller and a new MachinePoolMachineScope. The responsibilities would be as follows.
AzureMachinePool Responsibilities:
- Create AzureMachinePoolMachine instances when a new VMSS instance is observed. The AzureMachinePoolMachine spec should have the ProviderID field set with the observed resource ID. The AzureMachinePool should also be added to the AzureMachinePoolMachine's OwnerReferences.
- Selection of AzureMachinePoolMachine instances for deletion or upgrade. When a change to the AzureMachinePool model occurs, the MachinePoolScope will be responsible for coordinating the rollout of the updated model by selecting AzureMachinePoolMachines to delete or upgrade with respect to MaxUnavailable and the DeletePolicy.
- Scale up: AzureMachinePool should increase the number of VMSS replicas if the replica count increases on the MachinePool.
- Scale down: AzureMachinePool should select and delete AzureMachinePoolMachines that are overprovisioned with respect to MaxUnavailable and DeletePolicy from the proposed MachinePool Strategy.
- Upgrade: AzureMachinePool should select the AzureMachinePoolMachines to upgrade, set the azureMachinePoolUpdateInstanceAnnotation on the AzureMachinePoolMachine and wait for the annotation to be removed before proceeding with the rolling upgrade.
- Clean up: when a VMSS instance is no longer in the list of instances in Azure, but a matching AzureMachinePoolMachine resource still exists, delete the AzureMachinePoolMachine.
AzureMachinePoolMachine Responsibilities:
- Update Azure Provisioning State: when creating a new VMSS instance, the AzureMachinePoolMachine controller will poll the Azure API until the instance reaches a terminal state.
- Cordon and Drain: when deleting or upgrading the AzureMachinePoolMachine resource, the AzureMachinePoolMachine controller is responsible for ensuring workload is moved from the node prior to removing the underlying Azure infrastructure.
- NodeRef: as a VMSS instance joins the cluster, the AzureMachinePoolMachine controller is responsible for ensuring the node is found and ready before marking the AzureMachinePoolMachine resource as ready.
- Upgrade: The AzureMachinePoolMachine is responsible for removing the azureMachinePoolUpdateInstanceAnnotation upon successful instance upgrade.
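The upgrade handshake and the cordon-and-drain ordering described above can be sketched as below. The `nodeOps` and `instanceUpdater` interfaces, the `poolMachine` type, the `recorder` fake, and the annotation value `"true"` are all hypothetical; only the annotation key comes from the proposal. The sketch shows one property: the annotation is removed only after cordon, drain, and the instance update have all succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

const updateInstanceAnnotation = "azuremachinepool.infrastructure.cluster.x-k8s.io/updateInstance"

// nodeOps and instanceUpdater are hypothetical seams for the Kubernetes and
// Azure calls a real controller would make.
type nodeOps interface {
	Cordon(node string) error
	Drain(node string) error
}

type instanceUpdater interface {
	UpdateInstance(instanceID string) error
}

// poolMachine is a simplified stand-in for an AzureMachinePoolMachine.
type poolMachine struct {
	InstanceID  string
	NodeName    string
	Annotations map[string]string
}

// reconcileUpgrade upgrades a machine that carries the update annotation.
// The annotation is removed only when every step succeeds, so the
// AzureMachinePool controller keeps waiting if any step fails.
func reconcileUpgrade(m *poolMachine, nodes nodeOps, vmss instanceUpdater) error {
	if _, selected := m.Annotations[updateInstanceAnnotation]; !selected {
		return nil // not selected for upgrade
	}
	if err := nodes.Cordon(m.NodeName); err != nil {
		return fmt.Errorf("cordon node %s: %w", m.NodeName, err)
	}
	if err := nodes.Drain(m.NodeName); err != nil {
		return fmt.Errorf("drain node %s: %w", m.NodeName, err)
	}
	if err := vmss.UpdateInstance(m.InstanceID); err != nil {
		return fmt.Errorf("update instance %s: %w", m.InstanceID, err)
	}
	delete(m.Annotations, updateInstanceAnnotation)
	return nil
}

// recorder is a fake that records the order of calls and can fail one step.
type recorder struct {
	steps  []string
	failOn string
}

func (r *recorder) record(step string) error {
	r.steps = append(r.steps, step)
	if step == r.failOn {
		return errors.New(step + " failed")
	}
	return nil
}

func (r *recorder) Cordon(node string) error       { return r.record("cordon") }
func (r *recorder) Drain(node string) error        { return r.record("drain") }
func (r *recorder) UpdateInstance(id string) error { return r.record("update") }

func main() {
	m := &poolMachine{
		InstanceID:  "3",
		NodeName:    "pool-000003",
		Annotations: map[string]string{updateInstanceAnnotation: "true"},
	}
	rec := &recorder{}
	if err := reconcileUpgrade(m, rec, rec); err != nil {
		panic(err)
	}
	fmt.Println("steps:", rec.steps, "annotations left:", len(m.Annotations))
}
```

Because a failed step leaves the annotation in place, the next reconciliation can retry the same machine without the AzureMachinePool controller moving on to another instance.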
Create annotations on AzureMachinePool resources to indicate which machine should be upgraded next or deleted.
- No custom resource schema changes would be needed
- Would enable a user to provide input to help the controller decide the next machine to delete / upgrade
- Annotations don't have strong schema
- Controller would be dependent on the application of annotations to inform machine selection, which could be error prone and brittle.
- Each machine lifecycle will need to be embedded in the status of the AzureMachinePool to enable cordon and drain
Introduce a new custom resource, AzureMachinePoolMachine, to represent AzureMachinePool instances rather than persisting each instance status in the AzureMachinePool.Status.Instances.
- Allows for easier tracking of state of individual AzureMachinePool instances via their own resource
- Each AzureMachinePoolMachine can be responsible for their own lifecycle, decomposing the logic in the controllers
- Would enable a user to interact with an AzureMachinePoolMachine the same way they would any other machine
- Breaking change to the status of the AzureMachinePool by removing the instances array
Separate AzureMachinePool and AzureMachinePoolMachine resources provide a reasonable way to break down concerns and offer the functionality to enable safe rolling upgrades and individual instance deletion.
- Unit tests to validate the proper selection of VMSS nodes to delete / upgrade
- Unit tests for the new MachinePoolMachineScope
- e2e tests for upgrade, scale down / up, and instance delete
- 2021/02/22: Initial proposal
- 2021/01/06: Initial PR opened #1105