
machine pool surge, max unavailable and instance delete #1105


Closed
devigned wants to merge 5 commits

Conversation

devigned
Contributor

@devigned devigned commented Jan 6, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR introduces API changes to AzureMachinePool to enable users of CAPZ to perform safe, fast rolling upgrades, building off of #1067. This change set introduces MaxSurge and MaxUnavailable fields on the AzureMachinePool spec (a sketch of the new fields follows the warning below).

  • MaxSurge enables machine pools to over-provision machines (increase the number of machines above the desired count) during an upgrade, which allows faster upgrades at the cost of Azure compute core quota.
  • MaxUnavailable specifies how many machines may be unavailable during a rolling upgrade.
  • Instance delete enables users to delete individual Azure Machine Pool Machines and supports controller-initiated node drain. Because instance delete / node drain requires tracking state per machine, this PR removes the AzureMachinePool.Status.Instances array in favor of individual AzureMachinePoolMachine resources.
  • Node drain will be completed in a subsequent PR.

⚠️ Breaking out AzureMachinePool instances into individual AzureMachinePoolMachine resources will be a breaking change in the experimental API.
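
For illustration, the new fields might look roughly like this on the spec (a sketch based on the description above; the exact types, tags, and optionality are assumptions, not the PR's final API):

```go
// Sketch of the AzureMachinePool spec additions described above.
type AzureMachinePoolSpec struct {
	// ... existing fields elided ...

	// MaxSurge is how many machines may be provisioned above the desired
	// count during a rolling upgrade, trading Azure compute core quota for
	// upgrade speed.
	// +optional
	MaxSurge *int32 `json:"maxSurge,omitempty"`

	// MaxUnavailable is how many machines may be unavailable during a
	// rolling upgrade.
	// +optional
	MaxUnavailable *int32 `json:"maxUnavailable,omitempty"`
}
```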

This PR is work in progress. Please provide early feedback.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #819

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Introduce maxUnavailable and maxSurge to `AzureMachinePool` and remove `AzureMachinePool.Status.Instances` in favor of representing machine pool machines as individual resources of type `AzureMachinePoolMachine`. This is a breaking change to the experimental API.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 6, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign devigned after the PR has been reviewed.
You can assign the PR to them by writing /assign @devigned in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jan 6, 2021
@devigned devigned changed the title [WIP] machine pool surge, max unavailable and instance delete [WIP] ⚠️ machine pool surge, max unavailable and instance delete Jan 6, 2021
@devigned devigned changed the title [WIP] ⚠️ machine pool surge, max unavailable and instance delete [WIP] machine pool surge, max unavailable and instance delete Jan 6, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 23, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2021
@devigned devigned force-pushed the surge branch 4 times, most recently from 4a05556 to 25ae039 Compare February 4, 2021 22:32
@devigned devigned force-pushed the surge branch 2 times, most recently from bb0c14e to 07e5a09 Compare February 12, 2021 14:02
@@ -154,7 +154,7 @@ def capz():
 local_resource(
     "manager",
     cmd = 'mkdir -p .tiltbuild;CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags \'-extldflags "-static"\' -o .tiltbuild/manager',
-    deps = ["api", "cloud", "config", "controllers", "exp", "feature", "pkg", "go.mod", "go.sum", "main.go"]
+    deps = ["api", "azure", "config", "controllers", "exp", "feature", "pkg", "go.mod", "go.sum", "main.go"]
Contributor Author

I should probably pull this out into another PR, but what's 1 line on 4k+ 😞

// NewCoalescingReconciler returns a reconcile wrapper that will delay new reconcile.Requests
// after the cache expiry of the request string key.
// A successful reconciliation is defined as one where no error is returned
func NewCoalescingReconciler(upstream reconcile.Reconciler, cache *CoalescingRequestCache, log logr.Logger) reconcile.Reconciler {
Contributor Author

This wraps the AzureMachinePool and AzureMachinePoolMachine reconcilers so that they debounce: the reconcilers rate-limit incoming events so they only process so many within a window of time, to avoid overwhelming Azure API limits.

There is no good way to do this in controller-runtime. I'll introduce a proposal over there for middleware or incoming rate limiting.
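
A minimal sketch of such a coalescing wrapper, assuming a simple TTL cache and a controller-runtime version whose Reconcile takes a context (the cache shape here is an assumption, not the PR's implementation):

```go
package coalescing

import (
	"context"
	"sync"
	"time"

	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// CoalescingRequestCache is sketched as a TTL set keyed by request string.
type CoalescingRequestCache struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time
}

func NewCoalescingRequestCache(ttl time.Duration) *CoalescingRequestCache {
	return &CoalescingRequestCache{ttl: ttl, seen: map[string]time.Time{}}
}

func (c *CoalescingRequestCache) Contains(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.seen[key]
	return ok && time.Since(t) < c.ttl
}

func (c *CoalescingRequestCache) Add(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seen[key] = time.Now()
}

type coalescingReconciler struct {
	upstream reconcile.Reconciler
	cache    *CoalescingRequestCache
	log      logr.Logger
}

// Reconcile drops requests whose key was recently reconciled successfully,
// so bursts of events collapse into one upstream reconciliation per window.
func (r *coalescingReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	if r.cache.Contains(req.String()) {
		r.log.V(4).Info("coalescing request", "request", req.String())
		return reconcile.Result{RequeueAfter: 30 * time.Second}, nil
	}
	result, err := r.upstream.Reconcile(ctx, req)
	if err == nil {
		// Only a successful reconciliation (no error) starts the quiet period.
		r.cache.Add(req.String())
	}
	return result, err
}
```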

// inform the controller that if the parent MachinePool.Spec.Template.Spec.Version field is updated, this image
// will be updated with the corresponding default image. If Defaulted is set to false, the controller will not
// update the image reference when the MachinePool.Spec.Template.Spec.Version changes.
AzureDefaultingImage struct {
Contributor Author

Thought we should be explicit about image defaulting. This lets users rely on default images for the Kubernetes versions specified on their MachinePools, while remaining free to specify their own image without MachinePool updates overriding their image reference.
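
Roughly, the defaulting would behave like the sketch below (the helper name is hypothetical, not the PR's actual code):

```go
// reconcileImage keeps the pool's image in sync with the MachinePool's
// Kubernetes version, but only while the image remains controller-defaulted.
func reconcileImage(amp *AzureMachinePool, k8sVersion string) error {
	if amp.Spec.Image != nil && !amp.Spec.Image.Defaulted {
		// The user pinned their own image reference; never overwrite it.
		return nil
	}
	img, err := defaultImageForVersion(k8sVersion) // hypothetical lookup helper
	if err != nil {
		return err
	}
	img.Defaulted = true // future version bumps keep updating this reference
	amp.Spec.Image = img
	return nil
}
```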

Contributor

This sounds like a good idea. Should these AzureDefaultingImage changes be part of a different PR instead?

// 2) over-provisioned machines prioritized by out of date models first
// 3) over-provisioned ready machines
// 4) ready machines within budget which have out of date models
func (m *MachinePoolScope) selectMachinesToDelete(machinesByProviderID map[string]infrav1exp.AzureMachinePoolMachine) map[string]infrav1exp.AzureMachinePoolMachine {
Contributor Author

This contains the logic for selecting machines to delete when over-provisioned or upgrading. The AzureMachinePool tells the selected AzureMachinePoolMachine to delete, and the AzureMachinePoolMachine is expected to safely remove itself from the pool.
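
The ordering could be sketched like this (field names such as Status.LatestModelApplied and the budget handling are simplifying assumptions, not the PR's exact logic):

```go
import "sort"

// selectToDelete removes over-provisioned machines first (out-of-date models
// before ready ones), then rolls out-of-date ready machines within budget.
func selectToDelete(machines []AzureMachinePoolMachine, desired, budget int) []AzureMachinePoolMachine {
	// Prefer deleting machines running an out-of-date VMSS model.
	sort.SliceStable(machines, func(i, j int) bool {
		return !machines[i].Status.LatestModelApplied && machines[j].Status.LatestModelApplied
	})

	overProvisioned := len(machines) - desired
	var toDelete []AzureMachinePoolMachine
	for _, m := range machines {
		switch {
		case len(toDelete) < overProvisioned:
			// Shrink back to the desired replica count first.
			toDelete = append(toDelete, m)
		case budget > 0 && m.Status.Ready && !m.Status.LatestModelApplied:
			// Upgrade within the surge/unavailability budget.
			toDelete = append(toDelete, m)
			budget--
		}
	}
	return toDelete
}
```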

return ampml.Items, nil
}

func (m *MachinePoolScope) applyAzureMachinePoolMachines(ctx context.Context) error {
Contributor Author

This contains the logic to compare the state of the Azure VMSS with the state of the AzureMachinePool(Machine)s. If a machine exists in Azure, an AzureMachinePoolMachine is created for it. If an AzureMachinePoolMachine exists but doesn't have an Azure counterpart, it is asked to delete itself. At the end, the upgrade / over-provision state is evaluated, and machines can be deleted if the state requires it.
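
The diffing described here could be sketched as follows (the types and constructor are simplified assumptions, both maps keyed by provider ID):

```go
import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyMachines reconciles AzureMachinePoolMachine resources against the
// VMSS instances that actually exist in Azure.
func applyMachines(ctx context.Context, c client.Client, azureVMs map[string]VMSSVM, ampms map[string]AzureMachinePoolMachine) error {
	// A VMSS instance without a corresponding resource gets one created.
	for providerID, vm := range azureVMs {
		if _, ok := ampms[providerID]; !ok {
			if err := c.Create(ctx, newAzureMachinePoolMachine(vm)); err != nil { // hypothetical constructor
				return err
			}
		}
	}
	// A resource without an Azure counterpart is asked to delete itself.
	for providerID := range ampms {
		if _, ok := azureVMs[providerID]; !ok {
			ampm := ampms[providerID]
			if err := c.Delete(ctx, &ampm); err != nil {
				return err
			}
		}
	}
	// Finally, evaluate the surge/upgrade state and delete additional
	// machines if the rollout requires it (see selectMachinesToDelete).
	return nil
}
```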

@devigned
Contributor Author

@nader-ziada and @CecileRobertMichon I believe this PR is now ready for review. I'm sorry for the enormity of the change set. With that in mind, I'm going to introduce AzureMachinePoolMachine cordon and drain functionality in a subsequent PR.

One design aspect I've vacillated on is how to represent MaxSurge and MaxUnavailable. Right now, they are on the top-level spec for AzureMachinePool. Would it be better to use a rollout strategy structure similar to (or the same type as) kubernetes-sigs/cluster-api#4073?

@CecileRobertMichon
Contributor

@devigned I've been thinking about this and I think we need a proposal to explain what this does, why, what future work is planned, and what alternatives were considered (and why they don't work). This is a pretty big PR and it's not obvious why we're doing upgrades this way, so we should document it for the record.

Also, it can serve as developer-facing documentation once the feature is merged. What do you think?

Contributor

@nader-ziada nader-ziada left a comment

I went through the changes and it all makes sense, but I have a comment about the structure of the code: the main logic that figures out which machines to create/delete is in the scope instead of in the controller, even setting the conditions, which is different from how we have done things in other places. Was this by design?

@devigned
Contributor Author

I went through the changes and it all makes sense, but I have a comment about the structure of the code: the main logic that figures out which machines to create/delete is in the scope instead of in the controller, even setting the conditions, which is different from how we have done things in other places. Was this by design?

It was a conscious decision. It was a bit of an experiment to see how it would turn out.

  • Scope should be responsible for updating the K8s state upon closing the scope.
  • Controller / Reconciler is responsible for creating the Scope, calling services, responding to errors in reconciliation, and closing the Scope.
  • Services are responsible for manipulating external state and updating the Scope with the external state.
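
A rough sketch of that layering, with hypothetical names (not the PR's actual code):

```go
import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func (r *AzureMachinePoolReconciler) Reconcile(ctx context.Context, req reconcile.Request) (res reconcile.Result, err error) {
	// The controller creates the scope...
	scope, err := NewMachinePoolScope(ctx, r.Client, req) // hypothetical constructor
	if err != nil {
		return reconcile.Result{}, err
	}
	// ...and closing the scope persists status and conditions to the API server.
	defer func() {
		if closeErr := scope.Close(ctx); closeErr != nil && err == nil {
			err = closeErr
		}
	}()
	// Services mutate external (Azure) state and record results on the scope.
	return reconcile.Result{}, newScaleSetService(scope).Reconcile(ctx)
}
```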

That was a perceptive review. Thank you. WDYT?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2021
@k8s-ci-robot
Contributor

@devigned: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nader-ziada
Contributor

  • Scope should be responsible for updating the K8s state upon closing the scope.
  • Controller / Reconciler is responsible for creating the Scope, calling services, responding to errors in reconciliation, and closing the Scope.
  • Services are responsible for manipulating external state and updating the Scope with the external state.

That was a perceptive review. Thank you. WDYT?

I don't feel strongly about it, but I expected the controller / reconciler to be responsible for also updating the status on the k8s resources, which would include setting the condition.


// Reconcile idempotently gets, creates, and updates a scale set.
func (s *Service) Reconcile(ctx context.Context) error {
ctx, span := tele.Tracer().Start(ctx, "scalesets.Service.Reconcile")
Contributor

Suggested change
ctx, span := tele.Tracer().Start(ctx, "scalesets.Service.Reconcile")
ctx, span := tele.Tracer().Start(ctx, "scalesetvms.Service.Reconcile")

}
}

// Reconcile idempotently gets, creates, and updates a scale set.
Contributor

This needs to be changed. Maybe something like "Reconcile fetches the latest data about the scaleset instance".

)

type (
// ScaleSetVMScope defines the scope interface for a scale sets service.
Contributor

Suggested change
// ScaleSetVMScope defines the scope interface for a scale sets service.
// ScaleSetVMScope defines the scope interface for a scaleset vms service.


// Delete deletes a scaleset instance asynchronously returning a future which encapsulates the long running operation.
func (s *Service) Delete(ctx context.Context) error {
ctx, span := tele.Tracer().Start(ctx, "scalesets.Service.Delete")
Contributor

Suggested change
ctx, span := tele.Tracer().Start(ctx, "scalesets.Service.Delete")
ctx, span := tele.Tracer().Start(ctx, "scalesetvms.Service.Delete")

@@ -271,3 +317,108 @@ func AzureClusterToAzureMachinePoolsFunc(kClient client.Client, log logr.Logger)
return result
}
}

// AzureMachinePoolToAzureMachinePoolMachines maps an AzureMachinePool to it's child AzureMachinePoolMachines through
Contributor

Suggested change
// AzureMachinePoolToAzureMachinePoolMachines maps an AzureMachinePool to it's child AzureMachinePoolMachines through
// AzureMachinePoolToAzureMachinePoolMachines maps an AzureMachinePool to its child AzureMachinePoolMachines through

// Defaulted informs the controller that the image reference was defaulted so that it can be updated by changes
// to the MachinePool.Spec.Template.Spec.Version field by default.
// +optional
Defaulted bool `json:"defaulted,omitempty"`
Contributor

In this case Defaulted means not only that the value was defaulted but that the Image will be managed by the controller. What do you think about calling this field Managed instead? IMHO the word better conveys the behavior. I believe managed is also how we refer to resources managed by the controllers in the code base.

c, err := ctrl.NewControllerManagedBy(mgr).
WithOptions(options.Options).
For(&infrav1exp.AzureMachinePoolMachine{}).
WithEventFilter(predicates.ResourceNotPaused(log)). // don't queue reconcile if resource is paused
Contributor

Suggested change
WithEventFilter(predicates.ResourceNotPaused(log)). // don't queue reconcile if resource is paused
WithEventFilter(predicates.ResourceNotPausedAndHasFilterLabel(ctrl.LoggerFrom(ctx), r.WatchFilterValue)).

Comment on lines +73 to +81
func NewAzureMachinePoolMachineController(c client.Client, log logr.Logger, recorder record.EventRecorder, reconcileTimeout time.Duration) *AzureMachinePoolMachineController {
return &AzureMachinePoolMachineController{
Client: c,
Log: log,
Recorder: recorder,
ReconcileTimeout: reconcileTimeout,
reconcilerFactory: newAzureMachinePoolMachineReconciler,
}
}
Contributor

Suggested change
func NewAzureMachinePoolMachineController(c client.Client, log logr.Logger, recorder record.EventRecorder, reconcileTimeout time.Duration) *AzureMachinePoolMachineController {
return &AzureMachinePoolMachineController{
Client: c,
Log: log,
Recorder: recorder,
ReconcileTimeout: reconcileTimeout,
reconcilerFactory: newAzureMachinePoolMachineReconciler,
}
}
func NewAzureMachinePoolMachineController(c client.Client, log logr.Logger, recorder record.EventRecorder, reconcileTimeout time.Duration, watchFilterValue string) *AzureMachinePoolMachineController {
return &AzureMachinePoolMachineController{
Client: c,
Log: log,
Recorder: recorder,
ReconcileTimeout: reconcileTimeout,
reconcilerFactory: newAzureMachinePoolMachineReconciler,
WatchFilterValue: watchFilterValue,
}
}

Log logr.Logger
Scheme *runtime.Scheme
Recorder record.EventRecorder
ReconcileTimeout time.Duration
Contributor

Suggested change
ReconcileTimeout time.Duration
ReconcileTimeout time.Duration
WatchFilterValue string

machineScope.SetFailureReason(capierrors.UpdateMachineError)
machineScope.SetFailureMessage(errors.Errorf("Azure VM state is %s", state))
case infrav1.VMStateDeleting:
// for some reason, the
Contributor

seems like the comment got truncated

@@ -97,6 +97,30 @@ spec:
location:
description: Location is the Azure region location e.g. westus2
type: string
maxSurge:
Contributor

The maxSurge field in the k8s Deployment object accepts an absolute number but also a percentage. Should we support percentage as well? I believe it improves the UX as the user doesn't have to adapt the maxSurge value when scaling up/down their VMSS. If we decide to support percentages, that would mean changing the type of the field.
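
If percentages were adopted, the field could reuse the same intstr.IntOrString type the Deployment strategy uses; a sketch under that assumption:

```go
package exp

import "k8s.io/apimachinery/pkg/util/intstr"

// RolloutStrategy sketches a maxSurge that accepts either an absolute
// number (1) or a percentage ("25%"), mirroring apps/v1 Deployment.
type RolloutStrategy struct {
	// +optional
	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
}

// surgeCount resolves the effective surge against the current replica count.
func surgeCount(maxSurge *intstr.IntOrString, replicas int) (int, error) {
	// Round up so a fractional percentage still yields a whole machine.
	return intstr.GetValueFromIntOrPercent(maxSurge, replicas, true)
}
```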


return vm, nil
}

// DeleteAsync is the operation to delete a virtual machine scale set asynchronously. DeleteAsync sends a DELETE
Contributor

Suggested change
// DeleteAsync is the operation to delete a virtual machine scale set asynchronously. DeleteAsync sends a DELETE
// DeleteAsync is the operation to delete a virtual machine scale set vm asynchronously. DeleteAsync sends a DELETE

//
// Parameters:
// resourceGroupName - the name of the resource group.
// vmssName - the name of the VM scale set to create or update. parameters - the scale set object.
Contributor

Suggested change
// vmssName - the name of the VM scale set to create or update. parameters - the scale set object.
// vmssName - the name of the VM scale set to create or update. parameters - the scale set object.
// instanceID - the ID of the VM scale set VM.

// resourceGroupName - the name of the resource group.
// vmssName - the name of the VM scale set to create or update. parameters - the scale set object.
func (ac *azureClient) DeleteAsync(ctx context.Context, resourceGroupName, vmssName, instanceID string) (*infrav1.Future, error) {
ctx, span := tele.Tracer().Start(ctx, "scalesets.AzureClient.DeleteAsync")
Contributor

Suggested change
ctx, span := tele.Tracer().Start(ctx, "scalesets.AzureClient.DeleteAsync")
ctx, span := tele.Tracer().Start(ctx, "scalesetvms.AzureClient.DeleteAsync")


// Get retrieves the Virtual Machine Scale Set Virtual Machine
func (ac *azureClient) Get(ctx context.Context, resourceGroupName, vmssName, instanceID string) (compute.VirtualMachineScaleSetVM, error) {
ctx, span := tele.Tracer().Start(ctx, "scalesets.AzureClient.Get")
Contributor

Suggested change
ctx, span := tele.Tracer().Start(ctx, "scalesets.AzureClient.Get")
ctx, span := tele.Tracer().Start(ctx, "scalesetvms.AzureClient.Get")

@devigned
Contributor Author

I'm going to close this PR and open a new PR with an updated branch based off of the proposal in #1191. I think it will help us to apply fresh eyes and a clean slate.

/close

@k8s-ci-robot
Contributor

@devigned: Closed this PR.

In response to this:

I'm going to close this PR and open a new PR with an updated branch based off of the proposal in #1191. I think it will help us to apply fresh eyes and a clean slate.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MachinePool K8s upgrade should provide MinAvailable, MaxSurge, with safe rolling updates
5 participants