Refine the Deployment proposal and move away from hashing #384
Closed · 0xmichalis wants to merge 2 commits into kubernetes:master from 0xmichalis:update-deployment-proposal
@@ -3,14 +3,13 @@

## Abstract

A proposal for implementing a new resource - Deployment - which will enable
declarative config updates for Pods and ReplicationControllers.

Users will be able to create a Deployment, which will spin up
a ReplicationController to bring up the desired pods.
Users can also target the Deployment at existing ReplicationControllers, in
which case the new RC will replace the existing ones. The exact mechanics of
replacement depends on the DeploymentStrategy chosen by the user.
DeploymentStrategies are explained in detail in a later section.
declarative config updates for ReplicaSets. Users will be able to create a
Deployment, which will spin up a ReplicaSet to bring up the desired Pods.
Users can also target the Deployment to an existing ReplicaSet either by
rolling back an existing Deployment or creating a new Deployment that can
adopt an existing ReplicaSet. The exact mechanics of replacement depends on
the DeploymentStrategy chosen by the user. DeploymentStrategies are explained
in detail in a later section.

## Implementation

@@ -33,27 +32,35 @@ type Deployment struct {

type DeploymentSpec struct {
  // Number of desired pods. This is a pointer to distinguish between explicit
  // zero and not specified. Defaults to 1.
  Replicas *int
  Replicas *int32

  // Label selector for pods. Existing ReplicationControllers whose pods are
  // selected by this will be scaled down. New ReplicationControllers will be
  // Label selector for pods. Existing ReplicaSets whose pods are
  // selected by this will be scaled down. New ReplicaSets will be
  // created with this selector, with a unique label `pod-template-hash`.
  // If Selector is empty, it is defaulted to the labels present on the Pod template.
  Selector map[string]string

  // A counter that tracks the number of times an update happened in the PodTemplateSpec
  // of the Deployment, similarly to how metadata.Generation tracks updates in the Spec
  // of all first-class API objects.
  TemplateGeneration *int32

  // Describes the pods that will be created.
  Template *PodTemplateSpec

  // The deployment strategy to use to replace existing pods with new ones.
  Strategy DeploymentStrategy

  // Minimum number of seconds for which a newly created pod should be ready
  // without any of its container crashing, for it to be considered available.
  // Defaults to 0 (pod will be considered available as soon as it is ready)
  MinReadySeconds int32
}

type DeploymentStrategy struct {
  // Type of deployment. Can be "Recreate" or "RollingUpdate".
  Type DeploymentStrategyType

  // TODO: Update this to follow our convention for oneOf, whatever we decide it
  // to be.
  // Rolling update config params. Present only if DeploymentStrategyType =
  // RollingUpdate.
  RollingUpdate *RollingUpdateDeploymentStrategy

@@ -65,7 +72,7 @@ const (
  // Kill all existing pods before creating new ones.
  RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"

  // Replace the old RCs by new one using rolling update i.e gradually scale down the old RCs and scale up the new one.
  // Replace the old RSs by new one using rolling update i.e gradually scale down the old RSs and scale up the new one.
  RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)

@@ -94,20 +101,15 @@ type RollingUpdateDeploymentStrategy struct {
  // new RC can be scaled up further, ensuring that total number of pods running
  // at any time during the update is atmost 130% of original pods.
  MaxSurge IntOrString

  // Minimum number of seconds for which a newly created pod should be ready
  // without any of its container crashing, for it to be considered available.
  // Defaults to 0 (pod will be considered available as soon as it is ready)
  MinReadySeconds int
}

type DeploymentStatus struct {
  // Total number of ready pods targeted by this deployment (this
  // includes both the old and new pods).
  Replicas int
  Replicas int32

  // Total number of new ready pods with the desired template spec.
  UpdatedReplicas int
  UpdatedReplicas int32
}

```

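To make the MaxSurge comment above concrete, here is a minimal, self-contained sketch of how a percentage or absolute MaxSurge value could be resolved against the desired replica count. The function name and the string-based IntOrString handling are assumptions for illustration, not the proposal's API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveSurge is an illustrative helper (not part of the proposal) that resolves
// a MaxSurge value against the desired replica count: an absolute value is used
// as-is, a percentage is taken from desired replicas and rounded up.
func resolveSurge(maxSurge string, desiredReplicas int) (int, error) {
	if strings.HasSuffix(maxSurge, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(maxSurge, "%"))
		if err != nil {
			return 0, err
		}
		// Round up, so e.g. 30% of 10 replicas allows 3 surge pods.
		return (desiredReplicas*pct + 99) / 100, nil
	}
	return strconv.Atoi(maxSurge)
}

func main() {
	surge, _ := resolveSurge("30%", 10)
	// With 10 desired replicas and MaxSurge of 30%, at most 10+3 = 13 pods
	// (130% of the original pods) may run at any time during the rolling update.
	fmt.Println("surge:", surge, "max total pods:", 10+surge)
}
```
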
@@ -116,38 +118,42 @@ type DeploymentStatus struct {

#### Deployment Controller

The DeploymentController will make Deployments happen.
It will watch Deployment objects in etcd.
For each pending deployment, it will:
The DeploymentController will process Deployments and create, update, and delete ReplicaSets accordingly.
For each creation or update of a Deployment, it will:

1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
   - For now, we will do this in the client - list all RCs and then filter the
1. Find all RSs (ReplicaSets) whose label selector is a superset of DeploymentSpec.Selector.
   - For now, we will do this in the client - list all RSs and then filter the
     ones we want (a minimal sketch of this filter appears below). Eventually, we want to expose this in the API.
2. The new RC can have the same selector as the old RC and hence we add a unique
   selector to all these RCs (and the corresponding label to their pods) to ensure
   that they do not select the newly created pods (or old pods get selected by
   new RC).
   - The label key will be "pod-template-hash".
   - The label value will be hash of the podTemplateSpec for that RC without
     this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
   - If the RCs and pods don't already have this label and selector:
     - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
2. The new RS can have the same selector as the old RS and hence we need to add a unique label
   in the selector of all these RSs (and the corresponding label to their pods) to ensure that
   they do not select the newly created pods (or old pods get selected by the new RS).
   - The label key will be "controller-uid", similar to the key set in Jobs when job.spec.manualSelector
     is unset.
   - The label value will be the uid of the RS.
     Unlike Jobs, the generated selector will be added by default by the API server to every RS
     that is created with an owner reference, hence is not controlled directly by a user. To ensure
     that existing Pods will be labeled correctly, the Deployment controller will continue to relabel
     ReplicaSets and sync them to use their uids in their selectors and in their existing Pods.
   - If the RSs and pods don't already have this label and selector:
     - We will first add this to RS.PodTemplateSpec.Metadata.Labels for all RSs to
       ensure that all new pods that they create will have this label.
     - Then we will add this label to their existing pods and then add this as a selector
       to that RC.
3. Find if there exists an RC for which value of "pod-template-hash" label
   is same as hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
   this is the RC that will be ramped up. If there is no such RC, then we create
   a new one using DeploymentSpec and then add a "pod-template-hash" label
   to it. RCSpec.replicas = 0 for a newly created RC.
4. Scale up the new RC and scale down the old ones as per the DeploymentStrategy.
   - Raise an event if we detect an error, like new pods failing to come up.
5. Go back to step 1 unless the new RC has been ramped up to desired replicas
   and the old RCs have been ramped down to 0.
6. Cleanup.
     - Then we will add this label to their existing pods.
     - Eventually we flip the RS selector to use the new label.
   This process can potentially be abstracted into a new endpoint for controllers [1].
3. Find if there exists an RS with the same PodTemplateSpec as the PodTemplateSpec of the
   Deployment. If it exists already, then this is the RS that will be ramped up. If there is no
   such RS, then we create a new one by using TemplateGeneration in some way in its name to ensure
   it is a stable name. The size of the new RS depends on the used DeploymentStrategyType.
4. Scale up the new RS and scale down the old ones as per the DeploymentStrategy.
   Raise events appropriately (both in case of failure or success).
5. Go back to step 1 unless the new RS has been ramped up to desired replicas
   and the old RSs have been ramped down to 0.
6. Cleanup old RSs as per revisionHistoryLimit.

DeploymentController is stateless so that it can recover in case it crashes during a deployment.

[1] See https://github.com/kubernetes/kubernetes/issues/36897
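
As a rough illustration of the client-side filtering in step 1 (and of the per-ReplicaSet "controller-uid" selector entry described in step 2), the following self-contained sketch checks whether a ReplicaSet's selector is a superset of a Deployment's selector. The function name and the concrete label values are hypothetical:

```go
package main

import "fmt"

// selectorIsSuperset reports whether rsSelector carries every key/value pair
// required by deploymentSelector, i.e. the ReplicaSet selector is a superset
// of the Deployment selector (the client-side filter from step 1 above).
func selectorIsSuperset(rsSelector, deploymentSelector map[string]string) bool {
	for k, v := range deploymentSelector {
		if rsSelector[k] != v {
			return false
		}
	}
	return true
}

func main() {
	deploymentSelector := map[string]string{"app": "nginx"}

	// Per step 2, an owned ReplicaSet additionally carries a "controller-uid"
	// entry whose value is the RS uid (the uid below is made up for illustration).
	rsSelector := map[string]string{
		"app":            "nginx",
		"controller-uid": "9f86d081-8292-4c52-b1b0-0d8a8e4a2a11",
	}

	fmt.Println(selectorIsSuperset(rsSelector, deploymentSelector))                        // true: candidate for adoption
	fmt.Println(selectorIsSuperset(map[string]string{"app": "redis"}, deploymentSelector)) // false: ignored
}
```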

### MinReadySeconds

We will implement MinReadySeconds using the Ready condition in Pod. We will add

@@ -159,56 +165,85 @@ https://github.com/kubernetes/kubernetes/issues/11234 tracks updating kubelet
and https://github.com/kubernetes/kubernetes/issues/12615 tracks adding
LastTransitionTime to PodCondition.
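
The availability check implied here can be sketched as follows. The types are simplified stand-ins for the Pod Ready condition (assuming LastTransitionTime is available, per the issue linked above), not the actual API:

```go
package main

import (
	"fmt"
	"time"
)

// readyCondition is a simplified stand-in for the Pod Ready condition,
// assuming LastTransitionTime is recorded as discussed above.
type readyCondition struct {
	Status             bool      // true when the Ready condition is "True"
	LastTransitionTime time.Time // when the condition last flipped
}

// isAvailable reports whether a pod counts as available: it must be ready and
// must have stayed ready for at least minReadySeconds. With minReadySeconds=0,
// a ready pod is available immediately.
func isAvailable(cond readyCondition, minReadySeconds int32, now time.Time) bool {
	if !cond.Status {
		return false
	}
	return !now.Before(cond.LastTransitionTime.Add(time.Duration(minReadySeconds) * time.Second))
}

func main() {
	now := time.Now()
	cond := readyCondition{Status: true, LastTransitionTime: now.Add(-5 * time.Second)}

	fmt.Println(isAvailable(cond, 0, now))  // true: ready pods are immediately available
	fmt.Println(isAvailable(cond, 10, now)) // false: ready for only 5s, needs 10s
}
```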

### TemplateGeneration

Hashing an API object such as the PodTemplateSpec means that the resulting hash is subject
to change whenever API changes in PodTemplateSpec (including referenced objects) are
introduced between minor versions of Kubernetes. A new API field will be
introduced in DeploymentSpec called TemplateGeneration. The new field will be a counter
that will track the number of times an update happened in the PodTemplateSpec of the
Deployment, similarly to how metadata.Generation tracks updates in the Spec of all
first-class API objects. Unlike metadata.Generation, this field can be initialized by users
in order to avoid naming collisions when re-adopting existing RSs. This is similar to how it
is already used by DaemonSets.

TemplateGeneration is not authoritative and only helps in constructing the name for the new
ReplicaSet in case there is no other ReplicaSet that matches the Deployment. The Deployment
controller still decides which ReplicaSet is the new one by comparing PodTemplateSpecs. If no
matching ReplicaSet is found, the controller will try to create a new ReplicaSet using its
current TemplateGeneration.

The naming scheme used by ReplicaSets (*deployment.name-podtemplatehash*) will need to change
to something different because we cannot migrate old ReplicaSets to use TemplateGeneration
for their names and we want to avoid name collisions. For new RS names, we can either append
the TemplateGeneration to the Deployment name or we can compute the hash of
*deployment.name+deployment.templategeneration*, and use something like
*deployment.name+hash+templateGeneration* for the new RS names.
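
A minimal sketch of the second naming option (hash *deployment.name+templateGeneration*, then build *deployment.name-hash-templateGeneration*). The hash function (FNV-1a) and the separators are assumptions for illustration, not decisions made by this proposal:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// newRSName sketches the proposed naming option: hash the Deployment name
// together with its TemplateGeneration, then build "name-hash-templateGeneration".
// FNV-1a and the "-" separators are illustrative choices, not part of the proposal.
func newRSName(deploymentName string, templateGeneration int32) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s%d", deploymentName, templateGeneration)
	return fmt.Sprintf("%s-%x-%d", deploymentName, h.Sum32(), templateGeneration)
}

func main() {
	// Two generations of the same Deployment produce stable, non-colliding names.
	fmt.Println(newRSName("nginx", 1)) // e.g. nginx-<hash>-1
	fmt.Println(newRSName("nginx", 2)) // e.g. nginx-<hash>-2
}
```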

## Changing Deployment mid-way

### Updating

Users can update an ongoing deployment before it is completed.
In this case, the existing deployment will be stalled and the new one will
Users can update an ongoing Deployment before it is completed.
In this case, the existing rollout will be stalled and the new one will
begin.
For ex: consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
For example, consider the following case:
- User updates a Deployment to rolling-update 10 pods with image:v1 to
  pods with image:v2.
- User then updates this deployment to create pods with image:v3,
  when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
- User then updates this Deployment to create pods with image:v3,
  when the image:v2 RS had been ramped up to 5 pods and the image:v1 RS
  had been ramped down to 5 pods.
- When Deployment Controller observes the new deployment, it will create
  a new RC for creating pods with image:v3. It will then start ramping up this
  new RC to 10 pods and will ramp down both the existing RCs to 0.
- When Deployment Controller observes the new update, it will create
  a new RS for creating pods with image:v3. It will then start ramping up this
  new RS to 10 pods and will ramp down both the existing RSs to 0.

### Deleting

Users can pause/cancel a deployment by deleting it before it is completed.
Recreating the same deployment will resume it.
For ex: consider the following case:
- User creates a deployment to rolling-update 10 pods with image:v1 to
  pods with image:v2.
- User then deletes this deployment while the old and new RCs are at 5 replicas each.
  User will end up with 2 RCs with 5 replicas each.
  User can then create the same deployment again, in which case DeploymentController will
  notice that the second RC exists already which it can ramp up while ramping down
Users can cancel a rollout by doing a non-cascading deletion of the Deployment
before it is complete. Recreating the same Deployment will resume it.
For example, consider the following case:
- User creates a Deployment to perform a rolling-update for 10 pods from image:v1 to
  image:v2.
- User then deletes the Deployment while the old and new RSs are at 5 replicas each.
  User will end up with 2 RSs with 5 replicas each.
  User can then re-create the same Deployment, in which case DeploymentController will
  notice that the second RS exists already which it can ramp up while ramping down
  the first one.

### Rollback

We want to allow the user to roll back a deployment. To roll back a
completed (or ongoing) deployment, user can create (or update) a deployment with
DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
We want to allow the user to roll back a Deployment. To roll back a completed (or
ongoing) Deployment, users can simply use `kubectl rollout undo` or update the
Deployment directly by using its spec.rollbackTo.revision field, specifying either the
revision they want to roll back to or no revision, which means that the Deployment
will be rolled back to its previous revision.

Rollbacks are going to work the same way both for hashing and TemplateGeneration.
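
For illustration only, here is a minimal sketch of what the spec.rollbackTo.revision field described above could look like as a Go type. The struct names and shapes are assumptions, not the proposal's final API:

```go
package main

import "fmt"

// RollbackConfig is an illustrative sketch of the spec.rollbackTo field:
// Revision selects the revision to roll back to, and 0 (the zero value)
// stands for "roll back to the previous revision".
type RollbackConfig struct {
	Revision int64
}

// deploymentRollbackSpec is a hypothetical container for the field, shown only
// to make the rollback semantics concrete.
type deploymentRollbackSpec struct {
	RollbackTo *RollbackConfig
}

func main() {
	// Roll back to a specific revision.
	toRevision3 := deploymentRollbackSpec{RollbackTo: &RollbackConfig{Revision: 3}}
	// Roll back without specifying a revision, i.e. to the previous revision.
	toPrevious := deploymentRollbackSpec{RollbackTo: &RollbackConfig{}}

	fmt.Println(toRevision3.RollbackTo.Revision) // 3
	fmt.Println(toPrevious.RollbackTo.Revision)  // 0 -> previous revision
}
```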

## Deployment Strategies

DeploymentStrategy specifies how the new RC should replace existing RCs.
To begin with, we will support 2 types of deployment:
* Recreate: We kill all existing RCs and then bring up the new one. This results
  in quick deployment but there is a downtime when old pods are down but
DeploymentStrategy specifies how the new RS should replace existing RSs.
To begin with, we will support 2 types of Deployment (a minimal sketch follows this list):
* Recreate: We kill all existing RSs and then bring up the new one. This results
  in quick Deployment but there is a downtime when old pods are down but
  the new ones have not come up yet.
* Rolling update: We gradually scale down old RCs while scaling up the new one.
  This results in a slower deployment, but there is no downtime. At all times
  during the deployment, there are a few pods available (old or new). The number
  of available pods and when is a pod considered "available" can be configured
  using RollingUpdateDeploymentStrategy.

In future, we want to support more deployment types.
* Rolling update: We gradually scale down old RSs while scaling up the new one.
  This results in a slower Deployment, but there can be no downtime: depending on
  the strategy parameters, it is possible to have available pods (old or new) at
  all times during the rollout. The number of available pods, and when a pod is
  considered "available", can be configured using RollingUpdateDeploymentStrategy.

## Future

@soltysh ptal
👍