CompositeController
CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object.
Workload controllers like Deployment and StatefulSet are examples of existing controllers that fit this pattern.
This page is a detailed reference of all the features available in this API. See the Create a Controller guide for a step-by-step walkthrough.
Example
This example CompositeController defines a controller that behaves like StatefulSet.
apiVersion: metacontroller.k8s.io/v1alpha1
kind: CompositeController
metadata:
  name: catset-controller
spec:
  parentResource:
    apiVersion: ctl.enisoc.com/v1
    resource: catsets
    revisionHistory:
      fieldPaths:
      - spec.template
  childResources:
  - apiVersion: v1
    resource: pods
    updateStrategy:
      method: RollingRecreate
      statusChecks:
        conditions:
        - type: Ready
          status: "True"
  - apiVersion: v1
    resource: persistentvolumeclaims
  hooks:
    sync:
      webhook:
        url: http://catset-controller.metacontroller/sync
        timeout: 10s
Spec
A CompositeController spec has the following fields:
Field | Description |
---|---|
parentResource | A single resource rule specifying the parent resource. |
childResources | A list of resource rules specifying the child resources. |
resyncPeriodSeconds | How often, in seconds, you want every parent object to be resynced (sent to your hook), even if no changes are detected. |
generateSelector | If true, ignore the selector in each parent object and instead generate a unique selector that prevents overlap with other objects. |
hooks | A set of lambda hooks for defining your controller's behavior. |
Parent Resource
The parent resource is the "entry point" for the CompositeController. It should contain the information your controller needs to create children, such as a Pod template if your controller creates Pods. This is often a custom resource that you define (e.g. with CRD), and for which you are now implementing a custom controller.
CompositeController expects to have full control over this resource. That is, you shouldn't define a CompositeController with a parent resource that already has its own controller. See DecoratorController for an API that's better suited for adding behavior to existing resources.
The parentResource rule has the following fields:
Field | Description |
---|---|
apiVersion | The API <group>/<version> of the parent resource, or just <version> for core APIs. (e.g. v1, apps/v1, batch/v1) |
resource | The canonical, lowercase, plural name of the parent resource. (e.g. deployments, replicasets, statefulsets) |
labelSelector | An optional label selector for narrowing down the objects to target. When not set, defaults to all objects. |
revisionHistory | If any child resources use rolling updates, this field specifies how parent revisions are tracked. |
ignoreStatusChanges | An optional field through which status changes can be ignored for reconciliation. If set to true, only spec changes or labels/annotations changes will reconcile the parent resource. |
Label Selector
Kubernetes APIs use labels and selectors to define subsets of objects, such as the Pods managed by a given ReplicaSet.
The parent resource of a CompositeController is assumed to have a spec.selector that matches the form of spec.selector in built-in resources like Deployment and StatefulSet (with matchLabels and/or matchExpressions).
If the parent object doesn't have this field, or it can't be parsed in the expected label selector format, the sync hook for that parent will fail, unless you are using selector generation.
The parent's label selector determines which child objects a given parent will try to manage, according to the ControllerRef rules. Metacontroller automatically handles orphaning and adoption for you, and will only send you the observed states of children you own.
These rules imply:
- Children you create must have labels that satisfy the parent's selector, or else they will be immediately orphaned and you'll never see them again.
- If other controllers or users create orphaned objects that match the parent's selector, Metacontroller will try to adopt them for you.
- If Metacontroller adopts an object, and you subsequently decline to list that object in your desired list of children, it will get deleted (because you now own it, but said you don't want it).
To avoid confusion, it's therefore important that users of your custom controller specify a spec.selector (on each parent object) that is sufficiently precise to discriminate its child objects from those of other parents in the same namespace.
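To make the matching rule concrete, here is a minimal Python sketch of how a selector with matchLabels and matchExpressions is evaluated against a child's labels. It illustrates standard Kubernetes label selector semantics and is not Metacontroller's own code; the function name matches_selector is made up for this example.

```python
def matches_selector(selector, labels):
    """Return True if `labels` satisfy a Kubernetes-style label selector."""
    # matchLabels: every key/value pair must be present on the object.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: all expressions are ANDed together.
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and labels.get(key) in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True
```

A child whose labels fail this test for its parent's selector would be orphaned, which is why the children you return should carry labels that satisfy the selector.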
Revision History
Within the parentResource rule, the revisionHistory field has the following subfields:
Field | Description |
---|---|
fieldPaths | A list of field path strings (e.g. spec.template) specifying which parent fields trigger rolling updates of children (for any child resources that use rolling updates). Changes to other parent fields (e.g. spec.replicas) apply immediately. Defaults to ["spec"], meaning any change in the parent's spec triggers a rolling update. |
Child Resources
This list should contain a rule for every type of child resource that your controller creates on behalf of each parent.
Each entry in the childResources list has the following fields:
Field | Description |
---|---|
apiVersion | The API group/version of the child resource, or just version for core APIs. (e.g. v1, apps/v1, batch/v1) |
resource | The canonical, lowercase, plural name of the child resource. (e.g. deployments, replicasets, statefulsets) |
updateStrategy | An optional field that specifies how to update children when they already exist but don't match your desired state. If no update strategy is specified, children of that type will never be updated if they already exist. |
Child Update Strategy
Within each rule in the childResources list, the updateStrategy field has the following subfields:
Field | Description |
---|---|
method | A string indicating the overall method that should be used for updating this type of child resource. The default is OnDelete, which means don't try to update children that already exist. |
statusChecks | If any rolling update method is selected, children that have already been updated must pass these status checks before the rollout will continue, please also read this section |
Child Update Methods
Within each child resource's updateStrategy, the method field can have these values:
Method | Description |
---|---|
OnDelete | Don't update existing children unless they get deleted by some other agent. |
Recreate | Immediately delete any children that differ from the desired state, and recreate them in the desired state. |
InPlace | Immediately update any children that differ from the desired state. |
RollingRecreate | Delete each child that differs from the desired state, one at a time, and recreate each child before moving on to the next one. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks. |
RollingInPlace | Update each child that differs from the desired state, one at a time. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks. |
Child Update Status Checks
Within each updateStrategy, the statusChecks field has the following subfields:
Field | Description |
---|---|
conditions | A list of status condition checks that must all pass on already-updated children for the rollout to continue. |
Status Condition Check
Within a set of statusChecks, each item in the conditions list has the following subfields:
Field | Description |
---|---|
type | A string specifying the status condition type to check. |
status | A string specifying the required status of the given status condition. If none is specified, the condition's status is not checked. |
reason | A string specifying the required reason of the given status condition. If none is specified, the condition's reason is not checked. |
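As an illustration of the semantics in the table above, the following Python sketch evaluates a list of condition checks against already-updated children. It is not Metacontroller's implementation; the function names are hypothetical, and treating a child that lacks the condition type as failing is an assumption made for this example.

```python
def condition_passes(check, child):
    """Return True if `child` has a status condition satisfying `check`."""
    for cond in child.get("status", {}).get("conditions", []):
        if cond.get("type") != check["type"]:
            continue
        # An unset `status` or `reason` in the check means that field is not checked.
        if check.get("status") is not None and cond.get("status") != check["status"]:
            return False
        if check.get("reason") is not None and cond.get("reason") != check["reason"]:
            return False
        return True
    # Assumption: a child with no condition of this type does not pass.
    return False


def rollout_may_continue(checks, updated_children):
    """All checks must pass on every already-updated child."""
    return all(
        condition_passes(check, child)
        for child in updated_children
        for check in checks
    )
```

With the check from the example at the top of this page (type Ready, status "True"), a rollout would pause as soon as one already-updated Pod reports Ready as anything other than "True".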
Resync Period
By default, your sync hook will only be called when something changes in one of the resources you're watching, or when the local cache is flushed.
Sometimes you may want to sync periodically even if nothing has changed in the Kubernetes API objects, either to simply observe the passage of time, or because your hook takes external state into account. For example, CronJob uses a periodic resync to check whether it's time to start a new Job.
The resyncPeriodSeconds value specifies how often to do this. Each time it triggers, Metacontroller will send sync hook requests for all objects of the parent resource type, with the latest observed values of all the necessary objects.
Note that these objects will be retrieved from Metacontroller's local cache (kept up-to-date through watches), so adding a resync shouldn't add more load on the API server, unless you actually change objects. For example, it's relatively cheap to use this setting to poll until it's time to trigger some change, as long as most sync calls result in a no-op (no CRUD operations needed to achieve desired state).
Generate Selector
Usually, each parent object managed by a CompositeController must have its own user-specified label selector, just like each Deployment has its own label selector in spec.selector.
However, sometimes it makes more sense to let the user of your API pretend there
are no labels or label selectors.
For example, the built-in Job API doesn't make you specify labels for your Pods, and you can leave spec.selector unset.
Because each Job object represents a unique invocation at a point in time,
you wouldn't expect a newly-created Job to be satisfied by finding a
pre-existing Pod that just happens to have the right labels.
On the other hand, a ReplicaSet assumes all Pods that match its selector are
interchangeable, so it would be happy to have one less replica it has to create.
If you set spec.generateSelector to true in your CompositeController definition, Metacontroller will do the following:
- When creating children for you, Metacontroller will automatically add a label that points to the parent object's unique ID (metadata.uid).
- Metacontroller will not expect each parent object to contain a spec.selector, and will ignore the value even if one is set.
- Metacontroller will manage children as if each parent object had an "imaginary" label selector that points to the unique ID label that Metacontroller added to all your children.
The end result is that you and the users of your API don't have to think about
labels or selectors, similar to the Job API.
The downside is that your API won't support all the same capabilities as
built-in APIs.
For example, with ReplicaSet or StatefulSet, you can delete the controller with kubectl delete --cascade=false to keep the Pods around, and later create a new controller with the same selector to adopt those existing Pods instead of making new ones from scratch.
Hooks
Within the CompositeController spec, the hooks field has the following subfields:
Field | Description |
---|---|
sync | Specifies how to call your sync hook, if any. |
finalize | Specifies how to call your finalize hook, if any. |
customize | Specifies how to call your customize hook, if any. |
Each field of hooks contains subfields that specify how to invoke that hook, such as by sending a request to a webhook.
Sync Hook
The sync hook is how you specify which children to create/maintain for a given parent -- in other words, your desired state.
Based on the CompositeController spec, Metacontroller gathers up all the resources you said you need to decide on the desired state, and sends you their latest observed states.
After you return your desired state, Metacontroller begins to take action to converge towards it -- creating, deleting, and updating objects as appropriate.
A simple way to think about your sync hook implementation is like a script that generates JSON to be sent to kubectl apply. However, unlike a one-off client-side generator, your script has access to the latest observed state in the cluster, and will automatically get called any time that observed state changes.
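A sync webhook can be quite small. The following is a minimal sketch using only the Python standard library, assuming a CatSet-like parent with hypothetical spec.replicas, spec.selector.matchLabels, and spec.template.spec fields and Pods as the only child resource. The request and response fields it uses are described in the sections below; the helper names sync, SyncHandler, and serve are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def sync(parent, children):
    """Compute the response for one parent (hypothetical CatSet-like API)."""
    name = parent["metadata"]["name"]
    labels = parent["spec"]["selector"]["matchLabels"]
    replicas = parent["spec"].get("replicas", 1)
    desired = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            # Labels must satisfy the parent's selector or the Pod is orphaned.
            "metadata": {"name": f"{name}-{i}", "labels": labels},
            "spec": parent["spec"]["template"]["spec"],
        }
        for i in range(replicas)
    ]
    # Status reports what was observed, not what is desired.
    status = {"replicas": len(children["Pod.v1"])}
    return {"status": status, "children": desired}


class SyncHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(sync(request["parent"], request["children"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the webhook server; the port is an arbitrary choice for this sketch."""
    HTTPServer(("", port), SyncHandler).serve_forever()
```

Calling serve() starts the server. Keeping sync as a pure function of the request makes it easy to test without a cluster.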
Sync Hook Request
A separate request will be sent for each parent object, so your hook only needs to think about one parent at a time.
The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:
Field | Description |
---|---|
controller | The whole CompositeController object, like what you might get from kubectl get compositecontroller <name> -o json. |
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json. |
children | An associative array of child objects that already exist. |
related | An associative array of related objects that exist, if a customize hook was specified. See the customize hook. |
finalizing | This is always false for the sync hook. See the finalize hook for details. |
Each field of the children object represents one of the types of child resources you specified in your CompositeController spec. The field name for each child type is <Kind>.<apiVersion>, where <apiVersion> could be just <version> (for a core resource) or <group>/<version>, just like you'd write in a YAML file. For example, the field name for Pods would be Pod.v1, while the field name for StatefulSets might be StatefulSet.apps/v1.
For resources that exist in multiple versions, the apiVersion you specify in the child resource rule is the one you'll be sent. Metacontroller requires you to be explicit about the version you expect because it does conversion for you as needed, so your hook doesn't need to know how to convert between different versions of a given resource.
Within each child type (e.g. in children['Pod.v1']), there is another associative array that maps from the child's path relative to the parent to the JSON representation, like what you might get from kubectl get <child-resource> <child-name> -o json.
If the parent and child are of the same scope - both cluster or both namespace - then the key is only the child's .metadata.name. If the parent is cluster scoped and the child is namespace scoped, then the key will be of the form {.metadata.namespace}/{.metadata.name}. This is to disambiguate between two children with the same name in different namespaces. A parent may never be namespace scoped while a child is cluster scoped.
For example, a Pod named my-pod in the my-namespace namespace could be accessed as follows if the parent is also in my-namespace:
request.children['Pod.v1']['my-pod']
Alternatively, if the parent resource is cluster scoped, the Pod could be accessed as:
request.children['Pod.v1']['my-namespace/my-pod']
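The keying rule can be captured in a small helper. This Python sketch assumes that scope can be inferred from whether metadata.namespace is set on the object; the names child_key and find_child are hypothetical.

```python
def child_key(parent, child):
    """Key under which `child` appears in request['children'][<Kind>.<apiVersion>]."""
    parent_ns = parent["metadata"].get("namespace")
    child_ns = child["metadata"].get("namespace")
    if not parent_ns and child_ns:
        # Cluster-scoped parent, namespaced child: qualify with the namespace.
        return f"{child_ns}/{child['metadata']['name']}"
    # Same scope: the name alone is unambiguous.
    return child["metadata"]["name"]


def find_child(request, kind, api_version, child):
    """Look up the observed copy of `child`, or None if it was not observed."""
    group = request["children"].get(f"{kind}.{api_version}", {})
    return group.get(child_key(request["parent"], child))
```

A helper like this lets the same hook code work whether the parent resource is namespaced or cluster scoped.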
Note that you will only be sent children that you "own" according to the ControllerRef rules. That means, for a given parent object, you will only see children whose labels match the parent's label selector, and that don't belong to any other parent.
There will always be an entry in children for every child resource rule, even if no children of that type were observed at the time of the sync. For example, if you listed Pods as a child resource rule, but no existing Pods matched the parent's selector, you will receive:
{
  "children": {
    "Pod.v1": {}
  }
}
as opposed to:
{
  "children": {}
}
Related resources, represented under the related field, are present in the same form as children, but represent the resources matching the customize hook response for the given parent object. Those objects are not managed by the controller and therefore cannot be modified, but you can use them to compute the desired children.
Some existing examples implementing this approach are:
- ConfigMapPropagation - makes a copy of a given ConfigMap in several namespaces.
- GlobalConfigMap - makes a copy of a given ConfigMap in every namespace.
- SecretPropagation - makes a copy of a given Secret in each namespace satisfying a label selector.
Note that when a related resource is updated, the sync hook is triggered again (even if the parent object and children have not changed), so you can recalculate the desired children from a fresh view of the related objects.
Sync Hook Response
The body of your response should be a JSON object with the following fields:
Field | Description |
---|---|
status | A JSON object that will completely replace the status field within the parent object. |
children | A list of JSON objects representing all the desired children for this parent object. |
resyncAfterSeconds | Set the delay (in seconds, as a float) before an optional, one-time, per-object resync. |
What you put in status is up to you, but usually it's best to follow conventions established by controllers like Deployment. You should compute status based only on the children that existed when your hook was called; status represents a report on the last observed state, not the new desired state.
The children field should contain a flat list of objects, not an associative array. Metacontroller groups the objects it sends you by type and name as a convenience to simplify your scripts, but it's actually redundant since each object contains its own apiVersion, kind, and metadata.name.
It's important to include the apiVersion and kind in objects you return, and also to ensure that you list every type of child resource you plan to create in the CompositeController spec.
If the parent resource is cluster scoped and the child resource is namespaced, it's important to include the .metadata.namespace since the namespace cannot be inferred from the parent's namespace.
Any objects sent as children in the request that you decline to return in your response list will be deleted. However, you shouldn't directly copy children from the request into the response because they're in different forms.
Instead, you should think of each entry in the list of children as being sent to kubectl apply. That is, you should set only the fields that you care about.
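The sketch below shows one way to build such a response in Python. It assumes a cluster-scoped parent with hypothetical spec.targetNamespaces and spec.data fields, ConfigMaps as the only child resource, and generateSelector set to true so that no labels need to be managed by hand.

```python
def sync(request):
    parent = request["parent"]
    # Observed children, keyed by "<namespace>/<name>" for a cluster-scoped parent.
    observed = request["children"]["ConfigMap.v1"]

    # Build each desired child from scratch, setting only the fields we care
    # about, rather than copying observed objects from the request.
    desired = [
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {
                "name": parent["metadata"]["name"],
                # Required here: it cannot be inferred from a cluster-scoped parent.
                "namespace": namespace,
            },
            "data": parent["spec"]["data"],
        }
        for namespace in parent["spec"]["targetNamespaces"]
    ]

    # Status reports the last observed state, not the new desired state.
    status = {"observedCopies": len(observed)}
    return {"status": status, "children": desired}
```

Removing a namespace from the hypothetical spec.targetNamespaces list drops that ConfigMap from the returned list, so the observed copy in that namespace would be deleted.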
You can optionally set resyncAfterSeconds to a value greater than 0 to request that the sync hook be called again with this particular parent object after some delay (specified in seconds, with decimal fractions allowed). Unlike the controller-wide resyncPeriodSeconds, this is a one-time request (not a request to start periodic resyncs), although you can always return another resyncAfterSeconds value from subsequent sync calls. Also unlike the controller-wide setting, this request only applies to the particular parent object that this sync call sent, so you can request different delays (or omit the request) depending on the state of each object.
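As a sketch of a per-object delay, the Python function below waits until a start time stored in a hypothetical spec.startAtEpochSeconds field on the parent, and only then asks for a Pod. The now parameter exists only to make the function testable, and the example assumes generateSelector is true so that no labels are needed.

```python
import time


def sync(request, now=None):
    now = time.time() if now is None else now
    parent = request["parent"]
    start_at = parent["spec"]["startAtEpochSeconds"]  # hypothetical field

    response = {
        "status": {"observedPods": len(request["children"]["Pod.v1"])},
        "children": [],
    }
    if now < start_at:
        # Not due yet: ask for a one-time resync of this parent only.
        response["resyncAfterSeconds"] = float(start_at - now)
    else:
        response["children"].append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": parent["metadata"]["name"]},
            "spec": parent["spec"]["template"]["spec"],
        })
    return response
```

Because the delay is computed per parent, objects with different start times each get their own wake-up call, and parents that are already running request none.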
Note that your webhook handler must return a response with a status code of 200
to be considered successful. Metacontroller will wait for a response for up to the
amount defined in the Webhook spec.
Finalize Hook
If the finalize hook is defined, Metacontroller will add a finalizer to the parent object, which will prevent it from being deleted until your hook has had a chance to run and the response indicates that you're done cleaning up. This is useful for doing ordered teardown of children, or for cleaning up resources you may have created in an external system.
If you don't define a finalize hook, then when a parent object is deleted, the garbage collector will delete all your children immediately, and no hooks will be called.
The semantics of the finalize hook are mostly equivalent to those of the sync hook. Metacontroller will attempt to reconcile the desired states you return in the children field, and will set status on the parent. The main difference is that finalize will be called instead of sync when it's time to clean up because the parent object is pending deletion.
Note that, just like sync, your finalize handler must be idempotent. Metacontroller might call your hook multiple times as the observed state changes, possibly even after you first indicate that you're done finalizing. Your handler should know how to check what still needs to be done and report success if there's nothing left to do.
Both sync and finalize have a request field called finalizing that indicates which hook was actually called. This lets you implement finalize either as a separate handler or as a check within your sync handler, depending on how much logic they share. To use the same handler for both, just define a finalize hook and set it to the same value as your sync hook.
Finalize Hook Request
The finalize hook request has all the same fields as the sync hook request, with the following changes:
Field | Description |
---|---|
finalizing | This is always true for the finalize hook. See the finalize hook for details. |
If you share the same handler for both sync and finalize, you can use the finalizing field to tell whether it's time to clean up or whether it's a normal sync.
If you define a separate handler just for finalize, there's no need to check the finalizing field since it will always be true.
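A shared handler might branch on the finalizing field as in the following Python sketch. The desired_children helper and the single-Pod layout are hypothetical, and the cleanup shown simply asks for all children to be removed.

```python
def desired_children(parent):
    # Hypothetical desired state: one Pod per parent.
    return [{
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": parent["metadata"]["name"]},
        "spec": parent["spec"]["template"]["spec"],
    }]


def handle(request):
    """Single entry point configured as both the sync and the finalize hook."""
    observed = sum(len(group) for group in request["children"].values())
    status = {"observedChildren": observed}
    if request["finalizing"]:
        # Cleaning up: want no children, and report done only once none are observed.
        return {"status": status, "children": [], "finalized": observed == 0}
    return {"status": status, "children": desired_children(request["parent"])}
```

The status computation is shared between both paths, which is the kind of overlap that makes a single handler convenient.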
Finalize Hook Response
The finalize hook response has all the same fields as the sync hook response, with the following additions:
Field | Description |
---|---|
finalized | A boolean indicating whether you are done finalizing. |
To perform ordered teardown, you can generate children just like you would for sync, but omit some children from the desired state depending on the observed set of children that are left. For example, if you observe [A,B,C], generate only [A,B] as your desired state; if you observe [A,B], generate only [A]; if you observe [A], return an empty desired list [].
Once the observed state passed in with the finalize request meets all your criteria (e.g. no more children were observed), and you have checked all other criteria (e.g. no corresponding external resource exists), return true for the finalized field in your response.
Note that you should not return finalized: true the first time you return a desired state that you consider "final", since there's no guarantee that your desired state will be reached immediately. Instead, you should wait until the observed state matches what you want.
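The ordered teardown and the rule about when to report finalized can be sketched in Python as follows. The child names A, B, and C mirror the example above; TEARDOWN_ORDER and the use of the parent's spec.template.spec for the Pods that are kept are assumptions made for illustration.

```python
TEARDOWN_ORDER = ["C", "B", "A"]  # hypothetical child names; the first listed goes first


def finalize(request):
    observed = request["children"]["Pod.v1"]  # maps name -> observed object
    remaining = [name for name in TEARDOWN_ORDER if name in observed]
    # Omit only the next child in line; keep asserting the others.
    keep = remaining[1:]
    pod_spec = request["parent"]["spec"]["template"]["spec"]
    desired = [
        {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": name}, "spec": pod_spec}
        for name in keep
    ]
    return {
        "status": {"remaining": len(observed)},
        "children": desired,
        # Done only when nothing is observed, not when the desired list is empty.
        "finalized": len(observed) == 0,
    }
```

When only A is observed, the desired list is already empty but finalized stays false; it becomes true on a later call that observes no children at all.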
If the observed state passed in with the request doesn't meet your criteria, you can return a successful response (HTTP code 200) with finalized: false, and Metacontroller will call your hook again automatically if anything changes in the observed state.
If the only thing you're still waiting for is a state change in an external system, and you don't need to assert any new desired state for your children, returning success from the finalize hook may mean that Metacontroller doesn't call your hook again until the next periodic resync. To reduce the delay, you can request a one-time, per-object resync by setting resyncAfterSeconds in your hook response, giving you a chance to recheck the external state without holding up a slot in the work queue.
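A finalize handler that waits on an external system might look like the Python sketch below. The external_resource_exists callback stands in for whatever check your external system needs, and the 5 second delay is an arbitrary value chosen for illustration.

```python
def finalize(request, external_resource_exists):
    children_left = any(request["children"].values())
    external_left = external_resource_exists(request["parent"])
    response = {
        "status": {"externalCleanupPending": external_left},
        "children": [],
        "finalized": not children_left and not external_left,
    }
    if external_left and not children_left:
        # Nothing in the cluster will change to trigger another call, so ask
        # for a one-time recheck instead of waiting for a periodic resync.
        response["resyncAfterSeconds"] = 5.0
    return response
```

While children are still being deleted, changes in the observed state trigger the next call, so the extra resync is only requested once the cluster side is clean.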