Introduction
Metacontroller is an add-on for Kubernetes that makes it easy to write and deploy custom controllers. Although the open-source project was started at Google, the add-on works the same in any Kubernetes cluster.
While custom resources provide storage for new types of objects, custom controllers define the behavior of a new extension to the Kubernetes API. Just like the CustomResourceDefinition (CRD) API makes it easy to request storage for a custom resource, the Metacontroller APIs make it easy to define behavior for a new extension API or add custom behavior to existing APIs.
Simple Automation
Kubernetes provides a lot of powerful automation through its built-in APIs, but sometimes you just want to tweak one little thing or add a bit of logic on top. With Metacontroller, you can write and deploy new level-triggered API logic in minutes.
The code for your custom controller could be as simple as this example in Jsonnet that adds a label to Pods:
// This example is written in Jsonnet (a JSON templating language),
// but you can write hooks in any language.
function(request) {
local pod = request.object,
local labelKey = pod.metadata.annotations["pod-name-label"],
// Inject the Pod name as a label with the key requested in the annotation.
labels: {
[labelKey]: pod.metadata.name
}
}
Since all you need to provide is a webhook that understands JSON, you can use any programming language, often without any dependencies beyond the standard library. The code above is not a snippet; it's the entire script.
You can quickly deploy your code through any FaaS platform that offers HTTP(S) endpoints, or just load your script into a ConfigMap and launch a simple HTTP server to run it:
kubectl create configmap service-per-pod-hooks -n metacontroller --from-file=hooks
Finally, you declaratively specify how your script interacts with the Kubernetes API, which is analogous to writing a CustomResourceDefinition (to specify how to store objects):
apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
name: pod-name-label
spec:
resources:
- apiVersion: v1
resource: pods
annotationSelector:
matchExpressions:
- {key: pod-name-label, operator: Exists}
hooks:
sync:
webhook:
url: http://service-per-pod.metacontroller/sync-pod-name-label
This declarative specification means that your code never has to talk to the Kubernetes API, so you don't need to import any Kubernetes client library nor depend on any code provided by Kubernetes. You merely receive JSON describing the observed state of the world and return JSON describing your desired state.
Metacontroller remotely handles all interaction with the Kubernetes API. It runs a level-triggered reconciliation loop on your behalf, much the way CRD provides a declarative interface to request that the API Server store objects on your behalf.
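To make the webhook contract concrete, here is a minimal sketch of such a hook in Python, assuming a CompositeController whose desired state is a single ConfigMap per parent; the parent field spec.message and the names used here are hypothetical, not part of the example above:

# Minimal sketch of a sync webhook (hypothetical CompositeController whose
# parent has spec.message and whose only child is a ConfigMap).
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class SyncHook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Metacontroller POSTs the observed state (parent + observed children) as JSON.
        observed = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        parent = observed["parent"]

        # Return the desired state: a status and the full list of desired children.
        desired = {
            "status": {"configMaps": 1},
            "children": [{
                "apiVersion": "v1",
                "kind": "ConfigMap",
                "metadata": {"name": parent["metadata"]["name"] + "-config"},
                "data": {"message": parent.get("spec", {}).get("message", "")},
            }],
        }
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(desired).encode())

HTTPServer(("", 8080), SyncHook).serve_forever()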
Reusable Building Blocks
In addition to making ad hoc automation simple, Metacontroller also makes it easier to build and compose general-purpose abstractions.
For example, many built-in workload APIs like StatefulSet are almost trivial to reimplement as Metacontroller hooks, meaning you can easily fork and customize such APIs. Feature requests that used to take months to implement in the core Kubernetes repository can be hacked together in an afternoon by anyone who wants them.
You can also compose existing APIs into higher-level abstractions, such as how BlueGreenDeployment builds on top of the ReplicaSet and Service APIs.
Users can even invent new general-purpose APIs like IndexedJob, which is a Job-like API that provides unique Pod identities like StatefulSet.
Complex Orchestration
Extension APIs implemented with Metacontroller can also build on top of other extension APIs that are themselves implemented with Metacontroller. This pattern can be used to compose complex orchestration out of simple building blocks that each do one thing well.
For example, the Vitess Operator is implemented entirely as Jsonnet webhooks with Metacontroller. The end result is much more complex than ad hoc automation or even general-purpose workload abstractions, but the key is that this complexity arises solely from the challenge of orchestrating Vitess, a distributed MySQL clustering system.
Building Operators with Metacontroller frees developers from learning the internal machinery of implementing Kubernetes controllers and APIs, allowing them to focus on solving problems in the application domain. It also means they can take advantage of existing API machinery like shared caches without having to write their Operators in Go.
Metacontroller's webhook APIs are designed to make it feel like you're writing a one-shot, client-side generator that spits out JSON that gets piped to kubectl apply.
In other words, if you already know how to manually manage an application in Kubernetes with kubectl, Metacontroller lets you write automation for that app without having to learn a new language or how to use Kubernetes client libraries.
Get Started
- Install Metacontroller
- Learn concepts
- See examples
- Create a controller
- Give feedback by filing GitHub issues.
- Contribute!
Examples
This page lists some examples of what you can make with Metacontroller.
If you'd like to add a link to another example that demonstrates a new language or technique, please send a pull request against this document.
CompositeController
CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object. Workload controllers like Deployment and StatefulSet are examples of existing controllers that fit this pattern.
CatSet (JavaScript)
CatSet is a rewrite of StatefulSet, including rolling updates, as a CompositeController. It shows that existing workload controllers already use a pattern that could fit within a CompositeController, namely managing child objects based on a parent spec.
BlueGreenDeployment (JavaScript)
BlueGreenDeployment is an alternative to Deployment that implements a Blue-Green rollout strategy. It shows how CompositeController can be used to add various automation on top of built-in APIs like ReplicaSet.
IndexedJob (Python)
IndexedJob is an alternative to Job that gives each Pod a unique index, like StatefulSet. It shows how to write a CompositeController in Python, and also demonstrates selector generation.
DecoratorController
DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which resources to watch, as well as filters on labels and annotations.
For each object you watch, you can add, edit, or remove labels and annotations, as well as create new objects and attach them. Unlike CompositeController, these new objects don't have to match the main object's label selector. Since they're attached to the main object, they'll be cleaned up automatically when the main object is deleted.
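As a rough sketch (not taken from the examples below), a DecoratorController sync hook that decorates each watched Pod with a label and attaches a Service might return something like this; the exact attachment shown here is hypothetical:

# Sketch of a DecoratorController sync hook body: add a label to the watched
# object and attach one extra object to it. The Service below is hypothetical;
# see the Service Per Pod example for a complete, working version.
def sync(request):
    pod = request["object"]  # the object being decorated
    name = pod["metadata"]["name"]

    return {
        # Labels/annotations to add to the watched object itself.
        "labels": {"decorated": "true"},
        # Extra objects to create and attach to the watched object.
        "attachments": [{
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name},
            "spec": {
                "selector": {"statefulset.kubernetes.io/pod-name": name},
                "ports": [{"port": 80}],
            },
        }],
    }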
Service Per Pod (Jsonnet)
Service Per Pod is an example DecoratorController that creates an individual Service for every Pod in a StatefulSet (e.g. to give them static IPs), effectively adding new behavior to StatefulSet without having to reimplement it.
Customize hook examples
The customize hook is an addition to the Composite/Decorator controllers, extending the information given to the sync hook with other objects (called related) in addition to the parent.
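A rough sketch of such a customize hook in Python is below; the relatedResources response format and the spec.sourceName / spec.sourceNamespace fields mirror the propagation examples on this page and should be treated as assumptions rather than a definitive reference:

# Sketch of a customize hook: given the parent, tell Metacontroller which
# related objects should also be sent to the sync hook. Field names here
# (relatedResources, names, namespace) are assumptions; check the API reference.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class CustomizeHook(BaseHTTPRequestHandler):
    def do_POST(self):
        request = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        parent = request["parent"]

        # Ask for the source ConfigMap named in the parent's spec.
        response = {
            "relatedResources": [{
                "apiVersion": "v1",
                "resource": "configmaps",
                "namespace": parent["spec"]["sourceNamespace"],
                "names": [parent["spec"]["sourceName"]],
            }],
        }
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(response).encode())

HTTPServer(("", 8080), CustomizeHook).serve_forever()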
ConfigMapPropagation
ConfigMapPropagation is a simple mechanism that propagates a given ConfigMap to other namespaces, as specified in the given object; the source ConfigMap is also specified there.
This is also an example of how the Status subresource should be handled.
Global Config Map
Global Config Map is similar to ConfigMapPropagation, but propagates a ConfigMap to all namespaces.
Secret propagation
Secret propagation is a modification of the ConfigMapPropagation concept, using a label selector on Namespace objects to choose where to propagate a Secret.
Concepts
This page provides some background on terms that are used throughout the Metacontroller documentation.
Kubernetes Concepts
These are some of the general Kubernetes Concepts that are particularly relevant to Metacontroller.
Resource
In the context of the Kubernetes API, a resource is a REST-style collection of API objects. When writing controllers, it's important to understand the following terminology.
Resource Name
There are many ways to refer to a resource. For example, you may have noticed that you can fetch ReplicaSets with any of the following commands:
kubectl get rs # short name
kubectl get replicaset # singular name
kubectl get replicasets # plural name
When writing controllers, it's important to note that the plural name is the canonical form when interacting with the REST API (it's in the URL) and API discovery (entries are keyed by plural name).
So, whenever Metacontroller asks for a resource name, you should use the canonical, lowercase, plural form (e.g. replicasets).
API Group
Each resource lives inside a particular API group, which helps different API authors avoid name conflicts. For example, you can have two resources with the same name as long as they are in different API groups.
API Version
Each API group has one or more available API versions. It's important to note that Kubernetes API versions are format versions. That is, each version is a different lens through which you can view objects in the collection, but you'll see the same set of underlying objects no matter which lens you view them through.
The API group and version are often combined in the form <group>/<version>, such as in the apiVersion field of an API object. APIs in the core group (like Pod) omit the group name in such cases, specifying only <version>.
API Kind
Whereas a resource is a collection of objects served at a particular REST path, the kind of a resource represents something like the type or class of those objects.
Since Kubernetes resources and kinds must have a 1-to-1 correspondence within a given API group, the resource name and kind are often used interchangeably in Kubernetes documentation. However, it's important to distinguish the resource and kind when writing controllers.
The kind is often the same as the singular resource name, except that it's written in UpperCamelCase. This is the form that you use when writing JSON or YAML manifests, and so it's also the form you should use when generating objects within a lambda hook:
apiVersion: apps/v1
kind: ReplicaSet
[...]
Custom Resource
A custom resource is any resource that's installed through dynamic API registration (either through CRD or aggregation), rather than by being compiled directly into the Kubernetes API server.
Controller
Distributed components in the Kubernetes control plane communicate with each other by posting records in a shared datastore (like a public message board), rather than sending direct messages (like email).
This design helps avoid silos of information. All participants can see what everyone is saying to everyone else, so each participant can easily access whatever information it needs to make the best decision, even as those needs change. The lack of silos also means extensions have the same power as built-in features.
In the context of the Kubernetes control plane, a controller is a long-running, automated, autonomous agent that participates in the control plane via this shared datastore (the Kubernetes API server). In the message board analogy, you can think of controllers like bots.
A given controller might participate by:
- observing objects in the API server as inputs and creating or updating other objects in the API server as outputs (e.g. creating Pods for a ReplicaSet);
- observing objects in the API server as inputs and taking action in some other domain (e.g. spawning containers for a Pod);
- creating or updating objects in the API server to report observations from some other domain (e.g. "the container is running");
- or any combination of the above.
Custom Controller
A custom controller is any controller that can be installed, upgraded, and removed in a running cluster, independently of the cluster's own lifecycle.
Metacontroller Concepts
These are some concepts that are specific to Metacontroller.
Metacontroller
Metacontroller is a server that extends Kubernetes with APIs that encapsulate the common parts of writing custom controllers.
Just like kube-controller-manager, this server hosts multiple controllers. However, the set of hosted controllers changes dynamically in response to updates in objects of the Metacontroller API types. Metacontroller is thus itself a controller that watches the Metacontroller API objects and launches hosted controllers in response. In other words, it's a controller-controller -- hence the name.
Lambda Controller
When you create a controller with one of the Metacontroller APIs, you provide a function that contains only the business logic specific to your controller. Since these functions are called via webhooks, you can write them in any language that can understand HTTP and JSON, and optionally host them with a Functions-as-a-Service provider.
The Metacontroller server then executes a control loop on your behalf, calling your function whenever necessary to decide what to do.
These callback-based controllers are called lambda controllers. To keep the interface as simple as possible, each lambda controller API targets a specific controller pattern, such as:
- CompositeController: objects composed of other objects
- DecoratorController: attach new behavior to existing objects
Support for other types of controller patterns will be added in the future, such as coordinating between Kubernetes API objects and external state in another domain.
Lambda Hook
Each lambda controller API defines a set of hooks, which it calls to let you implement your business logic.
Currently, these lambda hooks must be implemented as webhooks, but other mechanisms could be added in the future, such as gRPC or embedded scripting languages.
Features
This is a high-level overview of what Metacontroller provides for Kubernetes controller authors.
- Dynamic Scripting
- Controller Best Practices
- Declarative Watches
- Declarative Reconciliation
- Declarative Declarative Rolling Update
Dynamic Scripting
With Metacontroller's hook-based design, you can write controllers in any language while still taking advantage of the efficient machinery we developed in Go for core controllers.
This makes Metacontroller especially useful for rapid development of automation in dynamic scripting languages like Python or JavaScript, although you're also free to use statically-typed languages like Go or Java.
To support fast ramp-up and iteration on your ideas, Metacontroller makes it possible to write controllers with:
- No schema/IDL
- No generated code
- No library dependencies
- No container image build/push
Controller Best Practices
Controllers you write with Metacontroller automatically behave like first-class citizens out of the box, before you write any code.
All interaction with the Kubernetes API happens inside the Metacontroller server in response to your instructions. This allows Metacontroller to implement best practices learned from writing core controllers without polluting your business logic.
Even the simplest Hello, World example with Metacontroller already takes care of:
- Label selectors (for defining flexible collections of objects)
- Orphan/adopt semantics (controller reference)
- Garbage collection (owner references for automatic cleanup)
- Watches (for low latency)
- Caching (shared informers/reflectors/listers)
- Work queues (deduplicated parallelism)
- Optimistic concurrency (resource version)
- Retries with exponential backoff
- Periodic relist/resync
Declarative Watches
Rather than writing boilerplate code for each type of resource you want to watch, you simply list those resources declaratively:
childResources:
- apiVersion: v1
resource: pods
- apiVersion: v1
resource: persistentvolumeclaims
Behind the scenes, Metacontroller sets up watch streams that are shared across all controllers that use Metacontroller.
That means, for example, that you can create as many lambda controllers as you want that watch Pods, and the API server will only need to send one Pod watch stream (to Metacontroller itself).
Metacontroller then acts like a demultiplexer, determining which controllers will care about a given event in the stream and triggering their hooks only as needed.
Declarative Reconciliation
A large part of the expressiveness of the Kubernetes API is due to its focus on declarative management of cluster state, which lets you directly specify an end state without specifying how to get there. Metacontroller expands on this philosophy, allowing you to define controllers in terms of what they want without specifying how to get there.
Instead of thinking about imperative operations like create/read/update/delete, you just generate a list of all the things you want to exist. Based on the current cluster state, Metacontroller will then determine what actions are required to move the cluster towards your desired state and maintain it once it's there.
Just like the built-in controllers, the reconciliation that Metacontroller performs for you is level-triggered so it's resilient to downtime (missed events), yet optimized for low latency and low API load through shared watches and caches.
However, the clear separation of deciding what you want (the hook you write) from running a low-latency, level-triggered reconciliation loop (what Metacontroller does for you) means you don't have to think about this.
Declarative Declarative Rolling Update
Another big contributor to the power of Kubernetes APIs like Deployment and StatefulSet is the ability to declaratively specify gradual state transitions. When you update your app's container image or configuration, for example, these controllers will slowly roll out Pods with the new template and automatically pause if things don't look right.
Under the hood, implementing gradual state transitions with level-triggered reconciliation loops involves careful bookkeeping with auxiliary records, which is why StatefulSet originally launched without rolling updates. Metacontroller lets you easily build your own APIs that offer declarative rolling updates without making you think about all this additional bookkeeping.
In fact, Metacontroller provides a declarative interface for configuring how you want to implement declarative rolling updates in your controller (declarative declarative rolling update), so you don't have to write any code to take advantage of this feature.
For example, adding support for rolling updates to a Metacontroller-based rewrite of StatefulSet looks essentially like this:
childResources:
- apiVersion: v1
resource: pods
+ updateStrategy:
+ method: RollingRecreate
+ statusChecks:
+ conditions:
+ - type: Ready
+ status: "True"
For comparison, the corresponding pull request to add rolling updates to StatefulSet itself involved over 9,000 lines of changes to business logic, boilerplate, and generated code.
FAQ
This page answers some common questions encountered while evaluating, setting up, and using Metacontroller.
If you have any questions that aren't answered here, please ask on the mailing list or Slack channel.
- Evaluating Metacontroller
- Setting Up Metacontroller
- Developing with Metacontroller
- Which languages can I write hooks in?
- How do I access the Kubernetes API from my hook?
- Can I call external APIs from my hook?
- How can I make sure external resources get cleaned up?
- Does Metacontroller support "apply" semantics?
- How do I host my hook?
- How can I provide a programmatic client for my API?
- What are the best practices for designing controllers?
- How do I troubleshoot problems?
Evaluating Metacontroller
How does Metacontroller compare with other tools?
See the features page for a list of the things that are most unique about Metacontroller's approach.
In general, Metacontroller aims to make common patterns as simple as possible, without necessarily supporting the full flexibility you would have if you wrote a controller from scratch. The philosophy is analogous to that of CustomResourceDefinition (CRD), where the main API server does all the heavy lifting for you, but you don't have as much control as you would if you wrote your own API server and connected it through aggregation.
Just like CRD, Metacontroller started with a small set of capabilities and is expanding over time to support more customization and more use cases as we gain confidence in the abstractions. Depending on your use case, you may prefer one of the alternative tools that took the opposite approach of first allowing everything and then building "rails" over time to encourage best practices and simplify development.
What is Metacontroller good for?
Metacontroller is intended to be a generic tool for creating many kinds of Kubernetes controllers, but one of its earliest motivating use cases was to simplify development of custom workload automation, so it's particularly well-suited for this.
For example, if you've ever thought, "I wish StatefulSet would do this one thing differently," Metacontroller gives you the tools to define your own custom behavior without reinventing the wheel.
Metacontroller is also well-suited to people who prefer languages other than Go, but still want to benefit from the efficient API machinery that was developed in Go for the core Kubernetes controllers.
Lastly, Metacontroller is good for rapid development of automation on top of APIs that already exist as Kubernetes resources, such as:
- ad hoc scripting ("make an X for every Y")
- configuration abstraction ("when I say A, that means {X,Y,Z}")
- higher-level automation of custom APIs added by Operators
- gluing an external CRUD API into the Kubernetes control plane with a simple translation layer
What is Metacontroller not good for?
Metacontroller is not a good fit when you need to examine a large number of objects to answer a single hook request. For example, if you need to be sent a list of all Pods or all Nodes in order to decide on your desired state, we'd have to call your hook with the full list of all Pods or Nodes any time any one of them changed. However, it might be a good fit if your desired behavior can be naturally broken down into per-Pod or per-Node tasks, since then we'd only need to call your hook with each object that changed.
Metacontroller is also not a good fit for writing controllers that perform long sequences of imperative steps -- for example, a single hook that executes many steps of a workflow by creating various children at the right times. That's because Metacontroller hooks work best when they use a functional style (no side effects, and output depends only on input), which is an awkward style for defining imperative sequences.
Do I have to use CRD?
It's common to use CRD, but Metacontroller doesn't know or care whether a resource is built-in or custom, nor whether it's served by CRD or by an aggregated API server.
Metacontroller uses API discovery and the dynamic client to treat all resources the same, so you can write automation for any type of resource. Using the dynamic client also means Metacontroller doesn't need to be updated when new APIs or fields are added in subsequent Kubernetes releases.
What does the name Metacontroller mean?
The name Metacontroller comes from the English words meta and controller. Metacontroller is a controller controller -- a controller that controls other controllers.
How do you pronounce Metacontroller?
Please see the pronunciation guide.
Setting Up Metacontroller
Do I need to be a cluster admin to install Metacontroller?
Installing Metacontroller requires permission to both install CRDs (representing the Metacontroller APIs themselves) and grant permissions for Metacontroller to access other resources on behalf of the controllers it hosts.
Why is Metacontroller shared cluster-wide?
Metacontroller currently only supports cluster-wide installation because it's modeled after the built-in kube-controller-manager component to achieve the same benefits of sharing watches and caches.
Also, resources in general (either built-in or custom) can only be installed cluster-wide, and a Kubernetes API object is conventionally intended to mean the same thing regardless of what namespace it's in.
Why does Metacontroller need these permissions?
During alpha, Metacontroller simply requests wildcard permission to all resources so the controllers it hosts can access anything they want. For this reason, you should only give trusted users access to the Metacontroller APIs that create hosted controllers.
By contrast, core controllers are restricted to only the minimal set of permissions needed to do their jobs.
Does Metacontroller have to be in its own namespace?
The default installation manifests put Metacontroller in its own namespace
to make it easy to see what's there and clean up if necessary,
but it can run anywhere.
The metacontroller
namespace is also used in examples for similar
convenience reasons, but you can run webhooks in any namespace
or even host them outside the cluster.
Developing with Metacontroller
Which languages can I write hooks in?
You can write lambda hooks (the business logic for your controller) in any language, as long as you can host it as a webhook that accepts and returns JSON. Regardless of which language you use for your business logic, Metacontroller uses the efficient machinery written in Go for the core controllers to interact with the API server on your behalf.
How do I access the Kubernetes API from my hook?
You don't! Or at least, you don't have to, and it's best not to. Instead, you just declare what objects you care about and Metacontroller will send them to you as part of the hook request. Then, your hook should simply return a list of desired objects. Metacontroller will take care of reconciling your desired state.
Can I call external APIs from my hook?
Yes. Your hook code can do whatever it wants as part of computing a response to a Metacontroller hook request, including calling external APIs.
The main thing to be careful of is to avoid synchronously waiting for long-running tasks to finish, since that will hold up one of a fixed number of concurrent slots in the queue of triggers for that hook. Instead, if your hook needs to wait for some condition that's checked through an external API, you should return a status that indicates this pending state, and set a resync period so you get a chance to check the condition again later.
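As a sketch of that pattern (the external API, the check_backup_status helper, and the spec.backupId field are all hypothetical; resyncAfterSeconds is the sync response field used to request a later resync):

# Sketch: don't block the hook on a long-running external task. Report a
# pending status and ask to be resynced later via resyncAfterSeconds.
# The backup API, its URL, and spec.backupId are hypothetical.
import json
import urllib.request

def check_backup_status(backup_id):
    # Quick, non-blocking status check against a hypothetical external API.
    with urllib.request.urlopen("http://backup-api.example/status/%s" % backup_id) as resp:
        return json.load(resp)["state"]

def sync(parent, children):
    state = check_backup_status(parent["spec"]["backupId"])

    if state != "done":
        # Not ready yet: report progress and check again in 30 seconds.
        return {
            "status": {"phase": "Pending", "externalState": state},
            "children": [],
            "resyncAfterSeconds": 30,
        }

    return {"status": {"phase": "Ready"}, "children": []}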
How can I make sure external resources get cleaned up?
If you allocate external resources as part of your hook, you should also implement a finalize hook to make sure you get a chance to clean up those external resources when the Kubernetes API object for which you created them goes away.
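For example, a finalize hook along these lines (the delete_external_backup helper, its URL, and the spec.backupId field are hypothetical) only reports finalized: true once the external cleanup has actually succeeded, so Metacontroller keeps retrying until then:

# Sketch of a finalize hook: clean up a hypothetical external resource and
# report finalized: true only once the cleanup has succeeded.
import urllib.error
import urllib.request

def delete_external_backup(backup_id):
    # Idempotent delete against a hypothetical external API.
    req = urllib.request.Request(
        "http://backup-api.example/backups/%s" % backup_id, method="DELETE")
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        return err.code == 404  # already gone counts as cleaned up

def finalize(parent, children):
    done = delete_external_backup(parent["spec"]["backupId"])

    return {
        "status": {"phase": "Terminating"},
        "children": [],       # desired children while the object is being finalized
        "finalized": done,    # Metacontroller keeps calling finalize until this is true
    }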
Does Metacontroller support "apply" semantics?
Yes, Metacontroller enforces apply semantics, which means your controller will play nicely with other automation as long as you only fill in the fields that you care about in the objects you return.
How do I host my hook?
You can host your lambda hooks with an HTTP server library in your chosen language, with a standalone HTTP server, or with a Functions-as-a-Service platform. See the examples page for approaches in various languages.
How can I provide a programmatic client for my API?
Since Metacontroller uses the dynamic client on your behalf, you can write your controller's business logic without any client library at all. That also means you can write a "dynamically typed" controller without creating static schema (either Kubernetes' Go IDL or OpenAPI) or generating a client.
However, if you want to provide a static client for users of your API, nothing about Metacontroller prevents you from writing Go IDL or OpenAPI and generating a client the same way you would without Metacontroller.
What are the best practices for designing controllers?
Please see the dedicated best practices guide.
How do I troubleshoot problems?
Please see the dedicated troubleshooting guide.
How to pronounce Metacontroller
Metacontroller is pronounced as me-ta-con-trol-ler.
User Guide
This section contains general tips and step-by-step tutorials for using Metacontroller.
See the API Reference for details about all the available options.
Installation
This page describes how to install Metacontroller, either to develop your own controllers or just to run third-party controllers that depend on it.
Create a Controller
This tutorial walks through a simple example of creating a controller in Python with Metacontroller.
Best Practices
This is a collection of recommendations for writing controllers with Metacontroller.
Troubleshooting
This is a collection of tips for debugging controllers written with Metacontroller.
Installation
This page describes how to install Metacontroller, either to develop your own controllers or just to run third-party controllers that depend on it.
- Docker images
- Prerequisites
- Install Metacontroller using Kustomize
- Install Metacontroller using Helm
- Migrating from /GoogleCloudPlatform/metacontroller
Docker images
Images are hosted in two places: Docker Hub and the GitHub Container Registry (ghcr.io).
Feel free to use whichever suits your needs; they are identical. Note: the Helm charts currently use the Docker Hub images.
Prerequisites
- Kubernetes v1.17+ (for maintainability reasons, the e2e test suite might not cover all releases)
- You should have kubectl available and configured to talk to the desired cluster.
Grant yourself cluster-admin (GKE only)
Due to a known issue in GKE, you'll need to first grant yourself cluster-admin privileges before you can install the necessary RBAC manifests.
kubectl create clusterrolebinding <user>-cluster-admin-binding --clusterrole=cluster-admin --user=<user>@<domain>
Replace <user> and <domain> above based on the account you use to authenticate to GKE.
Install Metacontroller using Kustomize
# Apply the set of production resources defined in kustomization.yaml in the `production` directory.
kubectl apply -k https://github.com/metacontroller/metacontroller/manifests/production
If you prefer to build and host your own images, please see the build instructions in the contributor guide.
If your kubectl version does not support the -k flag, please install the resources mentioned in manifests/production/kustomization.yaml one by one manually with the kubectl apply -f {{filename}} command.
Install Metacontroller using Helm
Alternatively, Metacontroller can be installed using a Helm chart.
Migrating from /GoogleCloudPlatform/metacontroller
As the current version of Metacontroller uses a different finalizer name than the GCP version (GCP: metacontroller.app, current version: metacontroller.io), after installing Metacontroller you might need to clean up old finalizers, e.g. by running:
kubectl get <comma separated list of your resource types here> --no-headers --all-namespaces | awk '{print $2 " -n " $1}' | xargs -L1 -P 50 -r kubectl patch -p '{"metadata":{"finalizers": [null]}}' --type=merge
Install Metacontroller using Helm
Building the chart from source code
The chart can be built from metacontroller source:
git clone https://github.com/metacontroller/metacontroller.git
cd metacontroller
helm package deploy/helm/metacontroller --destination deploy/helm
Installing the chart from package
helm install metacontroller deploy/helm/metacontroller-helm-v*.tgz
Installing chart from ghcr.io
Charts are published as packages on ghcr.io
You can pull them like:
HELM_EXPERIMENTAL_OCI=1 helm pull oci://ghcr.io/metacontroller/metacontroller-helm --version=<version>
This is needed because OCI support is currently (at least for Helm 3.8.x) a beta feature.
Configuration
Parameter | Description | Default |
---|---|---|
command | Command which is used to start metacontroller | /usr/bin/metacontroller |
commandArgs | Command arguments which are used to start metacontroller. See configuration.md for additional details. | [ "--zap-log-level=4", "--discovery-interval=20s", "--cache-flush-interval=30m" ] |
rbac.create | Create and use RBAC resources | true |
image.repository | Image repository | metacontrollerio/metacontroller |
image.pullPolicy | Image pull policy | IfNotPresent |
image.tag | Image tag | "" (Chart.AppVersion ) |
imagePullSecrets | Image pull secrets | [] |
nameOverride | Override the deployment name | "" (Chart.Name ) |
namespaceOverride | Override the deployment namespace | "" (Release.Namespace ) |
fullnameOverride | Override the deployment full name | "" (Release.Namespace-Chart.Name ) |
serviceAccount.create | Create service account | true |
serviceAccount.annotations | ServiceAccount annotations | {} |
serviceAccount.name | Service account name to use, when empty will be set to created account if serviceAccount.create is set else to default | "" |
podAnnotations | Pod annotations | {} |
podSecurityContext | Pod security context | {} |
securityContext | Container security context | {} |
resources | CPU/Memory resource requests/limits | {} |
nodeSelector | Node labels for pod assignment | {} |
tolerations | Toleration labels for pod assignment | [] |
affinity | Affinity settings for pod assignment | {} |
priorityClassName | The name of the PriorityClass that will be assigned to metacontroller | "" |
clusterRole.aggregationRule | The aggregationRule applied to metacontroller ClusterRole | {} |
clusterRole.rules | The rules applied to metacontroller ClusterRole | { "apiGroups": "*", "resources": "*", "verbs": "*" } |
replicas | Specifies the number of metacontroller pods that will be deployed | 1 |
podDisruptionBudget | The podDisruptionBudget applied to metacontroller pods | {} |
service.enabled | If true , then create a Service to expose ports | false |
service.ports | List of ports that are exposed on the Service | [] |
Configuration
This page describes how to configure Metacontroller.
Command line flags
The Metacontroller server has a few settings that can be configured with command-line flags (by editing the Metacontroller StatefulSet in manifests/metacontroller.yaml):
Flag | Description |
---|---|
--zap-log-level | Zap log level to configure the verbosity of logging. Can be one of ‘debug’, ‘info’, ‘error’, or any integer value > 0 which corresponds to custom debug levels of increasing verbosity(e.g. --zap-log-level=5 ). Level 4 logs Metacontroller's interaction with the API server. Levels 5 and up additionally log details of Metacontroller's invocation of lambda hooks. See the troubleshooting guide for more. |
--zap-devel | Development Mode (e.g. --zap-devel ) defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). |
--zap-encoder | Zap log encoding - json or console (e.g. --zap-encoder='json' ) defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). |
--zap-stacktrace-level | Zap Level at and above which stacktraces are captured - one of info or error (e.g. --zap-stacktrace-level='info' ). |
--discovery-interval | How often to refresh discovery cache to pick up newly-installed resources (e.g. --discovery-interval=10s ). |
--cache-flush-interval | How often to flush local caches and relist objects from the API server (e.g. --cache-flush-interval=30m ). |
--metrics-address | The address to bind metrics endpoint - /metrics (e.g. --metrics-address=:9999 ). It can be set to "0" to disable the metrics serving. |
--kubeconfig | Path to kubeconfig file (same format as used by kubectl); if not specified, use in-cluster config (e.g. --kubeconfig=/path/to/kubeconfig ). |
--client-go-qps | Number of queries per second client-go is allowed to make (default 5, e.g. --client-go-qps=100 ) |
--client-go-burst | Allowed burst queries for client-go (default 10, e.g. --client-go-burst=200 ) |
--workers | Number of sync workers to run (default 5, e.g. --workers=100 ) |
--events-qps | Rate of events flowing per object (default - 1 event per 5 minutes, e.g. --events-qps=0.0033 ) |
--events-burst | Number of events allowed to send per object (default 25, e.g. --events-burst=25 ) |
--pprof-address | Enable pprof and bind to endpoint /debug/pprof, set to 0 to disable pprof serving (default 0, e.g. --pprof-address=:6060 ) |
--leader-election | Determines whether or not to use leader election when starting metacontroller (default false , e.g., --leader-election ) |
--leader-election-resource-lock | Determines which resource lock to use for leader election (default leases , e.g., --leader-election-resource-lock=leases ). Valid resource locks are endpoints , configmaps , leases , endpointsleases , or configmapsleases . See the client-go documentation leaderelection/resourcelock for additional information. |
--leader-election-namespace | Determines the namespace in which the leader election resource will be created. If metacontroller is running in-cluster, the default leader election namespace is the same namespace as metacontroller. If metacontroller is running out-of-cluster, the default leader election namespace is undefined. If you are running metacontroller out-of-cluster with leader election enabled, you must specify the leader election namespace. (e.g., --leader-election-namespace=metacontroller ) |
--leader-election-id | Determines the name of the resource that leader election will use for holding the leader lock. For example, if the leader election id is metacontroller and the leader election resource lock is leases , then a resource of kind leases with metadata.name metacontroller will hold the leader lock. (default metacontroller, e.g., --leader-election-id=metacontroller ) |
--health-probe-bind-address | The address the health probes endpoint binds to (default ":8081", e.g., --health-probe-bind-address=":8081" ) |
--target-label-selector | Label selector used to restrict an instance of metacontroller to manage specific Composite and Decorator controllers, which enables the ability to run multiple metacontroller instances on the same cluster (e.g. --target-label-selector=controller-group=cicd ) |
Logging flags are set by controller-runtime; more on their meaning can be found here.
Running multiple instances
Metacontroller can be set up to run multiple instances in the same Kubernetes cluster that watch resources based on separate groupings or as a way to split responsibilities, which can also act as a scaling aid.
This is made possible by configuring Metacontroller with the target-label-selector argument.
Further details on this feature can be found here.
Pros
- Clean separation of different Metacontroller instances, in case of:
  - permissions needed to manage its controllers (they can be limited to what the actual operator needs)
  - allowing the sidecar pattern, where a Metacontroller instance runs alongside the operator it manages and manages only that operator
- Allows scaling (in a primitive way, in terms of separation of concerns / grouping)
Cons
- Metacontroller instances fighting over resources, which can be caused by two Metacontroller instances managing the same CRD
- Care needs to be taken when configuring multiple Metacontroller instances: don't just deploy the default configuration, but set the target-label-selector value for each Metacontroller instance so that what it manages is unique
Example
1. Configure Metacontroller
Add the --target-label-selector argument to the Metacontroller binary arguments; below, it is shown inside the Kubernetes deployment spec for Metacontroller:
...
spec:
containers:
- args:
- --zap-devel
- --zap-log-level=5
- --discovery-interval=5s
- --target-label-selector=controller-group=cicd
...
You can also use more advanced features like equality-based requirements and set-based requirements when defining your target label selector.
2. Specify labels on the Controller
Below is an example of a Decorator Controller that has the appropriate label added so that this instance of Metacontroller can target and manage it:
apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
name: pod-name-label
labels:
controller-group: cicd
spec:
resources:
- apiVersion: v1
resource: pods
annotationSelector:
matchExpressions:
- {key: pod-name-label, operator: Exists}
hooks:
sync:
webhook:
url: http://service-per-pod.metacontroller/sync-pod-name-label
Create a Controller
This tutorial walks through a simple example of creating a controller in Python with Metacontroller.
Prerequisites
- Kubernetes v1.9+
- You should have kubectl available and configured to talk to the desired cluster.
- You should have already installed Metacontroller.
Hello, World!
In this example, we'll create a useless controller that runs a single Pod that prints a greeting to its standard output. Once you're familiar with the general process, you can look through the examples page to find concepts that actually do something useful.
To make cleanup easier, first create a new Namespace called hello:
kubectl create namespace hello
We'll put all our Namespace-scoped objects there by adding -n hello to the kubectl commands.
Define a custom resource
Our example controller will implement the behavior for a new API represented as a custom resource.
First, let's use the built-in CustomResourceDefinition API to set up a storage location (a helloworlds resource) for objects of our custom type (HelloWorld).
Save the following to a file called crd.yaml:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: helloworlds.example.com
spec:
group: example.com
names:
kind: HelloWorld
plural: helloworlds
singular: helloworld
scope: Namespaced
versions:
- name: v1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
who:
type: string
subresources:
status: {}
Then apply it to your cluster:
kubectl apply -f crd.yaml
Define a custom controller
For each HelloWorld object, we're going to create a Pod as a child object, so we'll use the CompositeController API to implement a controller that defines this parent-child relationship.
Save the following to a file called controller.yaml:
apiVersion: metacontroller.k8s.io/v1alpha1
kind: CompositeController
metadata:
name: hello-controller
spec:
generateSelector: true
parentResource:
apiVersion: example.com/v1
resource: helloworlds
childResources:
- apiVersion: v1
resource: pods
updateStrategy:
method: Recreate
hooks:
sync:
webhook:
url: http://hello-controller.hello/sync
Then apply it to your cluster:
kubectl apply -f controller.yaml
This tells Metacontroller to start a reconciling control loop for you, running inside the Metacontroller server. The parameters under spec: let you tune the behavior of the controller declaratively.
In this case:
- We set generateSelector to true to mimic the built-in Job API since we're running a Pod to completion and don't want to share Pods across invocations.
- The parentResource is our custom resource called helloworlds.
- The idea of CompositeController is that the parent resource represents objects that are composed of other objects. A HelloWorld is composed of just a Pod, so we have only one entry in the childResources list.
- For each child resource, we can optionally set an updateStrategy to specify what to do if a child object needs to be updated. Since Pods are effectively immutable, we use the Recreate method, which means, "delete the outdated object and create a new one".
- Finally, we tell Metacontroller how to invoke the sync webhook, which is where we'll define the business logic of our controller. The example relies on in-cluster DNS to resolve the address of the hello-controller Service (which we'll define below) within the hello Namespace.
Write a webhook
Metacontroller will handle the controllery bits for us, but we still need to tell it what our controller actually does.
To define our business logic, we write a webhook that generates child objects based on the parent spec, which is provided as JSON in the webhook request. The sync hook request contains additional information as well, but the parent spec is all we need for this example.
You can write Metacontroller hooks in any language, but Python is particularly nice because its dictionary type is convenient for programmatically building JSON objects (like the Pod object below).
If you have a preferred Functions-as-a-Service framework, you can use that to write your webhook, but we'll keep this example self-contained by relying on the basic HTTP server module in the Python standard library.
The do_POST() method handles decoding and encoding the request and response as JSON. The real hook logic is in the sync() method, and consists primarily of building a Pod object.
Because Metacontroller uses apply semantics, you can simply return the Pod object as if you were creating it, every time. If the Pod already exists, Metacontroller will take care of updates according to your update strategy. In this case, we set the update method to Recreate, so an existing Pod would be deleted and replaced if it doesn't match the desired state returned by your hook. Notice, however, that the hook code below doesn't need to mention any of that because it's only responsible for computing the desired state; the Metacontroller server takes care of reconciling with the observed state.
Save the following to a file called sync.py:
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class Controller(BaseHTTPRequestHandler):
  def sync(self, parent, children):
    # Compute status based on observed state.
    desired_status = {
      "pods": len(children["Pod.v1"])
    }

    # Generate the desired child object(s).
    who = parent.get("spec", {}).get("who", "World")
    desired_pods = [
      {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
          "name": parent["metadata"]["name"]
        },
        "spec": {
          "restartPolicy": "OnFailure",
          "containers": [
            {
              "name": "hello",
              "image": "busybox",
              "command": ["echo", "Hello, %s!" % who]
            }
          ]
        }
      }
    ]

    return {"status": desired_status, "children": desired_pods}

  def do_POST(self):
    # Serve the sync() function as a JSON webhook.
    observed = json.loads(self.rfile.read(int(self.headers.get("content-length"))))
    desired = self.sync(observed["parent"], observed["children"])

    self.send_response(200)
    self.send_header("Content-type", "application/json")
    self.end_headers()
    self.wfile.write(json.dumps(desired).encode())

HTTPServer(("", 80), Controller).serve_forever()
Then load it into your cluster as a ConfigMap:
kubectl -n hello create configmap hello-controller --from-file=sync.py
Note: The -n hello flag is important to put the ConfigMap in the hello namespace we created for the tutorial.
Deploy the webhook
Finally, since we wrote our hook as a self-contained Python web server, we need to deploy it somewhere that Metacontroller can reach. Luckily, we have this thing called Kubernetes which is great at hosting stateless web services.
Since our hook consists of only a small Python script, we'll use a generic Python container image and mount the script from the ConfigMap we created.
Save the following to a file called webhook.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hello-controller
spec:
replicas: 1
selector:
matchLabels:
app: hello-controller
template:
metadata:
labels:
app: hello-controller
spec:
containers:
- name: controller
image: python:3
command: ["python3", "/hooks/sync.py"]
volumeMounts:
- name: hooks
mountPath: /hooks
volumes:
- name: hooks
configMap:
name: hello-controller
---
apiVersion: v1
kind: Service
metadata:
name: hello-controller
spec:
selector:
app: hello-controller
ports:
- port: 80
Then apply it to your cluster:
kubectl -n hello apply -f webhook.yaml
Try it out
Now we can create HelloWorld objects and see what they do.
Save the following to a file called hello.yaml:
apiVersion: example.com/v1
kind: HelloWorld
metadata:
name: your-name
spec:
who: Your Name
Then apply it to your cluster:
kubectl -n hello apply -f hello.yaml
Our controller should see this and create a Pod that prints a greeting and then exits.
kubectl -n hello get pods
You should see something like this:
NAME READY STATUS RESTARTS AGE
hello-controller-746fc7c4dc-rzslh 1/1 Running 0 2m
your-name 0/1 Completed 0 15s
Then you can check the logs on the Completed Pod:
kubectl -n hello logs your-name
Which should look like this:
Hello, Your Name!
Now let's look at what happens when you update the parent object, for example to change the name:
kubectl -n hello patch helloworld your-name --type=merge -p '{"spec":{"who":"My Name"}}'
If you now check the Pod logs again:
kubectl -n hello logs your-name
You should see that the Pod was updated (actually deleted and recreated) to print a greeting to the new name, even though the hook code doesn't mention anything about updates.
Hello, My Name!
Clean up
Another thing Metacontroller does for you by default is set up links so that child objects are removed by the garbage collector when the parent goes away (assuming your cluster is version 1.8+).
You can check this by deleting the parent:
kubectl -n hello delete helloworld your-name
And then checking for the child Pod:
kubectl -n hello get pods
You should see that the child Pod was cleaned up automatically, so only the webhook Pod remains:
NAME READY STATUS RESTARTS AGE
hello-controller-746fc7c4dc-rzslh 1/1 Running 0 3m
When you're done with the tutorial, you should remove the controller, CRD, and Namespace as follows:
kubectl delete compositecontroller hello-controller
kubectl delete crd helloworlds.example.com
kubectl delete ns hello
Next Steps
- Explore other example controllers.
- Read about best practices for writing controllers.
- Learn how to troubleshoot controllers.
- Dive into the details of all the available Metacontroller APIs.
Constraints and best practices
This is a collection of recommendations for writing controllers with Metacontroller.
If you have something to add to the collection, please send a pull request against this document.
Constraints
Objects relationship
Because of limitations of Kubernetes garbage collection, the following restrictions apply between objects:
Parent | Child | Related |
---|---|---|
Cluster | Cluster or Namespaced (any namespace) | Cluster or Namespaced (any namespace) |
Namespaced | Namespaced (the same namespace as parent) | Namespaced (the same namespace as parent) |
Lambda Hooks
Apply Semantics
Because Metacontroller uses apply semantics, you don't have to think about whether a given object needs to be created (because it doesn't exist) or patched (because it exists and some fields don't match your desired state). In either case, you should generate a fresh object from scratch with only the fields you care about filled in.
For example, suppose you create an object like this:
apiVersion: example.com/v1
kind: Foo
metadata:
name: my-foo
spec:
importantField: 1
Then later you decide to change the value of importantField to 2.
Since Kubernetes API objects can be edited by the API server, users, and other controllers to collaboratively produce emergent behavior, the object you observe might now look like this:
apiVersion: example.com/v1
kind: Foo
metadata:
name: my-foo
stuffFilledByAPIServer: blah
spec:
importantField: 1
otherField: 5
To avoid overwriting the parts of the object you don't care about, you would ordinarily need to either build a patch or use a retry loop to send concurrency-safe updates. With apply semantics, you instead just call your "generate object" function again with the new values you want, and return this (as JSON):
apiVersion: example.com/v1
kind: Foo
metadata:
name: my-foo
spec:
importantField: 2
Metacontroller will take care of merging your change to importantField while preserving the fields you don't care about that were set by others.
Side Effects
Your hook code should generally be free of side effects whenever possible. Ideally, you should interpret a call to your hook as asking, "Hypothetically, if the observed state of the world were like this, what would your desired state be?"
In particular, Metacontroller may ask you about such hypothetical scenarios during rolling updates, when your object is undergoing a slow transition between two desired states. If your hook has to produce side effects to work, you should avoid enabling rolling updates on that controller.
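As a rough sketch (not taken from the Metacontroller examples), a side-effect-free sync hook is just a pure function from the observed state to the desired state; the spec.image field and the helper name below are hypothetical:
# Sketch of a side-effect-free sync hook: observed state in, desired state out.
# The parent's spec.image field and the helper name are hypothetical,
# not part of any Metacontroller API.
def desired_pod(parent):
    """Compute the desired child Pod purely from the parent's spec."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": parent["metadata"]["name"]},
        "spec": {"containers": [{"name": "app", "image": parent["spec"]["image"]}]},
    }

def sync(request):
    # No API calls, no writes to external systems: the same request
    # always produces the same response.
    return {"status": {}, "children": [desired_pod(request["parent"])]}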
Status
If your object uses the Spec/Status convention, keep in mind that the Status returned from your hook should ideally reflect a judgement on only the observed objects that were sent to you. The Status you compute should not yet account for your desired state, because the actual state of the world may not match what you want yet.
For example, if you observe 2 Pods but you return a desired list of 3 Pods, you should return a Status that reflects only the observed Pods (e.g. replicas: 2). This is important so that Status reflects present reality, not future desires.
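As a sketch, a hook following this advice might look like the following; the replicas status field is just a convention borrowed from built-in workload APIs, and make_desired_pods is a hypothetical helper:
# Sketch: status reports what was observed, not what is desired.
def make_desired_pods(parent, count=3):
    """Hypothetical helper: generate `count` desired Pods from the parent."""
    return [
        {"apiVersion": "v1", "kind": "Pod",
         "metadata": {"name": f"{parent['metadata']['name']}-{i}"}}
        for i in range(count)
    ]

def sync(request):
    observed_pods = request["children"]["Pod.v1"]        # Pods that exist right now
    desired_pods = make_desired_pods(request["parent"])  # e.g. 3 desired Pods
    return {
        # If only 2 of the 3 desired Pods exist yet, report replicas: 2.
        "status": {"replicas": len(observed_pods)},
        "children": desired_pods,
    }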
Working with the Status subresource in Metacontroller
If you would like to expose and use the Status subresource in your custom resource, you should take care of the following:
- Define a proper CRD schema for the status section so that Metacontroller can update it successfully; the status fields must be part of the CRD schema, i.e.:
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: configmappropagations.examples.metacontroller.io
spec:
...
versions:
- name: v1alpha1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
...
status:
type: object
properties:
expected_copies:
type: integer
actual_copies:
type: integer
observedGeneration:
type: integer
required:
- spec
subresources:
status: {}
- Your controller must be strict about the types defined in the CRD schema; in the example above, do not set any of the integer fields as strings, and do not add fields that aren't in the schema.
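For example, with the schema above, a conforming status object in your hook response could look like this sketch (the values are made up):
# Sketch: a status object that matches the CRD schema above.
# Values must keep the declared types, and no fields outside the schema are added.
status = {
    "expected_copies": 3,       # int, not "3"
    "actual_copies": 2,         # int, not "2"
    "observedGeneration": 7,    # int
}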
To read more about the Status subresource, see:
- Kubernetes documentation: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource
Troubleshooting
This is a collection of tips for debugging controllers written with Metacontroller.
If you have something to add to the collection, please send a pull request against this document.
Events
Since Metacontroller emits Kubernetes Events for internal actions, you can check the events on the parent object, for example:
kubectl describe secretpropagations.examples.metacontroller.io <name>
where, at the end, you will see all events related to the given parent:
Name: secret-propagation
Namespace:
Labels: <none>
Annotations: <none>
API Version: examples.metacontroller.io/v1alpha1
Kind: SecretPropagation
Metadata:
Creation Timestamp: 2021-07-14T20:25:09Z
...
Spec:
Source Name: shareable
Source Namespace: omega
Target Namespace Label Selector:
Match Labels:
Propagate: true
Status:
Working: fine
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning SyncError 1s (x11 over 8s) metacontroller Sync error: sync hook failed for SecretPropagation /secret-propagation: sync hook failed: http error: Post "http://secret-propagation-controller.metacontroller/sync": dial tcp 10.96.138.14:80: connect: connection refused
You can also access events using kubectl get events, which returns all events from a given namespace. Since Metacontroller CRDs may be cluster-scoped, their events can land in the default namespace:
> kubectl get events -n default
39m Normal Started compositecontroller/secret-propagation-controller Started controller: secret-propagation-controller
39m Normal Starting compositecontroller/secret-propagation-controller Starting controller: secret-propagation-controller
39m Normal Stopping compositecontroller/secret-propagation-controller Stopping controller: secret-propagation-controller
39m Normal Stopped compositecontroller/secret-propagation-controller Stopped controller: secret-propagation-controller
6m25s Normal Started compositecontroller/secret-propagation-controller Started controller: secret-propagation-controller
6m25s Normal Starting compositecontroller/secret-propagation-controller Starting controller: secret-propagation-controller
2m27s Normal Stopping compositecontroller/secret-propagation-controller Stopping controller: secret-propagation-controller
2m27s Normal Stopped compositecontroller/secret-propagation-controller Stopped controller: secret-propagation-controller
Metacontroller Logs
Besides events, the main place to look when troubleshooting controller behavior is the logs of the Metacontroller server itself.
For example, you can fetch the last 25 lines with a command like this:
kubectl -n metacontroller logs --tail=25 -l app=metacontroller
Log Levels
You can customize the verbosity of the Metacontroller server's logs with the --zap-log-level flag.
At all log levels, Metacontroller will log the progress of server startup and shutdown, as well as major changes like starting and stopping hosted controllers.
At level 4 and above, Metacontroller will log actions (like create/update/delete) on individual objects (like Pods) that it takes on behalf of hosted controllers. It will also log when it decides to sync a given controller as well as events that may trigger a sync.
At level 5 and above, Metacontroller will log the diffs between existing objects and the desired state of those objects as returned by controller hooks.
At level 6 and above, Metacontroller will log every hook invocation as well as the JSON request and response bodies.
Common Log Messages
Since API discovery info is refreshed periodically, you may see log messages like this when you start a controller that depends on a recently-installed CRD:
failed to sync CompositeController "my-controller": discovery: can't find resource <resource> in apiVersion <group>/<version>
Usually, this should fix itself within about 30s when the new CRD is discovered. If this message continues indefinitely, check that the resource name and API group/version are correct.
You may also notice periodic log messages like this:
Watch close - *unstructured.Unstructured total <X> items received
This comes from the underlying client-go library, and just indicates when the shared caches are periodically flushed to place an upper bound on cache inconsistency due to potential silent failures in long-running watches.
Webhook Logs
If you return an HTTP error code (e.g., 500) from your webhook, the Metacontroller server will log the text of the response body.
If you need more detail on what's happening inside your hook code, as opposed to what Metacontroller does for you, you'll need to add log statements to your own code and inspect the logs on your webhook server.
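For example, in a Python webhook you might log each request before responding; this is a sketch that assumes a CompositeController-style request with a parent field:
import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.INFO)

def sync(request):
    # Anything logged here shows up in `kubectl logs` on the webhook Pod,
    # not in the Metacontroller server's logs.
    parent = request["parent"]
    logging.info("sync called for %s/%s",
                 parent["metadata"].get("namespace", ""),
                 parent["metadata"]["name"])
    return {"status": {}, "children": []}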
API reference
This section contains detailed reference information for the APIs offered by Metacontroller.
See the user guide for introductions and step-by-step walkthroughs.
Apply Semantics
This page describes how Metacontroller emulates kubectl apply.
CompositeController
CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object.
ControllerRevision
ControllerRevision is an internal API used by Metacontroller to implement declarative rolling updates.
DecoratorController
DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which resources to watch, as well as filters on labels and annotations.
Hook
This page describes how hook targets are defined in various APIs.
Apply Semantics
This page describes how Metacontroller emulates kubectl apply. In most cases, you should be able to think of Metacontroller's apply semantics as being the same as kubectl apply, but there are some differences.
Motivation
This section explains why Metacontroller uses apply semantics. As an example, suppose you create a simple Pod like this with kubectl apply -f:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
labels:
app: my-app
spec:
containers:
- name: nginx
image: nginx
If you then read back the Pod you created with kubectl get pod my-pod -o yaml, you'll see a lot of extra fields filled in that you never set:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
nginx'
creationTimestamp: 2018-04-13T00:46:51Z
labels:
app: my-app
name: my-pod
namespace: default
resourceVersion: "28573496"
selfLink: /api/v1/namespaces/default/pods/my-pod
uid: 27f1b2e1-3eb4-11e8-88d2-42010a800051
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: nginx
resources:
requests:
cpu: 100m
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
[...]
These fields may represent materialized default values and other metadata set by the API server, values set by built-in admission control or external admission plugins, or even values set by other controllers.
Rather than sifting through all that to find the fields you care about, kubectl apply lets you go back to your original, simple file, and make a change:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
labels:
app: my-app
role: staging # added a label
spec:
containers:
- name: nginx
image: nginx
If you try to kubectl create -f your updated file, it will fail because you can't create something that already exists.
If you try to kubectl replace -f your updated file, it will fail because it thinks you're trying to unset all those extra fields.
However, if you use kubectl apply -f with your updated file, it will update only the part you changed (adding a label) and leave all those extra fields untouched.
Metacontroller treats the desired objects you return from your hook in much the same way (but with some differences, such as support for strategic merge inside CRDs). As a result, you should always return the short form containing only the fields you care about, not the long form containing all the extra fields.
This generally means you should use the same code path to update things as you do to create them. Just generate a full JSON object from scratch every time, containing all the fields you care about, and only the fields you care about.
Metacontroller will figure out whether the object needs to be created or updated, and which fields it should and shouldn't touch in the case of an update.
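A minimal sketch of this pattern, using a hypothetical Foo custom resource (the helper name is illustrative, not part of Metacontroller):
# Sketch: one generator used for both create and update.
def desired_foo(important_field):
    return {
        "apiVersion": "example.com/v1",
        "kind": "Foo",
        "metadata": {"name": "my-foo"},
        "spec": {"importantField": important_field},
    }

# First sync: the object doesn't exist yet, so Metacontroller creates it.
first = desired_foo(1)
# Later sync: same code path; Metacontroller computes the update
# (importantField: 1 -> 2) and leaves fields set by others untouched.
later = desired_foo(2)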
Dynamic Apply
The biggest difference between kubectl's implementation of apply and Metacontroller's is that Metacontroller can emulate strategic merge inside CRDs.
For example, suppose you have a CRD with an embedded Pod template:
apiVersion: ctl.enisoc.com/v1
kind: CatSet # this resource is served via CRD
metadata:
name: my-catset
spec:
template: # embedded Pod template in CRD
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
name: web
You create this with apply:
kubectl apply -f catset.yaml
The promise of apply is that it will "apply the changes you’ve made, without overwriting any automated changes to properties you haven’t specified".
As an example, suppose some other automation decides to edit your Pod template and add a sidecar container:
apiVersion: ctl.enisoc.com/v1
kind: CatSet # this resource is served via CRD
metadata:
name: my-catset
spec:
template: # embedded Pod template in CRD
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
name: web
- name: sidecar
image: log-uploader # fake sidecar example
Now suppose you change something in your local file and reapply it:
kubectl apply -f catset.yaml
Because kubectl apply doesn't support strategic merge inside CRDs, this will completely replace the containers list with yours, removing the sidecar container.
By contrast, if this had been a Deployment or StatefulSet, kubectl apply would have preserved the sidecar container.
As a result, if a controller uses kubectl's apply implementation with CRDs, that controller will fight against automation that tries to add sidecar containers or makes other changes to lists of objects that Kubernetes expects to be treated like associative arrays (ports, volumes, etc.).
To avoid this fighting, and to make the experience of using CRDs better match that of native resources, Metacontroller uses an alternative implementation of apply logic that's based on convention instead of configuration.
Conventions
The main convention that Metacontroller enforces on apply semantics is how to detect and handle "associative lists".
In Kubernetes API conventions, an associative list is a list of objects or scalars that should be treated as if it were a map (associative array), but because of limitations in JSON/YAML it looks the same as an ordered list when serialized.
For native resources, kubectl apply determines which lists are associative lists by configuration: it must have compiled-in knowledge of all the resources, and metadata about how each of their fields should be treated.
There is currently no mechanism for CRDs to specify this metadata, which is why kubectl apply falls back to assuming all lists are "atomic" and should never be merged (only replaced entirely).
Even if there were a mechanism for CRDs to specify metadata for every field (e.g. through extensions to OpenAPI), it's not clear that it makes sense to require every CRD author to do so in order for their resources to behave correctly when used with kubectl apply.
One alternative that has been considered for such "schemaless CRDs" is to
establish a convention -- as long as your CRD follows the convention, you
don't need to provide configuration.
Metacontroller implements one such convention that empirically handles many common cases encountered when embedding Pod templates in CRDs (although it has limitations), developed by surveying the use of associative lists across the resources built into Kubernetes:
- A list is detected as an associative list if and only if all of the following conditions are met:
  - All items in the list are JSON objects (not scalars, nor other lists).
  - All objects in the list have some field name in common, where that field name is one of the conventional merge keys (most commonly name).
- If a list is detected as an associative list, the conventional field name that all objects have in common (e.g. name) is used as the merge key.
- If more than one conventional merge key might work, only one is picked, according to a fixed order.
This allows Metacontroller to "do the right thing" in the majority of cases, without requiring advance knowledge about the resources it's working with -- knowledge that's not available anywhere in the case of CRDs.
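A simplified sketch of this detection heuristic is shown below; the exact set and order of conventional merge keys here are illustrative, not the authoritative list used by Metacontroller:
# Simplified sketch of the convention described above: a list is treated as
# an associative list only if every item is an object and all items share
# one of the conventional merge keys. The key list and its order are illustrative.
CONVENTIONAL_MERGE_KEYS = ["name", "containerPort", "port", "ip"]

def detect_merge_key(items):
    """Return the merge key for an associative list, or None if the list is atomic."""
    if not items or not all(isinstance(item, dict) for item in items):
        return None  # scalars or nested lists: treat the whole list as atomic
    for key in CONVENTIONAL_MERGE_KEYS:  # fixed order picks exactly one key
        if all(key in item for item in items):
            return key
    return None

# A containers-style list merges on "name"...
print(detect_merge_key([{"name": "nginx"}, {"name": "sidecar"}]))  # -> name
# ...while a list of scalars is treated as atomic (replaced, not merged).
print(detect_merge_key([80, 443]))  # -> None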
In the future, Metacontroller will likely switch from this custom apply implementation to server-side apply, which is trying to solve the broader problem for all components that interact with the Kubernetes API. However, it's not yet clear whether that proposal will embrace schemaless CRDs and support apply semantics on them.
Limitations
A convention-based approach is necessarily more limiting than the native apply implementation, which supports arbitrary per-field configuration. The trade-off is that conventions reduce boilerplate and lower the barrier to entry for simple use cases.
This section lists some examples of configurations that the native apply allows, but are currently not supported in Metacontroller's convention-based apply. If any of these are blockers for you, please file an issue describing your use case.
- Atomic object lists
  - A list of objects that share one of the conventional keys, but should nevertheless be treated atomically (replaced rather than merged).
- Unconventional associative list keys
  - An associative list that doesn't use one of the conventional keys.
- Multi-field associative list keys
  - A key that's composed of two or more fields (e.g. both port and protocol).
- Scalar-valued associative lists
  - A list of scalars (not objects) that should be merged as if the scalar values were field names in an object.
CompositeController
CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object.
Workload controllers like Deployment and StatefulSet are examples of existing controllers that fit this pattern.
This page is a detailed reference of all the features available in this API. See the Create a Controller guide for a step-by-step walkthrough.
Example
This example CompositeController defines a controller that behaves like StatefulSet.
apiVersion: metacontroller.k8s.io/v1alpha1
kind: CompositeController
metadata:
name: catset-controller
spec:
parentResource:
apiVersion: ctl.enisoc.com/v1
resource: catsets
revisionHistory:
fieldPaths:
- spec.template
childResources:
- apiVersion: v1
resource: pods
updateStrategy:
method: RollingRecreate
statusChecks:
conditions:
- type: Ready
status: "True"
- apiVersion: v1
resource: persistentvolumeclaims
hooks:
sync:
webhook:
url: http://catset-controller.metacontroller/sync
timeout: 10s
Spec
A CompositeController spec has the following fields:
Field | Description |
---|---|
parentResource | A single resource rule specifying the parent resource. |
childResources | A list of resource rules specifying the child resources. |
resyncPeriodSeconds | How often, in seconds, you want every parent object to be resynced (sent to your hook), even if no changes are detected. |
generateSelector | If true , ignore the selector in each parent object and instead generate a unique selector that prevents overlap with other objects. |
hooks | A set of lambda hooks for defining your controller's behavior. |
Parent Resource
The parent resource is the "entry point" for the CompositeController. It should contain the information your controller needs to create children, such as a Pod template if your controller creates Pods. This is often a custom resource that you define (e.g. with CRD), and for which you are now implementing a custom controller.
CompositeController expects to have full control over this resource. That is, you shouldn't define a CompositeController with a parent resource that already has its own controller. See DecoratorController for an API that's better suited for adding behavior to existing resources.
The parentResource rule has the following fields:
Field | Description |
---|---|
apiVersion | The API <group>/<version> of the parent resource, or just <version> for core APIs. (e.g. v1 , apps/v1 , batch/v1 ) |
resource | The canonical, lowercase, plural name of the parent resource. (e.g. deployments , replicasets , statefulsets ) |
labelSelector | An optional label selector for narrowing down the objects to target. When not set, it defaults to targeting all objects. |
revisionHistory | If any child resources use rolling updates, this field specifies how parent revisions are tracked. |
ignoreStatusChanges | An optional field through which status changes can be ignored for reconciliation. If set to true, only spec changes or label/annotation changes will reconcile the parent resource. |
Label Selector
Kubernetes APIs use labels and selectors to define subsets of objects, such as the Pods managed by a given ReplicaSet.
The parent resource of a CompositeController is assumed to have a spec.selector that matches the form of spec.selector in built-in resources like Deployment and StatefulSet (with matchLabels and/or matchExpressions).
If the parent object doesn't have this field, or it can't be parsed in the expected label selector format, the sync hook for that parent will fail, unless you are using selector generation.
The parent's label selector determines which child objects a given parent will try to manage, according to the ControllerRef rules. Metacontroller automatically handles orphaning and adoption for you, and will only send you the observed states of children you own.
These rules imply:
- Children you create must have labels that satisfy the parent's selector, or else they will be immediately orphaned and you'll never see them again.
- If other controllers or users create orphaned objects that match the parent's selector, Metacontroller will try to adopt them for you.
- If Metacontroller adopts an object, and you subsequently decline to list that object in your desired list of children, it will get deleted (because you now own it, but said you don't want it).
To avoid confusion, it's therefore important that users of your custom
controller specify a spec.selector
(on each parent object) that is
sufficiently precise to discriminate its child objects from those of other
parents in the same namespace.
Revision History
Within the parentResource rule, the revisionHistory field has the following subfields:
Field | Description |
---|---|
fieldPaths | A list of field path strings (e.g. spec.template ) specifying which parent fields trigger rolling updates of children (for any child resources that use rolling updates). Changes to other parent fields (e.g. spec.replicas ) apply immediately. Defaults to ["spec"] , meaning any change in the parent's spec triggers a rolling update. |
Child Resources
This list should contain a rule for every type of child resource that your controller creates on behalf of each parent.
Each entry in the childResources list has the following fields:
Field | Description |
---|---|
apiVersion | The API group/version of the child resource, or just version for core APIs. (e.g. v1 , apps/v1 , batch/v1 ) |
resource | The canonical, lowercase, plural name of the child resource. (e.g. deployments , replicasets , statefulsets ) |
updateStrategy | An optional field that specifies how to update children when they already exist but don't match your desired state. If no update strategy is specified, children of that type will never be updated if they already exist. |
Child Update Strategy
Within each rule in the childResources list, the updateStrategy field has the following subfields:
Field | Description |
---|---|
method | A string indicating the overall method that should be used for updating this type of child resource. The default is OnDelete , which means don't try to update children that already exist. |
statusChecks | If any rolling update method is selected, children that have already been updated must pass these status checks before the rollout will continue; please also read this section. |
Child Update Methods
Within each child resource's updateStrategy, the method field can have these values:
Method | Description |
---|---|
OnDelete | Don't update existing children unless they get deleted by some other agent. |
Recreate | Immediately delete any children that differ from the desired state, and recreate them in the desired state. |
InPlace | Immediately update any children that differ from the desired state. |
RollingRecreate | Delete each child that differs from the desired state, one at a time, and recreate each child before moving on to the next one. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks. |
RollingInPlace | Update each child that differs from the desired state, one at a time. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks. |
Child Update Status Checks
Within each updateStrategy, the statusChecks field has the following subfields:
Field | Description |
---|---|
conditions | A list of status condition checks that must all pass on already-updated children for the rollout to continue. |
Status Condition Check
Within a set of statusChecks, each item in the conditions list has the following subfields:
Field | Description |
---|---|
type | A string specifying the status condition type to check. |
status | A string specifying the required status of the given status condition. If none is specified, the condition's status is not checked. |
reason | A string specifying the required reason of the given status condition. If none is specified, the condition's reason is not checked. |
Resync Period
By default, your sync hook will only be called when something changes in one of the resources you're watching, or when the local cache is flushed.
Sometimes you may want to sync periodically even if nothing has changed in the Kubernetes API objects, either to simply observe the passage of time, or because your hook takes external state into account. For example, CronJob uses a periodic resync to check whether it's time to start a new Job.
The resyncPeriodSeconds value specifies how often to do this. Each time it triggers, Metacontroller will send sync hook requests for all objects of the parent resource type, with the latest observed values of all the necessary objects.
Note that these objects will be retrieved from Metacontroller's local cache (kept up-to-date through watches), so adding a resync shouldn't add more load on the API server, unless you actually change objects. For example, it's relatively cheap to use this setting to poll until it's time to trigger some change, as long as most sync calls result in a no-op (no CRUD operations needed to achieve desired state).
Generate Selector
Usually, each parent object managed by a CompositeController must have its own user-specified label selector, just like each Deployment has its own label selector in spec.selector.
However, sometimes it makes more sense to let the user of your API pretend there
are no labels or label selectors.
For example, the built-in Job API doesn't make you specify labels for your Pods, and you can leave spec.selector unset.
Because each Job object represents a unique invocation at a point in time,
you wouldn't expect a newly-created Job to be satisfied by finding a
pre-existing Pod that just happens to have the right labels.
On the other hand, a ReplicaSet assumes all Pods that match its selector are
interchangeable, so it would be happy to have one less replica it has to create.
If you set spec.generateSelector to true in your CompositeController definition, Metacontroller will do the following:
- When creating children for you, Metacontroller will automatically add a label that points to the parent object's unique ID (metadata.uid).
- Metacontroller will not expect each parent object to contain a spec.selector, and will ignore the value even if one is set.
- Metacontroller will manage children as if each parent object had an "imaginary" label selector that points to the unique ID label that Metacontroller added to all your children.
The end result is that you and the users of your API don't have to think about
labels or selectors, similar to the Job API.
The downside is that your API won't support all the same capabilities as
built-in APIs.
For example, with ReplicaSet or StatefulSet, you can delete the controller with kubectl delete --cascade=false to keep the Pods around, and later create a new controller with the same selector to adopt those existing Pods instead of making new ones from scratch.
Hooks
Within the CompositeController spec, the hooks field has the following subfields:
Field | Description |
---|---|
sync | Specifies how to call your sync hook, if any. |
finalize | Specifies how to call your finalize hook, if any. |
customize | Specifies how to call your customize hook, if any. |
Each field of hooks contains subfields that specify how to invoke that hook, such as by sending a request to a webhook.
Sync Hook
The sync hook is how you specify which children to create/maintain for a given parent -- in other words, your desired state.
Based on the CompositeController spec, Metacontroller gathers up all the resources you said you need to decide on the desired state, and sends you their latest observed states.
After you return your desired state, Metacontroller begins to take action to converge towards it -- creating, deleting, and updating objects as appropriate.
A simple way to think about your sync hook implementation is like a script that generates JSON to be sent to kubectl apply. However, unlike a one-off client-side generator, your script has access to the latest observed state in the cluster, and will automatically get called any time that observed state changes.
Sync Hook Request
A separate request will be sent for each parent object, so your hook only needs to think about one parent at a time.
The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:
Field | Description |
---|---|
controller | The whole CompositeController object, like what you might get from kubectl get compositecontroller <name> -o json . |
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json . |
children | An associative array of child objects that already exist. |
related | An associative array of related objects that exist, if a customize hook was specified. See the customize hook. |
finalizing | This is always false for the sync hook. See the finalize hook for details. |
Each field of the children object represents one of the types of child resources you specified in your CompositeController spec. The field name for each child type is <Kind>.<apiVersion>, where <apiVersion> could be just <version> (for a core resource) or <group>/<version>, just like you'd write in a YAML file. For example, the field name for Pods would be Pod.v1, while the field name for StatefulSets might be StatefulSet.apps/v1.
For resources that exist in multiple versions, the apiVersion you specify in the child resource rule is the one you'll be sent. Metacontroller requires you to be explicit about the version you expect because it does conversion for you as needed, so your hook doesn't need to know how to convert between different versions of a given resource.
Within each child type (e.g. in children['Pod.v1']), there is another associative array that maps from the child's path relative to the parent to the JSON representation, like what you might get from kubectl get <child-resource> <child-name> -o json.
If the parent and child are of the same scope - both cluster or both namespace - then the key is only the child's .metadata.name. If the parent is cluster scoped and the child is namespace scoped, then the key will be of the form {.metadata.namespace}/{.metadata.name}. This is to disambiguate between two children with the same name in different namespaces. A parent may never be namespace scoped while a child is cluster scoped.
For example, a Pod named my-pod in the my-namespace namespace could be accessed as follows if the parent is also in my-namespace:
request.children['Pod.v1']['my-pod']
Alternatively, if the parent resource is cluster scoped, the Pod could be accessed as:
request.children['Pod.v1']['my-namespace/my-pod']
Note that you will only be sent children that you "own" according to the ControllerRef rules. That means, for a given parent object, you will only see children whose labels match the parent's label selector, and that don't belong to any other parent.
There will always be an entry in children for every child resource rule, even if no children of that type were observed at the time of the sync. For example, if you listed Pods as a child resource rule, but no existing Pods matched the parent's selector, you will receive:
{
"children": {
"Pod.v1": {}
}
}
as opposed to:
{
"children": {}
}
Related resources, represented under the related field, are presented in the same form as children, but they represent the resources matched by the customize hook response for the given parent object. These objects are not managed by the controller and therefore cannot be modified, but you can use them to compute your desired children.
Some existing examples implementing this approach are:
- ConfigMapPropagation - makes a copy of a given ConfigMap in several namespaces.
- GlobalConfigMap - makes a copy of a given ConfigMap in every namespace.
- SecretPropagation - makes a copy of a given Secret in each namespace satisfying a label selector.
Please note that when a related resource is updated, the sync hook is triggered again (even if the parent object and children do not change), so you can recalculate the children's state according to a fresh view of the related objects.
Sync Hook Response
The body of your response should be a JSON object with the following fields:
Field | Description |
---|---|
status | A JSON object that will completely replace the status field within the parent object. |
children | A list of JSON objects representing all the desired children for this parent object. |
resyncAfterSeconds | Set the delay (in seconds, as a float) before an optional, one-time, per-object resync. |
What you put in status is up to you, but usually it's best to follow conventions established by controllers like Deployment. You should compute status based only on the children that existed when your hook was called; status represents a report on the last observed state, not the new desired state.
The children field should contain a flat list of objects, not an associative array. Metacontroller groups the objects it sends you by type and name as a convenience to simplify your scripts, but it's actually redundant since each object contains its own apiVersion, kind, and metadata.name.
It's important to include the apiVersion and kind in objects you return, and also to ensure that you list every type of child resource you plan to create in the CompositeController spec.
If the parent resource is cluster scoped and the child resource is namespaced, it's important to include the .metadata.namespace since the namespace cannot be inferred from the parent's namespace.
Any objects sent as children in the request that you decline to return in your response list will be deleted. However, you shouldn't directly copy children from the request into the response because they're in different forms.
Instead, you should think of each entry in the list of children as being sent to kubectl apply.
That is, you should set only the fields that you care about.
You can optionally set resyncAfterSeconds to a value greater than 0 to request that the sync hook be called again with this particular parent object after some delay (specified in seconds, with decimal fractions allowed).
Unlike the controller-wide resyncPeriodSeconds, this is a one-time request (not a request to start periodic resyncs), although you can always return another resyncAfterSeconds value from subsequent sync calls. Also unlike the controller-wide setting, this request only applies to the particular parent object that this sync call sent, so you can request different delays (or omit the request) depending on the state of each object.
Note that your webhook handler must return a response with a status code of 200 to be considered successful. Metacontroller will wait for a response for up to the timeout defined in the Webhook spec.
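To make the request and response shapes concrete, here is a minimal sketch of a sync webhook using only the Python standard library. It assumes a hypothetical parent CRD with spec.replicas and an embedded spec.template, Pods declared as the only child resource, and a parent selector that matches the app label set below; it is an illustration, not the canonical implementation.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def sync(request):
    parent = request["parent"]
    observed_pods = request["children"]["Pod.v1"]
    # Desired state: one Pod per requested replica, generated from scratch.
    children = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": f"{parent['metadata']['name']}-{i}",
                # Labels must satisfy the parent's spec.selector (see above).
                "labels": {"app": parent["metadata"]["name"]},
            },
            "spec": parent["spec"]["template"]["spec"],
        }
        for i in range(parent["spec"].get("replicas", 1))
    ]
    return {
        # Status reflects only what was observed, not the desired state.
        "status": {"replicas": len(observed_pods)},
        "children": children,
    }

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        response = json.dumps(sync(json.loads(body))).encode()
        self.send_response(200)  # anything other than 200 is treated as a failure
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

if __name__ == "__main__":
    HTTPServer(("", 8080), Handler).serve_forever()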
Finalize Hook
If the finalize hook is defined, Metacontroller will add a finalizer to the parent object, which will prevent it from being deleted until your hook has had a chance to run and the response indicates that you're done cleaning up.
This is useful for doing ordered teardown of children, or for cleaning up resources you may have created in an external system.
If you don't define a finalize hook, then when a parent object is deleted, the garbage collector will delete all your children immediately, and no hooks will be called.
The semantics of the finalize hook are mostly equivalent to those of the sync hook. Metacontroller will attempt to reconcile the desired states you return in the children field, and will set status on the parent.
The main difference is that finalize will be called instead of sync when it's time to clean up because the parent object is pending deletion.
Note that, just like sync, your finalize handler must be idempotent. Metacontroller might call your hook multiple times as the observed state changes, possibly even after you first indicate that you're done finalizing. Your handler should know how to check what still needs to be done and report success if there's nothing left to do.
Both sync and finalize have a request field called finalizing that indicates which hook was actually called. This lets you implement finalize either as a separate handler or as a check within your sync handler, depending on how much logic they share. To use the same handler for both, just define a finalize hook and set it to the same value as your sync hook.
Finalize Hook Request
The finalize hook request has all the same fields as the sync hook request, with the following changes:
Field | Description |
---|---|
finalizing | This is always true for the finalize hook. See the finalize hook for details. |
If you share the same handler for both sync and finalize, you can use the finalizing field to tell whether it's time to clean up or whether it's a normal sync.
If you define a separate handler just for finalize, there's no need to check the finalizing field since it will always be true.
Finalize Hook Response
The finalize hook response has all the same fields as the sync hook response, with the following additions:
Field | Description |
---|---|
finalized | A boolean indicating whether you are done finalizing. |
To perform ordered teardown, you can generate children just like you would for sync, but omit some children from the desired state depending on the observed set of children that are left.
For example, if you observe [A,B,C], generate only [A,B] as your desired state; if you observe [A,B], generate only [A]; if you observe [A], return an empty desired list [].
Once the observed state passed in with the finalize request meets all your criteria (e.g. no more children were observed), and you have checked all other criteria (e.g. no corresponding external resource exists), return true for the finalized field in your response.
Note that you should not return finalized: true the first time you return a desired state that you consider "final", since there's no guarantee that your desired state will be reached immediately. Instead, you should wait until the observed state matches what you want.
If the observed state passed in with the request doesn't meet your criteria, you can return a successful response (HTTP code 200) with finalized: false, and Metacontroller will call your hook again automatically if anything changes in the observed state.
If the only thing you're still waiting for is a state change in an external system, and you don't need to assert any new desired state for your children, returning success from the finalize hook may mean that Metacontroller doesn't call your hook again until the next periodic resync. To reduce the delay, you can request a one-time, per-object resync by setting resyncAfterSeconds in your hook response, giving you a chance to recheck the external state without holding up a slot in the work queue.
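As a sketch, a handler shared between sync and finalize could look like this; it assumes ConfigMaps are the only child resource, and the teardown policy (dropping one remaining child per pass) is just the ordered-teardown example described above:
# Sketch of a handler shared between sync and finalize.
def desired_children(parent):
    """Hypothetical helper: the full desired child list for a live parent."""
    return [
        {"apiVersion": "v1", "kind": "ConfigMap",
         "metadata": {"name": f"{parent['metadata']['name']}-{i}"}}
        for i in range(3)
    ]

def handle(request):
    parent = request["parent"]
    observed = request["children"]["ConfigMap.v1"]   # maps name -> object
    if not request["finalizing"]:
        # Normal sync: assert the full desired state.
        return {"status": {}, "children": desired_children(parent)}
    # Finalize: drop the last remaining child on each pass.
    keep = sorted(observed)[:-1]
    return {
        "status": {},
        "children": [c for c in desired_children(parent)
                     if c["metadata"]["name"] in keep],
        # Only report done once nothing is observed anymore.
        "finalized": len(observed) == 0,
    }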
Customize Hook
ControllerRevision
ControllerRevision is an internal API used by Metacontroller to implement declarative rolling updates.
Users of Metacontroller normally shouldn't need to know about this API, but it is documented here for Metacontroller contributors, as well as for troubleshooting.
Note that this is different from the ControllerRevision in apps/v1, although it serves a similar purpose.
You will likely need to use a fully-qualified resource name to inspect
Metacontroller's ControllerRevisions:
kubectl get controllerrevisions.metacontroller.k8s.io
Each ControllerRevision's name is a combination of the name and API group (excluding the version suffix) of the resource that it's a revision of, as well as a hash that is deterministic yet unique (used only for idempotent creation, not for lookup).
By default, ControllerRevisions belonging to a particular parent instance will get garbage-collected if the parent is deleted. However, it is possible to orphan ControllerRevisions during parent deletion, and then create a replacement parent to adopt them. ControllerRevisions are adopted based on the parent's label selector, the same way controllers like ReplicaSet adopt Pods.
Example
apiVersion: metacontroller.k8s.io/v1alpha1
kind: ControllerRevision
metadata:
name: catsets.ctl.enisoc.com-5463ba99b804a121d35d14a5ab74546d1e8ba953
labels:
app: nginx
component: backend
metacontroller.k8s.io/apiGroup: ctl.enisoc.com
metacontroller.k8s.io/resource: catsets
parentPatch:
spec:
template:
[...]
children:
- apiGroup: ""
kind: Pod
names:
- nginx-backend-0
- nginx-backend-1
- nginx-backend-2
Parent Patch
The parentPatch field stores a partial representation of the parent object at a given revision, containing only those fields listed by the lambda controller author as participating in rolling updates.
For example, if a CompositeController's revision history specifies a fieldPaths list of ["spec.template"], the parent patch will contain only spec.template and any subfields nested within it.
This mirrors the selective behavior of rolling updates in built-in APIs like Deployment and StatefulSet. Any fields that aren't part of the parent patch take effect immediately, rather than rolling out gradually.
Children
The children field stores a list of child objects that "belong" to this particular revision of the parent.
This is how Metacontroller keeps track of the current desired revision of a given child. For example, if a Pod that hasn't been updated yet gets deleted by a Node drain, it should be replaced at the revision it was on before it got deleted, not at the latest revision.
When Metacontroller decides it's time to update a given child to another revision, it first records this intention by updating the relevant ControllerRevision objects. After committing these records, it then begins updating that child according to the configured child update strategy. This ensures that the intermediate progress of the rollout is persisted in the API server so it survives process restarts.
Children are grouped by API Group (excluding the version suffix) and Kind. For each Group-Kind, we store a list of object names. Note that parent and children must be in the same namespace, and ControllerRevisions for a given parent also live in that parent's namespace.
DecoratorController
DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which resources to watch, as well as filters on labels and annotations.
This page is a detailed reference of all the features available in this API. See the Create a Controller guide for a step-by-step walkthrough.
Example
This example DecoratorController attaches a Service for each Pod belonging to a StatefulSet, for any StatefulSet that requests this behavior through a set of annotations.
apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
name: service-per-pod
spec:
resources:
- apiVersion: apps/v1
resource: statefulsets
annotationSelector:
matchExpressions:
- {key: service-per-pod-label, operator: Exists}
- {key: service-per-pod-ports, operator: Exists}
attachments:
- apiVersion: v1
resource: services
hooks:
sync:
webhook:
url: http://service-per-pod.metacontroller/sync-service-per-pod
timeout: 10s
Spec
A DecoratorController spec has the following fields:
Field | Description |
---|---|
resources | A list of resource rules specifying which objects to target for decoration (adding behavior). |
attachments | A list of resource rules specifying what this decorator can attach to the target resources. |
resyncPeriodSeconds | How often, in seconds, you want every target object to be resynced (sent to your hook), even if no changes are detected. |
hooks | A set of lambda hooks for defining your controller's behavior. |
Resources
Each DecoratorController can target one or more types of resources. For every object that matches one of these rules, Metacontroller will call your sync hook to ask for your desired state.
Each entry in the resources list has the following fields:
Field | Description |
---|---|
apiVersion | The API <group>/<version> of the target resource, or just <version> for core APIs. (e.g. v1 , apps/v1 , batch/v1 ) |
resource | The canonical, lowercase, plural name of the target resource. (e.g. deployments , replicasets , statefulsets ) |
labelSelector | An optional label selector for narrowing down the objects to target. |
annotationSelector | An optional annotation selector for narrowing down the objects to target. |
ignoreStatusChanges | An optional field through which status changes can be ignored for reconciliation. If set to true, only spec changes or label/annotation changes will reconcile the parent resource. |
Label Selector
The labelSelector field within a resource rule has the following subfields:
Field | Description |
---|---|
matchLabels | A map of key-value pairs representing labels that must exist and have the specified values in order for an object to satisfy the selector. |
matchExpressions | A list of set-based requirements on labels in order for an object to satisfy the selector. |
This label selector has the same format and semantics as the selector in built-in APIs like Deployment.
If a labelSelector is specified for a given resource type, the DecoratorController will ignore any objects of that type that don't satisfy the selector.
If a resource rule has both a labelSelector and an annotationSelector, the DecoratorController will only target objects of that type that satisfy both selectors.
Annotation Selector
The annotationSelector field within a resource rule has the following subfields:
Field | Description |
---|---|
matchAnnotations | A map of key-value pairs representing annotations that must exist and have the specified values in order for an object to satisfy the selector. |
matchExpressions | A list of set-based requirements on annotations in order for an object to satisfy the selector. |
The annotation selector has an analogous format and semantics to the label selector (note the field name matchAnnotations rather than matchLabels).
If an annotationSelector is specified for a given resource type, the DecoratorController will ignore any objects of that type that don't satisfy the selector.
If a resource rule has both a labelSelector and an annotationSelector, the DecoratorController will only target objects of that type that satisfy both selectors.
Attachments
This list should contain a rule for every type of resource your controller wants to attach to an object of one of the targeted resources.
Unlike child resources in CompositeController, attachments are not related to the target object through labels and label selectors. This allows you to attach arbitrary things (which may not have any labels) to other arbitrary things (which may not even have a selector).
Instead, attachments are only connected to the target object through owner references, meaning they will get cleaned up if the target object is deleted.
Each entry in the attachments list has the following fields:
Field | Description |
---|---|
apiVersion | The API group/version of the attached resource, or just version for core APIs. (e.g. v1 , apps/v1 , batch/v1 ) |
resource | The canonical, lowercase, plural name of the attached resource. (e.g. deployments , replicasets , statefulsets ) |
updateStrategy | An optional field that specifies how to update attachments when they already exist but don't match your desired state. If no update strategy is specified, attachments of that type will never be updated if they already exist. |
Attachment Update Strategy
Within each rule in the attachments list, the updateStrategy field has the following subfields:
Field | Description |
---|---|
method | A string indicating the overall method that should be used for updating this type of attachment resource. The default is OnDelete , which means don't try to update attachments that already exist. |
Attachment Update Methods
Within each attachment resource's updateStrategy, the method field can have these values:
Method | Description |
---|---|
OnDelete | Don't update existing attachments unless they get deleted by some other agent. |
Recreate | Immediately delete any attachments that differ from the desired state, and recreate them in the desired state. |
InPlace | Immediately update any attachments that differ from the desired state. |
Note that DecoratorController doesn't directly support rolling update of attachments because you can compose such behavior by attaching a CompositeController (or any other API that supports declarative rolling update, like Deployment or StatefulSet).
Resync Period
The resyncPeriodSeconds field in a DecoratorController's spec works similarly to the same field in CompositeController.
Hooks
Within the DecoratorController spec, the hooks field has the following subfields:
Field | Description |
---|---|
sync | Specifies how to call your sync hook, if any. |
finalize | Specifies how to call your finalize hook, if any. |
customize | Specifies how to call your customize hook, if any. |
Each field of hooks contains subfields that specify how to invoke that hook, such as by sending a request to a webhook.
Sync Hook
The sync hook is how you specify which attachments to create/maintain for a given target object -- in other words, your desired state.
Based on the DecoratorController spec, Metacontroller gathers up all the resources you said you need to decide on the desired state, and sends you their latest observed states.
After you return your desired state, Metacontroller begins to take action to converge towards it -- creating, deleting, and updating objects as appropriate.
A simple way to think about your sync hook implementation is like a script that generates JSON to be sent to kubectl apply. However, unlike a one-off client-side generator, your script has access to the latest observed state in the cluster, and will automatically get called any time that observed state changes.
Sync Hook Request
A separate request will be sent for each target object, so your hook only needs to think about one target object at a time.
The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:
Field | Description |
---|---|
controller | The whole DecoratorController object, like what you might get from kubectl get decoratorcontroller <name> -o json . |
object | The target object, like what you might get from kubectl get <target-resource> <target-name> -o json . |
attachments | An associative array of attachments that already exist. |
related | An associative array of related objects that exist, if a customize hook was specified. See the customize hook. |
finalizing | This is always false for the sync hook. See the finalize hook for details. |
Each field of the attachments object represents one of the types of attachment resources in your DecoratorController spec. The field name for each attachment type is <Kind>.<apiVersion>, where <apiVersion> could be just <version> (for a core resource) or <group>/<version>, just like you'd write in a YAML file. For example, the field name for Pods would be Pod.v1, while the field name for StatefulSets might be StatefulSet.apps/v1.
For resources that exist in multiple versions, the apiVersion you specify in the attachment resource rule is the one you'll be sent. Metacontroller requires you to be explicit about the version you expect because it does conversion for you as needed, so your hook doesn't need to know how to convert between different versions of a given resource.
Within each attachment type (e.g. in attachments['Pod.v1']), there is another associative array that maps from the attachment's path relative to the parent to the JSON representation, like what you might get from kubectl get <attachment-resource> <attachment-name> -o json.
If the parent and attachment are of the same scope - both cluster or both namespace - then the key is only the object's .metadata.name. If the parent is cluster scoped and the attachment is namespace scoped, then the key will be of the form {.metadata.namespace}/{.metadata.name}. This is to disambiguate between two attachments with the same name in different namespaces. A parent may never be namespace scoped while an attachment is cluster scoped.
For example, a Pod named my-pod in the my-namespace namespace could be accessed as follows if the parent is also in my-namespace:
request.attachments['Pod.v1']['my-pod']
Alternatively, if the parent resource is cluster scoped, the Pod could be accessed as:
request.attachments['Pod.v1']['my-namespace/my-pod']
Note that you will only be sent objects that are owned by the target (i.e. objects you attached), not all objects of that resource type.
There will always be an entry in attachments for every attachment resource rule, even if no attachments of that type were observed at the time of the sync. For example, if you listed Pods as an attachment resource rule, but no existing Pods have been attached, you will receive:
{
"attachments": {
"Pod.v1": {}
}
}
as opposed to:
{
"attachments": {}
}
Related resources, represented under the related field, are presented in the same form as attachments, but they represent the resources matched by the customize hook response for the given parent object. These objects are not managed by the controller and therefore cannot be modified, but you can use them to compute your desired attachments.
Some existing examples implementing this approach are:
- ConfigMapPropagation - makes a copy of a given ConfigMap in several namespaces.
- GlobalConfigMap - makes a copy of a given ConfigMap in every namespace.
- SecretPropagation - makes a copy of a given Secret in each namespace satisfying a label selector.
Please note that when a related resource is updated, the sync hook is triggered again (even if the parent object and attachments do not change), so you can recalculate the desired attachments according to a fresh view of the related objects.
Sync Hook Response
The body of your response should be a JSON object with the following fields:
Field | Description |
---|---|
labels | A map of key-value pairs for labels to set on the target object. |
annotations | A map of key-value pairs for annotations to set on the target object. |
status | A JSON object that will completely replace the status field within the target object. Leave unspecified or null to avoid changing status . |
attachments | A list of JSON objects representing all the desired attachments for this target object. |
resyncAfterSeconds | Set the delay (in seconds, as a float) before an optional, one-time, per-object resync. |
By convention, the controller for a given resource should not modify its own spec, so your decorator can't mutate the target's spec.
As a result, decorators currently cannot modify the target object except to optionally set labels, annotations, and status on it. Note that if the target resource already has its own controller, that controller might ignore and overwrite any status updates you make.
The attachments field should contain a flat list of objects, not an associative array. Metacontroller groups the objects it sends you by type and name as a convenience to simplify your scripts, but it's actually redundant since each object contains its own apiVersion, kind, and metadata.name.
It's important to include the apiVersion and kind in objects you return, and also to ensure that you list every type of attachment resource you plan to create in the DecoratorController spec.
If the parent resource is cluster scoped and the child resource is namespaced, it's important to include the .metadata.namespace since the namespace cannot be inferred from the parent's namespace.
Any objects sent as attachments in the request that you decline to return in your response list will be deleted. However, you shouldn't directly copy attachments from the request into the response because they're in different forms.
Instead, you should think of each entry in the list of attachments as being sent to kubectl apply.
That is, you should set only the fields that you care about.
You can optionally set resyncAfterSeconds
to a value greater than 0 to request
that the sync
hook be called again with this particular parent object after
some delay (specified in seconds, with decimal fractions allowed).
Unlike the controller-wide resyncPeriodSeconds
, this is a
one-time request (not a request to start periodic resyncs), although you can
always return another resyncAfterSeconds
value from subsequent sync
calls.
Also unlike the controller-wide setting, this request only applies to the
particular parent object that this sync
call sent, so you can request
different delays (or omit the request) depending on the state of each object.
Note that your webhook handler must return a response with a status code of 200
to be considered successful. Metacontroller will wait for a response for up to the
amount defined in the Webhook spec.
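As a rough sketch of the response shape (in Python; the values are purely illustrative and not taken from any real controller), a handler exercising each documented response field might look like this:

# Illustrative response shape only; compute desired_attachments as described above.
def sync(request):
    desired_attachments = []  # build the flat list of desired attachments here
    return {
        "labels": {"example.io/decorated": "true"},
        "annotations": {"example.io/managed-by": "my-decorator"},
        # Completely replaces .status on the target; omit to leave status alone.
        "status": {"observedAttachments": len(desired_attachments)},
        # Flat list; each entry is treated like input to `kubectl apply`.
        "attachments": desired_attachments,
        # Optional one-time, per-object resync after 30 seconds.
        "resyncAfterSeconds": 30.0,
    }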
Finalize Hook
If the finalize
hook is defined, Metacontroller will add a finalizer to the
parent object, which will prevent it from being deleted until your hook has had
a chance to run and the response indicates that you're done cleaning up.
This is useful for doing ordered teardown of attachments, or for cleaning up
resources you may have created in an external system.
If you don't define a finalize
hook, then when a parent object is deleted,
the garbage collector will delete all your attachments immediately,
and no hooks will be called.
In addition to finalizing when an object is deleted, Metacontroller will also
call your finalize
hook on objects that were previously sent to sync
but now no longer match the DecoratorController's label and annotation selectors.
This allows you to clean up after yourself when the object has been updated to
opt out of the functionality added by your decorator, even if the object is not
being deleted.
If you don't define a finalize
hook, then when the object opts out, any
attachments you added will remain until the object is deleted,
and no hooks will be called.
The semantics of the finalize
hook are mostly equivalent to those of
the sync
hook.
Metacontroller will attempt to reconcile the desired states you return in the
attachments
field, and will set labels and annotations as requested.
The main difference is that finalize
will be called instead of sync
when
it's time to clean up because the parent object is pending deletion.
Note that, just like sync
, your finalize
handler must be idempotent.
Metacontroller might call your hook multiple times as the observed state
changes, possibly even after you first indicate that you're done finalizing.
Your handler should know how to check what still needs to be done
and report success if there's nothing left to do.
Both sync
and finalize
have a request field called finalizing
that
indicates which hook was actually called.
This lets you implement finalize
either as a separate handler or as a check
within your sync
handler, depending on how much logic they share.
To use the same handler for both, just define a finalize
hook and set it to
the same value as your sync
hook.
Finalize Hook Request
The finalize
hook request has all the same fields as the
sync
hook request, with the following changes:
Field | Description |
---|---|
finalizing | This is always true for the finalize hook. See the finalize hook for details. |
If you share the same handler for both sync
and finalize
, you can use the
finalizing
field to tell whether it's time to clean up or whether it's a
normal sync.
If you define a separate handler just for finalize
, there's no need to check
the finalizing
field since it will always be true
.
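If you do point both hooks at the same URL, the handler can branch on this field. A minimal sketch (the handler and helper names are hypothetical):

# Dispatch on the "finalizing" flag described above.
def handle(request):
    if request.get("finalizing"):
        return finalize(request)  # clean up; include "finalized" in the response
    return sync(request)          # normal reconciliation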
Finalize Hook Response
The finalize
hook response has all the same fields as the
sync
hook response, with the following additions:
Field | Description |
---|---|
finalized | A boolean indicating whether you are done finalizing. |
To perform ordered teardown, you can generate attachments just like you would for
sync
, but omit some attachments from the desired state depending on the observed
set of attachments that are left.
For example, if you observe [A,B,C]
, generate only [A,B]
as your desired
state; if you observe [A,B]
, generate only [A]
; if you observe [A]
,
return an empty desired list []
.
Once the observed state passed in with the finalize
request meets all your
criteria (e.g. no more attachments were observed), and you have checked all
other criteria (e.g. no corresponding external resource exists), return true
for the finalized
field in your response.
Note that you should not return finalized: true
the first time you return
a desired state that you consider "final", since there's no guarantee that your
desired state will be reached immediately.
Instead, you should wait until the observed state matches what you want.
If the observed state passed in with the request doesn't meet your criteria,
you can return a successful response (HTTP code 200) with finalized: false
,
and Metacontroller will call your hook again automatically if anything changes
in the observed state.
If the only thing you're still waiting for is a state change in an external
system, and you don't need to assert any new desired state for your children,
returning success from the finalize
hook may mean that Metacontroller doesn't
call your hook again until the next periodic resync.
To reduce the delay, you can request a one-time, per-object resync by setting
resyncAfterSeconds
in your hook response, giving you
a chance to recheck the external state without holding up a slot in the work
queue.
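The ordered-teardown rule above can be captured in a few lines. The following sketch (illustrative only) works on attachment names and is independent of any particular attachment type; a real finalize hook would translate the kept names back into full desired objects:

# Given the attachment names currently observed, in teardown order
# (the last entry is torn down first), return the names to keep this
# round and whether finalization is complete.
def teardown_step(observed_names):
    keep = observed_names[:-1]            # observe [A, B, C] -> keep [A, B]
    finalized = len(observed_names) == 0  # done only once nothing is observed
    return keep, finalized

# Successive finalize calls as attachments disappear:
print(teardown_step(["A", "B", "C"]))  # (['A', 'B'], False)
print(teardown_step(["A"]))            # ([], False)
print(teardown_step([]))               # ([], True)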
Customize Hook
If the customize hook is defined, Metacontroller will ask your hook which related objects, or classes of objects, your sync and finalize hooks need to know about. This is useful for mapping across many objects; one example would be a controller that lets you specify ConfigMaps to be placed in every Namespace. Another use case is being able to reference other objects, e.g. the env section from a core Pod object. If you don't define a customize hook, the related section of the sync and finalize hook requests will be empty.
The customize
hook will not provide any information about the current state of
the cluster. Thus, the set of related objects may only depend on the state of
the parent object.
This hook may also accept other fields in the future, for other customizations.
Customize Hook Request
A separate request will be sent for each parent object, so your hook only needs to think about one parent at a time.
The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:
Field | Description |
---|---|
controller | The whole CompositeController object, like what you might get from kubectl get compositecontroller <name> -o json . |
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json . |
Customize Hook Response
The body of your response should be a JSON object with the following fields:
Field | Description |
---|---|
relatedResources | A list of JSON objects (ResourceRules) representing all the desired related resource descriptions. |
The relatedResources
field should contain a flat list of objects,
not an associative array.
Each ResourceRule
object should be a JSON object with the following fields:
Field | Description |
---|---|
apiVersion | The API <group>/<version> of the related resource, or just <version> for core APIs. (e.g. v1 , apps/v1 , batch/v1 ) |
resource | The canonical, lowercase, plural name of the related resource. (e.g. deployments , replicasets , statefulsets ) |
labelSelector | A v1.LabelSelector object. Omit if not used (i.e. when namespace and/or names are used instead). |
namespace | Optional. The namespace to select objects from. |
names | Optional. A list of strings naming the individual objects to return. |
Important note
Please note that you can specify either a label selector or namespace/names in a given ResourceRule, but not both.
If the parent resource is cluster scoped and the related resource is namespaced, the namespace may be used to restrict which objects to look at. If the parent resource is namespaced, the related resources must come from the same namespace. Specifying the namespace is optional, but if specified it must match the parent's namespace.
Note that your webhook handler must return a response with a status code of 200
to be considered successful. Metacontroller will wait for a response for up to the
amount defined in the Webhook spec.
Example
Let's take a look at an example GlobalConfigMap custom resource object:
---
apiVersion: examples.metacontroller.io/v1alpha1
kind: GlobalConfigMap
metadata:
  name: globalsettings
spec:
  sourceName: globalsettings
  sourceNamespace: global
It says that we would like the globalsettings ConfigMap from the global namespace to be present in every namespace.
The customize hook request will look like:
{
  "controller": "...",
  "parent": "..."
}
and we need to extract the information identifying the source ConfigMap.
The controller returns the following relatedResources:
[
  {
    "apiVersion": "v1",
    "resource": "configmaps",
    "namespace": ${parent['spec']['sourceNamespace']},
    "names": [${parent['spec']['sourceName']}]
  },
  {
    "apiVersion": "v1",
    "resource": "namespaces",
    "labelSelector": {}
  }
]
The first ResourceRule says that the given ConfigMap should be returned (it will be used as the source for our propagation).
The second ResourceRule says that we also want to receive all namespaces in the cluster ("labelSelector": {} means: select all objects).
With those rules, calls to the sync
hook will have a non-empty related
field (if the resources exist in the cluster), containing all objects that match the given criteria.
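For completeness, here is a minimal customize hook for this example, sketched in Python (any language works). It returns the two ResourceRules shown above, derived only from the parent object's spec, as required:

# Customize hook sketch for the GlobalConfigMap example above.
def customize(request):
    parent = request["parent"]
    return {
        "relatedResources": [
            {
                # The source ConfigMap to propagate.
                "apiVersion": "v1",
                "resource": "configmaps",
                "namespace": parent["spec"]["sourceNamespace"],
                "names": [parent["spec"]["sourceName"]],
            },
            {
                # Every Namespace in the cluster (an empty selector selects all objects).
                "apiVersion": "v1",
                "resource": "namespaces",
                "labelSelector": {},
            },
        ]
    }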
Hook
This page describes how hook targets are defined in various APIs.
Each hook that you define as part of using one of the hook-based APIs has the following fields:
Field | Description |
---|---|
webhook | Specify how to invoke this hook over HTTP(S). |
Example
webhook:
  url: http://my-controller-svc/sync
Webhook
Each Webhook has the following fields:
Field | Description |
---|---|
etag | Configuration for the ETag logic described in the Etag Reference below. |
url | A full URL for the webhook (e.g. http://my-controller-svc/hook ). If present, this overrides any values provided for path and service . |
timeout | A duration (in the format of Go's time.Duration) indicating the time that Metacontroller should wait for a response. If the webhook takes longer than this time, the webhook call is aborted and retried later. Defaults to 10s. |
path | A path to be appended to the accompanying service to reach this hook (e.g. /hook ). Ignored if full url is specified. |
service | A reference to a Kubernetes Service through which this hook can be reached. |
Service Reference
Within a webhook
, the service
field has the following subfields:
Field | Description |
---|---|
name | The metadata.name of the target Service. |
namespace | The metadata.namespace of the target Service. |
port | The port number to connect to on the target Service. Defaults to 80 . |
protocol | The protocol to use for the target Service. Defaults to http . |
Etag Reference
More details can be found in RFC 7232.
An ETag is a hash of the response content. A controller (webhook) that supports ETags should add an "ETag" header to each 200 response. Metacontroller, when ETag support is enabled, sends the "If-None-Match" header with the ETag value of its cached content. If the content has not changed, the controller should reply with "304 Not Modified" or "412 Precondition Failed"; otherwise it sends a 200 response with a fresh "ETag" header.
This logic helps save traffic and CPU time on webhook processing.
Within a webhook, the etag field has the following subfields:
Enabled *bool `json:"enabled,omitempty"`
CacheTimeoutSeconds *int32 `json:"cacheTimeoutSeconds,omitempty"`
CacheCleanupSeconds *int32 `json:"cacheCleanupSeconds,omitempty"`
Field | Description |
---|---|
Enabled | true or false. Defaults to false. |
CacheTimeoutSeconds | Time in seconds after which an ETag cache record is forgotten. |
CacheCleanupSeconds | How often the ETag cache garbage collector runs to clean up forgotten records. |
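As a framework-agnostic sketch of the handshake described above (header names and status codes follow the description; everything else, including the hashing choice, is illustrative), a webhook could compute and compare ETags like this:

import hashlib
import json

# Given the hook's JSON response body and the value of the incoming
# "If-None-Match" header (if any), decide what to send back.
def respond(body_dict, if_none_match=None):
    body = json.dumps(body_dict, sort_keys=True).encode()
    etag = '"%s"' % hashlib.sha256(body).hexdigest()

    if if_none_match == etag:
        # Content unchanged: no body, so Metacontroller reuses its cached copy.
        return 304, {"ETag": etag}, b""

    # Content changed (or nothing cached): send the body with a fresh ETag.
    return 200, {"ETag": etag}, body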
Design Docs
MapController
This is a design proposal for an API called MapController.
Background
Metacontroller APIs are meant to represent common controller patterns. The goal of these APIs as a group is to strike a balance between being flexible enough to handle unforeseen use cases and providing strong enough "rails" to avoid pushing the hard parts onto users. The initial strategy is to target controller patterns that are analogous to proven design patterns in functional or object-oriented programming.
For example, CompositeController lets you define the canonical relationship between some object (the parent node) and the objects that are directly under it in an ownership tree (child nodes). This is analogous to the Composite pattern in that it lets you manage a group of child objects as if it were one object (by manipulating only the parent object).
Similarly, DecoratorController lets you add new child nodes to a parent node that already has some other behavior. This is analogous to the Decorator pattern in that it lets you dynamically wrap new behavior around select instances of an existing object type without having to create a new type.
Problem Statement
The problem that MapController addresses is that neither CompositeController nor DecoratorController allow you to make decisions based on objects that aren't owned by the particular parent object being processed. That's because in the absence of a parent-child relationship, there are arbitrarily many ways you could pick what other objects you want to look at.
To avoid having to send every object in a given resource (e.g. every Pod) on every hook invocation, there must be some way to tell Metacontroller which objects you need to see (that you don't own) to compute your desired state. Rather than try to embed various options for declaring these relationships (object name? label selector? field selector?) into each existing Metacontroller API, the goal of MapController is to provide a solution that's orthogonal to the existing APIs.
In other words, we attempt to separate the task of looking at non-owned objects (MapController) from the task of defining objects that are composed of other objects (CompositeController) so that users can mix and match these APIs (and future APIs) as needed without being limited to the precise scenarios we're able to anticipate.
Proposed Solution
MapController lets you define a collection of objects owned by a parent object, where each child object is generated by some mapping from a non-owned object. This is analogous to the general concept of a map function in that it calls your hook for each object in some input list (of non-owned objects), and creates an output list (of child objects) containing the results of each call.
A single sync
pass for a MapController roughly resembles this pseudocode:
def sync_map_controller():
    input_list = get_matching_objects(input_resource, input_selector)
    output_list = list()
    for input_object in input_list:
        output_list.append(map_hook(input_object))
    reconcile_objects(output_list)
where map_hook()
is the only code that the MapController user writes,
as a lambda hook.
In general, MapController addresses use cases that can be described as, "For every matching X object that already exists, I want to create some number of Y objects according to the parameters stored in the parent object."
Alternatives Considered
Friend Resources
Add a new type of "non-child" resource to CompositeController called "friend resources". Along with all the matching children, we would also send all matching objects of the friend resource types to the sync hook request.
Matching would be determined with the parent's selector, just like for children. However, we would not require friends to have a ControllerRef pointing to the parent (the parent-friend relationship is non-exclusive), and the parent will not attempt to adopt friends.
The sync hook response would not contain friends, because we don't want to force you to list a desired state for all your friends every time. This means you cannot edit or delete your friends.
This approach was not chosen because:
- We have to send the entire list of matching friends as one big hook request. This complicates the user's hook code because they probably need to loop over each friend. It's also inefficient for patterns like "for every X (where there are a lot of X's), create a Y" since we have to sync every X if any one of them changes, and we can't process any of them in parallel.
- It's tied in with the CompositeController API, and doing something similar for other APIs like DecoratorController would require both duplicated and different effort (see Decorator Resources).
- It either forces you to use the same selector to find friends as you use to claim children, or it complicates the API with multiple selectors for different resources, which becomes difficult to reason about.
- If we force the same selector to apply to both friends and children,
we also force you to explicitly specify a meaningful set of labels.
You can't use selector generation (
controller-uid: ###
) for cases when you don't need orphaning and adoption; your friends won't match that selector.
Decorator Resources
Add a new type of resource to DecoratorController called a decorator resource, which contains objects that inform the behavior of the decorator. This would allow controllers that look at non-owned resources as part of computing the desired state of their children.
In particular, you could use DecoratorController to create attachments (extra children) on a parent object, while basing your desired state on information in another object (the decorator resource) that is not owned by that parent.
This approach was not chosen because:
- It's unclear how we would "link" objects of the decorator resource to particular parent objects being processed. Would we apply the parent selector to find decorator objects? Or apply a selector inside the decorator object to determine if it matches the parent object? Whatever we choose, it will likely be unintuitive and confusing for users.
- It's unclear what should happen if multiple decorator objects match a single parent object. We could send multiple decorator objects to the hook, but that just passes the complexity on to the user.
- It's unclear whether decorator objects are expected to take part in ownership of the objects created. Depending on the use case, users might want attachments to be owned by just the parent, just the decorator, or both. This configuration adds to the cognitive overhead of using the API, and there's no one default that's more intuitive than the others.
Example
The example use case we'll consider in this doc is a controller called SnapshotSchedule that creates periodic backups of PVCs with the VolumeSnapshot API. Notice that it's natural to express this in the form we defined above: "For every matching PVC, I want to create some VolumeSnapshot objects."
CompositeController doesn't fit this use case because the PVCs are created and potentially owned by something other than the SnapshotSchedule object. For example, the PVCs might have been created by a StatefulSet. Instead of creating PVCs, we want to look at all the PVCs that already exist and take action on certain ones.
DecoratorController doesn't fit this use case because it doesn't make sense for the VolumeSnapshots we create to be owned by the PVC from which the snapshot was taken. The lifecycle of a VolumeSnapshot has to be separate from the PVC because the whole point is that you should be able to recover the data if the PVC goes away. Since the PVC doesn't own the VolumeSnapshots, it doesn't make sense to think of the snapshots as a decoration on PVC (an additional feature of the PVC API).
An instance of SnapshotSchedule might look like this:
apiVersion: snapshot.k8s.io/v1
kind: SnapshotSchedule
metadata:
  name: my-app-snapshots
spec:
  snapshotInterval: 6h
  snapshotTTL: 10d
  selector:
    matchLabels:
      app: my-app
It contains a selector that determines which PVCs this schedule applies to, and some parameters that determine how often to take snapshots, as well as when to retire old snapshots.
API
Below is a sample MapController spec that could be used to implement the SnapshotSchedule controller:
apiVersion: metacontroller.k8s.io/v1alpha1
kind: MapController
metadata:
  name: snapshotschedule-controller
spec:
  parentResource:
    apiVersion: snapshot.k8s.io/v1
    resource: snapshotschedules
  inputResources:
  - apiVersion: v1
    resource: persistentvolumeclaims
  outputResources:
  - apiVersion: volumesnapshot.external-storage.k8s.io/v1
    resource: volumesnapshots
  resyncPeriodSeconds: 5
  hooks:
    map:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/map
    tombstone:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/tombstone
Parent Resource
The parent resource is the SnapshotSchedule itself, and anything this controller
creates will be owned by this parent.
The schedule thus acts like a bucket containing snapshots: if you delete the
schedule, the snapshots inside it will go away too, unless you specify to orphan
them as part of the delete operation (e.g. with --cascade=false
when using
kubectl delete
).
Notably, this ties the lifecycles of snapshots to the reason they exist
(the backup policy that the user defined), rather than tying them to the entity
that they are about (the PVC).
Input Resources
The input resources (in this case just PVC) are the inputs to the conceptual "map" function. We allow multiple input resources because users might want to write a controller that performs the same action for several different input types. We shouldn't force them to create multiple MapControllers with largely identical behavior.
The duck-typed spec.selector
field (assumed to be metav1.LabelSelector
) in
the parent object is used to filter which input objects to process.
If the selector is empty, we will process all objects of the input types in the
same namespace as the parent.
We will also ignore input objects whose controllerRef points to the particular parent object being processed. That would imply that the same resource (e.g. ConfigMap) is listed as both an input and an output in a given MapController spec. This allows use cases such as generating ConfigMaps from other ConfigMaps by doing some transformation on the data, while protecting against accidental recursion if the label selector is chosen poorly.
If there are multiple input resources, they are processed independently, with no attempt to correlate them. That is, the map hook will still be called with only a single input object each time, although the kind of that object might be different from one call to the next.
Output Resources
The output resources (in this case just VolumeSnapshot) are the types of objects that the user intends to create and hold in the conceptual "bucket" that the parent object represents. We allow multiple output resources because users might think of their controller as spitting out a few different things. We shouldn't force them to create a CompositeController too just so they can emit multiple outputs, especially if those outputs are not conceptually part of one larger whole.
For a given input object, the user can generate any number of output objects. We will tag those output objects in some way to associate them with the object that we sent as input. The tag makes it possible to group those objects and send them along with future map hook requests.
In pseudocode, a sync
pass could be thought of like the following:
// Get all matching objects from all input resources.
inputObjects := []Object{}
for _, inputResource := range inputResources {
    inputObjects = append(inputObjects, getMatchingObjects(inputResource, parentSelector)...)
}

// Call the map hook once for each input object.
for _, inputObject := range inputObjects {
    // Compute some opaque string identifying this input object.
    mapKey := makeMapKey(inputObject)

    // Gather observed objects of the output resources that are tagged with this key.
    observedOutputs := []Object{}
    for _, outputResource := range outputResources {
        // Gather all outputs owned by this parent.
        allOutputs := getOwnedObjects(outputResource, parent)
        // Filter to only those tagged for this input.
        observedOutputs = append(observedOutputs, filterByMapKey(allOutputs, mapKey)...)
    }

    // Call the user's map hook, passing the observed state.
    mapResult := mapHook(parent, inputObject, observedOutputs)
    for _, obj := range mapResult.Outputs {
        // Tag outputs to identify which input they came from.
        setMapKey(obj, mapKey)
    }

    // Manage child objects by reconciling observed and desired outputs.
    manageChildren(observedOutputs, mapResult.Outputs)
}
Detached Outputs
If an input object disappears, we may find that the parent owns one or more output objects that are tagged as having been generated from an input object that no longer exists. Note that this does not mean these objects have been orphaned, in the sense of having no ownerRef/controllerRef; the controllerRef will still point to the parent object. It's only our MapController-specific "tag" that has become a broken link.
By default, we will delete any such detached outputs so that controller authors don't have to think about them. However, the SnapshotSchedule example shows that sometimes it will be important to give users control over what happens to these objects. In that example, the user would want to keep detached VolumeSnapshots since they might be needed to restore the now-missing PVC.
We could offer a declarative knob to either always delete detached outputs, or always keep them, but that would be awkwardly restrictive. The controller author would have fine-grained control over the lifecycle of "live" outputs, but would suddenly lose that control when the outputs become detached.
Instead, we propose to define an optional tombstone hook that sends information about a particular group of detached outputs (belonging to a particular input object that is now gone), and asks the user to decide which ones to keep. For example, SnapshotSchedule would likely want to keep detached VolumeSnapshots around until the usual expiry timeout.
For now, we will not allow the hook to edit detached outputs because we don't want to commit to sending the body of the missing input object, since it may not be available. Without that input object, the hook author presumably wouldn't have enough information to decide on an updated desired state anyway. We can reexamine this if users come up with compelling use cases.
Status Aggregation
One notable omission from the map hook, as compared with the sync hook from CompositeController, is that the user does not return any status object. That's because each map hook invocation only sends enough context to process a single input object and its associated output objects. The hook author therefore doesn't have enough information to compute the overall status of the parent object.
We could define another hook to which we send all inputs and outputs for a given parent, and ask the user to return the overall status. However, that would defeat one of the main goals of MapController because such a monolithic hook request could get quite large for the type of use cases we expect for a controller that says, "do this for every X," and also because that would place the burden of aggregating status across the whole collection onto the user.
Instead, Metacontroller will compute an aggregated status for the collection based on some generic rules:
For each input resource, we will report the number of matching objects we observed as a status field on the parent object, named after the plural resource name.
The exact format will be an implementation detail, but for example it might look like:
status:
  inputs:
    persistentvolumeclaims:
      total: 20
  ...
For each output resource, we will report the total number of objects owned by
this parent across all map keys.
In addition, we will automatically aggregate conditions found on output objects,
and report how many objects we own with that condition set to True
.
For example:
status:
  ...
  outputs:
    volumesnapshots:
      total: 100
      ready: 97
  ...
Hooks
Map Hook
We call the map hook to translate an input object into zero or more output objects.
Map Hook Request
Field | Description |
---|---|
controller | The whole MapController object, like what you might get from kubectl get mapcontroller <name> -o json . |
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json . |
mapKey | An opaque string that uniquely identifies the group of outputs that belong to this input object. |
input | The input object, like what you might get from kubectl get <input-resource> <input-name> -o json . |
outputs | An associative array of output objects that the parent already created for the given input object. |
Map Hook Response
Field | Description |
---|---|
outputs | A list of JSON objects representing all the desired outputs for the given input object. |
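To make the proposed shape concrete, a map hook for the SnapshotSchedule example might look like the sketch below. This is purely illustrative of the proposal: MapController is not an implemented API, the VolumeSnapshot fields are placeholders, and a real controller would use snapshotInterval/snapshotTTL and the observed outputs rather than always desiring a single snapshot.

# Hypothetical map hook: for each matching PVC, desire one snapshot named
# after the PVC (placeholder logic; see the note above).
def map_hook(request):
    pvc = request["input"]
    snapshot = {
        "apiVersion": "volumesnapshot.external-storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {
            "name": pvc["metadata"]["name"] + "-snapshot",
            "namespace": pvc["metadata"]["namespace"],
        },
        "spec": {"persistentVolumeClaimName": pvc["metadata"]["name"]},
    }
    return {"outputs": [snapshot]}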
Tombstone Hook
We call the tombstone hook, if defined, to ask whether we should keep any of a group of output objects whose corresponding input object is gone. If no tombstone hook is defined, we will always delete any such orphans as soon as the input object disappears.
Tombstone Hook Request
Field | Description |
---|---|
controller | The whole MapController object, like what you might get from kubectl get mapcontroller <name> -o json . |
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json . |
mapKey | An opaque string that uniquely identifies the group of outputs that belong to this input object. |
outputs | An associative array of output objects that the parent already created for the given input object. |
Tombstone Hook Response
Field | Description |
---|---|
outputs | A list of output objects to keep, even though the associated input object is gone. All other outputs belonging to this input will be deleted. |
Contributor Guide
This section contains information for people who want to hack on or contribute to Metacontroller.
See the User Guide if you just want to use Metacontroller.
GitHub
Building
The page describes how to build Metacontroller for yourself.
First, check out the code:
# If you're going to build locally, make sure to
# place the repo according to the Go import path:
# $GOPATH/src/metacontroller.io
cd $GOPATH/src
git clone git@github.com:metacontroller/metacontroller.git metacontroller
cd metacontroller
Then you can build a metacontroller
binary like so:
make build
Local build and development
Check the debug section below.
Documentation build
Documentation is generated from .md
files with mdbook.
To generate documentation, you need to install:
- mdbook
- mdbook plugins:
  - linkcheck - verifies link correctness
  - toc - creates tables of contents (TOCs)
  - graphviz - generates dot diagrams
  - open-on-gh - adds an "open on GitHub" link
- graphviz
To generate the documentation:
cd docs
mdbook build
A book folder will be generated containing the HTML content.
You can also use mdbook serve to serve the documentation on http://localhost:3000.
Tests
To run tests, first make sure you can successfully complete a local build.
Unit Tests
Unit tests in Metacontroller focus on code that does some kind of non-trivial
local computation without depending on calls to remote servers -- for example,
the ./dynamic/apply
package.
Unit tests live in _test.go
files alongside the code that they test.
To run only unit tests (excluding integration tests)
for all Metacontroller packages, use this command:
make unit-test
Integration Tests
Integration tests in Metacontroller focus on verifying behavior at the level of calls to remote services like user-provided webhooks and the Kubernetes API server. Since Metacontroller's job is ultimately to manipulate Kubernetes API objects in response to other Kubernetes API objects, most of the important features or behaviors of Metacontroller can and should be tested at this level.
In the integration test environment, we start a standalone kube-apiserver
to
serve the REST APIs, and an etcd
instance to back it.
We do not run any kubelets (Nodes), nor any controllers other than
Metacontroller.
This makes it easy for tests to control exactly what API objects Metacontroller
sees without interference from the normal controller for each API,
and also greatly reduces the requirements to run tests.
Other than the Metacontroller codebase, all you need to run integration tests
is to download a few binaries from a Kubernetes release.
You can run the following script from the test/integration
directory in order to fetch the versions of these binaries currently used in continuous
integration, and place them in ./hack/bin
:
hack/get-kube-binaries.sh
You can then run the integration tests with this command, which will
automatically set the PATH to include ./hack/bin
:
make integration-test
Unlike unit tests, integration tests do not live alongside the code they test,
but instead are gathered in ./test/integration/...
.
This makes it easier to run them separately, since they require a special
environment, and also enforces that they test packages at the level of their
public interfaces.
End-to-End Tests
End-to-end tests in Metacontroller focus on verifying example workflows that we
expect to be typical for end users. That is, we run the same kubectl
commands
that a human might run when using Metacontroller.
Since these tests verify end-to-end behavior, they require a fully-functioning
Kubernetes cluster.
Before running them, you should have kubectl
in your PATH, and it should be
configured to talk to a suitable, empty test cluster that has had the
Metacontroller manifests applied.
Then you can run the end-to-end tests against your cluster with the following:
cd examples
./test.sh
This will run all the end-to-end tests in series, and print the location of a log file containing the output of the latest test that was run.
You can also run each test individually, which will show the output as it runs. For example:
cd examples/bluegreen
./test.sh
Note that currently our continuous integration only runs unit and integration tests on PRs, since those don't require a full cluster. If you have access to a suitable test cluster, you can help speed up review of your PR by running these end-to-end tests yourself to see if they catch anything.
Local development and debugging
Tips and tricks for contributors
Local run of metacontroller
There are different flavours of manifests shipped to help with local development:
- manifests/dev
- manifests/debug
Development build
The main difference is that the image defined in the manifest is localhost/metacontroller:dev, therefore:
- apply the dev manifests - kubectl apply -k manifests/dev
- build the docker image - make image (this compiles the binary and builds the container image)
- load the image into the cluster (e.g. kind load docker-image localhost/metacontroller:dev when using kind)
- restart the pod (e.g. kubectl delete pod/metacontroller-0 --namespace metacontroller)
Debug build
Debugging requires building the Go sources in a special way, which is done with make build_debug; the Dockerfile.debug dockerfile then adds the resulting binary to the debug Docker image:
- apply the debug manifests - kubectl apply -k manifests/debug
- build the debug binary and image - make image_debug
- load the image into the cluster (e.g. kind load docker-image localhost/metacontroller:debug when using kind)
- restart the pod
- on startup, the go process will wait for a debugger to attach on port 40000
- port-forward port 40000 from the container to localhost, e.g. kubectl port-forward metacontroller-0 40000:40000
- attach a go debugger to port 40000 on localhost