Introduction

Metacontroller is an add-on for Kubernetes that makes it easy to write and deploy custom controllers. Although the open-source project was started at Google, the add-on works the same in any Kubernetes cluster.

While custom resources provide storage for new types of objects, custom controllers define the behavior of a new extension to the Kubernetes API. Just like the CustomResourceDefinition (CRD) API makes it easy to request storage for a custom resource, the Metacontroller APIs make it easy to define behavior for a new extension API or add custom behavior to existing APIs.

Simple Automation

Kubernetes provides a lot of powerful automation through its built-in APIs, but sometimes you just want to tweak one little thing or add a bit of logic on top. With Metacontroller, you can write and deploy new level-triggered API logic in minutes.

The code for your custom controller could be as simple as this example in Jsonnet that adds a label to Pods:

// This example is written in Jsonnet (a JSON templating language),
// but you can write hooks in any language.
function(request) {
  local pod = request.object,
  local labelKey = pod.metadata.annotations["pod-name-label"],

  // Inject the Pod name as a label with the key requested in the annotation.
  labels: {
    [labelKey]: pod.metadata.name
  }
}

Since all you need to provide is a webhook that understands JSON, you can use any programming language, often without any dependencies beyond the standard library. The code above is not a snippet; it's the entire script.
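As a rough illustration of the same hook in another language, here is a minimal sketch in Python that uses only the standard library. The object request field and the labels response field mirror the Jsonnet above; the function names, the serve helper, and port 8080 are assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def sync(request: dict) -> dict:
    """Return the desired labels for the Pod in a sync request."""
    pod = request["object"]
    # The annotation value names the label key to set.
    label_key = pod["metadata"]["annotations"]["pod-name-label"]
    return {"labels": {label_key: pod["metadata"]["name"]}}


class SyncHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent to the webhook.
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(sync(request)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, answering hook requests (the port is an assumption)."""
    HTTPServer(("", port), SyncHandler).serve_forever()
```

Calling serve() starts the webhook. Keeping the logic in a pure sync function makes it easy to test without a cluster or an HTTP client.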

You can quickly deploy your code through any FaaS platform that offers HTTP(S) endpoints, or just load your script into a ConfigMap and launch a simple HTTP server to run it:

kubectl create configmap service-per-pod-hooks -n metacontroller --from-file=hooks

Finally, you declaratively specify how your script interacts with the Kubernetes API, which is analogous to writing a CustomResourceDefinition (to specify how to store objects):

apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
  name: pod-name-label
spec:
  resources:
  - apiVersion: v1
    resource: pods
    annotationSelector:
      matchExpressions:
      - {key: pod-name-label, operator: Exists}
  hooks:
    sync:
      webhook:
        url: http://service-per-pod.metacontroller/sync-pod-name-label

This declarative specification means that your code never has to talk to the Kubernetes API, so you don't need to import any Kubernetes client library nor depend on any code provided by Kubernetes. You merely receive JSON describing the observed state of the world and return JSON describing your desired state.

Metacontroller remotely handles all interaction with the Kubernetes API. It runs a level-triggered reconciliation loop on your behalf, much the way CRD provides a declarative interface to request that the API Server store objects on your behalf.

Reusable Building Blocks

In addition to making ad hoc automation simple, Metacontroller also makes it easier to build and compose general-purpose abstractions.

For example, many built-in workload APIs like StatefulSet are almost trivial to reimplement as Metacontroller hooks, meaning you can easily fork and customize such APIs. Feature requests that used to take months to implement in the core Kubernetes repository can be hacked together in an afternoon by anyone who wants them.

You can also compose existing APIs into higher-level abstractions, such as how BlueGreenDeployment builds on top of the ReplicaSet and Service APIs.

Users can even invent new general-purpose APIs like IndexedJob, which is a Job-like API that provides unique Pod identities like StatefulSet.

Complex Orchestration

Extension APIs implemented with Metacontroller can also build on top of other extension APIs that are themselves implemented with Metacontroller. This pattern can be used to compose complex orchestration out of simple building blocks that each do one thing well.

For example, the Vitess Operator is implemented entirely as Jsonnet webhooks with Metacontroller. The end result is much more complex than ad hoc automation or even general-purpose workload abstractions, but the key is that this complexity arises solely from the challenge of orchestrating Vitess, a distributed MySQL clustering system.

Building Operators with Metacontroller frees developers from learning the internal machinery of implementing Kubernetes controllers and APIs, allowing them to focus on solving problems in the application domain. It also means they can take advantage of existing API machinery like shared caches without having to write their Operators in Go.

Metacontroller's webhook APIs are designed to make it feel like you're writing a one-shot, client-side generator that spits out JSON that gets piped to kubectl apply.

In other words, if you already know how to manually manage an application in Kubernetes with kubectl, Metacontroller lets you write automation for that app without having to learn a new language or how to use Kubernetes client libraries.

Get Started

Examples

This page lists some examples of what you can make with Metacontroller.

If you'd like to add a link to another example that demonstrates a new language or technique, please send a pull request against this document.

CompositeController

CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object. Workload controllers like Deployment and StatefulSet are examples of existing controllers that fit this pattern.

CatSet (JavaScript)

CatSet is a rewrite of StatefulSet, including rolling updates, as a CompositeController. It shows that existing workload controllers already use a pattern that could fit within a CompositeController, namely managing child objects based on a parent spec.

BlueGreenDeployment (JavaScript)

BlueGreenDeployment is an alternative to Deployment that implements a Blue-Green rollout strategy. It shows how CompositeController can be used to add various automation on top of built-in APIs like ReplicaSet.

IndexedJob (Python)

IndexedJob is an alternative to Job that gives each Pod a unique index, like StatefulSet. It shows how to write a CompositeController in Python, and also demonstrates selector generation.

DecoratorController

DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which resources to watch, as well as filters on labels and annotations.

For each object you watch, you can add, edit, or remove labels and annotations, as well as create new objects and attach them. Unlike CompositeController, these new objects don't have to match the main object's label selector. Since they're attached to the main object, they'll be cleaned up automatically when the main object is deleted.

Service Per Pod (Jsonnet)

Service Per Pod is an example DecoratorController that creates an individual Service for every Pod in a StatefulSet (e.g. to give them static IPs), effectively adding new behavior to StatefulSet without having to reimplement it.

Customize hook examples

The customize hook is an addition to Composite/Decorator controllers. It extends the information passed to the sync hook with other objects (called related objects) in addition to the parent.

ConfigMapPropagation

ConfigMapPropagation is a simple mechanism to propagate a given ConfigMap to other namespaces, which are specified in a given object along with the source ConfigMap. It also shows how the status subresource should be handled.

Global Config Map

Global Config Map is similar to ConfigMapPropagation, but propagates a ConfigMap to all namespaces.

Secret propagation

Secret propagation is a modification of the ConfigMapPropagation concept that uses a label selector on Namespace objects to choose where to propagate a Secret.

Concepts

This page provides some background on terms that are used throughout the Metacontroller documentation.

Kubernetes Concepts

These are some of the general Kubernetes concepts that are particularly relevant to Metacontroller.

Resource

In the context of the Kubernetes API, a resource is a REST-style collection of API objects. When writing controllers, it's important to understand the following terminology.

Resource Name

There are many ways to refer to a resource. For example, you may have noticed that you can fetch ReplicaSets with any of the following commands:

kubectl get rs          # short name
kubectl get replicaset  # singular name
kubectl get replicasets # plural name

When writing controllers, it's important to note that the plural name is the canonical form when interacting with the REST API (it's in the URL) and API discovery (entries are keyed by plural name).

So, whenever Metacontroller asks for a resource name, you should use the canonical, lowercase, plural form (e.g. replicasets).

API Group

Each resource lives inside a particular API group, which helps different API authors avoid name conflicts. For example, you can have two resources with the same name as long as they are in different API groups.

API Version

Each API group has one or more available API versions. It's important to note that Kubernetes API versions are format versions. That is, each version is a different lens through which you can view objects in the collection, but you'll see the same set of underlying objects no matter which lens you view them through.

The API group and version are often combined in the form <group>/<version>, such as in the apiVersion field of an API object. APIs in the core group (like Pod) omit the group name in such cases, specifying only <version>.
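To make the relationship between the plural resource name, the API group, and the version concrete, here is a small Python sketch that builds the REST path of a resource collection. The helper names are hypothetical; the /api and /apis path prefixes follow the standard Kubernetes REST layout.

```python
from typing import Tuple


def split_api_version(api_version: str) -> Tuple[str, str]:
    """Split an apiVersion string into (group, version).

    Core-group APIs such as Pod omit the group, so "v1" yields ("", "v1").
    """
    group, sep, version = api_version.rpartition("/")
    return (group, version) if sep else ("", api_version)


def collection_path(api_version: str, resource: str, namespace: str = "") -> str:
    """Build the REST path of a collection from its canonical plural name."""
    group, version = split_api_version(api_version)
    # The core group is served under /api; named groups under /apis/<group>.
    prefix = f"/apis/{group}/{version}" if group else f"/api/{version}"
    scope = f"/namespaces/{namespace}" if namespace else ""
    return f"{prefix}{scope}/{resource}"
```

For example, collection_path("apps/v1", "replicasets", "default") yields /apis/apps/v1/namespaces/default/replicasets, which shows why the lowercase plural form is the one that matters to controllers.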

API Kind

Whereas a resource is a collection of objects served at a particular REST path, the kind of a resource represents something like the type or class of those objects.

Since Kubernetes resources and kinds must have a 1-to-1 correspondence within a given API group, the resource name and kind are often used interchangeably in Kubernetes documentation. However, it's important to distinguish the resource and kind when writing controllers.

The kind is often the same as the singular resource name, except that it's written in UpperCamelCase. This is the form that you use when writing JSON or YAML manifests, and so it's also the form you should use when generating objects within a lambda hook:

apiVersion: apps/v1
kind: ReplicaSet
[...]

Custom Resource

A custom resource is any resource that's installed through dynamic API registration (either through CRD or aggregation), rather than by being compiled directly into the Kubernetes API server.

Controller

Distributed components in the Kubernetes control plane communicate with each other by posting records in a shared datastore (like a public message board), rather than sending direct messages (like email).

This design helps avoid silos of information. All participants can see what everyone is saying to everyone else, so each participant can easily access whatever information it needs to make the best decision, even as those needs change. The lack of silos also means extensions have the same power as built-in features.

In the context of the Kubernetes control plane, a controller is a long-running, automated, autonomous agent that participates in the control plane via this shared datastore (the Kubernetes API server). In the message board analogy, you can think of controllers like bots.

A given controller might participate by:

  • observing objects in the API server as inputs and creating or updating other objects in the API server as outputs (e.g. creating Pods for a ReplicaSet);
  • observing objects in the API server as inputs and taking action in some other domain (e.g. spawning containers for a Pod);
  • creating or updating objects in the API server to report observations from some other domain (e.g. "the container is running");
  • or any combination of the above.

Custom Controller

A custom controller is any controller that can be installed, upgraded, and removed in a running cluster, independently of the cluster's own lifecycle.

Metacontroller Concepts

These are some concepts that are specific to Metacontroller.

Metacontroller

Metacontroller is a server that extends Kubernetes with APIs that encapsulate the common parts of writing custom controllers.

Just like kube-controller-manager, this server hosts multiple controllers. However, the set of hosted controllers changes dynamically in response to updates in objects of the Metacontroller API types. Metacontroller is thus itself a controller that watches the Metacontroller API objects and launches hosted controllers in response. In other words, it's a controller-controller -- hence the name.

Lambda Controller

When you create a controller with one of the Metacontroller APIs, you provide a function that contains only the business logic specific to your controller. Since these functions are called via webhooks, you can write them in any language that can understand HTTP and JSON, and optionally host them with a Functions-as-a-Service provider.

The Metacontroller server then executes a control loop on your behalf, calling your function whenever necessary to decide what to do.

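These callback-based controllers are called lambda controllers. To keep the interface as simple as possible, each lambda controller API targets a specific controller pattern, such as:

  • CompositeController: manage a set of child objects based on the desired state specified in a parent object
  • DecoratorController: add new behavior to existing resources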

Support for other types of controller patterns will be added in the future, such as coordinating between Kubernetes API objects and external state in another domain.

Lambda Hook

Each lambda controller API defines a set of hooks, which it calls to let you implement your business logic.

Currently, these lambda hooks must be implemented as webhooks, but other mechanisms could be added in the future, such as gRPC or embedded scripting languages.
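As an illustration of the business logic such a hook might contain, here is a minimal Python sketch of a sync function for a parent that manages Pods. The parent and children request fields, the "Pod.v1" key, and the status and children response fields are assumptions about the hook format and should be checked against the API Reference; the parent's spec.replicas and spec.image fields are hypothetical.

```python
def sync(request: dict) -> dict:
    """Compute the desired child Pods for a hypothetical parent object."""
    parent = request["parent"]
    # Observed children are assumed to be grouped by "<Kind>.<apiVersion>".
    observed_pods = request.get("children", {}).get("Pod.v1", {})

    name = parent["metadata"]["name"]
    spec = parent.get("spec", {})
    replicas = spec.get("replicas", 1)

    # Describe every Pod that should exist; no API calls are made here.
    desired_pods = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"{name}-{index}"},
            "spec": {"containers": [{"name": "main", "image": spec["image"]}]},
        }
        for index in range(replicas)
    ]

    return {
        "status": {"observedPods": len(observed_pods)},
        "children": desired_pods,
    }
```

The function has no side effects and its output depends only on its input, which is the style these hooks are designed around.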

Features

This is a high-level overview of what Metacontroller provides for Kubernetes controller authors.

Dynamic Scripting

With Metacontroller's hook-based design, you can write controllers in any language while still taking advantage of the efficient machinery we developed in Go for core controllers.

This makes Metacontroller especially useful for rapid development of automation in dynamic scripting languages like Python or JavaScript, although you're also free to use statically-typed languages like Go or Java.

To support fast ramp-up and iteration on your ideas, Metacontroller makes it possible to write controllers with:

  • No schema/IDL
  • No generated code
  • No library dependencies
  • No container image build/push

Controller Best Practices

Controllers you write with Metacontroller automatically behave like first-class citizens out of the box, before you write any code.

All interaction with the Kubernetes API happens inside the Metacontroller server in response to your instructions. This allows Metacontroller to implement best practices learned from writing core controllers without polluting your business logic.

Even the simplest Hello, World example with Metacontroller already takes care of:

  • Label selectors (for defining flexible collections of objects)
  • Orphan/adopt semantics (controller reference)
  • Garbage collection (owner references for automatic cleanup)
  • Watches (for low latency)
  • Caching (shared informers/reflectors/listers)
  • Work queues (deduplicated parallelism)
  • Optimistic concurrency (resource version)
  • Retries with exponential backoff
  • Periodic relist/resync

Declarative Watches

Rather than writing boilerplate code for each type of resource you want to watch, you simply list those resources declaratively:

childResources:
- apiVersion: v1
  resource: pods
- apiVersion: v1
  resource: persistentvolumeclaims

Behind the scenes, Metacontroller sets up watch streams that are shared across all controllers that use Metacontroller.

That means, for example, that you can create as many lambda controllers as you want that watch Pods, and the API server will only need to send one Pod watch stream (to Metacontroller itself).

Metacontroller then acts like a demultiplexer, determining which controllers will care about a given event in the stream and triggering their hooks only as needed.
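The demultiplexing idea can be sketched in a few lines of Python. This is a conceptual illustration only, not Metacontroller's implementation (which is written in Go); the Registration and Demultiplexer names and the equality-only label matching are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Registration:
    resource: str                 # canonical plural name, e.g. "pods"
    match_labels: Dict[str, str]  # simple equality-based selector
    hook: Callable[[dict], None]  # called for each matching object


@dataclass
class Demultiplexer:
    registrations: List[Registration] = field(default_factory=list)

    def register(self, resource: str, match_labels: Dict[str, str],
                 hook: Callable[[dict], None]) -> None:
        self.registrations.append(Registration(resource, match_labels, hook))

    def dispatch(self, resource: str, obj: dict) -> int:
        """Trigger every hook whose resource and selector match; return the count."""
        labels = obj.get("metadata", {}).get("labels", {})
        triggered = 0
        for reg in self.registrations:
            if reg.resource != resource:
                continue
            # An empty selector matches every object of that resource.
            if all(labels.get(k) == v for k, v in reg.match_labels.items()):
                reg.hook(obj)
                triggered += 1
        return triggered
```

One incoming event is inspected once and fanned out only to the controllers that registered interest, so adding more controllers does not add more watch streams.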

Declarative Reconciliation

A large part of the expressiveness of the Kubernetes API is due to its focus on declarative management of cluster state, which lets you directly specify an end state without specifying how to get there. Metacontroller expands on this philosophy, allowing you to define controllers in terms of what they want without specifying how to get there.

Instead of thinking about imperative operations like create/read/update/delete, you just generate a list of all the things you want to exist. Based on the current cluster state, Metacontroller will then determine what actions are required to move the cluster towards your desired state and maintain it once it's there.

Just like the built-in controllers, the reconciliation that Metacontroller performs for you is level-triggered so it's resilient to downtime (missed events), yet optimized for low latency and low API load through shared watches and caches.

However, the clear separation of deciding what you want (the hook you write) from running a low-latency, level-triggered reconciliation loop (what Metacontroller does for you) means you don't have to think about this.
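To give a feel for the kind of work that is being done on your behalf, here is a simplified Python sketch of level-triggered reconciliation: it compares observed and desired objects and derives the imperative actions. It is a conceptual illustration under the assumption that objects are keyed by name and compared as whole dictionaries; it is not Metacontroller's actual algorithm.

```python
def plan_actions(observed: dict, desired: dict) -> dict:
    """Derive create/update/delete actions from two name-keyed object maps."""
    create = sorted(name for name in desired if name not in observed)
    delete = sorted(name for name in observed if name not in desired)
    update = sorted(
        name
        for name in desired
        if name in observed and desired[name] != observed[name]
    )
    return {"create": create, "update": update, "delete": delete}
```

Because the plan is computed from the full current state rather than from individual events, running it again after the cluster has converged yields no actions, which is what makes this style resilient to missed events.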

Declarative Declarative Rolling Update

Another big contributor to the power of Kubernetes APIs like Deployment and StatefulSet is the ability to declaratively specify gradual state transitions. When you update your app's container image or configuration, for example, these controllers will slowly roll out Pods with the new template and automatically pause if things don't look right.

Under the hood, implementing gradual state transitions with level-triggered reconciliation loops involves careful bookkeeping with auxiliary records, which is why StatefulSet originally launched without rolling updates. Metacontroller lets you easily build your own APIs that offer declarative rolling updates without making you think about all this additional bookkeeping.

In fact, Metacontroller provides a declarative interface for configuring how you want to implement declarative rolling updates in your controller (declarative declarative rolling update), so you don't have to write any code to take advantage of this feature.

For example, adding support for rolling updates to a Metacontroller-based rewrite of StatefulSet looks essentially like this:

   childResources:
   - apiVersion: v1
     resource: pods
+    updateStrategy:
+      method: RollingRecreate
+      statusChecks:
+        conditions:
+        - type: Ready
+          status: "True"

For comparison, the corresponding pull request to add rolling updates to StatefulSet itself involved over 9,000 lines of changes to business logic, boilerplate, and generated code.

FAQ

This page answers some common questions encountered while evaluating, setting up, and using Metacontroller.

If you have any questions that aren't answered here, please ask on the mailing list or Slack channel.

Evaluating Metacontroller

How does Metacontroller compare with other tools?

See the features page for a list of the things that are most distinctive about Metacontroller's approach.

In general, Metacontroller aims to make common patterns as simple as possible, without necessarily supporting the full flexibility you would have if you wrote a controller from scratch. The philosophy is analogous to that of CustomResourceDefinition (CRD), where the main API server does all the heavy lifting for you, but you don't have as much control as you would if you wrote your own API server and connected it through aggregation.

Just like CRD, Metacontroller started with a small set of capabilities and is expanding over time to support more customization and more use cases as we gain confidence in the abstractions. Depending on your use case, you may prefer one of the alternative tools that took the opposite approach of first allowing everything and then building "rails" over time to encourage best practices and simplify development.

What is Metacontroller good for?

Metacontroller is intended to be a generic tool for creating many kinds of Kubernetes controllers, but one of its earliest motivating use cases was to simplify development of custom workload automation, so it's particularly well-suited for this.

For example, if you've ever thought, "I wish StatefulSet would do this one thing differently," Metacontroller gives you the tools to define your own custom behavior without reinventing the wheel.

Metacontroller is also well-suited to people who prefer languages other than Go, but still want to benefit from the efficient API machinery that was developed in Go for the core Kubernetes controllers.

Lastly, Metacontroller is good for rapid development of automation on top of APIs that already exist as Kubernetes resources, such as:

  • ad hoc scripting ("make an X for every Y")
  • configuration abstraction ("when I say A, that means {X,Y,Z}")
  • higher-level automation of custom APIs added by Operators
  • gluing an external CRUD API into the Kubernetes control plane with a simple translation layer

What is Metacontroller not good for?

Metacontroller is not a good fit when you need to examine a large number of objects to answer a single hook request. For example, if you need to be sent a list of all Pods or all Nodes in order to decide on your desired state, we'd have to call your hook with the full list of all Pods or Nodes any time any one of them changed. However, it might be a good fit if your desired behavior can be naturally broken down into per-Pod or per-Node tasks, since then we'd only need to call your hook with each object that changed.

Metacontroller is also not a good fit for writing controllers that perform long sequences of imperative steps -- for example, a single hook that executes many steps of a workflow by creating various children at the right times. That's because Metacontroller hooks work best when they use a functional style (no side effects, and output depends only on input), which is an awkward style for defining imperative sequences.

Do I have to use CRD?

It's common to use CRD, but Metacontroller doesn't know or care whether a resource is built-in or custom, nor whether it's served by CRD or by an aggregated API server.

Metacontroller uses API discovery and the dynamic client to treat all resources the same, so you can write automation for any type of resource. Using the dynamic client also means Metacontroller doesn't need to be updated when new APIs or fields are added in subsequent Kubernetes releases.

What does the name Metacontroller mean?

The name Metacontroller comes from the English words meta and controller. Metacontroller is a controller controller -- a controller that controls other controllers.

How do you pronounce Metacontroller?

Please see the pronunciation guide.

Setting Up Metacontroller

Do I need to be a cluster admin to install Metacontroller?

Installing Metacontroller requires permission to both install CRDs (representing the Metacontroller APIs themselves) and grant permissions for Metacontroller to access other resources on behalf of the controllers it hosts.

Why is Metacontroller shared cluster-wide?

Metacontroller currently only supports cluster-wide installation because it's modeled after the built-in kube-controller-manager component to achieve the same benefits of sharing watches and caches.

Also, resources in general (either built-in or custom) can only be installed cluster-wide, and a Kubernetes API object is conventionally intended to mean the same thing regardless of what namespace it's in.

Why does Metacontroller need these permissions?

During alpha, Metacontroller simply requests wildcard permission to all resources so the controllers it hosts can access anything they want. For this reason, you should only give trusted users access to the Metacontroller APIs that create hosted controllers.

By contrast, core controllers are restricted to only the minimal set of permissions needed to do their jobs.

Does Metacontroller have to be in its own namespace?

The default installation manifests put Metacontroller in its own namespace to make it easy to see what's there and clean up if necessary, but it can run anywhere. The metacontroller namespace is also used in examples for similar convenience reasons, but you can run webhooks in any namespace or even host them outside the cluster.

Developing with Metacontroller

Which languages can I write hooks in?

You can write lambda hooks (the business logic for your controller) in any language, as long as you can host it as a webhook that accepts and returns JSON. Regardless of which language you use for your business logic, Metacontroller uses the efficient machinery written in Go for the core controllers to interact with the API server on your behalf.

How do I access the Kubernetes API from my hook?

You don't! Or at least, you don't have to, and it's best not to. Instead, you just declare what objects you care about and Metacontroller will send them to you as part of the hook request. Then, your hook should simply return a list of desired objects. Metacontroller will take care of reconciling your desired state.

Can I call external APIs from my hook?

Yes. Your hook code can do whatever it wants as part of computing a response to a Metacontroller hook request, including calling external APIs.

The main thing to be careful of is to avoid synchronously waiting for long-running tasks to finish, since that will hold up one of a fixed number of concurrent slots in the queue of triggers for that hook. Instead, if your hook needs to wait for some condition that's checked through an external API, you should return a status that indicates this pending state, and set a resync period so you get a chance to check the condition again later.
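A minimal Python sketch of this non-blocking pattern follows. The is_ready callable stands in for a quick external API check and is hypothetical, as are the phase values; the resync period itself is configured on the controller and is not shown here.

```python
from typing import Callable


def sync(request: dict, is_ready: Callable[[str], bool]) -> dict:
    """Report a pending state instead of waiting for an external task."""
    name = request["parent"]["metadata"]["name"]

    # Perform one quick check and return immediately either way.
    if is_ready(name):
        return {"status": {"phase": "Ready"}, "children": []}

    # Not ready yet: record that in status and rely on a later resync
    # to call this hook again, rather than sleeping or polling here.
    return {"status": {"phase": "Pending"}, "children": []}
```

Each call returns quickly, so the hook never occupies one of the concurrent slots for longer than a single external check takes.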

How can I make sure external resources get cleaned up?

If you allocate external resources as part of your hook, you should also implement a finalize hook to make sure you get a chance to clean up those external resources when the Kubernetes API object for which you created them goes away.
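A finalize hook could be sketched in Python as follows. The finalized response field is an assumption about the hook format that should be verified against the API Reference, and the release callable is a hypothetical stand-in for whatever frees your external resource; other response fields are omitted for brevity.

```python
from typing import Callable


def finalize(request: dict, release: Callable[[str], bool]) -> dict:
    """Clean up external resources for an object that is being deleted."""
    name = request["parent"]["metadata"]["name"]

    # release() should be idempotent, since the hook may be called
    # repeatedly until cleanup is reported as complete.
    done = release(name)

    # Report whether cleanup has finished (assumed response field).
    return {"finalized": done}
```

Returning False when cleanup is still in progress keeps the hook non-blocking, in line with the advice about long-running tasks.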

Does Metacontroller support "apply" semantics?

Yes, Metacontroller enforces apply semantics, which means your controller will play nicely with other automation as long as you only fill in the fields that you care about in the objects you return.
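The effect can be illustrated with a simplified recursive merge in Python. This is only a sketch of the idea: real apply semantics also deal with lists and with removing fields, and the function name and field names below are hypothetical.

```python
def apply_fields(observed: dict, desired: dict) -> dict:
    """Overlay the fields set in desired onto observed, keeping the rest."""
    merged = dict(observed)
    for key, value in desired.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling fields set by others are preserved.
            merged[key] = apply_fields(merged[key], value)
        else:
            merged[key] = value
    return merged
```

If the observed object has spec.setByOthers and your hook returns only spec.importantField, the merged result contains both, which is why returning only the fields you care about lets your controller coexist with other automation.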

How do I host my hook?

You can host your lambda hooks with an HTTP server library in your chosen language, with a standalone HTTP server, or with a Functions-as-a-Service platform. See the examples page for approaches in various languages.

How can I provide a programmatic client for my API?

Since Metacontroller uses the dynamic client on your behalf, you can write your controller's business logic without any client library at all. That also means you can write a "dynamically typed" controller without creating static schema (either Kubernetes' Go IDL or OpenAPI) or generating a client.

However, if you want to provide a static client for users of your API, nothing about Metacontroller prevents you from writing Go IDL or OpenAPI and generating a client the same way you would without Metacontroller.

What are the best practices for designing controllers?

Please see the dedicated best practices guide.

How do I troubleshoot problems?

Please see the dedicated troubleshooting guide.

How to pronounce Metacontroller

Metacontroller is pronounced as me-ta-con-trol-ler.

User Guide

This section contains general tips and step-by-step tutorials for using Metacontroller.

See the API Reference for details about all the available options.

Installation

This page describes how to install Metacontroller, either to develop your own controllers or just to run third-party controllers that depend on it.

Create a Controller

This tutorial walks through a simple example of creating a controller in Python with Metacontroller.

Best Practices

This is a collection of recommendations for writing controllers with Metacontroller.

Troubleshooting

This is a collection of tips for debugging controllers written with Metacontroller.

Installation

This page describes how to install Metacontroller, either to develop your own controllers or just to run third-party controllers that depend on it.

Docker images

Images are hosted in two places:

Feel free to use whichever suits your needs; they are identical. Note that the Helm charts currently use the Docker Hub images.

Prerequisites

  • Kubernetes v1.17+ (to keep it maintainable, the e2e test suite might not cover every release)
  • You should have kubectl available and configured to talk to the desired cluster.

Grant yourself cluster-admin (GKE only)

Due to a known issue in GKE, you'll need to first grant yourself cluster-admin privileges before you can install the necessary RBAC manifests.

kubectl create clusterrolebinding <user>-cluster-admin-binding --clusterrole=cluster-admin --user=<user>@<domain>

Replace <user> and <domain> above based on the account you use to authenticate to GKE.

Install Metacontroller using Kustomize

# Apply the full set of production resources defined in kustomization.yaml in the `production` directory.
kubectl apply -k https://github.com/metacontroller/metacontroller/manifests/production

If you prefer to build and host your own images, please see the build instructions in the contributor guide.

If your kubectl version does not support the -k flag, install the resources listed in manifests/production/kustomization.yaml one by one with kubectl apply -f {{filename}}.

Install Metacontroller using Helm

Alternatively, Metacontroller can be installed using a Helm chart.

Migrating from /GoogleCloudPlatform/metacontroller

The current version of Metacontroller uses a different finalizer name than the GCP version (GCP: metacontroller.app, current version: metacontroller.io). After installing Metacontroller, you might therefore need to clean up old finalizers, for example by running:

kubectl get <comma separated list of your resource types here> --no-headers --all-namespaces | awk '{print $2 " -n " $1}' | xargs -L1 -P 50 -r kubectl patch -p '{"metadata":{"finalizers": [null]}}' --type=merge

Install Metacontroller using Helm

Building the chart from source code

The chart can be built from metacontroller source:

git clone https://github.com/metacontroller/metacontroller.git
cd metacontroller
helm package deploy/helm/metacontroller --destination deploy/helm

Installing the chart from package

helm install metacontroller deploy/helm/metacontroller-helm-v*.tgz

Installing chart from ghcr.io

Charts are published as packages on ghcr.io.

You can pull them as shown below. The HELM_EXPERIMENTAL_OCI=1 variable is set because OCI support is currently (at least for Helm 3.8.x) a beta feature:

HELM_EXPERIMENTAL_OCI=1 helm pull oci://ghcr.io/metacontroller/metacontroller-helm --version=<version>

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| command | Command which is used to start metacontroller | /usr/bin/metacontroller |
| commandArgs | Command arguments which are used to start metacontroller. See configuration.md for additional details. | [ "--zap-log-level=4", "--discovery-interval=20s", "--cache-flush-interval=30m" ] |
| rbac.create | Create and use RBAC resources | true |
| image.repository | Image repository | metacontrollerio/metacontroller |
| image.pullPolicy | Image pull policy | IfNotPresent |
| image.tag | Image tag | "" (Chart.AppVersion) |
| imagePullSecrets | Image pull secrets | [] |
| nameOverride | Override the deployment name | "" (Chart.Name) |
| namespaceOverride | Override the deployment namespace | "" (Release.Namespace) |
| fullnameOverride | Override the deployment full name | "" (Release.Namespace-Chart.Name) |
| serviceAccount.create | Create service account | true |
| serviceAccount.annotations | ServiceAccount annotations | {} |
| serviceAccount.name | Service account name to use, when empty will be set to created account if serviceAccount.create is set else to default | "" |
| podAnnotations | Pod annotations | {} |
| podSecurityContext | Pod security context | {} |
| securityContext | Container security context | {} |
| resources | CPU/Memory resource requests/limits | {} |
| nodeSelector | Node labels for pod assignment | {} |
| tolerations | Toleration labels for pod assignment | [] |
| affinity | Affinity settings for pod assignment | {} |
| priorityClassName | The name of the PriorityClass that will be assigned to metacontroller | "" |
| clusterRole.aggregationRule | The aggregationRule applied to metacontroller ClusterRole | {} |
| clusterRole.rules | The rules applied to metacontroller ClusterRole | { "apiGroups": "*", "resources": "*", "verbs": "*" } |
| replicas | Specifies the number of metacontroller pods that will be deployed | 1 |
| podDisruptionBudget | The podDisruptionBudget applied to metacontroller pods | {} |
| service.enabled | If true, then create a Service to expose ports | false |
| service.ports | List of ports that are exposed on the Service | [] |

Configuration

This page describes how to configure Metacontroller.

Command line flags

The Metacontroller server has a few settings that can be configured with command-line flags (by editing the Metacontroller StatefulSet in manifests/metacontroller.yaml):

| Flag | Description |
| --- | --- |
| --zap-log-level | Zap log level to configure the verbosity of logging. Can be one of ‘debug’, ‘info’, ‘error’, or any integer value > 0 which corresponds to custom debug levels of increasing verbosity (e.g. --zap-log-level=5). Level 4 logs Metacontroller's interaction with the API server. Levels 5 and up additionally log details of Metacontroller's invocation of lambda hooks. See the troubleshooting guide for more. |
| --zap-devel | Development Mode (e.g. --zap-devel) defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). |
| --zap-encoder | Zap log encoding - json or console (e.g. --zap-encoder='json') defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). |
| --zap-stacktrace-level | Zap Level at and above which stacktraces are captured - one of info or error (e.g. --zap-stacktrace-level='info'). |
| --discovery-interval | How often to refresh discovery cache to pick up newly-installed resources (e.g. --discovery-interval=10s). |
| --cache-flush-interval | How often to flush local caches and relist objects from the API server (e.g. --cache-flush-interval=30m). |
| --metrics-address | The address to bind metrics endpoint - /metrics (e.g. --metrics-address=:9999). It can be set to "0" to disable the metrics serving. |
| --kubeconfig | Path to kubeconfig file (same format as used by kubectl); if not specified, use in-cluster config (e.g. --kubeconfig=/path/to/kubeconfig). |
| --client-go-qps | Number of queries per second client-go is allowed to make (default 5, e.g. --client-go-qps=100) |
| --client-go-burst | Allowed burst queries for client-go (default 10, e.g. --client-go-burst=200) |
| --workers | Number of sync workers to run (default 5, e.g. --workers=100) |
| --events-qps | Rate of events flowing per object (default - 1 event per 5 minutes, e.g. --events-qps=0.0033) |
| --events-burst | Number of events allowed to send per object (default 25, e.g. --events-burst=25) |
| --pprof-address | Enable pprof and bind to endpoint /debug/pprof, set to 0 to disable pprof serving (default 0, e.g. --pprof-address=:6060) |
| --leader-election | Determines whether or not to use leader election when starting metacontroller (default false, e.g., --leader-election) |
| --leader-election-resource-lock | Determines which resource lock to use for leader election (default leases, e.g., --leader-election-resource-lock=leases). Valid resource locks are endpoints, configmaps, leases, endpointsleases, or configmapsleases. See the client-go documentation leaderelection/resourcelock for additional information. |
| --leader-election-namespace | Determines the namespace in which the leader election resource will be created. If metacontroller is running in-cluster, the default leader election namespace is the same namespace as metacontroller. If metacontroller is running out-of-cluster, the default leader election namespace is undefined. If you are running metacontroller out-of-cluster with leader election enabled, you must specify the leader election namespace. (e.g., --leader-election-namespace=metacontroller) |
| --leader-election-id | Determines the name of the resource that leader election will use for holding the leader lock. For example, if the leader election id is metacontroller and the leader election resource lock is leases, then a resource of kind leases with metadata.name metacontroller will hold the leader lock. (default metacontroller, e.g., --leader-election-id=metacontroller) |
| --health-probe-bind-address | The address the health probes endpoint binds to (default ":8081", e.g., --health-probe-bind-address=":8081") |
| --target-label-selector | Label selector used to restrict an instance of metacontroller to manage specific Composite and Decorator controllers, which enables the ability to run multiple metacontroller instances on the same cluster (e.g. --target-label-selector=controller-group=cicd) |

The logging flags are defined by controller-runtime; see its documentation for more on their meaning.

Running multiple instances

Metacontroller can be set up to run as multiple instances in the same Kubernetes cluster, each watching its own group of resources. This splits responsibilities between instances and can also help with scaling.

This is made possible by configuring Metacontroller with the target-label-selector argument.

Further details on this feature can be found here.

Pros

  • Clean separation of different Metacontroller instances, in terms of:
    • Permissions needed to manage its controllers (they can be limited to what the actual operator needs)
    • Allowing the sidecar pattern: a pod contains a Metacontroller container and an operator container, and Metacontroller manages only this operator.
  • Allows scaling (in a primitive way, in terms of separation of concerns / grouping)

Cons

  • Metacontroller instances fighting over resources, which can be caused by two instances managing the same CRD
    • Take care when configuring multiple Metacontroller instances: do not just deploy the default configuration, but set the target-label-selector value of each instance so that it is unique to what that instance manages.

Example

1. Configure Metacontroller

Add the --target-label-selector argument to the Metacontroller binary arguments. Below, this is done inside the Kubernetes deployment spec for Metacontroller:

...
    spec:
      containers:
      - args:
        - --zap-devel
        - --zap-log-level=5
        - --discovery-interval=5s
        - --target-label-selector=controller-group=cicd
...

You can also use more advanced features, such as equality-based and set-based requirements, when defining your target label selector.

2. Specify labels on the Controller

Below is an example of a Decorator Controller that has the appropriate label added so that this instance of Metacontroller can target and manage it:

apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
  name: pod-name-label
  labels:
    controller-group: cicd
spec:
  resources:
  - apiVersion: v1
    resource: pods
    annotationSelector:
      matchExpressions:
      - {key: pod-name-label, operator: Exists}
  hooks:
    sync:
      webhook:
        url: http://service-per-pod.metacontroller/sync-pod-name-label

Create a Controller

This tutorial walks through a simple example of creating a controller in Python with Metacontroller.

Prerequisites

  • Kubernetes v1.17+ (the minimum version listed for installing Metacontroller)
  • You should have kubectl available and configured to talk to the desired cluster.
  • You should have already installed Metacontroller.

Hello, World!

In this example, we'll create a useless controller that runs a single Pod that prints a greeting to its standard output. Once you're familiar with the general process, you can look through the examples page to find concepts that actually do something useful.

To make cleanup easier, first create a new Namespace called hello:

kubectl create namespace hello

We'll put all our Namespace-scoped objects there by adding -n hello to the kubectl commands.

Define a custom resource

Our example controller will implement the behavior for a new API represented as a custom resource.

First, let's use the built-in CustomResourceDefinition API to set up a storage location (a helloworlds resource) for objects of our custom type (HelloWorld).

Save the following to a file called crd.yaml:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: helloworlds.example.com
spec:
  group: example.com
  names:
    kind: HelloWorld
    plural: helloworlds
    singular: helloworld
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              who:
                type: string
    subresources:
      status: {}

Then apply it to your cluster:

kubectl apply -f crd.yaml

Define a custom controller

For each HelloWorld object, we're going to create a Pod as a child object, so we'll use the CompositeController API to implement a controller that defines this parent-child relationship.

Save the following to a file called controller.yaml:

apiVersion: metacontroller.k8s.io/v1alpha1
kind: CompositeController
metadata:
  name: hello-controller
spec:
  generateSelector: true
  parentResource:
    apiVersion: example.com/v1
    resource: helloworlds
  childResources:
  - apiVersion: v1
    resource: pods
    updateStrategy:
      method: Recreate
  hooks:
    sync:
      webhook:
        url: http://hello-controller.hello/sync

Then apply it to your cluster:

kubectl apply -f controller.yaml

This tells Metacontroller to start a reconciling control loop for you, running inside the Metacontroller server. The parameters under spec: let you tune the behavior of the controller declaratively.

In this case:

  • We set generateSelector to true to mimic the built-in Job API since we're running a Pod to completion and don't want to share Pods across invocations.
  • The parentResource is our custom resource called helloworlds.
  • The idea of CompositeController is that the parent resource represents objects that are composed of other objects. A HelloWorld is composed of just a Pod, so we have only one entry in the childResources list.
  • For each child resource, we can optionally set an updateStrategy to specify what to do if a child object needs to be updated. Since Pods are effectively immutable, we use the Recreate method, which means, "delete the outdated object and create a new one".
  • Finally, we tell Metacontroller how to invoke the sync webhook, which is where we'll define the business logic of our controller. The example relies on in-cluster DNS to resolve the address of the hello-controller Service (which we'll define below) within the hello Namespace.

Write a webhook

Metacontroller will handle the controllery bits for us, but we still need to tell it what our controller actually does.

To define our business logic, we write a webhook that generates child objects based on the parent spec, which is provided as JSON in the webhook request. The sync hook request contains additional information as well, but the parent spec is all we need for this example.

You can write Metacontroller hooks in any language, but Python is particularly nice because its dictionary type is convenient for programmatically building JSON objects (like the Pod object below).

If you have a preferred Functions-as-a-Service framework, you can use that to write your webhook, but we'll keep this example self-contained by relying on the basic HTTP server module in the Python standard library. The do_POST() method handles decoding and encoding the request and response as JSON.

The real hook logic is in the sync() method, and consists primarily of building a Pod object. Because Metacontroller uses apply semantics, you can simply return the Pod object as if you were creating it, every time. If the Pod already exists, Metacontroller will take care of updates according to your update strategy.

In this case, we set the update method to Recreate, so an existing Pod would be deleted and replaced if it doesn't match the desired state returned by your hook. Notice, however, that the hook code below doesn't need to mention any of that because it's only responsible for computing the desired state; the Metacontroller server takes care of reconciling with the observed state.

Save the following to a file called sync.py:

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class Controller(BaseHTTPRequestHandler):
  def sync(self, parent, children):
    # Compute status based on observed state.
    desired_status = {
      "pods": len(children["Pod.v1"])
    }

    # Generate the desired child object(s).
    who = parent.get("spec", {}).get("who", "World")
    desired_pods = [
      {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
          "name": parent["metadata"]["name"]
        },
        "spec": {
          "restartPolicy": "OnFailure",
          "containers": [
            {
              "name": "hello",
              "image": "busybox",
              "command": ["echo", "Hello, %s!" % who]
            }
          ]
        }
      }
    ]

    return {"status": desired_status, "children": desired_pods}

  def do_POST(self):
    # Serve the sync() function as a JSON webhook.
    observed = json.loads(self.rfile.read(int(self.headers.get("content-length"))))
    desired = self.sync(observed["parent"], observed["children"])

    self.send_response(200)
    self.send_header("Content-type", "application/json")
    self.end_headers()
    self.wfile.write(json.dumps(desired).encode())

HTTPServer(("", 80), Controller).serve_forever()

Then load it into your cluster as a ConfigMap:

kubectl -n hello create configmap hello-controller --from-file=sync.py

Note: The -n hello flag is important to put the ConfigMap in the hello namespace we created for the tutorial.
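Before deploying, it can help to exercise a hook locally with a hand-written request. The sketch below is not part of the tutorial: EchoHook is a hypothetical stand-in with the same request and response shape as sync.py, and call_hook is a hypothetical helper that serves a handler on a free local port and POSTs one JSON body to it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHook(BaseHTTPRequestHandler):
    """Stand-in hook that reports how many Pods were observed."""

    def do_POST(self):
        length = int(self.headers.get("content-length"))
        observed = json.loads(self.rfile.read(length))
        desired = {
            "status": {"pods": len(observed["children"]["Pod.v1"])},
            "children": [],
        }
        self.send_response(200)
        self.send_header("Content-type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(desired).encode())

    def log_message(self, *args):
        pass  # keep the output quiet during local checks


def call_hook(handler_class, request_body):
    """Serve handler_class on a free local port and POST one request to it."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/sync" % server.server_address[1]
        request = urllib.request.Request(
            url,
            data=json.dumps(request_body).encode(),
            headers={"Content-Type": "application/json"},
        )
        # Bypass any configured HTTP proxy for this local call.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


sample = {"parent": {"metadata": {"name": "your-name"}}, "children": {"Pod.v1": {}}}
print(call_hook(EchoHook, sample))
```

To check the real hook this way, the Controller class would need to be importable without starting the server on port 80, for example by guarding the last line of sync.py with an if __name__ == "__main__" check.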

Deploy the webhook

Finally, since we wrote our hook as a self-contained Python web server, we need to deploy it somewhere that Metacontroller can reach. Luckily, we have this thing called Kubernetes which is great at hosting stateless web services.

Since our hook consists of only a small Python script, we'll use a generic Python container image and mount the script from the ConfigMap we created.

Save the following to a file called webhook.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-controller
  template:
    metadata:
      labels:
        app: hello-controller
    spec:
      containers:
      - name: controller
        image: python:3
        command: ["python3", "/hooks/sync.py"]
        volumeMounts:
        - name: hooks
          mountPath: /hooks
      volumes:
      - name: hooks
        configMap:
          name: hello-controller
---
apiVersion: v1
kind: Service
metadata:
  name: hello-controller
spec:
  selector:
    app: hello-controller
  ports:
  - port: 80

Then apply it to your cluster:

kubectl -n hello apply -f webhook.yaml

Try it out

Now we can create HelloWorld objects and see what they do.

Save the following to a file called hello.yaml:

apiVersion: example.com/v1
kind: HelloWorld
metadata:
  name: your-name
spec:
  who: Your Name

Then apply it to your cluster:

kubectl -n hello apply -f hello.yaml

Our controller should see this and create a Pod that prints a greeting and then exits.

kubectl -n hello get pods

You should see something like this:

NAME                                READY     STATUS      RESTARTS   AGE
hello-controller-746fc7c4dc-rzslh   1/1       Running     0          2m
your-name                           0/1       Completed   0          15s

Then you can check the logs on the Completed Pod:

kubectl -n hello logs your-name

Which should look like this:

Hello, Your Name!

Now let's look at what happens when you update the parent object, for example to change the name:

kubectl -n hello patch helloworld your-name --type=merge -p '{"spec":{"who":"My Name"}}'

If you now check the Pod logs again:

kubectl -n hello logs your-name

You should see that the Pod was updated (actually deleted and recreated) to print a greeting to the new name, even though the hook code doesn't mention anything about updates.

Hello, My Name!

Clean up

Another thing Metacontroller does for you by default is set up links so that child objects are removed by the garbage collector when the parent goes away (assuming your cluster is version 1.8+).

You can check this by deleting the parent:

kubectl -n hello delete helloworld your-name

And then checking for the child Pod:

kubectl -n hello get pods

You should see that the child Pod was cleaned up automatically, so only the webhook Pod remains:

NAME                                READY     STATUS      RESTARTS   AGE
hello-controller-746fc7c4dc-rzslh   1/1       Running     0          3m

When you're done with the tutorial, you should remove the controller, CRD, and Namespace as follows:

kubectl delete compositecontroller hello-controller
kubectl delete crd helloworlds.example.com
kubectl delete ns hello

Next Steps

Constraints and best practices

This is a collection of recommendations for writing controllers with Metacontroller.

If you have something to add to the collection, please send a pull request against this document.

Constraints

Objects relationship

Because of limitations in Kubernetes garbage collection, the following restrictions apply to the relationships between objects:

| Parent | Child | Related |
| --- | --- | --- |
| Cluster | Cluster; Namespaced (any namespace) | Cluster; Namespaced (any namespace) |
| Namespaced | Namespaced (the same namespace as parent) | Namespaced (the same namespace as parent) |
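The table above can be read as a simple rule, sketched here as a check you might run on your own controller design. The function name and its arguments are hypothetical; Metacontroller does not expose such a helper.

```python
def relationship_allowed(parent_scope, parent_namespace, other_scope, other_namespace):
    """Check a parent/child (or parent/related) pairing against the table.

    Scopes are the strings "Cluster" or "Namespaced"; pass None as the
    namespace of a cluster-scoped object.
    """
    if parent_scope == "Cluster":
        # A cluster-scoped parent may reference objects of either scope,
        # in any namespace.
        return other_scope in ("Cluster", "Namespaced")
    # A namespaced parent may only reference namespaced objects that live
    # in its own namespace.
    return other_scope == "Namespaced" and other_namespace == parent_namespace


print(relationship_allowed("Namespaced", "hello", "Namespaced", "hello"))
print(relationship_allowed("Namespaced", "hello", "Cluster", None))
```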

Lambda Hooks

Apply Semantics

Because Metacontroller uses apply semantics, you don't have to think about whether a given object needs to be created (because it doesn't exist) or patched (because it exists and some fields don't match your desired state). In either case, you should generate a fresh object from scratch with only the fields you care about filled in.

For example, suppose you create an object like this:

apiVersion: example.com/v1
kind: Foo
metadata:
  name: my-foo
spec:
  importantField: 1

Then later you decide to change the value of importantField to 2.

Since Kubernetes API objects can be edited by the API server, users, and other controllers to collaboratively produce emergent behavior, the object you observe might now look like this:

apiVersion: example.com/v1
kind: Foo
metadata:
  name: my-foo
  stuffFilledByAPIServer: blah
spec:
  importantField: 1
  otherField: 5

To avoid overwriting the parts of the object you don't care about, you would ordinarily need to either build a patch or use a retry loop to send concurrency-safe updates. With apply semantics, you instead just call your "generate object" function again with the new values you want, and return this (as JSON):

apiVersion: example.com/v1
kind: Foo
metadata:
  name: my-foo
spec:
  importantField: 2

Metacontroller will take care of merging your change to importantField while preserving the fields you don't care about that were set by others.
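The effect on this example can be imitated with a small recursive merge. This is only an illustration of the outcome, not Metacontroller's actual algorithm: the hypothetical merge function below ignores lists and field removal, which a real apply implementation has to deal with.

```python
import copy


def merge(observed, desired):
    """Overlay desired fields on observed ones, recursing into nested dicts."""
    result = copy.deepcopy(observed)
    for key, value in desired.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


observed = {
    "apiVersion": "example.com/v1",
    "kind": "Foo",
    "metadata": {"name": "my-foo", "stuffFilledByAPIServer": "blah"},
    "spec": {"importantField": 1, "otherField": 5},
}
desired = {
    "apiVersion": "example.com/v1",
    "kind": "Foo",
    "metadata": {"name": "my-foo"},
    "spec": {"importantField": 2},
}
merged = merge(observed, desired)
print(merged["spec"])
```

In the merged result, importantField takes the new value while otherField and stuffFilledByAPIServer are left as observed.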

Side Effects

Your hook code should generally be free of side effects whenever possible. Ideally, you should interpret a call to your hook as asking, "Hypothetically, if the observed state of the world were like this, what would your desired state be?"

In particular, Metacontroller may ask you about such hypothetical scenarios during rolling updates, when your object is undergoing a slow transition between two desired states. If your hook has to produce side effects to work, you should avoid enabling rolling updates on that controller.

Status

If your object uses the Spec/Status convention, keep in mind that the Status returned from your hook should ideally reflect a judgement on only the observed objects that were sent to you. The Status you compute should not yet account for your desired state, because the actual state of the world may not match what you want yet.

For example, if you observe 2 Pods, but you return a desired list of 3 Pods, you should return a Status that reflects only the observed Pods (e.g. replicas: 2). This is important so that Status reflects present reality, not future desires.
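A sketch of that rule in a sync hook follows. The spec.replicas field and the Pod naming scheme are assumptions made for the example; the key point is that status is computed from the children that were sent in, not from the list being returned.

```python
def sync(parent, children):
    """Return desired Pods, with a status that reflects only observed Pods."""
    observed_pods = children.get("Pod.v1", {})
    wanted = parent.get("spec", {}).get("replicas", 1)  # hypothetical field
    base = parent["metadata"]["name"]
    desired_pods = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": "%s-%d" % (base, index)},
            "spec": {"containers": [{"name": "main", "image": "busybox"}]},
        }
        for index in range(wanted)
    ]
    # Status counts what exists now, not what was just requested.
    return {"status": {"replicas": len(observed_pods)}, "children": desired_pods}
```

With 2 observed Pods and spec.replicas set to 3, this returns 3 desired Pods but a status of replicas: 2; the status catches up on a later sync once the third Pod is observed.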

Working with Status subresource in metacontroller

If you would like to expose and use the Status subresource in your custom resource, you should take care of:

  1. having a proper CRD schema definition for the Status section, so that Metacontroller can update it successfully. It must be part of the CRD schema, for example:
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: configmappropagations.examples.metacontroller.io
spec:
  ...
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            ...
          status:
            type: object
            properties:
              expected_copies:
                type: integer
              actual_copies:
                type: integer
              observedGeneration:
                type: integer
        required:
        - spec
    subresources:
      status: {}
  2. making your controller strict about the types defined in the CRD schema. In the example above, do not set any of the integer fields to strings, and do not add extra fields.
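One way to stay strict is to check the status in your hook before returning it. The sketch below mirrors the three integer fields from the example schema; STATUS_SCHEMA and check_status are hypothetical names, and the check is deliberately simpler than real OpenAPI validation.

```python
# Field names and types copied from the example CRD schema above.
STATUS_SCHEMA = {
    "expected_copies": int,
    "actual_copies": int,
    "observedGeneration": int,
}


def check_status(status, schema=STATUS_SCHEMA):
    """Return a list of problems; an empty list means the status conforms."""
    problems = []
    for field, value in status.items():
        expected = schema.get(field)
        if expected is None:
            problems.append("unexpected field: %s" % field)
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append("%s should be %s" % (field, expected.__name__))
    return problems


print(check_status({"expected_copies": 3, "actual_copies": "2"}))
```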

To read more about Status subresource please look at:

  • Kubernetes documentation - https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#status-subresource

Troubleshooting

This is a collection of tips for debugging controllers written with Metacontroller.

If you have something to add to the collection, please send a pull request against this document.

Events

Because Metacontroller emits Kubernetes Events for internal actions, you can check the events on the parent object, for example:

kubectl describe secretpropagations.examples.metacontroller.io <name>

At the end of the output, you will see all events related to the given parent:

Name:         secret-propagation
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  examples.metacontroller.io/v1alpha1
Kind:         SecretPropagation
Metadata:
  Creation Timestamp:  2021-07-14T20:25:09Z
...
Spec:
  Source Name:       shareable
  Source Namespace:  omega
  Target Namespace Label Selector:
    Match Labels:
      Propagate:  true
Status:
  Working:  fine
Events:
  Type     Reason     Age               From            Message
  ----     ------     ----              ----            -------
  Warning  SyncError  1s (x11 over 8s)  metacontroller  Sync error: sync hook failed for SecretPropagation /secret-propagation: sync hook failed: http error: Post "http://secret-propagation-controller.metacontroller/sync": dial tcp 10.96.138.14:80: connect: connection refused

You can also access events using kubectl get events, which returns all events from the given namespace. Because Metacontroller's CRDs might be cluster-wide, their events can land in the default namespace:

> kubectl get events -n default  
39m         Normal    Started                 compositecontroller/secret-propagation-controller      Started controller: secret-propagation-controller
39m         Normal    Starting                compositecontroller/secret-propagation-controller      Starting controller: secret-propagation-controller
39m         Normal    Stopping                compositecontroller/secret-propagation-controller      Stopping controller: secret-propagation-controller
39m         Normal    Stopped                 compositecontroller/secret-propagation-controller      Stopped controller: secret-propagation-controller
6m25s       Normal    Started                 compositecontroller/secret-propagation-controller      Started controller: secret-propagation-controller
6m25s       Normal    Starting                compositecontroller/secret-propagation-controller      Starting controller: secret-propagation-controller
2m27s       Normal    Stopping                compositecontroller/secret-propagation-controller      Stopping controller: secret-propagation-controller
2m27s       Normal    Stopped                 compositecontroller/secret-propagation-controller      Stopped controller: secret-propagation-controller

Metacontroller Logs

In addition to events, the first place to look when troubleshooting controller behavior is the logs for the Metacontroller server itself.

For example, you can fetch the last 25 lines with a command like this:

kubectl -n metacontroller logs --tail=25 -l app=metacontroller

Log Levels

You can customize the verbosity of the Metacontroller server's logs with the --zap-log-level flag.

At all log levels, Metacontroller will log the progress of server startup and shutdown, as well as major changes like starting and stopping hosted controllers.

At level 4 and above, Metacontroller will log actions (like create/update/delete) on individual objects (like Pods) that it takes on behalf of hosted controllers. It will also log when it decides to sync a given controller as well as events that may trigger a sync.

At level 5 and above, Metacontroller will log the diffs between existing objects, and the desired state of those objects returned by controller hooks.

At level 6 and above, Metacontroller will log every hook invocation as well as the JSON request and response bodies.

Common Log Messages

Since API discovery info is refreshed periodically, you may see log messages like this when you start a controller that depends on a recently-installed CRD:

failed to sync CompositeController "my-controller": discovery: can't find resource <resource> in apiVersion <group>/<version>

Usually, this should fix itself within about 30s when the new CRD is discovered. If this message continues indefinitely, check that the resource name and API group/version are correct.

You may also notice periodic log messages like this:

Watch close - *unstructured.Unstructured total <X> items received

This comes from the underlying client-go library, and just indicates when the shared caches are periodically flushed to place an upper bound on cache inconsistency due to potential silent failures in long-running watches.

Webhook Logs

If you return an HTTP error code (e.g., 500) from your webhook, the Metacontroller server will log the text of the response body.

If you need more detail on what's happening inside your hook code, as opposed to what Metacontroller does for you, you'll need to add log statements to your own code and inspect the logs on your webhook server.
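As a rough sketch of what such logging might look like in a Python hook, the wrapper below logs each call and turns exceptions into a 500 response whose body carries the traceback. The run_hook helper and the logger name are hypothetical; only the standard logging, json, and traceback modules are used.

```python
import json
import logging
import traceback

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hook")


def run_hook(sync, request_body):
    """Call sync() and return (http_status, response_body_text)."""
    try:
        observed = json.loads(request_body)
        name = observed["parent"]["metadata"]["name"]
        log.info("sync requested for parent %s", name)
        desired = sync(observed["parent"], observed["children"])
        log.info("returning %d children for %s", len(desired["children"]), name)
        return 200, json.dumps(desired)
    except Exception:
        # Log locally, and put the traceback in the body so it also shows
        # up where the response text of a failed hook call is logged.
        log.exception("sync hook failed")
        return 500, traceback.format_exc()
```

A do_POST() method could call run_hook with the raw request body and write the returned status code and text back to the client.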

API reference

This section contains detailed reference information for the APIs offered by Metacontroller.

See the user guide for introductions and step-by-step walkthroughs.

Apply Semantics

This page describes how Metacontroller emulates kubectl apply.

CompositeController

CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects...

ControllerRevision

ControllerRevision is an internal API used by Metacontroller to implement declarative rolling updates.

DecoratorController

DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which re...

Hook

This page describes how hook targets are defined in various APIs.

Apply Semantics

This page describes how Metacontroller emulates kubectl apply.

In most cases, you should be able to think of Metacontroller's apply semantics as being the same as kubectl apply, but there are some differences.

Motivation

This section explains why Metacontroller uses apply semantics.

As an example, suppose you create a simple Pod like this with kubectl apply -f:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: my-app
spec:
  containers:
  - name: nginx
    image: nginx

If you then read back the Pod you created with kubectl get pod my-pod -o yaml, you'll see a lot of extra fields filled in that you never set:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
      nginx'
  creationTimestamp: 2018-04-13T00:46:51Z
  labels:
    app: my-app
  name: my-pod
  namespace: default
  resourceVersion: "28573496"
  selfLink: /api/v1/namespaces/default/pods/my-pod
  uid: 27f1b2e1-3eb4-11e8-88d2-42010a800051
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources:
      requests:
        cpu: 100m
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
[...]

These fields may represent materialized default values and other metadata set by the API server, values set by built-in admission control or external admission plugins, or even values set by other controllers.

Rather than sifting through all that to find the fields you care about, kubectl apply lets you go back to your original, simple file, and make a change:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: my-app
    role: staging # added a label
spec:
  containers:
  - name: nginx
    image: nginx

If you try to kubectl create -f your updated file, it will fail because you can't create something that already exists. If you try to kubectl replace -f your updated file, it will fail because it thinks you're trying to unset all those extra fields.

However, if you use kubectl apply -f with your updated file, it will update only the part you changed (adding a label), and leave all those extra fields untouched.

Metacontroller treats the desired objects you return from your hook in much the same way (but with some differences, such as support for strategic merge inside CRDs). As a result, you should always return the short form containing only the fields you care about, not the long form containing all the extra fields.

This generally means you should use the same code path to update things as you do to create them. Just generate a full JSON object from scratch every time, containing all the fields you care about, and only the fields you care about.

Metacontroller will figure out whether the object needs to be created or updated, and which fields it should and shouldn't touch in the case of an update.
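A minimal sketch of this pattern in Python: one function builds the short form of a child Pod from the parent, and the hook calls it on every sync whether or not the Pod already exists. The function name desired_pod and the parent field spec.image are hypothetical.

```python
def desired_pod(parent: dict) -> dict:
    """Build the short form of the Pod: only the fields this controller cares about."""
    name = parent["metadata"]["name"]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"app": name},
        },
        "spec": {
            "containers": [
                # spec.image is a hypothetical field on the parent resource.
                {"name": "nginx", "image": parent["spec"]["image"]},
            ],
        },
    }
```

Note that the function never reads the observed Pod, so defaulted fields such as imagePullPolicy or resourceVersion cannot leak into the desired state.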

Dynamic Apply

The biggest difference between kubectl's implementation of apply and Metacontroller's is that Metacontroller can emulate strategic merge inside CRDs.

For example, suppose you have a CRD with an embedded Pod template:

apiVersion: ctl.enisoc.com/v1
kind: CatSet # this resource is served via CRD
metadata:
  name: my-catset
spec:
  template: # embedded Pod template in CRD
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web

You create this with apply:

kubectl apply -f catset.yaml

The promise of apply is that it will "apply the changes you’ve made, without overwriting any automated changes to properties you haven’t specified".

As an example, suppose some other automation decides to edit your Pod template and add a sidecar container:

apiVersion: ctl.enisoc.com/v1
kind: CatSet # this resource is served via CRD
metadata:
  name: my-catset
spec:
  template: # embedded Pod template in CRD
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
      - name: sidecar
        image: log-uploader # fake sidecar example

Now suppose you change something in your local file and reapply it:

kubectl apply -f catset.yaml

Because kubectl apply doesn't support strategic merge inside CRDs, this will completely replace the containers list with yours, removing the sidecar container. By contrast, if this had been a Deployment or StatefulSet, kubectl apply would have preserved the sidecar container.

As a result, if a controller uses kubectl's apply implementation with CRDs, that controller will fight against automation that tries to add sidecar containers or makes other changes to lists of objects that Kubernetes expects to be treated like associative arrays (ports, volumes, etc.).

To avoid this fighting, and to make the experience of using CRDs better match that of native resources, Metacontroller uses an alternative implementation of apply logic that's based on convention instead of configuration.

Conventions

The main convention that Metacontroller enforces on apply semantics is how to detect and handle "associative lists".

In Kubernetes API conventions, an associative list is a list of objects or scalars that should be treated as if it were a map (associative array), but because of limitations in JSON/YAML it looks the same as an ordered list when serialized.

For native resources, kubectl apply determines which lists are associative lists by configuration: it must have compiled-in knowledge of all the resources, and metadata about how each of their fields should be treated. There is currently no mechanism for CRDs to specify this metadata, which is why kubectl apply falls back to assuming all lists are "atomic", and should never be merged (only replaced entirely).

Even if there were a mechanism for CRDs to specify metadata for every field (e.g. through extensions to OpenAPI), it's not clear that it makes sense to require every CRD author to do so in order for their resources to behave correctly when used with kubectl apply. One alternative that has been considered for such "schemaless CRDs" is to establish a convention -- as long as your CRD follows the convention, you don't need to provide configuration.

Metacontroller implements one such convention that empirically handles many common cases encountered when embedding Pod templates in CRDs (although it has limitations), developed by surveying the use of associative lists across the resources built into Kubernetes:

  • A list is detected as an associative list if and only if all of the following conditions are met:
    • All items in the list are JSON objects (not scalars, nor other lists).
    • All objects in the list have some field name in common, where that field name is one of the conventional merge keys (most commonly name).
  • If a list is detected as an associative list, the conventional field name that all objects have in common (e.g. name) is used as the merge key.
    • If more than one conventional merge key might work, pick only one according to a fixed order.

This allows Metacontroller to "do the right thing" in the majority of cases, without requiring advance knowledge about the resources it's working with -- knowledge that's not available anywhere in the case of CRDs.
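The detection rules above can be sketched in Python as follows. This is an illustration of the convention, not Metacontroller's actual code: the contents and order of CONVENTIONAL_MERGE_KEYS are assumptions (the text only says that name is the most common key), and the merge is a shallow two-way merge that ignores removal of items.

```python
from typing import Any, List, Optional

# Assumed list of conventional merge keys, in precedence order (illustrative only).
CONVENTIONAL_MERGE_KEYS = ["containerPort", "port", "name", "uid", "ip"]


def detect_merge_key(items: List[Any]) -> Optional[str]:
    """Return the merge key if the list looks associative, else None."""
    # Every item must be a JSON object (not a scalar or another list).
    if not items or not all(isinstance(item, dict) for item in items):
        return None
    # Pick the first conventional key that every object has in common.
    for key in CONVENTIONAL_MERGE_KEYS:
        if all(key in item for item in items):
            return key
    return None


def merge_lists(observed: List[Any], desired: List[Any]) -> List[Any]:
    """Merge desired items into observed ones by key; replace the list otherwise."""
    key = detect_merge_key(observed + desired)
    if key is None:
        return list(desired)  # treated as atomic: replaced entirely
    merged = {item[key]: dict(item) for item in observed}
    for item in desired:
        # Shallow update: fields set in desired win, other fields are kept.
        merged.setdefault(item[key], {}).update(item)
    return list(merged.values())
```

With the CatSet example above, merging a desired containers list that holds only nginx into an observed list that holds nginx and sidecar keeps the sidecar entry, because both lists are detected as associative with name as the merge key.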

In the future, Metacontroller will likely switch from this custom apply implementation to server-side apply, which is trying to solve the broader problem for all components that interact with the Kubernetes API. However, it's not yet clear whether that proposal will embrace schemaless CRDs and support apply semantics on them.

Limitations

A convention-based approach is necessarily more limiting than the native apply implementation, which supports arbitrary per-field configuration. The trade-off is that conventions reduce boilerplate and lower the barrier to entry for simple use cases.

This section lists some examples of configurations that the native apply allows, but are currently not supported in Metacontroller's convention-based apply. If any of these are blockers for you, please file an issue describing your use case.

  • Atomic object lists
    • A list of objects that share one of the conventional keys, but should nevertheless be treated atomically (replaced rather than merged).
  • Unconventional associative list keys
    • An associative list that doesn't use one of the conventional keys.
  • Multi-field associative list keys
    • A key that's composed of two or more fields (e.g. both port and protocol).
  • Scalar-valued associative lists
    • A list of scalars (not objects) that should be merged as if the scalar values were field names in an object.

CompositeController

CompositeController is an API provided by Metacontroller, designed to facilitate custom controllers whose primary purpose is to manage a set of child objects based on the desired state specified in a parent object.

Workload controllers like Deployment and StatefulSet are examples of existing controllers that fit this pattern.

This page is a detailed reference of all the features available in this API. See the Create a Controller guide for a step-by-step walkthrough.

Example

This example CompositeController defines a controller that behaves like StatefulSet.

apiVersion: metacontroller.k8s.io/v1alpha1
kind: CompositeController
metadata:
  name: catset-controller
spec:
  parentResource:
    apiVersion: ctl.enisoc.com/v1
    resource: catsets
    revisionHistory:
      fieldPaths:
      - spec.template
  childResources:
  - apiVersion: v1
    resource: pods
    updateStrategy:
      method: RollingRecreate
      statusChecks:
        conditions:
        - type: Ready
          status: "True"
  - apiVersion: v1
    resource: persistentvolumeclaims
  hooks:
    sync:
      webhook:
        url: http://catset-controller.metacontroller/sync
        timeout: 10s

Spec

A CompositeController spec has the following fields:

Field | Description
parentResource | A single resource rule specifying the parent resource.
childResources | A list of resource rules specifying the child resources.
resyncPeriodSeconds | How often, in seconds, you want every parent object to be resynced (sent to your hook), even if no changes are detected.
generateSelector | If true, ignore the selector in each parent object and instead generate a unique selector that prevents overlap with other objects.
hooks | A set of lambda hooks for defining your controller's behavior.

Parent Resource

The parent resource is the "entry point" for the CompositeController. It should contain the information your controller needs to create children, such as a Pod template if your controller creates Pods. This is often a custom resource that you define (e.g. with CRD), and for which you are now implementing a custom controller.

CompositeController expects to have full control over this resource. That is, you shouldn't define a CompositeController with a parent resource that already has its own controller. See DecoratorController for an API that's better suited for adding behavior to existing resources.

The parentResource rule has the following fields:

Field | Description
apiVersion | The API <group>/<version> of the parent resource, or just <version> for core APIs. (e.g. v1, apps/v1, batch/v1)
resource | The canonical, lowercase, plural name of the parent resource. (e.g. deployments, replicasets, statefulsets)
labelSelector | An optional label selector for narrowing down the objects to target. If not set, all objects are targeted.
revisionHistory | If any child resources use rolling updates, this field specifies how parent revisions are tracked.
ignoreStatusChanges | An optional field for ignoring status changes during reconciliation. If set to true, only changes to the spec, labels, or annotations will trigger reconciliation of the parent resource.

Label Selector

Kubernetes APIs use labels and selectors to define subsets of objects, such as the Pods managed by a given ReplicaSet.

The parent resource of a CompositeController is assumed to have a spec.selector that matches the form of spec.selector in built-in resources like Deployment and StatefulSet (with matchLabels and/or matchExpressions).

If the parent object doesn't have this field, or it can't be parsed in the expected label selector format, the sync hook for that parent will fail, unless you are using selector generation.

The parent's label selector determines which child objects a given parent will try to manage, according to the ControllerRef rules. Metacontroller automatically handles orphaning and adoption for you, and will only send you the observed states of children you own.

These rules imply:

  • Children you create must have labels that satisfy the parent's selector, or else they will be immediately orphaned and you'll never see them again.
  • If other controllers or users create orphaned objects that match the parent's selector, Metacontroller will try to adopt them for you.
  • If Metacontroller adopts an object, and you subsequently decline to list that object in your desired list of children, it will get deleted (because you now own it, but said you don't want it).

To avoid confusion, it's therefore important that users of your custom controller specify a spec.selector (on each parent object) that is sufficiently precise to discriminate its child objects from those of other parents in the same namespace.
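As a rough aid for hook authors, the following Python sketch checks whether the children a hook is about to return carry labels that satisfy the parent's selector. It only handles matchLabels (matchExpressions is not evaluated), and the function names are hypothetical.

```python
def labels_match(selector: dict, labels: dict) -> bool:
    """Check labels against the matchLabels part of a selector."""
    match_labels = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in match_labels.items())


def unmatched_children(parent: dict, children: list) -> list:
    """Return the names of desired children that would not match the parent's selector."""
    selector = parent["spec"]["selector"]
    return [
        child["metadata"]["name"]
        for child in children
        if not labels_match(selector, child["metadata"].get("labels", {}))
    ]
```

Any name returned by unmatched_children identifies a child that would be orphaned as soon as it is created.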

Revision History

Within the parentResource rule, the revisionHistory field has the following subfields:

Field | Description
fieldPaths | A list of field path strings (e.g. spec.template) specifying which parent fields trigger rolling updates of children (for any child resources that use rolling updates). Changes to other parent fields (e.g. spec.replicas) apply immediately. Defaults to ["spec"], meaning any change in the parent's spec triggers a rolling update.

Child Resources

This list should contain a rule for every type of child resource that your controller creates on behalf of each parent.

Each entry in the childResources list has the following fields:

Field | Description
apiVersion | The API group/version of the child resource, or just version for core APIs. (e.g. v1, apps/v1, batch/v1)
resource | The canonical, lowercase, plural name of the child resource. (e.g. deployments, replicasets, statefulsets)
updateStrategy | An optional field that specifies how to update children when they already exist but don't match your desired state. If no update strategy is specified, children of that type will never be updated if they already exist.

Child Update Strategy

Within each rule in the childResources list, the updateStrategy field has the following subfields:

Field | Description
method | A string indicating the overall method that should be used for updating this type of child resource. The default is OnDelete, which means don't try to update children that already exist.
statusChecks | If any rolling update method is selected, children that have already been updated must pass these status checks before the rollout will continue. See also the Child Update Status Checks section.

Child Update Methods

Within each child resource's updateStrategy, the method field can have these values:

Method | Description
OnDelete | Don't update existing children unless they get deleted by some other agent.
Recreate | Immediately delete any children that differ from the desired state, and recreate them in the desired state.
InPlace | Immediately update any children that differ from the desired state.
RollingRecreate | Delete each child that differs from the desired state, one at a time, and recreate each child before moving on to the next one. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks.
RollingInPlace | Update each child that differs from the desired state, one at a time. Pause the rollout if at any time one of the children that have already been updated fails one or more status checks.

Child Update Status Checks

Within each updateStrategy, the statusChecks field has the following subfields:

Field | Description
conditions | A list of status condition checks that must all pass on already-updated children for the rollout to continue.

Status Condition Check

Within a set of statusChecks, each item in the conditions list has the following subfields:

Field | Description
type | A string specifying the status condition type to check.
status | A string specifying the required status of the given status condition. If none is specified, the condition's status is not checked.
reason | A string specifying the required reason of the given status condition. If none is specified, the condition's reason is not checked.
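One way to read these semantics is sketched below in Python. This is not Metacontroller's implementation. It assumes that a check passes only if the child has a condition of the given type, and that status and reason are compared only when the check specifies them.

```python
def passes_status_checks(child: dict, checks: list) -> bool:
    """Return True if every condition check is satisfied by the child's status.conditions."""
    conditions = child.get("status", {}).get("conditions", [])
    for check in checks:
        matched = any(
            cond.get("type") == check["type"]
            # status and reason are only compared when the check specifies them.
            and ("status" not in check or cond.get("status") == check["status"])
            and ("reason" not in check or cond.get("reason") == check["reason"])
            for cond in conditions
        )
        if not matched:
            return False
    return True
```

For the example CompositeController above, the check list would be [{"type": "Ready", "status": "True"}], so a Pod passes only once it reports a Ready condition with status "True".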

Resync Period

By default, your sync hook will only be called when something changes in one of the resources you're watching, or when the local cache is flushed.

Sometimes you may want to sync periodically even if nothing has changed in the Kubernetes API objects, either to simply observe the passage of time, or because your hook takes external state into account. For example, CronJob uses a periodic resync to check whether it's time to start a new Job.

The resyncPeriodSeconds value specifies how often to do this. Each time it triggers, Metacontroller will send sync hook requests for all objects of the parent resource type, with the latest observed values of all the necessary objects.

Note that these objects will be retrieved from Metacontroller's local cache (kept up-to-date through watches), so adding a resync shouldn't add more load on the API server, unless you actually change objects. For example, it's relatively cheap to use this setting to poll until it's time to trigger some change, as long as most sync calls result in a no-op (no CRUD operations needed to achieve desired state).

Generate Selector

Usually, each parent object managed by a CompositeController must have its own user-specified label selector, just like each Deployment has its own label selector in spec.selector. However, sometimes it makes more sense to let the user of your API pretend there are no labels or label selectors.

For example, the built-in Job API doesn't make you specify labels for your Pods, and you can leave spec.selector unset. Because each Job object represents a unique invocation at a point in time, you wouldn't expect a newly-created Job to be satisfied by finding a pre-existing Pod that just happens to have the right labels. On the other hand, a ReplicaSet assumes all Pods that match its selector are interchangeable, so it would be happy to have one less replica it has to create.

If you set spec.generateSelector to true in your CompositeController definition, Metacontroller will do the following:

  • When creating children for you, Metacontroller will automatically add a label that points to the parent object's unique ID (metadata.uid).
  • Metacontroller will not expect each parent object to contain a spec.selector, and will ignore the value even if one is set.
  • Metacontroller will manage children as if each parent object had an "imaginary" label selector that points to the unique ID label that Metacontroller added to all your children.

The end result is that you and the users of your API don't have to think about labels or selectors, similar to the Job API. The downside is that your API won't support all the same capabilities as built-in APIs. For example, with ReplicaSet or StatefulSet, you can delete the controller with kubectl delete --cascade=false to keep the Pods around, and later create a new controller with the same selector to adopt those existing Pods instead of making new ones from scratch.

Hooks

Within the CompositeController spec, the hooks field has the following subfields:

Field | Description
sync | Specifies how to call your sync hook, if any.
finalize | Specifies how to call your finalize hook, if any.
customize | Specifies how to call your customize hook, if any.

Each field of hooks contains subfields that specify how to invoke that hook, such as by sending a request to a webhook.

Sync Hook

The sync hook is how you specify which children to create/maintain for a given parent -- in other words, your desired state.

Based on the CompositeController spec, Metacontroller gathers up all the resources you said you need to decide on the desired state, and sends you their latest observed states.

After you return your desired state, Metacontroller begins to take action to converge towards it -- creating, deleting, and updating objects as appropriate.

A simple way to think about your sync hook implementation is like a script that generates JSON to be sent to kubectl apply. However, unlike a one-off client-side generator, your script has access to the latest observed state in the cluster, and will automatically get called any time that observed state changes.

Sync Hook Request

A separate request will be sent for each parent object, so your hook only needs to think about one parent at a time.

The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:

Field | Description
controller | The whole CompositeController object, like what you might get from kubectl get compositecontroller <name> -o json.
parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json.
children | An associative array of child objects that already exist.
related | An associative array of related objects that already exist, present only if a customize hook was specified. See the customize hook.
finalizing | This is always false for the sync hook. See the finalize hook for details.

Each field of the children object represents one of the types of child resources you specified in your CompositeController spec. The field name for each child type is <Kind>.<apiVersion>, where <apiVersion> could be just <version> (for a core resource) or <group>/<version>, just like you'd write in a YAML file.

For example, the field name for Pods would be Pod.v1, while the field name for StatefulSets might be StatefulSet.apps/v1.

For resources that exist in multiple versions, the apiVersion you specify in the child resource rule is the one you'll be sent. Metacontroller requires you to be explicit about the version you expect because it does conversion for you as needed, so your hook doesn't need to know how to convert between different versions of a given resource.

Within each child type (e.g. in children['Pod.v1']), there is another associative array that maps from the child's path relative to the parent to the JSON representation, like what you might get from kubectl get <child-resource> <child-name> -o json.

If the parent and child have the same scope (both cluster scoped or both namespace scoped), then the key is only the child's .metadata.name. If the parent is cluster scoped and the child is namespace scoped, then the key will be of the form {.metadata.namespace}/{.metadata.name}. This is to disambiguate between two children with the same name in different namespaces. A parent may never be namespace scoped while a child is cluster scoped.

For example, a Pod named my-pod in the my-namespace namespace could be accessed as follows if the parent is also in my-namespace:

request.children['Pod.v1']['my-pod']

Alternatively, if the parent resource is cluster scoped, the Pod could be accessed as:

request.children['Pod.v1']['my-namespace/my-pod']
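The naming rules for child types and child keys can be summarized in a short Python sketch. The function names are hypothetical, and the sketch assumes that a parent without metadata.namespace is cluster scoped.

```python
def child_group(child: dict) -> str:
    """Field name for a child type, e.g. Pod.v1 or StatefulSet.apps/v1."""
    return f"{child['kind']}.{child['apiVersion']}"


def child_key(parent: dict, child: dict) -> str:
    """Key under which a child appears within its child type."""
    parent_ns = parent["metadata"].get("namespace")
    child_ns = child["metadata"].get("namespace")
    name = child["metadata"]["name"]
    if not parent_ns and child_ns:
        # Cluster-scoped parent with a namespaced child: qualify by namespace.
        return f"{child_ns}/{name}"
    # Same scope: the name alone is unambiguous.
    return name
```

These helpers are handy when a hook needs to look up whether a child it wants to generate already exists in request.children.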

Note that you will only be sent children that you "own" according to the ControllerRef rules. That means, for a given parent object, you will only see children whose labels match the parent's label selector, and that don't belong to any other parent.

There will always be an entry in children for every child resource rule, even if no children of that type were observed at the time of the sync. For example, if you listed Pods as a child resource rule, but no existing Pods matched the parent's selector, you will receive:

{
  "children": {
    "Pod.v1": {}
  }
}

as opposed to:

{
  "children": {}
}

Related resources, under the related field, are presented in the same form as children, but represent the resources that match the customize hook response for the given parent object. These objects are not managed by the controller and therefore cannot be modified, but you can use them to calculate the desired children. Some existing examples that implement this approach are:

  • ConfigMapPropagation - makes a copy of a given ConfigMap in several namespaces.
  • GlobalConfigMap - makes a copy of a given ConfigMap in every namespace.
  • SecretPropagation - makes a copy of a given Secret in each namespace that satisfies a label selector.

Note that when a related resource is updated, the sync hook is triggered again (even if the parent object and children have not changed), so you can recalculate the desired children from a fresh view of the related objects.

Sync Hook Response

The body of your response should be a JSON object with the following fields:

Field | Description
status | A JSON object that will completely replace the status field within the parent object.
children | A list of JSON objects representing all the desired children for this parent object.
resyncAfterSeconds | Set the delay (in seconds, as a float) before an optional, one-time, per-object resync.

What you put in status is up to you, but usually it's best to follow conventions established by controllers like Deployment. You should compute status based only on the children that existed when your hook was called; status represents a report on the last observed state, not the new desired state.

The children field should contain a flat list of objects, not an associative array. Metacontroller groups the objects it sends you by type and name as a convenience to simplify your scripts, but it's actually redundant since each object contains its own apiVersion, kind, and metadata.name.

It's important to include the apiVersion and kind in objects you return, and also to ensure that you list every type of child resource you plan to create in the CompositeController spec.

If the parent resource is cluster scoped and the child resource is namespaced, it's important to include the .metadata.namespace since the namespace cannot be inferred from the parent's namespace.

Any objects sent as children in the request that you decline to return in your response list will be deleted. However, you shouldn't directly copy children from the request into the response because they're in different forms.

Instead, you should think of each entry in the list of children as being sent to kubectl apply. That is, you should set only the fields that you care about.

You can optionally set resyncAfterSeconds to a value greater than 0 to request that the sync hook be called again with this particular parent object after some delay (specified in seconds, with decimal fractions allowed). Unlike the controller-wide resyncPeriodSeconds, this is a one-time request (not a request to start periodic resyncs), although you can always return another resyncAfterSeconds value from subsequent sync calls. Also unlike the controller-wide setting, this request only applies to the particular parent object that this sync call sent, so you can request different delays (or omit the request) depending on the state of each object.

Note that your webhook handler must return a response with a status code of 200 to be considered successful. Metacontroller will wait for a response for up to the timeout defined in the webhook spec.
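Putting the request and response formats together, a sync webhook might look like the following Python sketch, which uses only the standard library. It is an illustration under assumptions, not a reference implementation: the parent fields spec.replicas and spec.template, the app label, and port 8080 are all hypothetical, and the labels you set must satisfy your parent's selector.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def sync(request: dict) -> dict:
    """Compute status from observed children and return the desired children."""
    parent = request["parent"]
    observed_pods = request["children"]["Pod.v1"]  # always present, possibly empty
    name = parent["metadata"]["name"]
    replicas = parent["spec"].get("replicas", 1)  # hypothetical parent field

    desired = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"{name}-{i}", "labels": {"app": name}},
            "spec": parent["spec"]["template"]["spec"],  # hypothetical parent field
        }
        for i in range(replicas)
    ]
    # Status reports the last observed state, not the new desired state.
    return {"status": {"replicas": len(observed_pods)}, "children": desired}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(sync(request)).encode()
        self.send_response(200)  # only a 200 response counts as success
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    """Serve the hook on port 8080 (an arbitrary choice for this sketch)."""
    HTTPServer(("", 8080), Handler).serve_forever()
```

Calling main() starts the server. Keeping sync as a plain function of the request makes it easy to unit test without running a web server.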

Finalize Hook

If the finalize hook is defined, Metacontroller will add a finalizer to the parent object, which will prevent it from being deleted until your hook has had a chance to run and the response indicates that you're done cleaning up.

This is useful for doing ordered teardown of children, or for cleaning up resources you may have created in an external system. If you don't define a finalize hook, then when a parent object is deleted, the garbage collector will delete all your children immediately, and no hooks will be called.

The semantics of the finalize hook are mostly equivalent to those of the sync hook. Metacontroller will attempt to reconcile the desired states you return in the children field, and will set status on the parent. The main difference is that finalize will be called instead of sync when it's time to clean up because the parent object is pending deletion.

Note that, just like sync, your finalize handler must be idempotent. Metacontroller might call your hook multiple times as the observed state changes, possibly even after you first indicate that you're done finalizing. Your handler should know how to check what still needs to be done and report success if there's nothing left to do.

Both sync and finalize have a request field called finalizing that indicates which hook was actually called. This lets you implement finalize either as a separate handler or as a check within your sync handler, depending on how much logic they share. To use the same handler for both, just define a finalize hook and set it to the same value as your sync hook.

Finalize Hook Request

The finalize hook request has all the same fields as the sync hook request, with the following changes:

Field | Description
finalizing | This is always true for the finalize hook. See the finalize hook for details.

If you share the same handler for both sync and finalize, you can use the finalizing field to tell whether it's time to clean up or whether it's a normal sync. If you define a separate handler just for finalize, there's no need to check the finalizing field since it will always be true.
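A shared handler could be sketched in Python like this. The helper desired_children and the ConfigMap child are hypothetical stand-ins for whatever your controller normally generates, and the status field name observedChildren is likewise illustrative.

```python
def desired_children(parent: dict) -> list:
    """Hypothetical helper that builds the normal desired state for a parent."""
    name = parent["metadata"]["name"]
    return [{"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": name}}]


def handle(request: dict) -> dict:
    """Serve both sync and finalize requests from one handler."""
    observed = sum(len(objs) for objs in request["children"].values())
    status = {"observedChildren": observed}
    if request["finalizing"]:
        # Cleaning up: ask for no children, and report finalized
        # only once no children are observed any more.
        return {"status": status, "children": [], "finalized": observed == 0}
    # Normal sync: return the usual desired state.
    return {"status": status, "children": desired_children(request["parent"])}
```

To use it, point both the sync and finalize hooks of the CompositeController at the same endpoint.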

Finalize Hook Response

The finalize hook response has all the same fields as the sync hook response, with the following additions:

Field | Description
finalized | A boolean indicating whether you are done finalizing.

To perform ordered teardown, you can generate children just like you would for sync, but omit some children from the desired state depending on the observed set of children that are left. For example, if you observe [A,B,C], generate only [A,B] as your desired state; if you observe [A,B], generate only [A]; if you observe [A], return an empty desired list [].

Once the observed state passed in with the finalize request meets all your criteria (e.g. no more children were observed), and you have checked all other criteria (e.g. no corresponding external resource exists), return true for the finalized field in your response.

Note that you should not return finalized: true the first time you return a desired state that you consider "final", since there's no guarantee that your desired state will be reached immediately. Instead, you should wait until the observed state matches what you want.

If the observed state passed in with the request doesn't meet your criteria, you can return a successful response (HTTP code 200) with finalized: false, and Metacontroller will call your hook again automatically if anything changes in the observed state.

If the only thing you're still waiting for is a state change in an external system, and you don't need to assert any new desired state for your children, returning success from the finalize hook may mean that Metacontroller doesn't call your hook again until the next periodic resync. To reduce the delay, you can request a one-time, per-object resync by setting resyncAfterSeconds in your hook response, giving you a chance to recheck the external state without holding up a slot in the work queue.
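The ordered teardown described above can be sketched in Python as follows. The sketch assumes the children are Pods, that teardown order is reverse lexicographic by name, and that the Pod spec comes from a hypothetical spec.template field on the parent.

```python
def finalize(request: dict) -> dict:
    """Tear down Pods one at a time, removing the last name in sorted order first."""
    observed = sorted(request["children"]["Pod.v1"])  # names of observed Pods
    remaining = observed[:-1]  # drop one child from the desired state per call

    desired = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": name},
            # Hypothetical parent field holding the Pod spec.
            "spec": request["parent"]["spec"]["template"]["spec"],
        }
        for name in remaining
    ]
    return {
        "status": {"replicas": len(observed)},
        "children": desired,
        # Report done only once nothing is observed, not merely when nothing is desired.
        "finalized": len(observed) == 0,
    }
```

Given observed Pods [a, b, c] this returns [a, b] with finalized set to false. It returns finalized: true only on a later call in which no Pods are observed.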

Customize Hook

See Customize hook spec

ControllerRevision

ControllerRevision is an internal API used by Metacontroller to implement declarative rolling updates.

Users of Metacontroller normally shouldn't need to know about this API, but it is documented here for Metacontroller contributors, as well as for troubleshooting.

Note that this is different from the ControllerRevision in apps/v1, although it serves a similar purpose. You will likely need to use a fully-qualified resource name to inspect Metacontroller's ControllerRevisions:

kubectl get controllerrevisions.metacontroller.k8s.io

Each ControllerRevision's name is a combination of the name and API group (excluding the version suffix) of the resource that it's a revision of, as well as a hash that is deterministic yet unique (used only for idempotent creation, not for lookup).

By default, ControllerRevisions belonging to a particular parent instance will get garbage-collected if the parent is deleted. However, it is possible to orphan ControllerRevisions during parent deletion, and then create a replacement parent to adopt them. ControllerRevisions are adopted based on the parent's label selector, the same way controllers like ReplicaSet adopt Pods.

Example

apiVersion: metacontroller.k8s.io/v1alpha1
kind: ControllerRevision
metadata:
  name: catsets.ctl.enisoc.com-5463ba99b804a121d35d14a5ab74546d1e8ba953
  labels:
    app: nginx
    component: backend
    metacontroller.k8s.io/apiGroup: ctl.enisoc.com
    metacontroller.k8s.io/resource: catsets
parentPatch:
  spec:
    template:
      [...]
children:
- apiGroup: ""
  kind: Pod
  names:
  - nginx-backend-0
  - nginx-backend-1
  - nginx-backend-2

Parent Patch

The parentPatch field stores a partial representation of the parent object at a given revision, containing only those fields listed by the lambda controller author as participating in rolling updates.

For example, if a CompositeController's revision history specifies a fieldPaths list of ["spec.template"], the parent patch will contain only spec.template and any subfields nested within it.

This mirrors the selective behavior of rolling updates in built-in APIs like Deployment and StatefulSet. Any fields that aren't part of the parent patch take effect immediately, rather than rolling out gradually.
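To illustrate what "containing only those fields" means, here is a rough Python sketch of extracting a partial object from a list of dotted field paths. It is a conceptual model only, not Metacontroller's actual code; the function name extract_patch and the simple dotted-path handling are assumptions made for illustration.

```python
import copy


def extract_patch(parent, field_paths):
    """Copy only the listed dotted paths (and their subfields) from parent."""
    patch = {}
    for path in field_paths:
        keys = path.split(".")

        # Walk down the parent to find the value at this path.
        src = parent
        found = True
        for key in keys:
            if not isinstance(src, dict) or key not in src:
                found = False
                break
            src = src[key]
        if not found:
            continue  # Path absent in the parent; nothing to record.

        # Recreate the intermediate levels in the patch and copy the value.
        dst = patch
        for key in keys[:-1]:
            dst = dst.setdefault(key, {})
        dst[keys[-1]] = copy.deepcopy(src)
    return patch
```

With a fieldPaths list of ["spec.template"], a parent whose spec contains both replicas and template would yield a patch holding only spec.template.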

Children

The children field stores a list of child objects that "belong" to this particular revision of the parent.

This is how Metacontroller keeps track of the current desired revision of a given child. For example, if a Pod that hasn't been updated yet gets deleted by a Node drain, it should be replaced at the revision it was on before it got deleted, not at the latest revision.

When Metacontroller decides it's time to update a given child to another revision, it first records this intention by updating the relevant ControllerRevision objects. After committing these records, it then begins updating that child according to the configured child update strategy. This ensures that the intermediate progress of the rollout is persisted in the API server so it survives process restarts.

Children are grouped by API Group (excluding the version suffix) and Kind. For each Group-Kind, we store a list of object names. Note that parent and children must be in the same namespace, and ControllerRevisions for a given parent also live in that parent's namespace.
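When troubleshooting, it can help to work out which revision a given child is recorded under. The following Python sketch assumes you have already loaded the ControllerRevision objects for one parent as dictionaries (for example from kubectl output in JSON form); the function name revision_for_child is hypothetical.

```python
def revision_for_child(revisions, api_group, kind, name):
    """Return the name of the ControllerRevision that lists the given child."""
    for revision in revisions:
        for group in revision.get("children", []):
            same_type = (
                group.get("apiGroup") == api_group and group.get("kind") == kind
            )
            if same_type and name in group.get("names", []):
                return revision["metadata"]["name"]
    return None  # The child is not recorded in any of the given revisions.
```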

DecoratorController

DecoratorController is an API provided by Metacontroller, designed to facilitate adding new behavior to existing resources. You can define rules for which resources to watch, as well as filters on labels and annotations.

This page is a detailed reference of all the features available in this API. See the Create a Controller guide for a step-by-step walkthrough.

Example

This example DecoratorController attaches a Service for each Pod belonging to a StatefulSet, for any StatefulSet that requests this behavior through a set of annotations.

apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
  name: service-per-pod
spec:
  resources:
  - apiVersion: apps/v1
    resource: statefulsets
    annotationSelector:
      matchExpressions:
      - {key: service-per-pod-label, operator: Exists}
      - {key: service-per-pod-ports, operator: Exists}
  attachments:
  - apiVersion: v1
    resource: services
  hooks:
    sync:
      webhook:
        url: http://service-per-pod.metacontroller/sync-service-per-pod
        timeout: 10s

Spec

A DecoratorController spec has the following fields:

| Field | Description |
| --- | --- |
| resources | A list of resource rules specifying which objects to target for decoration (adding behavior). |
| attachments | A list of resource rules specifying what this decorator can attach to the target resources. |
| resyncPeriodSeconds | How often, in seconds, you want every target object to be resynced (sent to your hook), even if no changes are detected. |
| hooks | A set of lambda hooks for defining your controller's behavior. |

Resources

Each DecoratorController can target one or more types of resources. For every object that matches one of these rules, Metacontroller will call your sync hook to ask for your desired state.

Each entry in the resources list has the following fields:

| Field | Description |
| --- | --- |
| apiVersion | The API <group>/<version> of the target resource, or just <version> for core APIs. (e.g. v1, apps/v1, batch/v1) |
| resource | The canonical, lowercase, plural name of the target resource. (e.g. deployments, replicasets, statefulsets) |
| labelSelector | An optional label selector for narrowing down the objects to target. |
| annotationSelector | An optional annotation selector for narrowing down the objects to target. |
| ignoreStatusChanges | An optional field through which status changes can be ignored for reconciliation. If set to true, only changes to spec, labels, or annotations will reconcile the parent resource. |

Label Selector

The labelSelector field within a resource rule has the following subfields:

| Field | Description |
| --- | --- |
| matchLabels | A map of key-value pairs representing labels that must exist and have the specified values in order for an object to satisfy the selector. |
| matchExpressions | A list of set-based requirements on labels in order for an object to satisfy the selector. |

This label selector has the same format and semantics as the selector in built-in APIs like Deployment.

If a labelSelector is specified for a given resource type, the DecoratorController will ignore any objects of that type that don't satisfy the selector.

If a resource rule has both a labelSelector and an annotationSelector, the DecoratorController will only target objects of that type that satisfy both selectors.

Annotation Selector

The annotationSelector field within a resource rule has the following subfields:

| Field | Description |
| --- | --- |
| matchAnnotations | A map of key-value pairs representing annotations that must exist and have the specified values in order for an object to satisfy the selector. |
| matchExpressions | A list of set-based requirements on annotations in order for an object to satisfy the selector. |

The annotation selector has an analogous format and semantics to the label selector (note the field name matchAnnotations rather than matchLabels).

If an annotationSelector is specified for a given resource type, the DecoratorController will ignore any objects of that type that don't satisfy the selector.

If a resource rule has both a labelSelector and an annotationSelector, the DecoratorController will only target objects of that type that satisfy both selectors.
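The selector semantics described above can be modeled with a short Python sketch. It is an illustrative approximation of the standard Kubernetes set-based operators (In, NotIn, Exists, DoesNotExist), not Metacontroller's own code; the function names matches and targets are hypothetical.

```python
def matches(selector, values, map_field="matchLabels"):
    """Check a selector against a dict of labels or annotations."""
    # Equality-based requirements: every key must exist with the given value.
    for key, want in (selector.get(map_field) or {}).items():
        if values.get(key) != want:
            return False

    # Set-based requirements.
    for expr in selector.get("matchExpressions") or []:
        key, op = expr["key"], expr["operator"]
        allowed = expr.get("values") or []
        if op == "In":
            ok = key in values and values[key] in allowed
        elif op == "NotIn":
            ok = key not in values or values[key] not in allowed
        elif op == "Exists":
            ok = key in values
        elif op == "DoesNotExist":
            ok = key not in values
        else:
            raise ValueError("unknown operator: " + op)
        if not ok:
            return False
    return True


def targets(rule, obj):
    """An object is targeted only if it satisfies every selector in the rule."""
    meta = obj.get("metadata", {})
    label_ok = "labelSelector" not in rule or matches(
        rule["labelSelector"], meta.get("labels") or {}
    )
    annotation_ok = "annotationSelector" not in rule or matches(
        rule["annotationSelector"], meta.get("annotations") or {}, "matchAnnotations"
    )
    return label_ok and annotation_ok
```

For the service-per-pod example, a StatefulSet is targeted only when both the service-per-pod-label and service-per-pod-ports annotations exist.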

Attachments

This list should contain a rule for every type of resource your controller wants to attach to an object of one of the targeted resources.

Unlike child resources in CompositeController, attachments are not related to the target object through labels and label selectors. This allows you to attach arbitrary things (which may not have any labels) to other arbitrary things (which may not even have a selector).

Instead, attachments are only connected to the target object through owner references, meaning they will get cleaned up if the target object is deleted.

Each entry in the attachments list has the following fields:

| Field | Description |
| --- | --- |
| apiVersion | The API group/version of the attached resource, or just version for core APIs. (e.g. v1, apps/v1, batch/v1) |
| resource | The canonical, lowercase, plural name of the attached resource. (e.g. deployments, replicasets, statefulsets) |
| updateStrategy | An optional field that specifies how to update attachments when they already exist but don't match your desired state. If no update strategy is specified, attachments of that type will never be updated if they already exist. |

Attachment Update Strategy

Within each rule in the attachments list, the updateStrategy field has the following subfields:

| Field | Description |
| --- | --- |
| method | A string indicating the overall method that should be used for updating this type of attachment resource. The default is OnDelete, which means don't try to update attachments that already exist. |

Attachment Update Methods

Within each attachment resource's updateStrategy, the method field can have these values:

| Method | Description |
| --- | --- |
| OnDelete | Don't update existing attachments unless they get deleted by some other agent. |
| Recreate | Immediately delete any attachments that differ from the desired state, and recreate them in the desired state. |
| InPlace | Immediately update any attachments that differ from the desired state. |

Note that DecoratorController doesn't directly support rolling update of attachments because you can compose such behavior by attaching a CompositeController (or any other API that supports declarative rolling update, like Deployment or StatefulSet).

Resync Period

The resyncPeriodSeconds field in DecoratorController's spec works similarly to the same field in CompositeController.

Hooks

Within the DecoratorController spec, the hooks field has the following subfields:

| Field | Description |
| --- | --- |
| sync | Specifies how to call your sync hook, if any. |
| finalize | Specifies how to call your finalize hook, if any. |
| customize | Specifies how to call your customize hook, if any. |

Each field of hooks contains subfields that specify how to invoke that hook, such as by sending a request to a webhook.

Sync Hook

The sync hook is how you specify which attachments to create/maintain for a given target object -- in other words, your desired state.

Based on the DecoratorController spec, Metacontroller gathers up all the resources you said you need to decide on the desired state, and sends you their latest observed states.

After you return your desired state, Metacontroller begins to take action to converge towards it -- creating, deleting, and updating objects as appropriate.

A simple way to think about your sync hook implementation is like a script that generates JSON to be sent to kubectl apply. However, unlike a one-off client-side generator, your script has access to the latest observed state in the cluster, and will automatically get called any time that observed state changes.

Sync Hook Request

A separate request will be sent for each target object, so your hook only needs to think about one target object at a time.

The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:

| Field | Description |
| --- | --- |
| controller | The whole DecoratorController object, like what you might get from kubectl get decoratorcontroller <name> -o json. |
| object | The target object, like what you might get from kubectl get <target-resource> <target-name> -o json. |
| attachments | An associative array of attachments that already exist. |
| related | An associative array of related objects that exist, if a customize hook was specified. See the customize hook. |
| finalizing | This is always false for the sync hook. See the finalize hook for details. |

Each field of the attachments object represents one of the types of attachment resources in your DecoratorController spec. The field name for each attachment type is <Kind>.<apiVersion>, where <apiVersion> could be just <version> (for a core resource) or <group>/<version>, just like you'd write in a YAML file.

For example, the field name for Pods would be Pod.v1, while the field name for StatefulSets might be StatefulSet.apps/v1.

For resources that exist in multiple versions, the apiVersion you specify in the attachment resource rule is the one you'll be sent. Metacontroller requires you to be explicit about the version you expect because it does conversion for you as needed, so your hook doesn't need to know how to convert between different versions of a given resource.

Within each attachment type (e.g. in attachments['Pod.v1']), there is another associative array that maps from the attachment's path relative to the parent to the JSON representation, like what you might get from kubectl get <attachment-resource> <attachment-name> -o json.

If the parent and attachment have the same scope (both cluster scoped or both namespace scoped), then the key is only the object's .metadata.name. If the parent is cluster scoped and the attachment is namespace scoped, then the key will be of the form {.metadata.namespace}/{.metadata.name}. This is to disambiguate between two attachments with the same name in different namespaces. A parent may never be namespace scoped while an attachment is cluster scoped.

For example, a Pod named my-pod in the my-namespace namespace could be accessed as follows if the parent is also in my-namespace:

request.attachments['Pod.v1']['my-pod']

Alternatively, if the parent resource is cluster scoped, the Pod could be accessed as:

request.attachments['Pod.v1']['my-namespace/my-pod']
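If your hook needs to look up an attachment regardless of the parent's scope, the key rule above can be captured in a small helper. This is a Python sketch under the assumption that a cluster-scoped object has no metadata.namespace; the name attachment_key is hypothetical.

```python
def attachment_key(parent, attachment):
    """Compute the key under which an attachment appears in the request."""
    parent_ns = parent["metadata"].get("namespace", "")
    attachment_ns = attachment["metadata"].get("namespace", "")
    name = attachment["metadata"]["name"]

    # Cluster-scoped parent with a namespaced attachment: qualify the name.
    if not parent_ns and attachment_ns:
        return attachment_ns + "/" + name

    # Same scope: the bare name is unambiguous.
    return name
```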

Note that you will only be sent objects that are owned by the target (i.e. objects you attached), not all objects of that resource type.

There will always be an entry in attachments for every attachment resource rule, even if no attachments of that type were observed at the time of the sync. For example, if you listed Pods as an attachment resource rule, but no existing Pods have been attached, you will receive:

{
  "attachments": {
    "Pod.v1": {}
  }
}

as opposed to:

{
  "attachments": {}
}

Related resources, represented under the related field, are presented in the same form as attachments, but they are the resources that match the customize hook response for the given parent object. Metacontroller does not manage these objects, so you cannot modify them, but you can use them to compute your attachments. Some existing examples that use this approach are:

  • ConfigMapPropagation - makes a copy of a given ConfigMap in several namespaces.
  • GlobalConfigMap - makes a copy of a given ConfigMap in every namespace.
  • SecretPropagation - makes a copy of a given Secret in each namespace satisfying a label selector.

Note that when a related resource is updated, the sync hook is triggered again (even if the parent object and attachments have not changed), so you can recalculate your desired state according to a fresh view of the related objects.

Sync Hook Response

The body of your response should be a JSON object with the following fields:

| Field | Description |
| --- | --- |
| labels | A map of key-value pairs for labels to set on the target object. |
| annotations | A map of key-value pairs for annotations to set on the target object. |
| status | A JSON object that will completely replace the status field within the target object. Leave unspecified or null to avoid changing status. |
| attachments | A list of JSON objects representing all the desired attachments for this target object. |
| resyncAfterSeconds | Set the delay (in seconds, as a float) before an optional, one-time, per-object resync. |

By convention, the controller for a given resource should not modify its own spec, so your decorator can't mutate the target's spec.

As a result, decorators currently cannot modify the target object except to optionally set labels, annotations, and status on it. Note that if the target resource already has its own controller, that controller might ignore and overwrite any status updates you make.

The attachments field should contain a flat list of objects, not an associative array. Metacontroller groups the objects it sends you by type and name as a convenience to simplify your scripts, but it's actually redundant since each object contains its own apiVersion, kind, and metadata.name.

It's important to include the apiVersion and kind in objects you return, and also to ensure that you list every type of attachment resource you plan to create in the DecoratorController spec.

If the parent resource is cluster scoped and the child resource is namespaced, it's important to include the .metadata.namespace since the namespace cannot be inferred from the parent's namespace.

Any objects sent as attachments in the request that you decline to return in your response list will be deleted. However, you shouldn't directly copy attachments from the request into the response because they're in different forms.

Instead, you should think of each entry in the list of attachments as being sent to kubectl apply. That is, you should set only the fields that you care about.

You can optionally set resyncAfterSeconds to a value greater than 0 to request that the sync hook be called again with this particular parent object after some delay (specified in seconds, with decimal fractions allowed). Unlike the controller-wide resyncPeriodSeconds, this is a one-time request (not a request to start periodic resyncs), although you can always return another resyncAfterSeconds value from subsequent sync calls. Also unlike the controller-wide setting, this request only applies to the particular parent object that this sync call sent, so you can request different delays (or omit the request) depending on the state of each object.

Note that your webhook handler must return a response with a status code of 200 to be considered successful. Metacontroller will wait for a response for up to the amount defined in the Webhook spec.
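Putting the request and response formats together, a sync webhook can be written with nothing but a standard library. The following is a minimal Python sketch, not a reference implementation: the ConfigMap attachment, the decorated label, and the port are all made up for illustration, and the sketch assumes the DecoratorController spec lists configmaps (apiVersion v1) as an attachment resource.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def sync(request):
    """Compute the desired state for one target object."""
    target = request["object"]
    name = target["metadata"]["name"]

    # Set only the fields we care about, as if sending this to kubectl apply.
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name + "-info"},
        "data": {"owner": name},
    }
    return {
        "labels": {"decorated": "true"},
        "attachments": [config_map],  # A flat list, not an associative array.
    }


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent by Metacontroller.
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))

        body = json.dumps(sync(request)).encode("utf-8")
        self.send_response(200)  # Only 200 is treated as success.
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the webhook server (blocks forever)."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling serve() starts the server; keeping the logic in a plain sync() function makes it easy to test without any HTTP traffic.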

Finalize Hook

If the finalize hook is defined, Metacontroller will add a finalizer to the parent object, which will prevent it from being deleted until your hook has had a chance to run and the response indicates that you're done cleaning up.

This is useful for doing ordered teardown of attachments, or for cleaning up resources you may have created in an external system. If you don't define a finalize hook, then when a parent object is deleted, the garbage collector will delete all your attachments immediately, and no hooks will be called.

In addition to finalizing when an object is deleted, Metacontroller will also call your finalize hook on objects that were previously sent to sync but now no longer match the DecoratorController's label and annotation selectors. This allows you to clean up after yourself when the object has been updated to opt out of the functionality added by your decorator, even if the object is not being deleted. If you don't define a finalize hook, then when the object opts out, any attachments you added will remain until the object is deleted, and no hooks will be called.

The semantics of the finalize hook are mostly equivalent to those of the sync hook. Metacontroller will attempt to reconcile the desired states you return in the attachments field, and will set labels and annotations as requested. The main difference is that finalize will be called instead of sync when it's time to clean up because the parent object is pending deletion.

Note that, just like sync, your finalize handler must be idempotent. Metacontroller might call your hook multiple times as the observed state changes, possibly even after you first indicate that you're done finalizing. Your handler should know how to check what still needs to be done and report success if there's nothing left to do.

Both sync and finalize have a request field called finalizing that indicates which hook was actually called. This lets you implement finalize either as a separate handler or as a check within your sync handler, depending on how much logic they share. To use the same handler for both, just define a finalize hook and set it to the same value as your sync hook.

Finalize Hook Request

The finalize hook request has all the same fields as the sync hook request, with the following changes:

| Field | Description |
| --- | --- |
| finalizing | This is always true for the finalize hook. See the finalize hook for details. |

If you share the same handler for both sync and finalize, you can use the finalizing field to tell whether it's time to clean up or whether it's a normal sync. If you define a separate handler just for finalize, there's no need to check the finalizing field since it will always be true.

Finalize Hook Response

The finalize hook response has all the same fields as the sync hook response, with the following additions:

| Field | Description |
| --- | --- |
| finalized | A boolean indicating whether you are done finalizing. |

To perform ordered teardown, you can generate attachments just like you would for sync, but omit some attachments from the desired state depending on the observed set of attachments that are left. For example, if you observe [A,B,C], generate only [A,B] as your desired state; if you observe [A,B], generate only [A]; if you observe [A], return an empty desired list [].

Once the observed state passed in with the finalize request meets all your criteria (e.g. no more attachments were observed), and you have checked all other criteria (e.g. no corresponding external resource exists), return true for the finalized field in your response.

Note that you should not return finalized: true the first time you return a desired state that you consider "final", since there's no guarantee that your desired state will be reached immediately. Instead, you should wait until the observed state matches what you want.

If the observed state passed in with the request doesn't meet your criteria, you can return a successful response (HTTP code 200) with finalized: false, and Metacontroller will call your hook again automatically if anything changes in the observed state.

If the only thing you're still waiting for is a state change in an external system, and you don't need to assert any new desired state for your attachments, returning success from the finalize hook may mean that Metacontroller doesn't call your hook again until the next periodic resync. To reduce the delay, you can request a one-time, per-object resync by setting resyncAfterSeconds in your hook response, giving you a chance to recheck the external state without holding up a slot in the work queue.
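The interplay between finalized and resyncAfterSeconds can be sketched as follows. This Python example is only an illustration: the external_exists callback is a hypothetical stand-in for however you query your external system, and the 10-second delay is an arbitrary choice.

```python
def finalize(request, external_exists):
    """Finalize handler that waits for attachments and an external resource."""
    # Are any attachments of any type still observed?
    observed = request.get("attachments", {})
    attachments_left = any(len(by_name) > 0 for by_name in observed.values())

    # Hypothetical check against an external system.
    still_external = external_exists(request["object"])

    response = {
        "attachments": [],  # Desired state: nothing should remain attached.
        "finalized": not attachments_left and not still_external,
    }
    if still_external:
        # Nothing in the cluster will change to trigger another call,
        # so ask to be called again after a short delay.
        response["resyncAfterSeconds"] = 10.0
    return response
```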

Customize Hook

See Customize hook spec

Customize Hook

If the customize hook is defined, Metacontroller will ask it which related objects, or classes of objects, your sync and finalize hooks need to know about. This is useful for mapping across many objects. One example would be a controller that lets you specify ConfigMaps to be placed in every Namespace. Another use case is being able to reference other objects, e.g. from the env section of a core Pod object. If you don't define a customize hook, then the related section of the hook requests will be empty.

The customize hook will not provide any information about the current state of the cluster. Thus, the set of related objects may only depend on the state of the parent object.

This hook may also accept other fields in future, for other customizations.

Customize Hook Request

A separate request will be sent for each parent object, so your hook only needs to think about one parent at a time.

The body of the request (a POST in the case of a webhook) will be a JSON object with the following fields:

| Field | Description |
| --- | --- |
| controller | The whole CompositeController object, like what you might get from kubectl get compositecontroller <name> -o json. |
| parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json. |

Customize Hook Response

The body of your response should be a JSON object with the following fields:

| Field | Description |
| --- | --- |
| relatedResources | A list of JSON objects (ResourceRules) representing all the desired related resource descriptions. |

The relatedResources field should contain a flat list of objects, not an associative array.

Each ResourceRule object should be a JSON object with the following fields:

| Field | Description |
| --- | --- |
| apiVersion | The API <group>/<version> of the related resource, or just <version> for core APIs. (e.g. v1, apps/v1, batch/v1) |
| resource | The canonical, lowercase, plural name of the related resource. (e.g. deployments, replicasets, statefulsets) |
| labelSelector | A v1.LabelSelector object. Omit if not used (i.e. if namespace or names should be used instead). |
| namespace | Optional. The namespace to select objects in. |
| names | Optional. A list of strings, representing individual objects to return. |

Important: you can specify either a label selector or namespace/names in a given ResourceRule, but not both.

If the parent resource is cluster scoped and the related resource is namespaced, the namespace may be used to restrict which objects to look at. If the parent resource is namespaced, the related resources must come from the same namespace. Specifying the namespace is optional, but if specified must match.

Note that your webhook handler must return a response with a status code of 200 to be considered successful. Metacontroller will wait for a response for up to the amount defined in the Webhook spec.

Example

Let's take a look at the GlobalConfigMap example custom resource object:

---
apiVersion: examples.metacontroller.io/v1alpha1
kind: GlobalConfigMap
metadata:
  name: globalsettings
spec:
  sourceName: globalsettings
  sourceNamespace: global

It says that we would like the globalsettings ConfigMap from the global namespace to be present in every namespace.

The customize hook request will look like:

{
    'controller': '...',
    'parent': '...'
}

and we need to extract the information identifying the source ConfigMap.

The controller returns the following relatedResources list:

[
    {
        'apiVersion': 'v1',
        'resource': 'configmaps',
        'namespace': ${parent['spec']['sourceNamespace']},
        'names': [${parent['spec']['sourceName']}]
    }, {
        'apiVersion': 'v1',
        'resource': 'namespaces',
        'labelSelector': {}
    }
]

The first ResourceRule says that the given ConfigMap should be returned (it will be used as the source for our propagation).

The second ResourceRule says that we also want to receive all namespaces in the cluster ('labelSelector': {} means "select all objects").

With those rules, each call to the sync hook will have a non-empty related field (if the resources exist in the cluster), containing all objects that match the given criteria.
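The customize logic for this example can be sketched in Python as a plain function that maps the request to a response. This is a minimal sketch assuming the parent has the spec.sourceName and spec.sourceNamespace fields shown above; the function name customize is our own choice, and the result is wrapped in the relatedResources field described in the response format.

```python
def customize(request):
    """Describe which related objects the sync hook needs for this parent."""
    spec = request["parent"]["spec"]
    return {
        "relatedResources": [
            {
                # The source ConfigMap, selected by namespace and name.
                "apiVersion": "v1",
                "resource": "configmaps",
                "namespace": spec["sourceNamespace"],
                "names": [spec["sourceName"]],
            },
            {
                # Every Namespace: an empty label selector selects all objects.
                "apiVersion": "v1",
                "resource": "namespaces",
                "labelSelector": {},
            },
        ]
    }
```

Note that neither rule combines a label selector with namespace or names, in line with the restriction on ResourceRules.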

Hook

This page describes how hook targets are defined in various APIs.

Each hook that you define as part of using one of the hook-based APIs has the following fields:

| Field | Description |
| --- | --- |
| webhook | Specify how to invoke this hook over HTTP(S). |

Example

webhook:
  url: http://my-controller-svc/sync

Webhook

Each Webhook has the following fields:

| Field | Description |
| --- | --- |
| etag | A configuration for etag logic. |
| url | A full URL for the webhook (e.g. http://my-controller-svc/hook). If present, this overrides any values provided for path and service. |
| timeout | A duration (in the format of Go's time.Duration) indicating the time that Metacontroller should wait for a response. If the webhook takes longer than this time, the webhook call is aborted and retried later. Defaults to 10s. |
| path | A path to be appended to the accompanying service to reach this hook (e.g. /hook). Ignored if full url is specified. |
| service | A reference to a Kubernetes Service through which this hook can be reached. |

Service Reference

Within a webhook, the service field has the following subfields:

| Field | Description |
| --- | --- |
| name | The metadata.name of the target Service. |
| namespace | The metadata.namespace of the target Service. |
| port | The port number to connect to on the target Service. Defaults to 80. |
| protocol | The protocol to use for the target Service. Defaults to http. |
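To make the precedence between url, path, and service concrete, here is a rough Python sketch of how an effective URL could be derived from a webhook definition. The override rule and the defaults come from the tables above; the host form <name>.<namespace> is an assumption made for illustration and may not match exactly what Metacontroller builds internally.

```python
def webhook_url(webhook):
    """Derive the effective URL from a webhook definition (as a dict)."""
    # A full URL overrides any values provided for path and service.
    if webhook.get("url"):
        return webhook["url"]

    service = webhook["service"]
    protocol = service.get("protocol", "http")  # Documented default.
    port = service.get("port", 80)              # Documented default.
    path = webhook.get("path", "")

    # Assumed host form: the Service's cluster DNS name <name>.<namespace>.
    host = service["name"] + "." + service["namespace"]
    return "{}://{}:{}{}".format(protocol, host, port, path)
```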

Etag Reference

More details in rfc7232.

An ETag is a hash of the response content. A controller (webhook) that supports ETags should add an "ETag" header to each 200 response. When ETag support is enabled, Metacontroller sends the "If-None-Match" header with the ETag value of the cached content. If the content has not changed, the controller should reply with "304 Not Modified" or "412 Precondition Failed"; otherwise it sends a 200 response with an "ETag" header.

This logic helps save traffic and CPU time on webhook processing.
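On the webhook side, the exchange described above can be sketched as a small pure function. This Python sketch assumes a SHA-256 digest of the response body as the ETag value and a simple exact comparison against If-None-Match; both are illustrative choices, and respond is a hypothetical name.

```python
import hashlib


def respond(body, if_none_match=None):
    """Return (status, headers, body) for a hook response with ETag support."""
    # Derive the ETag from the response content.
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'

    if if_none_match == etag:
        # The cached content is still current; send no body.
        return 304, {"ETag": etag}, b""

    # Content is new or changed; send it along with its ETag.
    return 200, {"ETag": etag}, body
```

A handler would compute the hook response as usual, pass the encoded body and the incoming If-None-Match header to respond(), and write out whatever it returns.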

Within a webhook, the etag field has the following subfields:

Enabled             *bool  `json:"enabled,omitempty"`
CacheTimeoutSeconds *int32 `json:"cacheTimeoutSeconds,omitempty"`
CacheCleanupSeconds *int32 `json:"cacheCleanupSeconds,omitempty"`

| Field | Description |
| --- | --- |
| Enabled | true or false. Defaults to false. |
| CacheTimeoutSeconds | Time in seconds after which an ETag cache record is forgotten. |
| CacheCleanupSeconds | How often, in seconds, the ETag cache garbage collector runs to clean up forgotten records. |

Design Docs

MapController

This is a design proposal for an API called MapController.

Background

Metacontroller APIs are meant to represent common controller patterns. The goal of these APIs as a group is to strike a balance between being flexible enough to handle unforeseen use cases and providing strong enough "rails" to avoid pushing the hard parts onto users. The initial strategy is to target controller patterns that are analogous to proven design patterns in functional or object-oriented programming.

For example, CompositeController lets you define the canonical relationship between some object (the parent node) and the objects that are directly under it in an ownership tree (child nodes). This is analogous to the Composite pattern in that it lets you manage a group of child objects as if it were one object (by manipulating only the parent object).

Similarly, DecoratorController lets you add new child nodes to a parent node that already has some other behavior. This is analogous to the Decorator pattern in that it lets you dynamically wrap new behavior around select instances of an existing object type without having to create a new type.

Problem Statement

The problem that MapController addresses is that neither CompositeController nor DecoratorController allows you to make decisions based on objects that aren't owned by the particular parent object being processed. That's because in the absence of a parent-child relationship, there are arbitrarily many ways you could pick what other objects you want to look at.

To avoid having to send every object in a given resource (e.g. every Pod) on every hook invocation, there must be some way to tell Metacontroller which objects you need to see (that you don't own) to compute your desired state. Rather than try to embed various options for declaring these relationships (object name? label selector? field selector?) into each existing Metacontroller API, the goal of MapController is to provide a solution that's orthogonal to the existing APIs.

In other words, we attempt to separate the task of looking at non-owned objects (MapController) from the task of defining objects that are composed of other objects (CompositeController) so that users can mix and match these APIs (and future APIs) as needed without being limited to the precise scenarios we're able to anticipate.

Proposed Solution

MapController lets you define a collection of objects owned by a parent object, where each child object is generated by some mapping from a non-owned object. This is analogous to the general concept of a map function in that it calls your hook for each object in some input list (of non-owned objects), and creates an output list (of child objects) containing the results of each call.

A single sync pass for a MapController roughly resembles this pseudocode:

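def sync_map_controller():
  # Find the non-owned input objects that match the configured selector.
  input_list = get_matching_objects(input_resource, input_selector)
  output_list = []

  # Call the user's map hook once per input object.
  for input_object in input_list:
    output_list.append(map_hook(input_object))

  # Converge the cluster towards the generated child objects.
  reconcile_objects(output_list)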

where map_hook() is the only code that the MapController user writes, as a lambda hook.

In general, MapController addresses use cases that can be described as, "For every matching X object that already exists, I want to create some number of Y objects according to the parameters stored in the parent object."

Alternatives Considered

Friend Resources

Add a new type of "non-child" resource to CompositeController called "friend resources". Along with all the matching children, we would also send all matching objects of the friend resource types to the sync hook request.

Matching would be determined with the parent's selector, just like for children. However, we would not require friends to have a ControllerRef pointing to the parent (the parent-friend relationship is non-exclusive), and the parent will not attempt to adopt friends.

The sync hook response would not contain friends, because we don't want to force you to list a desired state for all your friends every time. This means you cannot edit or delete your friends.

This approach was not chosen because:

  1. We have to send the entire list of matching friends as one big hook request. This complicates the user's hook code because they probably need to loop over each friend. It's also inefficient for patterns like "for every X (where there are a lot of X's), create a Y" since we have to sync every X if any one of them changes, and we can't process any of them in parallel.
  2. It's tied in with the CompositeController API, and doing something similar for other APIs like DecoratorController would require both duplicated and different effort (see Decorator Resources).
  3. It either forces you to use the same selector to find friends as you use to claim children, or it complicates the API with multiple selectors for different resources, which becomes difficult to reason about.
  4. If we force the same selector to apply to both friends and children, we also force you to explicitly specify a meaningful set of labels. You can't use selector generation (controller-uid: ###) for cases when you don't need orphaning and adoption; your friends won't match that selector.

Decorator Resources

Add a new type of resource to DecoratorController called a decorator resource, which contains objects that inform the behavior of the decorator. This would allow controllers that look at non-owned resources as part of computing the desired state of their children.

In particular, you could use DecoratorController to create attachments (extra children) on a parent object, while basing your desired state on information in another object (the decorator resource) that is not owned by that parent.

This approach was not chosen because:

  1. It's unclear how we would "link" objects of the decorator resource to particular parent objects being processed. Would we apply the parent selector to find decorator objects? Or apply a selector inside the decorator object to determine if it matches the parent object? Whatever we choose, it will likely be unintuitive and confusing for users.
  2. It's unclear what should happen if multiple decorator objects match a single parent object. We could send multiple decorator objects to the hook, but that just passes the complexity on to the user.
  3. It's unclear whether decorator objects are expected to take part in ownership of the objects created. Depending on the use case, users might want attachments to be owned by just the parent, just the decorator, or both. This configuration adds to the cognitive overhead of using the API, and there's no one default that's more intuitive than the others.

Example

The example use case we'll consider in this doc is a controller called SnapshotSchedule that creates periodic backups of PVCs with the VolumeSnapshot API. Notice that it's natural to express this in the form we defined above: "For every matching PVC, I want to create some VolumeSnapshot objects."

CompositeController doesn't fit this use case because the PVCs are created and potentially owned by something other than the SnapshotSchedule object. For example, the PVCs might have been created by a StatefulSet. Instead of creating PVCs, we want to look at all the PVCs that already exist and take action on certain ones.

DecoratorController doesn't fit this use case because it doesn't make sense for the VolumeSnapshots we create to be owned by the PVC from which the snapshot was taken. The lifecycle of a VolumeSnapshot has to be separate from the PVC because the whole point is that you should be able to recover the data if the PVC goes away. Since the PVC doesn't own the VolumeSnapshots, it doesn't make sense to think of the snapshots as a decoration on PVC (an additional feature of the PVC API).

An instance of SnapshotSchedule might look like this:

apiVersion: snapshot.k8s.io/v1
kind: SnapshotSchedule
metadata:
  name: my-app-snapshots
spec:
  snapshotInterval: 6h
  snapshotTTL: 10d
  selector:
    matchLabels:
      app: my-app

It contains a selector that determines which PVCs this schedule applies to, and some parameters that determine how often to take snapshots, as well as when to retire old snapshots.
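A hook that implements this schedule has to turn values such as 6h and 10d into time spans. The sketch below shows one way this might be done in Python. The helper names (parse_duration, snapshot_due, snapshot_expired) and the accepted unit suffixes are assumptions made for illustration, since this document does not define the duration syntax.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; the proposal does not specify the duration syntax.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Parse strings like '6h' or '10d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def snapshot_due(last_snapshot_time, interval, now):
    """Return True if no snapshot exists yet or the interval has elapsed."""
    return last_snapshot_time is None or now - last_snapshot_time >= interval


def snapshot_expired(created, ttl, now):
    """Return True once a snapshot is older than its time to live."""
    return now - created > ttl


# Values taken from the SnapshotSchedule example above.
spec = {"snapshotInterval": "6h", "snapshotTTL": "10d"}
now = datetime(2024, 1, 11, 12, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 11, 5, 0, tzinfo=timezone.utc)
due = snapshot_due(last, parse_duration(spec["snapshotInterval"]), now)
```

Passing the current time in as an argument, rather than reading the clock inside the helpers, keeps the decision logic easy to test.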

API

Below is a sample MapController spec that could be used to implement the SnapshotSchedule controller:

apiVersion: metacontroller.k8s.io/v1alpha1
kind: MapController
metadata:
  name: snapshotschedule-controller
spec:
  parentResource:
    apiVersion: snapshot.k8s.io/v1
    resource: snapshotschedules
  inputResources:
  - apiVersion: v1
    resource: persistentvolumeclaims
  outputResources:
  - apiVersion: volumesnapshot.external-storage.k8s.io/v1
    resource: volumesnapshots
  resyncPeriodSeconds: 5
  hooks:
    map:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/map
    tombstone:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/tombstone

Parent Resource

The parent resource is the SnapshotSchedule itself, and anything this controller creates will be owned by this parent. The schedule thus acts like a bucket containing snapshots: if you delete the schedule, the snapshots inside it will go away too, unless you specify to orphan them as part of the delete operation (e.g. with --cascade=false when using kubectl delete). Notably, this ties the lifecycles of snapshots to the reason they exist (the backup policy that the user defined), rather than tying them to the entity that they are about (the PVC).

Input Resources

The input resources (in this case just PVC) are the inputs to the conceptual "map" function. We allow multiple input resources because users might want to write a controller that performs the same action for several different input types. We shouldn't force them to create multiple MapControllers with largely identical behavior.

The duck-typed spec.selector field (assumed to be metav1.LabelSelector) in the parent object is used to filter which input objects to process. If the selector is empty, we will process all objects of the input types in the same namespace as the parent.
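As a rough illustration of this filtering step, the following Python sketch evaluates a LabelSelector-shaped dictionary against candidate input objects. The function names are hypothetical, and the code only approximates what Metacontroller would do internally; it is not taken from the implementation.

```python
def selector_matches(selector, labels):
    """Evaluate a LabelSelector-shaped dict against an object's labels."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = labels.get(key) in values
        elif op == "NotIn":
            # A missing key also satisfies NotIn.
            ok = labels.get(key) not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    # An empty selector has no requirements, so it matches everything.
    return True


def select_inputs(parent, candidates):
    """Keep candidates in the parent's namespace that match spec.selector."""
    selector = parent.get("spec", {}).get("selector") or {}
    namespace = parent["metadata"].get("namespace")
    return [
        obj for obj in candidates
        if obj["metadata"].get("namespace") == namespace
        and selector_matches(selector, obj["metadata"].get("labels", {}))
    ]


parent = {
    "metadata": {"name": "my-app-snapshots", "namespace": "default"},
    "spec": {"selector": {"matchLabels": {"app": "my-app"}}},
}
pvcs = [
    {"metadata": {"name": "data-0", "namespace": "default", "labels": {"app": "my-app"}}},
    {"metadata": {"name": "data-1", "namespace": "default", "labels": {"app": "other"}}},
    {"metadata": {"name": "data-2", "namespace": "elsewhere", "labels": {"app": "my-app"}}},
]
selected = select_inputs(parent, pvcs)
```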

We will also ignore input objects whose controllerRef points to the particular parent object being processed. This can only happen when the same resource (e.g. ConfigMap) is listed as both an input and an output in a given MapController spec. Ignoring such objects allows use cases such as generating ConfigMaps from other ConfigMaps by doing some transformation on the data, while protecting against accidental recursion if the label selector is chosen poorly.
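The recursion guard could be sketched as below. This is an illustrative Python approximation with hypothetical function names; it relies only on the standard metadata.ownerReferences layout, in which the controllerRef is the owner reference that has controller set to true.

```python
def controller_ref(obj):
    """Return the owner reference marked as the controller, or None."""
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref
    return None


def drop_own_outputs(parent, candidates):
    """Remove candidates that this same parent already controls."""
    parent_uid = parent["metadata"]["uid"]
    kept = []
    for obj in candidates:
        ref = controller_ref(obj)
        if ref is not None and ref.get("uid") == parent_uid:
            # This object is one of our own outputs; mapping it again
            # would feed the controller's output back in as input.
            continue
        kept.append(obj)
    return kept


parent = {"metadata": {"name": "transformer", "uid": "abc"}}
configmaps = [
    {"metadata": {"name": "source"}},
    {"metadata": {"name": "generated",
                  "ownerReferences": [{"uid": "abc", "controller": True}]}},
    {"metadata": {"name": "foreign",
                  "ownerReferences": [{"uid": "xyz", "controller": True}]}},
]
inputs = drop_own_outputs(parent, configmaps)
```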

If there are multiple input resources, they are processed independently, with no attempt to correlate them. That is, the map hook will still be called with only a single input object each time, although the kind of that object might be different from one call to the next.

Output Resources

The output resources (in this case just VolumeSnapshot) are the types of objects that the user intends to create and hold in the conceptual "bucket" that the parent object represents. We allow multiple output resources because users might think of their controller as spitting out a few different things. We shouldn't force them to create a CompositeController too just so they can emit multiple outputs, especially if those outputs are not conceptually part of one larger whole.

For a given input object, the user can generate any number of output objects. We will tag those output objects in some way to associate them with the object that we sent as input. The tag makes it possible to group those objects and send them along with future map hook requests.

In pseudocode, a sync pass could be thought of like the following:

// Get all matching objects from all input resources.
inputObjects := []Object{}
for _, inputResource := range inputResources {
  inputObjects = append(inputObjects, getMatchingObjects(inputResource, parentSelector)...)
}
// Call the map hook for each input object.
for _, inputObject := range inputObjects {
  // Compute some opaque string identifying this input object.
  mapKey := makeMapKey(inputObject)

  // Gather observed objects of the output resources that are tagged with this key.
  observedOutputs := []Object{}
  for _, outputResource := range outputResources {
    // Gather all outputs owned by this parent.
    allOutputs := getOwnedObjects(outputResource, parent)
    // Filter to only those tagged for this input.
    observedOutputs = append(observedOutputs, filterByMapKey(allOutputs, mapKey)...)
  }

  // Call user's map hook, passing observed state.
  mapResult := mapHook(parent, inputObject, observedOutputs)
  for _, obj := range mapResult.Outputs {
    // Tag outputs to identify which input they came from.
    setMapKey(obj, mapKey)
  }
  // Manage child objects by reconciling observed and desired outputs.
  manageChildren(observedOutputs, mapResult.Outputs)
}

Detached Outputs

If an input object disappears, we may find that the parent owns one or more output objects that are tagged as having been generated from an input object that no longer exists. Note that this does not mean these objects have been orphaned, in the sense of having no ownerRef/controllerRef; the controllerRef will still point to the parent object. It's only our MapController-specific "tag" that has become a broken link.
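One possible way to detect such broken links is sketched below in Python. The proposal leaves both the key format and the tagging mechanism open, so the annotation name, the hashing scheme, and the function names here are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical annotation; the proposal only says outputs are tagged "in some way".
MAP_KEY_ANNOTATION = "example.com/map-key"


def make_map_key(obj):
    """Derive an opaque, stable key from the input object's identity."""
    meta = obj["metadata"]
    identity = "/".join(
        [obj["apiVersion"], obj["kind"], meta.get("namespace", ""), meta["name"]]
    )
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]


def group_by_map_key(owned_outputs):
    """Group the parent's owned outputs by the key they are tagged with."""
    groups = {}
    for out in owned_outputs:
        key = out["metadata"].get("annotations", {}).get(MAP_KEY_ANNOTATION)
        if key is None:
            continue  # Untagged objects are ignored in this sketch.
        groups.setdefault(key, []).append(out)
    return groups


def detached_groups(owned_outputs, live_inputs):
    """Return the groups whose key matches no input object that still exists."""
    live_keys = {make_map_key(obj) for obj in live_inputs}
    return {
        key: outs
        for key, outs in group_by_map_key(owned_outputs).items()
        if key not in live_keys
    }


def pvc(name):
    return {"apiVersion": "v1", "kind": "PersistentVolumeClaim",
            "metadata": {"name": name, "namespace": "default"}}


def snapshot(name, source):
    return {"kind": "VolumeSnapshot",
            "metadata": {"name": name,
                         "annotations": {MAP_KEY_ANNOTATION: make_map_key(source)}}}


outputs = [snapshot("snap-a", pvc("a")), snapshot("snap-b", pvc("b"))]
detached = detached_groups(outputs, live_inputs=[pvc("a")])  # PVC "b" is gone
```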

By default, we will delete any such detached outputs so that controller authors don't have to think about them. However, the SnapshotSchedule example shows that sometimes it will be important to give users control over what happens to these objects. In that example, the user would want to keep detached VolumeSnapshots since they might be needed to restore the now-missing PVC.

We could offer a declarative knob to either always delete detached outputs, or always keep them, but that would be awkwardly restrictive. The controller author would have fine-grained control over the lifecycle of "live" outputs, but would suddenly lose that control when the outputs become detached.

Instead, we propose to define an optional tombstone hook that sends information about a particular group of detached outputs (belonging to a particular input object that is now gone), and asks the user to decide which ones to keep. For example, SnapshotSchedule would likely want to keep detached VolumeSnapshots around until the usual expiry timeout.

For now, we will not allow the hook to edit detached outputs because we don't want to commit to sending the body of the missing input object, since it may not be available. Without that input object, the hook author presumably wouldn't have enough information to decide on an updated desired state anyway. We can reexamine this if users come up with compelling use cases.

Status Aggregation

One notable omission from the map hook, as compared with the sync hook from CompositeController, is that the user does not return any status object. That's because each map hook invocation only sends enough context to process a single input object and its associated output objects. The hook author therefore doesn't have enough information to compute the overall status of the parent object.

We could define another hook to which we send all inputs and outputs for a given parent, and ask the user to return the overall status. However, that would defeat one of the main goals of MapController: such a monolithic hook request could get quite large for the kind of "do this for every X" use cases we expect. It would also place the burden of aggregating status across the whole collection on the user.

Instead, Metacontroller will compute an aggregated status for the collection based on some generic rules:

For each input resource, we will report the number of matching objects we observed as a status field on the parent object, named after the plural resource name.

The exact format will be an implementation detail, but for example it might look like:

status:
  inputs:
    persistentvolumeclaims:
      total: 20
  ...

For each output resource, we will report the total number of objects owned by this parent across all map keys. In addition, we will automatically aggregate conditions found on output objects, and report how many objects we own with that condition set to True.

For example:

status:
  ...
  outputs:
    volumesnapshots:
      total: 100
      ready: 97
  ...
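The aggregation rule for outputs could look roughly like the following Python sketch. The exact status format is described above as an implementation detail, so the lower-cased condition names and the aggregate_outputs function are assumptions based only on the example shown.

```python
def aggregate_outputs(resource_plural, objects):
    """Count owned objects and how many have each condition set to True."""
    summary = {"total": len(objects)}
    for obj in objects:
        conditions = obj.get("status", {}).get("conditions", [])
        # Use a set so a repeated condition type counts once per object.
        true_types = {c["type"] for c in conditions if c.get("status") == "True"}
        for cond_type in true_types:
            field = cond_type.lower()  # Assumed naming, e.g. Ready -> ready.
            summary[field] = summary.get(field, 0) + 1
    return {resource_plural: summary}


snapshots = [
    {"status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"status": {"conditions": [{"type": "Ready", "status": "True"}]}},
]
status = {"outputs": aggregate_outputs("volumesnapshots", snapshots)}
```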

Hooks

Map Hook

We call the map hook to translate an input object into zero or more output objects.

Map Hook Request

| Field | Description |
| --- | --- |
| controller | The whole MapController object, like what you might get from kubectl get mapcontroller <name> -o json. |
| parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json. |
| mapKey | An opaque string that uniquely identifies the group of outputs that belong to this input object. |
| input | The input object, like what you might get from kubectl get <input-resource> <input-name> -o json. |
| outputs | An associative array of output objects that the parent already created for the given input object. |

Map Hook Response

| Field | Description |
| --- | --- |
| outputs | A list of JSON objects representing all the desired outputs for the given input object. |
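To show how these request and response fields fit together, here is a minimal sketch of a map hook served with Python's standard library. It generates one VolumeSnapshot per input PVC. The naming scheme, the persistentVolumeClaimName spec field, the port, and the handler class are assumptions for illustration; only the parent, input, and outputs field names come from the tables in this section.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def map_hook(request):
    """Build the map hook response for a single input object."""
    parent = request["parent"]
    pvc = request["input"]
    snapshot = {
        "apiVersion": "volumesnapshot.external-storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {
            # Hypothetical naming scheme: <schedule name>-<PVC name>.
            "name": parent["metadata"]["name"] + "-" + pvc["metadata"]["name"],
            "namespace": pvc["metadata"]["namespace"],
        },
        # Spec field name assumed for illustration.
        "spec": {"persistentVolumeClaimName": pvc["metadata"]["name"]},
    }
    return {"outputs": [snapshot]}


class MapHookHandler(BaseHTTPRequestHandler):
    """Decode the JSON request body, call map_hook, and reply with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(map_hook(request)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the hook server until interrupted (call this from your entry point)."""
    HTTPServer(("", port), MapHookHandler).serve_forever()
```

Keeping map_hook a pure function of the request makes it possible to test the hook without starting a server.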

Tombstone Hook

We call the tombstone hook, if defined, to ask whether we should keep any of a group of output objects whose corresponding input object is gone. If no tombstone hook is defined, we will always delete any such detached outputs as soon as the input object disappears.

Tombstone Hook Request

| Field | Description |
| --- | --- |
| controller | The whole MapController object, like what you might get from kubectl get mapcontroller <name> -o json. |
| parent | The parent object, like what you might get from kubectl get <parent-resource> <parent-name> -o json. |
| mapKey | An opaque string that uniquely identifies the group of outputs that belong to this input object. |
| outputs | An associative array of output objects that the parent already created for the given input object. |

Tombstone Hook Response

| Field | Description |
| --- | --- |
| outputs | A list of output objects to keep, even though the associated input object is gone. All other outputs belonging to this input will be deleted. |
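For the SnapshotSchedule example, a tombstone hook might keep detached snapshots until they exceed the schedule's TTL. The Python sketch below assumes that the outputs field is a flat mapping from object name to object and that TTL values use a single-letter unit suffix; neither detail is specified in this proposal, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for values such as "10d".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_ttl(text):
    """Convert a value like '10d' into a timedelta."""
    return timedelta(seconds=int(text[:-1]) * _UNIT_SECONDS[text[-1]])


def parse_timestamp(text):
    """Parse a creationTimestamp such as 2024-01-19T00:00:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def tombstone_hook(request, now=None):
    """Keep detached snapshots that have not yet outlived the TTL."""
    now = now or datetime.now(timezone.utc)
    ttl = parse_ttl(request["parent"]["spec"]["snapshotTTL"])
    keep = []
    # Assumption: outputs maps object names to objects.
    for obj in request["outputs"].values():
        created = parse_timestamp(obj["metadata"]["creationTimestamp"])
        if now - created <= ttl:
            keep.append(obj)
    return {"outputs": keep}


request = {
    "parent": {"spec": {"snapshotTTL": "10d"}},
    "mapKey": "opaque-key",
    "outputs": {
        "recent": {"metadata": {"name": "recent",
                                "creationTimestamp": "2024-01-19T00:00:00Z"}},
        "old": {"metadata": {"name": "old",
                             "creationTimestamp": "2024-01-01T00:00:00Z"}},
    },
}
response = tombstone_hook(request, now=datetime(2024, 1, 21, tzinfo=timezone.utc))
```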

Contributor Guide

This section contains information for people who want to hack on or contribute to Metacontroller.

See the User Guide if you just want to use Metacontroller.

GitHub

Building

This page describes how to build Metacontroller for yourself.

First, check out the code:

# If you're going to build locally, make sure to
# place the repo according to the Go import path:
#   $GOPATH/src/metacontroller.io
cd $GOPATH/src
git clone git@github.com:metacontroller/metacontroller.git metacontroller
cd metacontroller

Then you can build a metacontroller binary like so:

make build

Local build and development

See the Local development and debugging section.

Documentation build

Documentation is generated from .md files with mdbook. To generate the documentation, you need to install:

  • mdbook
  • mdbook plugins: graphviz

To generate the documentation:

  • cd docs
  • mdbook build

This generates a book folder containing the HTML content.

You can also use mdbook serve to serve the documentation at http://localhost:3000.

Tests

To run tests, first make sure you can successfully complete a local build.

Unit Tests

Unit tests in Metacontroller focus on code that does some kind of non-trivial local computation without depending on calls to remote servers -- for example, the ./dynamic/apply package.

Unit tests live in _test.go files alongside the code that they test. To run only unit tests (excluding integration tests) for all Metacontroller packages, use this command:

make unit-test

Integration Tests

Integration tests in Metacontroller focus on verifying behavior at the level of calls to remote services like user-provided webhooks and the Kubernetes API server. Since Metacontroller's job is ultimately to manipulate Kubernetes API objects in response to other Kubernetes API objects, most of the important features or behaviors of Metacontroller can and should be tested at this level.

In the integration test environment, we start a standalone kube-apiserver to serve the REST APIs, and an etcd instance to back it. We do not run any kubelets (Nodes), nor any controllers other than Metacontroller. This makes it easy for tests to control exactly what API objects Metacontroller sees without interference from the normal controller for each API, and also greatly reduces the requirements to run tests.

Other than the Metacontroller codebase, all you need to run integration tests is to download a few binaries from a Kubernetes release. You can run the following script from the test/integration directory in order to fetch the versions of these binaries currently used in continuous integration, and place them in ./hack/bin:

hack/get-kube-binaries.sh

You can then run the integration tests with this command, which will automatically set the PATH to include ./hack/bin:

make integration-test

Unlike unit tests, integration tests do not live alongside the code they test, but instead are gathered in ./test/integration/.... This makes it easier to run them separately, since they require a special environment, and also enforces that they test packages at the level of their public interfaces.

End-to-End Tests

End-to-end tests in Metacontroller focus on verifying example workflows that we expect to be typical for end users. That is, we run the same kubectl commands that a human might run when using Metacontroller.

Since these tests verify end-to-end behavior, they require a fully-functioning Kubernetes cluster. Before running them, you should have kubectl in your PATH, and it should be configured to talk to a suitable, empty test cluster that has had the Metacontroller manifests applied.

Then you can run the end-to-end tests against your cluster with the following:

cd examples
./test.sh

This will run all the end-to-end tests in series, and print the location of a log file containing the output of the latest test that was run.

You can also run each test individually, which will show the output as it runs. For example:

cd examples/bluegreen
./test.sh

Note that currently our continuous integration only runs unit and integration tests on PRs, since those don't require a full cluster. If you have access to a suitable test cluster, you can help speed up review of your PR by running these end-to-end tests yourself to see if they catch anything.

Local development and debugging

Tips and tricks for contributors

Local run of metacontroller

There are different flavours of manifests shipped to help with local development:

  • manifests/dev
  • manifests/debug

Development build

The main difference is that the image defined in the manifest is localhost/metacontroller:dev. To run a development build:

  • apply dev manifests - kubectl apply -k manifests/dev
  • build the Docker image - make image - this will compile the binary and build the container image
  • load the image into the cluster (e.g. kind load docker-image localhost/metacontroller:dev in kind)
  • restart the pod (e.g. kubectl delete pod/metacontroller-0 --namespace metacontroller)

Debug build

Debugging requires building the Go sources in a special way, which is done with make build_debug. The Dockerfile.debug dockerfile then adds the resulting binary to the debug Docker image. To run a debug build:

  • apply debug manifests - kubectl apply -k manifests/debug
  • build the debug binary and image - make image_debug
  • load the image into the cluster (e.g. kind load docker-image localhost/metacontroller:debug in kind)
  • restart the pod
  • on startup, the Go process will wait for a debugger on port 40000
  • forward port 40000 from the container to localhost, e.g. kubectl port-forward metacontroller-0 40000:40000
  • attach a Go debugger to port 40000 on localhost