MapController

This is a design proposal for an API called MapController.

Background
Problem Statement
Proposed Solution
Alternatives Considered
- Friend Resources
- Decorator Resources
Example
API
Hooks
- Map Hook
  - Map Hook Request
  - Map Hook Response
- Tombstone Hook
  - Tombstone Hook Request
  - Tombstone Hook Response

Background

Metacontroller APIs are meant to represent common controller patterns. The goal of these APIs as a group is to strike a balance between being flexible enough to handle unforeseen use cases and providing strong enough "rails" to avoid pushing the hard parts onto users. The initial strategy is to target controller patterns that are analogous to proven design patterns in functional or object-oriented programming.

For example, CompositeController lets you define the canonical relationship between some object (the parent node) and the objects that are directly under it in an ownership tree (child nodes). This is analogous to the Composite pattern in that it lets you manage a group of child objects as if it were one object (by manipulating only the parent object).

Similarly, DecoratorController lets you add new child nodes to a parent node that already has some other behavior. This is analogous to the Decorator pattern in that it lets you dynamically wrap new behavior around select instances of an existing object type without having to create a new type.

Problem Statement

The problem that MapController addresses is that neither CompositeController nor DecoratorController lets you make decisions based on objects that aren't owned by the particular parent object being processed. That's because, in the absence of a parent-child relationship, there are arbitrarily many ways to choose which other objects you want to look at.

To avoid having to send every object in a given resource (e.g. every Pod) on every hook invocation, there must be some way to tell Metacontroller which objects you need to see (that you don't own) to compute your desired state. Rather than try to embed various options for declaring these relationships (object name? label selector? field selector?) into each existing Metacontroller API, the goal of MapController is to provide a solution that's orthogonal to the existing APIs.

In other words, we attempt to separate the task of looking at non-owned objects (MapController) from the task of defining objects that are composed of other objects (CompositeController) so that users can mix and match these APIs (and future APIs) as needed without being limited to the precise scenarios we're able to anticipate.

Proposed Solution

MapController lets you define a collection of objects owned by a parent object, where each child object is generated by some mapping from a non-owned object. This is analogous to the general concept of a map function in that it calls your hook for each object in some input list (of non-owned objects), and creates an output list (of child objects) containing the results of each call.

A single sync pass for a MapController roughly resembles this pseudocode:

def sync_map_controller():
  input_list = get_matching_objects(input_resource, input_selector)
  output_list = []

  for input_object in input_list:
    # Each map hook call may return zero or more output objects.
    output_list.extend(map_hook(input_object))

  reconcile_objects(output_list)

where map_hook() is the only code that the MapController user writes, as a lambda hook.
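To make the map semantics concrete, here is a minimal in-memory Go sketch of the sync pass above. The Object and MapHook types and the syncMap function are hypothetical names chosen for illustration, and reconciliation against the API server is deliberately left out. Each hook call may return zero or more outputs, and the results are concatenated.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for an unstructured Kubernetes object.
type Object map[string]interface{}

// MapHook turns one input object into zero or more output objects.
type MapHook func(input Object) []Object

// syncMap calls the hook once per input and concatenates the results,
// mirroring the sync_map_controller pseudocode. Reconciling the result
// against the cluster is out of scope for this sketch.
func syncMap(inputs []Object, hook MapHook) []Object {
	outputs := []Object{}
	for _, in := range inputs {
		outputs = append(outputs, hook(in)...)
	}
	return outputs
}

func main() {
	pvcs := []Object{{"name": "data-0"}, {"name": "data-1"}}
	// Hypothetical hook: one snapshot per PVC.
	hook := func(in Object) []Object {
		return []Object{{"kind": "VolumeSnapshot", "name": fmt.Sprintf("%v-snap", in["name"])}}
	}
	for _, out := range syncMap(pvcs, hook) {
		fmt.Println(out["name"])
	}
}
```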

In general, MapController addresses use cases that can be described as, "For every matching X object that already exists, I want to create some number of Y objects according to the parameters stored in the parent object."

Alternatives Considered

Friend Resources

Add a new type of "non-child" resource to CompositeController called "friend resources". Along with all the matching children, we would also include all matching objects of the friend resource types in the sync hook request.

Matching would be determined with the parent's selector, just like for children. However, we would not require friends to have a ControllerRef pointing to the parent (the parent-friend relationship is non-exclusive), and the parent would not attempt to adopt friends.

The sync hook response would not contain friends, because we don't want to force you to list a desired state for all your friends every time. This means you cannot edit or delete your friends.

This approach was not chosen because:

- We have to send the entire list of matching friends as one big hook request. This complicates the user's hook code because they probably need to loop over each friend. It's also inefficient for patterns like "for every X (where there are a lot of X's), create a Y", since we have to sync every X if any one of them changes, and we can't process any of them in parallel.
- It's tied to the CompositeController API, and doing something similar for other APIs like DecoratorController would require separate effort that is partly duplicated and partly different (see Decorator Resources).
- It either forces you to use the same selector to find friends as you use to claim children, or it complicates the API with multiple selectors for different resources, which becomes difficult to reason about.
- If we force the same selector to apply to both friends and children, we also force you to explicitly specify a meaningful set of labels. You can't use selector generation (controller-uid: ###) for cases when you don't need orphaning and adoption, because your friends won't match that selector.

Decorator Resources

Add a new type of resource to DecoratorController called a decorator resource, which contains objects that inform the behavior of the decorator. This would allow controllers that look at non-owned resources as part of computing the desired state of their children.

In particular, you could use DecoratorController to create attachments (extra children) on a parent object, while basing your desired state on information in another object (the decorator resource) that is not owned by that parent.

This approach was not chosen because:

- It's unclear how we would "link" objects of the decorator resource to particular parent objects being processed. Would we apply the parent selector to find decorator objects? Or apply a selector inside the decorator object to determine if it matches the parent object? Whatever we choose, it will likely be unintuitive and confusing for users.
- It's unclear what should happen if multiple decorator objects match a single parent object. We could send multiple decorator objects to the hook, but that just passes the complexity on to the user.
- It's unclear whether decorator objects are expected to take part in ownership of the objects created. Depending on the use case, users might want attachments to be owned by just the parent, just the decorator, or both. This configuration adds to the cognitive overhead of using the API, and there's no one default that's more intuitive than the others.

Example

The example use case we'll consider in this doc is a controller called SnapshotSchedule that creates periodic backups of PVCs with the VolumeSnapshot API. Notice that it's natural to express this in the form we defined above: "For every matching PVC, I want to create some VolumeSnapshot objects."

CompositeController doesn't fit this use case because the PVCs are created and potentially owned by something other than the SnapshotSchedule object. For example, the PVCs might have been created by a StatefulSet. Instead of creating PVCs, we want to look at all the PVCs that already exist and take action on certain ones.

DecoratorController doesn't fit this use case because it doesn't make sense for the VolumeSnapshots we create to be owned by the PVC from which the snapshot was taken. The lifecycle of a VolumeSnapshot has to be separate from the PVC because the whole point is that you should be able to recover the data if the PVC goes away. Since the PVC doesn't own the VolumeSnapshots, it doesn't make sense to think of the snapshots as a decoration on the PVC (an additional feature of the PVC API).

An instance of SnapshotSchedule might look like this:

apiVersion: snapshot.k8s.io/v1
kind: SnapshotSchedule
metadata:
  name: my-app-snapshots
spec:
  snapshotInterval: 6h
  snapshotTTL: 10d
  selector:
    matchLabels:
      app: my-app

It contains a selector that determines which PVCs this schedule applies to, and some parameters that determine how often to take snapshots, as well as when to retire old snapshots.

API

Below is a sample MapController spec that could be used to implement the SnapshotSchedule controller:

apiVersion: metacontroller.k8s.io/v1alpha1
kind: MapController
metadata:
  name: snapshotschedule-controller
spec:
  parentResource:
    apiVersion: snapshot.k8s.io/v1
    resource: snapshotschedules
  inputResources:
  - apiVersion: v1
    resource: persistentvolumeclaims
  outputResources:
  - apiVersion: volumesnapshot.external-storage.k8s.io/v1
    resource: volumesnapshots
  resyncPeriodSeconds: 5
  hooks:
    map:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/map
    tombstone:
      webhook:
        url: http://snapshotschedule-controller.metacontroller/tombstone

Parent Resource

The parent resource is the SnapshotSchedule itself, and anything this controller creates will be owned by this parent. The schedule thus acts like a bucket containing snapshots: if you delete the schedule, the snapshots inside it will go away too, unless you specify to orphan them as part of the delete operation (e.g. with --cascade=false when using kubectl delete). Notably, this ties the lifecycles of snapshots to the reason they exist (the backup policy that the user defined), rather than tying them to the entity that they are about (the PVC).

Input Resources

The input resources (in this case just PVC) are the inputs to the conceptual "map" function. We allow multiple input resources because users might want to write a controller that performs the same action for several different input types. We shouldn't force them to create multiple MapControllers with largely identical behavior.

The duck-typed spec.selector field (assumed to be metav1.LabelSelector) in the parent object is used to filter which input objects to process. If the selector is empty, we will process all objects of the input types in the same namespace as the parent.
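The selector rule can be sketched as follows. This is a simplified stand-in rather than metav1.LabelSelector itself: it supports only matchLabels, and it assumes the namespace restriction applies whether or not the selector is empty. LabelSelector, Input, and selectInputs are hypothetical names.

```go
package main

import "fmt"

// LabelSelector is a trimmed-down stand-in for metav1.LabelSelector that
// supports only matchLabels (matchExpressions is omitted in this sketch).
type LabelSelector struct {
	MatchLabels map[string]string
}

// matches reports whether every required label is present with the
// required value. An empty selector matches everything.
func (s LabelSelector) matches(labels map[string]string) bool {
	for k, want := range s.MatchLabels {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// Input is a hypothetical, minimal view of an input object.
type Input struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// selectInputs keeps inputs in the parent's namespace that match the selector.
func selectInputs(parentNamespace string, sel LabelSelector, inputs []Input) []Input {
	var out []Input
	for _, in := range inputs {
		if in.Namespace == parentNamespace && sel.matches(in.Labels) {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	pvcs := []Input{
		{Namespace: "default", Name: "data-0", Labels: map[string]string{"app": "my-app"}},
		{Namespace: "default", Name: "cache-0", Labels: map[string]string{"app": "cache"}},
	}
	sel := LabelSelector{MatchLabels: map[string]string{"app": "my-app"}}
	for _, in := range selectInputs("default", sel, pvcs) {
		fmt.Println(in.Name)
	}
}
```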

We will also ignore input objects whose controllerRef points to the particular parent object being processed. Such objects can only exist if the same resource (e.g. ConfigMap) is listed as both an input and an output in a given MapController spec. This rule allows use cases such as generating ConfigMaps from other ConfigMaps by transforming their data, while protecting against accidental recursion if the label selector is chosen poorly.
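A minimal sketch of the self-ownership check follows, assuming trimmed-down hypothetical OwnerRef and Object types. It relies on the Kubernetes convention that an object has at most one owner reference with controller set to true.

```go
package main

import "fmt"

// OwnerRef keeps only the fields this check needs (hypothetical type).
type OwnerRef struct {
	UID        string
	Controller bool
}

// Object is a hypothetical, minimal view of an input object.
type Object struct {
	Name      string
	OwnerRefs []OwnerRef
}

// controlledBy reports whether obj's controllerRef (the single owner
// reference with Controller set) points at the parent with parentUID.
func controlledBy(obj Object, parentUID string) bool {
	for _, ref := range obj.OwnerRefs {
		if ref.Controller {
			return ref.UID == parentUID
		}
	}
	return false
}

// filterInputs drops inputs that this parent itself produced, so a
// ConfigMap-to-ConfigMap controller never feeds on its own outputs.
func filterInputs(inputs []Object, parentUID string) []Object {
	var out []Object
	for _, in := range inputs {
		if !controlledBy(in, parentUID) {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	inputs := []Object{
		{Name: "app-config"},
		{Name: "app-config-transformed", OwnerRefs: []OwnerRef{{UID: "parent-uid", Controller: true}}},
	}
	for _, in := range filterInputs(inputs, "parent-uid") {
		fmt.Println(in.Name)
	}
}
```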

If there are multiple input resources, they are processed independently, with no attempt to correlate them. That is, the map hook will still be called with only a single input object each time, although the kind of that object might be different from one call to the next.

Output Resources

The output resources (in this case just VolumeSnapshot) are the types of objects that the user intends to create and hold in the conceptual "bucket" that the parent object represents. We allow multiple output resources because users might think of their controller as spitting out a few different things. We shouldn't force them to create a CompositeController too just so they can emit multiple outputs, especially if those outputs are not conceptually part of one larger whole.

For a given input object, the user can generate any number of output objects. We will tag those output objects in some way to associate them with the object that we sent as input. The tag makes it possible to group those objects and send them along with future map hook requests.

In pseudocode, a sync pass could be thought of like the following:

// Get all matching objects from all input resources.
inputObjects := []Object{}
for _, inputResource := range inputResources {
  inputObjects = append(inputObjects, getMatchingObjects(inputResource, parentSelector)...)
}
// Call the map hook once for each input object.
for _, inputObject := range inputObjects {
  // Compute some opaque string identifying this input object.
  mapKey := makeMapKey(inputObject)

  // Gather observed objects of the output resources that are tagged with this key.
  observedOutputs := []Object{}
  for _, outputResource := range outputResources {
    // Gather all outputs owned by this parent.
    allOutputs := getOwnedObjects(outputResource, parent)
    // Filter to only those tagged for this input.
    observedOutputs = append(observedOutputs, filterByMapKey(allOutputs, mapKey)...)
  }

  // Call user's map hook, passing observed state.
  mapResult := mapHook(parent, inputObject, observedOutputs)
  for _, obj := range mapResult.Outputs {
    // Tag outputs to identify which input they came from.
    setMapKey(obj, mapKey)
  }
  // Manage child objects by reconciling observed and desired outputs.
  manageChildren(observedOutputs, mapResult.Outputs)
}
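The pseudocode leaves makeMapKey, setMapKey, and filterByMapKey abstract. One possible sketch, under stated assumptions: the key is a truncated SHA-256 hash of the input's kind and UID (so a deleted and recreated PVC with the same name gets a fresh key), and the tag is stored in a label under the hypothetical key metacontroller.k8s.io/map-key, which would let outputs be filtered with a label selector. The 40-character hex value stays within the 63-character limit on label values.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// mapKeyLabel is a hypothetical label key; the proposal only says that
// outputs are tagged "in some way".
const mapKeyLabel = "metacontroller.k8s.io/map-key"

// Object is a hypothetical, minimal view of an input or output object.
type Object struct {
	Kind   string
	Name   string
	UID    string
	Labels map[string]string
}

// makeMapKey derives an opaque key from the input's kind and UID.
func makeMapKey(input Object) string {
	sum := sha256.Sum256([]byte(input.Kind + "/" + input.UID))
	return hex.EncodeToString(sum[:])[:40]
}

// setMapKey tags an output with the key of the input it came from.
func setMapKey(obj *Object, key string) {
	if obj.Labels == nil {
		obj.Labels = map[string]string{}
	}
	obj.Labels[mapKeyLabel] = key
}

// filterByMapKey keeps only outputs tagged with the given key.
func filterByMapKey(objs []Object, key string) []Object {
	var out []Object
	for _, o := range objs {
		if o.Labels[mapKeyLabel] == key {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	pvc := Object{Kind: "PersistentVolumeClaim", Name: "data-0", UID: "example-uid"}
	key := makeMapKey(pvc)
	snap := Object{Kind: "VolumeSnapshot", Name: "data-0-snap"}
	setMapKey(&snap, key)
	fmt.Println(len(filterByMapKey([]Object{snap}, key)))
}
```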

Detached Outputs

If an input object disappears, we may find that the parent owns one or more output objects that are tagged as having been generated from an input object that no longer exists. Note that this does not mean these objects have been orphaned, in the sense of having no ownerRef/controllerRef; the controllerRef will still point to the parent object. It's only our MapController-specific "tag" that has become a broken link.
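Finding detached outputs amounts to grouping the parent's owned outputs by map key and keeping the groups whose key matches no live input. A minimal sketch, using a hypothetical Output type and detachedGroups function:

```go
package main

import (
	"fmt"
	"sort"
)

// Output is a hypothetical owned object carrying its map-key tag.
type Output struct {
	Name   string
	MapKey string
}

// detachedGroups groups owned outputs whose map key matches no live input.
// These objects still have a controllerRef to the parent; only the
// map-key link is broken.
func detachedGroups(owned []Output, liveKeys map[string]bool) map[string][]Output {
	groups := map[string][]Output{}
	for _, o := range owned {
		if !liveKeys[o.MapKey] {
			groups[o.MapKey] = append(groups[o.MapKey], o)
		}
	}
	return groups
}

func main() {
	owned := []Output{{"snap-a1", "a"}, {"snap-b1", "b"}, {"snap-b2", "b"}}
	groups := detachedGroups(owned, map[string]bool{"a": true})
	// Sort keys for deterministic output, since map iteration order is random.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
}
```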

By default, we will delete any such detached outputs so that controller authors don't have to think about them. However, the SnapshotSchedule example shows that sometimes it will be important to give users control over what happens to these objects. In that example, the user would want to keep detached VolumeSnapshots since they might be needed to restore the now-missing PVC.

We could offer a declarative knob to either always delete detached outputs, or always keep them, but that would be awkwardly restrictive. The controller author would have fine-grained control over the lifecycle of "live" outputs, but would suddenly lose that control when the outputs become detached.

Instead, we propose an optional tombstone hook that receives information about a particular group of detached outputs (belonging to a particular input object that is now gone) and asks the user to decide which ones to keep. For example, SnapshotSchedule would likely want to keep detached VolumeSnapshots around until they reach the usual expiry (snapshotTTL).

For now, we will not allow the hook to edit detached outputs because we don't want to commit to sending the body of the missing input object, since it may not be available. Without that input object, the hook author presumably wouldn't have enough information to decide on an updated desired state anyway. We can reexamine this if users come up with compelling use cases.
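The resulting deletion rule can be sketched as below, assuming outputs are identified by name only (a simplification) and that a nil TombstoneHook stands for "no tombstone hook configured". Because the hook can only choose which outputs to keep, the function never modifies them.

```go
package main

import "fmt"

// TombstoneHook returns the names of detached outputs to keep
// (hypothetical signature for this sketch).
type TombstoneHook func(mapKey string, outputs []string) []string

// outputsToDelete applies the default (delete every detached output)
// unless a tombstone hook chooses some outputs to keep.
func outputsToDelete(mapKey string, detached []string, hook TombstoneHook) []string {
	if hook == nil {
		return detached
	}
	keep := map[string]bool{}
	for _, name := range hook(mapKey, detached) {
		keep[name] = true
	}
	var del []string
	for _, name := range detached {
		if !keep[name] {
			del = append(del, name)
		}
	}
	return del
}

func main() {
	// Hypothetical hook that keeps only "snap-2".
	keepSnap2 := func(_ string, _ []string) []string { return []string{"snap-2"} }
	fmt.Println(outputsToDelete("k1", []string{"snap-1", "snap-2"}, keepSnap2))
	fmt.Println(outputsToDelete("k1", []string{"snap-1", "snap-2"}, nil))
}
```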

Status Aggregation

One notable omission from the map hook, as compared with the sync hook from CompositeController, is that the user does not return any status object. That's because each map hook invocation only sends enough context to process a single input object and its associated output objects. The hook author therefore doesn't have enough information to compute the overall status of the parent object.

We could define another hook to which we send all inputs and outputs for a given parent, and ask the user to return the overall status. However, that would undermine MapController's goals in two ways: such a monolithic hook request could get quite large for the "do this for every X" use cases we expect, and it would place the burden of aggregating status across the whole collection back onto the user.

Instead, Metacontroller will compute an aggregated status for the collection based on some generic rules:

For each input resource, we will report the number of matching objects we observed as a status field on the parent object, named after the plural resource name.

The exact format will be an implementation detail, but for example it might look like:

status:
  inputs:
    persistentvolumeclaims:
      total: 20
  ...

For each output resource, we will report the total number of objects owned by this parent across all map keys. In addition, we will automatically aggregate conditions found on output objects, and report how many objects we own with that condition set to True.

For example:

status:
  ...
  outputs:
    volumesnapshots:
      total: 100
      ready: 97
  ...
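The output aggregation rule could be implemented roughly as follows. Lower-casing condition types into status keys (Ready becomes ready) and reporting a zero count for condition types that are present but never True are assumptions; the proposal leaves the exact format as an implementation detail. Condition, Output, and aggregate are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition mirrors the usual Kubernetes condition shape (subset).
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// Output is a hypothetical owned object with its status conditions.
type Output struct {
	Conditions []Condition
}

// aggregate counts all outputs of one resource and, per condition type,
// how many outputs report that condition as "True".
func aggregate(outputs []Output) map[string]int {
	counts := map[string]int{"total": len(outputs)}
	for _, o := range outputs {
		for _, c := range o.Conditions {
			key := strings.ToLower(c.Type)
			if c.Status == "True" {
				counts[key]++
			} else if _, seen := counts[key]; !seen {
				counts[key] = 0 // report condition types even when none are True
			}
		}
	}
	return counts
}

func main() {
	snaps := []Output{
		{Conditions: []Condition{{Type: "Ready", Status: "True"}}},
		{Conditions: []Condition{{Type: "Ready", Status: "False"}}},
	}
	fmt.Println(aggregate(snaps))
}
```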

Hooks

Map Hook

We call the map hook to translate an input object into zero or more output objects.

Map Hook Request

Field	Description
`controller`	The whole MapController object, like what you might get from `kubectl get mapcontroller <name> -o json`.
`parent`	The parent object, like what you might get from `kubectl get <parent-resource> <parent-name> -o json`.
`mapKey`	An opaque string that uniquely identifies the group of outputs that belong to this input object.
`input`	The input object, like what you might get from `kubectl get <input-resource> <input-name> -o json`.
`outputs`	An associative array of output objects that the parent already created for the given input object.

Map Hook Response

Field	Description
`outputs`	A list of JSON objects representing all the desired outputs for the given input object.

Tombstone Hook

We call the tombstone hook, if defined, to ask whether we should keep any of a group of output objects whose corresponding input object is gone. If no tombstone hook is defined, we will always delete any such detached outputs as soon as the input object disappears.

Tombstone Hook Request

Field	Description
`controller`	The whole MapController object, like what you might get from `kubectl get mapcontroller <name> -o json`.
`parent`	The parent object, like what you might get from `kubectl get <parent-resource> <parent-name> -o json`.
`mapKey`	An opaque string that uniquely identifies the group of outputs that belonged to the now-missing input object.
`outputs`	An associative array of output objects that the parent created for the now-missing input object.

Tombstone Hook Response

Field	Description
`outputs`	A list of output objects to keep, even though the associated input object is gone. All other outputs belonging to this input will be deleted.