
Custom tags

Writing custom tags

We use multiple dispatch for almost everything in gamma.config, including choosing which function to call when rendering a node of the parsed YAML tree. Adding your own tag handler only requires registering a function in the dispatch table, as in the example below:

import os
from gamma.config import dispatch, ScalarNode, Tag               # (1) (2)

MyEnvTag = Tag["!myenv"]                                         # (3)

@dispatch
def render_node(node: ScalarNode, tag: MyEnvTag, **ctx):         # (4)
    """Simpler clone of !env"""
    return os.getenv(node.value)                                 # (5)

In more detail:

  1. We import gamma.config.dispatch, which registers our function as a dispatchable function.

  2. We import the ScalarNode and Tag types, which we'll use to specialize the render_node function.

  3. Because we want to dispatch on a specific tag, we create a specialized MyEnvTag type by parameterizing the general Tag type with the tag string. We call these parametric or value types.

  4. We annotate our function with the specific types. The function name (render_node) and the order of the arguments matter! The dispatch mechanism ignores keyword-only and untyped arguments.

  5. Whatever we return will be the value returned when accessing the configuration.

And that's it! Just make sure the module defining your handler is imported before the config value is rendered.

Render node arguments

The node argument comes directly from the ruamel.yaml package. For a ScalarNode, you are usually interested in node.value, which holds the exact string from the YAML file.

If you want the standard YAML type inference from the ruamel.yaml package, you can do as follows:

# ... other imports as above ...
from gamma.config import yaml
# ...

@dispatch
def render_node(node: ScalarNode, tag: MyEnvTag, **ctx):
    """Simpler clone of !env"""
    val = os.getenv(node.value)
    return yaml.load(val)  # parse the string into YAML core scalar types (str, int,
                           # float, bool, timestamp, null)
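For intuition about what that inference step does, here is a simplified sketch of resolving a plain scalar string under the YAML 1.2 core schema. It is not ruamel.yaml's resolver (which also handles timestamps and other details); infer_scalar is a hypothetical helper written for illustration. It also maps a missing value to None, which is what os.getenv returns for an unset variable.

```python
import re
from typing import Any, Optional

# Simplified YAML 1.2 core-schema rules (hypothetical helper, NOT ruamel.yaml).
_NULL = re.compile(r"~|null|Null|NULL|")
_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
_INT10 = re.compile(r"[-+]?[0-9]+")
_INT8 = re.compile(r"0o[0-7]+")
_INT16 = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?(?:\.inf|\.Inf|\.INF)")
_NAN = re.compile(r"\.nan|\.NaN|\.NAN")


def infer_scalar(text: Optional[str]) -> Any:
    """Turn a plain scalar into None/bool/int/float, or keep it as a str."""
    # Rules are checked in order; fullmatch requires the whole string to match.
    if text is None or _NULL.fullmatch(text):
        return None
    if _BOOL.fullmatch(text):
        return text.lower() == "true"
    if _INT10.fullmatch(text):
        return int(text)
    if _INT8.fullmatch(text):
        return int(text[2:], 8)
    if _INT16.fullmatch(text):
        return int(text[2:], 16)
    if _FLOAT.fullmatch(text):
        return float(text)
    if _INF.fullmatch(text):
        return float("-inf") if text.startswith("-") else float("inf")
    if _NAN.fullmatch(text):
        return float("nan")
    return text  # anything else stays a string


for raw in ["42", "3.5", "true", "~", "0x1F", "hello"]:
    print(repr(raw), "->", repr(infer_scalar(raw)))
```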

For a MappingNode or SequenceNode, node.value is more complex. To avoid writing your own recursive parsing logic, you can use the to_dict dump function to get a fully rendered object, including child nodes (it works for sequences as well):

# ... other imports as above ...
from ruamel.yaml.nodes import MappingNode
from gamma.config import to_dict
# ...

MyObjTag = Tag["!myobj"]

@dispatch
def render_node(node: MappingNode, tag: MyObjTag, **ctx):
    """Simply return the dict value of the node"""
    val = to_dict(node)  # render the node recursively, handling children as needed
    return val
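To see why such a helper is convenient, it helps to know the node shapes: in ruamel.yaml, a MappingNode's value is a list of (key node, value node) pairs, and a SequenceNode's value is a list of child nodes. The sketch below walks such a tree recursively using minimal stand-in dataclasses (not the ruamel.yaml classes) and a hypothetical to_plain function. Unlike a real renderer, it does not send child nodes through render_node, so nested tags are left unresolved.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union

# Minimal stand-ins mirroring ruamel.yaml node shapes (NOT the real classes).
STR = "tag:yaml.org,2002:str"
SEQ = "tag:yaml.org,2002:seq"
MAP = "tag:yaml.org,2002:map"


@dataclass
class ScalarNode:
    tag: str
    value: str  # the raw string from the YAML file


@dataclass
class SequenceNode:
    tag: str
    value: List["Node"]  # child nodes


@dataclass
class MappingNode:
    tag: str
    value: List[Tuple["Node", "Node"]]  # (key node, value node) pairs


Node = Union[ScalarNode, SequenceNode, MappingNode]


def to_plain(node: Node) -> Any:
    """Hypothetical helper: recursively turn a node tree into str/list/dict."""
    if isinstance(node, ScalarNode):
        return node.value  # no type inference or tag handling in this sketch
    if isinstance(node, SequenceNode):
        return [to_plain(child) for child in node.value]
    if isinstance(node, MappingNode):
        return {to_plain(k): to_plain(v) for k, v in node.value}
    raise TypeError(f"unsupported node: {node!r}")


tree = MappingNode(MAP, [
    (ScalarNode(STR, "name"), ScalarNode(STR, "demo")),
    (ScalarNode(STR, "ports"),
     SequenceNode(SEQ, [ScalarNode(STR, "80"), ScalarNode(STR, "443")])),
])
print(to_plain(tree))  # {'name': 'demo', 'ports': ['80', '443']}
```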

The ctx kwargs dict allows you to access contextual information when rendering and may contain the following entries:

  • key: The node key, also as a ruamel YAML node.
  • config: The current ConfigNode object.
  • dump: Flag indicating we're dumping the data to a potentially insecure destination, so sensitive data should not be returned.
  • path: The URI path for URI-style tag dispatch (see below).
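A common use of these entries is honoring the dump flag. The sketch below keeps the logic in a plain, hypothetical render_secret function so it runs without gamma.config; in practice, the same body would sit inside a @dispatch-decorated render_node. Since any of the entries above may be absent, it reads them with ctx.get rather than indexing.

```python
import os
from typing import Any, Optional


def render_secret(env_var: str, **ctx: Any) -> Optional[str]:
    """Hypothetical handler body: read a secret from the environment.

    ctx mirrors the render_node keyword arguments (key, config, dump, path);
    any of them may be missing, so they are read with ctx.get().
    """
    if ctx.get("dump"):
        # Dumping may target an insecure destination: never emit the secret.
        return "<redacted>"
    return os.getenv(env_var)


os.environ["DB_PASSWORD"] = "s3cret"
print(render_secret("DB_PASSWORD"))             # s3cret
print(render_secret("DB_PASSWORD", dump=True))  # <redacted>
```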

URI-style tags

The default tag dispatch mechanism is to dispatch on the resolved tag value using a parameterized Tag subtype. For instance, foo: !mytag 1 will only dispatch on (node: ScalarNode, tag: Tag["!mytag"]).

However, if the tag contains a : (colon), we assume it is a URI-style tag of the form ![scheme]:[path]. Besides the default tag dispatch, URI-style tags also dispatch on the scheme part alone, passing a path keyword argument to the render_node function.

An example:

foo: !mytag:mypath 1

will first dispatch on (node: ScalarNode, tag: Tag["!mytag:mypath"]) and, if no such method is found, will then try (node: ScalarNode, tag: Tag["!mytag"]) with path="mypath" as an extra keyword argument.
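The lookup order can be sketched with a plain dictionary of handlers keyed by tag string. This is an illustration, not gamma.config's dispatcher: resolve and split_uri_tag are hypothetical helpers, and splitting at the first colon (so "!a:b:c" yields the path "b:c") is an assumption.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical helpers for illustration; NOT gamma.config's dispatcher.
Handler = Callable[..., Any]


def split_uri_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split "!scheme:path" at the first colon; a tag without one has no path."""
    scheme, sep, path = tag.partition(":")
    return (scheme, path) if sep else (tag, None)


def resolve(handlers: Dict[str, Handler], tag: str) -> Tuple[Handler, Dict[str, Any]]:
    """Find a handler for `tag` plus any extra keyword arguments to pass it."""
    # 1. Default dispatch on the full tag, e.g. "!mytag:mypath".
    if tag in handlers:
        return handlers[tag], {}
    # 2. URI-style fallback on the scheme only, with the path as a kwarg.
    scheme, path = split_uri_tag(tag)
    if path is not None and scheme in handlers:
        return handlers[scheme], {"path": path}
    raise LookupError(f"no handler for tag {tag!r}")


handlers: Dict[str, Handler] = {
    "!mytag": lambda value, path=None: f"{path}/{value}",
}
handler, extra = resolve(handlers, "!mytag:mypath")
print(handler("1", **extra))  # mypath/1
```

Trying the full tag first means a handler registered for one specific path always takes precedence over the generic scheme handler.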

Extending the render context

Some tags like !j2 and !expr let you refer to variables in the render context. By default, we provide env and c, referring respectively to a dict of environment variables and to the root config itself.

We also let you add a _context mapping entry to any parent node to extend the render context without writing any code. This is useful for providing contextual parameters concisely in some scenarios. Example:

catalog:
  _context:
    inputs: s3://mybucket/myproject/inputs

  datasets:
    customers: !j2 "{{ inputs }}/customers"   # will reference the _context map above
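The sketch below illustrates the idea on a plain nested dict: it layers the _context maps of every parent of the target key on top of the default env and c entries, then substitutes {{ name }} placeholders. It is a hypothetical approximation, not gamma.config's code: the regex substitution stands in for Jinja2, and the rule that deeper _context entries override shallower ones (and may shadow env and c) is an assumption.

```python
import os
import re
from typing import Any, Dict, List, Mapping

# Hypothetical approximation; NOT gamma.config's render-context code.
# Matches "{{ name }}" with optional inner spaces (a tiny stand-in for Jinja2).
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def collect_context(root: Mapping[str, Any], path: List[str]) -> Dict[str, Any]:
    """Defaults plus the _context maps of every parent of the node at `path`."""
    ctx: Dict[str, Any] = {"env": dict(os.environ), "c": root}
    parents = [root]
    for key in path[:-1]:
        parents.append(parents[-1][key])
    for parent in parents:
        # Assumption: deeper _context entries override shallower ones.
        ctx.update(parent.get("_context", {}))
    return ctx


def render_j2_like(template: str, ctx: Mapping[str, Any]) -> str:
    """Replace each {{ name }} placeholder with str(ctx[name])."""
    return _PLACEHOLDER.sub(lambda m: str(ctx[m.group(1)]), template)


config = {
    "catalog": {
        "_context": {"inputs": "s3://mybucket/myproject/inputs"},
        "datasets": {"customers": "{{ inputs }}/customers"},
    }
}
path = ["catalog", "datasets", "customers"]
template = config["catalog"]["datasets"]["customers"]
print(render_j2_like(template, collect_context(config, path)))
# s3://mybucket/myproject/inputs/customers
```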

If you need to extend the render context programmatically, please refer to the docstrings in the source file gamma/config/render_context.py and to the API docs.