# Structured configuration

> **Deprecated:** Prior to v0.8, the `gamma.config.pydantic` module had direct
> support for structured configuration. This was removed and replaced with the
> recommended approach below.
For more complex applications, it's recommended to use an approach called "Structured Configuration", where you validate the provided configuration entries against a schema. Python provides many ways to validate a data structure against a schema; here we'll use Pydantic, as it's very popular, easy to use, and provides a number of useful features.
Let's start with an example where we declare `datasets` entries, with different expected parameters for each format (e.g. a mandatory `separator` for CSV, optional `columns` for Parquet):
```yaml
datasets:
  foo:
    format: csv
    path: data/foo.csv.gz
    compression: gzip
    separator: ";"

  bar:
    format: parquet
    path: data/bar.parquet
    compression: snappy
    columns: [col_x, col_y]
```
A first version of our code may create the following Pydantic model to handle this:
```python
from typing import List, Optional

from pydantic import BaseModel

from gamma.config import get_config, to_dict


class Dataset(BaseModel):  # (1)
    format: str
    path: str
    compression: str
    separator: Optional[str] = None
    columns: Optional[List[str]] = None


def get_dataset(name: str) -> Dataset:  # (2)
    entry = get_config()["datasets"][name]  # (3)
    obj = Dataset(**to_dict(entry))  # (4)
    return obj
```
In (1) we create a Pydantic model, and in (2) an accessor function. In the function implementation, we get the config entry (3), then convert it to a plain dict using `to_dict` and build the object from the dictionary (4).
The `to_dict` helper recursively converts the nested config data structure into a nested dictionary, rendering dynamic values as needed, so be aware of potential infinite recursion with dynamic entries. Pydantic's `BaseModel(...)` knows how to handle nested dicts, converting them to the correct datatypes.
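As a minimal sketch of that conversion, using a hypothetical `Catalog` wrapper model (not part of the example above) and a plain nested dict standing in for the output of `to_dict`, Pydantic recursively coerces nested dicts into nested models:

```python
from typing import Dict, List, Optional

from pydantic import BaseModel


class Dataset(BaseModel):
    format: str
    path: str
    compression: str
    separator: Optional[str] = None
    columns: Optional[List[str]] = None


# Hypothetical wrapper model for the whole "datasets" mapping
class Catalog(BaseModel):
    datasets: Dict[str, Dataset]


# A plain nested dict, as `to_dict` would produce from the config
raw = {
    "datasets": {
        "foo": {
            "format": "csv",
            "path": "data/foo.csv.gz",
            "compression": "gzip",
            "separator": ";",
        },
    }
}

# Nested dicts are converted into nested model instances
catalog = Catalog(**raw)
assert isinstance(catalog.datasets["foo"], Dataset)
assert catalog.datasets["foo"].separator == ";"
```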
Having to model mandatory format-specific attributes (e.g. `separator` for CSV) as `Optional` fields is not very clean, though. Pydantic offers discriminated unions that allow us to split the specification into separate format-specific types.
```python
from typing import Literal

from beartype.typing import List, Optional, Union
from pydantic import BaseModel, Field, TypeAdapter
from typing_extensions import Annotated

from gamma.config import get_config, to_dict


class BaseDataset(BaseModel):
    format: str
    path: str


class ParquetDataset(BaseDataset):
    format: Literal["parquet"]
    compression: str
    columns: Optional[List[str]] = None


class CsvDataset(BaseDataset):
    format: Literal["csv"]
    compression: str
    separator: str


Dataset = TypeAdapter(
    Annotated[Union[ParquetDataset, CsvDataset], Field(discriminator="format")]
)


def get_dataset(name: str) -> BaseDataset:
    entry = get_config()["datasets"][name]
    obj = Dataset.validate_python(to_dict(entry))
    return obj
```
```python
foo = get_dataset("foo")
bar = get_dataset("bar")

assert isinstance(foo, CsvDataset)
assert isinstance(bar, ParquetDataset)
```
In the modified full script example above:

- We import the types from stdlib `typing`, including `Literal`, and `Annotated` from either `typing_extensions` or `typing`, depending on whether you're on Python 3.9+. From Pydantic, we import `Field` and `TypeAdapter` in addition to `BaseModel`.
- We create our Pydantic class structure mimicking our expected model. Note that while we use class inheritance here, this is not required.
- We declare our `Dataset` as being an "annotated" union of our target classes, annotated with a `Field` entry that provides the discriminator field. The `Annotated` type was specified in PEP 593. Finally, we wrap the annotated type in `TypeAdapter` to turn it into a full-fledged Pydantic model.
- In the `get_dataset` accessor, we use the `validate_python` method to validate the plain dict and construct the right object.
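To see the discriminator at work without the config machinery, here's a self-contained sketch (reusing the model definitions above, with plain `typing` imports instead of `beartype`): `validate_python` routes each dict to the class matching its `format`, and rejects unknown formats with a `ValidationError`.

```python
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError
from typing_extensions import Annotated


class BaseDataset(BaseModel):
    format: str
    path: str


class ParquetDataset(BaseDataset):
    format: Literal["parquet"]
    compression: str
    columns: Optional[List[str]] = None


class CsvDataset(BaseDataset):
    format: Literal["csv"]
    compression: str
    separator: str


Dataset = TypeAdapter(
    Annotated[Union[ParquetDataset, CsvDataset], Field(discriminator="format")]
)

# The discriminator routes the dict to the matching class
foo = Dataset.validate_python(
    {"format": "csv", "path": "a.csv", "compression": "gzip", "separator": ";"}
)
assert isinstance(foo, CsvDataset)

# An unknown format fails validation with a clear error
try:
    Dataset.validate_python({"format": "json", "path": "a.json"})
    raised = False
except ValidationError:
    raised = True
assert raised
```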