Dataclass field metadata: the Pythonic equivalent of Go struct tags

1. What is a Golang struct tag?

Recently I have been writing some Go, and there is one language feature that I really like: the struct tag. Basically, a struct tag is just an arbitrary string literal that you can attach to a struct field, which you can later decode and turn into something useful via reflection. If that still sounds too cryptic, consider this example:

type MyAwesomeGoServiceConfig struct {
  Host   string `envconfig:"SERVICE_HOST" another_random_tag:"amazing"`
  Port   string `envconfig:"SERVICE_PORT" this_is_seriously_cool:"agree"`
}

The code above, apart from sounding uncharacteristically upbeat, is 100% valid Golang:

func main() {
  s := MyAwesomeGoServiceConfig{Host:"blog.limdauto.me", Port:"8080"}
  fmt.Printf("%+v\n", s)
}

// go run main.go
// Output: {Host:blog.limdauto.me Port:8080}

By itself, it doesn’t do a whole lot. However, the real magic comes when you write custom code to interpret the tags and turn them into something else. For example, envconfig is a popular Go library that interprets the envconfig tag and automatically populates the struct’s fields with the values of the environment variables with matching names:

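func main() {
  var s MyAwesomeGoServiceConfig
  // Process fills in s from the environment, guided by the envconfig tags
  if err := envconfig.Process("example", &s); err != nil {
    log.Fatal(err)
  }
  fmt.Printf("%+v\n", s)
}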

// go run main.go 
// Output: {Host: Port:}
// SERVICE_HOST=blog.limdauto.me SERVICE_PORT=8080 go run main.go
// Output: {Host:blog.limdauto.me Port:8080}

2. Why should I care as a Python developer?

If the example above isn’t convincing enough, please feel free to browse through the list of well-known Go struct tags. The json tag, for example, is how Go’s standard encoding/json package unmarshals JSON data into a struct instance and marshals it back. It offers a seamless way to turn JSON into a type-safe, first-class citizen of the language. I don’t know about you, but years of passing dictionaries around in Python have scarred me permanently. Nowadays, data can travel between systems as JSON or any other serialization format it likes, but once it enters my program, it needs to become a Plain-Old Python Object (POPO) with a well-defined schema. I learned that principle from my good friend Max, who describes his software engineering job as “loading JSON from one place and writing it to another”. Truer words have never been spoken.

As a more concrete example, imagine you are building an app to automatically create dating profiles for Star Wars characters, because why not. To do that, you will need to request the characters’ data from the Star Wars API and throw away all the fields that you don’t need for a dating profile – who cares what species Luke belongs to, it’s 2019! Wouldn’t it be cool to be able to write something like the pseudo-code below in Python:

from datetime import datetime

class StarWarsCharacter:
    name: str
    dob: datetime, tags={"json": "birth_year"}
    eye_color: str
    gender: str
    hair_color: str
    height: str
    mass: int
    skin_color: str

    # we omit the field below from the default json marshalling
    # but still need it because it's online dating
    mbti_personality_type: str, tags={"json":"-"}

# request the data from an API
import requests
luke = requests.get('https://swapi.co/api/people/1/').as_(StarWarsCharacter)

# or even
import json
response = requests.get('https://swapi.co/api/people/1/')
luke = json.load(response, StarWarsCharacter)

Fun fact: I have never watched Star Wars in my life! Please don’t hate me.

3. Surprise! Python has it too!

Feeling inspired, I thought of using my weekend to port this Golang feature over to Python. As it turns out, Python has had it since version 3.7. It’s just a bit hidden and not very widely used yet, at least not to the extent of Go’s struct tags. What I’m referring to is the metadata parameter of dataclasses.field(). For example, the envconfig example mentioned previously can be written as:

from dataclasses import dataclass, field


@dataclass
class MyAwesomeServiceConfig:
    Host: str = field(metadata={"envconfig":"SERVICE_HOST"})
    Port: str = field(metadata={"envconfig":"SERVICE_PORT"})
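
Reading the metadata back needs nothing beyond the standard library. As a minimal sketch (the class is repeated here only so that the snippet runs on its own), dataclasses.fields() returns one Field object per field, and each Field exposes its metadata as a read-only mapping:

```python
from dataclasses import dataclass, field, fields


@dataclass
class MyAwesomeServiceConfig:
    Host: str = field(metadata={"envconfig": "SERVICE_HOST"})
    Port: str = field(metadata={"envconfig": "SERVICE_PORT"})


# Map each field name to the environment variable named in its metadata.
env_names = {f.name: f.metadata["envconfig"] for f in fields(MyAwesomeServiceConfig)}
print(env_names)
# {'Host': 'SERVICE_HOST', 'Port': 'SERVICE_PORT'}
```

Note that the metadata is attached to the class definition, not to instances, so it can be inspected before any config object exists.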

And it is almost trivial to write the processing mechanism that populates this config from environment variables, just by looping through the dataclass’ fields:

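import os
from dataclasses import fields


def process(config_class):
    """Build a config_class instance from the environment variables
    named in each field's "envconfig" metadata."""
    config_data = {}
    # dataclasses.fields() is the public API for listing a dataclass' fields
    for f in fields(config_class):
        # os.getenv returns None when the variable is not set
        config_data[f.name] = os.getenv(f.metadata["envconfig"])
    return config_class(**config_data)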

You can try running the code above online here. Of course, I haven’t added any error handling or validation, so please don’t use it in production just yet. There are also a couple of other neat features from the original Golang envconfig that I haven’t mentioned. Nevertheless, it should give you an idea of how a dataclass’ field metadata could be useful, and porting the entire envconfig over might not take more than a day, so maybe next weekend?
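
To give a feel for what that extra work might look like, here is a rough sketch that adds default values, required fields and basic type conversion. The "default" and "required" metadata keys are my own convention for this example, loosely modelled on the tags of the same name in Go’s envconfig. The class name ServiceConfig and the environ parameter are also just assumptions made for illustration, not part of any existing library:

```python
import os
from dataclasses import dataclass, field, fields


@dataclass
class ServiceConfig:
    # "default" and "required" are hypothetical metadata keys chosen for this sketch.
    host: str = field(metadata={"envconfig": "SERVICE_HOST", "required": True})
    port: int = field(metadata={"envconfig": "SERVICE_PORT", "default": "8080"})


def process(config_class, environ=os.environ):
    """Build config_class from the environment variables named in field metadata."""
    values = {}
    for f in fields(config_class):
        env_name = f.metadata["envconfig"]
        # Fall back to the "default" metadata entry when the variable is unset.
        raw = environ.get(env_name, f.metadata.get("default"))
        if raw is None:
            if f.metadata.get("required", False):
                raise ValueError(f"missing required environment variable {env_name}")
            values[f.name] = None
            continue
        # Assumes the annotation is a plain callable type such as str or int.
        values[f.name] = f.type(raw)
    return config_class(**values)


# A plain dict stands in for os.environ so the example is reproducible.
config = process(ServiceConfig, {"SERVICE_HOST": "blog.limdauto.me"})
print(config)
# ServiceConfig(host='blog.limdauto.me', port=8080)
```

Passing the environment in as a parameter keeps the function easy to test. Calling process(ServiceConfig) with no second argument reads from the real os.environ.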

4. Final thoughts

While writing this blog post, I found a couple of other Python libraries that I would like to mention:

jsons

Please mind the s. This library provides the API I mentioned earlier to marshal dataclasses to JSON and unmarshal JSON back into them, e.g. some_instance = jsons.load(some_dict, SomeClass). Granted, it doesn’t really concern itself with a field’s metadata yet, but I think it’s cool. A certain someone would say it’s JSON for Humans™.

marshmallow_dataclass

This library uses a dataclass field’s metadata to generate marshmallow schemas from dataclasses. Very neat! This is a perfect example of why I believe this pattern will gain more momentum within the Python community in the future.
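
To make the pattern concrete without any third-party dependency, here is a minimal standard-library sketch of the Star Wars wish from section 2. It is not how either library above works internally. The "json" metadata key, the from_json helper and the sample payload are all made up for this example, and dob stays a plain string because a value like "19BBY" is not something datetime can parse:

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class StarWarsCharacter:
    name: str
    # "json" is a hypothetical metadata key: it names the JSON key to read from.
    dob: str = field(metadata={"json": "birth_year"})
    eye_color: str = field(metadata={"json": "eye_color"})
    # "-" means "never read this field from JSON", mirroring Go's json tag.
    mbti_personality_type: str = field(default="unknown", metadata={"json": "-"})


def from_json(cls, document):
    """Build cls from a JSON string, keeping only the keys its fields ask for."""
    data = json.loads(document)
    kwargs = {}
    for f in fields(cls):
        key = f.metadata.get("json", f.name)  # fall back to the field name
        if key != "-" and key in data:
            kwargs[f.name] = data[key]
    return cls(**kwargs)


# Hand-written sample payload for illustration, not a recorded API response.
payload = '{"name": "Luke Skywalker", "birth_year": "19BBY", "eye_color": "blue", "species": []}'
luke = from_json(StarWarsCharacter, payload)
print(luke)
# StarWarsCharacter(name='Luke Skywalker', dob='19BBY', eye_color='blue', mbti_personality_type='unknown')
```

The species key is silently dropped because no field asks for it, which is exactly the behaviour the dating app wanted.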