Python Fundamentals: Everything you need to know about dataclasses

  • The @dataclass decorator, which returns the same class it was given, with generated methods added.
  • The field function, which allows per-field customization.

How to create a data class

from dataclasses import dataclass

@dataclass
class Response:
    status: int
    body: str
  • Type hints are required. A bare field name without a type hint raises a NameError exception, and an assignment without a type hint is treated as a plain class attribute rather than a field.
  • @dataclass does not create a new class, it returns the same defined class. This allows for anything you could do in a regular class to be valid within a data class.
>>> resp = Response(status=200, body="OK")
>>> import logging 
>>> logging.basicConfig(level=logging.INFO)
>>> resp = Response(status=200, body="OK")
>>> logging.info(resp)
... INFO:root:Response(status=200, body='OK')
>>> resp_ok = Response(status=200, body="OK")
>>> resp_500 = Response(status=500, body="Error")
>>> resp_200 = Response(status=200, body="OK")
>>> resp_ok == resp_500
... False
>>> resp_ok == resp_200
... True
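
The two points made earlier in this section (type hints are required, and the decorator hands back the same class) can be checked with a small sketch. The names Plain, Broken and retries are made up for illustration and are not part of the article's example.

```python
from dataclasses import dataclass, fields


class Plain:
    status: int
    body: str
    retries = 3  # no type hint: stays a plain class attribute, not a field


# The decorator modifies the class in place and returns that same class object.
returned = dataclass(Plain)
same_class = returned is Plain

# Only annotated names become fields.
field_names = [f.name for f in fields(Plain)]

try:
    @dataclass
    class Broken:
        status  # bare name without a type hint: Python cannot resolve it
except NameError as exc:
    missing_hint_error = exc
```

Here field_names contains only status and body, which is why an unannotated attribute never shows up in the generated __init__ or __repr__.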

Field definition

  1. Using type hints and an optional default value
from dataclasses import dataclass

@dataclass
class Response:
    body: str
    status: int = 200
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp_ok = Response(body="OK")
>>> logging.info(resp_ok)
... INFO:root:Response(body='OK', status=200)
>>> # Create 500 response
>>> resp_error = Response(status=500, body="error")
>>> logging.info(resp_error)
... INFO:root:Response(body='error', status=500)

Specify a default value

from dataclasses import dataclass, field

@dataclass
class Response:
    body: str
    status: int = field(default=200)
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    body: str
    # A mutable default is rejected: this line raises a ValueError
    # when the class is created.
    headers: dict = {}
from dataclasses import dataclass, field

@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp = Response(status=200, body="OK")
>>> logging.info(resp)
... INFO:root:Response(status=200, body='OK', headers={})
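
As a minimal sketch of why default_factory is used here: a mutable default such as {} is rejected when the class is created, while default_factory calls dict() once per instance, so no two instances share the same headers. BadResponse is a hypothetical name used only to trigger the error.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class BadResponse:
        status: int
        headers: dict = {}  # mutable default: rejected at class creation
except ValueError as exc:
    mutable_default_error = exc


@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)  # a new dict per instance


first = Response(status=200, body="OK")
second = Response(status=404, body="Not Found")
first.headers["X-Trace"] = "abc"  # does not leak into `second`
```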

Include or exclude fields in automatically implemented dunder methods

from dataclasses import dataclass, field

@dataclass
class Response:
    body: str
    headers: dict = field(init=False, default_factory=dict)
    status: int = 200
# Roughly the __init__ that the data class above generates:
def __init__(self, body: str, status: int = 200):
    self.body = body
    self.status = status
    self.headers = dict()
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>>
>>> resp = Response(body="Success")
>>> logging.info(resp)
... INFO:root:Response(body='Success', headers={}, status=200)
>>>
>>> # Passing a headers param on initialization will raise an argument error.
>>> resp = Response(body="Success", headers={})
... TypeError: __init__() got an unexpected keyword argument 'headers'
>>>
>>> # 'headers' is an instance attribute and can be used after initialization.
>>> resp.headers = {"Content-Type": "application/json"}
>>> logging.info(resp)
... INFO:root:Response(body='Success', headers={'Content-Type': 'application/json'}, status=200)
from dataclasses import dataclass, field

@dataclass
class Response:
    body: str
    headers: dict = field(repr=False, init=False, default_factory=dict)
    status: int = 200
>>> resp = Response(body="Success")
>>> logging.info(resp)
... INFO:root:Response(body='Success', status=200)
from dataclasses import dataclass, field

@dataclass
class Response:
    body: str
    headers: dict = field(compare=False, init=False, repr=False, default_factory=dict)
    status: int = 200
>>> resp_json = Response(body="Success")
>>> resp_json.headers = {"Content-Type": "application/json"}
>>> resp_xml = Response(body="Success")
>>> resp_xml.headers = {"Content-Type": "application/xml"}
>>> resp_json == resp_xml
... True

Add field specific metadata

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Response:
    body: Any = field(metadata={"force_str": True})
    headers: dict = field(init=False, repr=False, default_factory=dict)
    status: int = 200
>>> from dataclasses import fields
>>> resp = Response(body="Success")
>>> fields(resp)
...(Field(name='body',type=typing.Any,default=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'force_str': True}),_field_type=_FIELD),
Field(name='headers',type=<class 'dict'>,default=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,default_factory=<class 'dict'>,init=False,repr=False,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),
Field(name='status',type=<class 'int'>,default=200,default_factory=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
>>> body_field = next(
...     (field
...      for field in fields(resp)
...      if field.name == "body"),
...     None
... )
>>> logging.info(body_field)
... INFO:root:Field(name='body',type=typing.Any,default=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f955a0e97f0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'force_str': True}),_field_type=_FIELD)
>>> logging.info(body_field.metadata)
... INFO:root:{'force_str': True}
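
Metadata is not used by dataclasses itself; it is only there for your own code to read. A minimal sketch of that idea follows, where fields_with_flag is a hypothetical helper (not part of the dataclasses module) that lists the fields carrying a given metadata flag.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class Response:
    body: Any = field(metadata={"force_str": True})
    status: int = 200


def fields_with_flag(instance, flag):
    """Return the names of the fields whose metadata has a truthy `flag`."""
    return [f.name for f in fields(instance) if f.metadata.get(flag)]


flagged = fields_with_flag(Response(body="Success"), "force_str")
unknown = fields_with_flag(Response(body="Success"), "no_such_flag")
```

Because metadata is exposed as a read-only mapping, f.metadata.get(flag) is safe even for fields declared without any metadata.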

Customize object initialization using __post_init__

from dataclasses import dataclass, field
from typing import Any
from sys import getsizeof

@dataclass
class Response:
    body: str
    headers: dict = field(init=False, compare=False, default_factory=dict)
    status: int = 200

    def __post_init__(self):
        """Add a Content-Length header on init"""
        self.headers["Content-Length"] = getsizeof(self.body)
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp = Response("Success")
>>> logging.info(resp)
... INFO:root:Response(body='Success', headers={'Content-Length': 56}, status=200)
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp = Response(body={"message": "Success"})
>>> logging.info(resp)
... INFO:root:Response(body='{"message": "Success"}', headers={'Content-Length': 71}, status=200)
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response where 'body' will be stored as a dict.
>>> resp = Response(body={"message": "Success"}, force_body_str=False)
>>> logging.info(resp)
... INFO:root:Response(body={'message': 'Success'}, headers={'Content-Length': 232}, status=200)
>>> # Create 200 response where 'body' will be stored as a string.
>>> resp_str = Response(body={"message": "Success"})
>>> logging.info(resp_str)
... INFO:root:Response(body='{"message": "Success"}', headers={'Content-Length': 71}, status=200)
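
The last two transcripts pass a force_body_str argument and get a JSON string back for a dict body. One way such behavior could be written is with an init-only variable (InitVar), which is handed to __post_init__ but never stored as a field. This is a sketch under that assumption, not necessarily the class that produced the output above.

```python
import json
from dataclasses import InitVar, dataclass, field
from sys import getsizeof
from typing import Any


@dataclass
class Response:
    body: Any
    headers: dict = field(init=False, compare=False, default_factory=dict)
    status: int = 200
    # Init-only parameter: accepted by __init__, passed to __post_init__,
    # and not kept as a field on the instance.
    force_body_str: InitVar[bool] = True

    def __post_init__(self, force_body_str):
        """Optionally serialize the body, then add a Content-Length header."""
        if force_body_str and not isinstance(self.body, str):
            self.body = json.dumps(self.body)
        self.headers["Content-Length"] = getsizeof(self.body)


resp_dict = Response(body={"message": "Success"}, force_body_str=False)
resp_str = Response(body={"message": "Success"})
```

With this sketch resp_dict.body stays a dict while resp_str.body becomes a JSON string, and force_body_str does not appear in the repr because it is not a field.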

Data classes that we can compare and order

from dataclasses import dataclass

@dataclass(order=True)
class Response:
    body: str
    status: int = 200
>>> resp_ok = Response(body="Success")
>>> resp_error = Response(body="Error", status=500)
>>> sorted([resp_ok, resp_error])
... [Response(body='Error', status=500), Response(body='Success', status=200)]
from dataclasses import dataclass, field
from sys import getsizeof

@dataclass(order=True)
class Response:
    body: str = field(compare=False)
    status: int = field(compare=False, default=200)
    _content_length: int = field(compare=True, init=False)

    def __post_init__(self):
        """Calculate and store content length on init"""
        self._content_length = getsizeof(self.body)
>>> resp_ok = Response(body="Success")
>>> resp_error = Response(body="Error", status=500)
>>> sorted([resp_ok, resp_error])
... [Response(body='Error', status=500, _content_length=54), Response(body='Success', status=200, _content_length=56)]
>>> # resp_error is smaller than resp_ok because
>>> # "Error" is smaller than "Success"
>>> resp_ok = Response(body="Success")
>>> resp_error = Response(body="Failure")
>>> resp_ok == resp_error
... True
>>> # Both instances are equal because "Success" and "Failure" have the same number of characters
>>> # and getsizeof() returns the same size for both strings.
from dataclasses import dataclass, field
from sys import getsizeof

@dataclass(order=True)
class Response:
    _content_length: int = field(compare=True, init=False)
    body: str = field(compare=True)
    status: int = field(compare=False, default=200)

    def __post_init__(self):
        """Calculate and store content length on init"""
        self._content_length = getsizeof(self.body)
>>> resp_ok = Response(body="Success")
>>> resp_error = Response(body="Failure")
>>> resp_ok == resp_error
... False
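
The reason field order matters in these examples is that the generated comparison methods compare instances as if they were tuples of their compared fields, in definition order. A small sketch of that rule follows; ResponseByStatus is a hypothetical variant that orders by status only.

```python
from dataclasses import astuple, dataclass, field


@dataclass(order=True)
class Response:
    body: str
    status: int = 200


@dataclass(order=True)
class ResponseByStatus:
    # Only `status` takes part in comparisons; `body` is ignored.
    status: int = 200
    body: str = field(default="", compare=False)


error = Response(body="Error", status=500)
ok = Response(body="Success", status=200)

# Comparing instances gives the same answer as comparing their field tuples.
same_as_tuples = (error < ok) == (astuple(error) < astuple(ok))

by_status = sorted([ResponseByStatus(500, "Error"), ResponseByStatus(200, "Success")])
first_status = by_status[0].status
```

When reordering fields would hurt readability, passing a key function to sorted() is an alternative that leaves the class definition untouched.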

Frozen (or immutable) instances

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Response:
    body: str
    status: int = 200
>>> resp_ok = Response(body="Success")
>>> resp_ok.body = "Done!"
... dataclasses.FrozenInstanceError: cannot assign to field 'body'
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Check values of 'resp_ok'
>>> logging.info(resp_ok)
... INFO:root:Response(body='Success', status=200)
>>>
>>> object.__setattr__(resp_ok, "body", "Done!")
>>> # We have modified a "frozen" instance
>>> logging.info(resp_ok)
... INFO:root:Response(body='Done!', status=200)
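
The same object.__setattr__ escape hatch is what makes computed fields possible on a frozen data class, since plain assignment inside __post_init__ would also be blocked. A minimal sketch, assuming a hypothetical content_length field measured in UTF-8 bytes:

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Response:
    body: str
    status: int = 200
    content_length: int = field(init=False)

    def __post_init__(self):
        # `self.content_length = ...` would raise FrozenInstanceError here,
        # so the value is set through object.__setattr__ instead.
        object.__setattr__(self, "content_length", len(self.body.encode("utf-8")))


resp = Response(body="Success")

try:
    resp.body = "Done!"
except FrozenInstanceError as exc:
    frozen_error = exc
```

Keeping object.__setattr__ inside __post_init__ means the instance still behaves as immutable for everyone using it afterwards.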

Updating an object instance by replacing the entire object.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Response:
    body: str
    status: int = 200

>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp_ok = Response(body="Success")
>>> logging.info(resp_ok)
... INFO:root:Response(body='Success', status=200)
>>> # Replace instance
>>> resp_ok = replace(resp_ok, body="OK")
>>> logging.info(resp_ok)
... INFO:root:Response(body='OK', status=200)
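
A short sketch to make the semantics of replace explicit: it builds a new instance through __init__ and leaves the original untouched, which is why it works with frozen data classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    body: str
    status: int = 200


original = Response(body="Success")
updated = replace(original, body="Created", status=201)

# Fields that are not named in the call are copied from the original.
only_status_changed = replace(original, status=204)
```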

Adding class attributes

  1. Class attributes are defined outside __init__
  2. Every instance of the class will share the same value of a class attribute.
from dataclasses import dataclass, field
from typing import ClassVar, Any
from sys import getsizeof
from collections.abc import Callable

@dataclass
class Response:
    body: str
    _content_length: int = field(default=0, init=False)
    status: int = 200
    getsize_fun: ClassVar[Callable[[Any], int]] = getsizeof

    def __post_init__(self):
        """Calculate content length by using getsize_fun"""
        self._content_length = self.getsize_fun(self.body)
from functools import reduce

def calc_str_unicode_weight(self, string: str):
    """Calculates string weight by adding each character's unicode value"""
    return reduce(lambda weight, char: weight + ord(char), string, 0)

@dataclass
class ResponseUnicode(Response):
    getsize_fun: ClassVar[Callable[[Any], int]] = calc_str_unicode_weight

>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response, using getsizeof to calculate content length
>>> resp_ok = Response(body="Success")
>>> logging.info(resp_ok)
... INFO:root:Response(body='Success', _content_length=56, status=200)
>>> # Override function to use when calculating content length
>>> resp_ok_unicode = ResponseUnicode(body="Success")
>>> logging.info(resp_ok_unicode)
... INFO:root:ResponseUnicode(body='Success', _content_length=729, status=200)
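
To illustrate the two properties listed at the start of this section, here is a minimal sketch with a simpler class attribute. default_content_type is a made-up name; the point is that a ClassVar is left out of the generated __init__ and of fields(), and that its value is shared by all instances.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Response:
    body: str
    status: int = 200
    default_content_type: ClassVar[str] = "text/plain"  # not a field


field_names = [f.name for f in fields(Response)]

first = Response(body="Success")
second = Response(body="Error", status=500)

# Changing the class attribute is visible through every instance.
Response.default_content_type = "application/json"
shared_values = (first.default_content_type, second.default_content_type)
```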

Inheritance in data classes

from dataclasses import dataclass, field

@dataclass
class Response:
    body: str
    status: int
    headers: dict

@dataclass
class JSONResponse(Response):
    status: int = 200
    headers: dict = field(default_factory=dict, init=False)

    def __post_init__(self):
        """automatically add Content-Type header"""
        self.headers["Content-Type"] = "application/json"
>>> import json
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp_ok = JSONResponse(body=json.dumps({"message": "OK"}))
>>> logging.info(resp_ok)
... INFO:root:JSONResponse(body='{"message": "OK"}', status=200, headers={'Content-Type': 'application/json'})
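
One thing to watch with inheritance is field order: base class fields come first, and a subclass cannot add a field without a default once an inherited field has one. The sketch below shows both the failure and a working subclass; Broken and content_type are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class Response:
    body: str
    status: int = 200


try:
    @dataclass
    class Broken(Response):
        content_type: str  # no default, but inherited `status` has one
except TypeError as exc:
    ordering_error = exc


@dataclass
class JSONResponse(Response):
    content_type: str = "application/json"


field_order = [f.name for f in fields(JSONResponse)]
resp = JSONResponse(body="{}")
```

Overriding an inherited field, as JSONResponse does with status and headers in the example above, keeps the field in its original position and only changes its default.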

Hash-able object

from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    body: str
    status: int = 200
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> # Create 200 response
>>> resp_ok = Response(body="Success")
>>> # Create 500 response
>>> resp_error = Response(body="Error", status=500)
>>> # Create a mapping of response -> usernames
>>> responses_to_users = {
... resp_ok: ["j_mccain", "a_perez"],
... resp_error: ["d_dane", "b_rodriguez"]
... }
>>> logging.info(responses_to_users[resp_ok])
... INFO:root:['j_mccain', 'a_perez']
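
The frozen=True flag is what makes this work. With the default eq=True and no frozen flag, a data class sets __hash__ to None, so its instances cannot be dictionary keys. A small sketch comparing the two, with MutableResponse and FrozenResponse as illustrative names:

```python
from dataclasses import dataclass


@dataclass
class MutableResponse:
    body: str
    status: int = 200


@dataclass(frozen=True)
class FrozenResponse:
    body: str
    status: int = 200


try:
    hash(MutableResponse(body="Success"))
except TypeError as exc:
    unhashable_error = exc

# Equal frozen instances hash the same, so a new but equal instance
# finds the existing dictionary entry.
users_by_response = {FrozenResponse(body="Success"): ["j_mccain", "a_perez"]}
found = users_by_response[FrozenResponse(body="Success")]
```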

A use case for data classes

>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> body = JSONBody(message="Success", data={"values": ["value1", "value2"]})
>>> resp = JSONResponse(body)
>>> logging.info("resp: %s", resp)
... INFO:root:resp: JSONResponse(_content_length=48, body=JSONBody(message='Success', data={'values': ['value1', 'value2']}), pager=None, headers={'Content-Length': 48, 'Content-Type': 'application/json'}, status=200)
>>> pager = Pager(1, prev="?prev=0", next="?next=2")
>>> resp = JSONResponse(body, pager=pager)
>>> logging.info("resp: %s", resp)
... INFO:root:resp: JSONResponse(_content_length=48, body=JSONBody(message='Success', data={'values': ['value1', 'value2']}), pager=Pager(prev='?prev=0', next='?next=2'), headers={'Content-Length': 48, 'Content-Type': 'application/json'}, status=200)
>>> from dataclasses import asdict, astuple
>>> logging.info("serialized resp: %s", asdict(resp))
... INFO:root:serialized resp: {'_content_length': 48, 'body': {'message': 'Success', 'data': {'values': ['value1', 'value2']}}, 'pager': {'prev': '?prev=0', 'next': '?next=2'}, 'headers': {'Content-Length': 48, 'Content-Type': 'application/json'}, 'status': 200}
>>> logging.info("resp as tuple: %s", astuple(resp))
... INFO:root:resp as tuple: (48, ('Success', {'values': ['value1', 'value2']}), ('?prev=0', '?next=2'), {'Content-Length': 48, 'Content-Type': 'application/json'}, 200)
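
The asdict and astuple calls above convert nested data classes recursively. The sketch below shows that with simplified stand-ins for the classes used in this section. These Pager and Response definitions are assumptions made for illustration, not the originals.

```python
import json
from dataclasses import asdict, astuple, dataclass
from typing import Optional


@dataclass
class Pager:
    prev: str
    next: str


@dataclass
class Response:
    body: str
    pager: Optional[Pager] = None
    status: int = 200


resp = Response(body="Success", pager=Pager(prev="?prev=0", next="?next=2"))

as_dict = asdict(resp)    # the nested Pager becomes a nested dict
as_tuple = astuple(resp)  # the nested Pager becomes a nested tuple

# The dict form contains only built-in types here, so it can go to json.dumps.
payload = json.dumps(as_dict)
```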

Benefits of using data classes

  1. We can create powerful classes with less code.
  2. Type hints are required for every field, which documents the expected types (they are not checked at runtime).
  3. We can customize how special dunder methods are implemented.
  4. We can use data classes in the same way we use regular classes. In the previous example we used descriptors and class methods without an issue.
  5. Inheritance can be used to make it easier to use data classes.
  6. It’s easier to serialize instances to dictionaries or tuples.
  7. We can mix regular classes and data classes.

Disadvantages of using data classes

  1. When creating data classes that can be compared and ordered, the order in which you define the fields matters, and readability can suffer because of it. It helps to group fields by kind. In the HTTPResponse class we have private attributes first, then instance attributes, init-only parameters, and class attributes.
  2. Field definition order also matters when using default values. Since the arguments of __init__ follow the order in which the fields are defined, we have to define attributes without default values first and attributes with default values after them.
  3. When using frozen=True we cannot update values in __post_init__ with regular attribute assignment; object.__setattr__ is the workaround.
  4. We have to manually optimize attribute access if needed, meaning adding __slots__ (Python 3.10 and later can generate them with @dataclass(slots=True)). Real Python has a great example of this.
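
As a rough sketch of the __slots__ point in the list above: declaring __slots__ by hand works on a data class as long as the slotted fields have no default values, because a default would clash with the slot of the same name. SlottedResponse is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class SlottedResponse:
    __slots__ = ("body", "status")
    body: str
    status: int


resp = SlottedResponse(body="Success", status=200)

# Slotted instances have no per-instance __dict__ ...
has_instance_dict = hasattr(resp, "__dict__")

# ... and reject attributes that are not declared in __slots__.
try:
    resp.extra = "not allowed"
except AttributeError as exc:
    slots_error = exc
```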
