Objects without reference cycles and cyclic GC
Every instance of a class created with the class syntax in CPython takes part in the cyclic garbage collection (GC) mechanism. This increases the memory footprint of each instance and can cause memory problems in heavily loaded systems.
Is it possible to use only the basic reference counting mechanism when necessary?
Let's analyze one approach, based on the recordclass library, that helps to create classes whose instances are deleted using the reference counting mechanism alone.
Note: this is a translation of the original post (in Russian).
A little bit about garbage collection in CPython
The primary mechanism for garbage collection in Python is reference counting. Each object contains a field holding the current number of references to it. An object is destroyed as soon as its reference counter drops to zero. However, this mechanism cannot dispose of objects that contain cyclic references. For example:
    lst = []
    lst.append(lst)
    del lst
In such cases, after the name is deleted, the object's reference counter remains greater than zero. To solve this problem, Python has an additional mechanism that tracks objects and breaks cycles in the graph of references between objects. There is a good article on how the cyclic garbage collection mechanism works in CPython 3.
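To make this concrete, here is a minimal sketch that uses only the standard library. The Node class and the make_cycle helper are hypothetical names introduced for the illustration; a weak reference is used to observe whether the object is still alive.

```python
import gc
import weakref


class Node:
    """A plain class whose instances can take part in reference cycles."""


def make_cycle():
    node = Node()
    node.self_ref = node      # the object now references itself
    return weakref.ref(node)  # a weak reference does not keep it alive


gc.disable()  # keep the automatic cyclic collector out of the way
try:
    ref = make_cycle()
    # The local name is gone, but the self-reference keeps the counter above zero.
    alive_after_del = ref() is not None
    gc.collect()  # run the cyclic collector by hand
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print("alive after losing the last name:", alive_after_del)
print("alive after gc.collect():", alive_after_collect)
```

Reference counting alone leaves the object in memory; only the explicit pass of the cyclic collector reclaims it.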
Memory overhead associated with the garbage collection mechanism
Typically, the garbage collection mechanism does not cause problems. But there is a certain overhead associated with it:
The header PyGC_Head is added to each instance of the class during memory allocation: at least 24 bytes in Python <= 3.7 and 16 bytes in 3.8 on a 64-bit platform.
This can lead to memory shortages if you run many instances of the same process, each of which must hold a very large number of objects with relatively few attributes at the same time, and the amount of memory is limited.
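The overhead can be observed from Python itself. The sketch below assumes CPython, where sys.getsizeof() adds the size of the GC header to the value returned by __sizeof__() for objects that take part in cyclic GC. The exact number depends on the interpreter version and build, so none is hard-coded here.

```python
import sys


class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# __sizeof__() reports the object itself; in CPython, sys.getsizeof()
# additionally counts the GC header for types that take part in cyclic GC.
gc_overhead = sys.getsizeof(p) - p.__sizeof__()

# An int never takes part in cyclic GC, so both numbers agree.
int_overhead = sys.getsizeof(10) - (10).__sizeof__()

print("GC overhead per Point instance:", gc_overhead, "bytes")
print("GC overhead per int:", int_overhead, "bytes")
```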
Is it sometimes possible to limit oneself to the basic mechanism of reference counting?
The garbage collection mechanism may be redundant when the class represents a non-recursive data type, for example, a record containing values of simple types (numbers, strings, date/time). To illustrate, consider a simple class:
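    from dataclasses import dataclass

    @dataclass  # generates the __init__(x, y) used in the next snippet
    class Point:
        x: int
        y: int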
If used correctly, reference cycles are not possible, although nothing in Python prevents you from "shooting yourself in the foot":
    p = Point(0, 0)
    p.x = p
That is, if cyclic GC is disabled, then in this case the object will not be disposed of.
However, the Point class could rely on the reference counting mechanism alone, provided that no reference cycles are created while the program runs, that is, the x and y attributes only ever hold integer values, as stated in the class definition. But there is not yet a standard way to opt a user-defined class out of cyclic GC.
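The condition "attributes hold only simple values" can be checked at runtime. The following is only a sketch: fields_are_cycle_free is a hypothetical helper, and Point is declared as a dataclass here purely so that the snippet is self-contained. It relies on gc.is_tracked(), which returns False for atomic values such as int and str. The check is conservative: some containers (for example, tuples of integers) may be reported as tracked even though they cannot form a cycle.

```python
import gc
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


def fields_are_cycle_free(obj):
    """Return True if no field value is tracked by the cyclic collector."""
    return not any(gc.is_tracked(getattr(obj, f.name)) for f in fields(obj))


good = Point(0, 0)
bad = Point(0, 0)
bad.x = bad  # a reference cycle: the instance refers to itself

good_is_safe = fields_are_cycle_free(good)
bad_is_safe = fields_are_cycle_free(bad)
print(good_is_safe, bad_is_safe)

bad.x = 0  # break the cycle again so that reference counting can free it
```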
Modern CPython is designed so that, when a custom class is defined, the flag Py_TPFLAGS_HAVE_GC is always set in the type structure that represents that class. The flag determines that instances of the class are included in the garbage collection mechanism. For every such object, the header PyGC_Head is added on creation, and the object is put on the list of tracked objects. If the flag Py_TPFLAGS_HAVE_GC is not set, only the basic reference counting mechanism works. However, simply clearing Py_TPFLAGS_HAVE_GC is not enough: the parts of the CPython core responsible for creating and destroying instances would also have to change. This is still problematic because it is too big a change to the core of CPython.
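The flag can be inspected from Python through the __flags__ attribute of a type. In the sketch below, has_gc_flag is a hypothetical helper, and the bit position (1 << 14) is the value of Py_TPFLAGS_HAVE_GC in CPython's Include/object.h; treat it as a CPython implementation detail.

```python
# Value of Py_TPFLAGS_HAVE_GC in CPython's Include/object.h (implementation detail).
Py_TPFLAGS_HAVE_GC = 1 << 14


def has_gc_flag(cls):
    """Return True if the type's instances take part in cyclic GC."""
    return bool(cls.__flags__ & Py_TPFLAGS_HAVE_GC)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


for cls in (Point, list, int, str):
    print(cls.__name__, has_gc_flag(cls))
```

Built-in atomic types such as int and str do not carry the flag, whereas containers such as list, and the ordinary Point class defined here, do.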
About one implementation
As an example of how the idea can be implemented, consider the base class dataobject from the recordclass project. With it, you can create classes whose instances do not take part in the cyclic GC mechanism (Py_TPFLAGS_HAVE_GC is not set and, accordingly, there is no additional PyGC_Head header). They have exactly the same structure in memory as instances of classes with __slots__, but without PyGC_Head:
    import sys
    from recordclass import dataobject

    class Point(dataobject):
        x: int
        y: int

    >>> p = Point(1, 2)
    >>> print(p.__sizeof__(), sys.getsizeof(p))
    32 32
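For comparison, here is a similar class with __slots__:
    class Point:
        __slots__ = 'x', 'y'
        x: int
        y: int

        # an explicit __init__ is needed for Point(1, 2) to work
        def __init__(self, x, y):
            self.x = x
            self.y = y

    >>> p = Point(1, 2)
    >>> print(p.__sizeof__(), sys.getsizeof(p))  # this is in Python 3.7
    32 64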
The size difference is exactly the size of the PyGC_Head header. For instances with only a few attributes, such an increase in memory footprint can be significant. For instances of the Point class, adding PyGC_Head doubles their size.
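To get a feeling for the scale, here is a back-of-the-envelope estimate based on the two sizes reported above for Python 3.7. The number of instances is a hypothetical workload chosen for illustration, and the estimate ignores the attribute values themselves and any allocator rounding.

```python
# Sizes in bytes per instance, taken from the Python 3.7 comparison above.
SIZE_WITHOUT_GC_HEADER = 32  # dataobject-based Point
SIZE_WITH_GC_HEADER = 64     # __slots__-based Point

# Hypothetical workload: this instance count is an assumption.
NUM_INSTANCES = 10_000_000


def total_mib(size_per_instance, count):
    """Total size of `count` instances in mebibytes."""
    return size_per_instance * count / 2**20


saved = total_mib(SIZE_WITH_GC_HEADER - SIZE_WITHOUT_GC_HEADER, NUM_INSTANCES)
print(f"memory saved: about {saved:.0f} MiB")
```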
To achieve this effect, a special metaclass, datatype, is used to configure subclasses of dataobject. As a result of this configuration, the flag Py_TPFLAGS_HAVE_GC is cleared and the base instance size tp_basicsize is increased by the amount needed to store the additional slots for fields. The corresponding field names are listed when the class is declared (the class Point has two of them: x and y).
The datatype metaclass also sets the slots tp_alloc, tp_new, tp_dealloc and tp_free, which implement the correct algorithms for creating and destroying instances in memory. By default, instances lack __weakref__ and __dict__ (as with instances of classes with __slots__).
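What the absence of __dict__ and __weakref__ means in practice can be seen on an ordinary class with __slots__. The sketch below uses only the standard library and does not touch recordclass; SlottedPoint is a hypothetical name. According to the description above, dataobject-based classes behave the same way by default.

```python
import weakref


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = SlottedPoint(1, 2)

# No per-instance dictionary is allocated.
has_dict = hasattr(p, "__dict__")

# Without a __weakref__ slot, weak references cannot be created.
try:
    weakref.ref(p)
    supports_weakref = True
except TypeError:
    supports_weakref = False

# Attributes outside the declared fields cannot be added.
try:
    p.z = 3
    accepts_new_attribute = True
except AttributeError:
    accepts_new_attribute = False

print(has_dict, supports_weakref, accepts_new_attribute)
```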
As we have seen, in CPython it is possible, when necessary, to disable the cyclic garbage collection mechanism for a particular class, provided you are confident that its instances will not form reference cycles. This also reduces the size of each instance in memory by the size of the PyGC_Head header.
In the next article we will try to demonstrate how classes based on dataobject can reduce memory usage.