The problem#

Yesterday I wasted about half an hour chasing a bug because I was mindlessly generating a 5x5 matrix with the list repeat syntax, aka list * integer in pure Python:

n = 5
array = [ * n] * n

When I tried to flip some specific bits in this matrix, I observed that entire columns got flipped, instead of just one element:

array = 1
print(array)

# Output: [[1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0]]

In retrospect, the behaviour makes sense, but in the middle of a much larger program, it was a bit hard to find. Furthermore, I usually have numpy available to help with multi-dimensional array, so I definitely forgot how pure Python worked.

To demonstrate what’s going on under the hood, let’s generate an array of objects with list repeat syntax:

from dataclasses import dataclass

@dataclass
class Num:
val: int

zero = Num(0)
array = [zero] * 5
print(array)

# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]

In the example above, I’m using the repeat syntax to generate an array with 5 Num objects, each having the value of 0. The problem is it actually isn’t 5 different Num objects. It’s the same Num object used 5 times.

array is array

# Output: True

Modify any of these references will cause the entire list to get updated:

array.val = 1
print(array)

# Output: [Num(val=1), Num(val=1), Num(val=1), Num(val=1), Num(val=1)]

Following this logic, in my original multi-dimensional array, I didn’t generate 5 distinct rows of number but repeated the same row of number 5 times. In other words, since array and array refer to the same list object, mutating array will mutate array as well.

The correct fix#

To correctly generate distinct objects in a list, we will have to use a list comprehension to make sure that new objects are created in each iteration:

array = [Num(val=0) for _ in range(5)]
array == array  # Output: True -- the objects in the list still have the same value (0)
array is array  # Output: False -- but they are completely different object now

Putting it all together, if you want to generate a multi-dimensional array in pure Python, use list comprehension to ensure the rows are different list objects:

array = [[0 for _ in range(num_cols)] for _ in range(num_rows)]]

The wrong, but fun, fix#

After finishing my program, I went back to take a look at CPython’s implementation of list_repeat to see why it works the way it does under the hood:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;

if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item;
for (i = 0; i < n; i++) {
items[i] = elem;
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
*p = items[j];
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}

It’s been a while since I did C, so to understand this code, it’s worth noting that:

• a is a pointer to a PyListObject
• The elements in a is accessed through a->ob_item. This is a pointer to the pointers of the underlying PyObject in the list, as defined here. That’s why we see elem = a->ob_item; with elem being a pointer to a PyObject for example.

In this implemenation, to repeat list a n times in Python, the interpreter will first allocate a new list np with the correct size. Then during each iteration from 1 to n, the value of each pointer in a->ob_item get copied over to np.

To illustrate, in our [Num(val=0)] * 5 example earlier:

• Num(val=0) will be a PyObject. Let’s say its memory address is 0xABC.
• The new list np will contain 0xABC repeated 5 times.
• Modifying any element in np will modify the object located at 0xABC.

In other words, the list repeat syntax gives me a list of the same pointer to the same object.

Fun fact: I found Guido’s original commit implementing list_repeat from 1991 here. I ❤️ git.

I think this behaviour is desirable because the alternative is to deepcopy an arbitrary PyObject, which, in my limited understanding, is virtually impossible to do safely in CPython. I tried to monkey around with memcpy a bit:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
PyTypeObject *elem_type;
Py_ssize_t elem_size;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;

if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item;
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
for (i = 0; i < n; i++) {
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
items[i] = s;
Py_INCREF(s);
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
elem = items[j];
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
*p = s;
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}

As expected, due to the shallow nature of memcpy, even though we get new pointer to the object in the list created by list_repeat, it still won’t work correctly:

from dataclasses import dataclass

@dataclass
class Num:
val: int

zero = Num(0)
array = [zero] * 5
print(array)

# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]

array is array
# Output: False // it was True before

# Even though we have new pointer, they are still not deepcopies
array.val = 1
array.val
# Output: 1

In any case, it was really fun trying to work out the reason for a seemingly trivial bug and see if I could hack the language to behave differently.