## The problem#

Yesterday I wasted about half an hour chasing a bug because I was mindlessly generating a 5x5 matrix with the list repeat syntax, aka `list * integer` in pure Python:

``````n = 5
array = [ * n] * n
``````

When I tried to flip some specific bits in this matrix, I observed that entire columns got flipped, instead of just one element:

``````array = 1
print(array)

# Output: [[1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0],
#  [1, 0, 0, 0, 0]]
``````

In retrospect, the behaviour makes sense, but in the middle of a much larger program, it was a bit hard to find. Furthermore, I usually have `numpy` available to help with multi-dimensional array, so I definitely forgot how pure Python worked.

To demonstrate what’s going on under the hood, let’s generate an array of objects with list repeat syntax:

``````from dataclasses import dataclass

@dataclass
class Num:
val: int

zero = Num(0)
array = [zero] * 5
print(array)

# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]
``````

In the example above, I’m using the repeat syntax to generate an array with 5 `Num` objects, each having the value of `0`. The problem is it actually isn’t 5 different `Num` objects. It’s the same `Num` object used 5 times.

``````array is array

# Output: True
``````

Modify any of these references will cause the entire list to get updated:

``````array.val = 1
print(array)

# Output: [Num(val=1), Num(val=1), Num(val=1), Num(val=1), Num(val=1)]
``````

Following this logic, in my original multi-dimensional array, I didn’t generate 5 distinct rows of number but repeated the same row of number 5 times. In other words, since `array` and `array` refer to the same `list` object, mutating `array` will mutate `array` as well.

## The correct fix#

To correctly generate distinct objects in a list, we will have to use a list comprehension to make sure that new objects are created in each iteration:

``````array = [Num(val=0) for _ in range(5)]
array == array  # Output: True -- the objects in the list still have the same value (0)
array is array  # Output: False -- but they are completely different object now
``````

Putting it all together, if you want to generate a multi-dimensional array in pure Python, use list comprehension to ensure the rows are different list objects:

``````array = [[0 for _ in range(num_cols)] for _ in range(num_rows)]]
``````

## The wrong, but fun, fix#

After finishing my program, I went back to take a look at CPython’s implementation of `list_repeat` to see why it works the way it does under the hood:

``````static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;

if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item;
for (i = 0; i < n; i++) {
items[i] = elem;
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
*p = items[j];
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}
``````

It’s been a while since I did C, so to understand this code, it’s worth noting that:

• `a` is a pointer to a `PyListObject`
• The elements in `a` is accessed through `a->ob_item`. This is a pointer to the pointers of the underlying PyObject in the list, as defined here. That’s why we see `elem = a->ob_item;` with `elem` being a pointer to a `PyObject` for example.

In this implemenation, to repeat list `a` `n` times in Python, the interpreter will first allocate a new list `np` with the correct size. Then during each iteration from 1 to n, the value of each pointer in `a->ob_item` get copied over to `np`.

To illustrate, in our `[Num(val=0)] * 5` example earlier:

• `Num(val=0)` will be a `PyObject`. Let’s say its memory address is `0xABC`.
• The new list `np` will contain `0xABC` repeated 5 times.
• Modifying any element in `np` will modify the object located at `0xABC`.

In other words, the list repeat syntax gives me a list of the same pointer to the same object.

Fun fact: I found Guido’s original commit implementing list_repeat from 1991 here. I ❤️ git.

I think this behaviour is desirable because the alternative is to deepcopy an arbitrary `PyObject`, which, in my limited understanding, is virtually impossible to do safely in CPython. I tried to monkey around with `memcpy` a bit:

``````static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
PyTypeObject *elem_type;
Py_ssize_t elem_size;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;

if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item;
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
for (i = 0; i < n; i++) {
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
items[i] = s;
Py_INCREF(s);
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
elem = items[j];
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
*p = s;
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}
``````

As expected, due to the shallow nature of `memcpy`, even though we get new pointer to the object in the list created by `list_repeat`, it still won’t work correctly:

``````from dataclasses import dataclass

@dataclass
class Num:
val: int

zero = Num(0)
array = [zero] * 5
print(array)

# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]

array is array
# Output: False // it was True before

# Even though we have new pointer, they are still not deepcopies
array.val = 1
array.val
# Output: 1
``````

In any case, it was really fun trying to work out the reason for a seemingly trivial bug and see if I could hack the language to behave differently.