## The problem

Yesterday I wasted about half an hour chasing a bug because I was mindlessly generating a 5x5 matrix with the list repeat syntax, aka `list * integer`

in pure Python:

```
n = 5
array = [[0] * n] * n
```

When I tried to flip some specific bits in this matrix, I observed that entire columns got flipped, instead of just one element:

```
array[0][0] = 1
print(array)
# Output: [[1, 0, 0, 0, 0],
# [1, 0, 0, 0, 0],
# [1, 0, 0, 0, 0],
# [1, 0, 0, 0, 0],
# [1, 0, 0, 0, 0]]
```

In retrospect, the behaviour makes sense, but in the middle of a much larger program, it was a bit hard to find.
Furthermore, I usually have `numpy`

available to help with
multi-dimensional array, so I definitely forgot how pure Python worked.

To demonstrate what’s going on under the hood, let’s generate an array of objects with list repeat syntax:

```
from dataclasses import dataclass
@dataclass
class Num:
val: int
zero = Num(0)
array = [zero] * 5
print(array)
# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]
```

In the example above, I’m using the repeat syntax to generate an array with 5 `Num`

objects, each having the value of `0`

.
The problem is it actually isn’t 5 different `Num`

objects. *It’s the same Num object used 5 times*.

```
array[0] is array[1]
# Output: True
```

Modify any of these references will cause the entire list to get updated:

```
array[0].val = 1
print(array)
# Output: [Num(val=1), Num(val=1), Num(val=1), Num(val=1), Num(val=1)]
```

Following this logic, in my original multi-dimensional array, I didn’t generate 5 distinct rows of number but repeated the same row of number 5 times. In other words, since `array[0]`

and `array[1]`

refer to the same `list`

object, mutating `array[0][0]`

will mutate `array[1][0]`

as well.

## The correct fix

To correctly generate distinct objects in a list, we will have to use a list comprehension to make sure that new objects are created in each iteration:

```
array = [Num(val=0) for _ in range(5)]
array[0] == array[1] # Output: True -- the objects in the list still have the same value (0)
array[0] is array[1] # Output: False -- but they are completely different object now
```

Putting it all together, if you want to generate a multi-dimensional array in pure Python, use list comprehension to ensure the rows are different list objects:

```
array = [[0 for _ in range(num_cols)] for _ in range(num_rows)]]
```

## The wrong, but fun, fix

After finishing my program, I went back to take a look at CPython’s implementation of `list_repeat`

to see why it works the way it does under the hood:

```
static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;
if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item[0];
for (i = 0; i < n; i++) {
items[i] = elem;
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
*p = items[j];
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}
```

It’s been a while since I did C, so to understand this code, it’s worth noting that:

`a`

is a pointer to a`PyListObject`

- The elements in
`a`

is accessed through`a->ob_item`

. This is a*pointer to the pointers of the underlying PyObject in the list*, as defined here. That’s why we see`elem = a->ob_item[0];`

with`elem`

being a pointer to a`PyObject`

for example.

In this implemenation, to repeat list `a`

`n`

times in Python, the interpreter will first allocate a new list `np`

with the correct size. Then during each iteration from 1 to n, the value of each pointer in `a->ob_item`

get copied over to `np`

.

To illustrate, in our `[Num(val=0)] * 5`

example earlier:

`Num(val=0)`

will be a`PyObject`

. Let’s say its memory address is`0xABC`

.- The new list
`np`

will contain`0xABC`

repeated 5 times. - Modifying any element in
`np`

will modify the object located at`0xABC`

.

In other words, the list repeat syntax gives me a list of the same pointer to the same object.

Fun fact:I found Guido’s original commit implementing list_repeat from 1991 here. I ❤️ git.

I think this behaviour is desirable because the alternative is to deepcopy an arbitrary `PyObject`

, which, in my limited understanding, is virtually impossible to do safely in CPython. I tried to monkey around with `memcpy`

a bit:

```
static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
Py_ssize_t i, j;
Py_ssize_t size;
PyListObject *np;
PyObject **p, **items;
PyObject *elem;
PyTypeObject *elem_type;
Py_ssize_t elem_size;
if (n < 0)
n = 0;
if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
return PyErr_NoMemory();
size = Py_SIZE(a) * n;
if (size == 0)
return PyList_New(0);
np = (PyListObject *) list_new_prealloc(size);
if (np == NULL)
return NULL;
if (Py_SIZE(a) == 1) {
items = np->ob_item;
elem = a->ob_item[0];
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
for (i = 0; i < n; i++) {
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
items[i] = s;
Py_INCREF(s);
Py_INCREF(elem);
}
}
else {
p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
for (j = 0; j < Py_SIZE(a); j++) {
elem = items[j];
elem_type = Py_TYPE(elem);
elem_size = elem_type->tp_basicsize + abs(Py_SIZE(elem)) * elem_type->tp_itemsize;
PyObject *s = (PyObject *)PyObject_Malloc(elem_size);
memcpy(s, elem, elem_size);
*p = s;
Py_INCREF(*p);
p++;
}
}
}
Py_SET_SIZE(np, size);
return (PyObject *) np;
}
```

As expected, due to the shallow nature of `memcpy`

, even though we get new pointer to the object in the list created by `list_repeat`

, it still won’t work correctly:

```
from dataclasses import dataclass
@dataclass
class Num:
val: int
zero = Num(0)
array = [zero] * 5
print(array)
# Output: [Num(val=0), Num(val=0), Num(val=0), Num(val=0), Num(val=0)]
array[0] is array[1]
# Output: False // it was True before
# Even though we have new pointer, they are still not deepcopies
array[0].val = 1
array[1].val
# Output: 1
```

In any case, it was really fun trying to work out the reason for a seemingly trivial bug and see if I could hack the language to behave differently.