# Weighted Random Shuffling¶

For some application I needed a random shuffle `r`

of a list of
numbers `a`

. The shuffle, however, should be weighted, such that,
given a list of weights `w`

, the element `a[i]`

with weight
`w[i]`

should be the first element of the random shuffle `r`

with
probability `w[i]/sum(w)`

. I searched for a python implementation of
such an algorithm, but I couldn’t find one. Also, `numpy`

doesnt’
provide it, and `numpy.random.shuffle`

only offers shuffling with
uniform weights. Thus, this code.

## Weighted shuffling in pure python¶

Weighted random shuffling is the same as weighted random sampling from
a list `a`

without replacement. That is, choose with probability
`w[i]/sum(w)`

element `a[i]`

from `a`

. Store this element in a
list `r`

. Then, remove element `a[i]`

from `a`

and `w[i]`

from
`w`

, and select a new element of the modified list `a`

, and so on
until `a`

is empty.

From Eli Bendersky’s site I got the code below for weighted random sampling.

```
from random import random
def weighted_choice(weights):
rnd = random() * sum(weights)
for i, w in enumerate(weights):
rnd -= w
if rnd < 0:
return i
```

Thus, `weighted_choice`

gives an index to the chosen element of an
array. With this index I can fill a random shuffle `r`

with elements
of `a`

.

As removing elements from lists is a bit expensive, I chose to set the
weight `w[i]`

of the chosen element to zero. This ensures that the
element `i`

cannot be selected the next time.

```
def weighted_shuffle(a,w):
w = list(w) # make a copy of w
if len(a) != len(w):
print("weighted_shuffle: Lenghts of lists don't match.")
return
r = [0]*len(a) # contains the random shuffle
for i in range(len(a)):
j = weighted_choice(w)
r[i]=a[j]
w[j] = 0
return r
```

## Fast shuffling¶

Once I had the above code, I went to Gerlach van der Heide. He managed to speed it up by several orders of magnitude.

```
from random import random
from bisect import bisect_right
import numpy as np
def weighted_shuffle(a,w):
r = np.empty_like(a)
cumWeights = np.cumsum(w)
for i in range(len(a)):
rnd = random() * cumWeights[-1]
j = bisect_right(cumWeights,rnd)
#j = np.searchsorted(cumWeights, rnd, side='right')
r[i]=a[j]
cumWeights[j:] -= w[j]
return r
a = np.arange(1,1000)
w = np.arange(1,1000)
r = weighted_shuffle(a,w)
print(r[:2])
```

```
[912 846]
```

Some testing shows, for a reason unknown to me, that
`np.searchsorted`

is slower than `bisect_right`

.

## A Wrong idea¶

While searching the web I came across the one-liner below. It is
elegant, but wrong. The weights according to which the elements of
`a`

are chosen are not in proportion to `w`

. Some basic
probability theory shows why this is the case. It comes down to proving that

where math:w_i` are the weights and \(U_i\) uniform random deviates on \([0,1]\).

```
def wrong(a,w):
aa = range(len(a))
aa.sort(key=lambda i: w[i]*random()*beta[i])
return [a[i] for i in aa]
```