What is Reservoir Sampling? Perform it using program in Python

In this blog, we are going to see what Reservoir Samling is and how it can be performed using Python.

Reservoir sampling is a set of random algorithms meant for randomly choosing samples from a list of m items. m is either a very large or unknown number but is large enough that it does not fit into the main memory.

We need a function that randomly selects k numbers, where 1 <= k <= m.

One of the solutions is to create an array reservoir of maximum size k and one by one randomly select an item from the stream.

If the item selected is not the one selected previously then it is put in the reservoir.

We can search the item in the reservoir to see if it has been previously selected or not.

Time complexity = O(k^2).

This is a costly method if k is big.
If the input is in the form of a stream the method becomes less efficient.

 

This problem can be solved in O(n) time where the solution works well with stream input as well.

  • Create an array reservoir, copy the first K items of the stream to it.
  • Now we will consider all items one by one until the mth item is reached.
  • Now we need to obtain a random number which should be from 0 to j. Here j is the current item index.
  • A random number is generated and assigned to q.
  • If the value is acceptable it can be placed in the reservoir.

Following is how the solution is implemented.

Code: Reservoir Sampling

#Import random
import random

#Array function

def print_array(stream, m):
    for j in range(m):
        print(stream[j], end=" ");
    print();


#Randomly select function

def select_items(stream, m, k):
    j = 0;

    # reservoir[]

    reserv = [0] * k;
    for j in range(k):
        reserv[j] = stream[j];

    #Going from element to mth element

    while (j < m):
        # Pick a random from 0 to j

        q = random.randrange(j + 1);

        # If the randomly picked index is smaller than k, then replace it with new element from stream
        if (q < k):
            reserv[q] = stream[j];
        j += 1;

    print("Following are k randomly selected items");
    print_array(reserv, k);


# Driver

if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    m = len(stream);
    k = 5;
    select_items(stream, m, k);

Output

Note- Each time the code is run, the result will be different.

Reservoir Sampling

Also read: Thompson Sampling for Multi-Armed Bandit Problem in Python

Leave a Reply

Your email address will not be published. Required fields are marked *