What is Reservoir Sampling? Perform it using program in Python
In this blog, we are going to see what Reservoir Samling is and how it can be performed using Python.
Reservoir sampling is a set of random algorithms meant for randomly choosing k samples from a list of m items. m is either a very large or unknown number but m is large enough that it does not fit into the main memory.
We need a function that randomly selects k numbers, where 1 <= k <= m
.
One of the solutions is to create an array reservoir of maximum size k and one by one randomly select an item from the stream.
If the item selected is not the one selected previously then it is put in the reservoir.
We can search the item in the reservoir to see if it has been previously selected or not.
Time complexity = O(k^2).
This is a costly method if k is big.
If the input is in the form of a stream the method becomes less efficient.
This problem can be solved in O(n) time where the solution works well with stream input as well.
- Create an array reservoir, copy the first K items of the stream to it.
- Now we will consider all items one by one until the mth item is reached.
- Now we need to obtain a random number which should be from 0 to j. Here j is the current item index.
- A random number is generated and assigned to q.
- If the value is acceptable it can be placed in the reservoir.
Following is how the solution is implemented.
Code: Reservoir Sampling
#Import random import random #Array function def print_array(stream, m): for j in range(m): print(stream[j], end=" "); print(); #Randomly select function def select_items(stream, m, k): j = 0; # reservoir[] reserv = [0] * k; for j in range(k): reserv[j] = stream[j]; #Going from element to mth element while (j < m): # Pick a random from 0 to j q = random.randrange(j + 1); # If the randomly picked index is smaller than k, then replace it with new element from stream if (q < k): reserv[q] = stream[j]; j += 1; print("Following are k randomly selected items"); print_array(reserv, k); # Driver if __name__ == "__main__": stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]; m = len(stream); k = 5; select_items(stream, m, k);
Output
Note- Each time the code is run, the result will be different.
Also read: Thompson Sampling for Multi-Armed Bandit Problem in Python
Leave a Reply