Simple Random Sampling in Python Programming

Introduction to sampling in Python

Hey, Python coder! This tutorial covers the most basic sampling technique in Python: Simple Random Sampling. Before moving on, let’s first go over the terms and definitions behind the concept.

Let’s start with the conceptual understanding!

Introduction to Sampling

Let’s say you have a big packet of candies in different colors and want to know their flavors. Tasting every candy in the packet is impractical because it would take too much time and effort. So, we instead take a small group of candies, taste them, and judge the flavors from that. In the illustration below, the whole packet is known as the ‘population,’ and the small share of candies chosen is known as the ‘sample.’

In Machine Learning, Sampling works similarly. Instead of studying every item in a large group (population), you pick a smaller group (sample) to gather information from. This sample is selected carefully to give you an idea of the entire population without examining every member.

That’s what Sampling is. It can be done through various approaches, namely random sampling, stratified sampling, and cluster sampling. This tutorial focuses on the Simple Random Sampling technique.
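To make the population and sample idea concrete, here is a minimal sketch. The packet contents, the flavor mix, and the sample size of 500 are made-up values for illustration only; the point is that the share of a flavor in a random sample tends to land close to its share in the whole packet.

```python
import random

# Hypothetical packet: 10,000 candies with a known flavor mix.
packet = ["strawberry"] * 3000 + ["orange"] * 5000 + ["lemon"] * 2000

# A seeded generator keeps the illustration repeatable.
rng = random.Random(42)

# Taste only 500 candies instead of all 10,000.
sample = rng.sample(packet, 500)

population_share = packet.count("strawberry") / len(packet)
sample_share = sample.count("strawberry") / len(sample)

print(f"Strawberry share in the packet: {population_share:.3f}")
print(f"Strawberry share in the sample: {sample_share:.3f}")
```

The two printed shares will usually differ a little. That gap is the price of tasting only part of the packet, and it tends to shrink as the sample grows.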

Introduction to Simple Random Sampling

Now, instead of candies, let’s be more specific and take Gummy Bears as the example. In Simple Random Sampling, each Gummy Bear has an EQUAL chance of being selected when the sample is created. They all stand on the same level, and no bear is favored over another. Let’s go over some basic terms for this example in the illustration below.

Introduction to Simple Random Sampling

Have a look at the terms below:

  1. Big Group (Population): The whole packet of Gummy Bears, as it came sealed from production.
  2. Equal Opportunity: In Simple Random Sampling, each gummy bear in the packet has an equal chance of being chosen.
  3. Random Selection: Now, you close your eyes, reach into the packet, and grab a gummy bear without looking. You’re not choosing based on color, shape, or specific criteria – it’s entirely random.
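The "equal opportunity" idea can be checked empirically. The sketch below is only an illustration with assumed numbers (a packet of 10 bears, samples of 3, and 20,000 repetitions): it draws many simple random samples and counts how often each bear is picked. If every bear really has the same chance, each one should appear in roughly 3 out of every 10 samples.

```python
import random
from collections import Counter

packet = [f"GB_{i}" for i in range(10)]  # small packet of 10 gummy bears
sample_size = 3
trials = 20_000

rng = random.Random(0)  # seeded only so the run is repeatable
picks = Counter()
for _ in range(trials):
    # Each trial is one simple random sample of 3 distinct bears.
    picks.update(rng.sample(packet, sample_size))

# Each bear should land in roughly sample_size / len(packet) of the samples.
expected = sample_size / len(packet)
for bear in packet:
    print(bear, round(picks[bear] / trials, 3), "expected about", expected)
```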

Code Implementation for Simple Random Sampling

In this section, we will cover the implementation of Simple Random Sampling in a step-by-step manner. We will take the example of the Gummy Bear packet and apply random sampling to the packets we create.

Step 1 – Importing Modules.

We will use the following modules for this tutorial: NumPy, Matplotlib, and Python’s built-in random module.

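import random

import matplotlib.pyplot as plt
import numpy as np

# Matplotlib 3.6+ renamed the 'seaborn' style to 'seaborn-v0_8'.
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')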

Step 2 – Create gummy bear packets.

Packets can be created very simply using the code snippet below. The code defines a simple function that takes a single parameter, the packet size, builds a list of Gummy Bears, and returns it.

def producePacket(packetSize):
    gummyBears = ["GB_" + str(i) for i in range(packetSize)]
    return gummyBears

packet_1 = producePacket(50)

This will result in a packet of 50 Gummy Bears. Let’s visualize the packet as a bar graph using the code snippet below. We count how many times each unique Gummy Bear appears with np.unique and then draw the counts with plt.bar. Since every label in the packet is unique, each bar has a height of 1.

unique, counts = np.unique(packet_1, return_counts=True)

plt.figure(figsize=(20, 6))
plt.bar(unique, counts)
plt.title("Initial Data for Gummy Bears Packet")
plt.xlabel("Gummy Bears ->")
plt.ylabel("Count ->")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

The resulting plot is shown below:

Gummy Bear - Original Packet Plot
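Two details of this plot are easy to miss. Every label occurs exactly once, so all bars have the same height. Also, np.unique returns the labels in sorted order, and because the labels are strings they are ordered lexicographically rather than numerically. The sketch below shows the same ordering with plain Python on a smaller packet of 12 bears; the zero-padded labels are a hypothetical tweak, not part of the original code.

```python
from collections import Counter

packet = ["GB_" + str(i) for i in range(12)]

# Every label occurs exactly once, so every bar has height 1.
counts = Counter(packet)
print(set(counts.values()))

# Sorting strings is lexicographic: 'GB_10' comes before 'GB_2'.
print(sorted(packet)[:4])

# Hypothetical tweak: zero-padded labels sort in numeric order.
padded = [f"GB_{i:02d}" for i in range(12)]
print(sorted(padded) == padded)
```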

Step 3 – Implement Random Sampling

To implement random sampling, we first check whether the requested sample size exceeds the packet size, since a sample cannot be larger than the packet it is drawn from. If the size is valid, we take a random sample using the random.sample function, which draws items without replacement. We then plot the sampled data using the same approach as earlier.

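def simpleRandomSampling(data, sampleSize):
    sizePacket = len(data)

    # A sample cannot contain more items than the packet it is drawn from.
    if sampleSize > sizePacket:
        print("Sample size cannot be greater than packet size!")
        return None

    # random.sample picks sampleSize distinct items, each equally likely.
    sampledGB_1 = random.sample(data, sampleSize)

    # Plot the sampled gummy bears the same way as the full packet.
    unique, counts = np.unique(sampledGB_1, return_counts=True)
    plt.figure(figsize=(4, 4))
    plt.bar(unique, counts)
    plt.title("Sampled Data for Gummy Bears Packet")
    plt.xlabel("Gummy Bears ->")
    plt.ylabel("Count ->")
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

    # Return the sample so the caller can use it after plotting.
    return sampledGB_1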

When you call the function with a sample size of 10 gummy bears, for example simpleRandomSampling(packet_1, 10), the resulting plot of the sampled data looks like this:

Implement Random Sampling in Python
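The behavior of random.sample that the function above relies on can be checked directly, without any plotting. The following is a small sketch rather than part of the original walkthrough; the seed value 7 is an arbitrary choice made only to show that a seeded generator repeats the same draw.

```python
import random

packet = ["GB_" + str(i) for i in range(50)]

# random.sample draws without replacement: no bear appears twice.
sample = random.sample(packet, 10)
print(len(sample), len(set(sample)))

# Seeding a generator makes a draw repeatable.
first = random.Random(7).sample(packet, 10)
second = random.Random(7).sample(packet, 10)
print(first == second)

# Asking for more bears than the packet holds raises ValueError.
rejected = False
try:
    random.sample(packet, 51)
except ValueError as err:
    rejected = True
    print("Rejected:", err)
```

Because random.sample already raises an error for an oversized request, the explicit size check in the function mainly serves to print a friendlier message.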

We hope you now have a clear understanding of Simple Random Sampling and how to implement it in Python.


 Happy Learning!
