DataFrame, date_range(), slice() in Python Pandas library

Hey there everyone, today we will learn about DataFrame, date_range(), and slice() in Pandas. Python is a powerful language with a wide variety of functions and libraries, and libraries like Pandas make it a lot easier to work with datasets and analyze them.
So, let’s get started.

DataFrame in Pandas

A DataFrame is a two-dimensional, labeled data structure used to represent tabular data, that is, data organized in rows and columns.
To create a DataFrame, we first need to import the Pandas library.

import pandas as pd

Now, we will have a look at different ways of creating a DataFrame.

1. Using a ‘.csv’ file:

We can create a DataFrame by importing a ‘.csv’ file with the read_csv() function, as shown in the code below:

#reading .csv file to make dataframe
df = pd.read_csv('file_location')

#displaying the dataframe
df
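
Since ‘file_location’ above is only a placeholder, here is a minimal sketch that runs without any file on disk. It assumes a small, made-up CSV held in a string (csv_text and df_csv are names chosen just for this illustration) and wraps it in io.StringIO, which read_csv() accepts in place of a path.

```python
import io
import pandas as pd

# Hypothetical CSV content, used only for illustration
csv_text = """date,temperature,windspeed
2/10/18,31,7
3/11/18,32,8
4/12/18,33,9
"""

# read_csv() accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)          # (number of rows, number of columns)
print(list(df_csv.columns))  # column names are taken from the header line
```

With a real file, the only change would be passing its path instead of the StringIO object.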

2. Using an Excel file:

A DataFrame can also be created by importing an Excel file. This works like the ‘.csv’ case, with only the function name changing to read_excel():

#reading the excel file to create dataframe
df = pd.read_excel('file_location')

#display dataframe
df

3. Using a dictionary:

We can also create a DataFrame from a dictionary: each key becomes a column name, and the list stored under that key supplies the values of that column, one per row.

#creating data using dictionary
my_data = {
    'date': ['2/10/18','3/11/18','4/12/18'],
    'temperature': [31,32,33],
    'windspeed': [7,8,9]
}

#creating dataframe
df = pd.DataFrame(my_data)

#displaying dataframe
df

OUTPUT:

      date  temperature  windspeed
0  2/10/18           31          7
1  3/11/18           32          8
2  4/12/18           33          9
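
To check how the dictionary maps onto the table, the short sketch below rebuilds the same DataFrame and inspects it. It only reuses the data from the example above; each key ends up as a column name and each list as that column's values.

```python
import pandas as pd

my_data = {
    'date': ['2/10/18', '3/11/18', '4/12/18'],
    'temperature': [31, 32, 33],
    'windspeed': [7, 8, 9]
}

df = pd.DataFrame(my_data)

# The dictionary keys become the column names, in insertion order
print(df.columns.tolist())

# Three list entries per key give three rows; three keys give three columns
print(df.shape)

# Selecting a column returns the values stored under that key
print(df['temperature'].tolist())
```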

4. Using a list of tuples:

Here, each tuple in the list provides the values of one row in our DataFrame, and we have to pass the column names explicitly to pd.DataFrame(), as shown in the code below:

#creating data using tuple list
my_data = [
    ('1/10/18',30,6),
    ('2/11/18',31,7),
    ('3/12/18',32,7)
]

#creating dataframe
df = pd.DataFrame(data=my_data, columns= ['date','temperature','windspeed'])

#displaying dataframe
df

OUTPUT:

      date  temperature  windspeed
0  1/10/18           30          6
1  2/11/18           31          7
2  3/12/18           32          7

We can also use a list of dictionaries in place of the list of tuples; the dictionary keys then supply the column names.
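
As a sketch of the list-of-dictionaries variant, the code below builds the same table both ways and compares the results. The names records, df_records, and df_tuples are chosen here for illustration only.

```python
import pandas as pd

# One dictionary per row; the keys act as column names
records = [
    {'date': '1/10/18', 'temperature': 30, 'windspeed': 6},
    {'date': '2/11/18', 'temperature': 31, 'windspeed': 7},
    {'date': '3/12/18', 'temperature': 32, 'windspeed': 7},
]
df_records = pd.DataFrame(records)

# The same rows as tuples, with the column names passed explicitly
df_tuples = pd.DataFrame(
    data=[('1/10/18', 30, 6), ('2/11/18', 31, 7), ('3/12/18', 32, 7)],
    columns=['date', 'temperature', 'windspeed'],
)

# Both constructions should produce identical DataFrames
print(df_records.equals(df_tuples))
```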

date_range() in Pandas

The date_range() function in Pandas returns a DatetimeIndex with a fixed frequency.
Syntax: pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
Note: this is the signature from older Pandas releases; in newer releases the closed argument has been replaced by inclusive.
Let’s try to understand how some of the arguments of date_range() work with the help of code and its output.

start: Left bound for generating dates.
end: Right bound for generating dates.
freq: Frequency string. It can carry a multiple, e.g. ‘4H’ for every four hours (newer Pandas releases prefer the lowercase form ‘4h’).

pd.date_range(start='12-1-2019', end='12-2-2019', freq='4H')

OUTPUT:

DatetimeIndex(['2019-12-01 00:00:00', '2019-12-01 04:00:00',
               '2019-12-01 08:00:00', '2019-12-01 12:00:00',
               '2019-12-01 16:00:00', '2019-12-01 20:00:00',
               '2019-12-02 00:00:00'],
              dtype='datetime64[ns]', freq='4H')
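
The output above includes the end bound because the 4-hour steps land on it exactly. The sketch below, using day-based frequencies and variable names picked only for this illustration, shows what happens when the steps do not land on the end date.

```python
import pandas as pd

# Every 3 days: the last step lands exactly on the end date, so it is included
every_3_days = pd.date_range(start='2019-12-01', end='2019-12-10', freq='3D')

# Every 4 days: the step after Dec 9 would pass the end date, so the range stops there
every_4_days = pd.date_range(start='2019-12-01', end='2019-12-10', freq='4D')

print(every_3_days)
print(every_4_days)
```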

periods: Number of periods to generate. When start and end are both given, the dates are spaced evenly between them.

pd.date_range(start='12-1-2019', end='12-10-2019', periods=4)

OUTPUT:

DatetimeIndex(['2019-12-01', '2019-12-04', '2019-12-07', '2019-12-10'], dtype='datetime64[ns]', freq=None)
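
Of start, end, periods, and freq, date_range() expects exactly three to be set, with freq falling back to daily when it is left out. The sketch below illustrates both cases; the variable names daily and over_specified are chosen here only for illustration.

```python
import pandas as pd

# Only start and periods are given, so freq falls back to daily ('D')
daily = pd.date_range(start='2019-12-01', periods=4)
print(daily)

# Passing all four of start, end, periods, and freq is rejected
over_specified = False
try:
    pd.date_range(start='2019-12-01', end='2019-12-10', periods=4, freq='D')
except ValueError as err:
    over_specified = True
    print(err)
```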

tz: Name of the time zone in which the returned DatetimeIndex is localized.

pd.date_range(start='12/1/2019', periods=4, tz='Asia/Hong_Kong')

OUTPUT:

DatetimeIndex(['2019-12-01 00:00:00+08:00', '2019-12-02 00:00:00+08:00',
               '2019-12-03 00:00:00+08:00', '2019-12-04 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Hong_Kong]', freq='D')
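
A localized index can later be expressed in another time zone. As a small sketch, the code below converts the Hong Kong index from the example to UTC with tz_convert(); the variable names hk and utc are illustrative.

```python
import pandas as pd

hk = pd.date_range(start='2019-12-01', periods=4, tz='Asia/Hong_Kong')

# tz_convert() keeps the same instants in time and only changes how they are displayed
utc = hk.tz_convert('UTC')

print(hk[0])   # midnight in Hong Kong
print(utc[0])  # the same instant, eight hours earlier on the UTC clock
```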


slice() in Pandas

str.slice() is used to slice a substring out of each string in a column of the DataFrame. It has the following parameters:
start: Start position for slicing
stop: Stop position for slicing (the character at this position is not included)
step: Number of characters to step
Note: slice() must be called through the “.str” accessor (for example, df["date"].str.slice(...)) because it is a string function.
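
Before applying it to a DataFrame, here is a minimal sketch of str.slice() on a standalone Series (the Series s is made up for illustration). It follows the same rules as ordinary Python slicing, so the bracket form on the “.str” accessor gives the same result.

```python
import pandas as pd

s = pd.Series(['2/10/18', '3/11/18', '4/12/18'])

# Keep everything except the last three characters of each string
trimmed = s.str.slice(start=0, stop=-3)

# The bracket form on the .str accessor is equivalent
trimmed_brackets = s.str[0:-3]

print(trimmed.tolist())
print(trimmed_brackets.tolist())
```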

Example 1:
We will try to remove the year part (“/18”) from the ‘date’ column of the DataFrame ‘df’ by keeping everything except the last three characters.


start, stop, step = 0, -3, 1

# converting 'date' to string data type
df["date"] = df["date"].astype(str)

# slicing
df["date"] = df["date"].str.slice(start, stop, step)
df

OUTPUT:
[Image: table with the sliced ‘date’ column alongside ‘temperature’ and ‘windspeed’]
So, we have successfully removed the year part from the date.

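Example 2:
We have this DataFrame:

[Image: a DataFrame with the columns ‘name’, ‘age’, and ‘height’]

Now, we will try to remove the decimal part from the ‘height’ column of the DataFrame ‘df’. Slicing off the last two characters does this only when every height has exactly one digit after the decimal point.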

start, stop, step = 0, -2, 1

# converting 'height' to string data type
df["height"] = df["height"].astype(str)

# slicing
df["height"] = df["height"].str.slice(start, stop, step)
df

OUTPUT:

[Image: the same table with ‘name’, ‘age’, and the sliced ‘height’ column]

So, we have successfully removed the decimal part from ‘height’.
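
Slicing a fixed number of characters depends on how many decimal digits each value has. As a hedged alternative, the sketch below compares it with a numeric conversion; the DataFrame people and its values are invented for illustration, since the original table is not reproduced here.

```python
import pandas as pd

# Hypothetical data: heights with one and two decimal digits
people = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'age': [21, 22, 23],
    'height': [5.6, 5.75, 6.0],
})

# String slicing removes the last two characters, whatever they are
sliced = people['height'].astype(str).str.slice(0, -2)

# Converting to int drops the fractional part regardless of its length
truncated = people['height'].astype(int)

print(sliced.tolist())     # the two-decimal value keeps a stray '.'
print(truncated.tolist())
```

The numeric route also leaves the column as integers rather than strings, which is usually more convenient for later calculations.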
