Reading SAS Files Using Pandas in Python
Hey fellow Python coder! In this tutorial, we will cover what SAS files are and how to read them using the Pandas library in Python.
SAS (Statistical Analysis System) is a software suite used for advanced analytics and data management. Its datasets are typically stored in files with the extension “.sas7bdat”. To work with such a file in Python, we can use the Pandas library, which reads the file and converts it into a Pandas DataFrame. Once the data is in a DataFrame, it can be inspected and modified with the usual Pandas tools. Note that Pandas only reads SAS files; it does not provide a function for writing a DataFrame back to the SAS format.
Reading SAS Files using Pandas Library
To read a SAS file into a Pandas DataFrame, use the pd.read_sas() function. For this tutorial, I have downloaded the dates.sas7bdat file from this link. You can download any other file of your choice or create a SAS file on your own. We will pass two arguments: the path of the SAS file, and format, which names the format the file is stored in (in our case it’s sas7bdat). If format is omitted, Pandas infers it from the file extension.
import pandas as pd
df = pd.read_sas("dates.sas7bdat", format="sas7bdat")
In this tutorial, I am using Google Colab, where evaluating df on its own in a cell displays the DataFrame directly.
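Beyond the path and format, pd.read_sas() accepts a few optional arguments that can help with real-world files. The following is a minimal sketch and not part of the original walkthrough: load_sas is a hypothetical helper, and the "latin-1" encoding and chunk size of 1000 are assumptions you would adjust for your own data. Passing encoding makes Pandas decode text columns into strings (without it, text is returned as raw bytes), and passing chunksize makes read_sas return a reader that yields the file one DataFrame at a time.

```python
from pathlib import Path

import pandas as pd


def load_sas(path, encoding=None, chunksize=None):
    """Hypothetical helper: read a .sas7bdat file into a single DataFrame."""
    if chunksize is None:
        # Simple case: load the whole file in one call.
        return pd.read_sas(path, format="sas7bdat", encoding=encoding)

    # With chunksize set, read_sas returns a reader object instead of a DataFrame.
    reader = pd.read_sas(
        path, format="sas7bdat", encoding=encoding, chunksize=chunksize
    )
    try:
        # Each iteration yields a DataFrame with up to `chunksize` rows.
        chunks = list(reader)
    finally:
        reader.close()
    return pd.concat(chunks, ignore_index=True)


sas_path = Path("dates.sas7bdat")  # file name used in this tutorial
if sas_path.exists():
    # The encoding and chunk size here are assumptions, not tutorial values.
    df = load_sas(sas_path, encoding="latin-1", chunksize=1000)
    print(df.head())
```

Reading in chunks mainly pays off for files that are too large to load comfortably in one go; for a small file such as the one used here, the plain call is enough.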
We can also print additional information about the data frame such as shape, information, and description of the dataset using the code snippet below.
print("INFORMATION :\n")
df.info()  # info() prints its report itself and returns None
print("\nDESCRIPTION :\n")
print(df.describe(), "\n")
print("SHAPE : ", df.shape, "\n")
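In the resulting output, the information section lists each column with its data type and non-null count, the description section gives summary statistics for the data, and the shape is a (rows, columns) tuple.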
Creating New Columns from Existing Columns
The dataset contains a datetime column named dt. Applying the Pandas .dt accessor to this column, we will extract the year, month, day, hour, minute, and second using the code snippet below.
df['year'] = df['dt'].dt.year
df['month'] = df['dt'].dt.month
df['day'] = df['dt'].dt.day
df['hour'] = df['dt'].dt.hour
df['minute'] = df['dt'].dt.minute
df['second'] = df['dt'].dt.second
This adds six new columns to the DataFrame, one for each extracted part of the date.
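The same extraction can also be written as a loop, with a guard for columns that are not yet datetimes. This is only a sketch under stated assumptions: the small DataFrame below is made-up stand-in data so the snippet runs without the SAS file, and the pd.to_datetime() conversion is a precaution that may not be needed for your file. The .dt accessor works only on datetime-like columns, which is why the type is checked first.

```python
import pandas as pd

# Stand-in data (made up for illustration); in the tutorial, df would come
# from pd.read_sas("dates.sas7bdat", format="sas7bdat").
df = pd.DataFrame(
    {"dt": pd.to_datetime(["2021-03-04 05:06:07", "1999-12-31 23:59:58"])}
)

# Convert the column only if it is not already a datetime type.
if not pd.api.types.is_datetime64_any_dtype(df["dt"]):
    df["dt"] = pd.to_datetime(df["dt"])

# Add one new column per date part, named after the part itself.
for part in ["year", "month", "day", "hour", "minute", "second"]:
    df[part] = getattr(df["dt"].dt, part)

print(df)
```

The loop keeps the list of parts in one place, so adding or dropping a part is a one-word change.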
The scope of this tutorial is limited to just reading the SAS data, and hence we end here. Thank you for reading.
Happy Learning!