Reading SAS Files Using Pandas in Python

Hey fellow Python coder! In this tutorial, we will cover what SAS files are and how to read them into Python using the Pandas library.

SAS (Statistical Analysis System) is a software suite for advanced analytics and data management, and SAS data files are a popular way of storing the datasets it works with. They are typically saved with the extension “.sas7bdat”. Pandas can read these files, as well as SAS transport (“.xpt”) files, and convert them into Pandas DataFrames, which you can then inspect and modify like any other DataFrame. Note that Pandas only reads SAS files; it does not provide a function for writing data back to SAS format.

Reading SAS Files Using the Pandas Library

To read a SAS file into a Pandas DataFrame, use the pd.read_sas() function. For this tutorial, I have downloaded the dates.sas7bdat file from this link. You can download any other file of your choice or create a SAS file on your own. We will pass two arguments: the path of the SAS file, and the format parameter, which names the format the file is stored in (“sas7bdat” in our case; the other accepted value is “xport”). If format is omitted, Pandas tries to infer it from the file extension.

import pandas as pd

# Read the SAS7BDAT file into a DataFrame
df = pd.read_sas("dates.sas7bdat", format="sas7bdat")

In this tutorial, I am using Google Colab, where I can display the df DataFrame directly by evaluating it in a cell. The result looks like this:

Original SAS file
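If you work with both kinds of SAS file, a small helper can pick the format value from the file extension instead of hard-coding it. This is a minimal sketch: sas_format_for and load_sas are hypothetical helper names, not part of Pandas, and the mapping assumes files use the usual .sas7bdat and .xpt extensions.

```python
from pathlib import Path

import pandas as pd


def sas_format_for(path):
    """Map a file extension to the `format` value pd.read_sas accepts."""
    suffix = Path(path).suffix.lower()
    if suffix == ".sas7bdat":
        return "sas7bdat"
    if suffix == ".xpt":
        return "xport"  # SAS transport (XPORT) files
    raise ValueError(f"Not a recognised SAS file extension: {suffix!r}")


def load_sas(path):
    # Pass the format explicitly rather than relying on pandas' own inference
    return pd.read_sas(path, format=sas_format_for(path))


# Usage (requires the file to exist on disk):
# df = load_sas("dates.sas7bdat")
```

Raising an error for unknown extensions makes a typo in the path fail early with a clear message.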

We can also print additional information about the DataFrame, such as its column types (info), summary statistics (describe), and shape, using the code snippet below.

print("INFORMATION :\n")
df.info()  # info() prints its report itself and returns None
print()

print("DESCRIPTION :\n")
print(df.describe(), "\n")

print("SHAPE : ", df.shape, "\n")

The resulting output looks like this:

Information About SAS Dataset
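If you would rather keep these summaries as values instead of printing them, info() can write its report into a text buffer through its buf parameter. A minimal sketch, using a hypothetical summarize helper and a small in-memory DataFrame as a stand-in for the SAS data:

```python
import io

import pandas as pd


def summarize(df):
    """Collect info(), describe() and shape for a DataFrame in one dict."""
    buf = io.StringIO()
    df.info(buf=buf)  # info() writes its report into buf instead of stdout
    return {
        "info": buf.getvalue(),
        "describe": df.describe(),
        "shape": df.shape,
    }


# Small in-memory stand-in for the SAS data
sample = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})
report = summarize(sample)
print(report["info"])
print(report["describe"])
print("SHAPE :", report["shape"])
```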

Creating New Columns from Existing Columns

Next, we will extract various parts of the timestamp stored in the dt column: year, month, day, hour, minute, and second. The .dt accessor only works on datetime columns, so this relies on read_sas having parsed dt as a datetime.

df['year'] = df['dt'].dt.year
df['month'] = df['dt'].dt.month
df['day'] = df['dt'].dt.day
df['hour'] = df['dt'].dt.hour
df['minute'] = df['dt'].dt.minute
df['second'] = df['dt'].dt.second

This adds 6 new columns to the DataFrame, which now looks like this:

New SAS file
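The six assignments shown earlier can also be written as a loop. The sketch below uses a hypothetical add_date_parts helper; it additionally assumes that if dt ever arrives as plain numbers instead of datetimes, those numbers are SAS datetime values, which SAS counts in seconds from 1 January 1960.

```python
import pandas as pd

SAS_EPOCH = "1960-01-01"  # SAS counts dates and datetimes from this day


def add_date_parts(df, col="dt"):
    """Return a copy of df with year..second columns derived from col."""
    out = df.copy()
    if not pd.api.types.is_datetime64_any_dtype(out[col]):
        # Assumption: a numeric column holds SAS datetime values,
        # i.e. seconds since 1960-01-01
        out[col] = pd.to_datetime(out[col], unit="s", origin=SAS_EPOCH)
    for part in ["year", "month", "day", "hour", "minute", "second"]:
        out[part] = getattr(out[col].dt, part)
    return out


# 90061 seconds = 1 day + 1 hour + 1 minute + 1 second after the SAS epoch
raw = pd.DataFrame({"dt": [0.0, 90061.0]})
parts = add_date_parts(raw)
print(parts)
```

Working on a copy leaves the original DataFrame untouched, so you can compare the two side by side.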

The scope of this tutorial is limited to reading SAS data, so we will stop here. Thank you for reading.

Happy Learning!
