Index Resetting in Pandas Dataframe in Python
In this tutorial, we will see how to reset the index of a Pandas Dataframe in Python. For this, we will use reset_index().
Let us first understand what Pandas is. Pandas is an open-source Python library that provides tools for working with data, and it is used in fields such as data analysis, finance and statistics. We use “import pandas as pd” for importing the library.
The Pandas library is very commonly used when we solve Data Science problems in Python. Its most common object is called the Dataframe.
Let us see more on Dataframes before we proceed with the main task.
What are Dataframes in Pandas Library?
Dataframes are 2-D mutable data structures in tabular form, that is, they consist of rows and columns of data. They represent data in a structured format and make data analysis and prediction tasks easier. Moreover, different columns can hold different data types, hence dataframes are heterogeneous.
There are many ways to create dataframes. Datasets stored in CSV files, Excel files, etc. are usually loaded into a Pandas Dataframe, for example with pd.read_csv() or pd.read_excel(). Also, lists, arrays, dictionaries, etc. can be converted into a dataframe directly. Let us see the code for it :
# import pandas
import pandas as pd

# initializing data
dataset = {'Name': ['Jeetu', 'Piku', 'Paro', 'Chetona', 'Rik'],
           'Age': [25, 22, 27, 30, 29],
           'Job': ['TCS', 'Accenture', 'Amazon', 'Google', 'Capgemini'],
           'Salary': ['20000', '25000', '50000', '45000', '30000']}

# Convert dictionary into DataFrame
df = pd.DataFrame(dataset)

# display df
df
Here we converted a dictionary into a dataframe. This is the original dataset we will use for our task.
Output :
  | Name | Age | Job | Salary |
0 | Jeetu | 25 | TCS | 20000 |
1 | Piku | 22 | Accenture | 25000 |
2 | Paro | 27 | Amazon | 50000 |
3 | Chetona | 30 | Google | 45000 |
4 | Rik | 29 | Capgemini | 30000 |
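Besides dictionaries, a list of rows can also be converted into a dataframe. The following is a minimal sketch, assuming the first three employee records are held as a list of lists; the names rows and df_from_rows are only illustrative and are not used elsewhere in this tutorial.

```python
import pandas as pd

# Hypothetical input: some of the same records, stored row by row
rows = [
    ['Jeetu', 25, 'TCS', '20000'],
    ['Piku', 22, 'Accenture', '25000'],
    ['Paro', 27, 'Amazon', '50000'],
]

# A list of lists carries no column names, so we supply them explicitly
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'Job', 'Salary'])

# Pandas assigns the default integer index automatically
print(df_from_rows.index.tolist())   # [0, 1, 2]
print(df_from_rows)
```

Whichever way the dataframe is built, it gets the default integer index unless another index is given.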
How to use reset_index() for the Task?
Our task is to reset the index of a Pandas Dataframe in Python. Resetting is generally required when we derive a smaller dataframe from a larger one, for example by filtering or dropping rows, and the remaining rows keep their original index labels, which are then non-continuous. Resetting gives a continuous index again and hence a more structured dataframe.
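To make the problem concrete, here is a small sketch of how such a gap appears. The filter condition and the names filtered and renumbered are assumptions chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jeetu', 'Piku', 'Paro', 'Chetona', 'Rik'],
    'Age': [25, 22, 27, 30, 29],
})

# Keep employees aged 25 or more; the surviving rows keep their old labels
filtered = df[df['Age'] >= 25]
print(filtered.index.tolist())     # [0, 2, 3, 4], label 1 is missing

# Renumber the rows from 0 and discard the old labels
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```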
Before proceeding with the task, we need to know what the reset_index() function does. It does what its name says: it replaces the current index of the dataframe with the default integer index 0, 1, 2 and so on. By default the previous index is kept as a new column, but it can also be discarded. Let us see the syntax.
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
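The two parameters used most often in this tutorial are drop and inplace. The sketch below, on a small made-up dataframe, shows their default behaviour: a plain call returns a new dataframe and leaves the original untouched, while inplace=True changes the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 22, 27]}, index=['a', 'b', 'c'])

# Default call: a new DataFrame is returned, old labels become an 'index' column
result = df.reset_index()
print(list(result.columns))   # ['index', 'Age']
print(df.index.tolist())      # ['a', 'b', 'c'], the original is unchanged

# inplace=True modifies df itself and returns None
returned = df.reset_index(inplace=True)
print(returned)               # None
print(df.index.tolist())      # [0, 1, 2]
```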
Approaching the Task
Approach 1 : Reset the index and keep the custom index as a column
To do this,
- First, convert the original dictionary into a dataframe and pass the custom index along with it. The command should look like this: pd.DataFrame(data, index). Store the resulting dataframe in df.
- Next, use the command df.reset_index(inplace=True), where inplace=True means that the changes are made in the original dataframe instead of in a returned copy.
- Print df.
# import pandas
import pandas as pd

# Define a dictionary containing employee data
dataset = {'Name': ['Jeetu', 'Piku', 'Paro', 'Chetona', 'Rik'],
           'Age': [25, 22, 27, 30, 29],
           'Job': ['TCS', 'Accenture', 'Amazon', 'Google', 'Capgemini'],
           'Salary': ['20000', '25000', '50000', '45000', '30000']}

# custom index (a list, because a set is unordered and would assign the labels in arbitrary order)
index = ['a', 'b', 'c', 'd', 'e']

# Convert dictionary into DataFrame
df = pd.DataFrame(dataset, index=index)

# reset to the default index; the custom index becomes a column
df.reset_index(inplace=True)
df
Output :
  | index | Name | Age | Job | Salary |
0 | a | Jeetu | 25 | TCS | 20000 |
1 | b | Piku | 22 | Accenture | 25000 |
2 | c | Paro | 27 | Amazon | 50000 |
3 | d | Chetona | 30 | Google | 45000 |
4 | e | Rik | 29 | Capgemini | 30000 |
Here, you can see that the custom index is kept as a column named index, next to the default integer index.
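The column is called index only because the custom index had no name. As a hedged sketch, if the index is given a name first, reset_index() uses that name for the new column; the name emp_code below is made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Jeetu', 'Piku', 'Paro'], 'Age': [25, 22, 27]},
    index=['a', 'b', 'c'],
)

# Give the index a name; 'emp_code' is a hypothetical label
df.index.name = 'emp_code'

# The index name becomes the name of the new column
df = df.reset_index()
print(list(df.columns))   # ['emp_code', 'Name', 'Age']
```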
Approach 2 : Use a custom index in place of the default index
For this,
- Just use pd.DataFrame(data, index), that is, pass the custom index when creating the dataframe. The default integer index is then not used at all.
# import pandas
import pandas as pd

# Initialize data
dataset = {'Name': ['Jeetu', 'Piku', 'Paro', 'Chetona', 'Rik'],
           'Age': [25, 22, 27, 30, 29],
           'Job': ['TCS', 'Accenture', 'Amazon', 'Google', 'Capgemini'],
           'Salary': ['20000', '25000', '50000', '45000', '30000']}

# custom index (a list, because a set is unordered and would assign the labels in arbitrary order)
index = ['a', 'b', 'c', 'd', 'e']

# create the DataFrame with the custom index
df = pd.DataFrame(dataset, index=index)
df
Output :
  | Name | Age | Job | Salary |
a | Jeetu | 25 | TCS | 20000 |
b | Piku | 22 | Accenture | 25000 |
c | Paro | 27 | Amazon | 50000 |
d | Chetona | 30 | Google | 45000 |
e | Rik | 29 | Capgemini | 30000 |
You can see that the default integer index is gone.
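A custom index can also be given to a dataframe that already exists. The sketch below shows two common ways, assuming a small made-up dataframe: assigning labels to df.index, and promoting a column with set_index().

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jeetu', 'Piku', 'Paro'], 'Age': [25, 22, 27]})

# Option A: assign labels directly; the list length must match the row count
df.index = ['a', 'b', 'c']
print(df.index.tolist())        # ['a', 'b', 'c']

# Option B: promote an existing column to the index (returns a new DataFrame)
by_name = df.set_index('Name')
print(by_name.index.tolist())   # ['Jeetu', 'Piku', 'Paro']
print(list(by_name.columns))    # ['Age']
```

In both cases reset_index() can later be used to go back to the default integer index.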
Approach 3 : Remove the custom index and restore the default index
For this,
- Convert the given dictionary into a dataframe and pass the custom index along with it: pd.DataFrame(data, index)
- Next, write the command df.reset_index(inplace=True, drop=True), where inplace=True means that the changes are made in the original dataframe. Moreover, drop=True means that the custom index is discarded instead of being added as a column.
# import pandas
import pandas as pd

# initialize dataset with a dictionary
dataset = {'Name': ['Jeetu', 'Piku', 'Paro', 'Chetona', 'Rik'],
           'Age': [25, 22, 27, 30, 29],
           'Job': ['TCS', 'Accenture', 'Amazon', 'Google', 'Capgemini'],
           'Salary': ['20000', '25000', '50000', '45000', '30000']}

# custom index (a list, because a set is unordered)
index = ['a', 'b', 'c', 'd', 'e']

# Convert the dictionary into DataFrame
df = pd.DataFrame(dataset, index=index)

# remove the custom index and restore the default one
df.reset_index(inplace=True, drop=True)
df
Output :
  | Name | Age | Job | Salary |
0 | Jeetu | 25 | TCS | 20000 |
1 | Piku | 22 | Accenture | 25000 |
2 | Paro | 27 | Amazon | 50000 |
3 | Chetona | 30 | Google | 45000 |
4 | Rik | 29 | Capgemini | 30000 |
Here, you can see that the custom index is removed and the default integer index is back.
Thank you for going through this article.