Dividing a Column into Two Columns in Pandas Dataframe
In this tutorial, we will divide a given column into two columns in a Pandas Dataframe in Python. There are many ways to do this. Here we will use the Series.str.split() function.
First, a word on the term Pandas Dataframe. Pandas is an open-source Python library that must be imported into the code and supplies tools for statistical and analytical problems. A Dataframe is a mutable, heterogeneous Pandas object with three key elements: rows, columns, and data. Its job is to present a raw dataset in a clean, structured form to which Pandas operations can be applied.
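As a minimal sketch of those three elements, the snippet below builds a tiny Dataframe and reads back its row labels, column labels, and shape. The dataset and the 'Year' column are made up purely for illustration.

```python
import pandas as pd

# A small, made-up dataset used only for illustration.
data = {"Actor": ["Ranbir Kapoor", "Salman Khan"], "Year": [2011, 2017]}
df = pd.DataFrame(data)

row_labels = list(df.index)        # labels of the rows
column_labels = list(df.columns)   # labels of the columns
shape = df.shape                   # (number of rows, number of columns)

# Heterogeneous: 'Actor' holds strings while 'Year' holds integers.
# Mutable: a single value can be changed in place.
df.loc[0, "Year"] = 2012

print(row_labels, column_labels, shape)
```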
Now, let us look at str.split() in more detail before proceeding.
What is the str.split() function?
Series.str.split() is a Pandas function that splits each string in a Series at the specified separator. It is similar to the split() function of a Python string, since both are used for splitting, but there is a difference: split() works on a single string, whereas Series.str.split() is applied to every string in a whole Series. The syntax of str.split() is as follows:
Series.str.split(pat=None, n=-1, expand=False)
where,
- pat : the separator at which each string is split; if it is not given, the string is split on whitespace
- n : the maximum number of splits to make; the default, -1, means all possible splits
- expand : a boolean; if True, the result is a dataframe with the pieces in separate columns, else it is a series holding lists of strings
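The effect of each parameter is easiest to see side by side. The following is a small sketch using made-up names and dates (not data from this tutorial) that contrasts the plain string split() with Series.str.split() and varies pat, n, and expand.

```python
import pandas as pd

names = pd.Series(["Ranbir Raj Kapoor", "Salman Khan"])

# Python's built-in split() works on one string at a time.
single = "Salman Khan".split()

# Series.str.split() applies the split to every element of the Series.
# With expand=False (the default) each element becomes a list of strings.
as_lists = names.str.split()

# n=1 allows at most one split per string; expand=True spreads the
# pieces over the columns of a dataframe.
first_rest = names.str.split(n=1, expand=True)

# pat selects the separator; here the strings are split at each hyphen.
dates = pd.Series(["2020-01-15", "2021-12-31"])
date_parts = dates.str.split(pat="-", expand=True)

print(as_lists)
print(first_rest)
print(date_parts)
```

Passing n and expand by keyword, as above, keeps the call valid across Pandas versions.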
Implementation of the Method
Code 1 : We will see here how to split a column on whitespace
Here the approach is :
- First, import pandas.
- Next, take a dictionary, convert it into a dataframe, and store it in df.
- Then, write the command df.Actor.str.split(expand=True). This splits the column 'Actor' into 2 columns at the space between the two words. Finally, print the result.
# import Pandas
import pandas as pd

# create dataframe
df = pd.DataFrame({'Actor': ['Ranbir Kapoor', 'Hrithik Roshan', 'Salman Khan', 'Rani Mukherjee'],
                   'Film': ['Rockstar', 'War', 'Tubelight', 'Black']})
print("Given Dataframe is :\n", df)

# split on whitespace, which is the default separator
print("\nSplitting 'Actor':\n", df.Actor.str.split(expand=True))
Output :
Given Dataframe is :
             Actor       Film
0   Ranbir Kapoor   Rockstar
1  Hrithik Roshan        War
2     Salman Khan  Tubelight
3  Rani Mukherjee      Black

Splitting 'Actor':
          0          1
0   Ranbir     Kapoor
1  Hrithik     Roshan
2   Salman       Khan
3     Rani  Mukherjee
Code 2 : We will see here how to split a column and add the two parts to the dataframe as new columns
Here the approach is :
- First, take the dataset, convert it to a dataframe, and store it in df.
- Then write the command: df[['First','Last']] = df.Actor.str.split(expand=True). This splits the column 'Actor' at the space, placing the first part under 'First' and the second part under 'Last'.
- Print df.
# import Pandas
import pandas as pd

# create dataframe
df = pd.DataFrame({'Actor': ['Ranbir Kapoor', 'Hrithik Roshan', 'Salman Khan', 'Rani Mukherjee'],
                   'Film': ['Rockstar', 'War', 'Tubelight', 'Black']})
print("Given Dataframe is :\n", df)

# Adding two new columns to the existing dataframe.
# By default, splitting is done on whitespace.
df[['First', 'Last']] = df.Actor.str.split(expand=True)
print("\n After adding two new columns : \n", df)
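Output :
Given Dataframe is :
             Actor       Film
0   Ranbir Kapoor   Rockstar
1  Hrithik Roshan        War
2     Salman Khan  Tubelight
3  Rani Mukherjee      Black

 After adding two new columns : 
             Actor       Film    First       Last
0   Ranbir Kapoor   Rockstar   Ranbir     Kapoor
1  Hrithik Roshan        War  Hrithik     Roshan
2     Salman Khan  Tubelight   Salman       Khan
3  Rani Mukherjee      Black     Rani  Mukherjee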
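The assignment to exactly two column names works here because every name has exactly two words. As a hedged sketch of one way to handle names of other lengths, the example below passes n=1 so that each string is split at most once. The three names are made up for illustration; they are not part of the dataset above.

```python
import pandas as pd

# Made-up names with three, two, and one word respectively.
df = pd.DataFrame({"Actor": ["Ranbir Raj Kapoor", "Salman Khan", "Cher"]})

# n=1 allows at most one split per string, so expand=True yields
# at most two columns regardless of how many words a name contains.
parts = df["Actor"].str.split(n=1, expand=True)

# Rows with fewer pieces than columns are padded with a missing value.
df[["First", "Last"]] = parts

print(df)
```

With this choice everything after the first space ends up in 'Last', and a single-word name leaves 'Last' empty. Whether that is the right rule depends on the data at hand.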