Dividing a Column into Two Columns in Pandas Dataframe

In this tutorial, we will divide a given column into two columns in a Pandas DataFrame in Python. There are many ways to do this; here we will use the Series.str.split() function.

Before going further, a word on the term Pandas DataFrame. Pandas is an open-source Python library that must be imported into the code and supplies tools for statistical and analytical problems. A DataFrame is a mutable, heterogeneous Pandas object with three key elements: rows, columns, and data. Its job is to present a raw dataset in a cleaner, more structured form so that Pandas operations can be applied to it.

Now, let us learn more about str.split() before going further.

What is the str.split() function

str.split() is a Pandas method that splits each string in a Series at the specified separator. It is similar to Python's built-in string split() method, since both are used for splitting. The difference is that split() works on a single string, whereas str.split() is applied to every element of a Series. The syntax of str.split() is as follows:

Series.str.split(pat=None, n=-1, expand=False)

where,

pat : the separator at which each string is split; if it is not given, the split is made on whitespace
n : the maximum number of splits to make; the default, -1, means all possible splits
expand : a boolean; if True, a DataFrame is returned with the pieces in separate columns, otherwise a Series holding lists of strings
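To make the parameters concrete, here is a minimal sketch that compares the plain string split() with Series.str.split() and shows the effect of expand and n. The Series s and its values are hypothetical sample data, not taken from the examples in this article.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(["a b c", "d e f"])

# Plain Python: split() works on one string at a time
single = "a b c".split()

# Pandas: str.split() applies the split to every element of the Series
as_lists = s.str.split()                 # Series whose values are lists of strings
as_frame = s.str.split(expand=True)      # DataFrame with one column per piece
limited = s.str.split(n=1, expand=True)  # at most one split, so two columns

print(single)
print(as_lists)
print(as_frame)
print(limited)
```

With n=1 the second column keeps everything after the first separator, which is what makes it handy when exactly two columns are wanted.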

Implementation of the Method

Code 1 : Split a column on whitespace

Here the approach is :

First, import pandas.
Next, take a dictionary, convert it into a dataframe and store it in df.
Then, write the command df.Actor.str.split(expand=True). This splits the 'Actor' column into 2 columns at the space between the two names. Finally, print the result.

# import Pandas 
import pandas as pd 

# create dataframe 
df = pd.DataFrame({'Actor': ['Ranbir Kapoor', 'Hrithik Roshan', 'Salman Khan', 'Rani Mukherjee'], 
        'Film':['Rockstar', 'War', 'Tubelight', 'Black']}) 
print("Given Dataframe is :\n",df) 

# split on whitespace (the default when no separator is given)
print("\nSplitting 'Actor':\n", df.Actor.str.split(expand=True))

Output :

Given Dataframe is :
             Actor       Film
0   Ranbir Kapoor   Rockstar
1  Hrithik Roshan        War
2     Salman Khan  Tubelight
3  Rani Mukherjee      Black

Splitting 'Actor':
          0          1
0   Ranbir     Kapoor
1  Hrithik     Roshan
2   Salman       Khan
3     Rani  Mukherjee

Here, you can see in the output that the 'Actor' column has been split and its two parts are printed as separate columns.

Code 2 : Separate into two columns with column names and print together

Here the approach is :

First, take the dataset, convert it into a dataframe and store it in df.
Then write the command: df[['First','Last']] = df.Actor.str.split(expand=True). This splits the 'Actor' column at the space, storing the first part under 'First' and the second part under 'Last'.
Print df.

# import Pandas 
import pandas as pd 

# create dataframe 
df = pd.DataFrame({'Actor': ['Ranbir Kapoor', 'Hrithik Roshan', 'Salman Khan', 'Rani Mukherjee'], 
        'Film':['Rockstar', 'War', 'Tubelight', 'Black']}) 
print("Given Dataframe is :\n",df) 

# Adding two new columns to the existing dataframe. 
# by default, the split is made on whitespace
df[['First','Last']] = df.Actor.str.split(expand=True) 

print("\n After adding two new columns : \n", df)

Output :

Given Dataframe is :
             Actor       Film
0   Ranbir Kapoor   Rockstar
1  Hrithik Roshan        War
2     Salman Khan  Tubelight
3  Rani Mukherjee      Black

 After adding two new columns : 
             Actor       Film    First       Last
0   Ranbir Kapoor   Rockstar   Ranbir     Kapoor
1  Hrithik Roshan        War  Hrithik     Roshan
2     Salman Khan  Tubelight   Salman       Khan
3  Rani Mukherjee      Black     Rani  Mukherjee

Here you can see that the split values now appear under their respective columns, 'First' and 'Last'.
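The assignment above relies on every name splitting into exactly two parts. If a name contained more than one space, the split would produce more than two columns and would no longer match the two column names. A minimal sketch of one way to handle this, assuming everything after the first space should go into 'Last', is to pass n=1. The three-word name below is hypothetical data added for illustration.

```python
import pandas as pd

# Hypothetical data: the second name has three words
df = pd.DataFrame({'Actor': ['Salman Khan', 'Anna Maria Smith']})

# n=1 allows at most one split, so the result always has two columns
df[['First', 'Last']] = df.Actor.str.split(n=1, expand=True)

print(df)
```

Whether the extra words belong with the first or the last name depends on the data, so treat this as one possible convention rather than a rule.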

Code 3 : Use underscore as the delimiter

First, take the dataset, convert it into a dataframe and store it in df.
Next, write the command df[['First','Last']] = df.Actor.str.split("_",expand=True). This splits the 'Actor' data at the underscore ('_') and stores the two parts under 'First' and 'Last'.
Print df.

# import Pandas 
import pandas as pd 

# create dataframe 
df = pd.DataFrame({'Actor': ['Ranbir_Kapoor', 'Hrithik_Roshan', 'Salman_Khan', 'Rani_Mukherjee'], 
        'Film':['Rockstar', 'War', 'Tubelight', 'Black']}) 

print("Given Dataframe is :\n",df) 

# split at the underscore
df[['First','Last']] = df.Actor.str.split("_",expand=True) 

print("\n After adding two new columns : \n",df)

Output :

Given Dataframe is :
             Actor       Film
0   Ranbir_Kapoor   Rockstar
1  Hrithik_Roshan        War
2     Salman_Khan  Tubelight
3  Rani_Mukherjee      Black

 After adding two new columns : 
             Actor       Film    First       Last
0   Ranbir_Kapoor   Rockstar   Ranbir     Kapoor
1  Hrithik_Roshan        War  Hrithik     Roshan
2     Salman_Khan  Tubelight   Salman       Khan
3  Rani_Mukherjee      Black     Rani  Mukherjee

In the output, you can see that the 'Actor' column was split at the underscore into 'First' and 'Last'.
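The three examples follow the same pattern, so it can be wrapped in a small function. The sketch below is only an illustration: split_column and its parameters are hypothetical names, not part of Pandas, and it assumes at most one split per row is wanted (n=1).

```python
import pandas as pd

def split_column(df, column, new_names, sep=None):
    """Hypothetical helper: return a copy of df with `column` split into two new columns."""
    result = df.copy()
    # n=1 limits the split to one cut, giving at most two pieces per row;
    # sep=None means the split is made on whitespace
    result[list(new_names)] = result[column].str.split(sep, n=1, expand=True)
    return result

df = pd.DataFrame({'Actor': ['Ranbir_Kapoor', 'Rani_Mukherjee'],
                   'Film': ['Rockstar', 'Black']})

out = split_column(df, 'Actor', ['First', 'Last'], sep='_')
print(out)
```

Working on a copy leaves the original dataframe unchanged, which keeps the helper safe to call repeatedly with different separators.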

Thanks for going through this article.