Remove empty rows from excel file using openpyxl Python

In this tutorial, let’s see how to remove empty rows from an excel file using openpyxl in Python. openpyxl is a popular Python library used in reading or writing an excel file with extensions of xlsx or xlsm or xltx, or xltm. Using this library, users can make changes like modifying data, formatting, styling etc in an excel file using Python instead of manually making those changes. This library is especially useful when we are dealing with an excel file with a huge amount of data.

First, install the openpyxl library in your system using pip.

pip install openpyxl

You can check: How to install openpyxl in Python

Once the library is installed, import it into the python file you are working in.

import openpyxl

Initial Setup

We need an excel file to work on, so create or use an excel file with some rows and columns filled as shown below.

openpyxl sheet

Now load the excel file using its location in the file system in openpyxl.

data = 'data.xlsx' 
wb = openpyxl.load_workbook(data)

 

Select a sheet from the excel file. Here I am using the active sheet as I have only one sheet in the file, and it is the active one.

ws = wb.active

Remove empty rows from an excel file using openpyxl in Python

As we have everything we need, let’s write the code to delete all empty rows from the excel file.

indx = []
for i in range(len(tuple(ws.rows))):
    flag = False
    for cell in tuple(ws.rows)[i]:
        if cell.value != None:
            flag = True
            break
    if flag == False:
        indx.append(i)

indx.sort()
for i in range(len(indx)):
    ws.delete_rows(idx = indx[i]+1-i)

newData = 'newData.xlsx'
wb.save(newData)

In the above code, I declared an empty list called indx to store all empty row indexes. At each iteration of a row, declare a variable called flag with the value False to determine whether to delete the current row or not. Next, iterate through each row and then each cell in the row and check if at least one cell is not empty in the row. If we have at least one nonempty cell, change the value of the flag to True and break out of the loop. After looping through each row, check if the flag is still False. If it is, then add the row index to the indx list.

Here tuple(ws.rows) is used to convert the worksheet into a tuple of tuples where each sub-tuple is a row. It enables us to iterate over the rows and also access them.

Once we added all the empty rows indexes to the list, sort it using the sort() method. If we delete a row in the excel file, then indexes after the deleted row will reduce by one, so by sorting the list, we can get this property for all the elements in the list. Now iterate through the indx list and use the delete_rows() method to delete an empty row. We are passing idx argument in the delete_row() method with the value indx[i]+1-i. We are adding one as the Python list is 0 index-based whereas excel is 1 index-based and subtracting i as with each row deletion index of the next rows will be reduced by one.

At last, we are creating a new excel file called newData using the save() method.

Output:

Remove empty rows from excel file using openpyxl Python

Leave a Reply

Your email address will not be published. Required fields are marked *