How to iterate over files in a given directory in Python
In this tutorial, I’ll share some techniques to iterate over the files in a given directory and perform some action on each of them in Python. There are several ways to do this, so let me discuss some of them:
Using os.scandir() function
Since Python 3.5, the os module has included a function called scandir(). It returns an iterator of os.DirEntry objects, one for each file or directory immediately under a given directory, so we can easily scan that directory’s contents. It doesn’t descend into subdirectories.
Let us take an example to understand the concept:
Suppose I want to list the .exe and .pdf files from a specific directory in Python.
# importing os module
import os

# providing the path of the directory
# r = raw string literal
dirloc = r"C:\Users\sourav\Downloads"

# calling scandir() function
for file in os.scandir(dirloc):
    if (file.path.endswith(".exe") or file.path.endswith(".pdf")) and file.is_file():
        print(file.path)
It’ll print the path of the .exe and .pdf files present immediately in the given directory.
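The same idea can be wrapped in a small reusable function. The sketch below is only one possible way to write it: the name list_files_with_suffix and its parameters are hypothetical, and it assumes Python 3.6 or later, where os.scandir() can be used in a with statement so that the iterator is closed promptly.

```python
import os


def list_files_with_suffix(directory, suffixes):
    """Return sorted paths of regular files directly inside `directory`
    whose names end with one of `suffixes`.

    Hypothetical helper, written for illustration only.
    """
    matches = []
    # The with statement closes the scandir iterator when the block ends.
    with os.scandir(directory) as entries:
        for entry in entries:
            # str.endswith accepts a tuple, so one call covers every suffix.
            if entry.is_file() and entry.name.endswith(tuple(suffixes)):
                matches.append(entry.path)
    return sorted(matches)
```

Calling list_files_with_suffix(dirloc, [".exe", ".pdf"]) should return the same paths that the loop above prints, as a sorted list.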
Using os.listdir() function
This function returns a list of the names of the entries (files and directories) immediately present in a given directory. Like the os.scandir() function, it doesn’t work recursively.
Let us take one example to understand the concept:
Suppose I want to list the .iso and .png files from a specific directory.
# importing os module
import os

# providing the path of the directory
# r = raw string literal
dirloc = r"C:\Users\sourav\Downloads"

# calling listdir() function
for file in os.listdir(dirloc):
    if file.endswith(".iso") or file.endswith(".png"):
        # listdir() returns bare names, so join them with the directory
        print(os.path.join(dirloc, file))
It’ll also print the path of the .iso and .png files present immediately in the given directory.
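Because os.listdir() returns plain names for directories as well as files, a folder whose name happens to end in .png would also pass the check above. A minimal sketch of one way to guard against that is shown here; the function name listdir_files is hypothetical, and os.path.isfile() is used to keep regular files only.

```python
import os


def listdir_files(directory, suffixes):
    """Return full paths of regular files in `directory` whose names
    end with one of `suffixes`.

    Hypothetical helper, written for illustration only.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        # listdir() gives bare names, so build the full path first.
        full_path = os.path.join(directory, name)
        # isfile() filters out directories that merely look like files.
        if name.endswith(tuple(suffixes)) and os.path.isfile(full_path):
            paths.append(full_path)
    return paths
```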
To list the files and folders recursively in a given directory, use one of the methods below.
Using os.walk() function
This function is also included in the os module. It visits the given directory and every subdirectory below it, yielding for each one the directory’s path, the names of its subdirectories, and the names of its files.
Let us take an example to understand the concept:
Suppose I want to list the .mp3 and .jpg files from a specific directory and all of its subdirectories.
# importing os module
import os

# calling os.walk() function
# r = raw string literal
# each step yields (current directory path, subdirectory names, file names)
for dirpath, dirnames, filenames in os.walk(r'C:\Users\sourav\Downloads'):
    for file_name in filenames:
        # os.path.join inserts the path separator for us
        file_loc = os.path.join(dirpath, file_name)
        # printing .mp3 and .jpg files recursively
        if file_loc.endswith(".mp3") or file_loc.endswith(".jpg"):
            print(file_loc)
It’ll print the paths of the matching files in the given directory and in all of its subdirectories.
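The endswith() check is case-sensitive, so a file named Song.MP3 would be skipped. The sketch below shows one possible variation that compares lowercased extensions using os.path.splitext(); the generator name walk_files and its parameters are hypothetical.

```python
import os


def walk_files(root, extensions):
    """Yield paths of files under `root` (recursively) whose extension
    matches one of `extensions`, ignoring case.

    Hypothetical helper, written for illustration only.
    """
    wanted = {ext.lower() for ext in extensions}
    # os.walk yields (current directory path, subdirectory names, file names).
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # splitext("Song.MP3") gives ("Song", ".MP3").
            if os.path.splitext(filename)[1].lower() in wanted:
                yield os.path.join(dirpath, filename)
```

Because it is a generator, paths are produced one at a time, which can help when the directory tree is large.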
Using glob.iglob() function
The glob module provides the iglob() function, which returns an iterator over the paths that match a pattern. We can use glob.iglob() to list the files immediately under a given directory as well as recursively.
Let us take one example to understand the concept:
Suppose I want to list the .zip and .exe files immediately from a specific directory.
# importing glob module
import glob

# printing zip files present in the directory
# r = raw string literal
for fileloc in glob.iglob(r'C:\Users\sourav\Downloads\*.zip'):
    print(fileloc)

# printing exe files present in the directory
# r = raw string literal
for fileloc in glob.iglob(r'C:\Users\sourav\Downloads\*.exe'):
    print(fileloc)

# Note: it'll print the files immediately under the directory, not recursively
As noted in the code, this prints only the files immediately under the directory, not those in its subdirectories. The glob module supports the “**” pattern, which matches any number of nested directories, but it only has that meaning when we pass the recursive=True parameter.
Let us take one more example to understand this concept:
Suppose I want to list all the .zip and .exe files recursively from a specific directory.
# importing glob module
import glob

# printing zip files present in the directory
# r = raw string literal
# we have to use the recursive=True parameter for recursive iteration
# we have to use "\**\*" at the end of the directory path for recursive iteration
for fileloc in glob.iglob(r'C:\Users\sourav\Downloads\**\*.zip', recursive=True):
    print(fileloc)

# printing exe files present in the directory
# r = raw string literal
for fileloc in glob.iglob(r'C:\Users\sourav\Downloads\**\*.exe', recursive=True):
    print(fileloc)
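The two loops differ only in the extension, so they can be folded into one helper. This is a minimal sketch rather than a definitive implementation: the name iglob_files is hypothetical, the pattern is built with os.path.join() so the separator is not hard-coded, and glob.escape() is applied on the assumption that the directory path itself should be taken literally.

```python
import glob
import os


def iglob_files(root, extensions):
    """Yield paths under `root` (recursively) that end with one of
    `extensions`, e.g. [".zip", ".exe"].

    Hypothetical helper, written for illustration only.
    """
    for ext in extensions:
        # escape() protects characters such as "[" in the directory path;
        # "**" matches zero or more nested directories when recursive=True.
        pattern = os.path.join(glob.escape(root), "**", "*" + ext)
        yield from glob.iglob(pattern, recursive=True)
```

Since “**” can match zero directories, files sitting directly in the top-level directory are included along with those in subdirectories.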
Using the Path class from the pathlib module
By using the Path class from the pathlib module, we can also iterate over files recursively under a specified directory and list them.
Let us take an example to understand the concept:
Suppose I want to list all the .exe files recursively from a specific directory.
# importing the Path class from the pathlib module
from pathlib import Path

# providing the path of the directory
# r = raw string literal
locations = Path(r'C:\Users\sourav\Downloads').glob('**/*.exe')

for loc in locations:
    # loc is a Path object, not a string
    location_in_string = str(loc)
    print(location_in_string)
It’ll print the path of the .exe files present in the given directory recursively.
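Path objects also have an rglob() method, which behaves like glob() with “**/” added in front of the pattern. The sketch below uses it in a small helper; the name rglob_files is hypothetical, and the is_file() check is an added assumption that directories should be left out.

```python
from pathlib import Path


def rglob_files(root, extension):
    """Return a sorted list of Path objects for regular files under
    `root` (recursively) whose names end with `extension`.

    Hypothetical helper, written for illustration only.
    """
    # rglob("*.exe") is equivalent to glob("**/*.exe").
    return sorted(p for p in Path(root).rglob("*" + extension) if p.is_file())
```

The results stay as Path objects, so attributes such as name, suffix, and parent remain available; call str() on an item only when a plain string is needed.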
I hope now you’re familiar with the concept of how to iterate over files in a given directory in Python.