Download all images of a webpage using Python
Whenever you visit a webpage, you may come across different types of content, ranging from text to images, audio to videos. Sometimes, you just want to read the content and catch a glimpse of the information. Other times, you might want to save the information on the page for later reference.
Consider a case where you want to download all the images from a webpage. Individually downloading all of them is not just a lot of manual work but also very time-consuming and inefficient. But guess what, you can solve this by using Python. In this tutorial, you will learn how to download all images of a webpage using Python.
The technique to download all images of a webpage using Python: Web Scraping
Web scraping is basically a method for extracting data from websites. This data can be in any form: text, images, audio, video, etc.
In web scraping, you directly extract the underlying HTML code of the website. You can then use this code to retrieve the required webpage data.
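For instance, fetching the raw HTML that web scraping works on takes only a couple of lines with the requests library (a minimal sketch; the URL is just an example page):

import requests

# Fetch the page and look at the raw HTML that the scraper will later parse
response = requests.get('https://www.codespeedy.com/')
print(response.text[:200])  # print the first 200 characters of the page's HTML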
Now, let us learn how to extract images from a webpage using this technique in Python.
Installing necessary modules:
- re – the regular expressions module of Python; it supports matching strings against patterns specified by a set of rules.
- requests – this module is used to send HTTP requests to a server.
- bs4 – this provides the BeautifulSoup library, which enables the extraction of data from HTML/XML files, usually by working with a parser.
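If requests or bs4 is not already available, it can typically be installed with pip (for example, pip install requests beautifulsoup4); re ships with Python's standard library. The short sketch below, in which the HTML snippet is a made-up example, shows what each module contributes:

import re
from bs4 import BeautifulSoup

# bs4: parse a small HTML string and locate the <img> tags in it
html = '<html><body><img src="/wp-content/pic_one.png"><img src="logo.svg"></body></html>'
soup = BeautifulSoup(html, 'html.parser')
print([img['src'] for img in soup.find_all('img')])  # ['/wp-content/pic_one.png', 'logo.svg']

# re: check whether a URL ends in a supported image file name
match = re.search(r'/([\w_-]+[.](jpg|gif|png))$', '/wp-content/pic_one.png')
print(match.group(1) if match else 'no match')  # pic_one.png

# requests: requests.get(url) sends an HTTP GET request to a server,
# as shown in the full script below (it needs a network connection).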
A simple code to perform the download:
import re
import requests
from bs4 import BeautifulSoup

# Webpage whose images we want to download
site = 'https://www.codespeedy.com/'

# Send a GET request and parse the returned HTML
response = requests.get(site)
soup = BeautifulSoup(response.text, 'html.parser')

# Pull out every <img> tag and collect its src attribute
image_tags = soup.find_all('img')
urls = [img['src'] for img in image_tags]

for url in urls:
    # Keep only URLs that end in a supported image file name
    filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
    if not filename:
        print("Regular expression didn't match with the url: {}".format(url))
        continue
    with open(filename.group(1), 'wb') as f:
        # Turn relative src values into absolute URLs before downloading
        if 'http' not in url:
            url = '{}{}'.format(site, url)
        response = requests.get(url)
        f.write(response.content)

print("Download complete, downloaded images can be found in current directory!")
Download complete, downloaded images can be found in current directory!
In the above code:
- Firstly, you import all the necessary modules, as mentioned earlier.
- Next, you must specify the address of the webpage from which you want to download all the images.
- You can then send a GET request to the specified URL, requesting resources.
- Once this is done, you can use BeautifulSoup to implement web scraping. It works with the parser to extract the HTML of the page, from which you can then pull out all the image tags.
- Once you get the image tags, extract the src attribute of each image, which specifies the URL of the image source. You must then iterate through all these source URLs and verify their formats; a more general way of resolving relative URLs is sketched just after this list.
- Finally, you can write the image file into the current directory, thereby completing the download.
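The simple 'http' check in the code above works for this example site, but if you need a more general way of turning relative src values into absolute URLs, the standard library's urllib.parse.urljoin handles the common cases. A minimal sketch, assuming a few made-up example paths:

from urllib.parse import urljoin

site = 'https://www.codespeedy.com/'

# urljoin copes with absolute URLs, root-relative paths and plain relative paths alike
for src in ['https://cdn.example.com/a.png', '/wp-content/b.jpg', 'images/c.gif']:
    print(urljoin(site, src))
# https://cdn.example.com/a.png
# https://www.codespeedy.com/wp-content/b.jpg
# https://www.codespeedy.com/images/c.gif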
A more detailed code:
A more inclusive version of the code takes the URL as explicit input, downloads the images into a new folder specified by the user, and keeps track of the number of images found on the site:
from bs4 import BeautifulSoup
import requests
import os

# Ask the user for a folder name, create it and hand over to the download routine
def folder_create(images):
    folder_name = input("Enter name of folder: ")
    os.mkdir(folder_name)
    download_images(images, folder_name)

# Download every image's src into the given folder and report progress
def download_images(images, folder_name):
    count = 0
    print(f"Found {len(images)} images")
    if len(images) != 0:
        for i, image in enumerate(images):
            image_link = image["src"]
            r = requests.get(image_link).content
            with open(f"{folder_name}/images{i+1}.jpg", "wb+") as f:
                f.write(r)
            count += 1
    if count == len(images):
        print("All the images have been downloaded!")
    else:
        print(f"{count} images have been downloaded out of {len(images)}")

# Fetch the page, parse it and collect all <img> tags
def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img')
    folder_create(images)

url = input("Enter site URL:")
main(url)
Enter site URL:https://www.codespeedy.com/
Enter name of folder: abc
Found 13 images
All the images have been downloaded!
A folder named abc is created in the current directory, and the images are downloaded into that folder.
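One thing to keep in mind: os.mkdir raises a FileExistsError if the folder already exists. If you would rather reuse an existing folder, os.makedirs with exist_ok=True is a common alternative; a small sketch, not part of the original script:

import os

folder_name = "abc"                      # example folder name, matching the run above
os.makedirs(folder_name, exist_ok=True)  # unlike os.mkdir, this does not fail if the folder already exists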