Fetch Header information from URL using Python
To work with URLs in Python, the urllib package from the standard library is a common choice. In this post I will show you how to fetch header information from a URL using Python, along with a brief overview of the urllib package.
urllib: URL Handling Module in Python
It is a package that helps you work with URLs. It consists of the following modules:
- urllib.request lets you open and read URLs.
- urllib.error defines the exceptions raised by urllib.request.
- urllib.parse helps you parse URLs.
- urllib.robotparser parses robots.txt files.
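To give a feel for one of these modules, here is a minimal sketch that splits a URL into its parts with urllib.parse. The variable names are my own choice for illustration, and no network access is needed.

```python
from urllib.parse import urlparse

url = "https://en.wikipedia.org/wiki/National_Basketball_Association"

# urlparse only splits the string into components; it does not contact the server.
parts = urlparse(url)

print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /wiki/National_Basketball_Association
```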
I am going to focus on the urllib.request module. It helps you open URLs, read them, and get information from them.
urlopen function
It is one of the functions in urllib.request. It opens the URL you pass to it and returns a response object.
Code :
from urllib.request import urlopen

url = "https://en.wikipedia.org/wiki/National_Basketball_Association"
URL_response = urlopen(url)
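A request can fail, so it may help to wrap urlopen in error handling. The following is only a sketch: try_open is a hypothetical helper name of mine, the 10-second timeout is an arbitrary assumption, and the data: URL is used purely so that the example runs without a network connection.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def try_open(url, timeout=10):
    """Hypothetical helper: report whether the URL could be opened."""
    try:
        # The with statement closes the response once the block is done.
        with urlopen(url, timeout=timeout) as response:
            return True, response.geturl()
    except HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return False, f"HTTP error {err.code}: {err.reason}"
    except URLError as err:
        return False, f"URL error: {err.reason}"


# A data: URL stands in for a real page so the sketch works offline.
print(try_open("data:text/plain,hello"))
# An unsupported scheme is reported as a URLError.
print(try_open("foo://example"))
```

Calling try_open with the Wikipedia URL from the code above should take the same successful path, provided the machine is online.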
Fetch Header information from URL using Python
To retrieve header information from the URL, I have used the headers attribute of the response, which returns the headers as an email-style message object (an http.client.HTTPMessage for HTTP responses).
Code :
URL_response = urlopen(url)
print(URL_response.headers)
Output :
date: Tue, 22 Aug 2023 23:06:13 GMT
vary: Accept-Encoding,Cookie,Authorization
server: ATS/9.1.4
x-content-type-options: nosniff
content-language: en
last-modified: Fri, 18 Aug 2023 10:53:31 GMT
content-type: text/html; charset=UTF-8
age: 30424
x-cache: cp5019 hit, cp5019 hit/41
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp5019"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Last-Access=23-Aug-2023;Path=/;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-Last-Access-Global=23-Aug-2023;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-DP=5c0;Path=/;HttpOnly;secure;Expires=Wed, 23 Aug 2023 00:00:00 GMT
x-client-ip: 117.197.228.199
cache-control: private, s-maxage=0, max-age=0, must-revalidate
set-cookie: GeoIP=IN:WB:Kolkata:22.52:88.38:v4; Path=/; secure; Domain=.wikipedia.org
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
accept-ranges: bytes
content-length: 549872
connection: close
Printing URL_response.headers returned the response headers, including date, content-language, connection, server, and so on.
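If you only need a few values rather than the whole dump, the headers object can be queried by name. Here is a minimal sketch under a few assumptions: fetch_headers and summarize are hypothetical helper names of mine, a HEAD request is used to ask for headers without the page body (some servers may not handle HEAD well), and a data: URL stands in for a real page so the example runs offline.

```python
from urllib.request import Request, urlopen


def fetch_headers(url, method="HEAD", timeout=10):
    """Hypothetical helper: open the URL and return its headers object."""
    # A HEAD request asks the server for the headers only, not the body.
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return response.headers


def summarize(headers):
    """Hypothetical helper: pick a few values out of a headers object."""
    return {
        # Lookups by name are case-insensitive.
        "content_type": headers["Content-Type"],
        # get returns None when the header is missing.
        "content_length": headers.get("content-length"),
        # get_all returns every value of a repeated header such as set-cookie.
        "cookies": headers.get_all("set-cookie", []),
    }


# Offline stand-in for a real web page.
demo_headers = fetch_headers("data:text/plain,hello")
print(summarize(demo_headers))

# items() yields (name, value) pairs, one per header line.
for name, value in demo_headers.items():
    print(f"{name}: {value}")
```

With the Wikipedia URL used earlier, the cookies entry would be expected to hold several values, since set-cookie appears more than once in the output above.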