Fetch Header information from URL using Python
To work with URLs in Python, the standard-library urllib package is a common choice. In this post I will show you how to fetch header information from a URL using Python, after a brief overview of the urllib package.
urllib: URL Handling Module in Python
It is a package that helps you work with URLs. It contains the following modules:
- urllib.request lets you open and read URLs.
- urllib.error defines the exceptions raised by the request module.
- urllib.parse helps you parse URLs.
- urllib.robotparser parses robots.txt files.
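As a quick illustration of one of these modules, here is a minimal sketch of urllib.parse splitting a URL into its components. It reuses the Wikipedia URL that appears later in this post; the variable name parts is just an illustrative choice.

```python
from urllib.parse import urlparse

# The same Wikipedia URL that is opened later in this post.
url = "https://en.wikipedia.org/wiki/National_Basketball_Association"

# urlparse only splits the string; it does not contact the server.
parts = urlparse(url)

print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /wiki/National_Basketball_Association
```

Because urlparse works purely on the string, it is safe to run offline.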
I am going to focus on the urllib.request module. It lets you open URLs provided by the user, read them, and get information from them.
urlopen function
urlopen is one of the functions in urllib.request. It opens a URL and returns a response object.
Code :
from urllib.request import urlopen

url = "https://en.wikipedia.org/wiki/National_Basketball_Association"
URL_response = urlopen(url)
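A request can fail, for example when the page does not exist or the network is down. The following is a minimal sketch, not part of the original example, of wrapping urlopen in a with-block and catching the exceptions from urllib.error. The helper name open_status and the 10-second timeout are assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def open_status(url, timeout=10):
    """Open url and return a short description of what happened.

    open_status is a hypothetical helper name; timeout=10 is an arbitrary choice.
    """
    try:
        # The with-block closes the response once we are done with it.
        with urlopen(url, timeout=timeout) as response:
            return f"opened {response.url}"
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return f"HTTP error {err.code}"
    except URLError as err:
        # No proper answer was received (unknown scheme, DNS failure, ...).
        return f"failed: {err.reason}"
```

Calling open_status(url) with the Wikipedia URL above should return a string starting with "opened" when the request succeeds.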
Fetch Header information from URL using Python
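To retrieve the header information from the response, I use its headers attribute. It is an attribute rather than a function, and it holds the response headers in an email-style message object (an http.client.HTTPMessage for HTTP and HTTPS URLs).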
Code :
URL_response = urlopen(url)
print(URL_response.headers)
Output :
date: Tue, 22 Aug 2023 23:06:13 GMT
vary: Accept-Encoding,Cookie,Authorization
server: ATS/9.1.4
x-content-type-options: nosniff
content-language: en
last-modified: Fri, 18 Aug 2023 10:53:31 GMT
content-type: text/html; charset=UTF-8
age: 30424
x-cache: cp5019 hit, cp5019 hit/41
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp5019"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Last-Access=23-Aug-2023;Path=/;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-Last-Access-Global=23-Aug-2023;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-DP=5c0;Path=/;HttpOnly;secure;Expires=Wed, 23 Aug 2023 00:00:00 GMT
x-client-ip: 117.197.228.199
cache-control: private, s-maxage=0, max-age=0, must-revalidate
set-cookie: GeoIP=IN:WB:Kolkata:22.52:88.38:v4; Path=/; secure; Domain=.wikipedia.org
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
accept-ranges: bytes
content-length: 549872
connection: close
Printing URL_response.headers returned a good deal of information, including the date, content-language, connection, and server headers.
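Often you only need one or two of these values rather than the whole printout. The sketch below shows one way to look up individual headers, and to collect a header that appears several times, such as set-cookie. The helper names pick_headers and all_values are hypothetical, and the sample object is an offline stand-in for URL_response.headers, filled with a few values copied from the output above so the code runs without a network call.

```python
from email.message import Message


def pick_headers(headers, names=("content-type", "content-language", "server")):
    """Return a dict of the requested header values (None when a header is absent)."""
    # Header lookups are case-insensitive.
    return {name: headers.get(name) for name in names}


def all_values(headers, name):
    """Return every value of a repeated header, such as set-cookie, as a list."""
    return headers.get_all(name) or []


# Offline stand-in for URL_response.headers (an assumption for this sketch).
sample = Message()
sample["content-type"] = "text/html; charset=UTF-8"
sample["content-language"] = "en"
sample["set-cookie"] = "NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600"
sample["set-cookie"] = "WMF-DP=5c0;Path=/;HttpOnly;secure;Expires=Wed, 23 Aug 2023 00:00:00 GMT"

print(pick_headers(sample))
print(all_values(sample, "Set-Cookie"))
```

With a real response, you would pass URL_response.headers in place of sample.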