Fetch Header information from URL using Python

In Python, the standard library's urllib package is the usual choice for working with URLs. In this post I will show you how to fetch the header information of a URL using Python, starting with a brief overview of the urllib package.

urllib: URL Handling Module in Python

It is a package that helps you work with URLs. It contains the following modules:

  • urllib.request lets you open and read URLs.
  • urllib.error defines the exceptions raised by urllib.request.
  • urllib.parse helps you parse URLs.
  • urllib.robotparser parses robots.txt files.
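To make the list a little more concrete, here is a small sketch of urllib.parse and urllib.error that runs without any network access. The relative link "Boston_Celtics" is only an illustrative example I picked, not something the rest of this post depends on.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse

url = "https://en.wikipedia.org/wiki/National_Basketball_Association"

# urllib.parse: split the URL into its components (no network access needed)
parts = urlparse(url)
print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /wiki/National_Basketball_Association

# urllib.parse: resolve a relative link against the page URL
# ("Boston_Celtics" is just an illustrative example)
print(urljoin(url, "Boston_Celtics"))  # https://en.wikipedia.org/wiki/Boston_Celtics

# urllib.error: HTTPError is a subclass of URLError,
# so an except clause for HTTPError has to come before one for URLError
print(issubclass(HTTPError, URLError))  # True
```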

I am going to focus on the urllib.request module. It lets you open a URL, read the response, and get information from it.

urlopen function

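It is one of the functions in urllib.request. It opens the URL you pass to it and, for HTTP and HTTPS URLs, returns an http.client.HTTPResponse object from which you can read the page and its headers.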

Code

from urllib.request import urlopen

url = "https://en.wikipedia.org/wiki/National_Basketball_Association"

# Send a GET request; for http/https URLs this returns an http.client.HTTPResponse
URL_response = urlopen(url)
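The snippet above never closes the response and raises an exception if the site cannot be reached. A slightly more defensive version might look like the sketch below. The function name fetch_body and the 10-second timeout are my own choices for illustration, not part of urllib.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Open url, read the whole body, and return it; return None on failure."""
    try:
        # The with-block closes the connection even if read() raises.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # (Caught first because HTTPError is a subclass of URLError.)
        print(f"HTTP error {err.code}: {err.reason}")
    except URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme, ...
        print(f"Could not reach {url}: {err.reason}")
    return None
```

For example, fetch_body(url) should return the raw HTML of the Wikipedia page as bytes, or None if the request failed.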

Fetch Header information from URL using Python

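In order to retrieve the header information from the URL, I used the response's headers attribute (an attribute, not a function). It holds the response headers as an http.client.HTTPMessage object, a subclass of email.message.Message, and printing it lists one header per line.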

Code

URL_response = urlopen(url)
# headers contains every header field the server sent back
print(URL_response.headers)

Output

date: Tue, 22 Aug 2023 23:06:13 GMT
vary: Accept-Encoding,Cookie,Authorization
server: ATS/9.1.4
x-content-type-options: nosniff
content-language: en
last-modified: Fri, 18 Aug 2023 10:53:31 GMT
content-type: text/html; charset=UTF-8
age: 30424
x-cache: cp5019 hit, cp5019 hit/41
x-cache-status: hit-front
server-timing: cache;desc="hit-front", host;desc="cp5019"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
set-cookie: WMF-Last-Access=23-Aug-2023;Path=/;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-Last-Access-Global=23-Aug-2023;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sun, 24 Sep 2023 00:00:00 GMT
set-cookie: WMF-DP=5c0;Path=/;HttpOnly;secure;Expires=Wed, 23 Aug 2023 00:00:00 GMT
x-client-ip: 117.197.228.199
cache-control: private, s-maxage=0, max-age=0, must-revalidate
set-cookie: GeoIP=IN:WB:Kolkata:22.52:88.38:v4; Path=/; secure; Domain=.wikipedia.org
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
accept-ranges: bytes
content-length: 549872
connection: close

When I printed URL_response.headers, it returned every header the server sent, including the date, content-language, connection, and server fields, along with caching details and cookies.
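Printing the whole object is handy for a quick look, but usually you want particular fields. Because headers behaves like an email.message.Message, you can look fields up by name (case-insensitively) and use get_all() for headers that repeat, such as set-cookie. The sketch below also sends a HEAD request, which asks the server for the headers without the page body. The helper names fetch_headers and summarize_headers are hypothetical, and some servers may reject HEAD, in which case method="GET" is the fallback.

```python
from email.message import Message
from urllib.request import Request, urlopen


def fetch_headers(url: str, method: str = "HEAD", timeout: float = 10.0) -> Message:
    """Return the response headers for url.

    HEAD asks the server for the headers only, so no body is downloaded.
    Pass method="GET" for servers that reject HEAD requests.
    """
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return response.headers


def summarize_headers(headers: Message) -> dict:
    """Pick a few fields out of a header object (lookups ignore case)."""
    return {
        "date": headers.get("Date"),
        "server": headers.get("Server"),
        # Media type without parameters, e.g. "text/html"
        "content_type": headers.get_content_type(),
        # charset parameter of Content-Type, lowercased, e.g. "utf-8"
        "charset": headers.get_content_charset(),
        "content_length": (
            int(headers["Content-Length"]) if "Content-Length" in headers else None
        ),
        # Set-Cookie can appear several times; get() would return only the first.
        "cookies": headers.get_all("Set-Cookie", []),
    }
```

For the page in this post, summarize_headers(fetch_headers(url)) should give fields like the date, server, content type, and charset shown in the output above, plus the list of set-cookie values; the exact values will differ from request to request.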
