Convert HTML to JSON in Python

Hello friends, today we are going to learn how to convert HTML to JSON using Python. HTML, or HyperText Markup Language, is the standard markup language for documents designed to be displayed in a web browser. It describes the structure and meaning of the content, using markup to annotate text, images, tables and other elements. Some common HTML elements are <head>, <title>, <body> and <header>. HTML element names are not case-sensitive.

JSON, or JavaScript Object Notation, is a standard text format for representing structured data, based on JavaScript object syntax. It is human-readable and is mainly used for storing data and for transferring it between web applications and servers.

Convert HTML to JSON with the html-to-json package

We are now going to convert an HTML string to JSON format. First, open Command Prompt on Windows or a terminal on macOS/Linux. Then run the following command to install the html-to-json package, which performs the conversion.

 

pip install html-to-json

Import the html_to_json module in your code. Note that the package is installed as html-to-json but imported with underscores.

import html_to_json

Here is an example in which a multi-line HTML string is stored in the variable html_Str. Call the convert() function and pass html_Str as its argument to convert the HTML string.

Code :

import html_to_json
import json

html_Str = """<head>
    <title>This is Codespeedy</title>
    <meta charset="UTF-8">
    <meta name="description" content="We are software development & app development company">
    </head>"""
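# convert() parses the HTML and returns a Python dictionary
json_Str = html_to_json.convert(html_Str)

# Serialize the dictionary to a JSON string with 2-space indentation
formatted_Str = json.dumps(json_Str, indent=2)

print(formatted_Str)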

The convert() function returns a Python dictionary that mirrors the structure of the HTML. Printing formatted_Str shows the equivalent JSON.

Output :

{
  "head": [
    {
      "title": [
        {
          "_value": "This is Codespeedy"
        }
      ],
      "meta": [
        {
          "_attributes": {
            "charset": "UTF-8"
          }
        },
        {
          "_attributes": {
            "name": "description",
            "content": "We are software development & app development company"
          }
        }
      ]
    }
  ]
}
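Once the HTML has been converted, individual values can be read with ordinary dictionary and list indexing. The sketch below is only an illustration: so that it runs without the package installed, it uses a dictionary literal copied from the output above in place of the value returned by convert(), and the variable names (converted, head, meta_attributes, description) are my own.

```python
# Structure copied from the output above; in a real script this would be
# the value returned by html_to_json.convert(html_Str).
converted = {
    "head": [
        {
            "title": [{"_value": "This is Codespeedy"}],
            "meta": [
                {"_attributes": {"charset": "UTF-8"}},
                {
                    "_attributes": {
                        "name": "description",
                        "content": "We are software development & app development company",
                    }
                },
            ],
        }
    ]
}

# Each tag name maps to a list, because a tag may occur more than once.
head = converted["head"][0]

# Text content is stored under the "_value" key.
title = head["title"][0]["_value"]

# Tag attributes are stored under the "_attributes" key.
meta_attributes = [meta["_attributes"] for meta in head["meta"]]

# Pick the content of the meta tag whose name is "description", if any.
description = next(
    (attrs["content"] for attrs in meta_attributes if attrs.get("name") == "description"),
    None,
)

print(title)
print(description)
```

Because every tag maps to a list, the [0] index is needed even when a tag appears only once.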

I’ve included the json module and its dumps() function in the example code. Here, dumps() takes the dictionary returned by convert() and an indent value as arguments, and returns a formatted JSON string.
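To see what dumps() does on its own, here is a small sketch that uses only the standard json module. The data dictionary is a made-up fragment shaped like the converted output, not something produced by html-to-json here.

```python
import json

# Hypothetical fragment shaped like the converted output
data = {"title": [{"_value": "This is Codespeedy"}]}

# Without indent, dumps() produces a single-line JSON string
compact = json.dumps(data)

# With indent=2, nested levels are placed on separate lines
pretty = json.dumps(data, indent=2)

# Either way the result is a str, which loads() can parse back into a dict
restored = json.loads(pretty)

print(compact)
print(pretty)
print(restored == data)
```

The indent value only changes how the string looks; both strings carry the same data.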
