JSON to Pandas DataFrame in Python

Hey fellow Python coder! In this tutorial, we will explore how to convert JSON data into a Pandas DataFrame.

Let’s get right to it!

What is JSON Data?

JSON stands for “JavaScript Object Notation”. It is a lightweight, text-based data format that makes it easy to exchange data between computers. JSON data is built from key-value pairs: each key is a label, and the value is the data that belongs to that label.
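
To make the key-value idea concrete, here is a minimal sketch that parses a small JSON string with Python's built-in json module. The sample string is made up for illustration and reuses one game from the dataset later in this tutorial.

```python
import json

# A small JSON document as text (illustrative sample only)
json_text = '{"Game": "Hitman", "ReleaseYear": 2016, "Genre": "Stealth"}'

# json.loads() parses JSON text into Python objects (here, a dict)
game = json.loads(json_text)

print(game["Game"])         # the value stored under the key "Game"
print(game["ReleaseYear"])  # JSON numbers become Python numbers
```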

Also Read: Reverse Rows in Pandas DataFrame in Python

Code Implementation of JSON to Pandas DataFrame

The first step is to import the pandas and json libraries, since everything else in this tutorial depends on them.

import pandas as pd
import json

Creating a JSON Dataset

We will start by creating a JSON dataset. To make things a bit fun, the dataset is about games. If you are a gamer, you are probably already familiar with all the games listed in the code.

Our JSON data is a list of dictionaries (key-value pairs), where each dictionary holds the data for one game. I will be using some general information about each game: its name, release year, and genre. We will also include the protagonist and the antagonists of each game. Note that the Antagonists value is itself a list.

json_GameData = [
  {
    "Game": "Tomb Raider",
    "ReleaseYear": 2013,
    "Genre": "Action-Adventure",
    "Protagonist": "Lara Croft",
    "Antagonists": ["Himiko", "Mathias", "Trinity"]
  },
  {
    "Game": "Life is Strange",
    "ReleaseYear": 2015,
    "Genre": "Graphic Adventure",
    "Protagonist": "Max Caulfield",
    "Antagonists": ["Mark Jefferson", "Nathan Prescott"]
  },
  {
    "Game": "Uncharted",
    "ReleaseYear": 2007,
    "Genre": "Action-Adventure, Third-Person Shooter",
    "Protagonist": "Nathan Drake",
    "Antagonists": ["Gabriel Roman", "Atoq Navarro", "Zoran Lazarević"]
  },
  {
    "Game": "Resident Evil",
    "ReleaseYear": 1996,
    "Genre": "Survival Horror",
    "Protagonist": "Various",
    "Antagonists": ["Albert Wesker", "Nemesis", "Umbrella Corporation"]
  },
  {
    "Game": "Hitman",
    "ReleaseYear": 2016,
    "Genre": "Stealth",
    "Protagonist": "Agent 47",
    "Antagonists": ["Erich Soders", "Providence"]
  }
]

That is the whole dataset. I took some help from the internet to get the relevant names and data. You can include more games or modify the data according to your preferences! Before converting the dataset to a Pandas DataFrame, let’s also see how to load the same data from a file.

Importing a JSON Dataset from an External .json File

Now that we have our JSON data, let’s also consider the case where the data lives in an external file with a .json extension and we want to load it into our Python code. We can do this with the help of the json library.

First of all, let’s see what the data looks like in an external file. The .json file contains the same list of game records that we defined above, stored as plain text.
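
If you do not have such a file yet, one way to create it is to write the dataset out with json.dump(). This is only a minimal sketch: the two records are a shortened copy of the dataset above, and the file name GamingData.json matches the one used in the loading code.

```python
import json

# Shortened copy of the dataset above (two records, for brevity)
json_GameData = [
    {"Game": "Tomb Raider", "ReleaseYear": 2013, "Genre": "Action-Adventure",
     "Protagonist": "Lara Croft", "Antagonists": ["Himiko", "Mathias", "Trinity"]},
    {"Game": "Hitman", "ReleaseYear": 2016, "Genre": "Stealth",
     "Protagonist": "Agent 47", "Antagonists": ["Erich Soders", "Providence"]},
]

# Write the list to disk as JSON text; indent=2 keeps the file readable
with open("GamingData.json", "w", encoding="utf-8") as file:
    json.dump(json_GameData, file, indent=2, ensure_ascii=False)
```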

To import the data, we will use the open() and json.load() functions: open() opens the file so that we can operate on it, and json.load() reads the JSON data from the file into Python objects. Have a look at the code below. It loads the same dataset into the same variable, json_GameData.

import json

# Open the file for reading; UTF-8 handles names such as "Lazarević"
with open("GamingData.json", "r", encoding="utf-8") as file:
    json_GameData = json.load(file)

print(json_GameData)

If the source of your JSON data is a URL, you can refer to the following post: Parse JSON from URL in Python
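
As an aside, pandas can also read a JSON file straight into a DataFrame with pd.read_json(), without going through the json module first. The sketch below is an alternative, not the approach this tutorial follows. It first writes a tiny sample file (sample_games.json is a hypothetical file name) so that it runs on its own.

```python
import json
import pandas as pd

# Write a tiny sample file so the example is self-contained
# (sample_games.json is a hypothetical file name)
sample = [
    {"Game": "Tomb Raider", "ReleaseYear": 2013},
    {"Game": "Hitman", "ReleaseYear": 2016},
]
with open("sample_games.json", "w", encoding="utf-8") as file:
    json.dump(sample, file)

# read_json() parses the file and builds the DataFrame in one step
sample_df = pd.read_json("sample_games.json")
print(sample_df)
```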

Converting JSON to DataFrame

The Pandas library offers more than one way to convert JSON data to a DataFrame. We will look at the following two functions (pd is the usual alias for the pandas library):

  1. pd.DataFrame()
  2. pd.json_normalize()

Using pd.DataFrame() function

The DataFrame() function is a general-purpose way to build a DataFrame from a variety of data types; in this case, we pass it our JSON data, which is a list of dictionaries. The head() function displays the first 5 rows by default.

game_DataFrame = pd.DataFrame(json_GameData)
game_DataFrame.head()

After execution, the DataFrame shows one row per game and one column per key. You can see how much more organized the data looks in tabular form.
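
To check the result without relying on the rendered table, you can inspect the shape and the column names. Here is a minimal, self-contained sketch that uses a shortened copy of the dataset (two records only):

```python
import pandas as pd

# Shortened copy of the dataset above (two records, for brevity)
json_GameData = [
    {"Game": "Tomb Raider", "ReleaseYear": 2013, "Genre": "Action-Adventure",
     "Protagonist": "Lara Croft", "Antagonists": ["Himiko", "Mathias", "Trinity"]},
    {"Game": "Hitman", "ReleaseYear": 2016, "Genre": "Stealth",
     "Protagonist": "Agent 47", "Antagonists": ["Erich Soders", "Providence"]},
]

game_DataFrame = pd.DataFrame(json_GameData)

print(game_DataFrame.shape)          # (number of rows, number of columns)
print(list(game_DataFrame.columns))  # column names come from the dictionary keys
print(game_DataFrame["Antagonists"].iloc[0])  # each cell in this column holds a list
```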

Using pd.json_normalize() function

The json_normalize() function might be new to you. It converts JSON data into a flat, table-like structure. Let’s check how the function works!

game_DataFrame = pd.json_normalize(json_GameData)
game_DataFrame.head()

After execution, the output looks the same as before: one row per game and one column per key.

With both methods, the resulting DataFrame has one column for each key in the JSON data, and each row holds the information about one game.
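
The two functions stop giving the same result once a record contains a nested dictionary. The sketch below regroups one record from the dataset under a hypothetical Characters key (this nesting is not part of the original data) to show the difference: pd.DataFrame() keeps the nested dictionary in a single column, while pd.json_normalize() expands it into separate columns with dotted names.

```python
import pandas as pd

nested_GameData = [
    {
        "Game": "Tomb Raider",
        "Characters": {  # hypothetical nesting, for illustration only
            "Protagonist": "Lara Croft",
            "Antagonists": ["Himiko", "Mathias", "Trinity"],
        },
    }
]

plain_df = pd.DataFrame(nested_GameData)      # nested dict stays in one column
flat_df = pd.json_normalize(nested_GameData)  # nested dict becomes several columns

print(list(plain_df.columns))
print(list(flat_df.columns))
```

Lists such as Antagonists are left as they are by default; only nested dictionaries are expanded.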

Conclusion

In conclusion, converting JSON data into a Pandas DataFrame is easy with either the pd.DataFrame() or the pd.json_normalize() function. Hopefully, the next time you see JSON data and are asked to convert it to a DataFrame, you won’t sit there confused!

Also Read:

  1. Dictionary to Pandas DataFrame in Python
  2. Print Pandas DataFrames without Index in Python
  3. Select rows from Pandas Dataframe Based On Column Values

Happy Coding!
