Append a Table to an Existing HDF File – Python

Hey fellow Python coder! In this tutorial, we will understand HDF files and learn how to add a new Table to an existing HDF file using Python programming. We will be covering the following topics in this tutorial:

Introduction to HDF Files
- Advantages of HDF Files
Creating a new HDF5 File
- Writing Data to HDF File
- Reading Data from HDF File
Learning how to append Tables to HDF Files
- Importance of tables in HDF Files.
- Code Implementation to append Tables to HDF Files

Introduction to HDF Files

HDF stands for Hierarchical Data Format. It is a file format for storing and organizing large amounts of data, and it is not tied to Python: libraries for it exist in many languages. HDF files are commonly used in fields such as data analysis because they handle large, complex data structures efficiently.

Some of the advantages of using HDF files are as follows:

These files support a variety of data types, from simple values to complex data structures.
These files can be shared across different platforms and programming languages.
These files can hold datasets of widely varying sizes, from very small to very large.
These files let you store and access large datasets efficiently, for example by reading only the part you need.
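To make the "hierarchical" part concrete, here is a minimal sketch using h5py. The file name, group path, dataset name, and attribute are all made up for illustration; the file is written to a temporary folder so it does not interfere with the rest of the tutorial.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, kept away from the tutorial's own file.
demo_path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.h5")

with h5py.File(demo_path, "w") as f:
    # Groups behave like folders and can be nested.
    run = f.create_group("experiment/run_1")
    # Datasets behave like NumPy arrays stored on disk.
    run.create_dataset("temperature", data=np.arange(5, dtype="f8"))
    # Attributes hold small pieces of metadata.
    run.attrs["units"] = "celsius"

with h5py.File(demo_path, "r") as f:
    names = []
    f.visit(names.append)  # collect the path of every group and dataset
    units = f["experiment/run_1"].attrs["units"]

print(names)
print(units)
```

The printed list shows the nested layout: a group, a subgroup inside it, and a dataset inside that.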

Creating a new HDF5 File

To work with HDF5 files, we will use the h5py library in Python. If it is not installed on your system, install it with pip (pip install h5py), then create a new file using the code snippet below. To create a new file, we open it in write mode ('w'), which also overwrites any existing file with the same name.

import h5py
newHDF_File = h5py.File('codespeedy.h5', 'w')
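The mode string matters when you come back to a file later. The sketch below compares 'w', 'a', and 'x' on a hypothetical scratch file (the file and dataset names are assumptions for illustration, not part of the tutorial's file).

```python
import os
import tempfile

import h5py

# Hypothetical scratch file used only for this comparison.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.h5")

# 'w' creates the file, replacing it if it already exists.
with h5py.File(demo_path, "w") as f:
    f.create_dataset("first", data=[1, 2, 3])

# 'a' opens for reading and writing and keeps what is already stored.
with h5py.File(demo_path, "a") as f:
    f.create_dataset("second", data=[4, 5, 6])
    kept = sorted(f.keys())

# 'x' refuses to touch a file that already exists.
exclusive_failed = False
try:
    h5py.File(demo_path, "x")
except OSError:
    exclusive_failed = True

# 'w' again: the earlier contents are gone.
with h5py.File(demo_path, "w") as f:
    after_truncate = list(f.keys())

print(kept, exclusive_failed, after_truncate)
```

This is why the later steps of the tutorial use append mode when adding to codespeedy.h5: reopening it with 'w' would wipe the dataset written earlier.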

Writing Data to HDF File

After creating the file, let’s add some data to it using the simple code snippet below. For this tutorial, let’s generate a 5×5 matrix of random values with the numpy module and store it in the HDF file as a new dataset.

import numpy as np
sampleData = np.random.rand(5, 5)
DATA = newHDF_File.create_dataset('Sample_Data', data=sampleData)
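create_dataset also accepts optional settings. As a rough sketch, the snippet below stores the same kind of 5×5 matrix with gzip compression and attaches a short description as an attribute. The scratch file, the dataset name, and the attribute text are assumptions made for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from codespeedy.h5.
demo_path = os.path.join(tempfile.mkdtemp(), "write_demo.h5")
sampleData = np.random.rand(5, 5)

with h5py.File(demo_path, "w") as f:
    # compression='gzip' asks HDF5 to compress the data on disk.
    ds = f.create_dataset("Compressed_Data", data=sampleData, compression="gzip")
    # Attributes are a handy place for small notes about a dataset.
    ds.attrs["description"] = "random 5x5 sample"

with h5py.File(demo_path, "r") as f:
    ds = f["Compressed_Data"]
    compression = ds.compression
    shape = ds.shape
    stored = ds[:]  # decompression happens transparently on read

print(compression, shape)
```

For a matrix this small, compression makes little difference; it is shown here only to illustrate where such options go.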

Reading Data from HDF File

To read the dataset from an HDF file, we first open the file in read mode, then load Sample_Data from it and print what we read. Have a look at the code snippet below.

HDF_File_Read = h5py.File('codespeedy.h5', 'r')

ReadDataset = HDF_File_Read['Sample_Data']
print(ReadDataset)

But this prints only a description of the dataset object, not its contents:

<HDF5 dataset "Sample_Data": shape (5, 5), type "<f8">

This is not what we wanted, so let’s load the stored values using slicing, as shown in the code below.

print(ReadDataset[:])

This prints the stored values (yours will differ, since the data is random):

[[0.31633171 0.22529574 0.32318286 0.64615849 0.2032933 ]
 [0.35067536 0.53707505 0.39550888 0.96691937 0.69121246]
 [0.42437762 0.74815193 0.48438767 0.95991042 0.5048299 ]
 [0.33808803 0.83252701 0.91295353 0.47600387 0.023969  ]
 [0.38987256 0.3877606  0.87817955 0.85761328 0.30519223]]
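Slicing does not have to load everything. The sketch below, on a hypothetical scratch file with a dataset named grid, reads just one row and one corner of a 5×5 dataset; fixed values are used instead of random ones so the result is easy to check.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file and dataset name for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "read_demo.h5")

with h5py.File(demo_path, "w") as f:
    f.create_dataset("grid", data=np.arange(25).reshape(5, 5))

with h5py.File(demo_path, "r") as f:
    ds = f["grid"]
    shape = ds.shape       # metadata, available without reading values
    first_row = ds[0, :]   # reads only the first row
    corner = ds[:2, :2]    # reads only the top-left 2x2 block

print(shape)
print(first_row)
print(corner)
```

Each slice comes back as a regular NumPy array, so it can be used after the file is closed.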

Now we are able to read and write HDF files. Let’s move on to the next sections, where we will learn to append tables to the file. But before making any further changes, let’s close both handles we opened on the file.

newHDF_File.close()
HDF_File_Read.close()
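An alternative to calling close() by hand is a with block, which closes the file automatically even if an error occurs inside it. Here is a small sketch on a hypothetical scratch file:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "close_demo.h5")

with h5py.File(demo_path, "w") as f:
    f.create_dataset("Sample_Data", data=np.random.rand(5, 5))
    open_inside = bool(f)   # an open h5py file is truthy

# Leaving the with block closed the file for us.
closed_after = not bool(f)

print(open_inside, closed_after)
```

With this pattern, the write handle is always closed before the file is reopened for reading or appending.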

Learning how to append Tables to HDF Files

Inserting tables in HDF files helps a developer organize data in a structured way. Appending a table to an HDF file has several advantages, some of which are listed below:

Tables allow you to organize data into rows and columns, making it easy to read, understand, and work with.
Tables let developers run queries that return subsets of a dataset based on a condition.
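As a rough illustration of the query advantage, the sketch below uses pandas.HDFStore (the same tool used in the implementation section) on a hypothetical scratch file with hand-picked values. It assumes pandas and the PyTables package are installed. Passing data_columns=True is what makes the columns usable in a where condition.

```python
import os
import tempfile

import pandas as pd

# Hypothetical scratch file and fixed values, chosen for illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "query_demo.h5")
df = pd.DataFrame({"C1": [0.1, 0.6, 0.9], "C2": [1.0, 2.0, 3.0]})

with pd.HDFStore(demo_path, "w") as store:
    # data_columns=True makes every column queryable on disk.
    store.put("Table_Data", df, format="table", data_columns=True)
    # Only rows matching the condition are read back.
    subset = store.select("Table_Data", where="C1 > 0.5")

print(subset)
```

Only the rows with C1 above 0.5 come back, without loading the whole table into memory first.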

Code Implementation to append Tables to HDF Files

So far, we have the following code, which creates a file and writes a basic dataset to it.

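import h5py
import numpy as np

# Create the file (overwriting any existing one) and write a 5x5 dataset
newHDF_File = h5py.File('codespeedy.h5', 'w')
sampleData = np.random.rand(5, 5)
DATA = newHDF_File.create_dataset('Sample_Data', data=sampleData)

# Open the same file for reading and print the stored values
HDF_File_Read = h5py.File('codespeedy.h5', 'r')
ReadDataset = HDF_File_Read['Sample_Data']
print(ReadDataset[:])

# Close both handles before the file is reopened in append mode
newHDF_File.close()
HDF_File_Read.close()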

Now we will create tabular data as a pandas DataFrame and store it in the HDF file by opening the file in append mode ('a'), which keeps the existing contents. Note that pandas relies on the PyTables package (installed as tables) for HDF support. Have a look at the code snippet below. We create a DataFrame with two columns and store it in table format.

import pandas as pd
df = pd.DataFrame({'C1': np.random.rand(5), 'C2': np.random.rand(5)})
with pd.HDFStore("codespeedy.h5", 'a') as store:
    store.put('Table_Data', df, format='table')
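Storing with format='table' also means more rows can be added to the same table later. The sketch below shows this with store.append on a hypothetical scratch file and made-up values; it assumes pandas and PyTables are installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical scratch file and values for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")
first = pd.DataFrame({"C1": [1.0, 2.0], "C2": [3.0, 4.0]})
more = pd.DataFrame({"C1": [5.0], "C2": [6.0]})

with pd.HDFStore(demo_path, "w") as store:
    # Table format is required for appending; the default fixed format is not appendable.
    store.put("Table_Data", first, format="table")
    # Add the new rows to the end of the existing table.
    store.append("Table_Data", more)
    combined = store["Table_Data"]

print(combined)
```

Note that the appended rows keep their original index labels, so the combined index may contain repeats.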

This time we read the data back through pandas instead of h5py. Have a look at the code snippet below:

with pd.HDFStore("codespeedy.h5", 'r') as store:
    df_read = store['Table_Data']
    print(df_read)

The output looks like this (your values will differ, since the data is random):

         C1        C2
0  0.806708  0.285298
1  0.906525  0.580559
2  0.123885  0.400370
3  0.218696  0.154856
4  0.120232  0.683300
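To check that the table was added alongside the original dataset rather than replacing it, one option is to list the top-level names in the file. The sketch below repeats the tutorial's steps on a hypothetical scratch file and assumes h5py, pandas, and PyTables are all installed.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical scratch file standing in for codespeedy.h5.
demo_path = os.path.join(tempfile.mkdtemp(), "coexist_demo.h5")

# Step 1: write a plain dataset with h5py.
with h5py.File(demo_path, "w") as f:
    f.create_dataset("Sample_Data", data=np.random.rand(5, 5))

# Step 2: add a table with pandas in append mode.
df = pd.DataFrame({"C1": np.random.rand(5), "C2": np.random.rand(5)})
with pd.HDFStore(demo_path, "a") as store:
    store.put("Table_Data", df, format="table")
    pandas_keys = store.keys()

# Step 3: list what the file holds at the top level.
with h5py.File(demo_path, "r") as f:
    top_level = sorted(f.keys())

print(pandas_keys)
print(top_level)
```

Both names should appear in the top-level listing, which confirms that append mode left the earlier dataset in place.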

See how simple it is to create, read, and store tables in HDF files. I hope you liked this tutorial.

Also Read: Handling large datasets with HDF5

Happy Learning!