Append a Table to an Existing HDF File – Python
Hey fellow Python coder! In this tutorial, we will understand HDF files and learn how to add a new Table to an existing HDF file using Python programming. We will be covering the following topics in this tutorial:
- Introduction to HDF Files
- Advantages of HDF Files
- Creating a new HDF5 File
- Writing Data to HDF File
- Reading Data from HDF File
- Learning how to append Tables to HDF Files
- Importance of tables in HDF Files
- Code Implementation to append Tables to HDF Files
Introduction to HDF Files
HDF stands for Hierarchical Data Format. It is a file format for storing and organizing large amounts of data, and Python can read and write it through libraries such as h5py and pandas. HDF files are commonly used in data analysis because they handle large, complex data structures efficiently. Some of the advantages of using HDF files are as follows:
- They support a variety of data types, from simple values to complex data structures and datasets.
- They can be shared easily across different platforms and programming languages.
- They can handle datasets of any size, from very small to very large.
- They let you store large datasets and read back only the parts you need, which keeps access efficient.
Creating a new HDF5 File
To work with HDF5 files, we will use the h5py library in Python. If it is not installed on your system, install it with the pip command (pip install h5py), and then create a new file using the code snippet below. To create a new file, we open it in write mode ('w'). Note that write mode overwrites any existing file with the same name.
import h5py
newHDF_File = h5py.File('codespeedy.h5', 'w')
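As an optional alternative, the file can be opened inside a with block, which closes it automatically even if an error occurs partway through. The sketch below is only an illustration of that pattern: the file name demo_context.h5 and the temporary directory are placeholders chosen for this example and are not part of the tutorial's own code.

```python
import os
import tempfile

import h5py

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_context.h5")

# 'w' creates the file (and truncates it if it already exists).
with h5py.File(demo_path, "w") as hdf_file:
    is_open_inside = bool(hdf_file)  # truthy while the file is open

# Leaving the block closes the file, so the handle is no longer valid.
is_open_after = bool(hdf_file)

print(is_open_inside, is_open_after)
```

With this pattern there is no separate close() call to forget, which helps when the same file is reopened later in a different mode.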
Writing Data to HDF File
After creating the file, let’s add some data to it using the simple code snippet below. For this tutorial, we will generate a 5×5 matrix of random values using the numpy module and store it in the HDF file as a new dataset named Sample_Data.
import numpy as np
sampleData = np.random.rand(5, 5)
DATA = newHDF_File.create_dataset('Sample_Data', data=sampleData)
Reading Data from HDF File
To read the dataset from an HDF file, we first open the file in read mode ('r'), then load Sample_Data from it, and finally print what we read. Have a look at the code snippet below.
HDF_File_Read = h5py.File('codespeedy.h5', 'r')
ReadDataset = HDF_File_Read['Sample_Data']
print(ReadDataset)
But this produces the following output:
<HDF5 dataset "Sample_Data": shape (5, 5), type "<f8">
This is a description of the dataset object, not the values stored in it. To get the actual data, we slice the dataset as shown in the code below.
print(ReadDataset[:])
This will result in the output below:
[[0.31633171 0.22529574 0.32318286 0.64615849 0.2032933 ]
 [0.35067536 0.53707505 0.39550888 0.96691937 0.69121246]
 [0.42437762 0.74815193 0.48438767 0.95991042 0.5048299 ]
 [0.33808803 0.83252701 0.91295353 0.47600387 0.023969  ]
 [0.38987256 0.3877606  0.87817955 0.85761328 0.30519223]]
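Slicing does not have to read everything. h5py datasets accept NumPy-style slices and read only the requested region from disk, which matters once a dataset is too large to fit in memory. Here is a small sketch of that idea. It uses a hypothetical file named demo_slice.h5 and fills the dataset with the values 0 to 24 (instead of random numbers) so the result of each slice is easy to check.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_slice.h5")

# Write a 5x5 grid holding 0..24 so every slice is predictable.
with h5py.File(demo_path, "w") as f:
    f.create_dataset("Sample_Data", data=np.arange(25).reshape(5, 5))

with h5py.File(demo_path, "r") as f:
    dataset = f["Sample_Data"]
    shape = dataset.shape       # metadata only, no values are read
    first_row = dataset[0, :]   # reads just row 0
    top_left = dataset[:2, :2]  # reads the 2x2 block in the top-left corner

print(shape)
print(first_row)  # [0 1 2 3 4]
print(top_left)
```

Each slice comes back as an ordinary NumPy array, so it remains usable after the file is closed.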
We can now read and write HDF files, as shown in the sections above. Let’s move on to the next sections, where we will learn how to append tables to the file. But before making any changes, let’s close both of the file handles we opened earlier.
newHDF_File.close()
HDF_File_Read.close()
Learning how to append Tables to HDF Files
HDF files help a developer organize data in a structured way. There are multiple advantages of appending a table to an HDF file. Some of them are listed below:
- Tables allow you to organize data into rows and columns, making it easy to read, understand, and work with.
- Tables let developers run queries that return a subset of the data based on a condition.
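The query advantage can be sketched with pandas, which stores tables through the PyTables package (installed as tables). This is a minimal illustration under a few assumptions: the file name demo_query.h5 and the column values are made up for the example, and data_columns=True is passed so that the columns can be used in a where condition.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_query.h5")

# Known values so the query result is easy to verify.
df = pd.DataFrame({
    "C1": np.arange(5, dtype=float),       # 0, 1, 2, 3, 4
    "C2": np.arange(5, dtype=float) * 10,  # 0, 10, 20, 30, 40
})

with pd.HDFStore(demo_path, "w") as store:
    # data_columns=True makes every column usable in a `where` condition.
    store.put("Table_Data", df, format="table", data_columns=True)

with pd.HDFStore(demo_path, "r") as store:
    # Only the rows matching the condition are returned.
    subset = store.select("Table_Data", where="C1 > 2")

print(subset)
```

Only the two rows where C1 is greater than 2 come back, so the full table never has to be loaded just to filter it.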
Code Implementation to append Tables to HDF Files
So far, we have the following code, which creates the file and writes a basic dataset to it.
import h5py
import numpy as np

# Create the file and write a 5x5 dataset of random values
newHDF_File = h5py.File('codespeedy.h5', 'w')
sampleData = np.random.rand(5, 5)
DATA = newHDF_File.create_dataset('Sample_Data', data=sampleData)

# Open the same file for reading and print the stored values
HDF_File_Read = h5py.File('codespeedy.h5', 'r')
ReadDataset = HDF_File_Read['Sample_Data']
print(ReadDataset[:])

# Close both handles before changing the file again
newHDF_File.close()
HDF_File_Read.close()
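Now we will create tabular data using a pandas DataFrame and then add the table to the HDF file by opening the file in “append” mode ('a'), which keeps the existing contents of the file. Have a look at the code snippet below. We will create a data frame with two columns and then store the data in table format. Note that pandas relies on the PyTables package (installed as tables) for HDF support, so it needs to be installed as well.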
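import pandas as pd

df = pd.DataFrame({'C1': np.random.rand(5), 'C2': np.random.rand(5)})

# 'a' opens the existing file without erasing what is already in it
with pd.HDFStore("codespeedy.h5", 'a') as store:
    # format='table' stores the frame as a table that can be queried and appended to
    store.put('Table_Data', df, format='table')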
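To confirm that append mode really keeps the earlier dataset, the top-level names in the file can be listed with h5py after the table is added. The following is a self-contained sketch of the whole sequence, using a hypothetical file named demo_coexist.h5 so that it does not touch codespeedy.h5.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_coexist.h5")

# Step 1: plain dataset written with h5py, as earlier in the tutorial.
with h5py.File(demo_path, "w") as f:
    f.create_dataset("Sample_Data", data=np.random.rand(5, 5))

# Step 2: table added with pandas in append mode ('a' keeps existing content).
df = pd.DataFrame({"C1": np.random.rand(5), "C2": np.random.rand(5)})
with pd.HDFStore(demo_path, "a") as store:
    store.put("Table_Data", df, format="table")

# Step 3: list the top-level names to check that both objects are present.
with h5py.File(demo_path, "r") as f:
    top_level_names = sorted(f.keys())
    sample_shape = f["Sample_Data"].shape

print(top_level_names)
print(sample_shape)
```

If the file had been reopened in write mode ('w') instead, Sample_Data would have been erased before the table was written.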
This time we will read the data back through pandas instead of h5py. Have a look at the code snippet below:
with pd.HDFStore("codespeedy.h5", 'r') as store:
    df_read = store['Table_Data']
print(df_read)
The resulting output comes out to be like this:
         C1        C2
0  0.806708  0.285298
1  0.906525  0.580559
2  0.123885  0.400370
3  0.218696  0.154856
4  0.120232  0.683300
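A table stored with format='table' can also grow later: pandas provides store.append() to add rows to a table that already exists in the file. Below is a minimal sketch of that, assuming a hypothetical file named demo_append.h5 and two small made-up batches of rows.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_append.h5")

first_batch = pd.DataFrame({"C1": [0.1, 0.2], "C2": [1.0, 2.0]})
# Give the new row its own index label so it continues after the first batch.
second_batch = pd.DataFrame({"C1": [0.3], "C2": [3.0]}, index=[2])

with pd.HDFStore(demo_path, "a") as store:
    store.put("Table_Data", first_batch, format="table")
    # append() adds rows to the existing table; it requires the table format.
    store.append("Table_Data", second_batch)
    combined = store["Table_Data"]
    stored_keys = store.keys()

print(combined)
print(stored_keys)
```

The appended rows must have the same columns as the table they are added to.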
See how simple it is to create, read, and store tables in HDF files. I hope you liked this tutorial.
Happy Learning!