How to Load CSV data in TensorFlow | Python

Hey there everyone! Today we will learn how to load data from a CSV file using TensorFlow in Python. This tutorial uses TensorFlow 2.1. We will load a ‘.csv’ file that contains land areas and their corresponding prices.
So, let’s get started.

Python code to load CSV data in TensorFlow

Let’s first import TensorFlow and check its version.

import tensorflow as tf
tf.__version__

OUTPUT:

'2.1.0'

Importing other required libraries.

import numpy as np
import pandas as pd

The contents of our ‘.csv’ file.

!head {'file.csv'}

OUTPUT:

area,prices
1000,316404.1095890411
1500,384297.9452054794
2300,492928.0821917808
3540,661304.794520548
4120,740061.6438356165
4560,799808.2191780822
5490,926090.7534246575
3460,650441.7808219178
4750,825607.8767123288
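
If you want to follow along without the original file, the short sketch below recreates it. It assumes the file holds only the ten rows that appear in the outputs of this tutorial (the real file may contain more), and the helper name write_sample_csv is made up for illustration.

```python
import csv

# Rows copied from the outputs shown in this tutorial, kept as strings so the
# digits are written exactly as they appear. Assumption: the real file may
# contain more rows than these ten.
SAMPLE_ROWS = [
    ("1000", "316404.1095890411"),
    ("1500", "384297.9452054794"),
    ("2300", "492928.0821917808"),
    ("3540", "661304.794520548"),
    ("4120", "740061.6438356165"),
    ("4560", "799808.2191780822"),
    ("5490", "926090.7534246575"),
    ("3460", "650441.7808219178"),
    ("4750", "825607.8767123288"),
    ("2300", "492928.0821917808"),
]


def write_sample_csv(path="file.csv"):
    """Write the header plus SAMPLE_ROWS to `path` and return the path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["area", "prices"])
        writer.writerows(SAMPLE_ROWS)
    return path


# Create file.csv in the current working directory.
write_sample_csv()
```

After running this, the examples that follow can read 'file.csv' from the working directory.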

Now, let’s have a look at different ways for loading CSV data.

Example 1:

The first way is Dataset.from_tensor_slices. This method works on dictionaries, so we read the file into a pandas DataFrame, convert it to a dictionary of columns, and let TensorFlow slice it row by row.

#dataframe
df = pd.read_csv('file.csv', index_col=None)
df.head(10)

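This displays the first ten rows of the DataFrame. Next, we slice the DataFrame into a dataset and print its first ten elements.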

df_slices = tf.data.Dataset.from_tensor_slices(dict(df))

for features in df_slices.take(10):
  for df_key, df_value in features.items():
    print(f"{df_key}  :  {df_value}")

OUTPUT:

area  :  1000
prices  :  316404.1095890411
area  :  1500
prices  :  384297.9452054794
area  :  2300
prices  :  492928.0821917808
area  :  3540
prices  :  661304.794520548
area  :  4120
prices  :  740061.6438356165
area  :  4560
prices  :  799808.2191780822
area  :  5490
prices  :  926090.7534246576
area  :  3460
prices  :  650441.7808219178
area  :  4750
prices  :  825607.8767123288
area  :  2300
prices  :  492928.0821917808
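
For training, you usually want the features and the label as separate items, grouped into batches. The sketch below shows one way to do that with from_tensor_slices. It assumes that prices is the value to predict, and it builds a small DataFrame inline from the first four rows above so that it runs without 'file.csv'.

```python
import pandas as pd
import tensorflow as tf

# First four rows of the sample data, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "area": [1000, 1500, 2300, 3540],
    "prices": [316404.1095890411, 384297.9452054794,
               492928.0821917808, 661304.794520548],
})

# Assumption: "prices" is the label, everything else is a feature.
features = df.copy()
labels = features.pop("prices")

# Each element is a (feature dict, label) pair; batch(2) groups two rows together.
train_ds = tf.data.Dataset.from_tensor_slices((dict(features), labels)).batch(2)

for batch_features, batch_labels in train_ds.take(1):
    print("area   :", batch_features["area"].numpy())
    print("prices :", batch_labels.numpy())
```

A (features, labels) dataset like this is the shape that Keras model.fit accepts directly.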

Example 2:

Another way to load our CSV data is experimental.make_csv_dataset. This function is a high-level interface for reading sets of CSV files. It supports column type inference as well as features such as batching and shuffling, which make it simpler to use. Here, label_name="area" makes the area column the label, and the remaining column is returned as a feature.

data = tf.data.experimental.make_csv_dataset('file.csv', batch_size=4, label_name="area")
for features, labels in data.take(1):
  print("'area': {}".format(labels))
  for data_key, data_value in features.items():
    print(f"{data_key}     :    {data_value}")

OUTPUT:

'area': [3460 2300 2300 3540]
prices     :    [650441.75 492928.1  492928.1  661304.8 ]
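
By default, make_csv_dataset shuffles the rows and repeats the data indefinitely, which is why the batch above is in random order. If you want a single ordered pass, a sketch like the following may help. It writes a small temporary CSV (the first four rows above) so that it runs on its own, and it assumes prices is the label; the file name sample.csv is only for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a small sample file so the snippet does not depend on 'file.csv'.
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(csv_path, "w") as f:
    f.write(
        "area,prices\n"
        "1000,316404.1095890411\n"
        "1500,384297.9452054794\n"
        "2300,492928.0821917808\n"
        "3540,661304.794520548\n"
    )

ds = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=4,
    label_name="prices",  # assumption: prices is the value to predict
    num_epochs=1,         # read the file once instead of repeating forever
    shuffle=False,        # keep the rows in file order
)

for features, labels in ds:
    print("prices :", labels.numpy())
    for key, value in features.items():
        print(key, ":", value.numpy())
```

With num_epochs=1 the loop ends after the file has been read once, so take() is not needed to stop it.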

Example 3:

There is also a lower-level class, experimental.CsvDataset, which provides finer-grained control. However, it does not support column type inference, so we have to specify the type of each column ourselves.

col_types = [tf.int32, tf.float32]
dataset = tf.data.experimental.CsvDataset('file.csv', col_types, header=True)

dataset

OUTPUT:

<CsvDatasetV2 shapes: ((), ()), types: (tf.int32, tf.float32)>

Let’s print the first ten records.

for x in dataset.take(10):
  print([y.numpy() for y in x])

OUTPUT:

[1000, 316404.12]
[1500, 384297.94]
[2300, 492928.1]
[3540, 661304.8]
[4120, 740061.6]
[4560, 799808.25]
[5490, 926090.75]
[3460, 650441.75]
[4750, 825607.9]
[2300, 492928.1]
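
CsvDataset yields each row as a plain tuple, so any structure has to be added by hand. The sketch below maps each row into a (features, label) pair and batches the result. It writes a small temporary CSV so that it runs on its own; the helper name to_features_and_label and the choice of prices as the label are assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a small sample file so the snippet does not depend on 'file.csv'.
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(csv_path, "w") as f:
    f.write(
        "area,prices\n"
        "1000,316404.1095890411\n"
        "1500,384297.9452054794\n"
        "2300,492928.0821917808\n"
        "3540,661304.794520548\n"
    )

# One type per column, in file order; header=True skips the first line.
rows = tf.data.experimental.CsvDataset(
    csv_path, [tf.int32, tf.float32], header=True
)


def to_features_and_label(area, price):
    """Turn one (area, price) row into a (feature dict, label) pair."""
    return {"area": area}, price


batched = rows.map(to_features_and_label).batch(2)

for features, label in batched:
    print(features["area"].numpy(), label.numpy())
```

This gives the same (features, labels) structure as make_csv_dataset, while keeping explicit control over column types and row order.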

So, these were the different ways of loading our CSV data using TensorFlow.
