Reading and Parsing a TSV File in Python

In this Python tutorial, we are going to learn how to read and parse a TSV file in Python.

Datasets are commonly loaded from two types of files: CSV (comma-separated values) and TSV (tab-separated values). In this tutorial, we work only with TSV files.

We use the pandas library, which is built on top of NumPy, to read the TSV file.

You can download sample TSV datasets from Kaggle.

Let’s get started with the implementation and some examples.

Reading & Parsing a TSV File Using Pandas

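The examples pass only the file name to pandas, so the TSV file must be in the directory the script is run from (usually the same folder as the Python file). Otherwise, pass the full path to the file.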
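If the script may be run from a different working directory, one option is to build the path from the script's own location. The following is a minimal sketch; the helper name tsv_next_to is hypothetical and only serves as an illustration.

```python
from pathlib import Path


def tsv_next_to(script_file, file_name):
    """Return the path of file_name in the same folder as script_file."""
    return Path(script_file).parent / file_name


# Inside a script, pass __file__ so the result does not depend on the
# current working directory, for example:
#   tsv_next_to(__file__, "movie_characters_metadata.tsv")
print(tsv_next_to("project/read_tsv.py", "movie_characters_metadata.tsv"))
```

The returned Path object can be passed directly to pd.read_csv in place of the bare file name.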

Code:

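import pandas as pd

# sep="\t" tells read_csv to split each line on tabs instead of commas
df = pd.read_csv("movie_characters_metadata.tsv", sep="\t")
print(df)

Explanation:

  1. The pandas library is imported under the alias ‘pd’.
  2. ‘read_csv’ is the pandas function that reads a delimited text file. Passing sep="\t" makes it split each line on tabs, which is what a TSV file needs.
  3. ‘print(df)’ prints the resulting DataFrame, which is stored in ‘df’.

The last line of the printed output gives the size of the dataset in [rows x columns] format. Without sep="\t", read_csv splits on commas by default and reports this file as

[9034 rows x 1 columns]

because each tab-separated line ends up in a single column.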

Output:

[Image: output of the code above]
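To see how the separator affects the column count, here is a small sketch that reads the same tab-separated text twice. The sample data is made up for illustration and is read from memory with io.StringIO, so no file is needed.

```python
import io

import pandas as pd

# Hypothetical two-column sample standing in for a real TSV file.
sample = (
    "name\tmovie\n"
    "BIANCA\t10 things i hate about you\n"
    "CAMERON\t10 things i hate about you\n"
)

# Default separator is a comma, so each line stays in a single column.
df_default = pd.read_csv(io.StringIO(sample))

# sep="\t" splits each line on tabs.
df_tabs = pd.read_csv(io.StringIO(sample), sep="\t")

print(df_default.shape)  # (2, 1)
print(df_tabs.shape)     # (2, 2)
```

A column count of 1 on a file that clearly has several fields is usually a sign that the separator does not match the file.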

Using the head() function to read a file

If we want to see only the first 10 or 20 rows, we can use the head() function.

Code:

import pandas as pd

# sep="\t" splits each line on tabs, as a TSV file requires
df = pd.read_csv("movie_characters_metadata.tsv", sep="\t")
print(df.head(10))

Explanation:

  • head() takes the number of rows to return as its argument. If no argument is given, it returns the first 5 rows.
  • We passed 10 to show only the first 10 rows, whereas the earlier example printed the whole file.

(The earlier output has more than nine thousand rows, so pandas shortens the display: it prints only the first and last few rows and replaces the middle rows with an ellipsis.)

Output:

[Image: output of head(10)]
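As a small illustration of head(), the sketch below uses made-up in-memory data rather than the tutorial's file. It also shows the nrows parameter of read_csv, which limits how many rows are parsed in the first place; that can help when only the start of a large file is needed.

```python
import io

import pandas as pd

# Hypothetical sample: 20 data rows with an "id" column running from 0 to 19.
sample = "id\tvalue\n" + "".join(f"{i}\t{i * i}\n" for i in range(20))

df = pd.read_csv(io.StringIO(sample), sep="\t")

first_three = df.head(3)   # first 3 rows of the loaded DataFrame
default_head = df.head()   # no argument: first 5 rows

# nrows stops parsing after the given number of data rows.
only_three = pd.read_csv(io.StringIO(sample), sep="\t", nrows=3)

print(first_three)
print(len(default_head), len(only_three))
```

head() still loads the whole file before slicing, while nrows avoids reading the remaining rows at all.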

Using the tail() function to read a file

tail() is very similar to head(): head() returns the first rows of the dataset, while tail() returns the last rows.

Code:

import pandas as pd

# sep="\t" splits each line on tabs, as a TSV file requires
df = pd.read_csv("movie_characters_metadata.tsv", sep="\t")
print(df.tail(10))

Output:

[Image: output of tail(10)]
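The same idea can be tried on a small in-memory sample. The data below is made up for illustration; note that tail() keeps the original row labels, so the index of the result starts near the end of the dataset.

```python
import io

import pandas as pd

# Hypothetical sample: 20 data rows with an "id" column running from 0 to 19.
sample = "id\tvalue\n" + "".join(f"{i}\t{i * i}\n" for i in range(20))

df = pd.read_csv(io.StringIO(sample), sep="\t")

last_three = df.tail(3)    # last 3 rows
default_tail = df.tail()   # no argument: last 5 rows

print(last_three)
print(last_three.index.tolist())  # [17, 18, 19]
```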
