User Defined Functions for Hive in Java

In this tutorial we will learn how to implement a custom function as a UDF (User Defined Function) for Hive in Java.

Hive Extensibility & Big Data

Hive, one of the earlier big data technologies, is a non-traditional, RDBMS-like store built on top of the MapReduce framework. It was initially developed by Facebook to hold large volumes of older, less frequently used data.

Typically, older data from MySQL tables is pushed to Hive tables at a fixed interval chosen according to the amount of data produced. Hive's query language closely resembles SQL; the main differences are that Hive does not update rows in place and that it processes data in batches.

Write User Defined Functions for Hive in Java

Hive is a batch processing tool, so individual entries can be neither updated nor deleted; only whole partitions can be deleted. This makes it quite different from MySQL. Hive ships with roughly 200 built-in functions, which can be listed from the Hive console:

hive> show functions;

The details of a specific function can be shown with:

hive> describe function function_name;
or with:
hive> describe function extended function_name;

For functions beyond the built-in set, Hive can be extended with UDFs (User Defined Functions), UDAFs (User Defined Aggregate Functions) and UDTFs (User Defined Table-generating Functions, such as explode). UDFs for Hive are normally written in Java. The example here is a UDF that strips a given set of characters from both ends of a string, for every tuple.

We need to extend the UDF class, org.apache.hadoop.hive.ql.exec.UDF. We then define one or more evaluate() methods; Hive picks the one whose parameters match the arguments of the call. Here, evaluate() strips the characters given in the second string from both ends of the first string and returns the new string for every tuple in the table.
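
The three kinds of function differ mainly in how many rows go in and how many come out. As a rough plain-Java analogy (this is not Hive's API, and the class and method names below are made up for illustration), a UDF maps one value to one value, a UDAF folds many values into one, and a UDTF expands one value into several rows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java analogy only: none of these names exist in Hive.
class FunctionShapes {

    // UDF-like: one input value -> one output value (called once per row).
    static String upper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // UDAF-like: many input values -> one aggregated value.
    static int totalLength(List<String> values) {
        int total = 0;
        for (String v : values) {
            if (v != null) {
                total += v.length();
            }
        }
        return total;
    }

    // UDTF-like: one input value -> zero or more output rows
    // (similar in spirit to explode, which works on arrays and maps).
    static List<String> explode(String commaSeparated) {
        List<String> rows = new ArrayList<>();
        if (commaSeparated == null || commaSeparated.isEmpty()) {
            return rows;
        }
        rows.addAll(Arrays.asList(commaSeparated.split(",")));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));                           // HIVE
        System.out.println(totalLength(Arrays.asList("ab", "cde"))); // 5
        System.out.println(explode("a,b,c"));                        // [a, b, c]
    }
}
```

The Strip example in this tutorial belongs to the first shape: it is called once per tuple and returns one value each time.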

package hiveUDFExample;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {
  // Reused across calls so that a new Text is not allocated for every tuple.
  private Text result = new Text();

  // Strips any of the characters in stripChars from both ends of str.
  public Text evaluate(Text str, String stripChars) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString(), stripChars));
    return result;
  }

  // Strips leading and trailing whitespace from str.
  public Text evaluate(Text str) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString()));
    return result;
  }
}
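
To try the string logic without a Hive installation, the following is a minimal plain-Java sketch of what the two evaluate() methods compute. StripSketch and shouldStrip are hypothetical names introduced here. The sketch re-implements the stripping by hand as a stand-in for StringUtils.strip, so it illustrates the behaviour and is not the library code itself:

```java
// Stand-in for the string logic only; it does not depend on Hive or Commons Lang.
class StripSketch {

    // Removes any of the characters in stripChars from both ends of str.
    // A null stripChars means "strip whitespace", like the one-argument evaluate().
    static String strip(String str, String stripChars) {
        if (str == null) {
            return null; // same null handling as the UDF
        }
        int start = 0;
        int end = str.length();
        while (start < end && shouldStrip(str.charAt(start), stripChars)) {
            start++;
        }
        while (end > start && shouldStrip(str.charAt(end - 1), stripChars)) {
            end--;
        }
        return str.substring(start, end);
    }

    // Decides whether a single character belongs to the set being stripped.
    private static boolean shouldStrip(char c, String stripChars) {
        if (stripChars == null) {
            return Character.isWhitespace(c);
        }
        return stripChars.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(strip("hadoop", "ha"));   // doop
        System.out.println(strip("  hive  ", null)); // hive
        System.out.println(strip("xxhivexx", "x"));  // hive
        System.out.println(strip(null, "x"));        // null
    }
}
```

Note that the second argument is treated as a set of characters, not as a prefix or suffix: stripping 'hadoop' with 'ah' gives the same result as stripping it with 'ha'.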

We will now create a jar file containing this Strip class using any method (Eclipse, Maven, etc.). We then add the jar from the Hive console:

hive> add jar jar_name_with_path ;

Then we create a temporary function named Strip:

hive> create temporary function Strip as 'hiveUDFExample.Strip';

To check which jars have been added:

hive> list jars;

Then use the Strip function created earlier in the Hive console to get results, e.g.:

hive> select Strip('hadoop','ha') from table_name;

This prints 'doop' once for every tuple in the table table_name.
