In the log table from the previous article, the timestamp looked like this:

"31/Aug/2015:00:04:37 +0800"

That format is not very readable. To make it nicer, we will define a custom time-formatting UDF (Hive presumably also ships built-in time-conversion functions, but writing our own is a good exercise).

Code

As before, a custom function extends the UDF class:
package com.madman.hive.function;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * A user-defined function (UDF) for use with Hive.
 *
 * New UDF classes need to extend this UDF class and implement one or more
 * methods named "evaluate", which will be called by Hive. Some examples:
 *   public int evaluate();
 *   public int evaluate(int a);
 *   public double evaluate(int a, double b);
 *   public String evaluate(String a, int b, String c);
 * "evaluate" should never be a void method, but it may return null if needed.
 */
public class HiveDateFunction extends UDF {

    public Text evaluate(Text time) {
        if (time == null || StringUtils.isBlank(time.toString())) {
            return null;
        }
        // Strip the surrounding double quotes from the log field.
        String parser = time.toString().replaceAll("\"", "");
        SimpleDateFormat inputSimple = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
        SimpleDateFormat outputSimple = new SimpleDateFormat("yyyyMMddHHmmss");
        try {
            Date parse = inputSimple.parse(parser);
            return new Text(outputSimple.format(parse));
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    // Quick local test before packaging the jar.
    public static void main(String[] args) {
        String text = "31/Aug/2015:00:04:37 +0800";
        System.out.println(new HiveDateFunction().evaluate(new Text(text)));
    }
}
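One caveat worth noting: SimpleDateFormat is not thread-safe, and Hive may evaluate rows concurrently. On Java 8+ the same conversion can be sketched with the thread-safe java.time API. The class and method names below are illustrative, not part of the original UDF:

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParseDateDemo {

    // Matches "31/Aug/2015:00:04:37 +0800"; English locale for the month abbreviation.
    private static final DateTimeFormatter IN =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Target format, e.g. 20150831000437.
    private static final DateTimeFormatter OUT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    public static String parse(String raw) {
        // Strip surrounding quotes, then parse and reformat.
        return OUT.format(ZonedDateTime.parse(raw.replace("\"", ""), IN));
    }

    public static void main(String[] args) {
        System.out.println(parse("\"31/Aug/2015:00:04:37 +0800\""));
    }
}
```

Unlike SimpleDateFormat, the two DateTimeFormatter instances can safely be kept as static fields and shared across threads.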
Once the code is written, test it locally first. When it works, package it as a jar, upload it to the Hive environment, and add the jar in Hive.

Reference command:

hive (default)> add jar /opt/cdhmoduels/data/hiveDateFunction.jar;

Then create a function. Note that the fully qualified class name must be specified here:

create temporary function hiveDateFunction as 'com.madman.hive.function.HiveDateFunction';

Call the function:

hive (default)> select hiveDateFunction(time_local) from bf_log limit 10;
Result:
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
_c0
20150831000437
20150831000437
20150831000453
20150831000453
20150831000453
20150831000453
20150831000453
20150831000453
20150831000453
20150831000453
Time taken: 20.954 seconds, Fetched: 10 row(s)
Alternatively, you can specify both the class name and the jar location directly when creating the function. Here I put the jar on HDFS and point to the HDFS path:

create temporary function parseDate as 'com.madman.hive.function.HiveDateFunction' using jar 'hdfs://hadoop.madman.com:8020/jar/hiveDateFunction.jar';

Test SQL:

hive (default)> select parseDate(time_local) from bf_log limit 10;