Hive: loading data into partitions dynamically, plus other Hive tips


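Changing a Hive table's field delimiter:

  alter table tableName set SERDEPROPERTIES('field.delim'='\t');

This only updates the table metadata. Existing data files are not rewritten, so the new delimiter must match how the files are actually laid out.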
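To see why the delimiter matters, here is a hypothetical Python sketch (not Hive code) of how a delimited-text SerDe maps one line of a data file onto columns. It assumes Hive's default field delimiter `\x01` (Ctrl-A). `parse_row` and the sample column values are made up for illustration, and this simplified version just drops extra fields.

```python
HIVE_DEFAULT_DELIM = "\x01"  # Hive's default field delimiter (Ctrl-A)


def parse_row(line, columns, delim=HIVE_DEFAULT_DELIM):
    """Hypothetical helper: split one text line into named columns.

    Missing trailing fields become None (Hive shows them as NULL);
    extra fields are dropped in this simplified sketch.
    """
    parts = line.split(delim)
    parts += [None] * (len(columns) - len(parts))  # pad short rows with NULLs
    return dict(zip(columns, parts))  # zip stops at the shorter list


cols = ["vin", "obd_id", "speed"]
line = "V001\tOBD42\t60"  # a tab-separated line in the data file

print(parse_row(line, cols))        # default delimiter: the whole line lands in vin
print(parse_row(line, cols, "\t"))  # after field.delim='\t': one value per column
```

The first call shows the typical symptom of a delimiter mismatch: one column holds the entire line and the rest are NULL.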

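Creating partitions from the data itself and loading rows into them dynamically:

  insert into table device_status_log partition (`date`)
    select `vin`, `obd_id`, `function_id`, `message_id`, `message_content`,
           `longitude`, `latitude`, `speed`, `engine_speed`, `gps_stat`, `client_time`,
           `create_time`, `analytical_result`,
           regexp_replace(to_date(create_time), '-', '') as `date`
    from pre_device_status_log;

No value is given for the partition column. Hive takes it from the last column of the SELECT, so each distinct `date` value produces its own partition. `date` is backquoted because it is a reserved keyword in Hive 1.2 and later.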
If the insert fails with the following error:

Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

set the mode in the Hive CLI, as the message suggests:

  set hive.exec.dynamic.partition.mode=nonstrict;
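The rule behind that error can be modeled in a few lines. In this hypothetical Python sketch, a partition spec maps each partition column to a static value, or to `None` when the value comes from the SELECT. Strict mode rejects a spec where every column is dynamic. `dynamic_columns` and the `country` column are illustrative only.

```python
def dynamic_columns(spec, mode="strict"):
    """Hypothetical check mirroring the rule in Hive's error message.

    spec maps each partition column to a static value, or to None when the
    value comes from the SELECT (a dynamic partition column).
    Returns the list of dynamic columns.
    """
    dynamic = [col for col, value in spec.items() if value is None]
    if mode == "strict" and spec and len(dynamic) == len(spec):
        raise ValueError(
            "Dynamic partition strict mode requires at least one static partition column."
        )
    return dynamic


# PARTITION(date) from the insert above: every partition column is dynamic
try:
    dynamic_columns({"date": None})
except ValueError as err:
    print("strict:", err)

print("nonstrict:", dynamic_columns({"date": None}, mode="nonstrict"))
print("strict, one static column:", dynamic_columns({"country": "CN", "date": None}))
```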

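On success, Hive prints a line for every partition it creates, followed by per-partition statistics:

Loading data to table obd_message.device_status_log partition (date=null)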
         Time taken for load dynamic partitions : 4073
        Loading partition {date=20161020}
        Loading partition {date=20161017}
        Loading partition {date=20161024}
        Loading partition {date=20161021}
        Loading partition {date=20161023}
        Loading partition {date=20161026}
        Loading partition {date=20161015}
        Loading partition {date=20161018}
        Loading partition {date=20161016}
        Loading partition {date=20161019}
        Loading partition {date=20161025}
        Loading partition {date=20161022}
         Time taken for adding to write entity : 6
Partition obd_message.device_status_log{date=20161015} stats: [numFiles=1, numRows=188, totalSize=79565, rawDataSize=79377]
Partition obd_message.device_status_log{date=20161016} stats: [numFiles=1, numRows=648, totalSize=299298, rawDataSize=298650]
Partition obd_message.device_status_log{date=20161017} stats: [numFiles=1, numRows=912, totalSize=414597, rawDataSize=413685]
Partition obd_message.device_status_log{date=20161018} stats: [numFiles=1, numRows=895, totalSize=410935, rawDataSize=410040]
Partition obd_message.device_status_log{date=20161019} stats: [numFiles=1, numRows=1412, totalSize=613903, rawDataSize=612491]
Partition obd_message.device_status_log{date=20161020} stats: [numFiles=1, numRows=475, totalSize=204375, rawDataSize=203900]
Partition obd_message.device_status_log{date=20161021} stats: [numFiles=1, numRows=346, totalSize=142079, rawDataSize=141733]
Partition obd_message.device_status_log{date=20161022} stats: [numFiles=1, numRows=561, totalSize=220711, rawDataSize=220150]
Partition obd_message.device_status_log{date=20161023} stats: [numFiles=1, numRows=856, totalSize=352452, rawDataSize=351596]
Partition obd_message.device_status_log{date=20161024} stats: [numFiles=1, numRows=1997, totalSize=783701, rawDataSize=781704]
Partition obd_message.device_status_log{date=20161025} stats: [numFiles=1, numRows=1384, totalSize=556970, rawDataSize=555586]
Partition obd_message.device_status_log{date=20161026} stats: [numFiles=1, numRows=326, totalSize=133275, rawDataSize=132949]
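Conceptually, the dynamic-partition insert groups the SELECT output by its last column and writes each group under its own `date=...` partition. The following hypothetical Python sketch models that routing. Rows are tuples whose last element is the partition value, and the sample data is made up.

```python
from collections import defaultdict


def route_to_partitions(rows, part_col="date"):
    """Group rows by their last column, as INSERT ... PARTITION(date) does
    when `date` is the final column of the SELECT."""
    partitions = defaultdict(list)
    for row in rows:
        *data, value = row  # last column supplies the partition value
        # The value becomes part of the partition name, not a stored column
        partitions[f"{part_col}={value}"].append(tuple(data))
    return dict(partitions)


rows = [
    ("V001", "2016-10-15 08:00:00", "20161015"),
    ("V002", "2016-10-15 09:30:00", "20161015"),
    ("V001", "2016-10-16 10:00:00", "20161016"),
]
for name, part_rows in sorted(route_to_partitions(rows).items()):
    print(name, "numRows =", len(part_rows))
```

The per-partition counts play the same role as the `numRows` values in the statistics above.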


Listing a table's partitions:

  show partitions device_status_log;
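SHOW PARTITIONS prints one partition per line in `column=value` form, for example `date=20161015`. Multi-level partitions are joined with `/`. If you post-process that output in a script, a hypothetical Python parser could look like the one below. It does not undo the escaping Hive applies to special characters in partition values.

```python
def parse_partitions(lines):
    """Turn SHOW PARTITIONS output lines into {column: value} dicts."""
    specs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        spec = {}
        for part in line.split("/"):  # e.g. 'country=CN/date=20161015'
            column, _, value = part.partition("=")
            spec[column] = value
        specs.append(spec)
    return specs


output = "date=20161015\ndate=20161016\n"
print(parse_partitions(output.splitlines()))
```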
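Stripping a separator with a regular expression:
create_time holds values such as 2016-10-10 00:00:00. to_date() keeps only the date part (2016-10-10), and regexp_replace() then deletes every '-', giving 20161010:

  regexp_replace(to_date(create_time), '-', '') as `date`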
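A minimal Python emulation of the same expression can help when checking expected partition names offline. It assumes create_time always uses the `yyyy-MM-dd HH:mm:ss` layout shown above. `partition_key` is a hypothetical name.

```python
import re


def partition_key(create_time):
    """Mimic regexp_replace(to_date(create_time), '-', '')."""
    date_part = create_time.split(" ")[0]  # to_date: keep 'yyyy-MM-dd'
    return re.sub("-", "", date_part)       # regexp_replace: delete every '-'


print(partition_key("2016-10-10 00:00:00"))
```

On Hive 1.2 and later, `date_format(create_time, 'yyyyMMdd')` should produce the same string in a single call.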


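Adding minutes or seconds with Hive time functions:
unix_timestamp() converts the time string to seconds since the epoch, so you can add any number of seconds before from_unixtime() formats the result back into a string. Adding 8*3600 seconds shifts client_time by 8 hours:

  from_unixtime(unix_timestamp(client_time) + 8*3600) as client_time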
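The same seconds-based arithmetic in Python, as a sketch for checking results. It assumes the default `yyyy-MM-dd HH:mm:ss` format and a fixed-offset session time zone (UTC+8 here). `add_seconds` is a hypothetical helper, not a Hive function.

```python
from datetime import datetime, timedelta, timezone

HIVE_TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # default format of unix_timestamp/from_unixtime


def add_seconds(ts, seconds, tz=timezone.utc):
    """Mimic from_unixtime(unix_timestamp(ts) + seconds), with tz standing in
    for the session time zone."""
    epoch = datetime.strptime(ts, HIVE_TS_FORMAT).replace(tzinfo=tz).timestamp()
    return datetime.fromtimestamp(epoch + seconds, tz).strftime(HIVE_TS_FORMAT)


cst = timezone(timedelta(hours=8))  # assumed session time zone (UTC+8)
print(add_seconds("2016-10-20 20:30:00", 8 * 3600, cst))  # add 8 hours
print(add_seconds("2016-10-20 23:59:30", 90, cst))        # add 1 minute 30 seconds
```

Minutes work the same way: multiply by 60 (for example `+ 15*60` for 15 minutes).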
Hive's built-in date_add() can only add or subtract whole days. Tab completion on "date" in the CLI lists the related names:

  date        date(       date_add(   date_sub(   datediff(   datetime
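Hive's date_add(), date_sub(), and datediff() work in calendar days and ignore any time-of-day part. Below is a hypothetical Python equivalent for checking expected results offline. Some newer Hive versions return a DATE instead of a string; this sketch returns ISO strings.

```python
from datetime import date, timedelta


def _day(value):
    """Keep the 'yyyy-MM-dd' prefix; the time part is ignored."""
    return date.fromisoformat(value[:10])


def date_add(start, days):
    """Like Hive date_add(start_date, num_days): whole days only."""
    return (_day(start) + timedelta(days=days)).isoformat()


def date_sub(start, days):
    """Like Hive date_sub(start_date, num_days)."""
    return date_add(start, -days)


def datediff(end, start):
    """Like Hive datediff(enddate, startdate): days from start to end."""
    return (_day(end) - _day(start)).days


print(date_add("2016-10-31 12:00:00", 1))
print(date_sub("2016-10-01", 1))
print(datediff("2016-10-26", "2016-10-15"))
```

For steps smaller than a day, use the unix_timestamp() approach above.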

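A few more tips
Create hiveInit.sh with the content below. The goal is to run jobs locally whenever possible, which cuts waiting time and makes debugging easier. Despite the .sh suffix, the file holds Hive commands, and `hive -i` runs them at startup.
- On older (MRv1) Hadoop, the first line forces every job to run locally. On YARN the corresponding setting is mapreduce.framework.name=local.
- The hive.exec.mode.local.auto.* settings let Hive choose local mode automatically for small jobs.
- The two hive.cli.print.* settings show the current database in the prompt and print column headers in query results.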
SET mapred.job.tracker=local;
set mapred.reduce.tasks = 1;
set hive.exec.mode.local.auto.input.files.max=1000;
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto=true;
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
show databases;
use obd_message;
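According to the Hive documentation, with hive.exec.mode.local.auto=true a job runs locally only when three conditions hold:
- its total input is below the inputbytes limit;
- its number of map tasks is below the files/tasks limit;
- it needs at most one reducer.

Below is a rough Python sketch of that decision using the values from hiveInit.sh. The function is hypothetical, input files stand in for map tasks, and the exact boundary checks may differ between Hive versions.

```python
def runs_locally(input_bytes, input_files, num_reducers,
                 auto=True,
                 inputbytes_max=50_000_000,  # hive.exec.mode.local.auto.inputbytes.max
                 input_files_max=1000):      # hive.exec.mode.local.auto.input.files.max
    """Approximate Hive's automatic local-mode check."""
    if not auto:  # hive.exec.mode.local.auto=false: no automatic switch
        return False
    return (input_bytes <= inputbytes_max
            and input_files <= input_files_max
            and num_reducers <= 1)


print(runs_locally(4_000_000, 12, 1))   # small job: local
print(runs_locally(60_000_000, 12, 1))  # input too large: cluster
```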




Then create hiveStart.sh:

#!/bin/sh
hive -i hiveInit.sh

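Make it executable with `chmod +x hiveStart.sh`, then run `./hiveStart.sh` from the same directory. The Hive CLI starts with the settings from hiveInit.sh already applied.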
