大数据培训_HIve常用的基本语句总结

发布时间： 2019-06-27 14:54:58

　　大数据培训_HIve常用的基本语句总结

　　本次记录hive当中最常用的一些语句，助你在hive操作能快速入手。通过常用的SQL语句能对Hive更加理解。本次实例都是一些针对表的管理。包括外部表，内部表，分区表，分桶表。

大数据培训

简单的hive sql语句

　　#查询数据库
　　show databases;
　　#查询表
　　show tables;
　　#使用数据库
　　use database_name;
　　#查看表结构
　　desc table_name;
　　#删除表
　　drop table table_name;

创建表

格式：

　　CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
　　[(col_name data_type [COMMENT col_comment], ...)]
　　[COMMENT table_comment]
　　[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
　　[CLUSTERED BY (col_name, col_name, ...)
　　[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
　　[ROW FORMAT row_format]
　　[STORED AS file_format]
　　[LOCATION hdfs_path]

　　# 字段解释：
　　[external] # 申明是否为外部表

　　[if not exists] # 如果表存在，则不创建了

　　[(col_name data_type [COMMENT col_comment], ...)] # 分别是字段名，字段类型，字段注释

　　[COMMENT table_comment] # 放在()后面，是表注释

　　[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] # 建立分区，()里是字段名，字段类型，字段注释，分区里的字段不能包含在字段声明区。

　　[CLUSTERED BY (col_name, col_name, ...)
　　 [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] # 在clustered by 里的字段上建立 num_buckets个桶，记录是由 sorted by里的字段排序的。

　　[ROW FORMAT row_format] # 指定分隔符，可以是以下几个：
: DELIMITED [FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]

　　[STORED AS file_format] # 指定load文件的类型，分为：
textfile 是纯文本文件
sequence 是压缩文件

　　[LOCATION hdfs_path] # 向表里加载数据，hdfs_path是一个hdfs上的目录，不能是文件，hive会依据默认配置的hdfs路径，自动将整个目录下的文件都加载到表中。

案例实操

（1）普通创建表

　　create table if not exists student2(
　　id int, name string
　　)
　　row format delimited fields terminated by '\t'
　　stored as textfile
　　location '/user/hive/warehouse/student2';

（2）根据查询结果创建表（查询的结果会添加到新创建的表中）

　　create table if not exists student3
　　as select id, name from student;

（3）根据已经存在的表结构创建表

　　create table if not exists student4 like student;

（4）查询表的类型
　　hive (default)> desc formatted student2;
　　Table Type: MANAGED_TABLE

外部表
1）理论
因为表是外部表，所有 Hive 并非认为其完全拥有这份数据。删除该表并不会删除掉这份数据，不过描述表的元数据信息会被删除掉。
2）管理表和外部表的使用场景：
每天将收集到的网站日志定期流入 HDFS 文本文件。在外部表（原始日志表）的基础上做大量的统计分析，用到的中间表、结果表使用内部表存储，数据通过 SELECT+INSERT进入内部表

使用复杂类型创建表

　　create external table if not exists T2(
　　 id int,
　　 course array<string>,
　　 score map<string,int>
　　)
　　row format
　　delimited fields terminated by ','
　　 collection items terminated by '|'
　　 map keys terminated by ':'
　　stored as textfile;
　　# 数据文件内容
　　1001,语文|数学|英语,语文|56,语文:102|数学:2033|英语:30
　　1002,语文|数学|英语,语文|156,语文:120|数学:2033|英语:30
　　1003,语文|数学|英语,语文|1156,语文:210|数学:3320|英语:30
　　1004,语文|数学|英语,语文|1156,语文:2210|数学:203|英语:30
　　1005,语文|数学|英语,语文|5116,语文:22210|数学:230|英语:30
　　# 导入数据文件
　　load data local inpath '/home/datanode/hiveTest/test01' overwrite into table t2;

创建一个带分区的内部表

　　create table if not exists T3(
　　 id int,
　　 name string
　　)
　　partitioned by (classid int)
　　row format
　　delimited fields terminated by ','
　　stored as textfile;

创建一个带桶的内部表

　　create table T4(
　　 id int ,
　　 name string,
　　 sex string,
　　 age int
　　)
　　partitioned by (city string)
　　clustered by(age) sorted by(name) into 5 buckets
　　row format
　　delimited fields terminated by ','
　　stored as textfile;