irpas技术客

【Hive】基本建库、建表操作_骑着蜗牛ひ追导弹'_hive 创建库

未知 8317


文章目录 一、Hive 数据仓库的操作1、创建数据仓库2、查看 db 数据仓库的信息及路径3、删除 db 数据仓库 二、Hive 数据表的操作1、创建内部表2、创建外部表3、修改表结构4、删除表5、创建同结构表


环境准备

Hadoop 完全分布式(一主两从即可)MySQL环境、Hive环境
一、Hive 数据仓库的操作

验证 hadoop 集群、mysql服务均已启动:

在 Hive 安装目录下的 bin 目录下启动: ./hive :

[root@server bin]# ./hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive>
1、创建数据仓库 // 小技巧:显示当前所使用的数据库名 hive> set hive.cli.print.current.db=true // 创建db数据库 hive (test)> create database if not exists db; OK Time taken: 0.011 seconds // 使用db数据库 hive (test)> use db; OK Time taken: 0.021 seconds

在 Hive 中创建一个 db 数据仓库。在创建时,为了避免新建的库名与已有的库名重复,创建的时候添加 if not exists 如果库名重复将会报错:FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db already exists,


2、查看 db 数据仓库的信息及路径 hive (db)> describe database db; OK db hdfs://192.168.64.183:9000/user/hive/warehouse/db.db root USER Time taken: 0.329 seconds, Fetched: 1 row(s)
3、删除 db 数据仓库 // 删除数据仓库 hive (db)> drop database if exists db; Sun Mar 06 09:21:29 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. Sun Mar 06 09:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. Sun Mar 06 09:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. Sun Mar 06 09:21:30 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. OK Time taken: 0.438 seconds // 再次查看所有数据仓库 hive (db)> show databases; OK default test Time taken: 0.025 seconds, Fetched: 2 row(s)

返回顶部


二、Hive 数据表的操作

Hive 的数据表分为两种:内部表 和 外部表。

Hive 创建内部表的时候,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不会对数据的位置做出任何改变。

在删除表的时候,内部表的元数据和数据会被一起删除,而外部表只删除元数据,不删除数据。这样外部表相对来说更加安全些,数据组织也更加灵活,方便共享源数据,生产中常使用外部表。


1、创建内部表

查看已存在的表(当前数据仓库中没有表):

hive (db)> show tables; OK Time taken: 0.026 seconds

创建一个名为 cat 的内部表,包含有两个字段:cat_id、cat_name,字符类型为 string:

hive (db)> create table cat(cat_id string,cat_name string); OK Time taken: 1.046 seconds hive (db)> show tables; OK cat Time taken: 0.02 seconds, Fetched: 1 row(s)

如果 cat 内部表已经存在,会报错:FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table cat already exists)


2、创建外部表

创建一个名为 cat1 的外部表,包含有两个字段:cat_id、cat_name,字符类型为 string:

// 创建外部表 hive (db)> create external table if not exists cat1(cat_id string,cat_name string) > row format delimited fields terminated by '\t' > location '/usr/root/goods'; OK Time taken: 0.315 seconds // 查看表 hive (db)> show tables; OK cat cat1 Time taken: 0.017 seconds, Fetched: 2 row(s)
3、修改表结构

修改 cat 表的表结构,对 cat 表添加两个字段 group_id 和 cat_code:

hive (db)> alter table cat add columns(group_id string,cat_code string); OK Time taken: 0.2 seconds hive (db)> desc cat; // 查看表结构 OK cat_id string cat_name string group_id string cat_code string Time taken: 0.042 seconds, Fetched: 4 row(s)

修改 cat1 表的表名,将其重命名为 cat2 :

hive (db)> alter table cat1 rename to cat2; OK Time taken: 0.142 seconds hive (db)> show tables; // 查看表 OK cat cat2 Time taken: 0.019 seconds, Fetched: 2 row(s)

该命令可以修改表名,数据所在的位置和分区名并不改变!!!


4、删除表

删除名为 cat2 的表:

hive (db)> drop table cat2; OK Time taken: 0.337 seconds hive (db)> show tables; OK cat Time taken: 0.024 seconds, Fetched: 1 row(s)
5、创建同结构表

创建与已知表相同结构的表,创建一个与 cat 表相同结构的表,名为 cat3,使用 like 关键字:

hive (db)> create table cat3 like cat; OK Time taken: 0.271 seconds hive (db)> show tables; OK cat cat3 Time taken: 0.018 seconds, Fetched: 2 row(s) hive (db)> desc cat3; OK cat_id string cat_name string group_id string cat_code string Time taken: 0.031 seconds, Fetched: 4 row(s) hive (db)> select * from cat3; OK Time taken: 1.599 seconds

通过查看表 cat3 的表结构及其表数据,可以看到 like 就是复制已有表的表结构。

返回顶部



1.本站遵循行业规范,任何转载的稿件都会明确标注作者和来源;2.本站的原创文章,会注明原创字样,如未注明都非原创,如有侵权请联系删除!;3.作者投稿可能会经我们编辑修改或补充;4.本站不提供任何储存功能只提供收集或者投稿人的网盘链接。

标签: #hive #创建库 #文章目录一Hive #db #数据仓库的信息及路径3删除 #数据仓库二Hive