GPDB Administrator Notes (3): Loading and Unloading Data
Published: 2020-12-13 17:31:07 | Category: Encyclopedia | Source: compiled from the web
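External table definitions
Readable external tables (SELECT only; DML operations are not allowed)
Writable external tables (INSERT only; SELECT, UPDATE and DELETE are not allowed)
Loading
Creating an external table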
=# CREATE EXTERNAL WEB TABLE ext_expenses (name text,
   date date, amount float4, category text, description text)
   LOCATION (
     'http://intranet.company.com/expenses/sales/file.csv',
     'http://intranet.company.com/expenses/exec/file.csv',
     'http://intranet.company.com/expenses/finance/file.csv',
     'http://intranet.company.com/expenses/ops/file.csv',
     'http://intranet.company.com/expenses/marketing/file.csv',
     'http://intranet.company.com/expenses/eng/file.csv'
   )
   FORMAT 'CSV' ( HEADER );
Loading data from an external table
=# INSERT INTO expenses_travel
   SELECT * from ext_expenses where category='travel';
Or, to quickly load all of the data into a new database table:
=# CREATE TABLE expenses AS SELECT * from ext_expenses;
Test:
[root@mdw ~]# wget http://mirrors.aliyun.com/repo/Centos-6.repo
--2014-03-04 13:51:30-- http://mirrors.aliyun.com/repo/Centos-6.repo
Resolving mirrors.aliyun.com... 115.28.122.210, 112.124.140.210
Connecting to mirrors.aliyun.com|115.28.122.210|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2086 (2.0K) [application/octet-stream]
Saving to: “Centos-6.repo”
100%[==============================================================================================================================>] 2,086 --.-K/s in 0s
2014-03-04 13:51:30 (194 MB/s) - “Centos-6.repo” saved [2086/2086]
libo=# CREATE EXTERNAL WEB TABLE ext_expenses (name text)
libo-# location ('http://mirrors.aliyun.com/repo/Centos-6.repo')
libo-# FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
CREATE EXTERNAL TABLE
libo=# CREATE TABLE expenses AS SELECT * from ext_expenses;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'colum' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
ERROR: could not translate host name "mirrors.aliyun.com",port "80" to address: Temporary failure in name resolution (cdbutil.c:754) (seg0 slice1 sdw1:40000 pid=26261) (cdbdisp.c:1489)
libo=# SELECT * from ext_expenses;
ERROR: could not translate host name "mirrors.aliyun.com",port "80" to address: Temporary failure in name resolution (cdbutil.c:754) (seg0 slice1 sdw1:40000 pid=26254) (cdbdisp.c:1489)
libo=# drop EXTERNAL WEB TABLE ext_expenses ;
DROP EXTERNAL TABLE
libo=# CREATE EXTERNAL WEB TABLE ext_expenses (colum text)
libo-# location ('http://115.28.122.210/repo/Centos-6.repo')
libo-# FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
CREATE EXTERNAL TABLE
libo=# select * from ext_expenses;
ERROR: connection with gpfdist failed for http://115.28.122.210/repo/Centos-6.repo. effective url: http://115.28.122.210/repo/Centos-6.repo. (seg0 slice1 sdw1:40000 pid=26296)
libo=#
[1] 10321
[gpadmin@mdw data_tst]$ Serving HTTP on port 8081,directory /home/gpadmin/data_tst
[root@mdw ~]# wget http://192.168.100.101:8081/aaa
--2014-03-04 14:14:01-- http://192.168.100.101:8081/aaa
Connecting to 192.168.100.101:8081... connected.
HTTP request sent, awaiting response... 200 ok
Length: unspecified [text/plain]
Saving to: “aaa”
[ <=> ] 17 --.-K/s in 0s
2014-03-04 14:14:01 (1.61 MB/s) - “aaa” saved [17]
libo=# CREATE EXTERNAL WEB TABLE ext_expenses (colum text)
libo-# location ('http://192.168.100.101:8081/aaa')
libo-# FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
CREATE EXTERNAL TABLE
libo=# select * from ext_expenses;
 colum
-------
 aaaa
 aaa
 aa
 a
(7 rows)
create table t as select * from t_ext DISTRIBUTED BY (id);
libo=# create table t as select * from t_ext DISTRIBUTED RANDOMLY;
SELECT 10
libo=# create external table t_ext (id int,name text)
libo-# location ('gpfdist://192.168.100.11:8081/aaa.csv')
libo-# format 'csv';
CREATE EXTERNAL TABLE
libo=# select * from t_ext;
ERROR: missing data for column "name" (seg3 slice1 sdw2:40001 pid=10243)
DETAIL: External table t_ext,line 4000 of gpfdist://192.168.100.11:8081/aaa.csv: ""
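Cause: the CSV file contains blank lines.
Conclusion: in these tests the external table could only be read over HTTP when the file was served by gpfdist. The gpfdist service is GP's simple web service. Note that both earlier errors were raised on a segment host (seg0 on sdw1), which is where the URL is actually fetched.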
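The "missing data for column" failure above came from blank lines in the CSV file. One way to avoid it is to clean the file before gpfdist serves it. The following is a minimal Python sketch of that idea, not part of GPDB itself; the function name drop_blank_lines is hypothetical, and it assumes that no quoted CSV field contains an embedded newline (such a field could legitimately include an empty line).

```python
import io


def drop_blank_lines(src, dst):
    """Copy text lines from src to dst, skipping empty or whitespace-only lines.

    Assumption: no quoted CSV field spans several lines, so every
    blank line is safe to discard. Returns the number of lines dropped.
    """
    dropped = 0
    for line in src:
        if line.strip() == "":
            dropped += 1
            continue
        dst.write(line)
    return dropped


# Small in-memory demonstration; real use would pass open file objects.
raw = io.StringIO("1,alice\n\n2,bob\n   \n3,carol\n")
cleaned = io.StringIO()
removed = drop_blank_lines(raw, cleaned)
print(removed, "blank lines removed")
print(cleaned.getvalue(), end="")
```

For a real file, the same function can be called with two file objects opened with open(path, newline=""), writing the cleaned copy into the directory that gpfdist serves.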
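Handling load errors:
When defining a readable external table with the CREATE EXTERNAL TABLE command, add the SEGMENT REJECT LIMIT clause.
- The count parameter of the reject limit specifies a number of rows (the default); with PERCENT it specifies a percentage of rows.
- To keep the rejected rows for later inspection, use the LOG ERRORS INTO clause to name an error log table.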
Loading with gpload
Unloading data
Disabling EXECUTE in web table definitions
libo=# show gp_external_enable_exec
libo-# ;
 gp_external_enable_exec
-------------------------
 on
(1 row)
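Data formats
When loading or unloading data with the various GP commands, you must specify how the data is formatted.
Row delimiters
GPDB expects rows to be separated by the LF character (Line Feed, 0x0A), the CR character (Carriage Return, 0x0D), or CR followed by LF (CR+LF, 0x0D 0x0A). LF is the standard newline on UNIX and UNIX-like operating systems. Other operating systems use CR (for example Mac OS 9) or CR+LF (for example Windows). GPDB supports all of these as row delimiters.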
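If it is unclear which convention a data file uses, one option is to inspect it and normalize it to LF before loading. The sketch below is an illustration in Python under that assumption; the helper names count_line_endings and normalize_newlines are hypothetical and are not GPDB utilities.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF, bare CR and bare LF terminators in a byte string."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair also contains one CR and one LF, so subtract the pairs.
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR+LF and bare CR terminators to LF.

    CR+LF (0x0D 0x0A) is replaced first so that it becomes a single
    line break rather than two.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"1,alice\r\n2,bob\r3,carol\n"
print(count_line_endings(sample))
print(normalize_newlines(sample))
```

Working on bytes rather than text keeps the check independent of the file's character encoding.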
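Column delimiters
For TEXT files the default column delimiter is the TAB character (0x09), and for CSV files it is the comma (0x2C). When using COPY or CREATE EXTERNAL TABLE, or when defining the data format for gpload, the DELIMITER clause can specify a different single-character delimiter.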
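When preparing a file for a table declared with a non-default delimiter, such as the DELIMITER '|' used in the tests earlier, the file has to be written with that same character. The following Python sketch uses the standard csv module to produce such data; the function name to_delimited and the sample rows are made up for illustration.

```python
import csv
import io


def to_delimited(rows, delimiter="|"):
    """Serialize rows as CSV text using a single-character column delimiter.

    Fields that contain the delimiter are quoted by the csv module,
    so they are not mistaken for two columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [(1, "alice"), (2, "bob|builder")]
print(to_delimited(rows), end="")          # pipe-delimited
print(to_delimited(rows, "\t"), end="")    # TAB-delimited
```

The quoting shown here follows CSV conventions, so the output is intended for a table declared with FORMAT 'CSV' and a matching DELIMITER. The TEXT format uses escape characters rather than quotes and would need different handling.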
(Editor: 李大同)