Big Data Series 5: Pig – A Big Data Analysis Platform
wget http://mirror.bit.edu.cn/apache/pig/pig-0.11.1/pig-0.11.1.tar.gz
tar -xzvf pig-0.11.1.tar.gz
sudo vi /etc/profile
Add:
export PIG_HOME=/home/ysc/pig-0.11.1
export PATH=$PATH:$PIG_HOME/bin
source /etc/profile
cp conf/log4j.properties.template conf/log4j.properties
pig --help

Local Mode:
1. pig -x local
2. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar org.apache.pig.Main -x local

Mapreduce Mode (Default):
1. pig
2. pig -x mapreduce
3. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar:/home/ysc/hadoop-1.2.1/conf org.apache.pig.Main
4. java -cp /home/ysc/pig-0.11.1/pig-0.11.1.jar:/home/ysc/hadoop-1.2.1/conf org.apache.pig.Main -x mapreduce

Prepare the data:
hadoop fs -put /etc/passwd passwd

Interactive Mode:
Enter the Pig shell (Local or Mapreduce Mode):
pig (or: pig -x local)
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;

Batch Mode:
Write the script:
vi id.pig
Enter:
/* id.pig */
-- load the passwd file
A = load 'passwd' using PigStorage(':');
-- extract the user IDs
B = foreach A generate $0 as id;
-- write the results to a file named id.out
store B into 'id.out';
Run the script (Local or Mapreduce Mode):
pig id.pig (or: pig -x local id.pig)
View the result:
hadoop fs -cat id.out/part-m-00000

Managing data from Pig with HCatalog:
Start the Metastore:
hcat_server.sh start & (or: hive --service metastore &)
sudo vi /etc/profile
Add:
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-*.jar:$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar
export PIG_OPTS=-Dhive.metastore.uris=thrift://host001:9083
source /etc/profile
Create the table:
hcat -e "CREATE TABLE students (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;"
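The students table is meant to hold tab-delimited text, and Pig's default PigStorage loader also splits fields on tabs, so the input file needs a real tab character between name and age. Tabs are easy to lose when typing or pasting into an editor. As a hedged alternative to editing by hand, the sketch below writes the same sample records with printf and then counts malformed lines with awk. The `bad` variable and the awk check are illustrative additions, not part of the original walkthrough.

```shell
# Write the sample records with an explicit tab between name and age.
# printf reuses the format string for each successive pair of arguments.
printf '%s\t%s\n' \
  '刘德华' 51 \
  '张学友' 52 \
  '刘亦菲' 41 \
  '杨尚川' 27 \
  '成龙' 55 \
  '洪金宝' 52 \
  '林志玲' 40 > students.txt

# Count lines that do not have exactly two tab-separated fields.
bad=$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' students.txt)
echo "malformed lines: $bad"
```

If the count is not zero, the affected rows would most likely load with a null age, so it is worth fixing the file before uploading it to HDFS.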
Prepare the data:
vi students.txt
Enter (name and age separated by a tab):
刘德华	51
张学友	52
刘亦菲	41
杨尚川	27
成龙	55
洪金宝	52
林志玲	40
hadoop fs -put students.txt /user/ysc/students.txt

Start pig:
pig -Dpig.additional.jars=$PIG_CLASSPATH

Store the data:
students = LOAD '/user/ysc/students.txt' AS (name:chararray, age:int);
dump students;
STORE students INTO 'students' USING org.apache.hcatalog.pig.HCatStorer();

Load the data:
A = LOAD 'students' USING org.apache.hcatalog.pig.HCatLoader();

(Editor: 李大同)