03 Distributed NoSQL HBase - Batch reading of HBase data with MapReduce

Published: 2020-12-13 13:43:33 Category: Encyclopedia Source: collected from the web


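(Original article: http://www.52php.cn/cata/500599. Please include the original link when reposting. This is part of an ongoing series covering hadoop, hive, hbase, mahout, storm, spark, kafka, flume and more. Be someone who loves to share.)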

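1 The problem
1: Suppose you have a large volume of data, say 10 million rows, to insert in bulk into an HBase table. The PUT method in HBase's Java API inserts records one at a time, which is very slow at this scale.
2: Suppose you need to fetch 10 million rows from an HBase table. Fetching them one by one with GET is just as slow. A scan can fetch 10 million records in bulk without trouble, but that API runs on a single machine, so it becomes inefficient for very large data sets.
To address the efficiency of get and put operations at this scale, HBase provides the org.apache.hadoop.hbase.mapreduce package, a solution built on Hadoop MapReduce for distributed access to HBase tables.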

2 Key classes
1 Class TableMapper<KEYOUT,VALUEOUT>

java.lang.Object
- org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,KEYOUT,VALUEOUT>
- - org.apache.hadoop.hbase.mapreduce.TableMapper<KEYOUT,VALUEOUT>

Type Parameters:
KEYOUT - The type of the key.
VALUEOUT - The type of the value.

2 Class TableReducer<KEYIN,VALUEIN,KEYOUT>

java.lang.Object
- org.apache.hadoop.mapreduce.Reducer<KEYIN,VALUEIN,KEYOUT,org.apache.hadoop.io.Writable>
- - org.apache.hadoop.hbase.mapreduce.TableReducer<KEYIN,VALUEIN,KEYOUT>

Type Parameters:
KEYIN - The type of the input key.
VALUEIN - The type of the input value.
KEYOUT - The type of the output key.

3 Batch writes (put) and reads (get) on an HBase table
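// 1. MapReduce: batch-read the id (row key) and the info:name value of every row in the table.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

class HBaseMap extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // The row key serves as the id.
        Text keyText = new Text(Bytes.toString(key.get()));

        String family = "info";
        String qualifier = "name";
        byte[] nameValueBytes = value.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        if (nameValueBytes == null) {
            return; // this row has no info:name cell, so there is nothing to emit
        }
        Text valueText = new Text(Bytes.toString(nameValueBytes));

        // Emit (name, id): the name becomes the key that the reducer receives.
        context.write(valueText, keyText);
    }
}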

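// 2. MapReduce: batch-write the name and id pairs into the output table.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

class HBaseReduce extends TableReducer<Text, Text, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String family = "info";
        String qualifier = "name";
        // The incoming key is used as the row key of the output row.
        byte[] rowKey = Bytes.toBytes(key.toString());
        Put put = new Put(rowKey);
        for (Text val : values) {
            // Put.add is the older client API; HBase 1.0 and later name this method addColumn.
            put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(val.toString()));
        }
        // Hand the Put to the table output format; without this call nothing is written.
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}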
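
To make the data flow of the two classes above easier to follow, here is a minimal plain-Java sketch that imitates it in memory, with no HBase or Hadoop dependency. The class name HBaseFlowSketch, its methods, and the sample rows are all hypothetical and exist only for illustration. It assumes the source table maps each row key (the id) to one info:name value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the job; every name in this class is hypothetical.
class HBaseFlowSketch {

    // Map phase: for each source row (row key -> info:name), emit (name, row key),
    // mirroring context.write(valueText, keyText) in HBaseMap.
    static List<String[]> mapPhase(Map<String, String> sourceRows) {
        List<String[]> emitted = new ArrayList<>();
        for (Map.Entry<String, String> row : sourceRows.entrySet()) {
            if (row.getValue() == null) {
                continue; // the row has no info:name cell
            }
            emitted.add(new String[] {row.getValue(), row.getKey()});
        }
        return emitted;
    }

    // Shuffle and reduce phase: group the emitted pairs by name, sorted by key
    // as reducer input is, so each name yields one output row.
    static Map<String, List<String>> reducePhase(List<String[]> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : emitted) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    static Map<String, List<String>> run(Map<String, String> sourceRows) {
        return reducePhase(mapPhase(sourceRows));
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("id1", "alice");
        source.put("id2", "bob");
        source.put("id3", "alice");
        System.out.println(run(source)); // prints {alice=[id1, id3], bob=[id2]}
    }
}
```

The sketch shows that rows sharing a name collapse into a single output row keyed by that name. In a real job the order of values within one key is not guaranteed, so the ordered lists here are a simplification.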

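Finally, run these two MapReduce classes through Hadoop's Job API: package them into a jar and submit it to the Hadoop cluster to read and write HBase data in bulk.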
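
The original does not show the driver, so the following is only a sketch of what it might look like, wiring HBaseMap and HBaseReduce together with TableMapReduceUtil. The class name HBaseCopyDriver, the job name, the table names source_table and target_table, and the scan caching value are assumptions, not taken from the original. It also assumes a Hadoop 2 or later client and that the HBase and Hadoop jars are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver; it must live in the same package as HBaseMap and HBaseReduce.
public class HBaseCopyDriver {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-batch-read-write"); // job name is an assumption
        job.setJarByClass(HBaseCopyDriver.class);

        // Restrict the scan to the one column the mapper reads.
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
        scan.setCaching(500);       // rows fetched per RPC; an assumed tuning value
        scan.setCacheBlocks(false); // avoid filling the block cache during a full scan

        // "source_table" and "target_table" are placeholder table names.
        TableMapReduceUtil.initTableMapperJob(
                "source_table", scan, HBaseMap.class, Text.class, Text.class, job);
        TableMapReduceUtil.initTableReducerJob("target_table", HBaseReduce.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After packaging, a job like this would typically be submitted with the hadoop jar command, naming the jar and the driver class.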

(Editor: Li Datong)
