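Following the much-hyped NoSQL trend, I spent some time learning MongoDB. These are my notes; I hope to explore it further together with anyone who is interested.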
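MongoDB Overview
MongoDB is written in C++, and its name comes from the middle of the word "humongous". It is developed and maintained by 10gen. Its most concise description is: scalable, high-performance, open source, schema-free, document-oriented database. MongoDB's main goal is to bridge the gap between key/value stores (high performance and high scalability) and traditional RDBMS systems (rich functionality), combining the strengths of both.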
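MongoDB features:
- Document-oriented storage
- Full index support, extending to inner objects and embedded arrays
- Replication and high availability
- Auto-sharding for cloud-level scalability
- Query profiling
- Dynamic queries
- Fast in-place updates
- Map/Reduce support
- GridFS file system
- Commercial support, training, and consulting
Official site: http://www.mongodb.org/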
Configuration
Master-slave mode
Machine | IP | Role
test001 | 192.168.1.1 | master
test002 | 192.168.1.2 | slave
test003 | 192.168.1.3 | slave
test004 | 192.168.1.4 | slave
test005 | 192.168.1.5 | slave
test006 | 192.168.1.6 | slave
Start the master:
./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --oplogSize=10000 --logappend --master --port=27017 --fork
Add the repl user:
./mongo
> use local
> db.addUser('repl','replication');
Start the slaves:
./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --slave --port=27017 --source=test001:27017 --autoresync --fork
Add the repl user:
./mongo
> use local
> db.addUser('repl','replication');
The --autoresync option automatically starts a resync when an unexpected failure leaves the slave out of sync with the master (a resync is performed at most once in any 10-minute window). In addition, --slavedelay sets the delay, in seconds, that the slave applies when replicating updates from the master.
A master-slave setup is typically used to separate reads from writes, but querying a slave requires setting slaveOk.
slaveOk
When querying a replica pair or replica set, drivers route their requests to the master mongod by default; to perform a query against an (arbitrarily-selected) slave, the query can be run with the slaveOk option. Here's how to do so in the shell:
db.getMongo().setSlaveOk(); // enable querying a slave
db.users.find(...)
Note: some language drivers permit specifying the slaveOk option on each find(), others make this a connection-wide setting. See your language's driver for details.
Replica Set mode
Replica Sets use n mongod nodes to build a high-availability setup with automatic failover and automatic recovery.
Machine | IP | Role
test001 | 192.168.1.1 | secondary
test002 | 192.168.1.2 | secondary
test003 | 192.168.1.3 | primary
test004 | 192.168.1.4 | secondary
test005 | 192.168.1.5 | secondary
test006 | 192.168.1.6 | secondary
test007 | 192.168.1.7 | secondary
Start each node:
./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --oplogSize=10000 --logappend --replSet set1 --port=27017 --fork --rest
Add the repl user:
./mongo
> use local
> db.addUser('repl','replication');
Configure the set:
config={_id:'set1',members:[
{_id:0,host:'test001:27017'},
{_id:1,host:'test002:27017'},
{_id:2,host:'test003:27017'},
{_id:3,host:'test004:27017'},
{_id:4,host:'test005:27017'},
{_id:5,host:'test006:27017'},
{_id:6,host:'test007:27017'}]
}
rs.initiate(config);
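The configuration document passed to rs.initiate follows a regular pattern: the set name as _id, and one member entry per host with a sequential _id. As a minimal sketch in plain JavaScript (it runs in Node.js without a MongoDB server), the same document can be built from a host list. The helper name buildReplSetConfig is hypothetical and is not part of the mongo shell.

```javascript
// Hypothetical helper (not a mongo shell function): builds a replica set
// configuration document shaped like the one typed by hand above.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    // Each member gets a sequential numeric _id and its host:port string.
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

// Host names taken from the machine table in this section.
const replHosts = [
  "test001:27017", "test002:27017", "test003:27017", "test004:27017",
  "test005:27017", "test006:27017", "test007:27017",
];
const replConfig = buildReplSetConfig("set1", replHosts);
console.log(JSON.stringify(replConfig, null, 2));
```

In the mongo shell, the resulting object would be passed to rs.initiate in the same way as the hand-written config.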
Check the status:
Visit http://test001:28017/_replSet
or run:
./mongo
> rs.status()
{
"set" : "set1",
"date" : "Fri Dec 03 2010 00:57:44 GMT+0800 (CST)",
"myState" : 2,
"members" : [
{
"_id" : 0,
"name" : "test001:27017",
"health" : 1,
"state" : 2,
"self" : true
},
{
"_id" : 1,
"name" : "test002:27017",
"health" : 1,
"state" : 2,
"uptime" : 194451,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"
},
{
"_id" : 2,
"name" : "test003:27017",
"health" : 1,
"state" : 1,
"uptime" : 194689,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:43 GMT+0800 (CST)"
},
{
"_id" : 3,
"name" : "test004:27017",
"health" : 1,
"state" : 2,
"uptime" : 194689,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"
},
{
"_id" : 4,
"name" : "test005:27017",
"health" : 1,
"state" : 2,
"uptime" : 194689,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"
},
{
"_id" : 5,
"name" : "test006:27017",
"health" : 1,
"state" : 2,
"uptime" : 194689,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:43 GMT+0800 (CST)"
},
{
"_id" : 6,
"name" : "test007:27017",
"health" : 1,
"state" : 2,
"uptime" : 194689,
"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"
}
],
"ok" : 1
}
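After a write on a Replica Set, call getlasterror with w : 3 so that the call returns only once the write has been replicated to at least 3 machines:
db.runCommand( { getlasterror : 1,w : 3 } )
Note: this mode does not support the auth feature; if you need auth, choose master-slave mode.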
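The w value in the getlasterror command is the number of machines that must hold the write before the call returns. The following is a minimal sketch in plain JavaScript of how such a command document could be assembled. Both helper names are hypothetical, and picking w as a strict majority of the set is an assumption of this example; the text above simply uses w : 3.

```javascript
// Hypothetical helper: smallest number of members that forms a strict majority.
function majorityOf(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// Hypothetical helper: builds the command document used with db.runCommand.
function getLastErrorCommand(w) {
  return { getlasterror: 1, w: w };
}

// For the 7-member set configured earlier, a strict majority is 4 members.
const majorityCommand = getLastErrorCommand(majorityOf(7));
console.log(majorityCommand);
```

A larger w trades write latency for durability, since the call waits for more members to receive the write.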
Sharding mode
Building a MongoDB Sharding Cluster requires three roles:
- Shard Server: a mongod instance that stores the actual data chunks.
- Config Server: a mongod instance that stores the metadata for the whole cluster, including chunk information.
- Route Server: a mongos instance that acts as the front-end router; clients connect through it, and it makes the whole cluster look like a single-process database.
Machine | IP | Role
test002 | 192.168.1.2 | mongod shard11:27017
test003 | 192.168.1.3 | mongod shard21:27017
test004 | 192.168.1.4 | mongod shard31:27017
test005 | 192.168.1.5 | mongod config1:20000, mongos1:30000
test006 | 192.168.1.6 | mongod config2:20000, mongos2:30000
test007 | 192.168.1.7 | mongod config3:20000, mongos3:30000
test008 | 192.168.1.8 | mongod shard12:27017
test009 | 192.168.1.9 | mongod shard22:27017
test010 | 192.168.1.10 | mongod shard32:27017
Shard configuration
Shard1
[test002; test008]
test002:
./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /mongodb/data/shard11 --oplogSize 10000 --logpath /mongodb/logs/shard11.log --logappend --fork
test008:
./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /mongodb/data/shard12 --oplogSize 10000 --logpath /mongodb/logs/shard12.log --logappend --fork
Initialize shard1:
config={_id:'shard1',members:[
{_id:0,host:'test002:27017'},
{_id:1,host:'test008:27017'}]
}
rs.initiate(config);
Shard2
[test003; test009]
test003:
./mongod --shardsvr --replSet shard2 --port 27017 --dbpath /mongodb/data/shard21 --oplogSize 10000 --logpath /mongodb/logs/shard21.log --logappend --fork
test009:
./mongod --shardsvr --replSet shard2 --port 27017 --dbpath /mongodb/data/shard22 --oplogSize 10000 --logpath /mongodb/logs/shard22.log --logappend --fork
Initialize shard2:
config={_id:'shard2',members:[
{_id:0,host:'test003:27017'},
{_id:1,host:'test009:27017'}]
}
rs.initiate(config);
Shard3
[test004; test010]
test004:
./mongod --shardsvr --replSet shard3 --port 27017 --dbpath /mongodb/data/shard31 --oplogSize 10000 --logpath /mongodb/logs/shard31.log --logappend --fork
test010:
./mongod --shardsvr --replSet shard3 --port 27017 --dbpath /mongodb/data/shard32 --oplogSize 10000 --logpath /mongodb/logs/shard32.log --logappend --fork
Initialize shard3:
config={_id:'shard3',members:[
{_id:0,host:'test004:27017'},
{_id:1,host:'test010:27017'}]
}
rs.initiate(config);
Config server configuration
[test005; test006; test007]
./mongod --configsvr --dbpath /mongodb/data/config --port 20000 --logpath /mongodb/logs/config.log --logappend --fork
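mongos configuration
[test005; test006; test007]
./mongos --configdb test005:20000,test006:20000,test007:20000 --port 30000 --chunkSize 5 --logpath /mongodb/logs/mongos.log --logappend --fork
The router forwards each request to the actual target server processes, merges the results, and returns them to the client. The router itself stores no data or state; it only fetches information from the Config Servers at startup. Any change on the Config Servers is propagated to all router processes.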
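To make the routing step concrete, here is a minimal sketch in plain JavaScript of how a router could pick target shards from chunk metadata. It illustrates the idea only and is not how mongos is implemented. The chunk boundaries and both function names are hypothetical; only the shard names s1, s2 and s3 come from this setup.

```javascript
// Hypothetical chunk metadata of the kind a config server would hold:
// each chunk covers shard-key values in [min, max) and lives on one shard.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "s1" },
  { min: 1000, max: 2000, shard: "s2" },
  { min: 2000, max: Infinity, shard: "s3" },
];

// Point lookup: exactly one chunk contains a given shard-key value.
function shardForKey(chunks, key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// Range query over [lo, hi): every shard owning an overlapping chunk must be
// asked, and the router then merges the partial results.
function shardsForRange(chunks, lo, hi) {
  const shards = chunks
    .filter((c) => lo < c.max && hi > c.min)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(shardForKey(chunkTable, 16));
console.log(shardsForRange(chunkTable, 500, 1500));
```

A query on the shard key can be sent to a single shard, while a range that spans chunk boundaries fans out to several shards, which is where the merge step described above comes in.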
Configuring the Shard Cluster
1. Connect to the admin database
./mongo test005:30000/admin
2. Add the shards
db.runCommand({addshard:"shard1/test002:27017,test008:27017",name:"s1",maxsize:20480});
db.runCommand({addshard:"shard2/test003:27017,test009:27017",name:"s2",maxsize:20480});
db.runCommand({addshard:"shard3/test004:27017,test010:27017",name:"s3",maxsize:20480});
3. List the shards
db.runCommand({listshards:1})
If the three shards above are listed, the shards have been configured successfully.
4. Enable sharding for the database and its collections
db.runCommand({enablesharding:"taobao"});
db.runCommand({shardcollection:"taobao.test0",key:{_id:1}});
db.runCommand({shardcollection:"taobao.test1",key:{_id:1}});
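Usage
Working with the database from the shell
Superuser tasks:
1) Switch to the admin database
2) Add a user or change a user's password
3) View the user list
4) Authenticate a user
5) Remove a user
6) View all users
7) View all databases
8) View all collections
9) View the status of each collection
db.printCollectionStats()
10) View master-slave replication status
db.printReplicationInfo()
11) Repair the database
12) Set the profiling level: 0=off, 1=slow, 2=all
13) View profiling data
14) Copy a database
db.copyDatabase('mail_addr','mail_addr_tmp')
15) Drop a collection
16) Drop the current database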
Insert, delete, and update:
1) Insert
db.user.insert({'name':'dump','age':1})
or
db.user.save({'name':'dump','age':1})
Nested object:
db.foo.save({'name':'dump','address':{'city':'hangzhou','post':310015},'phone':[138888888,13999999999]})
Array:
db.user_addr.save({'Uid':'dump','Al':['test-1@taobao.com','test-2@taobao.com']})
2) Delete
Remove the user records where name='dump':
db.user.remove({'name':'dump'})
Remove all records from the foo collection:
3) Update
// update foo set xx=4 where yy=6
// insert if no match exists (upsert), and allow updating multiple records (multi)
db.foo.update({'yy':6},{'$set':{'xx':4}},true,true) // arguments: query, update, upsert, multi
Queries:
coll.find() // select * from coll
coll.find().limit(10) // select * from coll limit 10
coll.find().sort({x:1}) // select * from coll order by x asc
coll.find().sort({x:1}).skip(5).limit(10) // select * from coll order by x asc limit 5,10
coll.find({x:10}) // select * from coll where x = 10
coll.find({x: {$lt:10}}) // select * from coll where x < 10
coll.find({},{y:true}) // select y from coll
coll.count() // select count(*) from coll
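To illustrate how a query document such as {x: {$lt:10}} selects records, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB's query engine: the function name matches is hypothetical, and it handles only exact equality on scalar values plus the $lt, $lte, $gt and $gte operators. Note that $lt is a strict less-than, so a record with x equal to 10 does not match.

```javascript
// Hypothetical in-memory matcher: returns true when doc satisfies every
// condition in the query document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { $lt: 10 }: every operator must hold.
      return Object.entries(condition).every(([op, operand]) => {
        switch (op) {
          case "$lt": return value < operand;
          case "$lte": return value <= operand;
          case "$gt": return value > operand;
          case "$gte": return value >= operand;
          default: throw new Error("unsupported operator: " + op);
        }
      });
    }
    // Plain form, e.g. { x: 10 }: exact equality.
    return value === condition;
  });
}

const sampleDocs = [
  { x: 5, y: "a" },
  { x: 10, y: "b" },
  { x: 15, y: "c" },
];
const lessThanTen = sampleDocs.filter((d) => matches(d, { x: { $lt: 10 } }));
const equalToTen = sampleDocs.filter((d) => matches(d, { x: 10 }));
console.log(lessThanTen, equalToTen);
```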
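Other queries:
coll.find({"address.city":"gz"}) // records whose embedded document address has city equal to gz
coll.find({likes:"math"}) // search inside an array
coll.find({name: {$exists: true}}); // all records that have a name field
coll.find({phone: {$exists: false}}); // all records that do not have a phone field
coll.find({name: {$type: 2}}); // all records whose name field is a string
coll.find({age: {$type: 16}}); // all records whose age field is an integer (BSON type 16, 32-bit)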
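The dotted path, array, and $exists queries above can be pictured with a small in-memory sketch in plain JavaScript. The three function names are hypothetical and the logic is a simplification for illustration, not MongoDB's actual matching rules.

```javascript
// Hypothetical helper: follows a dotted path such as "address.city" into
// nested objects and returns undefined when any step is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (current, key) =>
      current !== null && typeof current === "object" ? current[key] : undefined,
    doc
  );
}

// A scalar field matches on equality; an array field matches when any
// element equals the expected value.
function fieldMatches(doc, path, expected) {
  const value = getPath(doc, path);
  return Array.isArray(value) ? value.includes(expected) : value === expected;
}

// Rough analogue of {$exists: true}: the field is present at that path.
function fieldExists(doc, path) {
  return getPath(doc, path) !== undefined;
}

const person = { name: "dump", address: { city: "gz" }, likes: ["math", "music"] };
console.log(fieldMatches(person, "address.city", "gz"));
console.log(fieldMatches(person, "likes", "math"));
console.log(fieldExists(person, "phone"));
```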
Indexes:
Direction: 1 (ascending), -1 (descending)
coll.ensureIndex({productid:1}) // ordinary index on productid
coll.ensureIndex({district:1,plate:1}) // compound index on multiple fields
coll.ensureIndex({"address.city":1}) // index on a field of an embedded document
coll.ensureIndex({productid:1},{unique:true}) // unique index
coll.ensureIndex({productid:1},{unique:true,dropDups:true}) // while building the index, delete records whose indexed value has already been seen
coll.getIndexes() // list indexes
coll.dropIndex({productid:1}) // drop a single index
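The effect of building a unique index with dropDups can be pictured with a short in-memory sketch in plain JavaScript. The function name dropDuplicates is hypothetical, and keeping the first record seen for each key is an assumption of this example rather than a documented guarantee.

```javascript
// Hypothetical illustration: keep one record per distinct value of the
// indexed field and discard later records with the same value.
function dropDuplicates(docs, field) {
  const seen = new Set();
  return docs.filter((doc) => {
    if (seen.has(doc[field])) {
      return false; // value already indexed, so this record is dropped
    }
    seen.add(doc[field]);
    return true;
  });
}

const products = [
  { productid: 1, title: "a" },
  { productid: 2, title: "b" },
  { productid: 1, title: "c" },
];
const uniqueProducts = dropDuplicates(products, "productid");
console.log(uniqueProducts);
```

Because the dropped records are deleted for good, this option is best applied only when losing the duplicates is acceptable.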
MongoDB Drivers
C
C#
C++
Haskell
Java
Javascript
Perl
PHP
Python
Ruby
Scala (via Casbah)
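MongoDB supports client APIs in many languages. Because our dump center is built on Hadoop, this section focuses on the Java API, which the test programs below also use.
MongoDB in Java
Download the MongoDB Java driver and drop the jar (mongo-2.3.jar) into your project.
In Java, the Mongo object is thread-safe, and an application should use only one Mongo object. The Mongo object maintains a connection pool automatically, with a default of 10 connections.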
import com.mongodb.*;
import java.util.List;

public class MongoExample {
    // serverList, j and str are supplied by the caller; the original snippet left them undefined
    public static void run(List<ServerAddress> serverList, long j, String str) {
        try {
            Mongo mg = new Mongo(serverList);
            DB db = mg.getDB("taobao");
            if (!db.isAuthenticated()) {
                db.authenticate("name", "pwd".toCharArray());
            }
            DBCollection coll = db.getCollection("category_property_values");
            // Must be called in replica set mode; otherwise every query goes to the primary only
            coll.slaveOk();

            // insert
            BasicDBObject doc = new BasicDBObject();
            doc.put("name", "MongoDB");
            doc.put("type", "database");
            coll.insert(doc);
            // ...

            // select: fetch a single document
            BasicDBObject criteria = new BasicDBObject();
            criteria.put("name", "MongoDB");
            DBObject found = coll.findOne(criteria);
            // ...

            // select: iterate with a cursor
            DBCursor cur = coll.find(criteria);
            while (cur.hasNext()) {
                cur.next();
                // ...
            }
            // ...

            // update: with no $set operator, the matched document is replaced by dblist
            DBObject dblist = new BasicDBObject();
            DBObject qlist = new BasicDBObject();
            qlist.put("_id", j);
            dblist.put("t1", str);
            coll.update(qlist, dblist);
            // ...

            // delete
            DBObject dlist = new BasicDBObject();
            dlist.put("_id", j);
            coll.remove(dlist);
        } catch (MongoException ex) {
            ex.printStackTrace();
        }
    }
}
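MongoDB tests
Version tested: 1.6.3
Inserts were run with a single thread inserting 1 million, 3 million, 5 million, and 10 million rows, and with multiple threads each inserting 1 million rows.
Format of the inserted data:
{ "_id" : NumberLong(16),"nid" : NumberLong(16),"t1" : "search_engine_insert","t2" : "search_engine_insert","t3" : "search_engine_insert","t4" : "search_engine_insert" }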
1) Master-slave mode
Insert
Per-thread rows | run time | Per-thread insert | Total-insert | Total rows | threads
1000000 | 20 | 50000 | 50000 | 1000000 | 1
3000000 | 60 | 50000 | 50000 | 3000000 | 1
5000000 | 99 | 50505 | 50505 | 5000000 | 1
8000000 | 159 | 50314 | 50314 | 8000000 | 1
10000000 | 208 | 48076 | 48076 | 10000000 | 1
1000000 | 64 | 15625 | 31250 | 2000000 | 2
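The derived columns in these result tables appear to follow simple arithmetic: the per-thread rate is the per-thread row count divided by the run time, and the total rate is the total row count divided by the run time. The sketch below, in plain JavaScript, reproduces the two-thread insert row. It assumes the run time is in seconds (the source does not state the unit) and that rates are truncated to whole numbers; the function name throughput is hypothetical.

```javascript
// Hypothetical helper that recomputes the derived columns of a result row.
// rowsPerThread: rows handled by each thread
// runTime: elapsed time for the run (assumed to be seconds)
// threads: number of concurrent threads
function throughput(rowsPerThread, runTime, threads) {
  const totalRows = rowsPerThread * threads;
  return {
    totalRows: totalRows,
    perThreadRate: Math.floor(rowsPerThread / runTime),
    totalRate: Math.floor(totalRows / runTime),
  };
}

// Two threads, 1,000,000 rows each, run time 64.
const twoThreadInsert = throughput(1000000, 64, 2);
console.log(twoThreadInsert);
```

Comparing totalRate across rows shows whether adding threads raises overall throughput or only divides a fixed capacity among more clients.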
In MongoDB, only the master node can perform insert and update operations.
Update
Data format:
{ "_id" : NumberLong(16),"t1" : "search_engine_update","t2" : "search_engine_update","t3" : "search_engine_update","t4" : "search_engine_update" }
Per-thread rows | run time | Per-thread update | Total-update | Total rows | threads
1000000 | 96 | 10416 | 10416 | 1000000 | 1
3000000 | 287 | 10452 | 10452 | 3000000 | 1
1000000 | 188 | 5319 | 15957 | 3000000 | 3
1000000 | 351 | 2849 | 14245 | 5000000 | 5
Select
Lookup by the "_id" field as the key, returning the whole record
a) Client: multiple threads on a single machine
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 72 | 13888 | 13888 | 1000000 | 1
1000000 | 129 | 7751 | 77519 | 10000000 | 10
1000000 | 554 | 1805 | 90252 | 50000000 | 50
1000000 | 1121 | 892 | 89206 | 100000000 | 100
1000000 | 2256 | 443 | 88652 | 200000000 | 200
b) Client: distributed, multiple threads
The program was deployed on 39 machines
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 173 | 5780 | 5780*39=225420 | 1000000*39 | 1
1000000 | 1402 | 713 | 7132*39=278148 | 10000000*39 | 10
500000 | 1406 | 355 | 7112*39=277368 | 10000000*39 | 20
200000 | 1433 | 139 | 6978*39=272142 | 10000000*39 | 50
2) Replica Set mode
Insert
Per-thread rows | run time | Per-thread insert | Total-insert | Total rows | threads
1000000 | 40 | 25000 | 25000 | 1000000 | 1
3000000 | 117 | 25641 | 25641 | 3000000 | 1
5000000 | 211 | 23696 | 23696 | 5000000 | 1
8000000 | 289 | 27681 | 27681 | 8000000 | 1
10000000 | 388 | 25773 | 25773 | 10000000 | 1
1000000 | 83 | 12048 | 24096 | 2000000 | 2
1000000 | 210 | 4762 | 23809 | 5000000 | 5
Update
Per-thread rows | run time | Per-thread update | Total-update | Total rows | threads
1000000 | 28 | 35714 | 35714 | 1000000 | 1
3000000 | 83 | 36144 | 36144 | 3000000 | 1
1000000 | 146 | 6849 | 20547 | 3000000 | 3
1000000 | 262 | 3816 | 19083 | 5000000 | 5
Select
Lookup by the "_id" field as the key, returning the whole record
a) Client: multiple threads on a single machine
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 198 | 5050 | 5050 | 1000000 | 1
1000000 | 264 | 3787 | 37878 | 10000000 | 10
1000000 | 436 | 2293 | 114678 | 50000000 | 50
1000000 | 754 | 1326 | 132625 | 100000000 | 100
1000000 | 1526 | 655 | 131061 | 200000000 | 200
b) Client: distributed, multiple threads
The program was deployed on 39 machines
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 216 | 4629 | 4629*39=180531 | 1000000*39 | 1
1000000 | 1375 | 729 | 7293*39=284427 | 10000000*39 | 10
500000 | 1469 | 340 | 6807*39=265473 | 10000000*39 | 20
200000 | 1561 | 128 | 6406*39=249834 | 10000000*39 | 50
3) Sharding mode
Insert
Per-thread rows | run time | Per-thread insert | Total-insert | Total rows | threads
1000000 | 58 | 17241 | 17241 | 1000000 | 1
3000000 | 180 | 16666 | 16666 | 3000000 | 1
5000000 | 373 | 13404 | 13404 | 5000000 | 1
2000000 | 234 | 8547 | 17094 | 4000000 | 2
2000000 | 447 | 4474 | 22371 | 10000000 | 5
Update
Per-thread rows | run time | Per-thread update | Total-update | Total rows | threads
1000000 | 38 | 26315 | 26315 | 1000000 | 1
3000000 | 115 | 26086 | 26086 | 3000000 | 1
1000000 | 64 | 15625 | 46875 | 3000000 | 3
1000000 | 93 | 10752 | 53763 | 5000000 | 5
Select
Lookup by the "_id" field as the key, returning the whole record
a) Client: multiple threads on a single machine
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 277 | 3610 | 3610 | 1000000 | 1
1000000 | 456 | 2192 | 21929 | 10000000 | 10
1000000 | 1158 | 863 | 43177 | 50000000 | 50
1000000 | 2299 | 434 | 43497 | 100000000 | 100
b) Client: distributed, multiple threads
The program was deployed on 39 machines
Per-thread rows | run time | Per-thread select | Total-select | Total rows | threads
1000000 | 659 | 1517 | 1517*39=59163 | 1000000*39 | 1
1000000 | 8540 | 117 | 1170*39=45630 | 10000000*39 | 10
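Summary:
MongoDB's query performance is quite good in both M-S and Repl-Set modes. The difference is that in Repl-Set mode, if the primary node goes down, the system elects another primary by itself, so subsequent use is unaffected, and the former primary automatically becomes a secondary once it recovers. In M-S mode, once the master goes down, one of the slaves has to be reconfigured as the master by hand. In addition, Repl-Set mode supports at most 7 nodes.
Because query speed dropped noticeably in sharding mode and the runs took too long, only two rounds were tested; its strength probably shows only with very large data volumes. The figures above are for reference only, since the testing so far has been simple. Next I plan to study the source code, and anyone interested is welcome to get in touch. (Editor: 李大同)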