
Paginated MongoDB queries over hundreds of millions of records: a Java implementation

Published: 2020-12-16 04:45:26 | Category: Security | Source: compiled from the web

1. Preparing the environment

  1.1 Download MongoDB

  1.2 Start MongoDB

    C:\mongodb\bin\mongod --dbpath D:\mongodb\data

  1.3 Download Robo 3T, a GUI tool for MongoDB

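2. Preparing the data

  Maven dependency: groupId org.mongodb, artifactId mongo-java-driver, version 3.6.1

  Run the following Java code to insert the test data: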

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    // The class name InsertData is chosen here for illustration; the source shows only the method body.
    public class InsertData {

        public static void main(String[] args) {
            try {
                // Connect to MongoDB (MongoClient has been the entry point since driver 2.10.0)
                MongoClient mongo = new MongoClient("localhost", 27017);

                // Get the database; if it does not exist, MongoDB creates it for you
                DB db = mongo.getDB("www");

                // Get the collection from database 'www'; it is also created on demand
                DBCollection table = db.getCollection("person");

                // Insert 100,000,000 documents, one per iteration
                for (int i = 0; i < 100000000; i++) {
                    BasicDBObject document = new BasicDBObject();
                    document.put("name", "mkyong" + i);
                    document.put("age", 30);
                    document.put("sex", "f");
                    table.insert(document);
                }

                System.out.println("Done");
            } catch (MongoException e) {
                // With the 3.x driver the MongoClient constructor no longer throws the
                // checked UnknownHostException, so only MongoException is caught here.
                e.printStackTrace();
            }
        }
    }
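Inserting one document per call means one round trip per document, which adds up over 100,000,000 rows. A common alternative is to buffer documents and hand them over in batches. The sketch below shows only the batching logic with plain JDK types; the class name, the method name `forEachBatch`, and the batch size of 1000 are assumptions made for illustration, and in a real loader the `sink` would call a bulk insert such as the legacy driver's `DBCollection.insert(List)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchedInsertSketch {

    /**
     * Generates `total` names and passes them to `sink` in lists of at most
     * `batchSize` elements. Returns the number of batches delivered.
     */
    static int forEachBatch(int total, int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> buffer = new ArrayList<>(batchSize);
        for (int i = 0; i < total; i++) {
            buffer.add("mkyong" + i);
            if (buffer.size() == batchSize) {
                sink.accept(buffer);                 // hypothetical bulk insert goes here
                buffer = new ArrayList<>(batchSize); // start a fresh batch
                batches++;
            }
        }
        if (!buffer.isEmpty()) {                     // flush the final partial batch
            sink.accept(buffer);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        int batches = forEachBatch(2500, 1000, batch -> sizes.add(batch.size()));
        System.out.println(batches + " batches, sizes " + sizes);
    }
}
```

The final flush matters: without it, any remainder smaller than one batch would be dropped silently.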

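3. Paginated queries

 The conventional limit-based paging becomes slow once the data volume is large, so it is not a good fit here. I looked for another way and borrowed the approach used in logstash-input-mongodb [1]: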

    # (the start and end of the next statement are cut off in the source)
    = collection.find({:_id => {:$gt =>

    collection_name = collection[:name]
    @logger.debug("collection_data is: #{@collection_data}")
    last_id = @collection_data[index][:last_id]
    #@logger.debug("last_id is #{last_id}",:index => index,:collection => collection_name)
    # get batch of events starting at the last_place if it is set

    last_id_object = last_id
    if since_type == 'id'
      last_id_object = BSON::ObjectId(last_id)
    elsif since_type == 'time'
      if last_id != ''
        last_id_object = Time.at(last_id)
      end
    end
    cursor = get_cursor_for_collection(@mongodb,collection_name,batch_size)
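The idea in the Ruby fragment is to remember the last `_id` that was read and ask only for documents whose `_id` is greater than it, instead of counting past earlier rows. The following is a minimal, driver-free sketch of that idea: a `TreeSet<Long>` stands in for the `_id` index, and `nextPage` plays the role of a query of the form "`_id` greater than `lastId`, sorted ascending, limited to `pageSize`". The class and method names and the `Long` ids are assumptions for illustration, not part of logstash-input-mongodb or the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class KeysetPagingSketch {

    /**
     * Returns up to pageSize ids strictly greater than lastId, in ascending order.
     * A null lastId means "start from the very first id".
     */
    static List<Long> nextPage(NavigableSet<Long> ids, Long lastId, int pageSize) {
        NavigableSet<Long> remaining = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<Long> page = new ArrayList<>(pageSize);
        for (Long id : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Walks the whole set page by page and returns how many ids were visited. */
    static int walkAll(NavigableSet<Long> ids, int pageSize) {
        int seen = 0;
        Long lastId = null;
        while (true) {
            List<Long> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break;                           // nothing left after lastId
            }
            seen += page.size();
            lastId = page.get(page.size() - 1);  // remember where this page ended
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i * 7);                      // ids 7, 14, ..., 70
        }
        System.out.println(nextPage(ids, null, 4));
        System.out.println(nextPage(ids, 28L, 4));
        System.out.println(walkAll(ids, 4));
    }
}
```

Each page starts with a seek to `lastId` in the ordered structure, so the cost of fetching a page does not grow with how many pages came before it.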

Java implementation:

    import java.util.List;

    import org.bson.types.ObjectId;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;

    public class Test {

        public static void main(String[] args) {
            int pageSize = 50000;

            try {
                // Connect to MongoDB, using the same host, database and collection as in section 2
                MongoClient mongo = new MongoClient("localhost", 27017);
                DB db = mongo.getDB("www");
                DBCollection table = db.getCollection("person");

                long cnt = table.count();
                //System.out.println(table.getStats());
                long page = getPageSize(cnt, pageSize);
                // Start after a known _id; pass null instead to start from the first document
                ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");

                for (long i = 0; i < page; i++) {
                    long start = System.currentTimeMillis();
                    DBCursor dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                    // The cursor is lazy: toArray() is what actually runs the query,
                    // so it has to happen before the elapsed time is printed.
                    List<DBObject> objs = dbObjects.toArray();
                    System.out.println("Query " + (i + 1) + " took "
                            + (System.currentTimeMillis() - start) / 1000 + " s");
                    if (objs.isEmpty()) {
                        break; // no documents left after lastIdObject
                    }
                    lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
                }
            } catch (MongoException e) {
                e.printStackTrace();
            }
        }

        public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
            BasicDBObject query = new BasicDBObject();
            if (lastIdObject != null) {
                query.append("_id", new BasicDBObject("$gt", lastIdObject));
            }
            // With no lastIdObject the query is unfiltered, so the ascending _id sort
            // returns the very first documents and none of them is skipped.
            BasicDBObject sort = new BasicDBObject("_id", 1);
            return collection.find(query).sort(sort).limit(pageSize);
        }

        // Number of pages needed for cnt documents: ceiling of cnt / pageSize
        public static long getPageSize(long cnt, int pageSize) {
            return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
        }
    }
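As a quick check of the page-count arithmetic used by `getPageSize`, the sketch below applies the same ceiling division to the figures from this article (100,000,000 documents, 50,000 per page). The class name `PageCountSketch` and the method name `pageCount` are assumptions for illustration.

```java
class PageCountSketch {

    /** Ceiling of total / pageSize, i.e. the number of pages needed to cover `total` rows. */
    static long pageCount(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return total % pageSize == 0 ? total / pageSize : total / pageSize + 1;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(100_000_000L, 50_000)); // exact multiple of the page size
        System.out.println(pageCount(100_000_001L, 50_000)); // one extra row needs one extra page
        System.out.println(pageCount(0L, 50_000));           // empty collection, no pages
    }
}
```

When paging starts from a fixed `_id` rather than from the beginning, fewer pages than this may actually contain data, which is why stopping on an empty page is a safer loop condition than the page count alone.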

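4. Lessons learned

  1. I accidentally left out a $ sign, so the query returned no data, and I wasted some time tracking down the cause. The correct form is:

    query.append("_id",new BasicDBObject("$gt",lastIdObject));

  2. Creating indexes

    Create an ordinary single-field index: db.collection.ensureIndex({field: 1/-1}), where 1 is ascending and -1 is descending. Note that ensureIndex has been deprecated since MongoDB 3.0 in favor of db.collection.createIndex, which takes the same key specification.
    Example: db.articles.ensureIndex({title:1})  // the author found that wrapping field in "" double quotes kept the index from being created
    List the current indexes: db.collection.getIndexes()
    Example: db.articles.getIndexes()
    Drop a single index: db.collection.dropIndex({field: 1/-1})

  3. Execution plan

    db.student.find({"name":"dd1"}).explain()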

References:

[1] https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

[2] https://www.cnblogs.com/yxlblogs/p/4930308.html

[3] https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/

(Editor: 李大同)
