
Java – Reading a Large File Efficiently (reposted)

Published: 2020-12-14 06:20:58 | Category: Java | Source: web reprint

Original article: http://www.baeldung.com/java-read-lines-large-file

1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the "Java – Back to Basics" tutorial here on Baeldung.

2. Reading In Memory

The standard way of reading the lines of the file is in-memory – both Guava and Apache Commons IO provide a quick way to do just that:
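The snippet from the original article is missing in this reprint. Since the Guava and Commons IO helpers (`Files.readLines` / `FileUtils.readLines`) require third-party jars, here is a minimal, runnable sketch of the same in-memory approach using the JDK's own `Files.readAllLines`, which behaves the same way: the entire file ends up in a `List` in memory. The temp file and its contents are illustrative, not from the original.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Main {
    public static void main(String[] args) throws IOException {
        // Small sample file standing in for the large file discussed in the article
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, List.of("line 1", "line 2", "line 3"));

        // Reads every line into a List at once -- the whole file is held in memory
        List<String> lines = Files.readAllLines(file);
        System.out.println(lines.size()); // prints 3

        Files.delete(file);
    }
}
```

With Guava the equivalent call is `Files.readLines(file, Charsets.UTF_8)`, and with Commons IO it is `FileUtils.readLines(file)`; all three return a fully materialized list of lines.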

The problem with this approach is that all the file lines are kept in memory – which will quickly lead to an OutOfMemoryError if the file is large enough.

For example – reading a ~1 Gb file:
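The measurement code is also missing in this reprint. The original article logs the JVM's used heap before and after the read; the sketch below reproduces that idea with `Runtime` counters. The file here is tiny, so the reported delta will be small – the point is the measurement technique, not the numbers quoted in the article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Main {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("big", ".txt");
        Files.write(file, List.of("a", "b"));

        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();

        // In-memory read: the whole file is materialized as a List of lines
        List<String> lines = Files.readAllLines(file);

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap delta: ~" + (after - before) / (1024 * 1024) + " Mb");
        System.out.println("lines kept: " + lines.size());

        Files.delete(file);
    }
}
```

Run against a genuinely large file, the second snapshot grows by roughly the size of the file, as the article's ~2 Gb figure shows.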

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

Which means that about 2.1 Gb of memory are consumed by the process – the reason is simple – the lines of the file are all being stored in memory now.

It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory – regardless of how much that actually is.

What's more, we usually don't need all of the lines of the file in memory at once – instead, we just need to be able to iterate through each one, do some processing, and throw it away. So this is exactly what we're going to do – iterate through the lines without holding them in memory.

3. Streaming Through the File

Let's now look at a solution – we're going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
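The Scanner snippet was lost in this reprint; here is a runnable sketch of the technique the paragraph describes. Only the current line is ever held in memory, and try-with-resources guarantees the underlying stream is closed. The temp file and the line-counting "processing" step are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("stream", ".txt");
        Files.write(file, List.of("alpha", "beta", "gamma"));

        int count = 0;
        // try-with-resources closes the Scanner (and the file) even on error
        try (Scanner scanner = new Scanner(file.toFile())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine(); // only one line held at a time
                // process the line, then let it go out of scope
                if (!line.isEmpty()) {
                    count++;
                }
            }
        }
        System.out.println(count); // prints 3

        Files.delete(file);
    }
}
```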

This solution will iterate through all the lines in the file – allowing for processing of each line – without keeping references to them – and in conclusion, without keeping them in memory: (~150 Mb consumed)

4. Streaming with Apache Commons IO

The same can be achieved using the Commons IO library as well, by using the custom LineIterator provided by the library:
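Again, the snippet did not survive the reprint. Commons IO's API here is `FileUtils.lineIterator(file, "UTF-8")`, which returns a `LineIterator` that you walk with `hasNext()`/`nextLine()` and must close when done. Since commons-io may not be on your classpath, the runnable sketch below shows the same streaming pattern with the JDK's `BufferedReader` as a stand-in; the temp file contents are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Main {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("iter", ".txt");
        Files.write(file, List.of("one", "two", "three", "four"));

        long count = 0;
        // Streams the file line by line -- equivalent in spirit to
        // Commons IO's LineIterator, but using only the JDK
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                count++; // process the line, then discard it
            }
        }
        System.out.println(count); // prints 4

        Files.delete(file);
    }
}
```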

Since the entire file is not fully in memory – this will also result in pretty conservative memory consumption numbers: (~150 Mb consumed)

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory – which proves quite useful when working with these large files.

The implementation of all these examples and code snippets can be found in the accompanying project (link lost in this reprint) – it is an Eclipse-based project, so it should be easy to import and run as it is.

(Editor: 李大同)
