php – How to properly iterate over a large JSON file
Dear Stack Overflow community,
I have a 34 GB JSON file with a lot of data in it. I tried to import it into my MongoDB with `mongoimport --file file.json`, but it failed because the file is too large and threw an out-of-memory system error. Is it possible to iterate over the file with a cursor using PHP code? I have no experience with this, but someone told me it's possible. I would like to know how the file is structured, but I don't know how to view a sample array from it. From the source I can get this sample array:

```
{
    "_id": ObjectId("53b29644aafd413977b23b7e"),
    "summonerId": NumberLong(24570940),
    "region": "euw",
    "updatedAt": NumberLong(1404212804),
    "season": NumberLong(4),
    "stats": {
        "110": {
            "totalSessionsPlayed": NumberLong(3),
            "totalSessionsLost": NumberLong(2),
            "totalSessionsWon": NumberLong(1),
            "totalChampionKills": NumberLong(34),
            "totalDamageDealt": NumberLong(415051),
            "totalDamageTaken": NumberLong(63237),
            "mostChampionKillsPerSession": NumberLong(12),
            "totalMinionKills": NumberLong(538),
            "totalDoubleKills": NumberLong(5),
            "totalTripleKills": NumberLong(1),
            "totalDeathsPerSession": NumberLong(18),
            "totalGoldEarned": NumberLong(40977),
            "totalTurretsKilled": NumberLong(6),
            "totalPhysicalDamageDealt": NumberLong(381668),
            "totalMagicDamageDealt": NumberLong(31340),
            "totalAssists": NumberLong(25),
            "maxChampionsKilled": NumberLong(12),
            "maxNumDeaths": NumberLong(10)
        }
    }
}
```

The stats field contains more of these arrays; 110 is just an example. Once again: each summonerId has a list of champions he has played over his career. The champion is (in this example) 110. Each summoner can have multiple champions, and I want to get all of the champions and how many times each one was played (totalSessionsPlayed).
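Put concretely, the goal for each summoner document is a map from champion ID to `totalSessionsPlayed`. Below is a minimal sketch. It assumes a single document has already been decoded with `json_decode($json, true)`. It also assumes the `ObjectId(...)`/`NumberLong(...)` shell wrappers, which are not valid JSON, have been replaced by plain values. `championSessions` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: map championId => totalSessionsPlayed for one summoner document.
// Assumes the document was decoded with json_decode($json, true), so nested objects are arrays.
function championSessions(array $doc): array
{
    $result = [];
    foreach ($doc['stats'] ?? [] as $championId => $stats) {
        // Skip champion entries that lack the counter.
        if (is_array($stats) && isset($stats['totalSessionsPlayed'])) {
            $result[$championId] = (int) $stats['totalSessionsPlayed'];
        }
    }
    return $result;
}

// Trimmed version of the sample document, with plain numbers instead of NumberLong(...).
$json = '{"summonerId": 24570940, "region": "euw", "stats": {"110": {"totalSessionsPlayed": 3, "totalSessionsWon": 1}}}';
$doc = json_decode($json, true);

foreach (championSessions($doc) as $championId => $sessions) {
    echo "{$doc['summonerId']},$championId,$sessions\n";
}
```

Note that PHP turns numeric string keys such as `"110"` into integer array keys, so the champion IDs come back as integers.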
You'll need to use a streaming parser. These pull only a small part of the file into memory at a time.
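One special case needs no extra library. If the file holds exactly one JSON document per line, reading it with `fgets` and decoding each line keeps only one document in memory at a time. The question does not confirm that this file has that layout, so treat it as an assumption. The sketch below shows that line-by-line technique; it is not a general streaming parser. `readJsonLines` is a hypothetical name.

```php
<?php
// Yield one decoded document per line, so only the current line is held in memory.
// Assumes newline-delimited JSON; a single huge top-level array will NOT work here.
function readJsonLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // tolerate blank lines
            }
            $doc = json_decode($line, true);
            if (json_last_error() !== JSON_ERROR_NONE) {
                throw new RuntimeException('Bad JSON line: ' . json_last_error_msg());
            }
            yield $doc;
        }
    } finally {
        fclose($handle); // runs when the loop finishes or the generator is destroyed
    }
}

// Demo: a small temporary file stands in for the real export.
$path = tempnam(sys_get_temp_dir(), 'ndjson');
file_put_contents(
    $path,
    '{"summonerId": 24570940, "stats": {"110": {"totalSessionsPlayed": 3}}}' . "\n"
    . '{"summonerId": 555555, "stats": {"42": {"totalSessionsPlayed": 65}}}' . "\n"
);

$ids = [];
foreach (readJsonLines($path) as $doc) {
    $ids[] = $doc['summonerId'];
}
unlink($path);
print_r($ids);
```

If the file is one giant array rather than one document per line, you need an event-based or pull-based parser, as described next.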
They come in a couple of different flavors: SAX-like push parsers and pull parsers. "XML reader models: SAX versus XML pull parser" outlines the differences.

Push parser

Here's a quick example using salsify/json-streaming-parser. As it walks through the file, we keep track of the summonerId, the championId, and some state. It's all event based: a sequential parser gives you no random access, so you have to keep track of things yourself. Whenever totalSessionsPlayed comes up, it echoes the summonerId, championId and totalSessionsPlayed.

data.json

Here's a pared-down JSON file for demonstration purposes.

```json
[
    {
        "_id": "53b29644aafd413977b23b7e",
        "summonerId": 24570940,
        "stats": {
            "110": {
                "totalSessionsPlayed": 3,
                "totalSessionsLost": 2,
                "totalSessionsWon": 1
            },
            "112": {
                "totalSessionsPlayed": 45,
                "totalSessionsWon": 1
            }
        }
    },
    {
        "_id": "asdfasdfasdf",
        "summonerId": 555555,
        "stats": {
            "42": {
                "totalSessionsPlayed": 65
            },
            "88": {
                "totalSessionsPlayed": 99,
                "totalSessionsWon": 1
            }
        }
    }
]
```

Example:

```php
// Assumes the library was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

class ListMatchUps extends \JsonStreamingParser\Listener\IdleListener
{
    private $key;
    private $summonerId;
    private $championId;
    private $inStats;

    public function start_document()
    {
        $this->key = null;
        $this->summonerId = null;
        $this->championId = null;
        $this->inStats = false;
    }

    public function start_object()
    {
        if ($this->key === 'stats') {
            // Entering the current summoner's "stats" object.
            $this->inStats = true;
        } else if ($this->inStats) {
            // Each object directly inside "stats" is keyed by a champion ID.
            $this->championId = $this->key;
        }
    }

    public function end_object()
    {
        if ($this->championId !== null) {
            $this->championId = null;   // leaving a champion's object
        } else if ($this->inStats) {
            $this->inStats = false;     // leaving "stats"
        } else {
            $this->summonerId = null;   // leaving a summoner document
        }
    }

    public function key($key)
    {
        $this->key = $key;
    }

    public function value($value)
    {
        switch ($this->key) {
            case 'summonerId':
                $this->summonerId = $value;
                break;
            case 'totalSessionsPlayed':
                echo "{$this->summonerId},{$this->championId},$value\n";
                break;
        }
    }
}

$stream = fopen('data.json', 'r');
$listener = new ListMatchUps();
try {
    $parser = new JsonStreamingParser_Parser($stream, $listener);
    $parser->parse();
} finally {
    // Close the handle whether parsing succeeded or threw.
    fclose($stream);
}
```

Output:

```
24570940,110,3
24570940,112,45
555555,42,65
555555,88,99
```

Pull parser

Here's one using a parser I wrote recently, pcrov/jsonreader (requires PHP 7). It uses the same data.json as above.
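Example:

```php
use pcrov\JsonReader\JsonReader;

// Assumes the library was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

$reader = new JsonReader();
$reader->open("data.json");

// Advance to each "summonerId" node in turn.
while ($reader->read("summonerId")) {
    $summonerId = $reader->value();
    // Skip forward to this summoner's "stats" member; value() returns it as a PHP array.
    $reader->next("stats");
    foreach ($reader->value() as $championId => $stats) {
        echo "$summonerId,$championId,{$stats['totalSessionsPlayed']}\n";
    }
}

$reader->close();
```

Output:

```
24570940,110,3
24570940,112,45
555555,42,65
555555,88,99
```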
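Both examples produce one `summonerId,championId,totalSessionsPlayed` row at a time. If what you want is a total per champion across all summoners, you can fold those rows into an array keyed by champion ID as they arrive. Memory then grows with the number of distinct champions rather than with the file size. The sketch below (PHP 7.1+ for `iterable` and array destructuring) uses the rows from the output above plus one hypothetical extra row, so that a champion is actually summed across summoners. `sumSessionsByChampion` is a hypothetical helper.

```php
<?php
// Fold streamed [summonerId, championId, totalSessionsPlayed] rows into per-champion totals.
// Accepts any iterable, so a generator fed by a streaming parser works the same as an array.
function sumSessionsByChampion(iterable $rows): array
{
    $totals = [];
    foreach ($rows as [, $championId, $sessions]) {
        $totals[$championId] = ($totals[$championId] ?? 0) + $sessions;
    }
    ksort($totals); // order by champion ID for readable output
    return $totals;
}

$rows = [
    [24570940, '110', 3],
    [24570940, '112', 45],
    [555555, '42', 65],
    [555555, '88', 99],
    [777, '110', 7], // hypothetical extra row: champion 110 played by a second summoner
];

print_r(sumSessionsByChampion($rows));
```

Keeping the fold separate from the parser means the same function works whether the rows come from the push listener, the pull loop, or an array in a test.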