Adding short strings to a Set in Ruby is slow
Published: 2020-12-17 03:14:29 | Category: Encyclopedia | Source: compiled from the web
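I'm trying to use this Ruby code to extract all the unique characters from my UTF-8 French dictionary file. The dictionary is 3.7 MB. For some reason it takes my decent computer about half an hour to run. Any ideas?

    c = Set.new
    f = open "dict"
    s = f.read
    f.close
    for i in 0..s.length-1
      c << s[i]
    end

Solution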
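Reading the whole file in at once before doing any computation on it prevents IO from being interleaved with the computation. It also increases memory pressure (which can matter if you are close to the limits of your memory) and greatly reduces cache locality.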
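I wrote the following small script, which ran in 0.3 seconds on my /usr/share/dict/words file. That file is under a megabyte, but still big enough to be somewhat interesting:

    $ cat /tmp/set.rb
    #!/usr/bin/ruby

    require 'set'

    c = Set.new
    f = open "/usr/share/dict/words"
    f.each_char do |char|
      c << char
    end
    p c
    $ time /tmp/set.rb
    #<Set: {"A", "\n", "'", "s", "B", "M", "C", "T", "H", "I", "D", "S", "O", "L", "P", "W", "Z", "a", "c", "h", "e", "n", "l", "i", "y", "r", "o", "b", "d", "t", "u", "j", "g", "m", "p", "v", "x", "f", "k", "z", "w", "q", "ó", "ü", "á", "?", "?", "E", "F", "R", "U", "N", "G", "K", "é", "?", "Q", "è", "V", "J", "X", "?", "?", "í", "Y", "a", "?", "ê", "?", "?"}>

    real 0m0.341s
    user 0m0.340s
    sys  0m0.000s

Your program was still running after a minute, at which point I gave up.

The main difference is that I use the built-in iterator, which reads a small portion of the file (probably 4k-16k) into a buffer and hands me one character on each iteration. This reuses the same small amount of memory over and over, so the data being worked on fits in the CPU's relatively small cache.

Edit

With a small test case, I was able to isolate the speed difference mostly to each_char versus string subscripting. Jörg points out that string subscripting is an O(N) operation: UTF-8 characters occupy a variable number of bytes, so the string cannot simply be indexed by multiplication as one might expect, and finding the Nth character means scanning from the beginning. Your approach is therefore O(N^2) while mine is only O(N), which explains the performance difference far better. I'm finally satisfied that we have found the core cause.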
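To make the each_char versus subscripting comparison concrete, here is a minimal sketch, assuming Ruby 1.9 or later (where s[i] returns a one-character string). The helper names unique_chars_by_index and unique_chars_by_each_char and the sample text are made up for illustration; they are not from the original answer, and the timings will vary by machine and Ruby version.

```ruby
require 'set'
require 'benchmark'

# Hypothetical helper: index-based loop, as in the question.
# On a UTF-8 string containing multibyte characters, each text[i]
# may have to scan from the start to locate the i-th character.
def unique_chars_by_index(text)
  chars = Set.new
  (0...text.length).each { |i| chars << text[i] }
  chars
end

# Hypothetical helper: a single pass using the built-in iterator.
def unique_chars_by_each_char(text)
  chars = Set.new
  text.each_char { |ch| chars << ch }
  chars
end

# Made-up sample text; a real dictionary file would be much larger.
sample = "éàüöç abc\n" * 2_000

Benchmark.bm(10) do |bm|
  bm.report('index')     { unique_chars_by_index(sample) }
  bm.report('each_char') { unique_chars_by_each_char(sample) }
end
```

Both helpers should return the same Set; only the way they walk the string differs. Increasing the repeat count on sample should make the gap between the two timings grow if subscripting is indeed linear in the index.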