【Go】strings.Replace 与 bytes.Replace 调优
原文链接:https://blog.thinkeridea.com/201902/go/replcae_you_hua.html 标准库中函数大多数情况下更通用,性能并非最好的,还是不能过于迷信标准库,最近又有了新发现, 我的服务中有部分代码使用
本博文中使用函数均在 go-extend 中,优化后的函数在 exbytes.Replace 中。 发现问题近期使用 go tool pprof allocs File: xxx Type: alloc_space Time: Feb 1,2019 at 9:53pm (CST) Entering interactive mode (type "help" for commands,"o" for options) (pprof) top Showing nodes accounting for 617.29GB,48.86% of 1263.51GB total Dropped 778 nodes (cum <= 6.32GB) Showing top 10 nodes out of 157 flat flat% sum% cum cum% 138.28GB 10.94% 10.94% 138.28GB 10.94% logrus.(*Entry).WithFields 95.27GB 7.54% 18.48% 95.27GB 7.54% strings.Replace 67.05GB 5.31% 23.79% 185.09GB 14.65% v3.(*v3Adapter).parseEncrypt 57.01GB 4.51% 28.30% 57.01GB 4.51% bufio.NewWriterSize 56.63GB 4.48% 32.78% 56.63GB 4.48% bufio.NewReaderSize 56.11GB 4.44% 37.23% 56.11GB 4.44% net/url.unescape 39.75GB 3.15% 40.37% 39.75GB 3.15% regexp.(*bitState).reset 36.11GB 2.86% 43.23% 38.05GB 3.01% des3_and_base64.(*des3AndBase64).des3Decrypt 36.01GB 2.85% 46.08% 36.01GB 2.85% des3_and_base64.(*des3AndBase64).base64Decode 35.08GB 2.78% 48.86% 35.08GB 2.78% math/big.nat.make 标准库中最常用的函数,居然……,不可忍必须优化,先使用 (pprof) list strings.Replace Total: 1.23TB ROUTINE ======================== strings.Replace in /usr/local/go/src/strings/strings.go 95.27GB 95.27GB (flat,cum) 7.54% of Total . . 858: } else if n < 0 || m < n { . . 859: n = m . . 860: } . . 861: . . 862: // Apply replacements to buffer. 47.46GB 47.46GB 863: t := make([]byte,len(s)+n*(len(new)-len(old))) . . 864: w := 0 . . 865: start := 0 . . 866: for i := 0; i < n; i++ { . . 867: j := start . . 868: if len(old) == 0 { . . 869: if i > 0 { . . 870: _,wid := utf8.DecodeRuneInString(s[start:]) . . 871: j += wid . . 872: } . . 873: } else { . . 874: j += Index(s[start:],old) . . 875: } . . 876: w += copy(t[w:],s[start:j]) . . 877: w += copy(t[w:],new) . . 878: start = j + len(old) . . 879: } . . 880: w += copy(t[w:],s[start:]) 47.81GB 47.81GB 881: return string(t[0:w]) . . 882:} 从源码发现首先创建了一个 既然类型转换这里有点浪费空间,有没有办法可以零成本转换呢,那就使用 go-extend 这个包里面的 我记得 这样确实减少了一倍的内存分配,即使只有 分析代码使用火焰图看看究竟什么函数在调用 这里主要是两个方法在使用,当然我记得还有几个地方有使用,看来不在火焰图中应该影响比较低 ,看一下代码吧(简化的代码不一定完全合理): // 第一部分 func (v2 *v2Adapter) parse(s string) (*AdRequest,error) { s = strings.Replace(s," ","",-1) requestJSON,err := v2.paramCrypto.Decrypt([]byte(s)) if err != nil { return nil,err } request := v2.getDefaultAdRequest() if err := request.UnmarshalJSON(requestJSON); err != nil { return nil,err } return request,nil } // 第二部分 func (v3 *v3Adapter) parseEncrypt(s []byte) ([]byte,error) { ss := strings.Replace(string(s),err := v3.paramCrypto.Decrypt([]byte(ss)) if err != nil { return nil,error } return requestJSON,nil } // 通过搜索找到的第三部分 type LogItems []string func LogItemsToBytes(items []string,sep,newline string) []byte { for i := range items { items[i] = strings.Replace(items[i],-1) } str := strings.Replace(strings.Join(items,sep),newline,-1) return []byte(str + newline) } 通过分析我们发现前两个主要是为了删除一个字符串,第三个是为了把一个字符串替换为另一个字符串,并且源数据的生命周期很短暂,在执行替换之后就不再使用了,能不能原地替换字符串呢,原地替换的就会变成零分配了,尝试一下吧。 优化先写一个函数简单实现原地替换,输入的 func Replace(s,old,new []byte,n int) []byte { if n == 0 { return s } if len(old) < len(new) { return bytes.Replace(s,new,n) } if n < 0 { n = len(s) } var wid,i,j int for i,j = 0,0; i < len(s) && j < n; j++ { wid = bytes.Index(s[i:],old) if wid < 0 { break } i += wid i += copy(s[i:],new) s = append(s[:i],s[i+len(old)-len(new):]...) } return s } 写个性能测试看一下效果: $ go test -bench="." -run=nil -benchmem goos: darwin goarch: amd64 pkg: github.com/thinkeridea/go-extend/exbytes/benchmark BenchmarkReplace-8 500000 3139 ns/op 416 B/op 1 allocs/op BenchmarkBytesReplace-8 1000000 2032 ns/op 736 B/op 2 allocs/op 使用这个新的函数和 找到问题就好了,移动 func Replace(s,j,w int for i,old) if wid < 0 { break } w += copy(s[w:],s[i:i+wid]) w += copy(s[w:],new) i += wid + len(old) } w += copy(s[w:],s[i:]) return s[0:w] } 在运行一下性能测试吧: $ go test -bench="." -run=nil -benchmem goos: darwin goarch: amd64 pkg: github.com/thinkeridea/go-extend/exbytes/benchmark BenchmarkReplace-8 1000000 2149 ns/op 416 B/op 1 allocs/op BenchmarkBytesReplace-8 1000000 2231 ns/op 736 B/op 2 allocs/op 运行性能差不多,而且更好了,内存分配也减少,不是说是零分配吗,为啥有一次分配呢? var replaces string var replaceb []byte func init() { replaces = strings.Repeat("A BC",100) replaceb = bytes.Repeat([]byte("A BC"),100) } func BenchmarkReplace(b *testing.B) { for i := 0; i < b.N; i++ { exbytes.Replace([]byte(replaces),[]byte(" "),[]byte(""),-1) } } func BenchmarkBytesReplace(b *testing.B) { for i := 0; i < b.N; i++ { bytes.Replace([]byte(replaces),-1) } } 可以看到使用了 之前说写一个优化 $ go test -bench="." -run=nil -benchmem goos: darwin goarch: amd64 pkg: github.com/thinkeridea/go-extend/exbytes/benchmark BenchmarkReplace-8 1000000 2149 ns/op 416 B/op 1 allocs/op BenchmarkBytesReplace-8 1000000 2231 ns/op 736 B/op 2 allocs/op BenchmarkStringsReplace-8 1000000 2260 ns/op 1056 B/op 3 allocs/op BenchmarkUnsafeStringsReplace-8 1000000 2522 ns/op 736 B/op 2 allocs/op PASS ok github.com/thinkeridea/go-extend/exbytes/benchmark 10.260s 运行效率上都相当,优化之后的 修改代码有了优化版的 // 第一部分 func (v2 *v2Adapter) parse(s string) (*AdRequest,error) { b := exbytes.Replace(exstrings.UnsafeToBytes(s),err := v2.paramCrypto.Decrypt(b) if err != nil { return nil,err } request := v2.getDefaultAdRequest() if err := request.UnmarshalJSON(requestJSON); err != nil { return nil,err } return request,error) { s = exbytes.Replace(s,err := v3.paramCrypto.Decrypt(s) if err != nil { return nil,err } return requestJSON,nil } // 第三部分 type LogItems []string func LogItemsToBytes(items []string,newline string) []byte { for i := range items { items[i] = exbytes.ToString(exbytes.Replace(exstrings.UnsafeToBytes(items[i]),[]byte(sep),-1)) } b := exbytes.Replace(exstrings.UnsafeToBytes(strings.Join(items,sep)),[]byte(newline),-1) return append(b,newline...) } 上线后性能分析$ go tool pprof allocs2 File: xx Type: alloc_space Time: Feb 2,2019 at 5:33pm (CST) Entering interactive mode (type "help" for commands,"o" for options) (pprof) top exbytes.Replace Focus expression matched no samples Active filters: focus=exbytes.Replace Showing nodes accounting for 0,0% of 864.21GB total flat flat% sum% cum cum% (pprof) 居然在 优化前 $ go tool pprof profile File: xx Type: cpu Time: Feb 1,2019 at 9:54pm (CST) Duration: 30.08s,Total samples = 12.23s (40.65%) Entering interactive mode (type "help" for commands,"o" for options) (pprof) top strings.Replace Active filters: focus=strings.Replace Showing nodes accounting for 0.08s,0.65% of 12.23s total Showing top 10 nodes out of 27 flat flat% sum% cum cum% 0.03s 0.25% 0.25% 0.08s 0.65% strings.Replace 0.02s 0.16% 0.41% 0.02s 0.16% countbody 0.01s 0.082% 0.49% 0.01s 0.082% indexbytebody 0.01s 0.082% 0.57% 0.01s 0.082% memeqbody 0.01s 0.082% 0.65% 0.01s 0.082% runtime.scanobject 优化后 $ go tool pprof profile2 File: xx Type: cpu Time: Feb 2,2019 at 5:33pm (CST) Duration: 30.16s,Total samples = 14.68s (48.68%) Entering interactive mode (type "help" for commands,"o" for options) (pprof) top exbytes.Replace Active filters: focus=exbytes.Replace Showing nodes accounting for 0.06s,0.41% of 14.68s total Showing top 10 nodes out of 18 flat flat% sum% cum cum% 0.03s 0.2% 0.2% 0.03s 0.2% indexbytebody 0.02s 0.14% 0.34% 0.05s 0.34% bytes.Index 0.01s 0.068% 0.41% 0.06s 0.41% github.com/thinkeridea/go-extend/exbytes.Replace 通过 本博文中使用函数均在 go-extend 中,优化后的函数在 exbytes.Replace 中。 转载: 本文作者: 戚银(thinkeridea) 本文链接: https://blog.thinkeridea.com/201902/go/replcae_you_hua.html 版权声明: 本博客所有文章除特别声明外,均采用 CC BY 4.0 CN协议 许可协议。转载请注明出处! (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |