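Go 1.1 Performance Test Report (Within 10% of C)
Go 1.1 was recently released. According to the official announcement, Go 1.1 is typically 30%-40% faster than Go 1.0, sometimes more (and in some cases the improvement is not noticeable).
For details on Go 1.1, see: Introduction to Go 1.1's new features (more complete language and libraries / roughly 30% better performance).
This is a performance comparison between Go 1.1 and C. The focus is on the performance of the language itself, although the results are inevitably affected by standard library performance as well.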
Test environment
- Test program: $GOROOT/test/bench/shootout/timing.sh
- Hardware: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz; 16GB RAM
- Operating system: CentOS6.3 x86_64
Note: the i7-3770 has 4 cores and 8 threads.
gcc and gc versions:
- gcc -v
- gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
- go version
- go version go1.1 linux/amd64
Test results
- $GOROOT/test/bench/shootout/timing.sh
- fasta -n 25000000
- gcc -m64 -O2 fasta.c 0.86u 0.00s 0.87r
- gc fasta 0.85u 0.00s 0.86r
- gc_B fasta 0.83u 0.00s 0.83r
- reverse-complement < output-of-fasta-25000000
- gcc -m64 -O2 reverse-complement.c 0.45u 0.05s 0.50r
- gc reverse-complement 0.60u 0.05s 0.65r
- gc_B reverse-complement 0.55u 0.04s 0.59r
- nbody -n 50000000
- gcc -m64 -O2 nbody.c -lm 5.51u 0.00s 5.52r
- gc nbody 7.16u 0.00s 7.18r
- gc_B nbody 7.12u 0.00s 7.14r
- binary-tree 15 # too slow to use 20
- gcc -m64 -O2 binary-tree.c -lm 0.31u 0.00s 0.31r
- gc binary-tree 1.08u 0.00s 1.07r
- gc binary-tree-freelist 0.15u 0.00s 0.15r
- fannkuch 12
- gcc -m64 -O2 fannkuch.c 26.45u 0.00s 26.54r
- gc fannkuch 35.99u 0.00s 36.08r
- gc fannkuch-parallel 73.40u 0.00s 18.58r
- gc_B fannkuch 25.18u 0.00s 25.25r
- regex-dna 100000
- gcc -m64 -O2 regex-dna.c -lpcre 0.25u 0.00s 0.26r
- gc regex-dna 1.65u 0.00s 1.66r
- gc regex-dna-parallel 1.72u 0.01s 0.67r
- gc_B regex-dna 1.64u 0.00s 1.65r
- spectral-norm 5500
- gcc -m64 -O2 spectral-norm.c -lm 9.63u 0.00s 9.66r
- gc spectral-norm 9.63u 0.00s 9.66r
- gc_B spectral-norm 9.63u 0.00s 9.66r
- k-nucleotide 1000000
- gcc -O2 k-nucleotide.c -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -lglib-2.0 2.62u 0.00s 2.63r
- gc k-nucleotide 2.69u 0.01s 2.71r
- gc k-nucleotide-parallel 3.02u 0.00s 0.97r
- gc_B k-nucleotide 2.66u 0.01s 2.68r
- mandelbrot 16000
- gcc -m64 -O2 mandelbrot.c 20.95u 0.00s 21.01r
- gc mandelbrot 23.73u 0.00s 23.79r
- gc_B mandelbrot 23.72u 0.00s 23.79r
- meteor 2098
- gcc -m64 -O2 meteor-contest.c 0.05u 0.00s 0.05r
- gc meteor-contest 0.06u 0.00s 0.07r
- gc_B meteor-contest 0.06u 0.00s 0.06r
- pidigits 10000
- gcc -m64 -O2 pidigits.c -lgmp 0.77u 0.00s 0.77r
- gc pidigits 1.45u 0.01s 1.44r
- gc_B pidigits 1.45u 0.01s 1.43r
- threadring 50000000
- gcc -m64 -O2 threadring.c -lpthread 12.05u 261.20s 216.36r
- gc threadring 6.61u 0.00s 6.63r
- chameneos 6000000
- gcc -m64 -O2 chameneosredux.c -lpthread 4.04u 21.08s 4.20r
- gc chameneosredux 4.97u 0.00s 4.99r
Notes on the results
The gc_B rows were built with the -B option, which the compiler describes as follows:
- go tool 6g -h
- usage: 6g [options] file.go...
- -+ compiling runtime
- -% debug non-static initializers
- -A for bootstrapping, allow 'any' type
- -B disable bounds checking
- ...
In other words, it presumably disables Go's run-time bounds checks, such as the out-of-range check on slice indexes.
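To make concrete what a bounds check is, here is a minimal sketch. The helper name `at` is hypothetical and exists only for this illustration. In a default build the index expression `s[i]` is checked at run time, and an out-of-range index panics; that check is what a -B build leaves out, so the same access would go unchecked there.

```go
package main

import "fmt"

// at returns s[i]. In a default build the index is bounds-checked at run
// time, so an out-of-range i causes a panic, which is recovered here and
// reported through ok. (Hypothetical helper, for illustration only.)
func at(s []int, i int) (v int, ok bool) {
	defer func() {
		if r := recover(); r != nil {
			v, ok = 0, false
		}
	}()
	return s[i], true
}

func main() {
	s := []int{10, 20, 30}
	fmt.Println(at(s, 1)) // in range
	fmt.Println(at(s, 5)) // out of range: the bounds check fires
}
```

The cost of the check is one comparison per index operation, which is why the gc_B rows differ most on index-heavy loops such as fannkuch.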
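The results show that Go's performance is already very close to C's, and in a few cases Go is even faster than C (binary-tree-freelist at 0.15r versus 0.31r for the C version, and threadring at 6.63r versus 216.36r). Note that the plain gc binary-tree, at 1.07r, is slower than C.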
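To put a number on the gap for any pair of rows, one can divide the Go real time (the field ending in `r`) by the C real time. The following is a rough sketch of that arithmetic; the function names `realSeconds` and `slowdown` are made up for this example and are not part of timing.sh.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// realSeconds extracts the wall-clock time from a result line such as
// "gc fasta 0.85u 0.00s 0.86r": the last field, with its "r" suffix removed.
func realSeconds(line string) (float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty line")
	}
	last := fields[len(fields)-1]
	if !strings.HasSuffix(last, "r") {
		return 0, fmt.Errorf("no real-time field in %q", line)
	}
	return strconv.ParseFloat(strings.TrimSuffix(last, "r"), 64)
}

// slowdown returns the Go real time divided by the C real time.
// A value above 1 means Go was slower; below 1 means Go was faster.
func slowdown(cLine, goLine string) (float64, error) {
	c, err := realSeconds(cLine)
	if err != nil {
		return 0, err
	}
	g, err := realSeconds(goLine)
	if err != nil {
		return 0, err
	}
	if c == 0 {
		return 0, fmt.Errorf("C time is zero in %q", cLine)
	}
	return g / c, nil
}

func main() {
	// Row pairs copied from the results listed above.
	pairs := [][2]string{
		{"gcc -m64 -O2 nbody.c -lm 5.51u 0.00s 5.52r", "gc nbody 7.16u 0.00s 7.18r"},
		{"gcc -m64 -O2 mandelbrot.c 20.95u 0.00s 21.01r", "gc mandelbrot 23.73u 0.00s 23.79r"},
		{"gcc -m64 -O2 binary-tree.c -lm 0.31u 0.00s 0.31r", "gc binary-tree-freelist 0.15u 0.00s 0.15r"},
	}
	for _, p := range pairs {
		r, err := slowdown(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-45s %.2fx\n", p[1], r)
	}
}
```

Real time is used rather than user time because the parallel variants spend more CPU time in total while finishing sooner.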
According to the data in $GOROOT/test/bench/shootout/timing.log, gccgo's optimizer appears to do somewhat better.
However, gccgo's standard library may be somewhat slower than gc's, which is why gccgo trails gc on some of the tests.
gccgo is not installed on my machine, so there are only three sets of results: gcc, gc, and gc_B.
Differences from the BenchmarksGame results
http://benchmarksgame.alioth.debian.org/u64q/go.php
In the BenchmarksGame results, Go performs poorly on several benchmarks:
- Benchmark Time Memory Code
- fasta 3× 3× ±
- spectral-norm 4× 3× ±
- binary-trees 13× 4× ±
- regex-dna † 26× ± 1/4
The C versions of spectral-norm and binary-trees both use #pragma omp (OpenMP) parallelization, so this is no longer a comparison at the level of the C language itself.
Go's binary-trees, for its part, starts many goroutines; comparing a concurrent Go version against a non-concurrent C version would be equally unfair.
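As an illustration of why concurrent and non-concurrent versions should not be compared directly, the sketch below splits a simple summation across goroutines. It is not taken from any of the benchmark programs; `parallelSum` is a hypothetical helper. A program structured like this can burn more total CPU time (the `u` column) while finishing in less wall-clock time (the `r` column), as the fannkuch-parallel row above shows.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits xs into at most `workers` contiguous chunks and sums
// each chunk in its own goroutine. Each goroutine writes only to its own
// slot of partial, so no locking is needed.
func parallelSum(xs []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers)
	chunk := (len(xs) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(xs) {
			break
		}
		hi := lo + chunk
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, x := range xs[lo:hi] {
				partial[w] += x
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i + 1
	}
	// Go 1.1 ran goroutines on a single OS thread unless GOMAXPROCS was
	// raised; newer releases use all CPUs by default.
	fmt.Println(parallelSum(xs, runtime.NumCPU()))
}
```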
The regex-dna test is mainly a comparison between Go's regexp standard library and C's highly optimized pcre. Go's regexp library still needs further optimization.
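For readers unfamiliar with this kind of workload, the following sketch counts the matches of an alternation pattern in a DNA-like string using Go's regexp package. The helper `countMatches`, the pattern, and the input string are chosen for illustration only and are not copied from regex-dna.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches returns the number of non-overlapping matches of pattern
// in seq. It panics if pattern is not a valid regular expression.
func countMatches(pattern, seq string) int {
	re := regexp.MustCompile(pattern)
	return len(re.FindAllStringIndex(seq, -1))
}

func main() {
	// Illustrative input: one occurrence of each alternative.
	seq := "ccagggtaaaggtttaccctaa"
	fmt.Println(countMatches("agggtaaa|tttaccct", seq))
}
```

Go's regexp package guarantees matching time linear in the input size, a different design trade-off from a backtracking engine such as pcre.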
As for the other programs, their implementations all differ to some degree from the ones in $GOROOT/test/bench/shootout.
Official test conclusions
http://go.googlecode.com/hg/test/bench/shootout/timing.log:
- # Sep 26, 2012
- # 64-bit ints, plus significantly better floating-point code.
- # Interesting details:
- # Generally something in the 0-10% slower range, some (binary tree) more
- # Floating-point noticeably faster:
- # nbody -25%
- # mandelbrot -37% relative to Go 1.
- # Other:
- # regex-dna +47%
Go is now within 10% of C, and in some special scenarios it performs even better.