Statistically compare benchmark results #21


Open

opened 2023-03-01 13:56:07 +00:00 by fyrchik · 1 comment

fyrchik commented

2023-03-01 13:56:07 +00:00

(Migrated from github.com)

Comparing min/avg and similar values is useful but can be misleading.
I propose implementing a separate script for comparing k6 summaries (extending them if needed), similar to [benchstat](https://godocs.io/golang.org/x/perf/cmd/benchstat).
Basically, it should be obvious to a performance engineer what improvement a code change produces.
As an example, here is benchstat output:

$ benchstat old.txt new.txt
name        old time/op  new time/op  delta
GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)

We can see the deviation from the mean, as well as the fact that the change in the second benchmark is statistically insignificant.
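To make the idea concrete, here is a minimal sketch of the kind of significance check such a script could run. It assumes the Mann-Whitney U test (the default delta test in the benchstat version shown above) and computes an exact two-sided p-value by enumerating all relabelings, which is only practical for a handful of runs per side. The function names `mann_whitney_exact_p` and `compare` and the output layout are hypothetical, not part of k6 or benchstat.

```python
from itertools import combinations
from statistics import mean


def mann_whitney_exact_p(old, new):
    """Two-sided exact p-value of the Mann-Whitney U test (small samples only)."""
    pooled = list(old) + list(new)
    n_old = len(old)

    def u_stat(idx):
        # U counts pairs (a, b) with a from the chosen group greater than b
        # from the rest; ties count as one half.
        chosen = [pooled[i] for i in idx]
        rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
        return sum((a > b) + 0.5 * (a == b) for a in chosen for b in rest)

    center = n_old * len(new) / 2  # expected U under the null hypothesis
    observed = abs(u_stat(tuple(range(n_old))) - center)
    # Enumerate every way to relabel the pooled samples as "old".
    splits = list(combinations(range(len(pooled)), n_old))
    extreme = sum(abs(u_stat(s) - center) >= observed for s in splits)
    return extreme / len(splits)


def compare(old, new, alpha=0.05):
    """Format one benchstat-like line; the delta is hidden when not significant."""
    p = mann_whitney_exact_p(old, new)
    delta = (mean(new) - mean(old)) / mean(old) * 100
    shown = f"{delta:+.2f}%" if p < alpha else "~"
    return (f"{mean(old):.1f} -> {mean(new):.1f}  {shown}  "
            f"(p={p:.3f} n={len(old)}+{len(new)})")
```

With 4 old and 5 new runs there are 126 possible relabelings, so the smallest reachable two-sided p-value is 2/126, which matches the `p=0.016` in the benchstat output above. For larger sample counts a normal approximation or a library routine would be needed instead of full enumeration.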

The only difficulty I see is that we might need to store the results of all operations in the benchmark, not just the aggregates. That is still feasible.
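One possible way to get per-operation results, sketched here under assumptions: if the run is started with k6's `--out json=<file>` option, k6 writes one JSON object per line, and records with `"type": "Point"` carry individual sample values for a metric. The helper below groups those values by metric name. The name `load_samples` and the file names in the usage note are hypothetical, and whether this output is acceptable for long runs (file size) is not evaluated here.

```python
import json
from collections import defaultdict


def load_samples(lines, metrics=None):
    """Group raw sample values by metric name from k6 JSON output lines."""
    samples = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Only "Point" records carry individual sample values;
        # "Metric" records merely declare a metric.
        if record.get("type") != "Point":
            continue
        name = record.get("metric")
        if metrics is not None and name not in metrics:
            continue
        samples[name].append(record["data"]["value"])
    return dict(samples)
```

Usage would look like `with open("old.json") as f: old = load_samples(f)`, after which the per-metric lists from two runs can be fed to a statistical test.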

I believe automated regression tests could also use such a feature.

cc @anikeev-yadro @jingerbread


anikeev-yadro commented

2023-03-01 15:07:05 +00:00

(Migrated from github.com)

FYI @dansingjulia