Statistically compare benchmark results #21

Open
opened 2023-03-01 13:56:07 +00:00 by fyrchik · 1 comment
fyrchik commented 2023-03-01 13:56:07 +00:00 (Migrated from github.com)

Comparing min/avg and similar aggregate values is useful but can be misleading: a difference between two averages may be nothing more than noise.
I propose implementing a separate script for comparing k6 summaries (extending them if needed), similar to benchstat (https://godocs.io/golang.org/x/perf/cmd/benchstat).
Basically, it should be obvious to a performance engineer what improvement a code change produces.
As an example, here is benchstat output:

$ benchstat old.txt new.txt
name        old time/op  new time/op  delta
GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)

We can see the deviation from the mean for each benchmark, as well as the fact that the change in the second benchmark is statistically insignificant (shown as ~).
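To make the idea concrete, here is a minimal sketch of how such a comparison script could decide whether a change is significant. It uses a two-sided permutation test on the difference of means, which is a swapped-in technique chosen because it needs only the Python standard library; it is not necessarily the test benchstat applies. The function names, the 0.05 threshold, and the row layout are assumptions made for illustration.

```python
import itertools
import math
import random
import statistics


def permutation_p_value(old, new, max_exact=200_000, rounds=20_000, seed=0):
    """Two-sided permutation test on the difference of sample means."""
    pooled = list(old) + list(new)
    n_old, n_new = len(old), len(new)
    total = sum(pooled)
    observed = abs(statistics.fmean(new) - statistics.fmean(old))

    def abs_diff(old_idx):
        # Difference of means if the samples at old_idx were labelled "old".
        old_sum = sum(pooled[i] for i in old_idx)
        return abs((total - old_sum) / n_new - old_sum / n_old)

    indices = range(len(pooled))
    if math.comb(len(pooled), n_old) <= max_exact:
        # Few enough relabellings: enumerate all of them (exact test).
        splits = itertools.combinations(indices, n_old)
    else:
        # Otherwise approximate with seeded random relabellings.
        rng = random.Random(seed)
        splits = (rng.sample(indices, n_old) for _ in range(rounds))

    hits = count = 0
    for split in splits:
        count += 1
        # Small tolerance so float rounding does not hide exact ties.
        if abs_diff(split) >= observed - 1e-12:
            hits += 1
    return hits / count


def compare(name, old, new, alpha=0.05):
    """Format one benchstat-like row; '~' marks an insignificant change."""
    old_mean, new_mean = statistics.fmean(old), statistics.fmean(new)
    p = permutation_p_value(old, new)
    delta = f"{(new_mean - old_mean) / old_mean:+.2%}" if p < alpha else "~"
    return (f"{name}  {old_mean:.1f}ms  {new_mean:.1f}ms  {delta}  "
            f"(p={p:.3f} n={len(old)}+{len(new)})")


if __name__ == "__main__":
    # Made-up timings in milliseconds, for illustration only.
    print(compare("put", [13.5, 13.6, 13.7, 13.6], [11.7, 11.8, 11.9, 11.8, 11.8]))
```

With samples as small as n=4+5 the exact enumeration covers only 126 relabellings, so the smallest achievable p-value is coarse. A real script would probably want more runs per configuration.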

The only difficulty I see is that we might need to store results for all operations in the benchmark, rather than only the aggregated summary. This is still feasible.
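One possible way to get per-operation results, sketched under assumptions: k6 can stream every metric sample as newline-delimited JSON via `--out json=<file>`, where each sample is a line with `"type": "Point"`, a `"metric"` name, and the value under `"data"`. The helper below groups those values by metric. The exact line layout should be checked against the k6 version in use, and the metric name in the usage note is hypothetical.

```python
import json
from collections import defaultdict


def load_samples(lines, metrics=None):
    """Collect raw sample values per metric from k6 JSON output lines.

    `lines` is any iterable of strings (an open file works).
    `metrics` optionally restricts the result to a set of metric names.
    """
    samples = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "Point":
            continue  # skip lines that only declare a metric
        name = record["metric"]
        if metrics is None or name in metrics:
            samples[name].append(record["data"]["value"])
    return dict(samples)
```

Usage could look like `load_samples(open("old.json"), {"frostfs_obj_put_duration"})`, where both the file name and the metric name are placeholders. The resulting lists are the kind of raw input a statistical comparison needs.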

I believe automated regression tests could also use such a feature.

cc @anikeev-yadro @jingerbread

anikeev-yadro commented 2023-03-01 15:07:05 +00:00 (Migrated from github.com)

FYI @dansingjulia

alexvanin added the P2 label 2023-03-10 09:15:06 +00:00
snegurochka added the discussion, enhancement labels 2023-05-03 17:12:45 +00:00
No milestone
No project
No assignees
Reference: TrueCloudLab/xk6-frostfs#21