Effects systems benchmark
Following my Functional Conf talk, I have prepared a small effects systems benchmark.
Note: Don't bother watching it. It was painful: I coughed the whole time, and I probably expelled a piece of lung at some point (mostly kidding).
For ZuriHac 2020, Alexis King did amazing work and gave a talk micro-benchmarking each effects system, focusing on raw performance.
It's still a cornerstone for moving effects systems forward.
However, it has become an argument from authority in most of the debates I've had,
whereas production code also performs IO and computations.
I've tried to come up with a different benchmark, into which I can inject some IO and computations.
Let's take the basic structure:
data Behavior = Behavior { loops :: Int, action :: Int -> IO () }
mkTarget embed getCounter setCounter Behavior {..} = go
  where
    go = do
      n <- getCounter
      when (n < loops) $ do
        embed $ action n
        setCounter $ n + 1
        go
The first three arguments depend on the effects system: they deal with IO and state.
For example for IO:
target ref = mkTarget id (readIORef ref) (writeIORef ref)
And for polysemy:
target = mkTarget embed get put
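As a minimal sketch of how the three effect-dependent arguments vary, here is a self-contained version with two instantiations. The type signature of mkTarget is an assumption (the snippet above leaves it to inference), transformers' StateT stands in for an effects library so that the sketch only needs packages shipped with GHC, and countingBehavior, runViaIO and runViaStateT are hypothetical helpers added for illustration.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Control.Monad (when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, execStateT, get, put)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

data Behavior = Behavior { loops :: Int, action :: Int -> IO () }

-- Assumed signature: any monad works, given a way to lift IO,
-- read the counter and write the counter.
mkTarget :: Monad m => (IO () -> m ()) -> m Int -> (Int -> m ()) -> Behavior -> m ()
mkTarget embed getCounter setCounter Behavior {..} = go
  where
    go = do
      n <- getCounter
      when (n < loops) $ do
        embed $ action n
        setCounter $ n + 1
        go

-- Plain IO: the counter lives in an IORef and no lifting is needed.
ioTarget :: IORef Int -> Behavior -> IO ()
ioTarget ref = mkTarget id (readIORef ref) (writeIORef ref)

-- StateT over IO (a stand-in for an effects library): the counter is the state.
stateTarget :: Behavior -> StateT Int IO ()
stateTarget = mkTarget liftIO get put

-- Hypothetical helper: a Behavior whose action counts its own invocations.
countingBehavior :: Int -> IO (Behavior, IO Int)
countingBehavior n = do
  calls <- newIORef 0
  pure (Behavior { loops = n, action = \_ -> modifyIORef' calls (+ 1) }, readIORef calls)

-- Both helpers return (number of action calls, final counter value).
runViaIO :: Int -> IO (Int, Int)
runViaIO n = do
  (behavior, readCalls) <- countingBehavior n
  ref <- newIORef 0
  ioTarget ref behavior
  (,) <$> readCalls <*> readIORef ref

runViaStateT :: Int -> IO (Int, Int)
runViaStateT n = do
  (behavior, readCalls) <- countingBehavior n
  final <- execStateT (stateTarget behavior) 0
  calls <- readCalls
  pure (calls, final)
```

In both cases the action should fire exactly `loops` times and the counter should end at `loops`; only the plumbing around the loop changes from one system to the next.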
Then we have Behavior, which parameterizes the benchmark. For instance:
ioOfMs delay _ = threadDelay $ delay * 1000
pureComputation start x = void $ evaluate $ x + read (fib $ show start)
fib =
  \case
    "0" -> "0"
    "1" -> "1"
    n -> show $ read @Int (fib $ show $ read @Int n - 1) + read (fib $ show $ read @Int n - 2)
loops is always set to 1000.
Note: for some reason, GHC optimizes the naive Fibonacci function so aggressively
that I suspect it turns it into an O(n) function.
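To make the parameterization concrete, here is a sketch of how these pieces could be assembled into Behavior values. The type signatures are assumptions, and so are the names noOp, lightComputation and ioOf1Ms and the start value 10: the actual values behind the "light" and "heavy" runs are not given here.

```haskell
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeApplications #-}

import Control.Concurrent (threadDelay)
import Control.Exception (evaluate)
import Control.Monad (void)

data Behavior = Behavior { loops :: Int, action :: Int -> IO () }

-- Sleeps for `delay` milliseconds (threadDelay takes microseconds),
-- ignoring the loop counter.
ioOfMs :: Int -> Int -> IO ()
ioOfMs delay _ = threadDelay $ delay * 1000

-- Forces fib(start) + x to weak head normal form on each call.
pureComputation :: Int -> Int -> IO ()
pureComputation start x = void $ evaluate $ x + read (fib $ show start)

-- Naive Fibonacci that round-trips through String at every step.
fib :: String -> String
fib =
  \case
    "0" -> "0"
    "1" -> "1"
    n -> show $ read @Int (fib $ show $ read @Int n - 1) + read (fib $ show $ read @Int n - 2)

-- Hypothetical benchmark parameterizations, all with loops = 1000.
noOp, lightComputation, ioOf1Ms :: Behavior
noOp = Behavior { loops = 1000, action = \_ -> pure () }
lightComputation = Behavior { loops = 1000, action = pureComputation 10 }
ioOf1Ms = Behavior { loops = 1000, action = ioOfMs 1 }
```

Each value can then be handed to any of the targets, so the same workload is measured under every effects system.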
Let's have a look at results:
With no IO/computation

These are more or less the results shown in the talk mentioned above.
Let's add light computation

The differences fade away: results range from 8.5 ms to 12.6 ms.
It's even more compelling with heavy computation

Differences are now negligible, ranging from 67 ms to 73 ms.
With an IO of 1 ms

Differences are already negligible, ranging from 1.144 s to 1.157 s.
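One way to quantify how the gap shrinks is to express each quoted range as a relative spread, that is, the difference between the slowest and fastest result divided by the fastest. The sketch below does this for the three ranges quoted so far, reading the upper bound of the 1 ms IO case as 1.157 s (1157 ms). The names relativeSpread, quotedRanges and report are hypothetical.

```haskell
import Text.Printf (printf)

-- Gap between slowest and fastest result, as a fraction of the fastest.
relativeSpread :: Double -> Double -> Double
relativeSpread fastest slowest = (slowest - fastest) / fastest

-- (scenario, fastest, slowest) in milliseconds, taken from the ranges in the text.
quotedRanges :: [(String, Double, Double)]
quotedRanges =
  [ ("light computation", 8.5, 12.6)
  , ("heavy computation", 67, 73)
  , ("1 ms IO", 1144, 1157)
  ]

-- One formatted line per scenario, with the spread as a percentage.
report :: [String]
report =
  [ printf "%s: %.1f%%" label (100 * relativeSpread lo hi)
  | (label, lo, hi) <- quotedRanges
  ]
```

Evaluating `mapM_ putStrLn report` in GHCi gives roughly 48% for light computation, 9% for heavy computation and 1% for the 1 ms IO case.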
In practice, no API call is that fast. For example,
Redis calls are expected to run in 7 ms, which gives:

This benchmark is far from perfect. For instance, the IO/computations are run
on each iteration and not each n, which could affect the results.
That being said, my hope with this work is to provide a contextualized benchmark, so that the raw performance of an effects system is not the only decision criterion.