We like functional programming a lot at Mews, and one of the most basic and mundane, yet extremely powerful, concepts is the Maybe monad. Maybe is called Option in FuncSharp, which is our functional programming library of choice for C#. Option and its Map method, which projects an option’s value (if it has one), form the backbone of our backend.
The performance of such a fundamental method is crucial. Here is a benchmark baseline for a non-empty option Map:
| Method | Mean        | Error     | StdDev     | Allocated |
|------- |------------:|----------:|-----------:|----------:|
| Map    | 161.0728 ns | 3.7304 ns | 10.5825 ns | 208 B     |
Both the memory allocation and the average duration are very high, which is not something you want at the heart of your system. For context: a concurrent dictionary lookup takes only ~20 ns on the same machine, and it is definitely a much more complex operation.
Let’s investigate where the friction comes from and if it is possible to oil it up.
The Map implementation is very elegant and uses a more general concept: the coproduct. Coproducts are basically discriminated unions. Bartosz Milewski wrote an in-depth blog post about coproducts if you want to find out more.
If an option is a coproduct having a value of Unit then the option is empty. If a coproduct has a value of type T then the option also has a value. Match is a method calling a lambda based on its value. If the coproduct has a value of T then the provided Func delegate is invoked to generate the projection, otherwise an empty option is returned.
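The original listing is not reproduced here, so the following is only a minimal sketch of the idea, not FuncSharp’s actual source. `SketchOption<A>` and `Unit` are hypothetical stand-ins, and the coproduct is reduced to a flag plus a value:

```csharp
using System;

// Hypothetical stand-in for the "no value" case of the coproduct.
public sealed class Unit
{
    public static readonly Unit Value = new Unit();
    private Unit() { }
}

// Hypothetical, simplified option type; not FuncSharp's implementation.
public class SketchOption<A>
{
    private readonly bool hasValue;
    private readonly A value;

    private SketchOption(bool hasValue, A value)
    {
        this.hasValue = hasValue;
        this.value = value;
    }

    public static SketchOption<A> Valued(A value) => new SketchOption<A>(true, value);
    public static SketchOption<A> Empty() => new SketchOption<A>(false, default(A));

    public bool NonEmpty => hasValue;

    // Coproduct-style match: exactly one of the two delegates is invoked.
    public R Match<R>(Func<A, R> ifValued, Func<Unit, R> ifEmpty) =>
        hasValue ? ifValued(value) : ifEmpty(Unit.Value);

    // Map expressed through Match. The first lambda captures `f`, so the
    // compiler allocates a closure object and a delegate on every call.
    public SketchOption<B> Map<B>(Func<A, B> f) =>
        Match(
            a => SketchOption<B>.Valued(f(a)),
            _ => SketchOption<B>.Empty()
        );
}
```

For example, `SketchOption<int>.Valued(20).Map(x => x + 1)` yields an option holding 21, while mapping an empty option stays empty.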
The source of the execution friction is obvious from every single line of the method.
The C# compiler must create a closure for f, since f is captured by the first lambda, Option.Valued(f(a)), rather than passed to it as an argument. This must also be responsible for the high memory allocation. The optimization is straightforward: just use a plain old conditional operator:
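A minimal sketch of that idea, assuming a simplified, hypothetical `DirectOption<A>` type rather than the real FuncSharp class (for brevity, the empty case allocates a fresh instance here):

```csharp
using System;

// Hypothetical, simplified option type used only for illustration.
public class DirectOption<A>
{
    private readonly A value;

    private DirectOption(bool nonEmpty, A value)
    {
        NonEmpty = nonEmpty;
        this.value = value;
    }

    public bool NonEmpty { get; }

    public A GetOrElse(A fallback) => NonEmpty ? value : fallback;

    public static DirectOption<A> Valued(A value) => new DirectOption<A>(true, value);
    public static DirectOption<A> Empty() => new DirectOption<A>(false, default(A));

    // No lambdas are created here, so nothing is captured: the only
    // allocation left is the resulting option itself.
    public DirectOption<B> Map<B>(Func<A, B> f) =>
        NonEmpty
            ? DirectOption<B>.Valued(f(value))
            : DirectOption<B>.Empty();
}
```

The delegate `f` is still supplied by the caller, but Map itself no longer wraps it in another lambda.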
This breaks the beauty of the abstraction, but it is a well justified implementation detail, so why not?
Let’s run the benchmark and reap a standing ovation. The performance improvement will be huge for sure:
| Method | Mean     | Error   | StdDev  | Gen 0  | Allocated |
|------- |---------:|--------:|--------:|-------:|----------:|
| Map    | 108.0 ns | 2.18 ns | 2.24 ns | 0.0038 | 32 B      |
Wait, what? Memory allocation dropped as expected – the method allocates the new projected Option which is unavoidable. But! The average duration is still too high. Why?
Are the non-sealed implementation class and the covariant interface the reason? I doubt it, but let’s give it a try:
It is a little bit faster but clearly not enough:
| Method | Mean     | Error   | StdDev  | Gen 0  | Allocated |
|------- |---------:|--------:|--------:|-------:|----------:|
| Map    | 102.0 ns | 0.86 ns | 0.76 ns | 0.0038 | 32 B      |
In such mysterious cases, I start removing things from the code so I can quickly identify the real root cause. So, what can we remove here? Perhaps we can remove the empty option branch from Map. It clearly breaks the implementation, but that doesn’t matter for our experiment. We can also remove the Empty static field and the static constructor, since they are now unused:
And you might have begun to suspect which way the wind is blowing since the title of this blog is kind of a spoiler:
| Method | Mean     | Error    | StdDev   | Gen 0  | Allocated |
|------- |---------:|---------:|---------:|-------:|----------:|
| Map    | 10.17 ns | 0.234 ns | 0.343 ns | 0.0038 | 32 B      |
Now, it is what you would expect for such a trivial and fundamental operation as Option projection. It is a remarkable success, but the mystery remains: why does removing the empty branch and static constructor lead to a 10x speedup?
The answer lies in the IL declaration of the Option class:
If we revert the last changes and restore the static constructor, we get this declaration:
The only difference here is the beforefieldinit flag.
The C# compiler emits the beforefieldinit flag for a class when all static fields and properties are initialized inline by initializers and there is no static constructor. On the other hand, when a class has a static constructor, the C# compiler doesn’t add the flag.
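One way to observe this without reading IL is to inspect the type’s metadata through reflection. The sketch below uses hypothetical type names; `TypeAttributes.BeforeFieldInit` is the reflection counterpart of the IL flag:

```csharp
using System;
using System.Reflection;

// Hypothetical types: the same static field, initialized in two different ways.
public class InlineInitialized
{
    // Inline initializer, no static constructor.
    public static readonly object Shared = new object();
}

public class ConstructorInitialized
{
    public static readonly object Shared;

    // Explicit static constructor doing the same work.
    static ConstructorInitialized()
    {
        Shared = new object();
    }
}

public static class FlagInspector
{
    // Reads the type's metadata flags through reflection.
    public static bool HasBeforeFieldInit(Type type) =>
        (type.Attributes & TypeAttributes.BeforeFieldInit) != 0;
}
```

When compiled with the C# compiler, `FlagInspector.HasBeforeFieldInit(typeof(InlineInitialized))` returns true, while the same call for `ConstructorInitialized` returns false.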
What does this flag mean? There are surprisingly only a few mentions of beforefieldinit on the web. Sure, there is the ECMA-335 Common Language Infrastructure specification (I.8.9.5 Class type definition) which says:
If marked beforefieldinit then the type’s initializer method is executed at, or sometime before, first access to any static field defined for that type.
If not marked beforefieldinit then that type’s initializer method is executed at (i.e., is triggered by):
1) first access to any static field of that type, or
2) first invocation of any static method of that type, or
3) first invocation of any instance or virtual method of that type if it is a value type or
4) first invocation of any constructor for that type.
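The precise rules for a type without beforefieldinit can be observed with a small experiment. This is only an illustrative sketch with hypothetical names (`InitLog`, `PreciseType`, `InitDemo`); it records the order in which things happen:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records the order of events.
public static class InitLog
{
    public static readonly List<string> Entries = new List<string>();
}

// An explicit static constructor means no beforefieldinit,
// so the initializer runs exactly at the first triggering use.
public class PreciseType
{
    static PreciseType()
    {
        InitLog.Entries.Add("PreciseType static constructor");
    }

    public static void Touch()
    {
        InitLog.Entries.Add("PreciseType.Touch");
    }
}

public static class InitDemo
{
    public static List<string> Run()
    {
        InitLog.Entries.Add("before first use");
        PreciseType.Touch(); // first static method call triggers the initializer
        PreciseType.Touch(); // already initialized, nothing extra is logged
        return InitLog.Entries;
    }
}
```

Calling `InitDemo.Run()` once in a fresh process logs "before first use" first, then the static constructor, then the two `Touch` calls. The static constructor cannot run earlier than the first triggering use, which is why the JIT has to guard such call sites.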
What does all this really mean, and what are the consequences? This is much more difficult to find an answer to.
In the end, static field initializers are compiled into a static constructor in IL. The beforefieldinit flag gives the JIT much more freedom to decide when to call it. Without the flag, the JIT has to be much more careful about making sure that the static constructor has already been called at every point that could trigger it. To do that, the JIT adds extensive checks, and they can apparently decrease performance by a lot.
The price of static constructor checks is so high that .NET code analysis has a dedicated rule for it: CA1810: Initialize reference type static fields inline. The rule description nicely explains the motivation behind the beforefieldinit/static constructor notion:
Static constructor checks can decrease performance. Often a static constructor is only used to initialize static fields, in which case you must only make sure that static initialization occurs before the first access of a static field. The beforefieldinit behavior is appropriate for these and most other types. It is only inappropriate when static initialization affects the global state and one of the following is true:
– The effect on global state is expensive and is not required if the type is not used.
– The global state effects can be accessed without accessing any static fields of the type.
Fortunately, the renowned Jon Skeet dissected this topic even more deeply in his brilliant article C# and beforefieldinit. It was the first result Google gave me when I was at the beginning of this story, and it was indigestible for me at that time :).
Paying the price
The JIT is quite clever and can remove the checks if it knows that the static constructor has already been called, even without the beforefieldinit flag.
For example, if you have a restricted Map version projecting to the same type, then the JIT knows that the static constructor was already called:
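A minimal sketch of such a restricted Map, assuming a simplified, hypothetical `SameTypeOption<A>` class rather than the real implementation:

```csharp
using System;

// Hypothetical, simplified option. The explicit static constructor
// keeps the beforefieldinit flag off, as in the scenario above.
public class SameTypeOption<A>
{
    public static readonly SameTypeOption<A> Empty;

    static SameTypeOption()
    {
        Empty = new SameTypeOption<A>();
    }

    private readonly A value;

    private SameTypeOption()
    {
        value = default(A);
    }

    public SameTypeOption(A value)
    {
        this.value = value;
        NonEmpty = true;
    }

    public bool NonEmpty { get; }

    public A GetOrElse(A fallback) => NonEmpty ? value : fallback;

    // Input and output are both SameTypeOption<A>. Map is an instance method,
    // so an instance already exists and the static constructor has already run.
    public SameTypeOption<A> Map(Func<A, A> f) =>
        NonEmpty ? new SameTypeOption<A>(f(value)) : Empty;
}
```

Because the projection stays within `A`, everything Map touches belongs to the very type whose instance is executing the method.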
The JIT can be sure that at the point when new Option<A>() is executed, all static constructor checks are unnecessary – Map is an instance member of the Option<A> class. And indeed, in such a less generic form, Map performance is much better:
| Method | Mean     | Error     | StdDev    |
|------- |---------:|----------:|----------:|
| Map    | 7.703 ns | 0.2202 ns | 0.6317 ns |
Even though the C# compiler cannot emit the beforefieldinit flag here, the JIT is clever enough not to slow down the execution. This is even faster than the previous versions, but it sacrifices IOption value covariance (note the missing out before the A type argument).
I always thought that initializing a static field inline and initializing it in a static constructor are identical, that they have the same meaning, and that it is just a matter of taste which version you want to use. Surprise, surprise, there is a very subtle difference.
I don’t see this as a sign of bad design; I see it as the power and beauty of C# and the CLR: you don’t need to mess with the gory details unless you are doing something really demanding. It took me 12 years of professional life to get there :).