Here is an interesting problem I faced recently that took quite a bit of time to figure out.
Let’s say you have a core library that multi-targets netstandard2.0 and net8.0. The library could have a bunch of stuff, like helpers for Span<T>, or anything else. For the sake of this example, the library has just one Config type with an init-only property:
// Core.csproj
// <TargetFrameworks>netstandard2.0;net8.0</TargetFrameworks>
namespace Core;
public class Config { public int X { get; init; } }
Obviously, the code won’t compile, since the netstandard2.0 version doesn’t have the IsExternalInit type. The solution sounds pretty easy, right? We just add an IsExternalInit.cs file manually (or with some MSBuild magic) with the following content:
#if NETSTANDARD2_0
namespace System.Runtime.CompilerServices;
internal class IsExternalInit;
#endif
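If you prefer the MSBuild route, the conditional include could look something like this (a sketch; the shared-folder layout is an assumption, and the point is that the file must live outside the default compile globs so it isn’t picked up twice):

```xml
<!-- Core.csproj: compile the polyfill only for the netstandard2.0 target.
     The path is illustrative; keep the file outside the project folder
     (or exclude it from the default globs) to avoid a duplicate Compile item. -->
<ItemGroup Condition="'$(TargetFramework)' == 'netstandard2.0'">
  <Compile Include="..\Shared\IsExternalInit.cs" Link="IsExternalInit.cs" />
</ItemGroup>
```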
We can either add IsExternalInit.cs conditionally to the project itself when the target is netstandard2.0 or just have #if NETSTANDARD2_0 inside of it. We can’t simply add this type for all the targets, because in that case we could face compilation errors if the Core project had an InternalsVisibleTo attribute for a test project that targets net8.0 or any other runtime that already defines the IsExternalInit type.
Now, we add another library, let’s say Library.csproj, that targets only netstandard2.0 and uses our Core.csproj. This might not be a super common case, but I’ve seen quite a few of them in the wild:
// Library.csproj
// <TargetFramework>netstandard2.0</TargetFramework>
public static class ConfigFactory
{
public static Config Create(int value) => new () { X = value };
}
And now we have a console app that targets net8.0 and just uses the factory:
// Application.exe
// <TargetFramework>net8.0</TargetFramework>
using Factory;
var config = ConfigFactory.Create(42);
Console.WriteLine("Done!");
Here is the dependency diagram:
Would you expect any issues with this code? Me neither, to be honest! But here is the output:
Unhandled exception. System.MissingMethodException: Method not found: 'Void Configuration.Config.set_X(Int32)'.
at Factory.ConfigFactory.Create(Int32 value)
at Program.<Main>$(String[] args) in Application/Program.cs:line 3
You can check the IL, and you’ll see that the set_X(Int32) “method” (which is a property setter) definitely exists in the Config class. But why do we get the error? Is it a compiler bug? Not really!
So here is the issue. Even though Core.csproj is multi-targeted, the question is: which version of Core.dll is actually deployed to the output folder? The Core.dll that targets netstandard2.0 or the Core.dll that targets net8.0? At runtime there is no such thing as ‘multi-targeting’; multi-targeting is a build-time feature! Since the Application project targets net8.0 and implicitly references Core.csproj, the net8.0 version is deployed.
Is it a problem? Actually, yes, it is. Let’s check the IL for ConfigFactory:
.method public hidebysig static class [Core]Core.Config
Create(
int32 'value'
) cil managed
{
// [7 47 - 7 67]
IL_0000: newobj instance void [Core]Core.Config::.ctor()
IL_0005: dup
IL_0006: ldarg.0 // 'value'
IL_0007: callvirt instance void modreq ([Core]System.Runtime.CompilerServices.IsExternalInit) [Core]Core.Config::set_X(int32)
IL_000c: nop
IL_000d: ret
} // end of method ConfigFactory::Create
Library.csproj targets netstandard2.0 and uses the System.Runtime.CompilerServices.IsExternalInit type from Core.dll, but at runtime we have the Core.dll that targets net8.0 with the following set_X property:
.property instance int32 X()
{
.get instance int32 Core.Config::get_X()
.set instance void modreq ([System.Runtime]System.Runtime.CompilerServices.IsExternalInit) Core.Config::set_X(int32)
} // end of property Config::X
I.e. the one that takes IsExternalInit from the System.Runtime assembly and not from the Core assembly. Yes, you can have the same types defined in different assemblies, and from the runtime’s point of view they are definitely two different types.
So, how can we solve this issue?
The simplest solution is to use a tool that has solved this problem already, for instance, the PolySharp nuget package. But if this is not an option for you for some reason, there are two solutions available.
First, you can add IsExternalInit unconditionally, but this might cause a problem with InternalsVisibleTo, as I mentioned before. The second solution is based on TypeForwardedToAttribute:
#if NETSTANDARD2_0
namespace System.Runtime.CompilerServices;
internal class IsExternalInit;
#else
[assembly: global::System.Runtime.CompilerServices.TypeForwardedTo(
typeof(global::System.Runtime.CompilerServices.IsExternalInit))]
#endif
TypeForwardedToAttribute tells the runtime where to look for types that are supposed to be in the current assembly. In this case, for the net8.0 target we’re telling the runtime that the IsExternalInit class is located in the BCL, and everything works just fine. Btw, this is the solution that the PolySharp library uses under the hood as well.
And indeed, if you don’t specify the C# language version explicitly in the project file, the version is picked based on the target framework: C# 12 for .net8, C# 11 for .net7, and C# 7.3 for Full Framework:
And even though the mapping just specifies the defaults, some people believe that the mapping is fixed and, for instance, if you’re stuck with Full Framework, you’re also stuck with C# 7.3. But this is not the case.
The actual relationship between the C# language version and the target framework is more delicate.
There are 3 ways a feature might relate to the target framework:
1. The feature is pure syntactic sugar: you just set langVersion in a project file, and the new feature works regardless of the target framework.
2. The feature requires special types that must be available at compile time.
3. The feature requires runtime support and can’t be used on older runtimes at all; for instance, using ref fields on an unsupported target produces Error CS9064: Target runtime doesn't support ref fields.
The first and the last cases are quite obvious, but the second one requires a bit of extra information. The C# compiler requires the special types to be available during compilation of the project for the feature to be usable, and it doesn’t care where the type definition is coming from: from the target framework, from a nuget package, or from the project itself.
Here is an example of using init-only setters (available since C# 9) in a project targeting netstandard 2.0:
// Project targets netstandard2.0 or net472
public record MyRecord
{
// System.Runtime.CompilerServices.IsExternalInit class is required.
public int X { get; init; }
}
namespace System.Runtime.CompilerServices
{
internal class IsExternalInit { }
}
But if you try to use some other features, like required members, you would have to add quite a bit of extra types to your compilation:
public record class MyRecord
{
// System.Runtime.CompilerServices.IsExternalInit class is required.
public int X { get; init; }
// System.Runtime.CompilerServices.RequiredMemberAttribute,
// CompilerFeatureRequiredAttribute and
// System.Diagnostics.CodeAnalysis.SetsRequiredMembersAttribute are required
public required int Y { get; set; }
}
namespace System.Runtime.CompilerServices
{
internal class IsExternalInit { }
internal class RequiredMemberAttribute : System.Attribute { }
internal sealed class CompilerFeatureRequiredAttribute(string featureName) : System.Attribute
{
public string FeatureName { get; set; } = featureName;
}
}
namespace System.Diagnostics.CodeAnalysis
{
internal class SetsRequiredMembersAttribute : System.Attribute { }
}
Adding all the attributes manually to every project is very tedious, so you can rely on some MSBuild magic to add a set of known files based on the target framework. Or you could just use something like PolySharp that uses source generation to add all the required types regardless of the target framework.
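For reference, wiring PolySharp up is just a compile-time-only package reference (the floating version here is illustrative):

```xml
<ItemGroup>
  <!-- PolySharp source-generates the missing compiler types.
       It is a build-time-only dependency, hence PrivateAssets="all". -->
  <PackageReference Include="PolySharp" Version="1.*" PrivateAssets="all" />
</ItemGroup>
```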
There is an issue with the case shown before. Let’s say you have A.csproj targeting netstandard2.0 and A.Tests.csproj targeting net8.0, with InternalsVisibleTo("A.Tests") inside A.csproj.
In this case, you won’t be able to compile A.Tests.csproj: you’ll get an error about a duplicate member definition, since a type like IsExternalInit would be available from two places - from A.csproj and from the net8.0 runtime library.
The solution is pretty simple: multi-target A.csproj to both netstandard2.0 and net8.0.
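In project-file terms, the fix is a one-line change from a single TargetFramework to TargetFrameworks:

```xml
<!-- A.csproj: multi-target so that A.Tests picks up the net8.0 build of A,
     where IsExternalInit comes from the BCL and no duplicate is compiled in. -->
<PropertyGroup>
  <TargetFrameworks>netstandard2.0;net8.0</TargetFrameworks>
</PropertyGroup>
```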
And here I want to show all the language features from C# 12 down to C# 8 with their requirements and a link to a GitHub issue that explains each feature.
Language Feature | Requirements |
---|---|
ref-readonly parameters | No extra requirements (1) |
Collection expressions | No extra requirements (2) |
Interceptors | InterceptsLocationAttribute (3) |
Inline Arrays | Runtime support is required: .net8+ |
nameof accessing instance members | No extra requirements |
Using aliases for any types | No extra requirements |
Primary Constructors | No extra requirements |
Lambda optional parameters | No extra requirements |
Experimental Attribute | ExperimentalAttribute (4) |
(1) ref-readonly parameters is an interesting feature. On one hand, it doesn’t require any extra types to be declared manually, but it does rely on an extra type - System.Runtime.CompilerServices.RequiresLocationAttribute. But if the compilation is missing this type, the compiler will generate it for you!
(2) System.Runtime.CompilerServices.CollectionBuilderAttribute is needed to support collection expressions for custom types.
(3) The full type name is System.Runtime.CompilerServices.InterceptsLocationAttribute
(4) The full type name is System.Diagnostics.CodeAnalysis.ExperimentalAttribute
Language Feature | Requirements |
---|---|
File-local types | No extra requirements |
ref fields a.k.a. low level struct enhancements | .net7+ |
Required properties | RequiredMemberAttribute , CompilerFeatureRequiredAttribute ,SetsRequiredMembersAttribute (1) |
Static abstract members in interfaces | .net7+ |
Numeric IntPtr | No extra requirements |
Unsigned right shift operator | No extra requirements |
utf8 string literals | System.Memory nuget or .net2.1+ |
Pattern matching on ReadOnlySpan<char> | System.Memory nuget package to get ReadOnlySpan itself |
Checked Operators | No extra requirements |
auto-default structs | No extra requirements |
Newlines in string interpolations | No extra requirements |
List patterns | System.Index , System.Range (2) |
Raw string literals | No extra requirements |
Cache delegates for static method group | No extra requirements |
nameof(parameter) | No extra requirements |
Relaxing Shift Operator | No extra requirements |
Generic attributes | No extra requirements |
(1) The full type names are System.Runtime.CompilerServices.RequiredMemberAttribute, System.Runtime.CompilerServices.CompilerFeatureRequiredAttribute and System.Diagnostics.CodeAnalysis.SetsRequiredMembersAttribute.
(2) Some features work only when targeting .net2.1+ or netstandard2.1; for instance, the following code requires System.Runtime.CompilerServices.RuntimeHelpers.GetSubArray to be available:
int[] n = new int[]{ 1 };
if (n is [1, .. var x, 2])
{
}
Language Feature | Requirements |
---|---|
Record structs | No extra requirements |
Global using directives | No extra requirements |
Improved Definite Assignment | No extra requirements |
Constant Interpolated Strings | No extra requirements |
Extended Property Patterns | No extra requirements |
Sealed record ToString | No extra requirements |
Source generators V2 API | No extra requirements |
Mix declarations and variables in deconstruction | No extra requirements |
AsyncMethodBuilder override | AsyncMethodBuilderAttribute (1) |
Enhanced #line directives | No extra requirements |
Lambda improvements | No extra requirements |
Interpolated string improvements | InterpolatedStringHandler , InterpolatedStringHandlerArgument (2) |
File-scoped namespaces | No extra requirements |
Parameterless struct constructors | No extra requirements |
CallerArgumentExpression | CallerArgumentExpressionAttribute |
(1) The full type name is System.Runtime.CompilerServices.AsyncMethodBuilderAttribute.
(2) The full type names are System.Runtime.CompilerServices.InterpolatedStringHandlerAttribute and System.Runtime.CompilerServices.InterpolatedStringHandlerArgumentAttribute.
Language Feature | Requirements |
---|---|
Target-typed new | No extra requirements |
Skip local init | SkipLocalsInitAttribute |
Lambda discard parameters | No extra requirements |
Native ints | No extra requirements |
Attributes on local functions | No extra requirements |
Function pointers | No extra requirements |
Pattern matching improvements | No extra requirements |
Static lambdas | No extra requirements |
Records | No extra requirements |
Target-typed conditional | No extra requirements |
Covariant Returns | .net5.0+ |
Extension GetEnumerator | No extra requirements |
Module initializers | ModuleInitializerAttribute (1) |
Extending partials | No extra requirements |
Top level statements | No extra requirements |
(1) The full type name is System.Runtime.CompilerServices.ModuleInitializerAttribute.
Language Feature | Requirements |
---|---|
Default Interface Methods | .net core 3.1+ |
Nullable reference types | A bunch of nullability attributes (1) |
Recursive Patterns | No extra requirements |
Async streams | Microsoft.Bcl.AsyncInterfaces or .net core 3.1+ |
Enhanced usings | No extra requirements |
Ranges | System.Index , System.Range |
Null-coalescing assignment | No extra requirements |
Alternative interpolated strings pattern | No extra requirements |
stackalloc in nested contexts | No extra requirements |
Unmanaged generic structs | No extra requirements |
Static local functions | No extra requirements |
Readonly members | No extra requirements |
(1) There are a lot of attributes: [AllowNull], [DisallowNull], [DoesNotReturn], [DoesNotReturnIf], [MaybeNull], [MaybeNullWhen], [MemberNotNull], [MemberNotNullWhen], [NotNull], [NotNullIfNotNull], [NotNullWhen]
When the application creates tens of millions of strings with a high repetition rate, such an optimization is quite helpful, and in this case it was reducing the memory footprint by about 10-15%. But when I looked into the profiling data I noticed that the string interning was a huge bottleneck: the application was spending about 96% of the execution time in spin locks inside the string table.
This presented an interesting challenge: while string de-duplication helped with memory usage, it also significantly hurt startup performance, as most calls to string.Intern were made during app initialization. Removing string interning indeed helped performance quite a lot, but I was curious whether other string de-duplication approaches might be better. So I tried a naive one based on ConcurrentDictionary<string, string>.
public static class StringCache
{
private static ConcurrentDictionary<string, string> cache = new(StringComparer.Ordinal);
public static string Intern(string str) => cache.GetOrAdd(str, str);
public static void Clear() => cache.Clear();
}
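Here is a quick sketch of how the cache behaves (the input strings are made up, and the class is repeated so the sample compiles on its own):

```csharp
using System;
using System.Collections.Concurrent;

// Two equal-but-distinct string instances...
string first = new string(new[] { 'i', 'd' });
string second = new string(new[] { 'i', 'd' });
Console.WriteLine(ReferenceEquals(first, second)); // False

// ...collapse into a single instance after going through the cache.
string a = StringCache.Intern(first);
string b = StringCache.Intern(second);
Console.WriteLine(ReferenceEquals(a, b)); // True

// Once initialization is done, drop the cache so transient strings can be collected.
StringCache.Clear();

// The class from the post, repeated so the sample is self-contained.
static class StringCache
{
    private static readonly ConcurrentDictionary<string, string> cache = new(StringComparer.Ordinal);
    public static string Intern(string str) => cache.GetOrAdd(str, str);
    public static void Clear() => cache.Clear();
}
```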
The cache currently uses a static ConcurrentDictionary<string, string>, but it can easily be made non-static and passed around as needed. Additionally, if we know that string de-duplication is only needed during application initialization, we can clear the cache once initialization is complete to avoid keeping transient strings that are not part of the final object graph. Having the ability to clear the cache solves one of the issues that a global string interning cache has.
However, the performance of this naive implementation is a concern. We need to be careful when benchmarking global state like the string interning cache, since the benchmark is executed multiple times within the same process, which can skew the data. We can clear our custom table on each iteration, but the built-in string table can’t be cleared, so measuring it properly would require running each iteration in a separate process.
But we need to start somewhere, so let’s try this benchmark first:
private List<string> _list;
[Params(10_000, 100_000, 1_000_000)]
public int Count { get; set; }
[GlobalSetup]
public void Setup()
{
_list = Enumerable.Range(1, Count).Select(n => n.ToString()).ToList();
}
[Benchmark]
public void String_Intern()
{
_list.AsParallel().ForAll(static s => string.Intern(s));
}
[Benchmark]
public void StringCache_Intern()
{
_list.AsParallel().ForAll(static s => StringCache.Intern(s));
}
In this case we’re measuring the read performance, which still might be a useful thing to check. Here are the results for .NET 8 (but they’re pretty much the same for .NET Framework as well):
| Method | Count | Mean | StdDev | Allocated |
|------------------- |-------- |-------------:|-------------:|----------:|
| String_Intern | 10000 | 3,463.7 us | 47.04 us | 4.04 KB |
| StringCache_Intern | 10000 | 114.5 us | 3.61 us | 4.01 KB |
| String_Intern | 100000 | 39,546.8 us | 1,653.10 us | 4.1 KB |
| StringCache_Intern | 100000 | 1,371.8 us | 129.97 us | 4.03 KB |
| String_Intern | 1000000 | 823,046.8 us | 16,736.25 us | 5.05 KB |
| StringCache_Intern | 1000000 | 32,094.0 us | 3,291.34 us | 4.07 KB |
Ignore the allocations, since they’re caused by PLINQ. The time looks bad! Why is the built-in version so slow?
To double-check the runtime behavior (and to look at the code under a profiler) I decided to write a “simple” console app that calls the de-duplication logic on 10M different strings multiple times. This is not the exact scenario our service has, but it might be closer to it than the benchmark.
var bm = new StringInterningBenchmarks() { Count = 10 };
bm.Setup();
bm.String_Intern();
bm.StringCache_Intern();
bm.Count = 10_000_000;
bm.Setup();
GC.Collect();
// to make it easier to see the sections in profiling session
Thread.Sleep(2_000);
var sw = Stopwatch.StartNew();
// The first call will populate the cache
// and the second one will mostly read from the cache.
for (int i = 0; i < 10; i++)
bm.StringCache_Intern();
Console.WriteLine($"Custom string interning is done in {sw.Elapsed}");
GC.Collect();
// to make it easier to see the sections in profiling session
Thread.Sleep(2_000);
sw.Restart();
for (int i = 0; i < 10; i++)
bm.String_Intern();
Console.WriteLine($"String interning is done in {sw.Elapsed}");
The results:
Custom string interning is done in 00:00:03.9975182
String interning is done in 00:01:13.9881888
The difference is still huge (about 15x). And by playing with the number of iterations, I got different ratios between the string interning and the custom cache. It seems that the string interning is drastically slower (20-30x) in terms of reads, but “just” 2-3x slower in terms of writes.
And most importantly, the string interning performance issue is not theoretical. After switching from string interning to the custom StringCache, the startup time for our service dropped by 2x! With just a simple change! Plus, we got the ability to clean up the cache to get rid of the cached strings that are not part of the final state.
But before closing this topic, let’s run the same custom benchmark with Native AOT:
Custom string interning is done in 00:00:03.3062479
String interning is done in 00:00:05.6756519
Why? The thing is that the string interning logic for both Full Framework and .NET Core is implemented in native code in StringLiteralMap::GetInternedString. String interning for Native AOT has a different implementation and is written in C#! The new implementation uses LockFreeReaderHashtable<TKey, TValue>, which is used by the runtime in many other places. And that implementation is WAY MORE efficient than the native string interning implementation. It is somewhat comparable with ConcurrentDictionary in terms of perf, but requires less memory for keeping all the records.
And running the same benchmark with Native AOT gives drastically different results as well:
| Method | Count | Mean | Error | StdDev | Allocated |
|------------------- |-------- |------------:|------------:|------------:|----------:|
| String_Intern | 10000 | 196.8 us | 3.82 us | 3.92 us | 4.11 KB |
| StringCache_Intern | 10000 | 211.9 us | 4.15 us | 5.67 us | 4.11 KB |
| String_Intern | 100000 | 1,680.1 us | 47.58 us | 140.28 us | 4.14 KB |
| StringCache_Intern | 100000 | 2,102.1 us | 86.83 us | 250.53 us | 4.13 KB |
| String_Intern | 1000000 | 31,059.8 us | 1,349.33 us | 3,827.82 us | 4.16 KB |
| StringCache_Intern | 1000000 | 40,368.6 us | 1,279.83 us | 3,713.02 us | 4.15 KB |
We can’t see the difference in memory consumption, since these benchmarks are essentially steady-state benchmarks, where all the records are already added to the string caches.
So, the conclusions:
- If you use string.Intern in your code, you probably should think about whether you really should.
- A naive de-duplication cache based on ConcurrentDictionary<string, string> is drastically faster than the string interning cache and gives you an opportunity to clean up the cache.
Recently, I was browsing a list of courses on Pluralsight and noticed one with a very promising title: “C# 10 Performance Playbook.” As an advanced course on a topic I’m passionate about, I decided to give it a go. I wasn’t sure if I’d find many new things, but since I talk about performance a lot, I’m always looking for an interesting perspective on how to explain this topic to others. The content of this course raised my eyebrows way too much, so I decided to share my perspective on it and use it as a learning opportunity.
This blog post is quite similar to what Nick Chapsas does in his “Code Cop,” with one difference: I’m not going to anonymize the sample code. Since it’s paid content, I feel that I have a right to give a proper review and potentially ask for changes, since the potential damage of such content on a platform like Pluralsight could be quite high.
In this blog post, I want to focus on a single topic that was covered in a section called “Classes, Structs, and Records.” The section is just over six minutes long, and I didn’t expect too many details, since the topic is quite large. But you can be concise and correct.
Here is the first benchmark used for comparing classes vs. structs:
public class ClassvsStruct
{
// This reads all the names from the resource file.
public List<string> Names => new Loops().Names;
[Benchmark]
public void ThousandClasses()
{
var classes = Names.Select(x => new PersonClass { Name = x });
}
[Benchmark]
public void ThousandStructs()
{
var classes = Names.Select(x => new PersonStruct { Name = x });
}
}
The results were:
| Method | Mean | Error | StdDev | Rank |
|---------------- |---------:|---------:|---------:|-----:|
| ThousandStructs | 32.05 us | 0.639 us | 1.136 us | 1 |
| ThousandClasses | 34.11 us | 0.841 us | 2.480 us | 2 |
The author concluded that structs are slightly faster, which is an interesting conclusion given the fact that there were no constructions of classes or structs involved in the code. The difference between the two benchmarks is probably just noise and has nothing to do with the actual performance characteristics of classes or structs.
But that’s not all. Here is the next iteration of the benchmarks:
public class ClassvsStruct
{
// This reads all the names from the resource file.
public List<string> Names => new Loops().Names;
[Benchmark]
public void ThousandClasses()
{
var classes = Names.Select(x => new PersonClass { Name = x });
for (var i = 0; i < classes.Count(); i++)
{
var x = classes.ElementAt(i).Name;
}
}
[Benchmark]
public void ThousandStructs()
{
var classes = Names.Select(x => new PersonStruct { Name = x });
for (var i = 0; i < classes.Count(); i++)
{
var x = classes.ElementAt(i).Name;
}
}
}
The results are:
| Method | Mean | Error | StdDev | Rank |
|---------------- |---------:|----------:|----------:|-----:|
| ThousandStructs | 2.315 ms | 0.0460 ms | 0.0716 ms | 1 |
| ThousandClasses | 9.664 ms | 0.1837 ms | 0.3710 ms | 2 |
And I’m quoting the author: “This time the difference is HUGE!” My first reaction was, “Okay, he’s going to fix this, right? He’s just playing with us, expecting us to catch the issue in the code. You can’t have O(N^2) in the benchmark!” But nope, this was the final version of the code.
Even though I think this is a very bad way to compare structs and classes, let’s use this example to learn how we should be analyzing the results of the benchmarks.
One thing every performance engineer should learn is the ability to interpret and explain results. For instance, in this case, we changed the benchmarks to consume the classes variable in a loop 1k times, and all of a sudden, the benchmark duration increased by 100x. Is it possible that accessing 1K elements in C# takes milliseconds? This sounds horrible! My gut reaction is that the construction is probably more expensive than the consumption, so I would not expect the benchmark to be significantly slower if done correctly. If you see a 100x difference in performance results, you should stop and think: why am I getting these results? Can I explain them? Is it possible that something is wrong with the benchmark?
In many cases, developers can rely on good abstractions and ignore the implementation details, but this is not true for performance analysis. In order to properly interpret the results, a performance engineer should be able to look through the abstractions and see what’s going on under the hood. For instance: what does the Names property do? What’s the complexity of accessing it? Is it backed by a field, or do we do some work every time we access it? All of these questions are crucial, since each and every step might drastically affect the results.
If the Names property is expensive, then the benchmark will be measuring the work it does instead of the code inside the benchmark. And in the author’s case it was reading a list of names from a resource file, meaning that we were doing file IO in a benchmark, which is not OK.
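A minimal fix for that particular problem is to compute the data once and reuse it. Here is a sketch with Lazy<T>; the data source is simulated, since the course’s Loops type isn’t available here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// First access pays the cost; every later access returns the same list instance.
Console.WriteLine(ReferenceEquals(NameSource.Names, NameSource.Names)); // True
Console.WriteLine(NameSource.Names.Count); // 1000

static class NameSource
{
    // Simulates the expensive load (the course code read names from a resource file).
    private static readonly Lazy<List<string>> _names = new(() =>
        Enumerable.Range(1, 1000).Select(n => $"Name{n}").ToList());

    public static List<string> Names => _names.Value;
}
```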
Different collection types have different performance characteristics. Even though the O-complexity is the same, you’ll see a significant difference between accessing an array and a linked list. The differences should probably be insignificant in real-world cases, but a benchmark will show them, since accessing an array is more cache-friendly: all the data is co-located (especially for structs).
And once you arrive at a hypothesis, you can check it by writing a benchmark that just accesses the elements of an array vs. the elements of a linked list with 1K elements:
| Method | Mean | Error | StdDev | Rank |
|------------------------- |-----------:|----------:|----------:|-----:|
| StructAccessInArray | 639.7 ns | 23.60 ns | 67.32 ns | 1 |
| ClassAccessInArray | 776.9 ns | 39.18 ns | 111.14 ns | 2 |
| StructAccessInLinkedList | 4,526.5 ns | 114.47 ns | 332.11 ns | 3 |
| ClassAccessInLinkedList | 4,806.1 ns | 141.65 ns | 410.96 ns | 4 |
These are the results I would expect: less than a nanosecond per element when accessing an array, a 20-ish % difference between classes and structs, and a significant difference between accessing an array vs. accessing a linked list. But even in this case we should not draw any conclusions on how changing an array to a linked list would affect performance in real-world cases, since real code normally does way more than just getting the data.
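For reference, the bodies of such access benchmarks could look roughly like this (my reconstruction, not the original course code; the sums are there only so the loops aren’t optimized away):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var array = Enumerable.Range(0, 1000)
    .Select(i => new PersonStruct { Name = i.ToString() })
    .ToArray();
var linkedList = new LinkedList<PersonStruct>(array);

// Array access: sequential memory, cache-friendly, O(1) per element.
long arraySum = 0;
for (int i = 0; i < array.Length; i++)
    arraySum += array[i].Name.Length;

// Linked list access: a pointer dereference per node, cache-unfriendly.
long listSum = 0;
for (var node = linkedList.First; node != null; node = node.Next)
    listSum += node.Value.Name.Length;

Console.WriteLine(arraySum == listSum); // True: same work, different memory layout

struct PersonStruct { public string Name; }
```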
Lastly, it’s important for every .NET engineer to have a solid understanding of algorithmic complexity and how LINQ works. We’ll revisit this topic after the tips, as it’s a key issue with these benchmarks.
The final tip is: make sure you understand the concepts being measured. There are many differences between structs and classes, and your mental model of these constructs should match the results. For example, you know that classes are heap-allocated, while structs can be allocated on the stack or inside other objects, which can impact performance. Classes are references, while structs are values, which can also affect performance in various ways.
However, you should ask yourself if you can interpret the results with your knowledge and intuition. If the answer is “no,” it could be due to a lack of understanding of the concept in this context, a flawed benchmark that introduces noise, or other factors that affect the results that you still don’t understand. In any case, you should not draw any conclusions from data that you can’t interpret.
Now, let’s try to understand the results that were presented.
First of all, we should avoid recomputing the Names property over and over again. This is bad, especially when the property is getting data from a resource file.
However, the main reason why the benchmarks are not correct is because of LINQ and lazy evaluation.
Let’s take a closer look at the code:
// This reads all the names from the resources.
public List<string> Names => new Loops().Names;
[Benchmark]
public void ThousandClasses()
{
var classes = Names.Select(x => new PersonClass { Name = x });
for (var i = 0; i < classes.Count(); i++)
{
var x = classes.ElementAt(i).Name;
}
}
The classes variable is an IEnumerable<PersonClass>, which is essentially a query (or a promise, or a generator) that will produce new results each time we consume it. However, on each iteration, we call classes.Count(), which calls new Loops().Names and creates 1,000 PersonClass instances just to return the number of items we want to consume. When you do O(N) work on each iteration, the entire loop’s complexity becomes O(N^2), which is already quite bad. Then, on each iteration, we also call classes.ElementAt(i), which probably needs to traverse the sequence from the beginning again.
This means that the overall complexity is O(2*N^2) (which, I know, is still just O(N^2)). And we pay this price twice: O(2*N^2) time complexity and O(2*N^2) memory complexity, meaning that for 1,000 elements the benchmark could be doing millions of operations and allocating millions of instances of PersonClass in the managed heap!
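For completeness, materializing the query once removes the quadratic behavior entirely (PersonClass is stubbed here to keep the sample self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> names = Enumerable.Range(1, 1000).Select(n => n.ToString()).ToList();

// ToList() runs the projection exactly once: 1,000 allocations in total.
List<PersonClass> classes = names.Select(x => new PersonClass { Name = x }).ToList();

for (var i = 0; i < classes.Count; i++) // Count is an O(1) property on List<T>
{
    var x = classes[i].Name;            // the indexer is O(1) as well
}

Console.WriteLine(classes.Count); // 1000

class PersonClass { public string Name { get; set; } }
```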
We can confirm this assumption by doing two things: 1) adding the MemoryDiagnoser attribute to see the allocations and 2) adding another case with either 100 or 10,000 elements to assess the asymptotic complexity of the code.
[MemoryDiagnoser]
public class ClassvsStruct
{
[Params(100, 1000)]
public int Count { get; set; }
public List<string> Names => new Loops(Count).Names;
[Benchmark]
public void ThousandClasses() { /* same body as in the previous benchmark */ }
[Benchmark]
public void ThousandStructs() { /* same body as in the previous benchmark */ }
}
And here are the results:
| Method | Count | Mean | Rank | Gen0 | Gen1 | Allocated |
|---------------- |------ |------------:|-----:|----------:|---------:|-----------:|
| ThousandStructs | 100 | 19.40 us | 1 | 0.6104 | - | 3.87 KB |
| ThousandClasses | 100 | 65.38 us | 2 | 39.5508 | 0.4883 | 242.93 KB |
| ThousandStructs | 1000 | 1,342.93 us | 3 | 5.8594 | - | 39.02 KB |
| ThousandClasses | 1000 | 4,844.48 us | 4 | 3835.9375 | 140.6250 | 23523.4 KB |
The results of this run are different from what was presented in the course, since my Loops().Names property is just a LINQ query. However, the same difference between structs and classes is still present: structs are significantly faster than classes. Why? Because of the allocations. Allocations in the managed heap are fast, but when you need to do millions of them just to iterate a loop, they skew the results badly. You can clearly see non-linear complexity here: the count goes from 100 to 1,000 (10x), the duration goes up by a factor of 70, and the allocations go up by a factor of 100.
It seems that the complexity is O(N^2) rather than O(2*N^2) as I expected. This is interesting! Obviously, my understanding of LINQ was incorrect.
Why? When I saw the results, my line of reasoning was: the loop is O(N), Enumerable.Count() used in the loop is O(N), and Enumerable.ElementAt(i) is O(N) as well. So on each loop iteration we should iterate the sequence from the beginning twice.
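You can actually watch the re-enumeration happen with a small counting wrapper (a diagnostic helper sketched for this post, not part of the original benchmarks):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

var source = new CountingSource(100);
var query = source.Select(x => x);

for (int i = 0; i < query.Count(); i++)   // Count() enumerates the whole source
    _ = query.ElementAt(i);               // ElementAt(i) walks from the start

// Only 100 elements, but the source was pulled well over 10,000 times.
Console.WriteLine(source.Pulls > 10_000); // True

// Counts how many elements are pulled out of the underlying sequence.
class CountingSource : IEnumerable<int>
{
    public int Pulls;
    private readonly int _count;
    public CountingSource(int count) => _count = count;

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 0; i < _count; i++) { Pulls++; yield return i; }
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```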
I first checked the full framework sources:
public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index) {
if (source == null) throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null) return list[index];
if (index < 0) throw Error.ArgumentOutOfRange("index");
using (IEnumerator<TSource> e = source.GetEnumerator()) {
while (true) {
if (!e.MoveNext()) throw Error.ArgumentOutOfRange("index");
if (index == 0) return e.Current;
index--;
}
}
}
Hm… This is definitely O(N)!
But what about .NET Core version?
public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
if (source is IPartition<TSource> partition)
{
TSource? element = partition.TryGetElementAt(index, out bool found);
if (found)
{
return element!;
}
}
else if (source is IList<TSource> list)
{
return list[index];
}
else if (TryGetElement(source, index, out TSource? element))
{
return element;
}
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index);
return default;
}
The code is definitely different! There is different handling of IList<TSource> and another case for IPartition<TSource>. What's that? This is an optimization to avoid excessive work in some common scenarios, like the one we have here. We construct classes as a projection from List<T>, so the actual type of classes is SelectListIterator<TSource, TResult>, which implements IPartition<TResult> and gets the i-th element without enumerating from the beginning every time.
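We can observe this fast path indirectly (IPartition<T> itself is internal), for instance by checking the runtime type of a List<T> projection; a small sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3, 4, 5 };

// On .NET Core this projection is a SelectListIterator<int, int>, which
// implements the internal IPartition<int> interface.
IEnumerable<int> projected = source.Select(x => x * 2);
Console.WriteLine(projected.GetType().Name); // e.g. "SelectListIterator`2"

// ElementAt goes through IPartition.TryGetElementAt, which indexes the
// underlying list directly instead of enumerating from the beginning.
Console.WriteLine(projected.ElementAt(3)); // 8
```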
Again, once we have a hypothesis, we can validate it. In this case, the simplest way to do that is to compare the number of allocations between the Full Framework and .NET Core versions using a profiler.
Full Framework results:
.NET Core results:
As you can see from the dotTrace output, the .NET Core version calls the PersonClass constructor 1 million times, and the Full Framework version calls it 1.5 million times. This makes sense, since asymptotic complexity describes the worst case, which does not always happen: ElementAt(i) has to iterate up to the i-th element and goes through the entire sequence only on the last iteration. But as you can see, the optimization that .NET Core has is quite significant.
Okay, we've analyzed and understood the data, but can I give advice on classes vs. structs? As I've mentioned already, this is a complicated topic, and I'm pretty sure benchmarking can't provide any guidance here. The main difference between the two is the impact on allocations and garbage collection and how the instances are passed around - by reference or via a copy. And it's very hard to give abstract advice on how and when this matters.
When I do a performance analysis, I start with a symptom: "low throughput" (compared to an expected one) or "high memory utilization" (again, compared either to a baseline or to "it just looks way too high"). Then I take a few snapshots of the system in various states, run a profiler, or collect some other performance-related metrics. I do look into transient memory allocations to see if the system produces a lot of waste that could be an indication of unnecessary work: allocating an iterator or a closure on a hot path can easily reduce the throughput of a highly loaded component by 2-3x. But if the allocations happen infrequently, then I won't even look there.
If I see GC-related performance issues, I would start looking into how I can optimize things. Using structs instead of classes is an option, but not always the first or the best one. Other options would be to see if we can avoid doing work by caching the results, or use some form of domain-specific optimizations. If I need to reduce allocations, I might switch to structs or try reducing the size of class instances by removing unused or rarely used fields.
Structs are definitely a good tool, but you really need to understand how to use them and when.
ElementAt is trickier than you might think, and overall, be VERY careful with LINQ in your benchmarks and in hot paths.

First, a bit of history. String interpolation is quite a popular concept that was added in C# 6 for creating strings with embedded expressions:
int n = 42;
string s = $"n == {n}"; // s is "n == 42"
But in the original form, this feature had some performance-related issues caused by a fairly naive implementation. To be fair, the language spec was intentionally vague in terms of how exactly the compiler should translate an interpolated string, so it was possible to have a better and more efficient code generation in the future.
Before C# 10, the compiler used a fairly simple transformation. Code like string s = $"n == {n}" was simply translated into string s = string.Format("n == {0}", n).
Here are a few issues with this approach: the format string has to be parsed at runtime, captured value types are boxed into the object[] arguments, and a ToString call on every captured expression is required, meaning that a bunch of transient strings will be allocated in the process.

Starting from C# 10, all of those issues are solved!
Let's look at a practical example that will show most of the benefits of the new implementation. Let's say we have a very simple argument validation library, like RuntimeContracts, and we want to check some invariants by calling Contract.Assert(predicate, message) (*). If the predicate is false, we want the contract to fail with an optional user-defined error message:
(*) The type name is intentionally the same as in System.Diagnostics.Contracts
namespace, but the “runtime contracts” do not require any tools for rewriting code before using them.
private int _state; // can be changed.
public void DoSomething(int n)
{
for (int i = 0; i < n; i++)
{
Contract.Assert(_state == 42, $"n must be 42 but was {_state}");
}
}
Can you see the issue here? The check is called in a loop and the message is created on each iteration! This can be very problematic if the code is on an application's hot path. Let's see how we can avoid the allocations with the interpolated string improvements.
Instead of “lowering” an interpolated string to string.Format
call, the C# 10 compiler now uses “Interpolated String Handlers” pattern.
The handler is a type that follows a specific pattern: it must have a constructor that takes at least two arguments, literalLength and formattedCount (plus some optional arguments, as we'll see later), and it must have at least two methods: AppendLiteral(string) and AppendFormatted<T>(T). The type must also be marked with a special attribute - InterpolatedStringHandlerAttribute.
Starting from C# 6, an interpolated string expression was assignable to string or System.FormattableString, and now it can be assigned to any type that follows the aforementioned pattern. Starting with .NET 6 there is a built-in handler called DefaultInterpolatedStringHandler, and by default the compiler "lowers" an interpolated string expression to it.
int n = 0;
// s is System.String
var s = $"n == {n}";
// s2 is of type 'DefaultInterpolatedStringHandler'
DefaultInterpolatedStringHandler s2 = $"n == {n}";
If you decompile this code you’ll see the changes in action:
int i = 0;
DefaultInterpolatedStringHandler defaultInterpolatedStringHandler = new DefaultInterpolatedStringHandler(5, 1);
defaultInterpolatedStringHandler.AppendLiteral("n == ");
defaultInterpolatedStringHandler.AppendFormatted(i);
// s is System.String
string s = defaultInterpolatedStringHandler.ToStringAndClear();
defaultInterpolatedStringHandler = new DefaultInterpolatedStringHandler(5, 1);
defaultInterpolatedStringHandler.AppendLiteral("n == ");
defaultInterpolatedStringHandler.AppendFormatted(i);
// s2 is of type 'DefaultInterpolatedStringHandler'
DefaultInterpolatedStringHandler s2 = defaultInterpolatedStringHandler;
The DefaultInterpolatedStringHandler is more efficient compared to a regular string.Format call in multiple ways:
- The format string is processed at compile time into a sequence of AppendLiteral and AppendFormatted calls, so no parsing happens at runtime.
- AppendFormatted<T>(T) avoids boxing when value types are captured in an interpolated string expression.
- The ISpanFormattable type is respected, and that allows writing an object's string representation into a Span<char> without allocating a separate string (many built-in types implement this interface already).
- There is an AppendFormatted(ReadOnlySpan<char>) overload that allows capturing a span of chars in the interpolated expression, which was not possible before: string s = $"Str={strArg.AsSpan().Trim()}".

Here is a small benchmark that shows the differences:
[MemoryDiagnoser]
public class PerformanceBenchmark
{
private readonly DateTime _when = DateTime.Now;
private readonly long _v1 = 1;
private readonly long _v2 = 2;
private readonly long _v3 = 3;
[Benchmark]
public string StringFormat()
{
return string.Format("When: {0}, V1={1}, V2={2}, V3={3}", _when, _v1, _v2, _v3);
}
[Benchmark]
public string NewInterpolation()
{
return $"When: {_when}, V1={_v1}, V2={_v2}, V3={_v3}";
}
}
| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|----------------- |---------:|---------:|--------:|-------:|----------:|
| StringFormat | 518.0 ns | 10.34 ns | 8.63 ns | 0.0648 | 272 B |
| NewInterpolation | 392.7 ns | 7.55 ns | 6.70 ns | 0.0286 | 120 B |
As we can see, the new implementation is 25% faster and allocates less than half of the string.Format
version.
ISpanFormattable?

The default API for getting a string representation of an object is Object.ToString()
that every (**) type supports. But calling ToString by definition causes an extra allocation for the resulting string. And if you need to compose a string from multiple objects, it may cause a lot of excessive allocations. To avoid this, many high-performance applications, instead of using Object.ToString, also define a void ToString(StringBuilder) method for constructing composed text without creating an extra string each time.
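The pattern looks roughly like this (a hypothetical Money type for illustration, not a BCL API):

```csharp
using System.Text;

public readonly struct Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency) => (Amount, Currency) = (amount, currency);

    // The classic ToString allocates a new string on every call.
    public override string ToString() => $"{Amount} {Currency}";

    // The allocation-friendly variant appends into a caller-provided builder,
    // so composing many values reuses a single buffer.
    public void ToString(StringBuilder builder) =>
        builder.Append(Amount).Append(' ').Append(Currency);
}
```

Composing a report from thousands of such values then allocates one final string instead of one string per value.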
(**) Not every type per se, because pointers are types and they don’t support ToString()
. And ref structs must define ToString
methods explicitly because the base version defined in System.ValueType
is not accessible for them.
But starting with .NET 6 we have ISpanFormattable
interface that derives from IFormattable
and has one extra method:
namespace System;
public interface ISpanFormattable : IFormattable
{
/// <summary>
/// Tries to format the value of the current instance into the provided span of characters.
/// </summary>
bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}
ISpanFormattable
allows writing an object’s text representation into a destination
Span<char>
if the destination is large enough to accept it.
The API of this interface looks scary, and implementing it manually every time may be labor-intensive. Luckily, we can use interpolated strings to write into a Span<char> as well!
public readonly struct Point : ISpanFormattable
{
public int X { get; }
public int Y { get; }
public Point(int x, int y) => (X, Y) = (x, y);
public override string ToString() =>
ToString(format: null, formatProvider: null);
public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider provider) =>
destination.TryWrite($"X={X}, Y={Y}", out charsWritten);
public string ToString(string format, IFormatProvider formatProvider) =>
string.Create(formatProvider, $"X={X}, Y={Y}");
}
In this case, TryFormat
method calls MemoryExtensions.TryWrite
that will do exactly what we want: it will format the interpolated string directly into the target span if the destination has enough space, without producing an intermediate string.
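Here is how the Point type above can be used to format into a stack-allocated buffer, with no heap allocations along the way:

```csharp
using System;

var point = new Point(1, 2);

// Format directly into a stack-allocated buffer: no intermediate string is created.
Span<char> buffer = stackalloc char[32];
if (point.TryFormat(buffer, out int charsWritten, ReadOnlySpan<char>.Empty, provider: null))
{
    // Only the written part of the buffer is meaningful.
    Console.WriteLine(buffer.Slice(0, charsWritten).ToString()); // X=1, Y=2
}
```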
Besides writing to a span, .NET 6 also updated the StringBuilder
API like Append
and AppendLine
to leverage new interpolated string handlers.
The calls like stringBuilder.AppendLine($"X = {X}, Y = {Y}");
used to create a separate string that was added to a StringBuilder
instance. But now both StringBuilder.Append and StringBuilder.AppendLine take an AppendInterpolatedStringHandler that appends the interpolated string in a very efficient way.
Ok, now it’s time to create a custom handler that will solve the issue that we had with our Contract.Assert
method.
Let’s start with a special handler type:
[InterpolatedStringHandler]
public ref struct ContractMessageInterpolatedStringHandler
{
// Will delegate all the work here!
private DefaultInterpolatedStringHandler _handler;
public ContractMessageInterpolatedStringHandler(int literalLength, int formattedCount, bool predicate, out bool handlerIsValid)
{
_handler = default;
if (predicate)
{
// If the predicate is evaluated to 'true', then we don't have to construct a message!
handlerIsValid = false;
return;
}
handlerIsValid = true;
_handler = new DefaultInterpolatedStringHandler(literalLength, formattedCount);
}
public void AppendLiteral(string s) => _handler.AppendLiteral(s);
public void AppendFormatted<T>(T t) => _handler.AppendFormatted(t);
public override string ToString() => _handler.ToStringAndClear();
}
Now we can change the Contract.Assert
signature to take the handler, and by using InterpolatedStringHandlerArgument
we can “tell” the compiler to pass the predicate
parameter to the constructor of the handler as well:
public static class Contract
{
// "Telling" the compiler to pass the 'predicate' parameter to the handler.
public static void Assert(bool predicate, [InterpolatedStringHandlerArgument("predicate")] ref ContractMessageInterpolatedStringHandler handler)
{
if (!predicate)
{
throw new Exception($"Precondition failed! Message:{handler.ToString()}");
}
}
}
Let’s check what will happen at runtime:
int n = 0;
// Contract is not violated! No messages will be constructed!
Contract.Assert(true, $"No side effects! n == {++n}");
Console.WriteLine($"n == {n}");
The output will be:
n == 0
The compiler emitted the following code:
bool predicate = true;
bool handlerIsValid;
var handler = new ContractMessageInterpolatedStringHandler(22, 1, predicate, out handlerIsValid);
if (handlerIsValid)
{
handler.AppendLiteral("No side effects! n == ");
handler.AppendFormatted(++n);
}
Contract.Assert(predicate, ref handler);
The compiler generates code that creates an instance of ContractMessageInterpolatedStringHandler and passes the length of the string literal and the number of format slots. It also passes the predicate flag that the handler checks, setting handlerIsValid depending on its value. And if the handler is invalid (because the assertion is not violated), we completely skip the message construction!
And now we can call Contract.Assert
with a custom error message in a loop and not be afraid of performance issues caused by excessive message construction!
private int _state; // can be set and changed.
public void DoSomething(int n)
{
for (int i = 0; i < n; i++)
{
// No performance issues anymore! The string will never be constructed if the assertion is not violated!
Contract.Assert(_state == 42, $"n must be 42 but was {_state}");
}
}
As always, the C# compiler uses a pattern-based approach for the new interpolated string improvements, which means that we can define the required attributes manually in our code (as long as we put them into the System.Runtime.CompilerServices namespace) and use the new behavior with older frameworks.
await-ing in interpolated strings

One thing that you may have noticed is that the interpolated string handlers are ref structs, and you may remember that ref structs have some restrictions: they can't be "allocated" in the managed heap, so they can't be embedded into other non-ref structs or objects. And because of that, they can't be used in async methods.
But the following code worked fine before and still works just fine in C# 10:
public async Task FooAsync()
{
string s = $"x = {await Task.Run(() => 42)}";
}
The language designers knew that the async case would be problematic. So they had a few options: 1) make handlers non-ref structs or 2) use different code generation when async code is involved. They decided to go with the second option: keep the handlers as ref structs and, in the async case, fall back to the old behavior of generating a string.Format call instead.
- Custom interpolated string handlers allow skipping message construction entirely, as we did for Contract.Assert. The same "trick" can be used by logging frameworks to avoid string creation if the logging level is off.
- Interpolated expressions can now capture a ReadOnlySpan<char>, like string s = "foo bar "; string str = $"Trimmed: {s.AsSpan().Trim()}";.
- ISpanFormattable is a very handy interface that allows an object's string representation to be written into a span without allocating a string.
- MemoryExtensions.TryWrite is a building block for implementing the ISpanFormattable interface using interpolated strings.
- StringBuilder.Append and AppendLine were updated in .NET 6 to use interpolated string handlers for higher efficiency.

Imagine a file copy operation where one thread writes the content and another "watcher" thread tracks progress by reading the FileStream.Position property.
The question is: how safe or unsafe is accessing FileStream.Position from another thread? Of course, without any synchronization in place, the "watcher" could be a bit off and get a stale file position. And because the Position property is of type long, the read operation could yield some very weird results on a 32-bit platform for files larger than 2 GB. And, of course, the runtime could potentially do some weird optimizations due to the lack of synchronization (even though this is not likely to happen in practice).
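A side note on the torn-read part: a long is 64 bits wide, and on a 32-bit platform an unsynchronized read can observe the two halves of different writes. Interlocked.Read is the standard way to get an atomic 64-bit read (a general sketch, not what FileStream does internally):

```csharp
using System.Threading;

class PositionTracker
{
    private long _position;

    // Writer side: atomically advance the position.
    public void Advance(long delta) => Interlocked.Add(ref _position, delta);

    // Reader side: Interlocked.Read guarantees an atomic 64-bit read
    // even on 32-bit platforms, so no torn values can be observed.
    public long Current => Interlocked.Read(ref _position);
}
```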
But is it possible for the watcher thread to affect the copy operation in a more drastic way? Like to corrupt the file?
Let’s do an experiment.
[Test]
public void ReadFileStreamPositionFromDifferentThread()
{
const string path = "test.txt";
int N = 10_000;
int blockSize = 1024;
using (var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write))
using (var writer = new StreamWriter(fileStream))
{
var cts = new CancellationTokenSource();
// Start a background position reader
Task.Run(async () =>
{
while (!cts.IsCancellationRequested)
{
// Tracing the position. In this case, just obtaining it.
long currentPosition = fileStream.Position;
await Task.Delay(1);
}
});
for (int i = 0; i < N; i++)
{
// Generate blocks of 'a's, then 'b's etc to 'z's
var output = new string((char)('a' + (i%26)), blockSize);
writer.WriteLine(output);
}
cts.Cancel();
}
var fileLength = new FileInfo(path).Length;
// Need to count \r\n as well
var expectedLength = (blockSize + Environment.NewLine.Length) * N;
Assert.That(fileLength, Is.EqualTo(expectedLength));
}
We have very simple code that synchronously writes blocks of 1024 characters to a file N times. We can increase N to the millions, deploy this code to production, and never see any errors for years. So we might conclude that it is safe to read the FileStream.Position property while another thread writes content to the file.
And then we make a simple change. We either read the FileStream.SafeFileHandle property on a FileStream instance, or we start creating the FileStream by calling, for instance, new FileStream(safeHandle, FileAccess.Write).
[Test]
public void ReadFileStreamPositionFromDifferentThreadWithSafeFileHandleExposed()
{
const string path = "test.txt";
int N = 10_000;
int blockSize = 1024;
using (var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write))
using (var writer = new StreamWriter(fileStream))
{
// This is the key difference here: touching SafeFileHandle property.
var handle = fileStream.SafeFileHandle;
var cts = new CancellationTokenSource();
// Start a background position reader
Task.Run(async () =>
{
while (!cts.IsCancellationRequested)
{
// Tracing the position. In this case, just obtaining it.
long currentPosition = fileStream.Position;
await Task.Delay(1);
}
});
for (int i = 0; i < N; i++)
{
// Generate blocks of 'a's, then 'b's etc to 'z's
var output = new string((char)('a' + (i%26)), blockSize);
writer.WriteLine(output);
}
cts.Cancel();
}
var fileLength = new FileInfo(path).Length;
// Need to count \r\n as well
var expectedLength = (blockSize + Environment.NewLine.Length) * N;
Assert.That(fileLength, Is.EqualTo(expectedLength));
}
And now, if we run the test, we’ll get a failure, Expected: 10260000 But was: 10258976
. What. Is. Going. On. Here?
When the internal file handle is exposed (by calling FileStream.SafeFileHandle or by creating a FileStream instance from a given SafeFileHandle), the FileStream instance enables some additional internal safety checks. If FileStream._exposedHandle is true, then every read, write, flush, or Position getter calls VerifyOSHandlePosition, which calls SeekCore(0, SeekOrigin.Current), which reads the current position of the file and updates the cached position by changing the _pos field.
It means that if _exposedHandle is true, the call to FileStream.Position is no longer pure! It updates the FileStream's internal state, which can affect a write operation happening on another thread. To understand the problem, let's take a look at the FileStream.BeginWriteCore implementation (which is called from the synchronous Write as well):
unsafe private FileStreamAsyncResult BeginWriteCore(byte[] bytes, int offset, int numBytes, AsyncCallback userCallback, Object stateObject)
{
// Create and store async stream class library specific data in the async result
FileStreamAsyncResult asyncResult = new FileStreamAsyncResult(0, bytes, _handle, userCallback, stateObject, true);
NativeOverlapped* intOverlapped = asyncResult.OverLapped;
if (CanSeek) {
// Make sure we set the length of the file appropriately.
long len = Length;
//Console.WriteLine("BeginWrite - Calculating end pos. pos: "+pos+" len: "+len+" numBytes: "+numBytes);
// Make sure we are writing to the position that we think we are
if (_exposedHandle)
VerifyOSHandlePosition();
if (_pos + numBytes > len) {
//Console.WriteLine("BeginWrite - Setting length to: "+(pos + numBytes));
SetLengthCore(_pos + numBytes);
}
// Now set the position to read from in the NativeOverlapped struct
// For pipes, we should leave the offset fields set to 0.
intOverlapped->OffsetLow = (int)_pos;
intOverlapped->OffsetHigh = (int)(_pos>>32);
If the file is not yet flushed and the next write operation happens while another thread calls the FileStream.Position property, then the internal _pos field can be changed based on the actual file position, effectively losing one of the writes and corrupting the content of the file!
No one should assume that a property is thread-safe unless it's clearly stated in the documentation, and there are no such claims for any FileStream properties. On the other hand, when we think about thread unsafety due to concurrent reads of a property, we rarely expect such drastic effects as corrupted files. The Framework Design Guidelines taught us to treat properties as smart fields, without drastic side effects like IO operations in a property getter.
I do understand that the FileStream
implementation tries its best to protect us, the users, from undesirable errors and inconsistent state. But I also believe that side effects like potential file corruption should be documented more explicitly.
TLDR; Reading FileStream.Position from another thread during write operations, when the stream's underlying SafeFileHandle is exposed, is extremely dangerous and may cause file corruption.
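If you do need progress reporting during writes, one safer approach is to maintain the byte count yourself instead of reading FileStream.Position from another thread (a sketch; the type and member names are made up for illustration):

```csharp
using System.IO;
using System.Threading;

class WriterWithProgress
{
    private long _bytesWritten;

    // The watcher thread reads this property instead of FileStream.Position,
    // so the stream's internal state is never touched from another thread.
    public long BytesWritten => Interlocked.Read(ref _bytesWritten);

    public void WriteBlock(FileStream stream, byte[] buffer, int count)
    {
        stream.Write(buffer, 0, count);
        // Track progress in our own field with an atomic update.
        Interlocked.Add(ref _bytesWritten, count);
    }
}
```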
P.S. The issue could happen in full framework as well as in .NET Core.
It was a very important lesson for me: even a simple change can have a drastic effect on a distributed system. We had been running a service with concurrent Position reads for many years without any issues, and a simple code change that switched FileStream into its "unsafe" mode caused very strange and hard-to-understand issues in the system.
To understand the problem, let's review the following code. Suppose we have a service that processes internal requests on a "dedicated thread". To do that, it creates a long-running task by passing TaskCreationOptions.LongRunning into the Task.Factory.StartNew method and creates a continuation for error reporting purposes.
public class Processor
{
private Task _task;
private readonly BlockingCollection<Request> _queue;
public Processor()
{
_queue = new BlockingCollection<Request>();
_task = Task.Factory.StartNew(LoopAsync, TaskCreationOptions.LongRunning);
_task.ContinueWith(_ =>
{
// Trace the error.
// Maybe even restart the loop.
}, TaskContinuationOptions.OnlyOnFaulted);
}
public void Stop() => _queue.CompleteAdding();
private async Task LoopAsync()
{
foreach (var request in _queue.GetConsumingEnumerable())
{
await ProcessRequest(request);
}
}
}
What is the problem with this code? Quite a few things, actually. And all of them are related to the LoopAsync method's return type.
First of all, let’s think about the long-running aspect. TaskCreationOptions.LongRunning
indicates that a given operation is such a long running procedure that it deserves a dedicated thread. That makes sense because indeed LoopAsync
can run for the entire lifetime of the service until Stop
method is called.
But here is the catch: from CLR’s point of view the duration of LoopAsync
is not “linear” and the operation “finishes” on the first await
. It means that this code spawns a thread just to wait for and process the first request. And once the first request is processed, the continuation inside LoopAsync runs on a thread-pool thread, causing the original dedicated thread to die.
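A quick way to see this behavior is to print whether the code runs on a thread-pool thread before and after the first await (a sketch):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

await Task.Factory.StartNew(async () =>
{
    // Before the first await: the dedicated LongRunning thread.
    Console.WriteLine($"Before await: pool thread = {Thread.CurrentThread.IsThreadPoolThread}"); // False
    await Task.Delay(10);
    // After the await: the continuation runs on a thread-pool thread;
    // the dedicated thread has already died.
    Console.WriteLine($"After await: pool thread = {Thread.CurrentThread.IsThreadPoolThread}"); // True
}, TaskCreationOptions.LongRunning).Unwrap();
```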
The code creates unnecessary threads and this is not the best thing in the world, but this is not the most dangerous part here.
The type of the _task
field is Task
, but what is the actual type of the object at runtime? Is it just System.Threading.Tasks.Task
? The actual type is Task<Task>
.
Task.Factory.StartNew "wraps" the result of the given delegate into a task, and if the delegate itself returns a task, then the result is a task that wraps a task.
In this case, it means that the error handling here is completely wrong. _task.ContinueWith creates a continuation of the outer task, which will fail only if something goes terribly wrong with the system and the TPL fails to launch a new thread. Otherwise, the outer task will succeed, "hiding" potential issues with the inner task.
Here is a simpler example:
static void Main(string[] args)
{
var task = Task.Factory.StartNew(async () =>
{
Console.WriteLine("Inside the delegate");
throw new Exception("Error");
return 42;
}, TaskCreationOptions.LongRunning);
task.ContinueWith(
_ => { Console.WriteLine($"Error: {_.Exception}"); },
TaskContinuationOptions.OnlyOnFaulted);
Console.ReadLine();
}
When we run this code, we'll see the Inside the delegate message on the screen and nothing else. And if we check the status of the task variable at runtime, we'll notice that the task actually finished successfully, and the continuation that is supposed to handle the error is never called.
What should you do in this case? The simplest solution is to switch to Task.Run, which returns the underlying task because that API was designed with async methods in mind. Alternatively, you can use the TaskExtensions.Unwrap extension method to get the underlying task from a Task<Task> instance.

But if you have to use Task.Factory.StartNew because you need to pass some other task creation options, then you can "unwrap" the resulting task to obtain the underlying task instance:
static void Main(string[] args)
{
var task = Task.Factory.StartNew(async () =>
{
Console.WriteLine("Inside the delegate");
throw new Exception("Error");
return 42;
}).Unwrap();
// Now, task actually points to the underlying task and the next continuation works as expected.
task.ContinueWith(
_ => { Console.WriteLine($"Error: {_.Exception}"); },
TaskContinuationOptions.OnlyOnFaulted);
Console.ReadLine();
}
One way to at least mitigate issues like this is to always react to unhandled exceptions in tasks. When a task fails but the user fails to "observe" the error, the TaskScheduler.UnobservedTaskException event is triggered. Back in the .NET 4.0 days, unhandled task exceptions were "critical" and caused the application to crash. Starting from .NET 4.5 the default behavior has changed (*), and unhandled task exceptions may stay unnoticed (use the <ThrowUnobservedTaskExceptions> configuration element if you want to change it back).
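Subscribing to the event is straightforward (a sketch; note that the event fires only when the faulted task is garbage collected, so it may trigger long after the actual failure):

```csharp
using System;
using System.Threading.Tasks;

TaskScheduler.UnobservedTaskException += (sender, args) =>
{
    // Log the failure and mark it as observed so the process does not
    // crash even when ThrowUnobservedTaskExceptions is enabled.
    Console.WriteLine($"Unobserved task exception: {args.Exception}");
    args.SetObserved();
};
```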
(*) The reason for this change is quite simple: it is extremely easy in these "async" days to get an unobserved task exception. Simple code like this can cause it:
var t1 = AsyncMethod1();
var t2 = AsyncMethod2();
// If both t1 and t2 will fail, then t2's error will be unobserved.
await t1;
await t2;
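One easy fix is Task.WhenAll, which observes both tasks at once (a self-contained sketch using stand-in failing methods):

```csharp
using System;
using System.Threading.Tasks;

static Task AsyncMethod1() => Task.FromException(new Exception("first"));
static Task AsyncMethod2() => Task.FromException(new Exception("second"));

var t1 = AsyncMethod1();
var t2 = AsyncMethod2();

try
{
    // WhenAll observes the exceptions of both tasks, so neither failure
    // can end up as an unobserved task exception.
    await Task.WhenAll(t1, t2);
}
catch (Exception e)
{
    // 'await' rethrows only the first exception; the full set is available
    // through the WhenAll task itself if you keep a reference to it.
    Console.WriteLine(e.Message); // first
}
```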
- Don't use Task.Factory.StartNew with TaskCreationOptions.LongRunning if the given delegate is backed by an async method.
- Prefer Task.Run over Task.Factory.StartNew and use the latter only when you really have to.
- If you do use Task.Factory.StartNew with async methods, always call Unwrap to get the underlying task back.

If you work on a codebase that was started in the .NET 4.0 era, I would highly recommend searching for Task.Factory.StartNew usages and double-checking that you don't have the issues mentioned in this post.