Turbocharged: Writing High Performance C# and .NET Code (60 mins)

Steve Gordon

October 23, 2023

Transcript

  1. @stevejgordon www.stevejgordon.co.uk What we'll cover
    • What is performance?
    • Measuring application and code performance
    • Span<T>, ReadOnlySpan<T> and Memory<T>
    • ArrayPool
    • System.IO.Pipelines and ReadOnlySequence<T>
    • System.Text.Json
  2. Measuring Application Performance
    • Visual Studio Diagnostic Tools (debugging)
    • Visual Studio Profiling / PerfView / dotTrace / dotMemory
    • ILSpy / JustDecompile / dotPeek / ILDASM
    • Production metrics and monitoring
    • Elastic APM Agent for .NET
  3. BenchmarkDotNet
    • Library for .NET (micro)benchmarking
    • High precision measurements
    • Extra data and output available using diagnosers
    • Compare performance on different platforms, architectures, JIT versions and GC modes
    • Used extensively by the .NET Runtime, CoreCLR and ASP.NET Core teams
    https://benchmarkdotnet.org
    https://github.com/dotnet/BenchmarkDotNet
  4.
        using BenchmarkDotNet.Attributes;
        using BenchmarkDotNet.Running;

        namespace BenchmarkExample
        {
            public class Program
            {
                // Runs every [Benchmark] method in NameParserBenchmarks.
                public static void Main(string[] args) => _ = BenchmarkRunner.Run<NameParserBenchmarks>();
            }

            [MemoryDiagnoser] // adds the Gen0 and Allocated columns to the results
            public class NameParserBenchmarks
            {
                private const string FullName = "Steve J Gordon";
                private static readonly NameParser Parser = new NameParser();

                [Benchmark]
                public void GetLastName()
                {
                    Parser.GetLastName(FullName);
                }
            }
        }
  9. // * Summary *
    BenchmarkDotNet v0.13.9+228a464e8be6c580ad9408e98f18813f6407fb5a, Windows 10 (10.0.19045.3570/22H2/2022Update)
    11th Gen Intel Core i5-1135G7 2.40GHz, 1 CPU, 8 logical and 4 physical cores
    .NET SDK 8.0.100-rc.2.23502.2
      [Host]     : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
      DefaultJob : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2

    | Method      | Mean     | Error    | StdDev   | Gen0   | Allocated |
    |------------ |---------:|---------:|---------:|-------:|----------:|
    | GetLastName | 116.2 ns | 10.96 ns | 32.15 ns | 0.0343 |     144 B |

    Gen0 is reported as Gen 0 collections per 1,000 operations, so
    (1 / 0.0343) x 1000 = 29,154.5 operations before a Gen 0 collection.
  10. Span<T>
    • System.Memory package. Built into .NET Core 2.1.
    • Provides a read/write 'view' onto a contiguous region of memory
        • Heap (Managed objects) – e.g. Arrays, Strings
        • Stack (via stackalloc)
        • Native/Unmanaged (P/Invoke)
    • Index / Iterate to modify the memory within the Span
    • Almost no overhead
  11. Span<T>.Slice
    Slicing a Span is a constant time/cost operation – O(1)

        int[] myArray = new int[9];
        Span<int> span1 = myArray.AsSpan();
        Span<int> span2 = span1.Slice(start: 2, length: 5);

    myArray (int[9]) indices: 0 1 2 3 4 5 6 7 8
    span2 indices:                0 1 2 3 4
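The slide's point that a slice is a view rather than a copy can be checked with a small sketch. The class and method names here (`SliceDemo`, `WriteThroughSlice`) are hypothetical and not from the talk; only `AsSpan` and `Slice` come from the slide.

```csharp
using System;

public static class SliceDemo
{
    // Writes through a slice and returns the backing array, so the caller
    // can see that the slice and the array share the same memory.
    public static int[] WriteThroughSlice()
    {
        int[] myArray = new int[9];
        Span<int> span1 = myArray.AsSpan();
        Span<int> span2 = span1.Slice(start: 2, length: 5);

        span2[0] = 42; // same element as myArray[2]
        span2[4] = 99; // same element as myArray[6]

        return myArray;
    }
}
```

Because no elements are copied, the cost of `Slice` does not depend on the length of the array.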
  12. Requirement: We need a method that takes an array and returns ¼ of its elements, starting from the middle element.
  13.
        [MemoryDiagnoser]
        public class ArrayBenchmarks
        {
            private int[] _myArray;

            [Params(100, 1000, 10000)] // each benchmark runs once per Size value
            public int Size { get; set; }

            [GlobalSetup]
            public void Setup()
            {
                _myArray = new int[Size];
                for (var i = 0; i < Size; i++)
                    _myArray[i] = i;
            }

            // MORE CODE COMING RIGHT UP!!...
        }
  16.
        [MemoryDiagnoser]
        public class ArrayBenchmarks
        {
            // SETUP METHODS UP HERE! ...

            // Baseline: LINQ (requires using System.Linq;) enumerates the
            // source and allocates a new array for the result.
            [Benchmark(Baseline = true)]
            public int[] Original() => _myArray.Skip(Size / 2).Take(Size / 4).ToArray();

            ...
        }
  17.
    | Method   | Size  |          Mean | Ratio |  Gen 0 | Allocated | Alloc Ratio |
    |--------- |------ |--------------:|------:|-------:|----------:|------------:|
    | Original | 100   |   103.6874 ns |       | 0.0535 |     224 B |             |
    | Original | 1000  |   638.2920 ns |       | 0.2670 |    1120 B |             |
    | Original | 10000 | 5,924.3520 ns |       | 2.4109 |   10120 B |             |
  18.
        [MemoryDiagnoser]
        public class ArrayBenchmarks
        {
            ...

            // Allocates only the destination array, then copies a quarter of
            // the source into it, starting from the middle element.
            [Benchmark]
            public int[] ArrayCopy()
            {
                var newArray = new int[Size / 4];
                Array.Copy(_myArray, Size / 2, newArray, 0, Size / 4);
                return newArray;
            }

            ...
        }
  19.
    | Method    | Size  |          Mean |  Ratio |  Gen 0 | Allocated | Alloc Ratio |
    |---------- |------ |--------------:|-------:|-------:|----------:|------------:|
    | Original  | 100   |   103.6874 ns |        | 0.0535 |     224 B |             |
    | ArrayCopy | 100   |    14.1013 ns | -86.3% | 0.0306 |     128 B |        -43% |
    | Original  | 1000  |   638.2920 ns |        | 0.2670 |    1120 B |             |
    | ArrayCopy | 1000  |    52.6257 ns | -91.7% | 0.1627 |    1024 B |         -9% |
    | Original  | 10000 | 5,924.3520 ns |        | 2.4109 |   10120 B |             |
    | ArrayCopy | 10000 |   419.1335 ns | -92.9% | 1.5917 |   10024 B |         -1% |
  20.
        [MemoryDiagnoser]
        public class ArrayBenchmarks
        {
            ...

            // No copy and no allocation: returns a view over the existing array.
            [Benchmark]
            public Span<int> Span() => _myArray.AsSpan().Slice(Size / 2, Size / 4);

            ...
        }
  21.
    | Method    | Size  |          Mean |  Ratio |  Gen 0 | Allocated | Alloc Ratio |
    |---------- |------ |--------------:|-------:|-------:|----------:|------------:|
    | Original  | 100   |   103.6874 ns |        | 0.0535 |     224 B |             |
    | ArrayCopy | 100   |    14.1013 ns | -86.3% | 0.0306 |     128 B |        -43% |
    | Span      | 100   |     0.7088 ns | -99.2% |      - |         - |       -100% |
    | Original  | 1000  |   638.2920 ns |        | 0.2670 |    1120 B |             |
    | ArrayCopy | 1000  |    52.6257 ns | -91.7% | 0.1627 |    1024 B |         -9% |
    | Span      | 1000  |     0.6492 ns | -99.9% |      - |         - |       -100% |
    | Original  | 10000 | 5,924.3520 ns |        | 2.4109 |   10120 B |             |
    | ArrayCopy | 10000 |   419.1335 ns | -92.9% | 1.5917 |   10024 B |         -1% |
    | Span      | 10000 |     0.6643 ns | -99.9% |      - |         - |       -100% |
  22. Working with Strings

        ReadOnlySpan<char> span = "Steve J Gordon".AsSpan();

    ReadOnlySpan<char>:                  S t e v e   J   G o r d o n
    ReadOnlySpan<char>.Slice(start: 8):                  G o r d o n
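As a sketch of how the string slicing above might be used in a name parser, the helper below finds the last space and slices after it instead of hard-coding the start index. `NameSpanDemo` and this `GetLastName` signature are hypothetical; the talk's actual `NameParser` implementation is not shown in the transcript.

```csharp
using System;

public static class NameSpanDemo
{
    // Returns the text after the last space as a slice of the input.
    // No new string is allocated unless the caller calls ToString().
    public static ReadOnlySpan<char> GetLastName(ReadOnlySpan<char> fullName)
    {
        int lastSpace = fullName.LastIndexOf(' ');
        return lastSpace < 0 ? fullName : fullName.Slice(lastSpace + 1);
    }
}
```

Calling `NameSpanDemo.GetLastName("Steve J Gordon".AsSpan())` yields a span over the same characters as `Slice(start: 8)` on the slide.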
  23. Span<T> Limitations
    • It's a stack only Value Type - ref struct
    • Requires C# >= 7.2 for ref struct feature
    • Cannot be boxed
    • Cannot be a field in a class or standard (non ref) struct
    • Cannot be used as an argument or local variable inside async methods
    • Cannot be captured by lambda expressions
  24. Memory<T>
    • Similar to Span<T> but can live on the heap
    • A readonly struct but not a ref struct
    • Slicing a Memory<T> is slightly slower than slicing a Span<T>
    • Call its Span property to get a Span over the same data
  25.
        // CS4012 Parameters or locals of type 'Span<byte>' cannot be declared
        // in async methods or lambda expressions.
        private async Task SomethingAsync(Span<byte> data)
        {
            ... // Would be nice to do something with the Span here
            await Task.Delay(1000);
        }
  26.
        private async Task SomethingAsync(Memory<byte> data)
        {
            Memory<byte> dataSliced = data.Slice(0, 100);
            await Task.Delay(1000);
        }

        private void SomethingNotAsync(Span<byte> data)
        {
            // some code
        }
  27.
        private async Task SomethingAsync(Memory<byte> data)
        {
            // CS4012 Parameters or locals of type 'Span<byte>' cannot be declared
            // in async methods or lambda expressions.
            var span = data.Span.Slice(1);
            SomethingNotAsync(span);
            await Task.Delay(1000);
        }

        private void SomethingNotAsync(Span<byte> data)
        {
            // some code
        }
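One common way around the CS4012 error shown above is to avoid declaring a Span local at all and pass `data.Span` straight into a synchronous helper. The sketch below assumes that pattern; `MemoryAsyncDemo` and `SumBytes` are hypothetical names, and the transcript does not show which fix the talk itself used.

```csharp
using System;
using System.Threading.Tasks;

public static class MemoryAsyncDemo
{
    public static async Task<int> SomethingAsync(Memory<byte> data)
    {
        // No Span local is declared in the async method. The Span exists only
        // as a temporary argument for the duration of the synchronous call.
        int sum = SumBytes(data.Span.Slice(1));

        await Task.Delay(10);
        return sum;
    }

    // Synchronous helper: Span<byte> parameters are allowed here.
    private static int SumBytes(Span<byte> data)
    {
        int total = 0;
        foreach (byte b in data)
        {
            total += b;
        }
        return total;
    }
}
```

The async method holds on to the `Memory<byte>`, which can live on the heap across the `await`, while all Span work happens inside the non-async method.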
  28. Putting it into practice – Key Builder
    Microservice which:
    1. Reads an SQS message
    2. Deserialises the JSON message
    3. Stores a copy of the message to S3 using an object key derived from properties of the message.
    S3ObjectKeyGenerator
  29. Object Key Builder Benchmarks

    | Method    | Mean [ns] | Ratio |   Gen0 | Allocated [B] | Alloc Ratio |
    |---------- |----------:|------:|-------:|--------------:|------------:|
    | Original  |  557.6 ns |       | 0.1736 |         728 B |             |
    | SpanBased |  235.1 ns |  -56% | 0.0458 |         192 B |        -74% |

    ~2x Faster
    ~3.8x Less Allocations
    18 million messages: Reduction of 9.65GB of allocations daily
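The daily figure can be reproduced from the per-operation allocations in the table. This assumes one object key is built per message and decimal gigabytes; neither assumption is stated on the slide.

```latex
(728\,\text{B} - 192\,\text{B}) \times 18{,}000{,}000
  = 536\,\text{B} \times 18{,}000{,}000
  = 9{,}648{,}000{,}000\,\text{B}
  \approx 9.65\,\text{GB per day}
```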
  30. ArrayPool
    • Pool of arrays for re-use
    • Found in System.Buffers
    • ArrayPool<T>.Shared.Rent(int length)
        • You are likely to get an array larger than your minimum size
    • ArrayPool<T>.Shared.Return(T[] array, bool clearArray = false)
        • Warning! By default returned arrays are not cleared
  31.
        public class Processor
        {
            public void DoSomeWorkVeryOften()
            {
                var buffer = new byte[1000]; // allocates
                DoSomethingWithBuffer(buffer);
            }

            private void DoSomethingWithBuffer(byte[] buffer)
            {
                // use the array
            }
        }
  33.
        public class Processor
        {
            public void DoSomeWorkVeryOften()
            {
                var arrayPool = ArrayPool<byte>.Shared;
                var buffer = arrayPool.Rent(1000);
                DoSomethingWithBuffer(buffer);
            }

            private void DoSomethingWithBuffer(byte[] buffer)
            {
                // use the array - must now track position of final byte and slice
            }
        }
  34.
        public class Processor
        {
            public void DoSomeWorkVeryOften()
            {
                var arrayPool = ArrayPool<byte>.Shared;
                var buffer = arrayPool.Rent(1000);

                try
                {
                    DoSomethingWithBuffer(buffer);
                }
                finally
                {
                    // Always return the rented array, even if the work throws.
                    arrayPool.Return(buffer);
                }
            }

            private void DoSomethingWithBuffer(byte[] buffer)
            {
                // use the array - must now track position of final byte and slice
            }
        }
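The comment about tracking the position of the final byte could look something like the sketch below. It is an illustration only: `PooledBufferDemo` and `FillAndSum` are hypothetical names, and filling the buffer with ones stands in for real work.

```csharp
using System;
using System.Buffers;

public static class PooledBufferDemo
{
    // Rents a buffer, uses only the first byteCount bytes, and returns their sum.
    public static int FillAndSum(int byteCount)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            // Rent may hand back a larger array than requested, and it may
            // contain stale data, so work on a slice of the requested length.
            Span<byte> used = buffer.AsSpan(0, byteCount);
            used.Fill(1);

            int sum = 0;
            foreach (byte b in used)
            {
                sum += b;
            }
            return sum;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Iterating over `buffer` directly instead of `used` could include extra elements beyond the requested length, which is the pitfall the slide's comment warns about.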
  35. System.IO.Pipelines
    • Originally created by the ASP.NET team to improve Kestrel requests per second (RPS)
    • Improves I/O performance scenarios (~2x vs. streams)
    • Removes common, hard-to-write boilerplate code
    • Unlike streams, pipelines manage buffers for you from the ArrayPool
    • Two ends to a pipe, a PipeWriter and a PipeReader
  36. Pipelines
    PipeWriter : IBufferWriter<byte> -> Pipe -> PipeReader

    Writer side:
        Memory<byte> m = pw.GetMemory();
        …
        pw.Advance(1000);
        await pw.FlushAsync();

    Reader side:
        ReadResult r = await reader.ReadAsync();
        ReadOnlySequence<byte> b = r.Buffer;
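A minimal end-to-end sketch of the writer and reader calls above, under a few assumptions: the project can reference System.IO.Pipelines (a NuGet package reference may be needed outside ASP.NET Core), the payload is small enough that `FlushAsync` never has to wait for the reader, and `PipeDemo` and `WriteThenReadAsync` are hypothetical names. Real code would normally run the writer and reader concurrently.

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class PipeDemo
{
    // Writes a small payload into a pipe, then reads it back and returns
    // the number of bytes the reader observed.
    public static async Task<long> WriteThenReadAsync(byte[] payload)
    {
        var pipe = new Pipe();

        // Writer side: ask the pipe for memory, fill it, then say how much was written.
        Memory<byte> memory = pipe.Writer.GetMemory(payload.Length);
        payload.AsMemory().CopyTo(memory);
        pipe.Writer.Advance(payload.Length);
        await pipe.Writer.FlushAsync();
        await pipe.Writer.CompleteAsync();

        // Reader side: consume buffers until the writer has completed.
        long total = 0;
        while (true)
        {
            ReadResult result = await pipe.Reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;
            total += buffer.Length;

            // Mark everything in this buffer as consumed.
            pipe.Reader.AdvanceTo(buffer.End);

            if (result.IsCompleted)
            {
                break;
            }
        }

        await pipe.Reader.CompleteAsync();
        return total;
    }
}
```

The caller never allocates a buffer of its own: the pipe supplies the memory to write into and hands the same data back as a `ReadOnlySequence<byte>`.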
  37. Putting it into practice: Span<T> Parsing
    Microservice which:
    1. Retrieves S3 object (TSV file) from AWS
    2. Decompresses file
    3. Parses TSV to get 3 of 25 columns for each row
    4. Indexes data to Elasticsearch
    CloudFrontParser
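Step 3 is where slicing can replace `string.Split`. The sketch below shows one way to pull a single column out of a tab-separated row without allocating. `TsvDemo` and `GetColumn` are hypothetical and are not the CloudFrontParser code from the talk.

```csharp
using System;

public static class TsvDemo
{
    // Returns the zero-based column `index` of a tab-separated row as a slice
    // of the original row, or an empty span if the row has too few columns.
    public static ReadOnlySpan<char> GetColumn(ReadOnlySpan<char> row, int index)
    {
        // Skip past `index` tab characters.
        for (int i = 0; i < index; i++)
        {
            int tab = row.IndexOf('\t');
            if (tab < 0)
            {
                return ReadOnlySpan<char>.Empty;
            }
            row = row.Slice(tab + 1);
        }

        // The column ends at the next tab, or at the end of the row.
        int end = row.IndexOf('\t');
        return end < 0 ? row : row.Slice(0, end);
    }
}
```

Only the columns that are actually needed have to be turned into strings, which fits with the results slide's note that most of the remaining allocations are the parsed strings themselves.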
  38. TSV Parsing Optimisation - Results

    | Method    |      Mean | Ratio |    Gen 0 |   Gen 1 |   Gen 2 | Allocated | Alloc Ratio |
    |---------- |----------:|------:|---------:|--------:|--------:|----------:|------------:|
    | Original  | 46.662 ms |     - | 15833.33 | 3333.33 | 1250.00 |  96.98 MB |           - |
    | Optimised |  8.584 ms |  -81% |   578.13 |  468.88 |   46.88 |   3.32 MB |        -97% |

    Processing 1 file of 10,000 rows
    ~30x Less Heap Memory Allocations
    NOTE: ~2.85MB are the string allocations for the parsed data. Overhead = 0.45MB
  39. System.Text.Json APIs - .NET Core 3.0
    • In the box (>= .NET Core 3.0) JSON APIs
    • Low-Level – Utf8JsonReader and Utf8JsonWriter
    • Mid-Level – JsonDocument
    • High-Level – JsonSerializer (Serialize and Deserialize)
  40. Putting it into practice: Parsing JSON
    Microservice which:
    1. Performs an Elasticsearch Bulk Index
    2. Deserialises the JSON response to check for errors
    3. Returns a list of the IDs which errored
    BulkResponseParser
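For step 2, the low-level reader can check for failures without deserialising the whole response. This sketch assumes the bulk response carries a top-level boolean `errors` property, as Elasticsearch bulk responses do. `BulkResponseDemo` and `HasErrors` are hypothetical names and are not the talk's BulkResponseParser.

```csharp
using System;
using System.Text.Json;

public static class BulkResponseDemo
{
    // Returns true when the top-level "errors" property is true.
    // Assumes the payload is complete, valid UTF-8 JSON and that "errors",
    // when present at the top level, holds a boolean.
    public static bool HasErrors(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);

        while (reader.Read())
        {
            // Depth 1 means a property of the root object, so nested
            // "errors" properties inside items are ignored.
            if (reader.TokenType == JsonTokenType.PropertyName
                && reader.CurrentDepth == 1
                && reader.ValueTextEquals("errors"))
            {
                reader.Read(); // move to the property value
                return reader.GetBoolean();
            }
        }

        return false;
    }
}
```

A success response can return as soon as the flag is read, so the individual items only need parsing when something failed. That may be part of why the successful case is so much cheaper than the failure case in the results.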
  41. System.Text.Json vs JSON.NET - Results

    Processing Successful Response

    | Method    |         Mean |  Ratio |   Gen 0 |  Gen 1 | Allocated | Alloc Ratio |
    |---------- |-------------:|-------:|--------:|-------:|----------:|------------:|
    | Original  | 192,558.9 ns |        | 22.9492 |      - |  94.13 KB |             |
    | Optimised |     201.8 ns | -99.9% |       - |      - |      0 KB |       -100% |

    Processing Failure Response

    | Method    |         Mean |  Ratio |   Gen 0 |  Gen 1 | Allocated | Alloc Ratio |
    |---------- |-------------:|-------:|--------:|-------:|----------:|------------:|
    | Original  | 195,890.0 ns |        | 24.1699 | 0.2441 |   99.4 KB |             |
    | Optimised |  63,950.0 ns |   -67% |  3.7482 |      - |   15.7 KB |        -84% |
  42. Business Buy-In
    • Identify a quick win
    • Use a scientific approach to demonstrate gains
    • Put gains into a monetary value
    • Cost to benefit ratio
  43. Cost Saving Example: Input Processor
    This work is a small part of a much bigger potential gain.
    For a single microservice handling 18 million messages per day:
    • Reduction of at least 50% of allocations
    • At least 1 fewer VM needed per year, saving $1,700
    • Potential to at least double per-instance throughput
  44. Scale Matters
    A single (micro)service could save $1,700.
    These gains can scale with additional (micro)services.
    $17,000?? $170,000???
  45. Summary
    • Measure, don't assume!
    • Be scientific; make small changes each time and measure again
    • Focus on hot paths
    • Don't copy memory, slice it! Span<T> is less complex than it may first seem.
    • Use ArrayPools where appropriate to reduce array allocations
    • Consider Pipelines for I/O scenarios
    • Consider System.Text.Json APIs for high-performance JSON parsing