

Turbocharged: Writing High Performance C# and .NET Code (50 mins)

In this session, you'll learn how to write C# code which executes faster and allocates less. This session is packed with practical examples and demos of where the latest high-performance APIs and language features can be applied in your applications.

During this session, we'll apply types such as Span<T> and Memory<T> to process data and parse strings efficiently. We'll examine System.IO.Pipelines, which offers high-performance I/O, and we'll utilise ArrayPool to help reduce GC allocations. In .NET Core 3.0, we have new high-performance JSON APIs, which we'll also add to our arsenal. Microsoft has made fantastic performance gains in the .NET Core framework; now it's time to apply them to your code!

We'll begin by discussing when and why performance matters in your applications. You'll learn how to measure your code, and use a data-driven approach to focus your optimizations.

These features can seem complicated, unapproachable and difficult to apply. In this session, Steve introduces high-performance newcomers to the features, showing you how they work, where they can be applied, and how to measure performance improvements in your code.

This talk is for developers who, like Steve, are ready to begin their journey towards writing faster .NET code that allocates less.

Steve Gordon

May 22, 2023

Transcript

  1. @stevejgordon www.stevejgordon.co.uk

    • What is performance?
    • Measuring application and code performance
    • Span<T>, ReadOnlySpan<T> and Memory<T>
    • ArrayPool
    • System.IO.Pipelines and ReadOnlySequence<T>
  2. • Visual Studio Diagnostic Tools (debugging)
    • Visual Studio Profiling / PerfView / dotTrace / dotMemory
    • ILSpy / JustDecompile / dotPeek / ILDASM
    • Production metrics and monitoring
    • Elastic APM Agent for .NET
  3. • Library for .NET (micro)benchmarking
    • High-precision measurements
    • Used extensively by the .NET Runtime, CoreCLR and ASP.NET Core teams

    https://benchmarkdotnet.org
    https://github.com/dotnet/BenchmarkDotNet
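  4. A BenchmarkDotNet benchmark for NameParser.GetLastName:

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    namespace BenchmarkExample
    {
        public class Program
        {
            // Runs every [Benchmark] method in NameParserBenchmarks
            public static void Main(string[] args) => _ = BenchmarkRunner.Run<NameParserBenchmarks>();
        }

        // [MemoryDiagnoser] adds GC collection counts and allocated bytes to the results
        [MemoryDiagnoser]
        public class NameParserBenchmarks
        {
            private const string FullName = "Steve J Gordon";
            private static readonly NameParser Parser = new NameParser();

            [Benchmark]
            public void GetLastName()
            {
                Parser.GetLastName(FullName);
            }
        }
    }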
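The slide doesn't show `NameParser` itself. As a hypothetical stand-in, assuming `GetLastName(string)` returns the last space-separated word, a naive version built on `string.Split` might look like this. `Split` allocates a `string[]` plus one string per word, which is the kind of cost `[MemoryDiagnoser]` surfaces in the Allocated column.

```csharp
using System;

namespace BenchmarkExample
{
    // Hypothetical implementation: the deck does not show the real NameParser code.
    public class NameParser
    {
        // Returns the last space-separated part of a full name.
        // string.Split allocates a string[] plus a new string for every part.
        public string GetLastName(string fullName)
        {
            var parts = fullName.Split(' ');
            return parts[parts.Length - 1];
        }
    }
}
```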
  9. // * Summary *

    BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
    Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
    .NET Core SDK=3.0.100
      [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
      DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT

    |      Method |      Mean |     Error |    StdDev |    Median |  Gen 0 | Gen 1 | Gen 2 | Allocated |
    |------------ |----------:|----------:|----------:|----------:|-------:|------:|------:|----------:|
    | GetLastName | 163.18 ns | 3.1903 ns | 4.2590 ns | 161.87 ns | 0.0379 |     - |     - |     160 B |
  10. • System.Memory package; built into .NET Core 2.1
    • Provides a read/write 'view' onto a contiguous region of memory:
      - Heap (managed objects), e.g. arrays and strings (strings are exposed read-only, as ReadOnlySpan<char>)
      - Stack (via stackalloc)
      - Native/unmanaged memory (P/Invoke)
    • Index/iterate to modify the memory within the Span
    • Almost no overhead
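A minimal sketch of `Span<T>` over two of the memory sources listed above, plus a string read through `ReadOnlySpan<char>` (the `SpanSources` class and its methods are hypothetical; native memory is left out to avoid P/Invoke). It writes through a span over an array and fills a `stackalloc` buffer without any `unsafe` code:

```csharp
using System;

public static class SpanSources
{
    // Sums the values through a span, regardless of where the memory lives.
    private static int Sum(ReadOnlySpan<int> values)
    {
        var total = 0;
        foreach (var v in values)
            total += v;
        return total;
    }

    public static (int heapSum, int stackSum, int lowerCaseCount) Demo()
    {
        // Heap: a Span over a managed array; writes go straight to the array.
        int[] array = { 1, 2, 3 };
        Span<int> heapSpan = array;
        heapSpan[0] = 10;                 // array[0] is now 10

        // Stack: stackalloc memory wrapped in a Span, no unsafe code required.
        Span<int> stackSpan = stackalloc int[3];
        for (var i = 0; i < stackSpan.Length; i++)
            stackSpan[i] = i + 1;         // 1, 2, 3

        // Strings are immutable, so AsSpan() gives a ReadOnlySpan<char>.
        ReadOnlySpan<char> text = "Steve J Gordon".AsSpan();
        var lower = 0;
        foreach (var c in text)
            if (char.IsLower(c)) lower++;

        // Span<int> converts implicitly to ReadOnlySpan<int>.
        return (Sum(heapSpan), Sum(stackSpan), lower);
    }
}
```

`SpanSources.Demo()` returns `(15, 6, 9)`: the heap sum reflects the write made through the span, not the original `1`.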
  11. Slicing a Span is a constant time/cost operation: O(1)

    int[] myArray = new int[9];
    Span<int> span1 = myArray.AsSpan();
    Span<int> span2 = span1.Slice(start: 2, length: 5);

    [Diagram: myArray (int[9], indexes 0 to 8); span2 covers array elements 2 to 6, exposed as span indexes 0 to 4]
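To see that `Slice` produces a view rather than a copy, this sketch (the `SliceDemo` name is hypothetical) writes through `span2` and then inspects the original array:

```csharp
using System;

public static class SliceDemo
{
    public static int[] Run()
    {
        int[] myArray = new int[9];
        Span<int> span1 = myArray.AsSpan();

        // Slice creates a new Span pointing into the same memory with a new
        // start and length: O(1), nothing is copied.
        Span<int> span2 = span1.Slice(start: 2, length: 5);

        // span2[0] is myArray[2] and span2[4] is myArray[6].
        for (var i = 0; i < span2.Length; i++)
            span2[i] = 100 + i;

        return myArray; // { 0, 0, 100, 101, 102, 103, 104, 0, 0 }
    }
}
```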
  12. Requirement: we need a method that takes an array and returns ¼ of its elements, starting from the middle element.
  13. ArrayBenchmarks setup:

    [MemoryDiagnoser]
    public class ArrayBenchmarks
    {
        private int[] _myArray;

        [Params(100, 1000, 10000)]
        public int Size { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            _myArray = new int[Size];
            for (var i = 0; i < Size; i++)
                _myArray[i] = i;
        }

        // MORE CODE COMING RIGHT UP!!...
  16. The original implementation (baseline), using LINQ:

    [MemoryDiagnoser]
    public class ArrayBenchmarks
    {
        // SETUP METHODS UP HERE!
        ...

        [Benchmark(Baseline = true)]
        public int[] Original() => _myArray.Skip(Size / 2).Take(Size / 4).ToArray();

        ...
    }
  17. Results:

    | Method   |  Size |          Mean | Ratio |  Gen 0 | Gen 1 | Gen 2 | Allocated |
    |--------- |------ |--------------:|------:|-------:|------:|------:|----------:|
    | Original |   100 |   154.9018 ns |  1.00 | 0.0534 |     - |     - |     224 B |
    | Original |  1000 |   727.2669 ns |  1.00 | 0.2670 |     - |     - |    1120 B |
    | Original | 10000 | 7,332.0136 ns |  1.00 | 2.4109 |     - |     - |   10120 B |
  18. Using Array.Copy:

    [MemoryDiagnoser]
    public class ArrayBenchmarks
    {
        ...

        [Benchmark]
        public int[] ArrayCopy()
        {
            var newArray = new int[Size / 4];
            Array.Copy(_myArray, Size / 2, newArray, 0, Size / 4);
            return newArray;
        }

        ...
    }
  19. Results:

    | Method    |  Size |          Mean | Ratio |  Gen 0 | Gen 1 | Gen 2 | Allocated |
    |---------- |------ |--------------:|------:|-------:|------:|------:|----------:|
    | Original  |   100 |   154.9018 ns | 1.000 | 0.0534 |     - |     - |     224 B |
    | ArrayCopy |   100 |    24.5267 ns | 0.159 | 0.0051 |     - |     - |     128 B |
    | Original  |  1000 |   727.2669 ns | 1.000 | 0.2670 |     - |     - |    1120 B |
    | ArrayCopy |  1000 |   104.7282 ns | 0.142 | 0.1627 |     - |     - |    1024 B |
    | Original  | 10000 | 7,332.0136 ns | 1.000 | 2.4109 |     - |     - |   10120 B |
    | ArrayCopy | 10000 |   801.1695 ns | 0.109 | 1.5917 |     - |     - |   10024 B |
  20. Slicing a Span:

    [MemoryDiagnoser]
    public class ArrayBenchmarks
    {
        ...

        [Benchmark]
        public Span<int> Span() => _myArray.AsSpan().Slice(Size / 2, Size / 4);

        ...
    }
  21. Results:

    | Method    |  Size |          Mean | Ratio |  Gen 0 | Gen 1 | Gen 2 | Allocated |
    |---------- |------ |--------------:|------:|-------:|------:|------:|----------:|
    | Original  |   100 |   154.9018 ns | 1.000 | 0.0534 |     - |     - |     224 B |
    | ArrayCopy |   100 |    24.5267 ns | 0.159 | 0.0051 |     - |     - |     128 B |
    | Span      |   100 |     0.9233 ns | 0.006 |      - |     - |     - |         - |
    | Original  |  1000 |   727.2669 ns | 1.000 | 0.2670 |     - |     - |    1120 B |
    | ArrayCopy |  1000 |   104.7282 ns | 0.142 | 0.1627 |     - |     - |    1024 B |
    | Span      |  1000 |     0.9016 ns | 0.000 |      - |     - |     - |         - |
    | Original  | 10000 | 7,332.0136 ns | 1.000 | 2.4109 |     - |     - |   10120 B |
    | ArrayCopy | 10000 |   801.1695 ns | 0.109 | 1.5917 |     - |     - |   10024 B |
    | Span      | 10000 |     0.9095 ns | 0.000 |      - |     - |     - |         - |
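The three benchmarked approaches can also be compared outside BenchmarkDotNet. This sketch (the `QuarterFromMiddle` class is hypothetical, and the Span method is renamed `SpanSlice` for clarity) checks that they select the same elements. The Span version returns a view, so writing through it changes the source array, whereas the other two hand back independent copies.

```csharp
using System;
using System.Linq;

public static class QuarterFromMiddle
{
    // LINQ: iterates via enumerators and allocates the result array.
    public static int[] Linq(int[] source) =>
        source.Skip(source.Length / 2).Take(source.Length / 4).ToArray();

    // Array.Copy: one allocation for the result, then a block copy.
    public static int[] ArrayCopy(int[] source)
    {
        var result = new int[source.Length / 4];
        Array.Copy(source, source.Length / 2, result, 0, source.Length / 4);
        return result;
    }

    // Span: no allocation and no copy; the caller gets a view over the source array.
    public static Span<int> SpanSlice(int[] source) =>
        source.AsSpan().Slice(source.Length / 2, source.Length / 4);
}
```

That difference explains the near-constant Span timings: the work no longer depends on how many elements are selected. The trade-off is that a `Span<int>` result is only usable where a stack-only view is acceptable.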
  22. ReadOnlySpan<char> span = "Steve J Gordon".AsSpan();

    [Diagram: the span over the characters S t e v e J G o r d o n; ReadOnlySpan<char>.Slice(start: 8) returns a span over G o r d o n]
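Applying the same idea to the name-parsing benchmark, a span-based `GetLastName` can locate the last space and slice, allocating nothing until the caller actually needs a string. This is a hypothetical sketch (`SpanNameParser` is not from the deck):

```csharp
using System;

// Hypothetical span-based counterpart to NameParser; not the author's implementation.
public static class SpanNameParser
{
    // Returns a view over the characters after the last space, without allocating.
    // If there is no space, the whole input is returned.
    public static ReadOnlySpan<char> GetLastName(ReadOnlySpan<char> fullName)
    {
        var lastSpace = fullName.LastIndexOf(' ');
        return lastSpace == -1 ? fullName : fullName.Slice(lastSpace + 1);
    }
}
```

`SpanNameParser.GetLastName("Steve J Gordon").ToString()` produces `"Gordon"`; the `ToString()` call is the only allocation, so it can be deferred or skipped entirely when the caller only compares or copies the characters.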
  23. • Span<T> is a stack-only value type: a ref struct
    • Requires C# 7.2 or later for the ref struct feature
    • Cannot be a field in a class or in a standard (non-ref) struct
    • Cannot be used as an argument or local variable inside async methods
    • Cannot be captured by lambda expressions
  24. • Similar to Span<T>, but can live on the heap
    • A readonly struct, but not a ref struct
    • Slightly slower to slice into a Memory<T>
    • The Span property gets a Span<T> over the same memory
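Because `Memory<T>` is not a ref struct, it can be kept in a field and sliced well before the data is touched; the `Span` property then gives fast synchronous access. A sketch with a hypothetical `ChunkedBuffer` class:

```csharp
using System;

// Hypothetical class: Memory<T> can be a field here, a Span<T> field would not compile.
public sealed class ChunkedBuffer
{
    private readonly Memory<byte> _memory;

    // byte[] converts implicitly to Memory<byte>; no copy is made.
    public ChunkedBuffer(byte[] backingArray) => _memory = backingArray;

    // Slicing a Memory<T> returns another Memory<T> over the same array.
    public Memory<byte> GetChunk(int start, int length) => _memory.Slice(start, length);

    // .Span gives a Span<byte> over the same memory for synchronous work.
    public void Fill(int start, int length, byte value) =>
        GetChunk(start, length).Span.Fill(value);
}
```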
  25. A Span<byte> parameter on an async method:

    // CS4012 Parameters or locals of type 'Span<byte>' cannot be declared
    // in async methods or lambda expressions.
    private async Task SomethingAsync(Span<byte> data)
    {
        ... // Would be nice to do something with the Span here
        await Task.Delay(1000);
    }
  26. Using Memory<byte> instead:

    private async Task SomethingAsync(Memory<byte> data)
    {
        Memory<byte> dataSliced = data.Slice(0, 100);
        await Task.Delay(1000);
    }

    private void SomethingNotAsync(Span<byte> data)
    {
        // some code
    }
  27. Declaring a Span<byte> local from the Memory<byte> inside the async method:

    private async Task SomethingAsync(Memory<byte> data)
    {
        // CS4012 Parameters or locals of type 'Span<byte>' cannot be declared
        // in async methods or lambda expressions.
        var span = data.Span.Slice(1);
        SomethingNotAsync(span);
        await Task.Delay(1000);
    }

    private void SomethingNotAsync(Span<byte> data)
    {
        // some code
    }
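The error above comes from declaring a `Span<byte>` local inside the async method. One common workaround, sketched here with hypothetical names, keeps `Memory<byte>` in the async method and creates the span only as an argument to a synchronous helper, before any `await`. (C# 13 relaxes the rule so ref struct locals are allowed in async methods as long as they are not used across an `await`; this sketch does not rely on that.)

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncSpanPattern
{
    // The async method only holds Memory<byte>, which may live across awaits.
    public static async Task<int> ProcessAsync(Memory<byte> data)
    {
        // No Span<byte> local is declared: the span is created and consumed
        // within one expression, before the await.
        var nonZero = CountNonZero(data.Span.Slice(1));

        await Task.Delay(10);
        return nonZero;
    }

    // Synchronous helper: Span<T> parameters and locals are fine here.
    private static int CountNonZero(Span<byte> data)
    {
        var count = 0;
        foreach (var b in data)
            if (b != 0) count++;
        return count;
    }
}
```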
  28. Microservice which:
    1. Reads an SQS message
    2. Deserialises the JSON message
    3. Stores a copy of the message to S3 using an object key derived from properties of the message

    S3ObjectKeyGenerator
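The deck doesn't show the key format or the message properties, so this sketch assumes a hypothetical `{source}/{yyyy-MM-dd}/{id}.json` key built from three hypothetical fields. It illustrates the `string.Create` technique: compute the exact length up front, let the runtime allocate the string once, and write the characters straight into it. The parts are passed as the state argument so the lambda captures nothing.

```csharp
using System;
using System.Globalization;

// Hypothetical message shape; the real properties used for the key are not shown in the deck.
public readonly struct MessageKeyParts
{
    public MessageKeyParts(string source, DateTime date, string id)
    {
        Source = source;
        Date = date;
        Id = id;
    }

    public string Source { get; }
    public DateTime Date { get; }
    public string Id { get; }
}

public static class S3KeySketch
{
    private const string DateFormat = "yyyy-MM-dd";   // always 10 characters
    private const string Extension = ".json";

    // Builds "{source}/{yyyy-MM-dd}/{id}.json" with a single string allocation.
    public static string CreateKey(MessageKeyParts parts)
    {
        var length = parts.Source.Length + 1 + DateFormat.Length + 1 + parts.Id.Length + Extension.Length;

        return string.Create(length, parts, (span, p) =>
        {
            p.Source.AsSpan().CopyTo(span);
            var pos = p.Source.Length;
            span[pos++] = '/';

            // Formats the date directly into the string's buffer.
            p.Date.TryFormat(span.Slice(pos), out var written, DateFormat, CultureInfo.InvariantCulture);
            pos += written;
            span[pos++] = '/';

            p.Id.AsSpan().CopyTo(span.Slice(pos));
            pos += p.Id.Length;
            Extension.AsSpan().CopyTo(span.Slice(pos));
        });
    }
}
```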
  29. | Method       |     Mean | Ratio |  Gen 0 | Allocated | Ratio |
    |------------- |---------:|------:|-------:|----------:|------:|
    | Original     | 790.1 ns |       | 0.1154 |     728 B |       |
    | SpanBased    | 386.5 ns |  -51% | 0.0305 |     192 B |  -74% |
    | StringCreate | 310.9 ns |  -60% | 0.0305 |     192 B |  -74% |

    • >2x faster
    • ~3.8x less memory allocated
    • 18 million messages: a reduction of 9.65 GB of allocations daily
    • Removes approx. 1,528 Gen 0 collections
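BenchmarkDotNet reports the Gen 0 column as collections per 1,000 operations, so the daily figures above can be reproduced from the table, assuming one key-generation call per message and 18 million messages per day:

$$
(728\,\text{B} - 192\,\text{B}) \times 18 \times 10^{6} = 9.648 \times 10^{9}\,\text{B} \approx 9.65\,\text{GB}
$$

$$
\frac{0.1154 - 0.0305}{1000} \times 18 \times 10^{6} = 0.0849 \times 18\,000 \approx 1528 \text{ Gen 0 collections}
$$

The "~3.8x" figure is the allocation ratio $728 / 192 \approx 3.79$.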
  30. • Pool of arrays for reuse
    • Found in System.Buffers
    • ArrayPool<T>.Shared.Rent(int minimumLength)
    • You are likely to get an array larger than your minimum size
    • ArrayPool<T>.Shared.Return(T[] array, bool clearArray = false)
    • Warning! By default, returned arrays are not cleared
  31. Allocating a new buffer on every call:

    public class Processor
    {
        public void DoSomeWorkVeryOften()
        {
            var buffer = new byte[1000]; // allocates
            DoSomethingWithBuffer(buffer);
        }

        private void DoSomethingWithBuffer(byte[] buffer)
        {
            // use the array
        }
    }
  33. Renting the buffer from the shared ArrayPool:

    using System.Buffers;

    public class Processor
    {
        public void DoSomeWorkVeryOften()
        {
            var arrayPool = ArrayPool<byte>.Shared;
            var buffer = arrayPool.Rent(1000);
            DoSomethingWithBuffer(buffer);
        }

        private void DoSomethingWithBuffer(byte[] buffer)
        {
            // use the array
        }
    }
  34. Always returning the rented buffer, even if an exception is thrown:

    using System.Buffers;

    public class Processor
    {
        public void DoSomeWorkVeryOften()
        {
            var arrayPool = ArrayPool<byte>.Shared;
            var buffer = arrayPool.Rent(1000);
            try
            {
                DoSomethingWithBuffer(buffer);
            }
            finally
            {
                arrayPool.Return(buffer);
            }
        }

        private void DoSomethingWithBuffer(byte[] buffer)
        {
            // use the array
        }
    }
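Because `Rent` can return a longer array than requested, and returned arrays are not cleared by default, a pooled buffer is usually worked with through a span of just the requested length. A minimal sketch (the `PooledProcessor` class is hypothetical) that also opts into clearing on return:

```csharp
using System;
using System.Buffers;

public static class PooledProcessor
{
    // Rents at least `requiredLength` bytes, works on exactly that many, and always returns the array.
    public static int ProcessWithPooledBuffer(int requiredLength, out int rentedLength)
    {
        var pool = ArrayPool<byte>.Shared;
        byte[] buffer = pool.Rent(requiredLength);   // may be longer than requested
        rentedLength = buffer.Length;
        try
        {
            // Only touch the part we asked for; the rest may hold stale data
            // from a previous renter, because returned arrays are not cleared by default.
            Span<byte> usable = buffer.AsSpan(0, requiredLength);
            usable.Fill(1);

            var sum = 0;
            foreach (var b in usable)
                sum += b;
            return sum;
        }
        finally
        {
            // clearArray: true wipes the contents before the array goes back to the pool.
            pool.Return(buffer, clearArray: true);
        }
    }
}
```

Clearing costs time proportional to the array length, so it is worth enabling mainly when the buffer held sensitive data.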
  35. • Originally created by the ASP.NET team to improve Kestrel requests per second (RPS)
    • Improves performance in I/O scenarios (~2x vs. streams)
    • Removes common, hard-to-write boilerplate code
    • Unlike streams, pipelines manage buffers for you (from the ArrayPool)
    • Two ends to a pipe: a PipeWriter and a PipeReader
  36. [Diagram: PipeWriter (implements IBufferWriter<byte>) → Pipe → PipeReader]

    Writer side:
    Memory<byte> m = pw.GetMemory();
    …
    pw.Advance(1000);
    await pw.FlushAsync();

    Reader side:
    ReadResult r = await reader.ReadAsync();
    ReadOnlySequence<byte> b = r.Buffer;
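The `ReadOnlySequence<byte>` returned by `ReadAsync` may be split across several buffers. A sketch of consuming it with `SequenceReader<byte>` (from `System.Buffers`, available since .NET Core 3.0), assuming newline-delimited data and hypothetical `BufferSegment`/`LineParser` types. It reads every complete line and reports how far it got, which is the position a read loop would then pass to `PipeReader.AdvanceTo(consumed, buffer.End)`. A hand-built two-segment sequence stands in for a real pipe so the example runs on its own.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Minimal linked segment, used here only to build a multi-segment sequence for testing.
internal sealed class BufferSegment : ReadOnlySequenceSegment<byte>
{
    public BufferSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public BufferSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new BufferSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class LineParser
{
    // Reads every complete '\n'-terminated line; bytes after the last '\n' are left unconsumed.
    public static List<string> ReadLines(ReadOnlySequence<byte> buffer, out SequencePosition consumed)
    {
        var lines = new List<string>();
        var reader = new SequenceReader<byte>(buffer);

        // TryReadTo returns false (without advancing) when no further '\n' exists.
        while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n'))
        {
            // A line may span segments; only then is a copy into a single array needed.
            lines.Add(line.IsSingleSegment
                ? Encoding.UTF8.GetString(line.First.Span)
                : Encoding.UTF8.GetString(line.ToArray()));
        }

        consumed = reader.Position;
        return lines;
    }
}
```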
  37. Microservice which:
    1. Retrieves an S3 object (TSV file) from AWS
    2. Decompresses the file
    3. Parses the TSV to get 3 of the 25 columns for each row
    4. Indexes the data to Elasticsearch

    CloudFrontParser
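When only 3 of 25 columns are needed, splitting every row into 25 strings wastes most of the allocations. A hypothetical sketch (`TsvColumns` is not from the deck) that walks one row with `IndexOf('\t')` on a `ReadOnlySpan<char>` and allocates strings only for the requested columns, assuming the wanted column indexes are given in ascending order:

```csharp
using System;

public static class TsvColumns
{
    // Returns the requested columns (indexes must be ascending) from one tab-separated row.
    // Only the kept fields become strings; columns missing from the row stay null.
    public static string[] Extract(ReadOnlySpan<char> line, ReadOnlySpan<int> wantedColumns)
    {
        var result = new string[wantedColumns.Length];
        var column = 0;
        var found = 0;

        while (found < wantedColumns.Length)
        {
            var tab = line.IndexOf('\t');
            var field = tab == -1 ? line : line.Slice(0, tab);

            if (column == wantedColumns[found])
                result[found++] = field.ToString();   // the only allocation per kept column

            if (tab == -1)
                break;                                 // no more columns on this row

            line = line.Slice(tab + 1);                // skip past the tab, no copying
            column++;
        }

        return result;
    }
}
```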
  38. Processing 1 file of 10,000 rows:

    | Method    |     Mean | Ratio |    Gen 0 |   Gen 1 |   Gen 2 | Allocated | Ratio |
    |---------- |---------:|------:|---------:|--------:|--------:|----------:|------:|
    | Original  | 47.47 ms |     - | 14090.91 | 3272.73 | 1454.55 | 100.68 MB |     - |
    | Optimised | 10.86 ms |  -77% |   546.87 |  531.25 |   15.63 |   3.35 MB |  -97% |

    • ~30x less heap memory allocated
    • NOTE: ~2.85 MB are the string allocations for the parsed data. Overhead = 0.5 MB
  39. • Identify a quick win
    • Use a scientific approach to demonstrate gains
    • Express the gains as a monetary value
    • Consider the cost-to-benefit ratio
  40. This work is a small part of a much bigger potential gain.

    For a single microservice handling 18 million messages per day:
    • Reduction of at least 50% of allocations
    • At least 1 fewer VM needed, saving $1,700 per year
    • Potential to at least double per-instance throughput
  41. A single (micro)service could save $1,700. These gains can scale with additional (micro)services.

    $17,000?? $170,000???
  42. • Measure, don't assume!
    • Be scientific; make small changes each time and measure again
    • Focus on hot paths
    • Don't copy memory, slice it! Span<T> is less complex than it may first seem.
    • Use ArrayPools where appropriate to reduce array allocations
    • Consider Pipelines for I/O scenarios