.NET 9 Performance Improvements Every Developer Should Know
Every major .NET release brings performance gains, but .NET 9 feels different. This isn't just incremental optimization — it's a fundamental shift in how the runtime thinks about your code at execution time. I've spent the last several months migrating production services to .NET 9, and the results have been genuinely surprising. We saw a 22% reduction in P99 latency on our busiest API with zero code changes. Just a target framework swap.
What makes .NET 9 stand out is that Microsoft didn't just optimize hot paths in the BCL. They rethought how the JIT compiler profiles and recompiles code, expanded Native AOT to cover real-world scenarios that were previously blockers, and introduced new collection types that eliminate allocation overhead in patterns we all use daily. If you're running .NET 8 in production, upgrading should be near the top of your backlog.
In this post, I'll walk through the improvements that made the biggest difference in our systems — with benchmarks, code samples, and practical advice on where to focus your optimization efforts.
Dynamic PGO: The JIT Gets Smarter
Dynamic Profile-Guided Optimization (PGO) was introduced in .NET 7 and enabled by default starting in .NET 8; in .NET 9 it's significantly more capable. The JIT now collects richer profiling data during Tier 0 execution and uses it to make better decisions when recompiling at Tier 1.
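Since PGO is on by default, the cleanest way to quantify its contribution is an A/B run with it switched off. The `DOTNET_TieredPGO` environment variable (and the equivalent `TieredPGO` MSBuild property) controls this:

```
# Baseline run with Dynamic PGO disabled
DOTNET_TieredPGO=0 dotnet run -c Release

# Normal run with the .NET 9 default (PGO enabled)
dotnet run -c Release
```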
In my experience, the most visible impact is on interface dispatch and virtual method calls. The JIT can now devirtualize calls based on observed runtime types, which means your dependency injection patterns and strategy implementations get optimized automatically.
Here's what this looks like in practice. Consider a typical service layer:
public class OrderProcessor
{
    private readonly IOrderValidator _validator;
    private readonly IInventoryService _inventory;
    private readonly INotificationService _notifications;

    public OrderProcessor(
        IOrderValidator validator,
        IInventoryService inventory,
        INotificationService notifications)
    {
        _validator = validator;
        _inventory = inventory;
        _notifications = notifications;
    }

    public async Task<OrderResult> ProcessAsync(Order order)
    {
        // In .NET 9, Dynamic PGO observes that _validator is always
        // ConcreteOrderValidator at runtime and devirtualizes this call
        var validation = await _validator.ValidateAsync(order);
        if (!validation.IsValid)
            return OrderResult.Failed(validation.Errors);

        await _inventory.ReserveAsync(order.Items);
        await _notifications.SendConfirmationAsync(order);
        return OrderResult.Success(order.Id);
    }
}
The JIT notices that _validator is always a ConcreteOrderValidator in practice and inlines the call. You don't need to change your code — the runtime adapts to your actual usage patterns.
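Conceptually, the guarded devirtualization that PGO enables is equivalent to hand-writing a type test ahead of the interface call. This sketch (with simplified stand-in types, not the article's actual services) shows the shape the JIT produces in machine code:

```csharp
public interface IOrderValidator { bool Validate(int orderId); }

public sealed class ConcreteOrderValidator : IOrderValidator
{
    public bool Validate(int orderId) => orderId > 0;
}

public class GuardedCallSite
{
    private readonly IOrderValidator _validator;
    public GuardedCallSite(IOrderValidator validator) => _validator = validator;

    // What you write: a plain interface call, dispatched through the vtable.
    public bool ValidateDispatched(int id) => _validator.Validate(id);

    // Roughly what the JIT emits once PGO observes the dominant type:
    // a cheap type check, a direct (inlinable) call, and a dispatch fallback.
    public bool ValidateGuarded(int id)
    {
        if (_validator is ConcreteOrderValidator concrete)
            return concrete.Validate(id);   // fast path: direct call, can be inlined
        return _validator.Validate(id);     // rare path: normal interface dispatch
    }
}
```

You never write `ValidateGuarded` yourself; the point is that the transformation only pays off when the type test almost always succeeds, which is exactly what the Tier 0 profile data tells the JIT.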
To check whether the PGO switch has been overridden, query AppContext. Note that TryGetSwitch only reports explicitly configured values, so an unset switch still means PGO is on by default:

// In your startup or health check endpoint.
// TryGetSwitch returns false when the switch was never set explicitly;
// in that case the .NET 9 default (PGO enabled) applies.
bool isSet = AppContext.TryGetSwitch("System.Runtime.TieredPGO", out bool enabled);
Console.WriteLine(isSet
    ? $"Dynamic PGO explicitly set to: {enabled}"
    : "Dynamic PGO: not overridden (enabled by default)");
In our benchmarks, Dynamic PGO reduced method dispatch overhead by 15-30% on services with heavy interface usage. The gains are most pronounced in DI-heavy ASP.NET Core applications.
Native AOT: Finally Production-Ready for Web APIs
Native AOT in .NET 8 was promising but had too many limitations for most API workloads. .NET 9 closes the gap. The biggest changes are improved support for System.Text.Json source generators, better trimming analysis, and reduced binary sizes.
I've seen teams hesitate on Native AOT because of reflection-heavy libraries. In .NET 9, the trimmer is smarter about preserving types used by common patterns:
// Program.cs — Native AOT compatible Web API
using System.Text.Json.Serialization;

var builder = WebApplication.CreateSlimBuilder(args);

builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.TypeInfoResolverChain.Insert(0,
        AppJsonSerializerContext.Default);
});

var app = builder.Build();

app.MapGet("/api/products/{id}", async (int id, ProductService service) =>
{
    var product = await service.GetByIdAsync(id);
    return product is not null
        ? Results.Ok(product)
        : Results.NotFound();
});

app.Run();

// Source-generated JSON context — required for Native AOT
[JsonSerializable(typeof(Product))]
[JsonSerializable(typeof(List<Product>))]
[JsonSerializable(typeof(ProblemDetails))]
internal partial class AppJsonSerializerContext : JsonSerializerContext { }
Here's what our startup time benchmarks looked like:
| Metric | .NET 9 JIT | .NET 9 AOT | Improvement |
|-----------------------|-------------|-------------|-------------|
| Cold start | 287ms | 38ms | 86% faster |
| Memory (startup) | 48 MB | 14 MB | 71% less |
| Binary size | N/A | 18 MB | - |
| First request latency | 12ms | 3ms | 75% faster |
For containerized microservices where cold start matters — think Kubernetes pod scaling — Native AOT is now a no-brainer. The trade-off is longer build times and slightly larger deployment artifacts, but the runtime characteristics are transformative.
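If you want to try it, enabling Native AOT is a single project property plus a runtime-specific publish (the RID below is just an example):

```xml
<!-- In the API project's .csproj -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
</PropertyGroup>
```

Then publish with `dotnet publish -c Release -r linux-x64`. The AOT compilation happens during publish, which is where the longer build times come from.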
Frozen Collections: Zero-Overhead Lookups
FrozenDictionary<TKey, TValue> and FrozenSet<T> were introduced in .NET 8 but received major optimizations in .NET 9. These are immutable collections optimized for read performance after a one-time construction cost.
Here's where I've found them most valuable — configuration lookups and feature flag checks that happen on every request:
public class FeatureFlagService
{
    private readonly FrozenDictionary<string, bool> _flags;

    public FeatureFlagService(IConfiguration config)
    {
        // One-time cost at startup — builds an optimized lookup structure
        var flagSection = config.GetSection("FeatureFlags");
        _flags = flagSection.GetChildren()
            .ToFrozenDictionary(
                x => x.Key,
                x => bool.Parse(x.Value ?? "false"),
                StringComparer.OrdinalIgnoreCase);
    }

    // This lookup is faster than Dictionary — the internal structure
    // is optimized based on the actual keys at construction time
    public bool IsEnabled(string featureName)
        => _flags.TryGetValue(featureName, out var enabled) && enabled;
}
In .NET 9, the frozen collection implementation analyzes your keys at construction time and selects the optimal hashing strategy. For small collections (under 10 items), it may use a direct comparison. For string keys, it identifies the minimum substring needed for unique discrimination. The result is lookups that consistently outperform Dictionary<TKey, TValue> by 30-60% in our benchmarks.
SearchValues: Pattern Matching at Scale
SearchValues<T> lets you precompute optimized search structures for finding characters or bytes within spans. .NET 9 extends this with better vectorization and support for string searches:
public static class InputSanitizer
{
    // Precomputed at class load — uses SIMD instructions for scanning
    private static readonly SearchValues<char> DangerousChars =
        SearchValues.Create("<>&\"'/\\");

    private static readonly SearchValues<string> SqlKeywords =
        SearchValues.Create(
            ["SELECT", "INSERT", "UPDATE", "DELETE", "DROP", "UNION"],
            StringComparison.OrdinalIgnoreCase);

    public static bool ContainsDangerousChars(ReadOnlySpan<char> input)
        => input.ContainsAny(DangerousChars);

    public static bool ContainsSqlKeywords(ReadOnlySpan<char> input)
        => input.ContainsAny(SqlKeywords);

    public static string SanitizeInput(string input)
    {
        if (!ContainsDangerousChars(input))
            return input; // Fast path — no allocation

        return string.Create(input.Length, input, (span, original) =>
        {
            for (int i = 0; i < original.Length; i++)
            {
                span[i] = DangerousChars.Contains(original[i])
                    ? '_'
                    : original[i];
            }
        });
    }
}
On x64 hardware with AVX2 support, SearchValues scans input in 256-bit vector chunks, examining 16 UTF-16 characters per vector operation instead of one character at a time. For high-throughput text processing — log parsing, input validation, protocol handling — this is a game-changer.
New LINQ Methods That Reduce Allocations
.NET 9 adds several LINQ methods that address common patterns where developers previously had to choose between readability and performance:
// CountBy — eliminates GroupBy().Select() for counting patterns
var ordersByStatus = orders.CountBy(o => o.Status);
// Returns IEnumerable<KeyValuePair<OrderStatus, int>>
// No intermediate grouping allocations

// AggregateBy — general-purpose grouped aggregation without GroupBy
var revenueByRegion = orders.AggregateBy(
    o => o.Region,
    seed: 0m,
    (total, order) => total + order.Total);

// Index — finally, a clean way to get index + element
foreach (var (index, item) in products.Index())
{
    Console.WriteLine($"{index + 1}. {item.Name}");
}

// Chunk (available since .NET 6) still pairs well with these for batching
var batches = largeDataset.Chunk(1000);
foreach (var batch in batches)
{
    await ProcessBatchAsync(batch);
}
The CountBy and AggregateBy methods are particularly valuable in data processing pipelines. I've seen codebases where GroupBy was the top allocation source in memory profiles — switching to these targeted methods cut allocations by 40% in those hot paths.
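To see why, compare the two shapes side by side. This is a minimal sketch using plain strings in place of the order entities above; the `GroupBy` version materializes a grouping object per key before counting, while `CountBy` just bumps per-key counters:

```csharp
var orders = new[] { "Shipped", "Pending", "Shipped", "Shipped" };

// Before: each status gets an IGrouping<string, string> plus its backing
// storage, even though we only ever read the count.
var before = orders
    .GroupBy(s => s)
    .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()));

// After: one internal table of counters, no per-group collections.
var after = orders.CountBy(s => s);

// Both enumerate as: [Shipped, 3], [Pending, 1]
```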
Practical Benchmarking: Measure Before You Optimize
Before you refactor anything, establish baselines. Here's a minimal BenchmarkDotNet setup I use for every performance investigation:
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net90)]
public class CollectionLookupBenchmark
{
    private Dictionary<string, int> _dictionary = null!;
    private FrozenDictionary<string, int> _frozen = null!;
    private string[] _keys = null!;

    [GlobalSetup]
    public void Setup()
    {
        var data = Enumerable.Range(0, 100)
            .ToDictionary(i => $"key_{i}", i => i);
        _dictionary = data;
        _frozen = data.ToFrozenDictionary();
        _keys = data.Keys.ToArray();
    }

    [Benchmark(Baseline = true)]
    public int DictionaryLookup()
    {
        int sum = 0;
        foreach (var key in _keys)
            if (_dictionary.TryGetValue(key, out var val))
                sum += val;
        return sum;
    }

    [Benchmark]
    public int FrozenDictionaryLookup()
    {
        int sum = 0;
        foreach (var key in _keys)
            if (_frozen.TryGetValue(key, out var val))
                sum += val;
        return sum;
    }
}
Run this against both runtimes and you'll see concrete numbers for your specific hardware. Don't trust generic benchmark claims — including mine.
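One operational note on the multi-runtime setup above: BenchmarkDotNet needs the benchmark project to multi-target both TFMs, and you launch it from the newest one; it then spawns a child process per runtime job:

```
# .csproj: <TargetFrameworks>net8.0;net9.0</TargetFrameworks>
dotnet run -c Release --framework net9.0
```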
Key Takeaways
After several months of running .NET 9 in production across multiple services, here's my practical advice:
- Upgrade and measure first. Dynamic PGO alone gives most applications a measurable boost with zero code changes. Start there.
- Adopt FrozenDictionary for read-heavy lookups. If you have dictionaries that are populated at startup and read millions of times, switch them. The migration is trivial.
- Evaluate Native AOT for new microservices. Don't retrofit existing large applications, but for new containerized services, the startup and memory characteristics are compelling.
- Replace GroupBy with CountBy/AggregateBy in hot paths where you're only counting or summing. The allocation reduction is significant.
- Use SearchValues for any repeated string scanning. Input validation, log parsing, protocol handling — anywhere you're searching for known characters or patterns.
The performance culture in the .NET team is paying dividends. Each release makes the "right way" to write code also the "fast way," and .NET 9 is the strongest example yet.
Ajit Gangurde
Software Engineer II at Microsoft | 15+ years in .NET & Azure