Showing posts with label API. Show all posts
Showing posts with label API. Show all posts

Saturday, 9 August 2025

Testing API resilience with Polly Chaos engine

Polly is a transient failure handling and resilience library that makes it convenient to build robust APIs based on policies for handling errors that occur and offer different resilience strategies for handling these errors. The errors are not only of errors occurings either externall or internally in API, but offer also alternative strategies such as fallbacks, rate-limiting, circuit breakers and other overall act upon either reactively or proactively. A great overview of Polly can be seen in this video, although some years old now - back to 2019 - of getting an overview of Polly : NDC Oslo 2019 - System Stable: Robust connected applications with Polly, the .NET Resilience Framework - Bryan Hogan With Polly, it is possible to test out API resilience with the built in Polly Chaos engine. The Chaos engine was previously offered via the Simmy library.

Simmy - Logo



The source code in this article is available in my Github repo here: Note - the code shown in the methods below are called from Program.cs to be able to be used in the API. The sample app is an Asp.net application written with C# and with .NET 8 Target Framework. https://github.com/toreaurstadboss/HttpClientUsingPolly/

Testing out API resilience with fallbacking API endpoints

First off, the fallback strategy resilience. Polly offers a way to define fallback policies. Let's look at a way to define an HTTP client that will provide a fallback if the statuscode from the endpoint is InternalServerError = 501. The fallback is just a Json payload in this simple example.

PollyExtensions.cs



    public static void AddPollyHttpClientWithFallback(this IServiceCollection services)
    {
        services.AddHttpClient(Constants.HttpClientNames.FallbackHttpClientName, client =>
        {
            client.DefaultRequestHeaders.UserAgent.Add(new ProductInfoHeaderValue("MyApp", "1.0"));
        })
        .AddResilienceHandler(
             $"{FallbackHttpClientName}{ResilienceHandlerSuffix}",
            (builder, context) =>
        {
            var serviceProvider = services.BuildServiceProvider();
            var logger = serviceProvider.GetRequiredService<ILoggerFactory>().CreateLogger(Constants.LoggerName);

            builder.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
            {
                ShouldHandle = args =>
                {
                    // Fallback skal trigges ved status 500 eller exception
                    return ValueTask.FromResult(
                        args.Outcome.Result?.StatusCode == HttpStatusCode.InternalServerError ||
                        args.Outcome.Exception is HttpRequestException
                    );
                },
                FallbackAction = args =>
                {
                    logger.LogWarning("Fallback triggered. Returning default response.");

                    var jsonObject = new
                    {
                        message = "Fallback response",
                        source = "Polly fallback",
                        timestamp = DateTime.UtcNow
                    };

                    var json = JsonSerializer.Serialize(jsonObject);

                    var fallbackResponse = new HttpResponseMessage(HttpStatusCode.OK)
                    {
                        Content = new StringContent(json, System.Text.Encoding.UTF8, "application/json")
                    };

                    return ValueTask.FromResult(Outcome.FromResult(fallbackResponse));
                }
            });

            // Inject exceptions in 80% of requests
            builder.AddChaosOutcome(new ChaosOutcomeStrategyOptions<HttpResponseMessage>()
            {
                Enabled = true,
                OutcomeGenerator = static args =>
                {
                    var response = new HttpResponseMessage(HttpStatusCode.InternalServerError);
                    return ValueTask.FromResult<Outcome<HttpResponseMessage>?>(Outcome.FromResult(response));
                },
                InjectionRate = 0.8,
                OnOutcomeInjected = args =>
                {
                    logger.LogWarning("Outcome returning internal server error");
                    return default;
                }
            });

        });
    }


Next up, let's look at the client endpoint defined with Minimal API in Aspnet core.

SampleEndpoints.cs



 app.MapGet("/test-v5-fallback", async (
 [FromServices] IHttpClientFactory httpClientFactory) =>
 {
     using var client = httpClientFactory.CreateClient(Constants.HttpClientNames.FallbackHttpClientName);

     HttpResponseMessage? response = await client.GetAsync("https://example.com");

     if (!response.IsSuccessStatusCode)
     {
         var errorContent = await response.Content.ReadAsStringAsync();
         return Results.Problem(
             detail: $"Request failed with status code {(int)response.StatusCode}: {response.ReasonPhrase}",
             statusCode: (int)response.StatusCode,
             title: "External API Error"
         );
     }

     var json = await response!.Content.ReadAsStringAsync();
     return Results.Json(json);

 });



Note the usage of [FromServices] attribute and IHttpClientFactory. The code creates the named Http client defined earlier. The fallback will return a json with fallback content in 80% of the requests in this concrete example.


Testing out API resilience with circuit breaking API endpoints


Next, the circuit breaker strategy for API resilience. Polly offers a way to define circuit breaker policies. Let's look at a way to define an HTTP client that will provide a circuit breaker if it fails for 3 consecutive requests within 30 seconds, resulting in a 10 second break. The circuit breaker strategy will stop requests that opens the circuit defined here. After the break, the circuit breaker half opens. It will accept new request, but fail immediately and open up the circuit again if it fails again, further postponing.

PollyExtensions.cs



  public static void AddPollyHttpClientWithExceptionChaosAndBreaker(this IServiceCollection services)
  {

      services.AddHttpClient(CircuitBreakerHttpClientName, client =>
      {
          client.DefaultRequestHeaders.UserAgent.Add(new ProductInfoHeaderValue("MyApp", "1.0"));
      })
      .AddResilienceHandler(
          $"{CircuitBreakerHttpClientName}{ResilienceHandlerSuffix}",
          (builder, context) =>
      {
          var serviceProvider = services.BuildServiceProvider();
          var logger = serviceProvider.GetRequiredService<ILoggerFactory>().CreateLogger(Constants.LoggerName);

          //Add circuit breaker that opens after three consecutive failures and breaks for a duration of ten seconds
          builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
          {
              MinimumThroughput = 3, //number of CONSECUTIVE requests failing for circuit to open (short-circuiting future requests until given BreakDuration is passed)
              FailureRatio = 1.0, //usually 1.0 is used here..
              SamplingDuration = TimeSpan.FromSeconds(30), //time window duration to look for CONSECUTIVE requests failing
              BreakDuration = TimeSpan.FromSeconds(10), //break duration. requests will be hindered at during this duration set 
              ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                  .HandleResult(r => !r.IsSuccessStatusCode)
                  .Handle<HttpRequestException>(), //defining when circuit breaker will occur given other conditions also apply
              OnOpened = args =>
              {
                  logger.LogInformation("Circuit breaker opened");
                  return default;
              },
              OnClosed = args =>
              {
                  logger.LogInformation("Circuit breaker closed");
                  return default;
              },
              OnHalfOpened = args =>
              {
                  logger.LogInformation("Circuit breaker half opened"); //half opened state happens after the circuit has been opened and break duration has passed, entering 'half-open' state (usually ONE test call must succeeed to transition from half open to open state)
                  return default;
              }
          });


          // Inject exceptions in 80% of requests
          builder.AddChaosOutcome(new ChaosOutcomeStrategyOptions<HttpResponseMessage>()
          {
              Enabled = true,
              OutcomeGenerator = static args =>
              {
                  var response = new HttpResponseMessage(HttpStatusCode.InternalServerError);
                  return ValueTask.FromResult<Outcome<HttpResponseMessage>?>(Outcome.FromResult(response));
              },
              InjectionRate = 0.8,
              OnOutcomeInjected = args =>
              {
                  logger.LogWarning("Outcome returning internal server error");
                  return default;
              }
          });


      });
  }



Let's look at the client endpoint defined with Minimal API in Aspnet core for circuit-breaker example.

SampleEndpoints.cs



  app.MapGet("/test-v4-circuitbreaker-opening", async (
  [FromServices] IHttpClientFactory httpClientFactory) =>
  {
      using var client = httpClientFactory.CreateClient(Constants.HttpClientNames.CircuitBreakerHttpClientName);

      HttpResponseMessage? response = await client.GetAsync("https://example.com");

      if (!response.IsSuccessStatusCode)
      {
          var errorContent = await response.Content.ReadAsStringAsync();
          return Results.Problem(
              detail: $"Request failed with status code {(int)response.StatusCode}: {response.ReasonPhrase}",
              statusCode: (int)response.StatusCode,
              title: "External API Error"
          );
      }

      var json = await response!.Content.ReadAsStringAsync();
      return Results.Json(json);

  });



Testing out API resilience for latency induced timeout API endpoints

Next, the timeout strategy for API resilience. Polly offers a way to define timeout policies and can combine these for testing by injecting latency (additional execution time). Let's look at a way to define an HTTP client that will provide a timeout if it times out already after one second with a 50% chance of getting a 3 second latency, which will trigger the timeout.

PollyExtensions.cs



   public static void AddPollyHttpClientWithIntendedRetriesAndLatencyAndTimeout(this IServiceCollection services)
  {
      services.AddHttpClient(RetryingTimeoutLatencyHttpClientName, client =>
      {
          client.DefaultRequestHeaders.UserAgent.Add(new ProductInfoHeaderValue("MyApp", "1.0"));
      })
      .AddResilienceHandler(
          $"{RetryingTimeoutLatencyHttpClientName}{ResilienceHandlerSuffix}",
      (builder, context) =>
       {
           var serviceProvider = services.BuildServiceProvider();
           var logger = serviceProvider.GetRequiredService<ILoggerFactory>().CreateLogger(Constants.LoggerName);

           // Timeout strategy : fail if request takes longer than 1s
           builder.AddTimeout(new HttpTimeoutStrategyOptions
           {
               Timeout = TimeSpan.FromSeconds(1),
               OnTimeout = args =>
               {
                   logger.LogWarning($"Timeout after {args.Timeout.TotalSeconds} seconds");
                   return default;
               }
           });

           // Chaos latency: inject 3s delay in 30% of cases
           builder.AddChaosLatency(new ChaosLatencyStrategyOptions
           {
               InjectionRate = 0.5,
               Latency = TimeSpan.FromSeconds(3),
               Enabled = true,
               OnLatencyInjected = args =>
               {
                   logger.LogInformation("... Injecting a latency of 3 seconds ...");
                   return default;
               }
           });

           // Chaos strategy: inject 500 Internal Server Error in 75% of cases
           builder.AddChaosOutcome<HttpResponseMessage>(
               new ChaosOutcomeStrategyOptions<HttpResponseMessage>
               {
                   InjectionRate = 0.5,
                   Enabled = true,
                   OutcomeGenerator = static args =>
                   {
                       var response = new HttpResponseMessage(HttpStatusCode.InternalServerError);
                       return ValueTask.FromResult<Outcome<HttpResponseMessage>?>(Outcome.FromResult(response));
                   },
                   OnOutcomeInjected = args =>
                   {
                       logger.LogWarning("Outcome returning internal server error");
                       return default;
                   }
               });
       });

  }


Let's look at the client endpoint defined with Minimal API in Aspnet core for timeout with latency example.

SampleEndpoints.cs



 app.MapGet("/test-v3-latency-timeout", async (
 [FromServices] IHttpClientFactory httpClientFactory) =>
 {
     using var client = httpClientFactory.CreateClient(Constants.HttpClientNames.RetryingTimeoutLatencyHttpClientName);

     var response = await client.GetAsync("https://example.com");

     if (!response.IsSuccessStatusCode)
     {
         var errorContent = await response.Content.ReadAsStringAsync();
         return Results.Problem(
             detail: $"Request failed with status code {(int)response.StatusCode}: {response.ReasonPhrase}",
             statusCode: (int)response.StatusCode,
             title: "External API Error"
         );
     }

     var json = await response.Content.ReadAsStringAsync();
     return Results.Json(json);

 });



The following screenshot shows the timeout occuring after the defined setup of induced latency by given probability and defined timeout.

Testing out API resilience with retries

Retries offers an API endpoint to gain more robustness, by allowing multiple retries and define a strategy for these retries.The example http client here also adds a chaos outcome internal server error = 501 that is thrown with 75% probability (failure rate).

PollyExtensions.cs



    public static void AddPollyHttpClientWithIntendedRetries(this IServiceCollection services)
   {
       services.AddHttpClient(RetryingHttpClientName, client =>
           {
               client.DefaultRequestHeaders.UserAgent.Add(new ProductInfoHeaderValue("MyApp", "1.0"));
           })
           .AddResilienceHandler("polly-chaos", (builder, context) =>
           {
               var serviceProvider = services.BuildServiceProvider();
               var logger = serviceProvider.GetRequiredService<ILoggerFactory>().CreateLogger(Constants.LoggerName);

               //Retry strategy
               builder.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
               {
                   ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                       .HandleResult(r => !r.IsSuccessStatusCode)
                       .Handle<HttpRequestException>(),
                   MaxRetryAttempts = 3,
                   DelayGenerator = RetryDelaysPipeline,
                   OnRetry = args =>
                   {
                       logger.LogWarning($"Retrying {args.AttemptNumber} for requesturi {args.Context.GetRequestMessage()?.RequestUri}");
                       return default;
                   }
               });

               // Chaos strategy: inject 500 Internal Server Error in 75% of cases
               builder.AddChaosOutcome<HttpResponseMessage>(
                   new ChaosOutcomeStrategyOptions<HttpResponseMessage>
                   {
                       InjectionRate = 0.75,
                       Enabled = true,
                       OutcomeGenerator = static args =>
                       {
                           var response = new HttpResponseMessage(HttpStatusCode.InternalServerError);
                           return ValueTask.FromResult<Outcome<HttpResponseMessage>?>(Outcome.FromResult(response));
                       }
                   });

           });

   }


Let's look at the client endpoint defined with Minimal API in Aspnet core for the retrying example.

SampleEndpoints.cs


   app.MapGet("/test-retry-v2", async (
       [FromServices] IHttpClientFactory httpClientFactory) =>
   {
       using var client = httpClientFactory.CreateClient(Constants.HttpClientNames.RetryingHttpClientName);

       var response = await client.GetAsync("https://example.com");

       return Results.Json(response);
   });


There are multiple resilience scenarios that Polly offers, the table below lists them up (this article has presented most of them):

Summary

The following summary explains what this article has presented.

🧪 Testing API Resilience with Polly Chaos Engineering In this article, we have explored how to build resilient APIs using Polly v9 and its integrated chaos engine. Through practical examples, the article has demonstrated how to simulate real-world failures—like latency, timeouts, and internal server errors—and apply resilience strategies such as fallbacks, retries, and circuit breakers. By injecting controlled chaos, developers can proactively test and strengthen their systems against instability and external dependencies. Polly v9 library offers additionally scenarios also for making robust APIs. Info: Asp net core is used in this article, together with C# and .NET 8 as Target Framework.