Slf4j logging context intermitently dissapears in http4s api

So I’m building an api using http4s, log4cats and logback-classic as the logging backend. I built some logging middleware (correlationIdMiddleware) for my api which inserts a CorrelationID into the MDC logging context while the requests are being processed.

However I noticed that sometimes the CorrelationID is missing from the MDC context in one my endpoint handlers namely pseudonymizeColumnHandler and I’m not sure why this happens, why it only happens intermittently and why only in that specific handler.

I can’t observe this happening at all for my helloHandler. I thought it might have something to do with MDC context not being preserved across fibers, but afaict from the implementation the context is preserved. Here is my code:

def correlationIdMiddleware(routes: HttpRoutes[IO]): HttpRoutes[IO] =
  Kleisli { (req: Request[IO]) =>
    OptionT(
      for
        logger <- Slf4jLogger.create[IO]
        correlationId <-
          req.headers
            .get(ci"X-Correlation-ID")
            .map(_.head)
            .fold(FULID[IO].generate)(x => IO.pure(x.value))
        _ <- IO(MDC.put("CorrelationID", correlationId))
        response <- routes(req).value
        _ <- IO(MDC.remove("CorrelationID"))
      yield response
    )
  }

// CorrelationID always present in MDC here
private val helloHandler: IO[Response[IO]] =
  for
    logger <- Slf4jLogger.create[IO]
    _ <- logger.info("I'm inside a request handler")
    res <- Ok("...and hello to you.")
  yield res

// CorrelationID only sometimes present in MDC here
def pseudonymizeColumnHandler(request: Request[IO]): IO[Response[IO]] =
  for
    logger <- Slf4jLogger.create[IO]
    eitherRes <- request.attemptAs[PseudonymizeRequest].value
    // For debugging, sometimes I get None here
    correlationId <- IO(MDC.get("CorrelationID"))
    _ <- logger.info(s"The correlationId is $correlationId")
    res <- eitherRes match
      case Left(err) =>
        val errMsg = s"Failed to decode JSON body. Cause ${err.cause}"
        logger.error(errMsg) *> BadRequest(errMsg)
      case Right(pseudReq) => logger.info(pseudReq.toString) *> Ok(
          PseudonymizeResponse("Received :)").asJson
        )
  yield res


val routes: HttpRoutes[IO] = HttpRoutes.of {
  case GET -> Root / "hello" => helloHandler
  case req @ POST -> Root / "pseudonymize" / "column" =>
    pseudonymizeColumnHandler(req)
}

val httpApp = correlationIdMiddleware(routes).orNotFound

My goal is to be able to annotate logging messaged with the CorrelationID when it’s present through logback:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration>
<configuration>
    <import class="ch.qos.logback.classic.encoder.PatternLayoutEncoder"/>
    <import class="ch.qos.logback.core.ConsoleAppender"/>
    <appender name="STDOUT" class="ConsoleAppender">
        <encoder class="PatternLayoutEncoder">
            <pattern>%cyan(%d{HH:mm:ss.SSS}) %X{CorrelationID} %magenta([%thread]) %highlight(%-5level) %magenta(%logger{36}) - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="info">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>

I’d be quite surprised if the MDC were reliable at all in this setup. The slf4j MDC is only an adapter for the underlying logging system’s MDC (if present at all). Most concrete MDC mechanisms will be implemented using ThreadLocal or similar, which won’t play well with IO fibers. I just experimented a little, e.g.

IO { MDC.put("foo", "bar") } >>
  List.fill(10)(IO { Option(MDC.get("foo")) }.flatMap(IO.println))
    .parSequence_

…and at least with logback as the underlying log system, seeing the entry at all seems to be a stroke of fortune.

The code you link to is about the log4cats logger pushing and retracting its context into/from the underlying system’s MDC around log actions at its own level - nothing to be inferred about lower level MDC behavior there.

You could switch to running in Kleisli[IO, CorrelationID, *] instead of IO and feed the context data into the (log4cats) logger context for each logging action. However, this will probably incur quite some refactorings and preparations - I have found these changes worthwhile, YMMV. See this post regarding a related logging issue to see what application code might look like with this approach.

Thanks for the reply. I’ve got it working now using a StructuredLogger which copies its context to MDC allowing my logback pattern to pick up the CorrelationID. The logger is then passed implicitly to every route handler. Refactored code:

def correlationIdLoggerMiddleware(
    service: StructuredLogger[IO] => HttpRoutes[IO]
): HttpRoutes[IO] =
  Kleisli { req =>
    OptionT(
      for
        correlationId <-
          req.headers
            .get(ci"X-Correlation-ID")
            .map(_.head)
            .fold(FULID[IO].generate)(x => IO.pure(x.value))
        logger <- Slf4jLogger.create[IO].map(
          _.addContext("CorrelationID" -> Shown(correlationId))
        )
        response <- service(logger)(req).value
      yield response
    )
  }

def service(logger: StructuredLogger[IO]): HttpRoutes[IO] =
  given implicitLogger: StructuredLogger[IO] = logger
  HttpRoutes.of {
    case GET -> Root / "hello" => helloHandler
    case req @ POST -> Root / "pseudonymize" / "column" =>
      pseudonymizeColumnHandler(req)
  }

val httpApp = correlationIdLoggerMiddleware(service).orNotFound