Using Structured Logging for Production Insight


>>Hi, welcome to the DevOps
Lab. I am Damian Brady.>>I’m Nick Blumhardt.>>Today, we are
going to be talking about Structured Logging,>>And how you should be prepared for diagnostics in production.>>Perfect. See you then.>>Hi, everyone. This is
Damian coming to you for the DevOps Lab from sunny
Brisbane back in Australia. I’m joined today by
Nick Blumhardt, and we are going to
be talking about Structured Logging and
logging in production, which is a bit of
a special interest of yours. You own and run a company that does a lot
of this production logging, Structured Logging and
tooling for developers. Yeah, what we’re
talking about before was that a lot of the time, when people talk about DevOps, they kind of stop when
something hits production. You know, you deploy
to production and then we’re live and we have
a party and so on, but that’s not where
the story stops.>>This is I guess, for me, for the first probably
10 years that I was doing.NET Development, just getting to production was the big challenge that
everybody had to face, and I really watched all that time how we matured
through unit testing, and then we moved along to continuous integration
and automating builds, and we’ve actually gotten really quite good at getting
code into production now, where in the past all the big failures that you
would hear about would be these projects that consume millions and millions of dollars and never shift a line of code, and I’ve actually
gotten quite good at not getting the code
out the door.>>Yeah.>>I think that the next stage
along how growing up in the ecosystem
has been getting a lot better at doing
production monitoring and diagnostics
because obviously, the failures that you
hear mostly about now are downtime and major bugs or all the
domain of code that has to be kept running
predictively, I think.>>And when something goes wrong because things do
get wrong, right? Nobody writes perfect software.>>We just wait until
somebody got on the phone to tell us when things are
going wrong, I suppose.>>That’s not a good situation.>>No, but the tools
have really exploded in the last few years
for monitoring apps. The Structured Logging kind
of idea has come out of just taking the kind of printf()
debugging that we used to do, Console.WriteLine and just trying to scale that up to
something that was actually real engineering so that you could plan for how you
were going to monitor systems in production
and how you were going to react to different
kinds of errors and eventually find ways to trace errors
back to the causes, which is really the hardest thing to do in a big busy system. You’ve got things going
on all over the place.>>So, Structured Logging, just to boil it down
to its essence is, well, explain it to me, what exactly is
Structured Logging?>>Yes, well, Structured Logging, we probably should flip up.
I’ve prepared some code.>>Yeah, sure.>>Right now, we’re
looking at just a very basic Console App,
.NET Core Console App, and you see the line
that we’ve got highlighted there in
the middle is the kind of logging code that we’ve been doing in
.NET for a long time. It’s just broadening out
some information about the environment that
the code is running in. And that kind of logging which will be
done using Web Form.NET, is focused on producing this kind of narrative
of events that read to tell a story
of really what the code is doing at each step of some kind
of computation, and the problem with
that is that, of course, once you’re running a lot of servers in parallel and a lot of presses
is on that servers, that narrative is
not linear anymore.>>Right.>>And, of course, the events that are kind of describing what’s
happening in the system, they’re all interleaved and so I’m taking that and getting some kind of meaningful insight into what’s running and what’s happening in a real running
system is very difficult. So, Structured Logging, as we know in .NET right now especially because ASP.NET Core has popularized a lot of
this as well as Serilog, which is a library that I started building several years ago. It has really just
taken that same printf() kind of debugging where you see a .NET format
string like “Hello”, and this will be my username, and we’ve replaced that with the notion of templated messages where instead of actually putting in numeric
placeholder in there, we attached a property
to that event, and then when we run these applications that are instrumental with
Structured Logs, we still get that kind of the debugging experience of reading a log of what’s
going on and that, but Serilog and
ASP.NET Core as well, both will actually
preserve the values of those pieces of data that got pulled into the event
so that later on, when you want to actually
go and record that pops to diagnostics service
or some other kind of structured data store, that log event can be formatted out and the same events that you can read during
development time, it can be searched as structured data in a busy system so we can filter for things like name and Bloomheart and find those events that were raised just from my machine
for instance.>>So, rather than having to
do gigantic searches through reams of text to find
things that correspond>>Rather than have to
open up notepad and then, realize that what
you’re searching for is too complex to
actually find in notepad and then pulling out the book on
regular expressions. Regular expression,
fill this online.>>Yeah, exactly.>>Yeah, so we’ve kind
of simplified from there and instead
of throwing away all of that information
that you had about your log event once you
formatted it into a text, we now preserve
all that information that you have so
that then later on, when you’re going
to do diagnostics, you’re not really thinking about narrative messages anymore. You’re thinking about a stream of events that have
properties and those are the properties that you
are going to use as an engineer to get
back from some kind of failure to trace all of the related events that happened earlier
on in the system.>>So one of the ways that people kind of tie
these events together especially in micro-macro
services scenario, where a message gets sent
into all over the place. There’s a whole lot of
different sets of logs. They might have a correlation
ID or some kind of, this is now identified
it exactly happened.>>Yeah, well, the
correlation IDs are interesting because Log data is really
highly dimensional, too technical term
but it’s interesting. Correlation is
definitely one thing that you get out of Structured
Logs and that’s really important where you
can trace all of the events that happened
downstream from some activity. The tough thing about diagnostics though and why including the actual event data as well as properties is really
important is that when you are trying to find
the root cause of a bug, there are just so
many dimensions and axes that you need to slice
through that data on. It might be that, “I’m seeing problems with
transactions but only where the number of items in an order was
greater than 15”. It’s this individual event, it’s the data on these individual events
that you usually find to actually.>>I see, yeah.>>So, then, I think
that one place where, what you think of this
correlation IDs, and say WCF, if you’ve done that programming with that built-in correlation, is that we do in ASP.NET apps. Now, since the ASP.NET
Core revision, we get correlation IDs
that have built into the request pipelines
through middleware, and if we have an ASP.NET app, this one is set up
using Serilog as well. When we run this one, we find actually there are a couple of very
interesting things about ASP.NET Core when
it comes to log output. The first is that out of the box, there is just a huge amount of log data that surfaced
by the framework that was completely missing in all of the earlier
versions of ASP.NET. So, here, we started up
at ASP.NET Core App, and these are all events that were built into
ASP.NET Core itself, not added to my Demo Apps, so the template out of the box. The kind of information
that you can get out of the apps now will tell you what request was coming in on for what request URL
or the shipping method, but also things like which controller was chosen
to handleRout, and what the arguments were, and then right through
to the very end of the pipeline when we
are going to use, and things rendered out. So, all that information is now exposed about the framework
in a structured form, which means that if you start to build your own logging
into these apps, you can start to correlate
your application logging with all of the mechanics of the framework and say, when, if you have a look
in a bit more detail at the logs from this one, you’ll see when we have an error
in our application code, we can actually follow that back through the request handling that came out of the ASP.NET
framework itself.>>So, that logging or logs
we were seeing just then, that’s out of the box, you didn’t add any other-?>>With Serilog plugged in, only because the Console
formatting out of the box is a bit more verbose so it takes a bit more
screen real estate. So, ASP.NET Core comes
with a logging framework, built-in code, Microsoft.Extensions.Login, which is the.NET Core platform. They have a bunch of interfaces that can handle all kinds
of different logging, styles like Structured
Logging, text logging. They have provisions
for event IDs and all other switches, and then a series of providers, and there’s a default Console
provider that you get. There’s an application
insights provider, a debug trace provider, and all of those have
got different ways of rendering out the data. So, the provider I’m
using is based on Serilog because Serilog is actually
a complete logging pipeline, so it’s not just an interface of a bunch of providers
but it offers a lot of facilities like
enriching events with correlation IDs and dispatching to a hundred different targets.>>Right.>>So, this app with
just one controller. You can see the template doesn’t really have a lot
of modification.>>Yeah, I’m pretty familiar with the Hello world template.>>Yeah.>>I get it a lot>>Yeah. So, we just
take it. We just take a parameter from the URL, and we print that
out as a greeting. If we’re greeting Damien, you see, this is
the typical developer accepting page that you’re
getting at ASP.NET.>>Yeah.>>The nice thing is that with just plugging in that provider, we get all of those exceptions that are caught by
the framework now. If you remember in
the earlier versions of ASP, you would have to set up
global exception handlers and things wire letting yourself. When errors get thrown at ASP. NET, you see, we look at Seq, which is the product
our company builds. It’s just an HTP-based
log server and we collect that info from
the ASP.NET app as well. You see that we get
unhandled exception occurred. In this case, we get
the exception which it is. I don’t know that name, sorry. But you also see
that this correlation data that’s baked in from the
framework is exposed as well, so that now we had an error
happened in the app. We have a request ID. We can search based
on the request ID in whatever logstore
you’re using. We get not just our
applications events but we get all of
the framework data as well.>>So, the of data
for the most part that we’re receiving here comes straight from the
ASP.NET framework. It’s just being collected by the>>Yeah, exactly. But as your application grows obviously because this is
the default template, you find that you’re really engineering
the diagnosability and the transparency into
the app as you go. So, you see that
the controller that’s failing here was Greeting Damien. This is one of
our events that we’ve popped in to HomeController. You see, I’ve preempted that
you were coming along here. But see there is
that same template of the log message with
the name plugged into it. Then, when we’ve collected
that in JSON format, we’ve got the name there. So, now, we can start
to test a theory to all of the requests where
we create Damien fail. This is an early on with
the request ID but should we actually should we run
that same request again, we start to test that theory. So, we find another one
where the name is Damien and then we can give
it back to the request ID, and we are navigating between
those dimensions of data, the name of the person that
we are dealing with in this case and how
that request finished.>>So, Serilog is kind of a side Seq, which is
what we’re looking at, is a good way of navigating through
all that data as well. But ultimately, it’s logging, this logging is so structured->>Yes.>>-to make this stuff easier.>>There is two ways, I guess, to handle log data and they
used to be antagonistic. You would either collect text and then deal
with it in the wash, and try to figure
out how to extract meaning from the text
or you would collect JSON and then have to try to figure
out how to read that without your eyes
bleeding from the angle brackets and things. Where we’ve really
landed with Serilog with the Structure
Logging in.NET, you can really choose
the format based on the target. So, in this case, we can view text logs
in the Console and we can send JSON logs
to any where we like. So, we might send those to
SQL to some other log seller or we could potentially
format them off it as Axonal and drop them into a table and SQL Server and
use Axonal query.>>Right, Cosmos DB,
any kind of JSON.Core.>>Yeah, exactly
because most DB is initiating.
Interesting one there.>>I could throw that in.>>Definitely. Yes, I think that the important
thing actually, you do need to
collect logs during development for them to
be useful in production. So, when we used to
tested in production, before the days are
really having tests, we know how that went. Testing didn’t ever really become an engineering discipline until we started to use
unit tests to drive development, and a diagnostic is the same. It doesn’t really matter
where you are going to store your logs in in production. There are lots of options
and often it’s just a switch to pop them to
a different destination, but you have to make
sure that you are using your logs
during development, at least as a soundly check
to make sure that the code executed as
you were expecting. When the code doesn’t
execute as it’s expecting and yet the log
shows no trace of that, then you’ve obviously gone off. You need to improve the logs, or when something
did go wrong and you can’t use those logs instead of a debugger to go back and figure out what
led to that problem, and that iterative
process of making those logs better is the only way that when you
get to production, you’re not going to be
faced with just a wall of useless information.>>Right. So, what’s
important here, I guess, is the fact that we are able to diagnose what happens
in production. As a developer, I need the ownership of that
to make sure that when something does go wrong with my code that
it’s able to be diagnosed.>>Exactly. It wasn’t
until we started to take responsibility
for testing as developers and test code as we built it and then
feedback the quality of the code to improve those tests before we actually start to have some success
with unit testing with better quality software.>>Yeah.>>It’s the same situation
with logs unless we are the ones who are diagnosing
problems from our logs, where we’re doing
that in development, trying to find the cause of
something the way that we’re building networking
or in production, feeding that experience of trying to solve a problem
back into producing better diagnostics in
the AFAM beta logs is really the only way that we’re going to then be confident
we’ve got what we need when something’s actually
going wrong in production.>>Yeah, definitely. Can’t just throw the problem to Ops anymore.>>Throw it out of the wall.>>Throw it out of the world. That’s not acceptable anymore. So, that’s awesome. Thanks for spending the time talking through your Structured Logging. I guess, the main message here is being able to diagnose stuff in production is really important, and Structured Logging
is a great way of being able to sort
through those logs.>>Crucial part of that.>>Crucial part of that. Awesome.>>Thanks so much for
your time. We will see you everybody next
time on the DevOps lab.

Author:

Leave a Reply

Your email address will not be published. Required fields are marked *