Cross App Queries in Azure Log Analytics

I’ll keep it short and simple this time. Here’s a great way to debug your app across multiple App Insights instances.

So, I have two Azure Functions services running, with one serving as an API, and the other serving as BE processing engine. Both report telemetry to App Insights (different apps), and I am passing a context along from one to the other – so I can correlate exceptions and bugs.

Wouldn’t it be great to be able to see what happened in a single session across the 2 apps?

It’s possible – using ‘app‘ – just plugin the name of the app insights resource you want to query, and a simple ‘union‘.

Here you go:

let session="reReYiRu";
union app('FE-prod').traces, app('BE-prod').traces
| where session_Id == session 
| project timestamp, session_Id, appName, message
| order by timestamp asc 

 

Don’t forget –

  1. You can use the field ‘appName‘ to see which app this particular trace is coming from.
  2. Different machines have different times.. Don’t count on the timestamp ordering to always be correct.

A Simple Way to Extract Data From Traces – ‘Parse’

There is a nifty little operator in Azure Log Analytics that has really simplified how I  work with regular expressions – It’s called “parse” and I’ll explain it through a little example.

Let’s say you have a service that emits traces like:

traces
| where message contains "Error"
| project message

11:07 Error-failed to connect to DB(code: 100)

12:02 Error-failed to connect to DB(code: 100)

12:05 Error-query failed on syntax(code: 355)

12:06 Error-query failed on timeout(code: 567)

I’d like to count how many errors I have from each code, and then put the whole thing on a timechart that I can add to my dashboard, in order to monitor errors in my service.

Obviously I’d like to extract the error code from the trace, so I need a regular expression.

Well, if you’re anything like me the first thing you’ll do is start feverishly googling regular expressions to try to remember how the heck to do it… and then flailing for like an hour until getting it right.

Well, using parse, things are much much easier:

traces
| where message contains "Error"
| parse message with * "(code: " errorCode ")" *
| project errorCode

100
100
355
567

And from here summarizing is just a breeze:

traces
| where message contains "Error"
| parse message with * "(code: " errorCode ")" *
| summarize count() by errorCode, bin(timestamp, 1h)
| render areachart kind=stacked

Happy parsing!

App Insights Analytics: Extracting data from traces

I wanna show two real-world examples (it really happened to me!) of extracting data from traces, and then using that data to get really great insights.

So a little context here – I have a service that reads and processes messages from an Azure Queue. This message processing can fail, causing the same message to be retried many times.

I We recently introduced a bug into the service (as usual.. ) which caused requests to fail on a null reference exception. I wanted to know exactly how many messages were affected by this bug, but it was kind of hard to tell because the retries cause a lot of my service metrics to be off.

Luckily I have a trace just as I am beginning to process a message that shows the message id :

Start handling message id: 0828ae20-ba09-4f83-bb46-69f4fe25b510, dequeue count: 1, message: …

So what I did is extract the message id from the trace using a simple regex, and was then able to count messages using dcount:

traces
 | where timestamp > ago(1d)
 | where message startswith "Start handling"
 | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
 | summarize dcount(messageid)

And in order to count how many messages were affected by the exception, I did a double join – to the failed requests and to exceptions tables:

requests 
| where timestamp > ago(1d)
| where success == "False"
| join (exceptions
   | where timestamp > ago(1d)
   | where type contains "NullRef"
   ) on operation_Id
| join (traces
   | where timestamp > ago(1d)
   | where message startswith "Start handling"
   | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
   ) on operation_Id
| summarize dcount(messageid)

Voila!

The second example is similar, but this time I extracted a measurement.

Again I started from a trace – I have a trace detailing exactly how late a message that came in the queue is. It looks like this:

Latency: 21 minutes.

I wanted to turn these traces into measurable data that I can slice and dice on. So I used the same extend+extract method as before + a todouble:

traces
| where timestamp > ago(1d)
| where message contains "Latency: "
| extend latency = todouble(extract("Latency: ([^:\\/\\s]+) minutes.", 1, message))
| summarize percentile(latency, 90)

AWESOME!