A Simple Way to Extract Data From Traces – ‘Parse’

There is a nifty little operator in Azure Log Analytics that has really simplified how I  work with regular expressions – It’s called “parse” and I’ll explain it through a little example.

Let’s say you have a service that emits traces like:

traces
| where message contains "Error"
| project message

11:07 Error-failed to connect to DB(code: 100)

12:02 Error-failed to connect to DB(code: 100)

12:05 Error-query failed on syntax(code: 355)

12:06 Error-query failed on timeout(code: 567)

I’d like to count how many errors I have from each code, and then put the whole thing on a timechart that I can add to my dashboard, in order to monitor errors in my service.

Obviously I’d like to extract the error code from the trace, so I need a regular expression.

Well, if you’re anything like me the first thing you’ll do is start feverishly googling regular expressions to try to remember how the heck to do it… and then flailing for like an hour until getting it right.

Well, using parse, things are much much easier:

traces
| where message contains "Error"
| parse message with * "(code: " errorCode ")" *
| project errorCode

100
100
355
567

And from here summarizing is just a breeze:

traces
| where message contains "Error"
| parse message with * "(code: " errorCode ")" *
| summarize count() by errorCode, bin(timestamp, 1h)
| render areachart kind=stacked

Happy parsing!

Creating Beautiful Sankey Diagrams From App Insights Custom Events

I came across this tweet the other day:

It sounded a lot like a challenge to me, so I just couldn’t resist!!

Sankey diagrams for those who don’t know are an amazing tool for describing user flows AND are the basis for one of the most famous data visualizations of all-time. But really, I had no idea how to create one. So, I googled Sankey + PowerBI and came across this fantastic Amir Netz video:

So all you need to create the diagram is a table with 3 columns:

  • Source
  • Destination
  • Count

And PowerBI takes care of the rest.

We already know you can take any App Insights Analytics query and “export” the data to PowerBI.

So the only problem now is how do I transform my AppInsights custom events table into a Sankey events table?

Let’s go to Analytics!

First it’s best to have an idea of what we’re trying to do. Decide on 5-10 events that make sense for a user session flow. In my case, I want to see the flow of users through a new feature called “CTD”. So the events I chose are:

  • CTD_Available (feature is available for use)
  • CTD_Result (user used the feature and got a result)
  • CTD_DrillIn (user chose to further drill-in the results)
  • CTD_Feedback (user chose to give feedback)

In every step, I’m interested in seeing how many users I’m “losing”, and what they’re doing next.

Ok, let’s get to work!

First query we’ll

  • Filter out only relevant events
  • Sort by timestamp asc (don’t forget that in this chart, order is important!)
  • Summarize by session_id using makelist, to put all events that happened in that session in an ordered list. If you’re unfamiliar with makelist, all it does is take all the values of the column, and stuffs them into a list. The resulting lists are the ordered events that users triggered in each session.
customEvents
| where timestamp > ago(7d)
| where name=="CTD_Available" or name=="CTD_Result" or 
        name=="CTD_Drillin" or name== "CTD_Feedback"
| sort by timestamp asc
| summarize l=makelist(name) by session_Id

Next step I’ll do is add an “EndSession” event to each list, just to make sure my final diagram is symmetric. You might already have this event as part of your telemetry, I don’t. This is optional, and you can choose to remove this line.

| extend l=todynamic(replace(@"\]", ',"EndSession" ]', tostring(l))) 

Next step, I’d like to create “tuples” for source and destination from each list. I want to turn:

Available -> Result -> Feedback -> EndSession

Into:

[Available, Result], [Result, Feedback], [Feedback, EndSession]

To do this, I need to chop off the first item in the list and “zip” it (like a zipper) with the original list. In c# this is very easy – list.Zip(list.Skip(1))..

Amazingly, App Analytics has a zip command! Tragically, it doesn’t have a skip… :(. Which means we need to do some more ugly regex work in order to chop off the first element.

| extend l_chopped=todynamic(replace(@"\[""(\w+)"",", @"[", tostring(l)))

Then I zip, and use mvexpand, to create one row per tuple created

| extend z=zip(l, l_chopped) 
| mvexpand z

And I remove the lonely “EndSession”s which are useless artifacts.

| where tostring(z[0]) != "EndSession"

Last thing left to do is summarize to get the counts for each unique tuple, and remove the tuples with identical source and destinations.

The final query:

customEvents
| where timestamp > ago(7d)
| where name=="CTD_Available" or name=="CTD_Result" or 
        name=="CTD_Drillin" or name== "CTD_Feedback"
| sort by timestamp asc
| summarize l=makelist(name) by session_Id
| extend l=todynamic(replace(@"\]", ',"EndSession" ]', tostring(l))) 
| extend l_chopped=todynamic(replace(@"\[""(\w+)"",", @"[", tostring(l)))
| extend z=zip(l, l_chopped) 
| mvexpand z
| where tostring(z[0]) != "EndSession"
| summarize cnt() by source=tostring(z[0]),dest=tostring(z[1])
| where source!=dest

We’re in business!

Now I want to get this data to Power BI…

  • Download Power BI desktop.
  • Goto the Power BI visuals gallery, search “sankey” and download the visuals you want (my preference is “sankey with labels”)
  • Open Power BI, and goto GetData -> BlankQuery
  • In Query editor, goto View -> Advanced Editor
  • Now go back to the prepared AppAnalytics query, and hit export to Power BI

ExportPBI

  • Take the query in the downloaded text file, and paste it into the query editor
  • Press done

Now the data should be there! Just one more step!

  • Back in PowerBI, import the visuals you previously downloaded

ImportVisuals

  • Select Source -> source, Destination -> dest, Weight -> count_

WOOT!

Sankey

App Insights Analytics: Extracting data from traces

I wanna show two real-world examples (it really happened to me!) of extracting data from traces, and then using that data to get really great insights.

So a little context here – I have a service that reads and processes messages from an Azure Queue. This message processing can fail, causing the same message to be retried many times.

I We recently introduced a bug into the service (as usual.. ) which caused requests to fail on a null reference exception. I wanted to know exactly how many messages were affected by this bug, but it was kind of hard to tell because the retries cause a lot of my service metrics to be off.

Luckily I have a trace just as I am beginning to process a message that shows the message id :

Start handling message id: 0828ae20-ba09-4f83-bb46-69f4fe25b510, dequeue count: 1, message: …

So what I did is extract the message id from the trace using a simple regex, and was then able to count messages using dcount:

traces
 | where timestamp > ago(1d)
 | where message startswith "Start handling"
 | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
 | summarize dcount(messageid)

And in order to count how many messages were affected by the exception, I did a double join – to the failed requests and to exceptions tables:

requests 
| where timestamp > ago(1d)
| where success == "False"
| join (exceptions
   | where timestamp > ago(1d)
   | where type contains "NullRef"
   ) on operation_Id
| join (traces
   | where timestamp > ago(1d)
   | where message startswith "Start handling"
   | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
   ) on operation_Id
| summarize dcount(messageid)

Voila!

The second example is similar, but this time I extracted a measurement.

Again I started from a trace – I have a trace detailing exactly how late a message that came in the queue is. It looks like this:

Latency: 21 minutes.

I wanted to turn these traces into measurable data that I can slice and dice on. So I used the same extend+extract method as before + a todouble:

traces
| where timestamp > ago(1d)
| where message contains "Latency: "
| extend latency = todouble(extract("Latency: ([^:\\/\\s]+) minutes.", 1, message))
| summarize percentile(latency, 90)

AWESOME!

Cool AppInsights Analytics: Extracting url host with a regular expression

Another nice feature of Kusto / Application Insights Analytics is full on support for regular expressions using the extract keyword.

A very useful application of this is all matter of manipulations you can do over the “url” field in requests. A common ask is understanding how much traffic is generated by any of your different hosts.

Since Analytics only carries the full url field, we need to parse out the host out using a regex. I took a really really simple regex in this case, but obviously it can be much more complex.

pageViews
| where timestamp > ago(1d)
| extend urlhost=extract('^(http://|https://)([^:\\/\\s]+)', 2, url)
| summarize count() by urlhost
| render piechart

Update:

There is now a simpler method to extract all url parts – parseurl.

requests
| take 5
| extend urlParts = parseurl(url)
| project url, urlParts, urlParts.Scheme, urlParts.Host, urlParts.Path, urlParts.Port

host