App Insights Analytics: Extracting data from traces

May 16, 2016 | August 9, 2017 | assaf___

I want to show two real-world examples (they really happened to me!) of extracting data from traces, and then using that data to get really great insights.

So a little context here – I have a service that reads and processes messages from an Azure Queue. This message processing can fail, causing the same message to be retried many times.

We recently introduced a bug into the service (as usual...) which caused requests to fail on a null reference exception. I wanted to know exactly how many messages were affected by this bug, but it was kind of hard to tell because the retries cause a lot of my service metrics to be off.

Luckily, I write a trace just as I begin processing a message, and it shows the message id:

Start handling message id: 0828ae20-ba09-4f83-bb46-69f4fe25b510, dequeue count: 1, message: …

So what I did is extract the message id from the trace using a simple regex, and was then able to count messages using dcount:

traces
 | where timestamp > ago(1d)
 | where message startswith "Start handling"
 | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
 | summarize dcount(messageid)
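If you want to try the extraction without touching real telemetry, here is a minimal sketch that runs against a few made-up rows. The datatable contents, the second message id and the extra dequeue_count column are my own assumptions for illustration; only the regex comes from the query above.

```kusto
// Hypothetical sample rows standing in for the real traces table.
datatable(timestamp: datetime, message: string)
[
    datetime(2016-05-16 10:00:00), "Start handling message id: 0828ae20-ba09-4f83-bb46-69f4fe25b510, dequeue count: 1, message: ...",
    datetime(2016-05-16 10:05:00), "Start handling message id: 0828ae20-ba09-4f83-bb46-69f4fe25b510, dequeue count: 2, message: ...",
    datetime(2016-05-16 10:06:00), "Start handling message id: 11111111-2222-3333-4444-555555555555, dequeue count: 1, message: ..."
]
| where message startswith "Start handling"
// Same regex as above: capture everything up to the ", " that follows the id.
| extend messageid = extract("Start handling message id: ([^:\\/\\s]+), ", 1, message)
// Assumption: the dequeue count can be pulled out the same way.
| extend dequeue_count = toint(extract("dequeue count: (\\d+)", 1, message))
| summarize messages = dcount(messageid), max_dequeue_count = max(dequeue_count)
```

The first two rows are a retry of the same message, so the three traces should collapse into two distinct message ids.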

And in order to count how many messages were affected by the exception, I did a double join: failed requests to exceptions, and then to the traces:

requests 
| where timestamp > ago(1d)
| where success == "False"
| join (exceptions
   | where timestamp > ago(1d)
   | where type contains "NullRef"
   ) on operation_Id
| join (traces
   | where timestamp > ago(1d)
   | where message startswith "Start handling"
   | extend messageid = tostring(extract("Start handling message id: ([^:\\/\\s]+), ", 1, message))
   ) on operation_Id
| summarize dcount(messageid)

Voila!

The second example is similar, but this time I extracted a measurement.

Again I started from a trace – I have a trace detailing exactly how late a message arriving from the queue is. It looks like this:

Latency: 21 minutes.

I wanted to turn these traces into measurable data that I can slice and dice on. So I used the same extend+extract method as before + a todouble:

traces
| where timestamp > ago(1d)
| where message contains "Latency: "
| extend latency = todouble(extract("Latency: ([^:\\/\\s]+) minutes.", 1, message))
| summarize percentile(latency, 90)
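Here is a small variation you can run on made-up rows. It assumes the latency is always a plain number, so the pattern is narrowed to digits and dots, and it uses the optional fourth argument of extract to get a double directly instead of wrapping the call in todouble. The sample messages are hypothetical.

```kusto
// Hypothetical sample traces; only the "Latency: N minutes." shape comes from the post.
datatable(message: string)
[
    "Latency: 21 minutes.",
    "Latency: 3 minutes.",
    "Latency: 48 minutes.",
    "Some unrelated trace"
]
| where message contains "Latency: "
// typeof(double) makes extract return a number instead of a string.
| extend latency = extract("Latency: ([0-9.]+) minutes", 1, message, typeof(double))
| summarize avg(latency), percentiles(latency, 50, 90)
```

Once the value is a real number, any aggregation works on it, not just percentiles.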

AWESOME!

Cool Azure Log Analytics: How many unique users were affected by 404s?

April 19, 2016 | August 9, 2017 | assaf___

Here’s another nifty little trick, useful for counting how many unique users were impacted by a service issue.

In order to do this, I use the “dcount” aggregation. It returns an estimate of how many distinct values are in the column.

requests
| where timestamp > ago(7d)
| where resultCode == "404"
| summarize dcount(user_Id)

Cool AppInsights Analytics: Extracting url host with a regular expression

April 5, 2016 | August 9, 2017 | assaf___

Another nice feature of Kusto / Application Insights Analytics is full-on support for regular expressions using the extract function.

A very useful application of this is all manner of manipulations you can do over the “url” field in requests. A common ask is understanding how much traffic each of your different hosts generates.

Since Analytics only carries the full url field, we need to parse the host out using a regex. I used a really, really simple regex in this case, but obviously it can be much more complex.

pageViews
| where timestamp > ago(1d)
| extend urlhost=extract('^(http://|https://)([^:\\/\\s]+)', 2, url)
| summarize count() by urlhost
| render piechart
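To see what the regex actually captures, here is a minimal sketch on a few made-up URLs (the contoso.com addresses are hypothetical sample data, not from my service):

```kusto
// Hypothetical sample URLs standing in for the url column of pageViews.
datatable(url: string)
[
    "https://contoso.com/home",
    "http://contoso.com:8080/api/items",
    "https://shop.contoso.com/cart?id=1"
]
// Group 1 is the scheme, group 2 is the host: everything up to the first ':', '/' or whitespace.
| extend urlhost = extract("^(http://|https://)([^:\\/\\s]+)", 2, url)
| summarize count() by urlhost
```

The port and path are dropped, so the first two rows both land under contoso.com and the third under shop.contoso.com.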

Update:

There is now a simpler method to extract all url parts – parseurl.

requests
| take 5
| extend urlParts = parseurl(url)
| project url, urlParts, urlParts.Scheme, urlParts.Host, urlParts.Path, urlParts.Port


Cool Azure Log Analytics: Joining requests and dependencies

March 31, 2016 | August 9, 2017 | assaf___

Another cool thing you can do with App Insights Analytics is join different data types to get a good understanding of what’s happening in your app.

A great example is remote dependencies – this is an out-of-the-box feature in App Insights that logs all remote dependency calls such as SQL, Azure, HTTP etc. If you’ve got that data flowing, you can get amazing insights with just a few small queries.

Here’s a small example – let’s try and find out which resources are real time-hogs in my service. The query I came up with: per HTTP request, sum the time spent calling each dependency type, then average that over all requests.

requests
| where timestamp > ago(1d)
| project timestamp, operation_Id
| join (dependencies
        | where timestamp > ago(1d)
        | summarize sum(duration) by operation_Id, type 
        ) on operation_Id
| summarize avg_duration_by_type=avg(sum_duration) by type, bin(timestamp, 20m)
| render barchart
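If the two-step aggregation is hard to picture, here is a minimal sketch of the same join on tiny made-up tables. The names sample_requests and sample_dependencies and all the rows are hypothetical, and I dropped the time bin and the chart to keep the result easy to read.

```kusto
// Hypothetical stand-ins for the requests and dependencies tables.
let sample_requests = datatable(timestamp: datetime, operation_Id: string)
[
    datetime(2016-03-31 10:00:00), "op1",
    datetime(2016-03-31 10:05:00), "op2"
];
let sample_dependencies = datatable(operation_Id: string, type: string, duration: real)
[
    "op1", "SQL", 40.0,
    "op1", "SQL", 60.0,
    "op1", "HTTP", 200.0,
    "op2", "SQL", 20.0
];
sample_requests
| join kind=inner (
    sample_dependencies
    // Step 1: total time per request and dependency type (produces sum_duration).
    | summarize sum(duration) by operation_Id, type
    ) on operation_Id
// Step 2: average those per-request totals for each dependency type.
| summarize avg_duration_by_type = avg(sum_duration) by type
```

With these rows, op1 spends 100 in SQL and op2 spends 20, so SQL should average 60 per request, while HTTP only shows up in op1 with 200.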


Cool AppInsights Analytics: Counting sampled data

March 30, 2016 | July 20, 2016 | assaf___

If you’re doing stuff you’re supposed to be doing in Analytics – like slicing and dicing requests, counting page views, etc. – then you should probably make sure you’re counting correctly.

2 big pitfalls here are:

1. If you’re sampling your data with the App Insights 2.0 SDK, then you should obviously reflect that when counting.
2. If you’ve got a bunch of tests set up, then you probably don’t want to count those as page views.

For #1, you need to make sure you are always summing items – do sum(itemCount) instead of a simple count().

For #2, remember to add a where clause on the synthetic source field.

Here’s an example:

requests
| where timestamp > ago(1d)
| where operation_SyntheticSource == ""
| summarize sum(itemCount) by performanceBucket
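To see how far off a plain count() can be, here is a minimal sketch on made-up rows. The request names, the itemCount values and the synthetic source string are all hypothetical sample data.

```kusto
// Hypothetical rows: itemCount says how many real requests each stored row represents.
datatable(name: string, itemCount: int, operation_SyntheticSource: string)
[
    "GET /home", 5, "",
    "GET /home", 5, "",
    "GET /home", 1, "Application Insights Availability Monitoring",
    "GET /api",  1, ""
]
// Pitfall #2: drop traffic generated by tests.
| where operation_SyntheticSource == ""
// Pitfall #1: compare the naive row count with the sampling-aware sum.
| summarize stored_rows = count(), estimated_requests = sum(itemCount) by name
```

For "GET /home" this should give 2 stored rows but 10 estimated requests, which is the gap you get when sampling keeps one row out of every five.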

Cool AppInsights Analytics: Charting request failure rate

March 29, 2016 | July 20, 2016 | assaf___

Here is a really cool App Analytics query over App Insights that shows the request failure ratio of your app over the last week.

I use “extend” with the “iff” function to create a successes field I can count, and then use “extend” again to create a failure ratio.
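The query text itself isn't included in this post, so here is a rough sketch of what that description could look like. The column names total, succeeded and failure_ratio, the one-hour bin and the timechart are my assumptions, not necessarily what the original used.

```kusto
requests
| where timestamp > ago(7d)
// First extend: 1 for a successful request, 0 for a failed one.
| extend successes = iff(success == "True", 1, 0)
| summarize total = count(), succeeded = sum(successes) by bin(timestamp, 1h)
// Second extend: failed requests as a percentage of all requests in the bin.
| extend failure_ratio = 100.0 * (total - succeeded) / total
| project timestamp, failure_ratio
| render timechart
```

If your data is sampled, you'd probably want to sum itemCount instead of using count() here.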

[Image: FailPercent]

Cool AppInsights Analytics: Charting common exceptions causing failed requests

March 27, 2016 | March 15, 2017 | assaf___

Here’s a really simple but powerful query charting the most common exceptions causing requests to fail.

We do this by first getting all the failed requests, and joining them to exceptions according to operation_Id.

Then we just chart it using a timechart.

requests
| where timestamp > ago(3d)
| where success == "False"
| project timestamp, duration, id, operation_Id
| join (exceptions
   | where timestamp > ago(3d)
   | project type, method, operation_Id) on operation_Id
| summarize count() by type, bin(timestamp, 1h)
| render timechart


Cool AppInsights Analytics: Percentiles

March 23, 2016 | March 29, 2016 | assaf___

Another awesome feature in App Analytics is the ability to calculate statistics on the fly on your data. One example of that is percentile stats.

Here is an easy and extremely useful example – analyzing the duration of server requests in your service.

requests 
| where timestamp > ago(7d)
| summarize percentiles(duration, 50, 90, 99) by bin(timestamp, 1h)
| render timechart


Cool AppInsights Analytics: Custom dimensions and measurements

March 21, 2016 | November 2, 2017 | assaf___

In App Analytics you can slice and dice on your App Insights custom dimensions and measurements just as easily as any of the so-called “standard” properties.

The only thing that’s a little bit tricky is extracting them first.

It’s tricky because of 2 things:

1. You have to explicitly set the type of the measurement/dimension after you extract it.
2. Extracting properties that contain spaces and special characters is a little bit of a hassle.

Here is an example of me doing both:

customEvents 
| where timestamp > ago(3h)
| where name == "Query"
| extend query_time = todouble(customMeasurements.['Query Time'])
| extend query_name = tostring(customDimensions.['Query Name'])
| project query_time, query_name
| summarize avg(query_time) by query_name 
| render barchart
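Here is a minimal sketch of the same extraction on made-up events, in case you want to play with it before pointing it at real data. The event rows and query names are hypothetical, and I use the plain bracket form for the property names.

```kusto
// Hypothetical events with property names that contain a space.
datatable(name: string, customDimensions: dynamic, customMeasurements: dynamic)
[
    "Query", dynamic({"Query Name": "GetUsers"}),  dynamic({"Query Time": 120.5}),
    "Query", dynamic({"Query Name": "GetUsers"}),  dynamic({"Query Time": 80}),
    "Query", dynamic({"Query Name": "GetOrders"}), dynamic({"Query Time": 310})
]
| where name == "Query"
// Bracket syntax handles the space; the cast turns the dynamic value into a typed one.
| extend query_time = todouble(customMeasurements['Query Time'])
| extend query_name = tostring(customDimensions['Query Name'])
| summarize avg(query_time) by query_name
```

The casts matter because the extracted values are dynamic, and as far as I know summarize won't group by a dynamic column until you convert it to a string.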



Cool AppInsights Analytics: Piechart of request failures by response code

March 21, 2016 | March 29, 2016 | assaf___

Here is an awesome little pie chart query, to get the most common failures – by response code and operation name.

requests
 | where timestamp > ago(3h)
 | where success == "False"
 | summarize count() by resultCode, operation_Name
 | render piechart


4pp1n51ght5

Hacks, tips and tricks from a dev on the Microsoft Windows Cyber team
