Calculating Stickiness Using AppInsights Analytics

Update:

There is a new, simpler, better way to calculate usage metrics such as stickiness, churn and return rate.
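If you just want the number, newer Kusto versions ship usage-analytics plugins that do this in one line – quite possibly the simpler way this update refers to. Here's a minimal sketch using the activity_engagement plugin (assuming your workspace supports it; same 1-day-in-28-day window as the query we'll build below):

let start = ago(60d);
let end = now();
requests
| where timestamp between (start .. end)
| evaluate activity_engagement(user_Id, timestamp, start, end, 1d, 28d)
// activity_ratio is the plugin's DAU/MAU-style output column
| project timestamp, Stickiness = activity_ratio
| render timechart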


In previous posts I demonstrated some simple yet nifty tricks to get stuff done in App Insights Analytics – like extracting data from traces, or joining tables.

Those were mostly pretty simple queries, showing some basic Kusto techniques.

In this post I’m going to show something much more complex, with some advanced concepts.

We’re gonna take it slow, but be warned!

What I wanna do is calculate “Stickiness”. This is a measure of user engagement, or addiction to your app. It’s computed by dividing DAU (daily active users) by MAU (monthly active users) over a rolling 28-day window. It basically shows what percentage of your total user base is using your app daily – for example, if 1,000 of your 10,000 monthly users show up on a given day, your stickiness is 10%.

Computing your DAU is pretty simple in Analytics, and can be done using a simple dcount aggregation:

requests
| where timestamp > ago(60d)
| summarize dcount(user_Id) by bin(timestamp, 1d)

But how do you compute a rolling 28-day window unique count of users? For this we’re gonna need to get familiar with some new Kusto functions and operators:

hll() – hyperloglog – calculates the intermediate results of a dcount.

hll_merge() – used to merge together several hll intermediate results.

dcount_hll() – used to calculate the final dcount from an hll intermediate result.

range() – generates a dynamic array of equally spaced values, inclusive of both endpoints

mvexpand – expands a dynamic array into rows, one row per element

let – binds names to expressions. I’ve already shown a use for let in a past post.

It’s kind of a lot, so before we use these for real, here’s a toy sketch of how the new pieces behave.
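This sketch runs over an inline datatable rather than your telemetry, so it’s safe to paste anywhere: hll produces a mergeable intermediate state per day, range builds an inclusive array of dates, and mvexpand fans each row out to all of them:

datatable (day:datetime, user_Id:string)
[
    datetime(2017-01-20), "alice",
    datetime(2017-01-20), "bob",
    datetime(2017-01-21), "alice"
]
| summarize hll(user_Id) by day
// send each day's hll to itself and to the 2 following days
| extend periodKey = range(day, day + 2d, 1d)
| mvexpand periodKey
// merge whatever hlls landed on each date and resolve the distinct count
| summarize users = dcount_hll(hll_merge(hll_user_Id)) by periodKey = todatetime(periodKey)

The 21/1 row ends up counting both alice and bob – one hll from 20/1 merged with one from 21/1 – which is exactly the trick the real query pulls off at scale.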

Let’s do this in steps. Our goal is to calculate a moving 28 day window MAU. First thing, instead of dcount we’ll use hll, to get the intermediate results:

requests
| where timestamp > ago(60d)
| summarize hll(user_Id) by bin(timestamp, 1d)

With the intermediate results in place, the next phase is to think about which dates will use each intermediate result. If we take 20/1/2017 as an example, we know that this day itself, plus each of the next 28 days, will want to use this hll for its moving window result. So we build a list of [20/1/2017, 21/1/2017 … 17/2/2017].

So what we do here, and this is a little dirty, is create a list of all the dates that will need this result. We do this using the range function:

requests
| where timestamp > ago(60d)
| summarize hll(user_Id) by bin(timestamp, 1d)
| extend periodKey = range(bin(timestamp, 1d), timestamp+28d, 1d)

Now let’s turn every item in the periodKey array into a row in the table. We’ll do this with mvexpand:

requests
| where timestamp > ago(60d)
| summarize hll(user_Id) by bin(timestamp, 1d)
| extend periodKey = range(bin(timestamp, 1d), timestamp+28d, 1d)
| mvexpand periodKey

So now, when sorting by periodKey, each date in that column has a row for every daily hll that falls inside its trailing window – 29 of them once the data warms up, since range is inclusive on both ends. We’re almost done! Let’s merge the hlls and calculate the dcount:

requests
| where timestamp > ago(60d)
| summarize hll(user_Id) by bin(timestamp, 1d)
| extend periodKey = range(bin(timestamp, 1d), timestamp+28d, 1d)
| mvexpand periodKey
| summarize rollingUsers = dcount_hll(hll_merge(hll_user_Id)) by periodKey = todatetime(periodKey)

That’s the 28 day rolling MAU right there!

Now let’s make this entire query modular, so we can calculate a rolling dcount over any window we’d like – including a zero-day window (which is just DAU) – and compute our metric. One subtlety: in the join below, the right-hand side’s rollingUsers column gets auto-renamed to rollingUsers1, so DAU/MAU comes out as rollingUsers1/rollingUsers:

let start = ago(60d);
let period = 1d;
let RollingDcount = (rolling:timespan)
{
    requests
    | where timestamp > start
    | summarize hll(user_Id) by bin(timestamp, period)
    | extend periodKey = range(bin(timestamp, period), timestamp + rolling, period)
    | mvexpand periodKey
    | summarize rollingUsers = dcount_hll(hll_merge(hll_user_Id)) by periodKey = todatetime(periodKey)
};
RollingDcount(28d)
| join RollingDcount(0d) on periodKey
| where periodKey < now() and periodKey > start + 28d
| project Stickiness = rollingUsers1 * 1.0 / rollingUsers, periodKey
| render timechart

STICKINESS ON THE FLY!

[Chart: 28-day rolling stickiness over time]

Cool AppInsights Analytics: Extracting url host with a regular expression

Another nice feature of Kusto / Application Insights Analytics is full-on support for regular expressions via the extract function.

A very useful application of this is all manner of manipulations you can do over the “url” field in requests and page views. A common ask is understanding how much traffic each of your hosts generates.

Since Analytics only carries the full url field, we need to parse the host out using a regex. extract takes the regular expression, the (1-based) index of the capture group to return, and the source string – here group 2 is the host. I took a really simple regex in this case, but obviously it can be much more complex.

pageViews
| where timestamp > ago(1d)
| extend urlhost=extract('^(http://|https://)([^:\\/\\s]+)', 2, url)
| summarize count() by urlhost
| render piechart

Update:

There is now a simpler method to extract all url parts – parseurl (renamed parse_url in newer versions of Kusto).

requests
| take 5
| extend urlParts = parseurl(url)
| project url, urlParts, urlParts.Scheme, urlParts.Host, urlParts.Path, urlParts.Port

[Table: url split into its parts – Scheme, Host, Path, Port]
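And since parseurl returns a property bag, replicating the regex piechart from above becomes a one-liner per field – just remember that dynamic values need an explicit cast:

pageViews
| where timestamp > ago(1d)
// tostring is needed because property-bag access returns dynamic
| extend urlhost = tostring(parseurl(url).Host)
| summarize count() by urlhost
| render piechart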

Cool Azure Log Analytics: Joining requests and dependencies

Another cool thing you can do with App Insights Analytics is join different data types to get a good understanding of what’s happening in your app.

A great example is remote dependencies – an out-of-the-box feature in App Insights that logs all remote dependency calls such as SQL, Azure, HTTP, etc. If you’ve got that data flowing, you can get amazing insights with just a few small queries.

Here’s a small example – let’s try and find out which resources are real time-hogs in my service. The query I spun out is: per HTTP request, get the average duration spent calling each dependency type.

requests
| where timestamp > ago(1d)
| project timestamp, operation_Id
| join (dependencies
        | where timestamp > ago(1d)
        | summarize sum(duration) by operation_Id, type 
        ) on operation_Id
| summarize avg_duration_by_type=avg(sum_duration) by type, bin(timestamp, 20m)
| render barchart

[Chart: average dependency duration by type, barchart]
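If type turns out to be too coarse, the same join pattern works one level deeper – swapping type for the dependency’s name (still the standard dependencies schema) points at the specific slow resource. A sketch:

requests
| where timestamp > ago(1d)
| project timestamp, operation_Id
| join (dependencies
        | where timestamp > ago(1d)
        // total time per request spent in each named dependency
        | summarize sum(duration) by operation_Id, name
        ) on operation_Id
| summarize avg(sum_duration) by name
| order by avg_sum_duration desc
| take 10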

Cool AppInsights Analytics: Counting sampled data

If you’re doing stuff you’re supposed to be doing in Analytics – like slicing and dicing requests, counting page views, etc. – then you should probably make sure you’re counting correctly.

2 big pitfalls here are:

  1. If you’re sampling your data with the App Insights 2.0 SDK, then you should obviously reflect that when counting.
  2. If you’ve got a bunch of tests set up, then you probably don’t want to count those as page views.

For #1, you need to make sure you are always summing items – do sum(itemCount) instead of a simple count(). See the comparison query after the example below.

For #2, remember to add a where clause on the synthetic source field.

Here’s an example:

requests
| where timestamp > ago(1d)
| where isempty(operation_SyntheticSource)
| summarize sum(itemCount) by performanceBucket
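By the way, it’s easy to gauge how much sampling is skewing a naive count – compute both side by side, and if the columns diverge, your raw counts are lying to you:

requests
| where timestamp > ago(1d)
// raw_rows counts stored rows; actual_requests re-inflates sampled items
| summarize raw_rows = count(), actual_requests = sum(itemCount)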

Cool AppInsights Analytics: Charting request failure rate

Here is a really cool App Analytics query over App Insights that shows the request failure ratio of your app over the last week.

I use “extend” with the “iff” function to create a successes field I can count, and then use “extend” again to compute the failure ratio.

requests
| where timestamp > ago(7d)
| extend isSuccess = iff(success == "True", 1, 0)
| summarize failures = sum(1 - isSuccess), successes = sum(isSuccess) by bin(timestamp, 20m)
| extend ratio = todouble(failures) / todouble(failures + successes)
| project timestamp, failure_Percent = ratio * 100
| render timechart

[Chart: request failure percent over time]
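If your Kusto version has countif, the same chart falls out of a single summarize – a more compact sketch equivalent to the query above:

requests
| where timestamp > ago(7d)
// count failures and totals in one pass, as a percentage per 20-minute bucket
| summarize failure_Percent = 100.0 * countif(success == "False") / count() by bin(timestamp, 20m)
| render timechart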


Cool AppInsights Analytics: Charting common exceptions causing failed requests

Here’s a really simple but powerful query charting the most common exceptions causing requests to fail.

We do this by first getting all the failed requests, and joining them to exceptions on operation_Id.

Then we just chart it using a timechart.

requests
| where timestamp > ago(3d)
| where success == "False"
| project timestamp, duration, id, operation_Id
| join (exceptions
   | where timestamp > ago(3d)
   | project type, method, operation_Id) on operation_Id
| summarize count() by type, bin(timestamp, 1h)
| render timechart

[Chart: count of exceptions behind failed requests, by type over time]

Cool AppInsights Analytics: Percentiles

Another awesome feature in App Analytics is the ability to calculate statistics on the fly on your data. One example of that is percentile stats.

Here is an easy and extremely useful example – analyzing the duration of server requests in your service.

requests 
| where timestamp > ago(7d)
| summarize percentiles(duration, 50, 90, 99) by bin(timestamp, 1h)
| render timechart

[Chart: request duration percentiles (50th, 90th, 99th) over time]
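A handy variation on the same idea (nothing beyond the standard requests schema) slices percentiles per operation instead of over time, surfacing your slowest endpoints:

requests
| where timestamp > ago(7d)
| summarize percentiles(duration, 50, 95) by operation_Name
// percentiles() auto-names its output columns percentile_duration_50 / _95
| order by percentile_duration_95 desc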

Cool AppInsights Analytics: Custom dimensions and measurements

In App Analytics you can slice and dice on your App Insights custom dimensions and measurements just as easily as any of the so-called “standard” properties.

The only thing that’s a little bit tricky is extracting them first.

It’s tricky because of 2 things:

  1. You have to explicitly set the type of the measurement/dimension after you extract it.
  2. Extracting properties that contain spaces and special characters is a little bit of a hassle.

Here is an example of me doing both:

customEvents 
| where timestamp > ago(3h)
| where name == "Query"
| extend query_time = todouble(customMeasurements['Query Time'])
| extend query_name = tostring(customDimensions['Query Name'])
| project query_time, query_name
| summarize avg(query_time) by query_name 
| render barchart

[Chart: average query time by query name, barchart]
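The same casting applies when filtering. For example, to zoom in on a single query by name – “TopUsers” here is a hypothetical value, substitute one of your own:

customEvents
| where timestamp > ago(3h)
| where name == "Query"
// "TopUsers" is a made-up dimension value for illustration
| where tostring(customDimensions['Query Name']) == "TopUsers"
| summarize count() by bin(timestamp, 15m)
| render timechart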

If you liked this, check out some other cool analytics queries.