Analytics from the Show Me State

Estimating the effects of cookie-deletion

There are differing opinions on how to label the metric historically known as “Unique Visitors”. On one side of the fence are those who think it should be relabeled “Unique Cookies”, since that is the most popular method used for calculations. On the other side of the fence are others who think the metric is a catch-all for the “best available” measurement (authenticated visitors, cookies if those aren’t available, IP/UA combination if neither is available) and should be replaced with a different term if/when a better, standardized way to measure people comes along. What we all agree on, though, is that a Unique Visitor metric measured with cookies is terribly inaccurate.

How bad is it? As usual, it depends. A site where people tend to visit on a daily basis will see more inflation from cookie-deletion than a site that is only visited once per month. In the former example, one person may count toward the monthly total as many as 30 or so times, while in the latter, even a frequent cookie-deleter would only count once.

Let’s pretend that we know something about the actual people visiting a site, and see if we can determine by how much our web analytics numbers might be affected by cookie-deletion.

To keep the calculations easy, I have assumed people only visit or delete their cookies on daily, weekly, or monthly boundaries, and I am only considering a one-month time frame. However, the same logic could be applied to more granular data. It ultimately boils down to a matrix algebra problem, but I doubt many of us are eager to get into that level of detail.

An example

Consider 10,000 people: not 10,000 “cookies” and not 10,000 “unique visitors,” but 10,000 real-life, carbon-based beings. Suppose we are able to observe these people in such a way that we know “the truth” about their online behavior. Suppose also we have observed that, on average, 10% of our people delete their cookies every day, 15% delete once per week, and the remainder delete their cookies monthly or less frequently.

We have also observed that 20% of these people visit our website every day, 30% visit once per week, and the remainder only visit once in a given month. There’s no reason for these people to log in to our website (all visits are anonymous), and we count Unique Visitors using a cookie.

How bad is it?

Our first step is to find out how many different cookies each person will receive over the course of the month, based on the number of times they visit our site and how often they delete their cookies. For simplicity’s sake, we’ll assume each month has 30 days and 4 weeks.

Daily deleters will receive a different cookie each time they visit, so daily visitors will log 30 cookies and weekly visitors log 4. Weekly deleters who visit every day will log 4 different cookies over the course of the month, as will weekly deleters who visit once per week. Everybody else’s activity will be logged with one cookie. We can summarize as shown below.

One month’s time…  | Daily deleters | Weekly deleters | Monthly deleters
Visit every day    | 30 cookies     | 4 cookies       | 1 cookie
Visit once/week    | 4 cookies      | 4 cookies       | 1 cookie
Visit once/month   | 1 cookie       | 1 cookie        | 1 cookie

Using the above factors, we can determine the cookie-contribution from each set of people:

(Monthly calculations) | Delete daily (10%)     | Delete weekly (15%)   | Delete monthly (75%)  | # Cookies
Visit daily (2000)     | 2000 x 10% x 30 = 6000 | 2000 x 15% x 4 = 1200 | 2000 x 75% x 1 = 1500 | 8700
Visit weekly (3000)    | 3000 x 10% x 4 = 1200  | 3000 x 15% x 4 = 1800 | 3000 x 75% x 1 = 2250 | 5250
Visit monthly (5000)   | 5000 x 10% x 1 = 500   | 5000 x 15% x 1 = 750  | 5000 x 75% x 1 = 3750 | 5000
Total                  | 7700                   | 3750                  | 7500                  | 18,950

Wow. Our 10,000 people are being represented as 18,950 unique visitors: the Unique Visitors number is inflated by roughly 90%!
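For anyone who would rather script the calculation than build the table by hand, here is a minimal Python sketch of the same arithmetic. The names (VISIT_SHARE, DELETE_SHARE, COOKIES_PER_PERSON, total_cookies) are my own, and the code assumes, as the table does, that how often someone visits is independent of how often they delete cookies.

```python
# Share of people in each visiting and cookie-deleting group (from the example).
VISIT_SHARE = {"daily": 0.20, "weekly": 0.30, "monthly": 0.50}
DELETE_SHARE = {"daily": 0.10, "weekly": 0.15, "monthly": 0.75}

# Cookies one person accumulates in a 30-day, 4-week month,
# keyed by (visit frequency, deletion frequency).
COOKIES_PER_PERSON = {
    ("daily", "daily"): 30, ("daily", "weekly"): 4, ("daily", "monthly"): 1,
    ("weekly", "daily"): 4, ("weekly", "weekly"): 4, ("weekly", "monthly"): 1,
    ("monthly", "daily"): 1, ("monthly", "weekly"): 1, ("monthly", "monthly"): 1,
}


def total_cookies(people, visit_share, delete_share):
    """Sum the cookie contribution of every (visit, delete) group."""
    total = 0.0
    for (visit, delete), cookies in COOKIES_PER_PERSON.items():
        # Assumes visiting and deleting habits are independent of each other.
        group_size = people * visit_share[visit] * delete_share[delete]
        total += group_size * cookies
    return total


people = 10_000
cookies = total_cookies(people, VISIT_SHARE, DELETE_SHARE)
print(f"{cookies:,.0f} cookies for {people:,} people "
      f"(inflation factor {cookies / people:.3f})")
```

With the shares above this should reproduce the 18,950 total from the table; changing the two share dictionaries gives a quick what-if.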

Visitor loyalty reports are affected, too

Unique Visitors isn’t the only number that’s affected by cookie-deletion. Any visitor-based number is going to be off, so you will have a lot of trouble understanding visitor loyalty. You can tell when your efforts to improve loyalty are working, since the numbers will move in the right direction, but the magnitude of change will be misleading.

For the above example, we know that 20% of our people visited daily, 30% visited weekly, and 50% visited monthly, so the number of visits in a month works out to 77,000 (2000 x 30 + 3000 x 4 + 5000). Visits aren’t affected by cookie-deletion to any great extent (a good argument for visit-based analysis!) – so our tool will also report 77,000 visits.

This means our people averaged 7.70 visits (77000/10000) over the course of the month, but our web analytics tool will only report 4.06 (77000/18950) visits per visitor because the visitors are inflated. Our average visits per visitor are under-reported by 47%!
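The visits-per-visitor arithmetic can be checked with a few lines of Python. This is only a sketch with variable names of my own choosing; the 18,950 figure is the cookie-based count from the example, and a 30-day, 4-week month is assumed.

```python
PEOPLE = 10_000
REPORTED_VISITORS = 18_950  # cookie-based Unique Visitors from the example

# Monthly visits: daily visitors x 30, weekly visitors x 4, monthly visitors x 1.
visits = 2_000 * 30 + 3_000 * 4 + 5_000 * 1

true_rate = visits / PEOPLE                  # visits per real person
reported_rate = visits / REPORTED_VISITORS   # visits per "visitor" in the tool
under_reporting = 1 - reported_rate / true_rate

print(f"visits: {visits:,}")
print(f"true visits per person:      {true_rate:.2f}")
print(f"reported visits per visitor: {reported_rate:.2f}")
print(f"under-reported by:           {under_reporting:.0%}")
```

Note that the under-reporting only depends on the inflation factor (1 - 10,000/18,950), so any per-visitor average from the tool is off by the same proportion.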

If you prefer to view loyalty using a histogram (# visitors who visited once, twice, three times, etc.), then we need to determine to which bin each visitor’s cookies will be credited.

(Monthly histogram) | Delete daily                   | Delete weekly                          | Delete monthly
Visit daily         | Each cookie is logged only once | Each cookie is seen 7 times in a month | Each cookie is seen 30 times (every day)
Visit weekly        | Each cookie is logged only once | Each cookie is logged only once        | Each cookie is seen 4 times in a month
Visit monthly       | Each cookie is logged only once | Each cookie is logged only once        | Each cookie is logged only once

The first thing that jumps out of this table is that the majority of the cookies are only encountered once, regardless of how many times someone actually visited. This explains why visitor loyalty graphs, regardless of the tool used, are often overloaded with so many one-time visitors.

[Figure: visitor duration graphs]

Applying the frequencies in the histogram table to the numbers in the calculations table shows us how our visitor retention graph is affected by cookie-deletion.

[Figure: visitor frequency corrections]

Again, wow! While 20% of our people visited the site every day, with cookie-based visitor counting, only 8% appear in this super-loyal segment. The majority of the “visitors” that were added to the site via cookie-deletion appear in the 1 visit bin, inflating that number by almost a factor of 3.
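The bin counts behind those percentages can be rebuilt with a short Python sketch. The CELLS table below combines the cookies-per-person table with the histogram table; the structure and names are my own, the "seen 7 times" value is the post's rounding for a 30-day, 4-week month, and visiting and deleting habits are again assumed to be independent.

```python
from collections import Counter

PEOPLE = 10_000
VISIT_SHARE = {"daily": 0.20, "weekly": 0.30, "monthly": 0.50}
DELETE_SHARE = {"daily": 0.10, "weekly": 0.15, "monthly": 0.75}

# (visit, delete) -> (cookies per person, times each cookie is seen in a month)
CELLS = {
    ("daily", "daily"): (30, 1),
    ("daily", "weekly"): (4, 7),
    ("daily", "monthly"): (1, 30),
    ("weekly", "daily"): (4, 1),
    ("weekly", "weekly"): (4, 1),
    ("weekly", "monthly"): (1, 4),
    ("monthly", "daily"): (1, 1),
    ("monthly", "weekly"): (1, 1),
    ("monthly", "monthly"): (1, 1),
}

# Histogram the tool would report: visits-per-cookie bin -> number of cookies.
reported = Counter()
for (visit, delete), (cookies, seen) in CELLS.items():
    group_size = PEOPLE * VISIT_SHARE[visit] * DELETE_SHARE[delete]
    reported[seen] += group_size * cookies

# Histogram of the real people: visits-per-person bin -> number of people.
actual = {30: 2_000, 4: 3_000, 1: 5_000}

total_reported = sum(reported.values())
for visits_bin in sorted(reported):
    share = reported[visits_bin] / total_reported
    print(f"{visits_bin:>2} visits: {reported[visits_bin]:>8,.0f} cookies "
          f"({share:.1%}), {actual.get(visits_bin, 0):,} real people")
```

The 7-visit bin has no real people in it at all: it exists only because weekly deleters who visit daily are split into four cookies.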

Adding Authentication

If 100% of the people visiting the above site authenticated, and the authenticated visitor identifier were used to count unique visitors, then the number would be pretty accurate (ignoring shared logins, etc.). But most sites don’t require authentication to see certain pages, so the likelihood of 100% authentication is low except in special cases, like intranets.

For our example above, what if half the visitors authenticated? Half of the 10,000 people would be more-or-less accurately represented, while our cookie-deletion calculations would apply to the remaining 5,000. The unique visitor multiplier factor decreases with increasing percentage of authenticated people.
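A rough Python sketch of that blended multiplier follows. The function name visitor_multiplier is hypothetical, and it assumes that authenticated people are each counted exactly once and that people who authenticate visit and delete cookies at the same rates as everyone else, which may not hold on a real site.

```python
def visitor_multiplier(auth_share, cookie_multiplier=1.895):
    """Reported unique visitors per real person.

    auth_share: fraction of people who authenticate (0.0 to 1.0).
    cookie_multiplier: inflation factor for cookie-counted people
        (1.895 in the example: 18,950 cookies for 10,000 people).
    """
    if not 0.0 <= auth_share <= 1.0:
        raise ValueError("auth_share must be between 0 and 1")
    # Authenticated people count once; everyone else is inflated by cookies.
    return auth_share * 1.0 + (1.0 - auth_share) * cookie_multiplier


for pct in (0, 25, 50, 75, 100):
    print(f"{pct:>3}% authenticated -> multiplier {visitor_multiplier(pct / 100):.3f}")
```

For the half-authenticated case this works out to 5,000 + 9,475 = 14,475 reported visitors for 10,000 people.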

Assumed: 10/15/75 cookie deletion and 20/30/50 visiting frequency (daily/weekly/monthly).

Is it always that bad?

Not necessarily: it could be worse or it could be better. The above examples assumed that 20% of people visited the website every day, and 30% visited weekly. This was an arbitrary example meant to make calculations easier. In real life, you may have far fewer daily visitors (or more; it just depends on the site). Running the numbers assuming 5% daily and 50% weekly visitors, for example, results in a unique visitor inflation factor of about 1.5 instead of the 1.9 calculated in our example.
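If a spreadsheet is not handy, the same what-if can be sketched in Python. The inflation function and the COOKIES table are my own naming, not anything from an analytics tool; the sketch keeps the 10/15/75 deletion mix, the 30-day, 4-week month, and the assumption that visiting and deleting habits are independent.

```python
# Cookies per person in a month. Rows: visit frequency.
# Columns: delete daily, delete weekly, delete monthly.
COOKIES = {
    "daily": (30, 4, 1),
    "weekly": (4, 4, 1),
    "monthly": (1, 1, 1),
}


def inflation(visit_share, delete_share=(0.10, 0.15, 0.75)):
    """Cookies per real person for a given mix of visiting frequencies."""
    if abs(sum(visit_share.values()) - 1.0) > 1e-9:
        raise ValueError("visit shares must sum to 1")
    factor = 0.0
    for visit, share in visit_share.items():
        # Average cookies for one person in this visit group.
        per_person = sum(d * c for d, c in zip(delete_share, COOKIES[visit]))
        factor += share * per_person
    return factor


baseline = inflation({"daily": 0.20, "weekly": 0.30, "monthly": 0.50})
what_if = inflation({"daily": 0.05, "weekly": 0.50, "monthly": 0.45})
print(f"20/30/50 visiting mix: {baseline:.2f}")
print(f"5/50/45 visiting mix:  {what_if:.2f}")
```

Passing a different delete_share tuple lets you test other cookie-deletion assumptions the same way.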

I’ve attached my spreadsheet so you can run your own what-ifs.


10 Responses to “Estimating the effects of cookie-deletion”

  1. Christopher Berry Says:

    Timely and relevant!

    A very serious attempt to estimate the size of the problem.

  2. Ned Kumar Says:

    Great post Angie and a good analytical exercise. The one thing I would caution folks is not to use the cookie output to determine the % of daily, weekly, and monthly visits — I know it is common sense but sometimes we fall into catch-22 situations :-)

    Unfortunately, this is an issue. I have found from my own exercise that the inflation can be up to 200% in some cases. The good news was that I found the cookie inflation to vary among various segments of customers, so at least you know which one is relatively accurate and which one you have to take with a bucket of salt (or do some mathematical wizardry like Eric had mentioned in one of his posts).

    In addition to the technique you mention above, an alternative way to get an estimate would be to use your “known” visitors (with a login id or something similar) to draw parallel conclusions for your site. With cookies, one thing I have found is that a triangulation method is the best approach to attack this problem — coming at it from various angles and then synthesizing the results for an action plan.

    Again, nice job with the explanation.

  3. angie Says:

    Thank you Chris!

    Ned, that’s a very important point: the visitor frequency (daily, weekly, monthly) from our analytics tools is completely out of whack due to cookie deletion. And it’s why I think developing a real cookie-correction algorithm isn’t going to be an easy task.

    I suspect that if someone is good with matrix algebra, they *might* find that decent visitor loyalty numbers could be estimated from WA tool numbers given assumed cookie-deletion rates, but I haven’t taken it that far, and I no longer consider myself to be “good with matrix algebra.”

  4. jake Says:

    Cookie deletion is indeed a factor, but a similar analysis could be done with multiple devices too. Mobile devices are making this especially apparent. I might be a bit of an outlier, but just today I’ve hit my normal round of sites on my work laptop, iPod Touch, home laptop, and home desktop. 1 person, 4 cookies, and no way to tell of any overlap, unless authenticated. Throw in multiple users on one device (my home desktop and laptop are shared) and you’ve got quite a mess that I just can’t see any algorithm overcoming. It would be as arbitrary as UV’s already is.

  5. angie Says:

    Jake, I agree, and it’s another reason I can’t get behind requiring a cookie-deletion algorithm for web analysts or for the tools they use. What we use now is bad, no doubt, but a “correction” won’t necessarily make it better. There are too many facets to this problem: cookie deletion, multiple devices, and multiple browsers pulling numbers one way, while shared logins and shared computers (big issues for several of my B2C sites) pull the numbers the other way.

    The only ways I can think of to get better unique visitor numbers come with a hefty price on privacy, real or perceived. There are definitely no easy answers.

  6. Dan Says:

    Wouldn’t we be more interested in the trend anyway, rather than the actual number?

  7. Jim Williams Says:

    Angie – great post. You have prompted me to calculate the visitor inflation for our site – a job I have been avoiding for some time. Not quite sure where I am going to get the deletion numbers from upon which to base my assumptions though. I saw a comScore paper some time ago – trouble is our visitors are generally teenagers and we have strong evidence that they are visiting on multiple machines at home and at school. We also make the assumption that cookies are deleted from school machines after every session – although we have no evidence to back this up. Luckily we do have registration data which would indicate that the top end for visitor inflation must be somewhere in the region of 50% but we must do a more formal analysis.

  8. David Says:

    comScore has been grappling with this for some time now, and you can check out the cookie deletion white paper here: http://www.comscore.com/Press_Events/Presentations_Whitepapers/2007/Cookie_Deletion_Whitepaper
