Bad Stats – Cyber-Value Connection Report

CGI, a multinational cybersecurity and business consultancy, released what they call The Cyber-Value Connection report. This is a white paper focused on the relationship between data breaches and the shareholder value of companies. CGI says their study shows “that share prices fall by an average of 1.8 per cent on a permanent basis following a severe breach. To put that in context, investors in a typical FTSE 100 firm would be worse off by an average of £120 million.” Unfortunately, there are many problems with their analysis, and most of them stem from the poor use of statistics.

Just Trust Us

One common problem with white papers is that, while the organization says the analysis is based on data, it does not release that data. Companies will try to add an academic veneer to their work, such as CGI referring to Oxford Economics developing “analytical methodology to examine share price movements in companies that experienced cyber breaches.” An actual academic paper would release the data used, though, so that other analysts or researchers would be able to verify the results and, critically, develop alternative explanations for those results which may better fit the data.

Two thirds of companies had their share price adversely impacted, in comparison with their peer group, after suffering a cyber breach.

If a full third of the companies studied held steady or outperformed their peer group after a breach, that raises questions right off the bat. Does that mean that, in some cases, having a data breach is positive for a company? Or, more likely, does that mean that there is not really a strong correlation between what a share price will do and whether or not a data breach happened? Since the data supporting their finding was not released, we can’t say for sure.

Even Though We Make Stuff Up

Another common problem in white papers written to seem more academic than they are is that the authors create their own definitions. Compounding the problem, they put the alternative definition in a footnote. Readers often skip footnotes, which makes them a convenient place to hide the fact that you are using a non-standard definition of a term.

In this study, the term ‘breach’ is used to describe any form of major cyber incident.

This doesn’t match up with what most security practitioners would say a breach is. For one, breaches do not have to be “major”. More importantly, a breach is usually taken to mean that data has been viewed or stolen by an unauthorized person. There are other types of cyber incidents, however. For instance, a DDoS attack would not generally be considered a data breach. Ransomware may not, either. And, of course, something like the 2015 and 2016 power outages in Ukraine would not be considered data breaches. Those would all be cyber incidents, though. It is important when trying to do academic work, or in this case pass something off as a rigorous study, that terms be used appropriately.

The Scale on a Graph Matters

Another misleading technique somebody might use is to change the scale on the x- or y-axes of a graph. In this report, they did that by making the absolute distance different for each of the axes, even though both of them are showing comparable subjects.

BadStats Cyber-Value Connection 1

In this graph, performance above or below the company’s peer group prior to the incident is on the x-axis and performance above or below the company’s peer group after the incident is on the y-axis. But the value of 10% on the horizontal axis is quite a bit further from the origin than the same value on the vertical axis. This makes it look as though a line of best fit would be appropriate for the findings. We can look at what the graph would look like if the scales were not skewed, though.

BadStats - Cyber-Value Connection 2.png

This makes it look less like a line could be fitted neatly going from the lower-left quadrant to the upper-right quadrant. In the second graph it looks more like a big blob in the middle. While that isn’t the largest problem with this white paper, and this misuse of the scale isn’t as blatant as other bad graphs, any time you see a graph which skews the scales like that, it should raise a red flag in your mind that they may be purposely trying to mislead you.
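One way to avoid being fooled by axis scaling is to rely on a number that does not depend on how the graph is drawn. The sketch below computes a Pearson correlation coefficient and shows that stretching one axis leaves it unchanged. The data points are hypothetical placeholders, since CGI did not release theirs, and the function name is my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical performance relative to peers (in %), NOT data from the report.
before = [-8.0, -5.0, -2.0, 0.0, 1.0, 3.0, 6.0, 9.0]
after = [2.0, -6.0, 4.0, -3.0, 5.0, -1.0, 3.0, -2.0]

r_original = pearson_r(before, after)
# Stretching the x-axis is equivalent to multiplying every x value by a constant.
r_stretched = pearson_r([2.5 * x for x in before], after)

print(round(r_original, 3), round(r_stretched, 3))
```

The two printed values match: a graph can be stretched to look more or less like a line, but the strength of the linear relationship is whatever it is.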

Extrapolation Can Lead You Astray

Several times, CGI applies their findings to “a typical FTSE 100 firm,” saying that the 1.8% loss in market capitalization would equate to a £120 million loss in value.

The problem is, there is nothing in the data presented which would lead one to conclude that a “typical FTSE 100 firm” sees that 1.8% drop. The FTSE 100 are the 100 companies on the London Stock Exchange with the highest market capitalization. The CGI white paper luckily includes an appendix on the methodology they used. In there, they say that they “focus on 65 ‘severe’ and ‘catastrophic’ breaches occurring since 2013 across seven global stock exchanges.” In statistics, if you extrapolate outside of the data set you actually used, then the risk of error increases dramatically. If they stayed within the data set they used, then we would say they were interpolating the data, which is much less likely to introduce errors into your analysis.

CGI could have found the average market capitalization of the companies on those seven exchanges and then found what 1.8% of that average market cap would be. This would have been interpolating from the data. But this would likely lead to a much smaller figure, and CGI would no longer be able to use that “£120 million loss in value” line, as they did several times throughout the white paper.
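The arithmetic here is simple enough to sketch. The first function works backwards from CGI’s two quoted numbers to the market capitalization they imply; the second applies the same 1.8% to the average of a sample. The sample market caps are hypothetical placeholders (in £ millions), since the real ones were not released, and the function names are my own.

```python
def implied_market_cap(loss, drop_fraction):
    """Market cap at which a given fractional drop equals the stated loss."""
    return loss / drop_fraction

def loss_from_sample(market_caps, drop_fraction):
    """Apply the drop to the average market cap of the sampled companies."""
    average_cap = sum(market_caps) / len(market_caps)
    return average_cap * drop_fraction

DROP = 0.018  # the 1.8% figure quoted by CGI

# A £120 million loss at 1.8% implies a firm worth roughly £6.7 billion.
ftse_cap = implied_market_cap(120.0, DROP)

# Hypothetical market caps in £ millions, NOT data from the report.
sample_caps = [300.0, 800.0, 1500.0, 2400.0, 5000.0]
sample_loss = loss_from_sample(sample_caps, DROP)

print(f"Implied FTSE 100 market cap: £{ftse_cap:,.0f} million")
print(f"Loss for the average sampled firm: £{sample_loss:,.0f} million")
```

With these made-up numbers the sample average is £2,000 million, so the loss comes to about £36 million. The real answer depends entirely on the companies CGI studied, which is exactly why the data matters.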

CGI does list the country of each of the ten firms which had the largest percentage drop in share price. The top two firms were both from the UK, but none of the others in the top ten were. Maybe this means that the two “worst” companies were from the FTSE 100 and the £120 million number should be higher. Or maybe those two weren’t even in the FTSE 100, and smaller companies had larger percentage effects. Since the data was not released, we cannot know the answer.

Playing with Timelines

CGI also played with timelines. They used two cases to demonstrate their idea that share price dropped.

BadStats - Cyber-Value Connection 3.png

BadStats - Cyber-Value Connection 4.png

The problem with this analysis is that the stock market is not a four-week-long process. It is quite likely that the price would have rebounded in the following weeks (for example, see this Harvard Business Review article which discusses just that phenomenon). By creating this artificial cutoff of the timeline, the results can be very misleading. That second graph also refers to a “UK communications firm” in CGI’s description of the graph. Their largest effect, where the share price fell by 15%, is also listed as being in the UK Media and Communications sector. So, it is likely that they cherry-picked their most dramatic example for creating this graph.
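To see how much the cutoff matters, here is a minimal sketch that measures the same price series at two different points after a breach. The weekly closing prices are hypothetical, chosen only to show a dip followed by a rebound; they are not taken from either of CGI’s graphs.

```python
def pct_change_since(prices, base_index, weeks_after):
    """Percent change from the price at base_index to weeks_after later."""
    base = prices[base_index]
    later = prices[base_index + weeks_after]
    return (later - base) / base * 100.0

# Hypothetical weekly closing prices, NOT data from the report.
# Index 3 is the last close before the breach becomes public.
weekly_close = [100.0, 101.0, 99.0, 100.0, 92.0, 94.0, 97.0, 99.0, 100.0, 101.0]
last_close_before_breach = 3

one_week = pct_change_since(weekly_close, last_close_before_breach, 1)
six_weeks = pct_change_since(weekly_close, last_close_before_breach, 6)

print(f"One week after:  {one_week:+.1f}%")
print(f"Six weeks after: {six_weeks:+.1f}%")
```

Stopping the graph after one week shows an 8% loss. Letting it run for six weeks shows a small gain. Same company, same made-up data, opposite story.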

CGI also plays with timelines in the next graph they show, although in a different way.

BadStats - Cyber-Value Connection 5.png

This time, they show the percent impact on a firm’s share price as being worse as time goes on. But they decided to combine 2015 and 2016 into a single entry, which makes me wonder why they would do that. One explanation would be that 2015 actually had a big jump, but then 2016 had a regression back towards where the 2014 effect was. This would not work with the FUD they’re trying to sell about things getting worse, and so they would need to combine those two years. Personally, I cannot see any other reason to make the graph this way.
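Here is a sketch of how that could work. The per-year averages and breach counts are hypothetical, invented to illustrate the scenario described above; nothing here comes from CGI’s actual figures.

```python
def pooled_mean(groups):
    """Count-weighted mean of (mean, count) pairs."""
    total = sum(mean * count for mean, count in groups)
    n = sum(count for _, count in groups)
    return total / n

def is_steadily_worsening(values):
    """True if every value is lower (worse) than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))

# Hypothetical (average % share price impact, number of breaches) per year.
by_year = {
    2013: (-1.0, 10),
    2014: (-2.0, 20),
    2015: (-4.0, 10),
    2016: (-2.0, 10),
}

separate = [by_year[year][0] for year in (2013, 2014, 2015, 2016)]
combined = [
    by_year[2013][0],
    by_year[2014][0],
    pooled_mean([by_year[2015], by_year[2016]]),
]

print(separate, is_steadily_worsening(separate))
print(combined, is_steadily_worsening(combined))
```

Shown year by year, the made-up numbers spike in 2015 and fall back in 2016. Pooled into a single 2015/2016 bar, the same numbers average to -3.0 and the chart shows a tidy, steadily worsening trend.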

Their next graph, like the line graphs of share price over time, also uses an artificial cutoff date. There is no logical reason to use an arbitrary day of the week to perform this analysis. Assuming CGI’s hypothesis is correct, why wouldn’t they compare 7 days after the breach instead of the following Friday? Or why not pick the Wednesday after the breach to focus on? There’s no real explanation for why the Friday following the breach is significant.

BadStats - Cyber-Value Connection 6.png

Data & Results Aren’t Important Enough to Talk About

Another clue that the data and the findings should be questioned is if the data and results are given short shrift in the white paper. More than half of this white paper dealt with things unrelated to the data, such as CGI devoting an entire page to a partner insurance agency selling its insurance offerings.

BadStats - Cyber-Value Connection 8.png

Internal Discrepancies

Remember the two graphs showing the share price of a given company in the weeks before and after a breach?

BadStats - Cyber-Value Connection 3

Notice that the graph starts four weeks prior to the data breach? Well, look at this from the appendix on the methodology.

To gain a realistic assessment of share price performance, the analysis tracked the subject companies’ shares in the two weeks leading up to the breach (emphasis added).

This is an example of what I would call an internal discrepancy. At the least, this implies there was a serious lack of fact-checking, editing, and quality control on the production of this white paper. At the worst, this means that they were making things up. Again, since the data was not released, in the end it is impossible to know what this means, other than it should be noted and remembered when you think about whether you should use data from this study.

Do the Conclusions Fit the Data?

Last of all, we can look at the conclusions and ask ourselves whether they fit the data we’ve been shown. CGI says that, “Overall, share values in affected companies were seen to perform less well than shares in companies that had not been affected. Furthermore, this damage is permanent: an affected company’s shares do not recover their pre-breach performance relative to the control group.”

This is a huge stretch. From their methodology section, they say that, “The analysis tracked the movement in the share price for one week following the breach incident: using this short time window eliminated the influence of any ‘noise’ from factors unrelated to the cyber breach affecting the share price.” That a share price did not recover in one week has absolutely no informational value in trying to determine whether the company’s shares recover or not. Share prices go down for a week all the time, for all sorts of reasons. To try to say that just because their price went down for one week means their “shares do not recover” is somewhere on the spectrum between misleading, wrong, and idiotic.
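How often is “all the time”? A rough simulation gives a feel for it. The sketch below assumes a stock with no trend at all and daily moves drawn from a normal distribution with a 1% standard deviation. Both of those are simplifying assumptions of mine, not properties of any real stock or anything from CGI’s report.

```python
import random

def fraction_of_down_weeks(num_weeks, daily_vol=0.01, seed=0):
    """Share of simulated five-day weeks that end lower than they started."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    down = 0
    for _ in range(num_weeks):
        weekly_growth = 1.0
        for _ in range(5):  # five trading days per week
            weekly_growth *= 1.0 + rng.gauss(0.0, daily_vol)
        if weekly_growth < 1.0:
            down += 1
    return down / num_weeks

share_down = fraction_of_down_weeks(2000)
print(f"{share_down:.0%} of simulated weeks ended lower, with no breach at all")
```

Under those assumptions, roughly half of all weeks end lower than they began. A single down week tells you very little on its own, and nothing about permanence.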