Crowdsourcing a cyber risk model

At Open Risk we are very fond of what digital technology can do to improve financial risk management (and thus, actually, finance). We are also keenly aware of the new and material risks that new technologies carry with them. Whether the concern is algorithmic bias or any of the diverse new IT risks, the so-called fintech (r)evolution is likely to be stillborn unless those emerging risks are managed effectively and demonstrably.

The overall challenge is enormous and will profitably occupy many professionals for many years to come. Our thesis has been that whether fintech simply “rediscovers” age-old risk management issues or creates brave new ones, collaboration, transparency and standards are a key part of the solution. In the past few years we have been busy translating that vision into tangible tools.

As another small step in this direction we want to put together and support a timely and interesting crowdsourced project that will further add to a forward-looking risk management toolkit.

March is Data Breach month

During March we want to explore how far we can go with crowdsourcing an open source knowledge base for understanding and potentially measuring cyber risk and, more specifically, the risk of digital data breaches.


There are three primary deliverables:

  1. Additional / improved entries at the Open Risk Manual around the definition and nature of IT Risk (and data breaches in particular)
  2. Cleansed / organized and interpreted risk event data sets providing historical context around this risk type
  3. An open source library for estimating the profile of related risk events

Main work areas

Roughly speaking, besides the overall definition and framing that is done in this blog post, the work can be split into four segments, each documented in a separate section as indicated below:

  1. Identifying Data and Assessing Data Quality
  2. Exploratory Data Analysis to obtain insights
  3. Model Development (if supported by sufficient data)
  4. Validation and Assessment (subject to 3)

How to get involved

Depending on your background, knowledge and available time, you can join in any or all of those sub-projects. The following resources are available:

  • This set of five live blog posts documenting the progress of the project and providing further links
  • An open source GitHub repository for collecting data and/or code (a GitHub account is required to contribute)
  • A Discord server for chat / voice communication to facilitate coordination (a Discord account is required to join)
  • The Open Risk Manual for article entries (a wiki account is required to contribute)

Background and Getting Started

In this section we collect various resources that can help as background reading for getting up to speed with the project:

  • IT Risk and Cyber Risk Definitions
  • The Data Breaches Event Category
  • Data Sources
  • Data Quality Process
  • Statistical Analysis
  • Modelling Options
  • Validation and Assessment

IT Risk and Cyber Risk Definitions

In this section we capture relevant definitions of the risk we want to model. Defining risks precisely is not an easy task, as they depend a lot on context. One useful tool that helps frame the concept is a risk taxonomy. This is usually a tree structure, whereby risks higher in the hierarchy are decomposed into more specific (granular) manifestations. The most useful for our purposes is the IT risk taxonomy developed by the European Banking Authority in its Final Guidelines on ICT Risk Assessment.

Of the five subtypes of that taxonomy (Availability, Security, Change, Data Integrity, Outsourcing), IT Security Risk is the most relevant for our purposes, keeping in mind that large-scale incidents may have dependencies on, or implications for, the other subtypes as well.

IT Security Risk can be further subdivided along a wide range of possible dimensions: for example, whether the threat acts in a digital manner or on physical assets, whether it is external or internal to the systems being attacked, and which technique is being used (malware, hacking, social engineering, misuse).
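To make the tree structure concrete, here is a minimal sketch of how such a taxonomy could be held in code. The five top-level subtypes are the ones listed above. Hanging the attack techniques directly under Security is our own simplification for illustration (technique is only one of several possible dimensions), and the `RiskNode` class is a hypothetical helper, not part of any existing library.

```python
from dataclasses import dataclass, field


@dataclass
class RiskNode:
    """One node of a risk taxonomy tree (hypothetical helper class)."""
    name: str
    children: list["RiskNode"] = field(default_factory=list)

    def paths(self, prefix=()):
        """Yield every root-to-leaf path as a tuple of node names."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


# Top-level subtypes as listed in the text. The technique split under
# "Security" is an illustrative assumption, not the official taxonomy.
it_risk = RiskNode("IT Risk", [
    RiskNode("Availability"),
    RiskNode("Security", [
        RiskNode("Malware"),
        RiskNode("Hacking"),
        RiskNode("Social engineering"),
        RiskNode("Misuse"),
    ]),
    RiskNode("Change"),
    RiskNode("Data Integrity"),
    RiskNode("Outsourcing"),
])

# Print each leaf together with its position in the hierarchy.
for path in it_risk.paths():
    print(" > ".join(path))
```

A different dimension (for example external versus internal threats) would simply produce a different set of children under the same Security node.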

The Data Breaches Event Category

A data breach is the intentional or unintentional release of secure or private/confidential information to an untrusted environment. The impact of a data breach is fully captured in terms of the so-called Parkerian Hexad:

  • Confidentiality: violating the limits on who can see what kind of information
  • Possession or Control: changing the set of parties who can act on information
  • Integrity: changing the correctness or consistency of data
  • Authenticity: affecting the veracity of the claim of origin or authorship of the information
  • Availability: preventing timely access to information
  • Utility: affecting the usefulness of data

The definition of a data breach shows yet another possible classification of IT security risk, this time with a focus on the impact of risk events. As can be seen from the number of digital asset attributes enumerated by the Parkerian Hexad, the impact can be felt in a wide variety of ways. In particular, the conversion of impact into a one-dimensional measure (such as monetary value / loss) may not always be easy.
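One way to see why a single number struggles to summarize impact is to tag each event with the set of Hexad attributes it affects. The sketch below is only an illustration: the `Hexad` flag type and the `affected` helper are hypothetical names, and the example event is made up.

```python
from enum import Flag, auto


class Hexad(Flag):
    """The six Parkerian Hexad attributes as combinable flags."""
    CONFIDENTIALITY = auto()
    POSSESSION = auto()
    INTEGRITY = auto()
    AUTHENTICITY = auto()
    AVAILABILITY = auto()
    UTILITY = auto()


def affected(impact):
    """Return the names of the attributes set in `impact`, in definition order."""
    return [attr.name for attr in Hexad if attr in impact]


# Made-up example: a stolen laptop with an encrypted disk. Control over the
# data is lost and it is no longer available, but it has not been disclosed.
stolen_laptop = Hexad.POSSESSION | Hexad.AVAILABILITY
print(affected(stolen_laptop))
```

Two events with very different attribute sets may still end up with similar monetary loss estimates, which is part of what makes the conversion to a one-dimensional measure lossy.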

Data Sources

We focus primarily on data sources for significant, recorded (historical), published data breaches (not general vulnerabilities, incidents or threat monitoring), but we list all public resources.

Data Quality

  • Data quality frameworks
  • Address the five levels of DQ analysis
  • Produce a golden source data set

Exploratory Data Analysis

  • Data facets
  • Statistical measures
  • Univariate analysis
  • Multivariate analysis

Model Development

  • Model Categories
  • Frequency Models
  • Severity Models
  • Putting it all together
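As a rough idea of what "putting it all together" could look like, the sketch below combines a frequency model and a severity model by Monte Carlo simulation. The project has not chosen any model yet: the Poisson frequency, the lognormal severity and all parameter values here are textbook placeholders picked purely for illustration, not estimates from data.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's multiplication method (suits small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_losses(n_years, lam, mu, sigma, seed=0):
    """Simulate total breach loss per year: Poisson event count, lognormal loss per event."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        n_events = sample_poisson(rng, lam)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return totals


def quantile(sorted_values, q):
    """Simple empirical quantile of an already sorted list."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


# Placeholder parameters (assumptions): 2 breaches per year on average,
# lognormal severity with mu=10 and sigma=1.5.
losses = sorted(simulate_annual_losses(10_000, lam=2.0, mu=10.0, sigma=1.5))
print("mean annual loss:", sum(losses) / len(losses))
print("99% quantile:", quantile(losses, 0.99))
```

Whether such a model is appropriate at all depends on what the data quality and exploratory steps reveal, which is why model development is conditional on sufficient data.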

Model Validation

  • Validation of implementation
  • Validation of model components
  • Qualitative validation
  • Recommendations
