Tag Archives: data engineering

Data Engineering for Risk Analytics with Risk Data Open Standard

This article was originally published by DZone

What Is Risk Analytics?

The picture below on the left shows the extensive flooding at industrial parks north of Bangkok, Thailand. Western Digital had 60 percent of its total hard drive production coming from the country – floods disrupted production facilities at multiple sites to dramatically affect a major, global supply chain. And the picture on the right – showing flooding on the New York Subway from Hurricane Sandy, caused widespread disruption and nearly US$70 billion of losses across the northeastern U.S.

In both examples, the analysis of risk should not only help with physical protection measures such as stronger buildings through improved building codes or better defenses, but also the protection available through financial recovery. Providing financial protection is the job of the financial services and insurance industries. Improving our understanding of and practices in risk analytics as a field is one of the most interesting problems in big data these days, given the increasing set of risks we have to watch for.

Flooding at industrial parks north of Bangkok, Thailand in 2011 (left) and flooded subway stations in New York after Hurricane Sandy in 2012 (right) Image credit: Wikimedia/Flickr

How Does Risk Analytics Work?

Obviously, the risk landscape is vast. It stretches from “natural” events – such as severe hurricanes and typhoons, to earthquakes to “human-generated” disasters, such as cyberattacks, terrorism and so on.

The initial steps of risk analytics start with understanding the exposure – this is the risks a given asset, individual etc. are exposed to. Understanding exposure means detailing events that lead to damage and the related losses that could result from those events. Formulas get more complicated from here. There is a busy highway of data surrounding this field. Data engineers, data scientists, and others involved in risk analytics work to predict, model, select, and price risk to calculate how to provide effective protection.

Data Engineering for Risk Analytics

Let’s look at property-focused risks. In this instance, risk analytics starts with an understanding of how a property – such as a commercial or a residential building, is exposed to risk. The kind of events that could pose a risk and the associated losses that could result from those events depends on many variables.

The problem is that in today’s enterprise, if you want to work with exposure data, you have to work with multiple siloed systems that have their own data formats and representations. These systems do not speak the same language. For a user to get a complete picture, they need to go across these systems and constantly translate and transform data between them. As a data engineer, how do you provide a unified view of data across all systems? For instance, how can you enable a risk analyst to understand all kinds of perils – from a hurricane, a hailstorm to storm surge, and then roll this all up so you can guarantee the coverage on these losses?

There are also a number of standards used by the insurance industry to integrate, transfer, and exchange this type of information. The most popular of these formats is the Exposure Data Model (EDM). However, EDMs and some of their less popular counterparts (Catastrophe Exposure Database Exchange – CEDE, and Open Exposure Data – OED) have not aged well and have not kept up with the industry needs:

  • These older standards are property centric; risk analytics requires an accommodation and understanding of new risks, such as cyberattacks, liability risks, and supply chain risk.
  • These older standards are propriety-designed for single systems that do not take into account the needs of various systems, for example, they can’t support new predictive risk models.
  • These standards don’t come with the right containment to represent high fidelity data portability – the exposure data formats do not usually represent losses, reference data, and settings used to produce the loss information that can allow for data integrity.
  • These standards are not extensible. Versioning and dependencies on specific product formats (such as database formats specific to version X of SQL Server etc) constantly make data portability harder.

This creates a huge data engineering challenge. If you can’t exchange information with high fidelity, forget getting reliable insights. As anyone dealing with data will say: garbage in, garbage out!

For any data engineer dealing with risk analytics, there is great news. There is a new open standard that is designed to remove shortcomings of the EDM and other similar formats. This new standard has been in the works for several years. It is the Risk Data Open Standard. The Risk Data Open Standard (RDOS) is designed to simplify data engineering. It is designed to simplify integrating data between systems that deal with exposure and loss data. It isn’t just RMS working to invent and validate this standard in isolation. A steering committee of thought leaders from influential companies is working on validating the Risk Data OS.

The Risk Data OS will allow us to work on risk analytics much more effectively. This is the way we can better understand the type of protection we need to create to help mitigate against climate change and other natural or human-made disasters. You can find details on the Risk Data OS here. If you are interested in the Risk Data OS, have feedback, or would like to help us define this standard, you can email the Risk Data OS steering committee by clicking here .