Data Engineering for Risk Analytics with Risk Data Open Standard
Cihan BiyikogluDecember 10, 2019
What Is Risk Analytics?
The picture below on the left shows the extensive flooding at industrial parks north of Bangkok, Thailand. Western Digital had 60 percent of its total hard drive production coming from the country – floods disrupted production facilities at multiple sites to dramatically affect a major, global supply chain. And the picture on the right – showing flooding on the New York Subway from Hurricane Sandy, caused widespread disruption and nearly US$70 billion of losses across the northeastern U.S.
In both examples, the analysis of risk should not only help with physical protection measures such as stronger buildings through improved building codes or better defenses, but also the protection available through financial recovery. Providing financial protection is the job of the financial services and insurance industries. Improving our understanding of and practices in risk analytics as a field is one of the most interesting problems in big data these days, given the increasing set of risks we have to watch for.
How Does Risk Analytics Work?
Obviously, the risk landscape is vast. It stretches from “natural” events – such as severe hurricanes and typhoons, to earthquakes to “human-generated” disasters, such as cyberattacks, terrorism and so on.
The initial steps of risk analytics start with understanding the exposure – this is the risks a given asset, individual etc. are exposed to. Understanding exposure means detailing events that lead to damage and the related losses that could result from those events. Formulas get more complicated from here. There is a busy highway of data surrounding this field. Data engineers, data scientists, and others involved in risk analytics work to predict, model, select, and price risk to calculate how to provide effective protection.
Data Engineering for Risk Analytics
Let’s look at property-focused risks. In this instance, risk analytics starts with an understanding of how a property – such as a commercial or a residential building, is exposed to risk. The kind of events that could pose a risk and the associated losses that could result from those events depends on many variables.
The problem is that in today’s enterprise, if you want to work with exposure data, you have to work with multiple siloed systems that have their own data formats and representations. These systems do not speak the same language. For a user to get a complete picture, they need to go across these systems and constantly translate and transform data between them. As a data engineer, how do you provide a unified view of data across all systems? For instance, how can you enable a risk analyst to understand all kinds of perils – from a hurricane, a hailstorm to storm surge, and then roll this all up so you can guarantee the coverage on these losses?
There are also a number of standards used by the insurance industry to integrate, transfer, and exchange this type of information. The most popular of these formats is the Exposure Data Model (EDM). However, EDMs and some of their less popular counterparts (Catastrophe Exposure Database Exchange – CEDE, and Open Exposure Data – OED) have not aged well and have not kept up with the industry needs:
These older standards are property centric; risk analytics requires an accommodation and understanding of new risks, such as cyberattacks, liability risks, and supply chain risk.
These older standards are propriety-designed for single systems that do not take into account the needs of various systems, for example, they can’t support new predictive risk models.
These standards don’t come with the right containment to represent high fidelity data portability – the exposure data formats do not usually represent losses, reference data, and settings used to produce the loss information that can allow for data integrity.
These standards are not extensible. Versioning and dependencies on specific product formats (such as database formats specific to version X of SQL Server etc) constantly make data portability harder.
This creates a huge data engineering challenge. If you can’t exchange information with high fidelity, forget getting reliable insights. As anyone dealing with data will say: garbage in, garbage out!
For any data engineer dealing with risk analytics, there is great news. There is a new open standard that is designed to remove shortcomings of the EDM and other similar formats. This new standard has been in the works for several years. It is the Risk Data Open Standard. The Risk Data Open Standard (RDOS) is designed to simplify data engineering. It is designed to simplify integrating data between systems that deal with exposure and loss data. It isn’t just RMS working to invent and validate this standard in isolation. A steering committee of thought leaders from influential companies is working on validating the Risk Data OS.
The Risk Data OS will allow us to work on risk analytics much more effectively. This is the way we can better understand the type of protection we need to create to help mitigate against climate change and other natural or human-made disasters. You can find details on the Risk Data OS here. If you are interested in the Risk Data OS, have feedback, or would like to help us define this standard, you can email the Risk Data OS steering committee by clicking here .
Cihan Biyikoglu is the Executive Vice President, Product for RMS, responsible for product management across the full suite of RMS models and risk management tools. He has extensive experience in leading product management for innovative machine learning and big data analytics solutions at Fortune 500 companies over the last 20 years.
As a former Vice President of Product at Databricks and Redis Labs, Cihan both developed the product strategy and road map for open-source technologies such as Apache Spark and Redis and respective enterprise offerings in the public and private cloud platforms.
Cihan also worked on products at Microsoft, Couchbase, and Twitter, where he focused on on-premises and cloud offerings in the data and analytics space. At Microsoft, Cihan focused on the incubation of the Azure Cloud Platform in its early days and the SQL Server product line, both of which have grown to multi-billion-dollar businesses for Microsoft.
Cihan holds a number of patents in the data management and analytics space, and he has a master’s degree in database systems and a bachelor’s degree in computer engineering.