Geocoding: The Underappreciated Science of Catastrophe Modeling

“Geocoding” is the process of assigning geographic coordinates to the descriptive address information of portfolio locations, so that hazard can be assessed at each location. It is a core step within the catastrophe modeling process, ranging from a single location assessment to bulk look-ups for an entire portfolio.

Despite advances in peril mapping and geocoding technology, the geocoding process often remains a challenge. Here we consider these challenges and how the quality of the data can be improved.

Is the Value of High Quality Geocoding Fully Understood?

Many parts of the world are moving towards finer resolutions of data, but as an industry, how well do we understand the quality and sensitivity of this data?

Figure 1: Illustration shows how improving geocoding resolution can increase the accuracy of risk assessment. Source: RMS U.K. Flood Model outline

At a time of heightened introspection about the uses of exposure data and associated modeling outputs, we consider these challenges and how they can best be overcome.

When is High Resolution Geocoding Most Needed?

There are numerous instances when high geographic resolution is essential, for example when modeling a portfolio of risks with large sums insured, where the accumulation of risk across multiple assureds is important to consider.

For instance, when insuring high value assets such as industrial facilities, to derive an accurate output it is essential to separately capture the risk from multiple buildings. To get this level of granularity, it is essential to have high resolution data and geocoding.

High resolution peril modeling is crucial for perils that are localized in extent, such as flood, earthquake, or fire, where it is important to assess exactly where the insured asset is located.

Frequent Changes to High Resolution Geocoding Data

However, high resolution geocoding is subject to more frequent changes in the underlying geographic information over time. Street networks and postal code or other administrative boundaries can change annually or even quarterly, meaning multiple versions of the same geographic location need to be stored and models require periodic updates. Accurately capturing geographic boundaries or locating a precise building address therefore requires the most up-to-date geographic data available in the market.

High resolution geocoding information tends to change more often due to population increases (or decreases), new construction developments, or politically led boundary changes. By contrast, coarser administrative areas (e.g., CRESTA zones, states, provinces, and regions) have fewer boundary changes and are therefore updated much less frequently.

Figure 2: Increasing the analysis resolution – changes to Colonia level (Admin 3) geocoding for Mexico, RMS Version 16 (left) and RMS Version 17 (right) due to enhanced geographic coverage. In Version 17, there is broader coverage, which will lead to more successful matches at the Colonia level, if data is captured at this resolution.

How to Evaluate Whether High Resolution Equals Better Data

Evaluating the impact of geographic data changes on modeled loss is best practice for determining whether such an update is warranted. The longer the interval between updates, the larger the change in loss that can be expected for a given location, account, or portfolio. If geocoding data is updated every one to three years, loss changes typically range from a fraction of a percent to less than five percent. Beyond three years, it is not uncommon to see loss changes greater than five, ten, or even twenty percent in some cases.
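As a rough sketch of such an evaluation (function names and loss figures are illustrative, not model output), the check can be as simple as computing the percentage change in modeled loss between geocoding vintages and flagging locations that move beyond a chosen tolerance:

```python
def loss_change_pct(loss_old, loss_new):
    """Percentage change in modeled loss after a geocoding data update."""
    return 100.0 * (loss_new - loss_old) / loss_old

def flag_material_changes(losses_old, losses_new, tolerance_pct=5.0):
    """Return indices of locations whose loss moved more than the tolerance."""
    return [i for i, (old, new) in enumerate(zip(losses_old, losses_new))
            if abs(loss_change_pct(old, new)) > tolerance_pct]

# Hypothetical per-location modeled losses before and after an update
old = [100.0, 250.0, 80.0]
new = [102.0, 310.0, 79.0]
print(flag_material_changes(old, new))  # only the second location moves > 5%
```

Locations flagged this way are candidates for closer review before the geocoding update is adopted.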

When assessing geocoding quality, is there too much focus on achieving the highest resolution possible? In seeking to pinpoint each risk in a portfolio, there is a trade-off between achieving a high rate of completeness and maintaining accuracy.

“Completeness” in this context indicates the proportion of the portfolio of risks that are geocoded to a high resolution. To increase the robustness of any portfolio analysis, there is a perceived need to geocode as many locations as possible at the highest resolution.

However, this can sometimes be achieved at the expense of “accuracy”: how close the modeled coordinates are to a location's actual coordinates. Inaccurate geocoding renders any analysis of such data largely redundant.
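The completeness metric described above can be sketched as follows; the per-location resolution field and level names are hypothetical, chosen only to illustrate the calculation:

```python
def completeness(locations, high_res_levels=("coordinate", "building")):
    """Share of portfolio locations geocoded at a high resolution level.

    `locations` is a list of dicts with a "level" key, e.g.
    {"level": "coordinate"} or {"level": "postcode"} (hypothetical schema).
    """
    matched = sum(1 for loc in locations if loc["level"] in high_res_levels)
    return matched / len(locations)

# Hypothetical portfolio: three of four risks resolve to a high resolution
portfolio = [
    {"level": "coordinate"},
    {"level": "building"},
    {"level": "postcode"},
    {"level": "coordinate"},
]
print(completeness(portfolio))  # 0.75
```

A high completeness score on its own says nothing about accuracy, which is why the two must be assessed together.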

Figure 3: Theoretical yet commonly observed example showing how an increase in geocoding “completeness” in moving to point B is typically at the expense of “accuracy”

Comparing the Accuracy of Geocoding Between Two or More Providers

Obtaining latitude-longitude coordinates for the same locations from two providers enables a direct comparison. Via some simple coordinate geometry, the distance between each pair of points can be calculated and the cumulative frequency of distances between providers assessed.
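As a minimal illustration of this coordinate geometry (function names are illustrative), the great-circle distance between each pair of provider coordinates can be computed with the haversine formula, and the distance at a chosen cumulative-frequency percentile read off from the sorted results:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def distance_at_percentile(coords_a, coords_b, pct=90):
    """Distance (m) below which `pct` percent of paired locations fall.

    `coords_a` and `coords_b` are equal-length lists of (lat, lon) tuples,
    one per portfolio location, from geocoding providers A and B.
    """
    dists = sorted(haversine_m(a[0], a[1], b[0], b[1])
                   for a, b in zip(coords_a, coords_b))
    idx = max(0, math.ceil(pct / 100 * len(dists)) - 1)
    return dists[idx]
```

In the example shown in Figure 4, this percentile calculation would return roughly 3,000 meters at the 90th percentile.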

Figure 4: Evaluating the distance between coordinates from providers A and B shows in this example that 90 percent of locations have coordinates within 3,000 meters of one another.

The locations with the greatest distances between providers, and/or the highest level of risk associated with them, can then be validated against local insight or alternative providers to gauge which geocoder provides more accurate coordinates.

In summary, the key to a robust approach to geocoding is to:

  • Use the highest level of geocoding possible: Especially for portfolios with high sums insured that are exposed to perils with high hazard gradients.
  • Use geocoding providers that regularly update detailed geocoding data: To consider changes in boundaries and to be able to readily quantify the impact on loss assessments that this makes.
  • Use intelligent algorithms: That can read address data in multiple languages and account for text being misspelt or jumbled with numbers.
  • Be mindful of the trade-off between geocoding completeness and accuracy: Any high resolution geocode should not be achieved by compromising the accuracy of the original data.

Chris Sams is a senior product manager for geocoding at RMS.

Tim Edwards is a regional director and head of catastrophe analytics Europe at Willis Re. Tim leads Willis Re's European catastrophe modeling team, holds a bachelor's degree in Economics, and is ACII qualified. He has been at Willis for seven years and leads analytical initiatives covering the Willis View of Risk, model evaluation, emerging risks, and portfolio optimization.
