# Connecting the Dots: Concentration, diversity, inequality and sparsity in economic networks

## Concentration, diversity, inequality and sparsity in the context of economic networks

In this second Open Risk White Paper on "Connecting the Dots" we examine measures of concentration, diversity, inequality and sparsity in the context of economic systems represented as network (graph) structures. We adopt a stylized description of economies as property graphs and illustrate how the relevant concepts can be represented in this language. We explore in some detail the data types that represent economic network data and their statistical nature, which is critical to their use in concentration analysis. We then recast various known indexes drawn from distinct disciplines in a unified computational context.

## Motivation

Economic networks of interconnected agents engaging in the exchange of goods, services and contracts of various forms are the defining attribute of human economies. While this is intuitively obvious, the quantification and analysis of economic networks has not played an important role in the historical development of economic and financial theory and practice. Yet in recent times, supported by developments in information technology, both academics and practitioners have started to connect the dots by analyzing economic phenomena with the tools of graph theory and network analysis. Such developments help create a more detailed understanding of the structures and interactions between economic agents, ultimately guiding towards better policies and risk management.

Simply put, what underlies the measurement of concentration (or diversity, sparsity or inequality) is the assessment of the degree to which a particular property or set of properties is distributed across an extended system. In a previous paper (OpenRiskWP01_032705) we reviewed the definitions of widely used concentration metrics such as the concentration ratio, the Herfindahl-Hirschman Index (HHI) and the Gini index, and clarified their meaning and relationships in a probabilistic context. This analytic framework helped clarify the apparent arbitrariness of simple concentration indexes and brought to the fore the unifying concepts behind these metrics, thereby enabling their more informed use in portfolio and risk management applications. Expanding the scope of concentration risk analysis, we here ask the question: how do the objectives and tools of concentration risk analysis translate to the context of economic networks?
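As a reminder of what these three metrics compute, the following is a minimal sketch using their textbook definitions on a vector of non-negative exposures. The function names and the example values are illustrative assumptions and are not taken from the white paper.

```python
def shares(values):
    """Normalize non-negative values so that they sum to one."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def concentration_ratio(values, k):
    """Sum of the k largest shares (CR-k)."""
    return sum(sorted(shares(values), reverse=True)[:k])


def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared shares."""
    return sum(s * s for s in shares(values))


def gini(values):
    """Gini index computed from the values sorted in ascending order."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # Rank-weighted sum, with ranks starting at 1 for the smallest value
    weighted = sum(rank * v for rank, v in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical portfolio of four exposures
exposures = [40.0, 30.0, 20.0, 10.0]
cr2 = concentration_ratio(exposures, 2)
h = hhi(exposures)
g = gini(exposures)
```

For this hypothetical portfolio the three values come out at roughly 0.70, 0.30 and 0.25, while a portfolio of four equal exposures gives an HHI of 0.25 and a Gini index of zero.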

The question broadens the scope of what we consider relevant input data and analytic tools, so it is useful to review the diverse approaches to (broadly defined) concentration measurement that have been developed in various disciplines over the past decades:

• Inequality Measures: Inequality indexes are constructed and studied in sociology, economics and policy work. This is maybe the most developed domain from a theory perspective as it incorporates utility preferences and selects suitable functional forms on the basis of rigorous axiomatic frameworks. On the other hand the focus is on important but rather specific numerical variables such as income and wealth distributions.
• Concentration Risk Measures: Concentration enters in various areas of finance and economics, for example when assessing industrial and market share concentration, market risk or credit portfolio risk concentration. A wide range of approaches is employed in practice: from simple numerical measures to sophisticated simulation-based risk measures that are only computable through Monte Carlo simulations. This integration of concentration metrics with modeled risk measures is rather unique to this domain.
• Sparsity Measures: In signal processing sparsity means that a small number of (spectral) coefficients contains a large proportion of the energy. The concept of sparsity is very flexible and is also applicable in machine learning. This is possibly the most context-agnostic application domain, focusing on an information theoretic assessment without additional constraints or insights.
• Diversity Measures: These are common tools in ecology when assessing biodiversity. Diversity indexes focus on species abundances (thus mainly categorical data) instead of the numerical data that are more common in inequality and concentration analysis. A unique aspect of biodiversity measures is their attention to multi-scale considerations.
• Spatial Concentration: In various domains utilizing geospatial data there is a need for indexes that express spatial concentration. Here the concentration measurement is intrinsically multi-dimensional. This introduces an expanded toolkit which aims, among other things, to identify spatially close entities.
• Clustering and Centrality Measures: In network theory a distinct category of metrics aims to characterise the clustering of network connections in a graph. This domain, too, requires highly specialized tools to extract usable information from graph structures.
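To illustrate how categorical data enter such measures, the sketch below computes three standard diversity quantities from a list of category labels: richness (the number of distinct categories), Shannon entropy and the Simpson index. The labels are hypothetical, and the choice of these three indexes is an illustrative assumption rather than a selection made in the white paper.

```python
import math
from collections import Counter


def relative_abundances(labels):
    """Map each category to its share of the population."""
    counts = Counter(labels)
    population = sum(counts.values())
    return {category: count / population for category, count in counts.items()}


def richness(labels):
    """Number of distinct categories observed."""
    return len(set(labels))


def shannon_entropy(labels):
    """Shannon entropy (natural logarithm) of the relative abundances."""
    return -sum(p * math.log(p) for p in relative_abundances(labels).values())


def simpson_index(labels):
    """Sum of squared relative abundances (same functional form as the HHI)."""
    return sum(p * p for p in relative_abundances(labels).values())


# Hypothetical population of four agents labelled by sector
sectors = ["bank", "bank", "insurer", "insurer"]
```

With two equally abundant categories, the Shannon entropy equals ln 2 and the Simpson index equals 0.5. Note that the Simpson index applied to category shares has the same form as the HHI applied to numerical shares.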

To see how these diverse viewpoints can contribute to answering our main question, we first develop a bit further the quantitative framework that was introduced in (OpenRiskWP08_131219). As a reminder, the approach stylizes the description of economic networks as contractual relationships between agents, described as a property graph.

## Economic Network Data Classification

The precise nature of the node and edge attributes used to represent economic networks is open-ended and left to the analyst to design as required. The structure enables abstractions that represent many of the diverse forms of human economic activity, different types of transactions, contracts, accounting approaches etc. In general, economic agents might be mapped into nodes (but not every node need be an economic agent). Other important entities or concepts may also be usefully considered as a network node - ultimately nodes are book-keeping devices. For example, an asset may be considered a node property or a standalone node with its own attributes, depending on the fidelity required. Transactions, contracts or other relations between nodes will be mapped into edges (links). More concretely, for our current purposes of integrating the diverse range of concentration indexes we will idealize the following:

• Individual economic agents, both physical persons and purely legal entities, will be thought of as nodes of the network.
• Nodes carry individual properties, drawn from a vast variety of types, that are associated with each agent. They represent, for example, ownership of assets including cash / money.
• Agents may engage in exchanges (property transfers, service provision etc.) that are abstracted as transactions.
• Agents may also enter into contracts (which basically govern future transactions). Contracts may have finite duration (such as debt) or perpetual nature (such as equity).

For our purposes we will assume that the economic network expressed as a property graph contains all the information of interest in the set $(\mathbf{x}^p_n, \mathbf{y}^q_m, A^q)$, where $\mathbf{x}^p_n$ is a dataframe characterising nodes, $\mathbf{y}^q_m$ a dataframe characterising edges and $A^q$ various types of adjacency matrices capturing the network connectivity (or topology). Some important considerations when characterising such input data with a view towards creating concentration measures are as follows:

• Intrinsic data types and in particular numerical $w$ versus categorical $c$ data
• The special case of spatio-temporal numerical variables
• Extensive versus Intensive Variables
• Stock versus Flow Variables
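The data structures above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the attribute names (`sector`, `assets`, `amount`), the example agents and the reading of the index $q$ as a contract type are all hypothetical choices made for the example, and the node and edge "dataframes" are represented simply as lists of records.

```python
# Node table x: one record per agent, mixing categorical and numerical properties
nodes = [
    {"id": "A", "sector": "bank", "assets": 120.0},
    {"id": "B", "sector": "bank", "assets": 60.0},
    {"id": "C", "sector": "corporate", "assets": 20.0},
]

# Edge table y: one record per contract between two agents
edges = [
    {"source": "A", "target": "B", "type": "debt", "amount": 30.0},
    {"source": "A", "target": "C", "type": "debt", "amount": 10.0},
    {"source": "B", "target": "C", "type": "equity", "amount": 5.0},
]


def adjacency(nodes, edges, edge_type, weight=None):
    """Build one adjacency matrix for a given edge type.

    Entries are edge counts when weight is None, otherwise sums of the
    named numerical edge attribute.
    """
    index = {node["id"]: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for edge in edges:
        if edge["type"] != edge_type:
            continue
        i, j = index[edge["source"]], index[edge["target"]]
        matrix[i][j] += edge[weight] if weight else 1.0
    return matrix


# One adjacency matrix per contract type
A_debt = adjacency(nodes, edges, "debt", weight="amount")
A_equity = adjacency(nodes, edges, "equity")

# A numerical vector w and a categorical vector c extracted from the node table
w = [node["assets"] for node in nodes]
c = [node["sector"] for node in nodes]
```

In this sketch `assets` plays the role of a numerical stock variable and `sector` that of a categorical variable, so the same node table can feed both numerical concentration indexes and categorical diversity indexes.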

## Index Catalog

The index catalog is a list of commonly used metrics across a variety of domains, cast in a uniform notation and with reference to the network data structures. The list is not exhaustive but aims to cover the major families. The popular indexes can be classified according to various attributes.

• General Purpose Indexes using any one of the $w, c$ data vectors. This category forms the majority of common indexes. In this use case, some property of the system is studied in isolation from other properties and/or the network topology. The most developed theory and practice of concentration indexes concerns the one-dimensional case, but some applications require higher-dimensional considerations.
• Diversity Indexes are distinct as they use exclusively categorical $c$ vector data. While their form is quite close to general purpose indexes, they are special in that both the population size $N$ and the number of categories $S$ may be relevant and used in the construction of the index.
• Temporal Clustering, which is a special case of univariate numerical analysis.
• Multivariate Concentration Indexes, where two or more properties from the $(w, c)$ set are studied jointly. This segment includes Spatial Concentration Indexes, which capture the density of objects (or object properties) in two- or three-dimensional space.
• Network Concentration Indexes that are calculated using the adjacency matrices $A$.
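As an illustration of the last category, the sketch below derives node strengths (row and column sums) from an adjacency matrix and applies an HHI-style sum of squared shares to them. This particular construction is an assumption chosen for simplicity and is not necessarily one of the indexes listed in the catalog.

```python
def out_strength(adj):
    """Row sums: total weight of the edges leaving each node."""
    return [sum(row) for row in adj]


def in_strength(adj):
    """Column sums: total weight of the edges arriving at each node."""
    return [sum(column) for column in zip(*adj)]


def strength_concentration(strengths):
    """HHI-style index: sum of squared shares of total strength."""
    total = sum(strengths)
    if total <= 0:
        raise ValueError("the network has no edges")
    return sum((s / total) ** 2 for s in strengths)


# Hypothetical star network: node 0 links to every other node
star = [
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical ring network: each node links to the next one
ring = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

star_out = strength_concentration(out_strength(star))
star_in = strength_concentration(in_strength(star))
ring_out = strength_concentration(out_strength(ring))
```

In the star network all outgoing links originate from a single node, so the out-strength index reaches its maximum of 1, while incoming links are spread evenly over three nodes and give 1/3. The ring network spreads outgoing links evenly and also gives 1/3.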

The catalog includes around 40 different indexes, which are listed in the white paper.