# Input-Output Models as Graph Networks

We discuss the relation of economic input-output models with graph theory and networks

## Motivation

The economy is a complex tangle of various agents that interact via transactions (sales and purchases) and contracts (lending, investing). In recent times, more and more techniques from **graph theory** and **network science** have been brought to bear on economic analysis. On the other hand, ever since the seminal contributions of Leontief, **Input-Output Models** (IO) have been widely used to describe economic *relationships* between economic actors (e.g. industrial sectors) operating within and between regions. IO models are central tools of Input-Output Analysis, which aims to answer fundamental economic questions such as: what level of production output is required from industrial sectors (interacting in complex supply chains) if there is a specific change in demand by consumers?

How are IO models related to Graph Theory? In this post we explore this relationship in some detail and show that, both conceptually and practically, it is a connection worth being aware of.

## Elements of Graph Theory

Let us first start with the minimal amount of graph theory that we will need to explore the relation with economic IO tables. The definition of **mathematical graphs** goes back to 18th-century mathematics, the work of *Leonhard Euler* and the famous Seven Bridges of Königsberg problem.

Graphs are *extremely* versatile objects. Nowadays, they are used in many areas of both abstract and applied mathematics, computer science, physics, biology and other scientific fields. In a previous post we enumerated no less than nine distinct ways in which graphs play an essential role!

In this post we narrow our focus to a specific graph subcategory, namely graphs that:

- are *directed* (all edges have a defined direction from one node to another)
- allow *self loops* (a node can have an edge to itself)
- have at most one edge between any pair of nodes (*simple* graphs, as opposed to multigraphs)
- have a single number associated with every edge (*weighted* graphs)

**Definition**: A *mathematical graph* is the tuple $G = (V, E, \partial^{+}, \partial^{-})$ where $V$ is a set of vertices, $E$ is a set of edges and $\partial^{+}, \partial^{-}: E \rightarrow V$ are maps called *incidence relations*.

The above definition introduces only the absolutely necessary elements. In the common visual depiction of a graph, the vertices and edges (incidence relations) are shown as a collection of nodes and links between the nodes. A graph with the constraints introduced above (some authors name these digraphs with loops, or loop-digraphs) would be illustrated as follows:

In Fig 2 we visualize the type of graph we will discuss. We have a number of nodes (which we label by letters); they are connected with edges that are directed (indicated via the arrows) and weighted (indicated by the numbers attached to each arrow). In addition, one of the nodes (labelled H) has a loop onto itself. It is important to keep in mind that this is only a *visualization* of the underlying abstract graph concept. A visual representation may not faithfully capture the precise mathematical properties of the object. For example, in the above, the labels are for convenience only; the mathematical object itself does not include them.
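As an aside, the four-tuple definition above can be encoded almost verbatim in code. The sketch below is plain Python; the names `d_plus` and `d_minus` are our own shorthand for $\partial^{+}$ and $\partial^{-}$, and the particular vertices and edges are an invented toy example:

```python
from dataclasses import dataclass

# A minimal encoding of G = (V, E, d_plus, d_minus): vertices and edges
# are plain identifiers; the two incidence maps send each edge to its
# source (d_plus) and target (d_minus) vertex.
@dataclass
class Graph:
    V: set
    E: set
    d_plus: dict   # edge -> source vertex
    d_minus: dict  # edge -> target vertex

# A three-edge example; e3 is a self-loop on vertex "b".
G = Graph(
    V={"a", "b", "c"},
    E={"e1", "e2", "e3"},
    d_plus={"e1": "a", "e2": "b", "e3": "b"},
    d_minus={"e1": "b", "e2": "c", "e3": "b"},
)
print(G.d_plus["e3"] == G.d_minus["e3"])  # True: a self-loop has equal source and target
```

Note that, exactly as in the definition, the node labels here are mere identifiers; any relabelling gives the same abstract graph.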

## Matrix - Graph Duality

The duality between graphs and matrix representations has been a part of graph theory since early on, and *Matrix Algebra* has been recognized as a useful tool in graph theory. Let's explore how this works with a very simple graph. We start with the notion of an **incidence matrix**.

### Incidence Matrices

The **incidence matrix** of a directed graph is an $n\times m$ matrix $B$, where n and m are the number of vertices and edges respectively, such that $b_{ij}=-1$ if edge $e_j$ leaves vertex $v_i$, 1 if it enters vertex $v_i$ and 0 otherwise (these are sign conventions and of course there is no unanimity!).

In the simple graph the number of nodes n is 4 and the number of edges m is 3 (here the labels enumerate the edges; they are not the weights!). So the matrix $B$ is a 4 x 3 matrix and looks (in general) as follows:

$$
B =
\begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33} \\
b_{41} & b_{42} & b_{43}
\end{bmatrix}
$$

and the specific values given the graph edges look like:

$$ B = \begin{bmatrix} -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
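A minimal numpy sketch that rebuilds this matrix from an edge list. The edge list is an assumption read off the example: $e_1: v_1 \to v_2$, $e_2: v_2 \to v_3$, and $e_3$ a self-loop on $v_2$ (whose $+1$ and $-1$ cancel, leaving a zero column):

```python
import numpy as np

# Incidence matrix from an edge list (0-indexed node ids).
n, m = 4, 3
edges = [(0, 1), (1, 2), (1, 1)]  # (source, target); last one is the self-loop

B = np.zeros((n, m), dtype=int)
for j, (src, tgt) in enumerate(edges):
    B[src, j] += -1  # edge j leaves src
    B[tgt, j] += 1   # edge j enters tgt

print(B)
```

The `+=` accumulation is what makes the self-loop column cancel to zero, matching the matrix shown above.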

Incidence matrices are useful because they can easily represent multi-graphs (graphs where there are multiple edges between nodes). Sometimes it is useful to further decompose the graph information into an out-vertex incidence matrix $B_{out}$ (which collects the negative values) and an in-vertex incidence matrix $B_{in}$ (which collects all positive values). Clearly the above example can be reconstructed as the element-wise sum of these two decompositions: $B = B_{out} + B_{in}$.

On the other hand, notice that the self-loop is not captured in the incidence matrix as it does not connect two distinct nodes. That is why another matrix that is used in connection with graphs is the adjacency matrix.

### Adjacency Matrices

An Adjacency Matrix is a boolean matrix (composed exclusively of zeros and ones) representing the connectivity between nodes in a graph. More precisely, for a graph with n nodes it is an n x n matrix whose (i,j)-th entry $A_{ij}$ is 1, if vertex $v_i$ is connected with an edge to vertex $v_j$ and 0 otherwise.

For our example graph we have

$$ A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$

so, for example, the element $a_{12}$ in the first row and second column indicates that there is an edge from node $v_1$ to node $v_2$. Notice that even though the adjacency matrix is *square* (its two dimensions coincide), for a directed graph it will not in general be *symmetric*.
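The same edge list used for the incidence matrix reproduces this adjacency matrix; a short numpy sketch (with our assumed 0-indexed node ids):

```python
import numpy as np

# Adjacency matrix for the example graph: edges v1 -> v2, v2 -> v3 and a
# self-loop v2 -> v2, which (unlike in the incidence matrix) shows up on
# the diagonal.
n = 4
edges = [(0, 1), (1, 2), (1, 1)]
A = np.zeros((n, n), dtype=int)
for src, tgt in edges:
    A[src, tgt] = 1

print(A)
print((A == A.T).all())  # False: a directed graph need not be symmetric
```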

The adjacency and incidence matrices are obviously related (but we need to be careful about self-loops!). In the notation we used here we have via matrix multiplication:

$$ A = - B_{out} B_{in}^{T} + S $$

where $B^{T}$ denotes matrix transpose and $S$ would be a diagonal n x n matrix that keeps track of self-loops.
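We can check this relation numerically on the running example. The convention assumed in this sketch is that self-loops are excluded from $B_{out}$ and $B_{in}$ and tracked separately in $S$ (conventions differ between authors):

```python
import numpy as np

# Numerical check of A = -B_out @ B_in.T + S on the example graph.
n, edges = 4, [(0, 1), (1, 2), (1, 1)]
B_out = np.zeros((n, len(edges)), dtype=int)
B_in = np.zeros((n, len(edges)), dtype=int)
S = np.zeros((n, n), dtype=int)
for j, (src, tgt) in enumerate(edges):
    if src == tgt:
        S[src, src] = 1          # self-loops only enter the diagonal matrix S
    else:
        B_out[src, j] = -1       # negative entries: edge leaves src
        B_in[tgt, j] = 1         # positive entries: edge enters tgt

A = -B_out @ B_in.T + S
print(A)
```

The product recovers exactly the adjacency matrix shown earlier, self-loop included.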

### In-degrees and Out-degrees

For each node $v_i$, the **out-degree** and the **in-degree** are defined, respectively, as the number of edges leaving or entering the node. In terms of the adjacency and incidence matrices these are given by the row-sum and column-sum expressions:

$$ \begin{align} d_{out, i} & = \sum_{j=1, i \neq j}^n a_{ij} = - \sum_{j=1}^n b_{out, ij} \\ d_{in, i} & = \sum_{j=1, i \neq j}^n a_{ji} = \sum_{j=1}^n b_{in, ij} \end{align} $$

These values can be combined into n x n out- and in-degree diagonal matrices $D_{out}, D_{in}$. NB: Notice again that care must be exercised to keep proper count of self-loops.
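The degree sums are one-liners in numpy; a sketch on the example graph, with the diagonal (self-loop) removed to match the sums above:

```python
import numpy as np

# Out- and in-degrees as row and column sums of the adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
off = A - np.diag(np.diag(A))   # zero out the self-loop entries
d_out = off.sum(axis=1)         # row sums
d_in = off.sum(axis=0)          # column sums
D_out, D_in = np.diag(d_out), np.diag(d_in)
print(d_out, d_in)  # [1 1 0 0] [0 1 1 0]
```

Nodes $v_3$ and $v_4$ have out-degree zero; we will meet such "sink" nodes again in the IO context.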

### Breadth First Search (BFS)

A fundamental operation on a graph is *breadth-first search* (BFS). A breadth-first search returns the *set* of all nodes reachable from a node i, following the direction of the links (edges) of the graph. With the index convention used here ($a_{kl}=1$ for an edge from $v_k$ to $v_l$), the corresponding matrix operation is to multiply the *transpose* of the adjacency matrix $A$ with a vector $u$ of size n that has a single unit entry corresponding to node i.

$$ S_k = \sum_{l, l \neq k} a_{lk} u_l $$

The resulting vector $S$ will have unit values indicating the nodes $v_k$ that are linked to node $i$. For example, if we use $u = (1, 0, 0, 0)$ to select node 1 in our simple graph, the outcome will be $S = (0, 1, 0, 0)$, indicating that node $v_1$ is connected to node $v_2$.

That much we knew! But proceeding with the next iterations of the BFS, finding the set of *once-removed* nodes from a given node is achieved with a repeat multiplication of the adjacency matrix.

$$ S^2_{k} = \sum_{l, l \neq k} a_{lk} S_l $$

The repeated application amounts to computing the matrix-vector product $S^2 = (A^{T})^2 u$ (with self-loops excluded). In our example case we obtain $S^2 = (0, 0, 1, 0)$, which captures the fact that $v_3$ is reachable from node $v_1$ in two steps (via node $v_2$).
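The two BFS steps can be carried out directly as matrix-vector products; a sketch on the example graph (self-loop dropped, consistent with the sums above):

```python
import numpy as np

# Two BFS steps by matrix-vector products. With a_ij = 1 meaning an edge
# from v_i to v_j, following edge directions corresponds to multiplying
# with the transpose of A.
A = np.array([[0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
M = (A - np.diag(np.diag(A))).T  # drop the self-loop, then transpose
u = np.array([1, 0, 0, 0])       # select node v1

S1 = M @ u   # direct neighbours of v1
S2 = M @ S1  # nodes two steps away from v1
print(S1, S2)  # [0 1 0 0] [0 0 1 0]
```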

### The Reachability matrix

If we keep applying the procedure we can obtain the full collection of reachable nodes from the infinite sum.

$$ R = (I + A^{T} + (A^{T})^2 + (A^{T})^3 + (A^{T})^4 + \ldots) u $$

There is one catch: we need to control for nodes that have been already flagged as connected from the previous round. This can be done computationally in various ways, e.g. by applying a max function or an “if” condition.^{1}
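One way to implement this control is to clip the flags back to $\{0, 1\}$ after each step (the "max" trick mentioned above), stopping when the reachable set no longer grows. A sketch on the example graph:

```python
import numpy as np

# Reachability by iterating the BFS step and clipping the flag vector.
A = np.array([[0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
M = A.T
R = np.array([1, 0, 0, 0])  # start from node v1
while True:
    R_next = np.minimum(R + M @ R, 1)  # clip flags to boolean {0, 1}
    if (R_next == R).all():
        break                          # no new nodes reached: done
    R = R_next
print(R)  # [1 1 1 0]: v1, v2 and v3 are reachable from v1
```

The loop terminates after at most n iterations, since each round either adds a node or leaves the set unchanged.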

## Qualitative Input-Output Analysis

We now know enough to discuss the first linkage of Input-Output Economic models with Graph Theory, which goes by the name of **Qualitative Input-Output Analysis** (QIOA) ^{2}. But before we do that we need, for completeness, to introduce the basics of Input-Output analysis.

### The Input-Output Matrix

The Industry Transaction Matrix (or Transactions Table) is the fundamental quantitative information used in Input-Output Analysis. It concerns the flow of *products* from each industrial *sector* (considered as a producer) to each of the other sectors (considered as consumers) and potentially also itself.

Usually denoted as Z, if there are n sectors in an economy the matrix reads:

$$ \begin{align} Z & = \begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ Z_{n1} & Z_{n2} & \cdots & Z_{nn} \end{pmatrix} \end{align} $$

The entries $Z_{ij}$ of the matrix may denote either monetary values (in some defined currency) or physical (activity) values, e.g. volumes of production. The matrix is a flow matrix. The values represent activity during a particular time period.

### A Stylized Example with Two Sectors

(This example is from the online Open Risk Academy course Introduction to Input-Output Models using Python.)

Practically every corner of the modern economy involves various forms of *supply chains*. Namely, the delivery of goods and services to the final consumers requires the interaction of independent economic actors who specialize in a particular production area and depend on others to deliver a number of essential components of their final output.

For example, almost all economic activity depends to some degree on *energy*, and most economic agents do not procure their own energy requirements but outsource that requirement to others. Imagine, therefore, that we have a "Food" sector F and an "Energy" sector E. Producing energy requires some energy and some food. Producing food also requires some energy and some food - but in different proportions. (The relative proportions are not particularly relevant; the numbers are all artificial.)

Finally, demand for food and energy from non-producing "consumers" might also be variable (e.g. different types of populations, denoted as G and A respectively, which consume different proportions of energy and food). Pictorially this is presented as follows:

The Transactions Matrix in this case is a 2 x 2 matrix Z:

$$ Z = \begin{bmatrix} 200 & 100 \\ 80 & 50 \end{bmatrix} $$

What do the numbers mean? This requires some interpretation. The typical way IO models are setup is that the numbers express the *monetary values* of the products produced (in some currency units).

The ultimate *consumption* of energy and food is captured as the matrix of Final Demand:

$$ F = \begin{bmatrix} 300 & 100 \\ 200 & 150 \end{bmatrix} $$

In graph terms the "finality" of consumption is captured by nodes that have out-degree zero (only incoming edges). The weighted adjacency matrix of this graph is a 6 x 6 matrix:

$$ A = \begin{bmatrix} 200 & 100 & 300 & 100 & 0 & 0\\ 80 & 50 & 0 & 0 & 200 & 150 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} $$

The large number of zeros reflects the fact that we have many nodes that are sinks.
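The 6x6 matrix can be assembled from its blocks; a numpy sketch in which the node ordering (sectors F, E first, then the four demand sinks) is our assumption:

```python
import numpy as np

# Assemble the 6x6 weighted adjacency matrix from the Z and F blocks.
Z = np.array([[200, 100], [80, 50]])    # inter-industry transactions
F = np.array([[300, 100], [200, 150]])  # final demand

A = np.zeros((6, 6), dtype=int)
A[:2, :2] = Z
A[0, 2:4] = F[0]  # food flowing to its two consumer groups
A[1, 4:6] = F[1]  # energy flowing to its two consumer groups
print(A)
```

The last four rows stay zero because the demand nodes are sinks: nothing flows out of them.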

### From a Table to a Model

The above numbers are simply a *catalog* of exchanges. In Input-Output analysis one talks about **models**. A model involves a postulated structure of a system, which may for example have predictive power about how it might behave in the future (and is therefore falsifiable). We will not delve deeply into the economic theories underpinning IO models (see the Miller reference), but the core exercise is simple enough:

We start with the Total Output vector $x$, which denotes the total production of a particular sector (in monetary terms). This is simply the sum of everything the sector produces: for itself, for other sectors and for final consumption. Let $z_{ij}$ be the elements of the n x n matrix Z of intermediate demands of the j-th node (for j=1,2,…,n) from the i-th node (for i=1,2,…,n), and let $y_{il}$ be an element of the n x l matrix $y$ of final demands from the i-th node. Then the total output $x_i$ of the i-th sector can be written as

$$ x_i = \sum_j z_{ij} + \sum_l y_{il} $$

In our example we have $x = (700, 480)$, indicating 700 total accounting units of Food and 480 units of Energy.

In terms of the graph representation we see those values are simply the sum of the values of the weighted edges that emanate from the two sectors (the diagonal out-degree matrix $D_{out}$).
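In code, these totals are just row sums over intermediate plus final demand; a short numpy sketch:

```python
import numpy as np

# Total output as row sums of intermediate (Z) plus final (F) demand.
Z = np.array([[200, 100], [80, 50]])
F = np.array([[300, 100], [200, 150]])
x = Z.sum(axis=1) + F.sum(axis=1)
print(x)  # [700 480]
```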

With the total output at hand, we can formulate our simple IO model. This is based on first dividing “row-wise” all figures by the total output vector. In terms of the graph this is equivalent to *normalizing* the individual values (weights) of the adjacency matrix by the out-degree matrix.

### The Technical Coefficient Matrix

The Technical Coefficient Matrix is in the IO literature often denoted as the A matrix; it is given by multiplying Z with the diagonalised and inverted total industry output x (restricted to the non-zero rows). In order not to confuse it with the adjacency matrix, we will denote it as $T$:

$$ \begin{align} T & = Z\hat{x}^{-1} = Z D_{out, r}^{-1}\\ T_{ij} & = \frac{Z_{ij}}{x_j} \end{align} $$

In our example this leads to:

$$ T = \begin{bmatrix} 0.286 & 0.208 \\ 0.114 & 0.104 \end{bmatrix} $$
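The coefficient matrix is obtained by dividing each column of Z by the total output of the purchasing sector; a numpy sketch reproducing the values above:

```python
import numpy as np

# Technical coefficients: T = Z @ diag(x)^{-1}, i.e. T_ij = Z_ij / x_j.
Z = np.array([[200.0, 100.0], [80.0, 50.0]])
x = np.array([700.0, 480.0])
T = Z @ np.linalg.inv(np.diag(x))
print(np.round(T, 3))  # [[0.286 0.208] [0.114 0.104]]
```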

Which allows us to formulate a “Leontief style” model as the *linear system*:

$$ x_i = \sum_j \tau_{ij} x_j + \sum_l y_{il} $$

Much of IO analysis focuses on analysing solutions to this system for different $y$ and exploring the structure of $T$. But before we turn to solutions, let us complete the description in terms of graphs, and in particular introduce balance equations and conservation laws.

## Sources, Sinks and Conservation Laws

Discussing sources, sinks and conservation laws in graph theory is usually done in the context of **network flows**. In graph theory, a flow network (also known as a transportation network) is a directed graph where each edge has a capacity and each edge receives a flow (its weight). A flow must satisfy the restriction that *the amount of flow into a node equals the amount of flow out of it, unless it is a source, which has only outgoing flow, or a sink, which has only incoming flow*. (NB: in typical formulations of flow problems the amount of flow on an edge also cannot exceed the capacity of the edge, but we ignore this constraint here.)

We already defined the total output $x_i$ of the i-th sector as

$$ x_i = \sum_j z_{ij} + y_i $$

Let $v_i$ be an element of the n x 1 vector $v$ of Value Added to each sector. The value added is defined as the difference between total sector output $x$ and the intermediate consumption implied by $Z$. In the context of Input-Output Analysis, Value Added is an additional row added to the Input-Output Matrix to account for other (non-industrial) inputs to production, such as labor, depreciation of capital, indirect business taxes, and imports. As total inputs and outputs of a production sector must be equal, total inputs $x_i$ of the i-th sector can be written as the column sum:

$$ x_i = \sum_j z_{ji} + v_i $$

In terms of our graph representation the above decompositions are expressed naturally in terms of **sink and source nodes**, namely nodes that have, respectively, out-degrees and in-degrees equal to zero. The intuition is that what goes into a node i from source nodes and other nodes is equal to what goes to other nodes and sink nodes.

### Balance Equations (Kirchhoff’s Law)

The balance equation for node i reads

$$ v_i + \sum_j z_{ji} = \sum_j z_{ij} + y_i $$

This equation is similar to Kirchhoff's Law for electric circuits, where it reflects the *conservation of charge*.

## Solving the IO Model

We now discuss and interpret in graph terms the typical question one wants to answer with an IO model: what happens if there is a new set of final demands? As we saw, final demands are "sink" nodes that absorb production. We might ask, e.g., in our simple example, what happens if the demand nodes (G, A) change their demand for energy/food. Clearly the production sectors (E, F) will need to rebalance in response, but given that production is interconnected, how exactly do we get the new configuration of weights?

In other words, we have a new demand matrix $Y_1$ and we want to find the new output vector $x_1$:

$$ x_{1, i} = \sum_j \tau_{ij} x_{1, j} + \sum_l y_{1, il} $$

This is of course a classic linear system problem that requires, one way or another, inverting a matrix. Which brings us to the famous Leontief Inverse.

### The Leontief Matrix and Graph Laplacians

The Leontief Matrix comes about by rearranging the above as:

$$ x_{i} - \sum_j \tau_{ij} x_{j} = \sum_j (\delta_{ij} - \tau_{ij}) x_{j} = \sum_l y_{il} $$

What is the meaning of $L = I - T$ in graph terms? The **graph Laplacian** $L_p$, a central object in graph theory, is defined in terms of the out-degree matrix and the weighted adjacency matrix as:

$$ L_{p} = D_{out} - A $$

keeping in mind that the outdegree of sink nodes is zero and $D_{out}$ is diagonal, we have

$$ L_{p} = \begin{bmatrix} D_{out, r} - Z & -F \\ 0 & 0 \end{bmatrix} $$

or

$$ L_{p} = \begin{bmatrix} (I - T) D_{out, r} & -F \\ 0 & 0 \end{bmatrix} $$

where $D_{out, r}$ has dimension equal to the rank of $D_{out}$. Thus, the Leontief Matrix is a *submatrix* of a normalized graph Laplacian, corresponding to the interconnected production sectors:

$$ L = L_{p,r} D_{out, r}^{-1} $$
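This identity is easy to verify numerically on the two-sector example; a numpy sketch (node ordering as before, sectors first, demand sinks last):

```python
import numpy as np

# Check that the production block of the graph Laplacian, rescaled by the
# restricted out-degree matrix, equals the Leontief matrix I - T.
Z = np.array([[200.0, 100.0], [80.0, 50.0]])
F = np.array([[300.0, 100.0], [200.0, 150.0]])

A = np.zeros((6, 6))
A[:2, :2], A[0, 2:4], A[1, 4:6] = Z, F[0], F[1]
D_out = np.diag(A.sum(axis=1))  # out-degrees: total outputs; sinks get 0
L_p = D_out - A                 # graph Laplacian

D_r = D_out[:2, :2]             # restriction to the production sectors
T = Z @ np.linalg.inv(D_r)      # technical coefficients
L = L_p[:2, :2] @ np.linalg.inv(D_r)
print(np.allclose(L, np.eye(2) - T))  # True
```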

### The Leontief Inverse

Solving the IO model involves finding the Leontief Inverse $L^{-1}$. It is well known that the Leontief inverse can be expressed as a power series approximation (an infinite sum of matrix products):

$$ L^{-1} = (I - T)^{-1} = I + T + T^2 + T^3 + T^4 + \ldots $$

You will probably not be surprised by now that this expression is intimately linked with the reachability formula we discussed above. The economic interpretation is that, given a new demand vector $y$, the required total production of each sector can be approximated by first considering direct demands, then the impact of demands on first-neighbor sectors, then second neighbors, and so forth.
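The series converges whenever the spectral radius of $T$ is below one, which holds in our example. A final numpy sketch comparing a truncated series against the exact inverse:

```python
import numpy as np

# Truncated power series for the Leontief inverse vs. the exact inverse.
T = np.array([[200 / 700, 100 / 480], [80 / 700, 50 / 480]])
exact = np.linalg.inv(np.eye(2) - T)

approx = np.zeros((2, 2))
term = np.eye(2)
for _ in range(50):       # accumulate I + T + T^2 + ... up to 50 terms
    approx += term
    term = term @ T
print(np.allclose(approx, exact))  # True
```

Just as in the reachability iteration, each extra term propagates the demand one more step through the inter-industry network.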