Data Science

Representing Matrices as JSON Objects: Part 2 - Sparse Matrices

Representing a Sparse Matrix as a JSON object is a task that appears in many modern data science contexts. While there is no universally agreed way to achieve this task, in this post we discuss a number of options and the associated tradeoffs.

Reading Time: 11 min.

Recap of Part 1 of the Matrix-to-JSON Post Series

In the first installment of this series (Part 1), we discussed the motivation behind representing and serializing matrices as JSON objects. We defined the relevant concepts, in particular unrolling a matrix into a one-dimensional array and the notions of column-major and row-major order. We outlined some use cases of interest and began a benchmarking exercise covering various R and Python JSON serialization utilities (available at the matrix2json repository).
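As a rough illustration of the unrolling idea recapped above, the following minimal sketch flattens a small matrix into a one-dimensional array in either row-major or column-major order and wraps it in a JSON object. The helper names `unroll` and `roll` and the keys `rows`, `cols`, `order` and `data` are assumptions made for this example, not a format defined in the series.

```python
import json


def unroll(matrix, order="row"):
    """Flatten a list-of-rows matrix into a JSON-ready dict (hypothetical layout)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if order == "row":
        # Row-major: walk each row left to right, one row after another.
        data = [x for row in matrix for x in row]
    elif order == "column":
        # Column-major: walk each column top to bottom, one column after another.
        data = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]
    else:
        raise ValueError("order must be 'row' or 'column'")
    return {"rows": n_rows, "cols": n_cols, "order": order, "data": data}


def roll(obj):
    """Rebuild the list-of-rows matrix from a dict produced by unroll."""
    r, c, d = obj["rows"], obj["cols"], obj["data"]
    if obj["order"] == "row":
        return [d[i * c:(i + 1) * c] for i in range(r)]
    return [[d[j * r + i] for j in range(c)] for i in range(r)]


m = [[1, 2, 3], [4, 5, 6]]
row_json = json.dumps(unroll(m, "row"))
col_json = json.dumps(unroll(m, "column"))
```

Storing the shape and the order next to the flat array is what lets a receiver reconstruct the matrix without guessing; the same flat data is ambiguous without them.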

Mathematical Representations of Credit Portfolio Data

What do we mean by credit data? This post discusses mathematical terminology and concepts that are useful when working with credit data, taking us from network graph representations of credit systems to commonly used reference data sets.

Reading Time: 1 min.

Course Objective

This new course in the Credit Portfolio Management category explores a new angle on an old practice. It digs into the meaning of credit data collections and the logic that binds them together, with the aim of understanding what they can be used for and what limitations and issues may affect them.

Exploring Ten Years of FOSDEM talks

We look into ten years of FOSDEM conference data to start getting to grips with the open source phenomenon and also explore techniques for data review and exploratory data analysis using (of course) open source Python tools. In the process we identify the imprint of the pandemic on attendance, the longest ever title, the distribution of mindshare of time and some notable newcomers.

Reading Time: 12 min.

FOSDEM is a non-commercial, volunteer-organized, two-day conference celebrating free and open-source software development. The conference has a geographic focus on European open source ecosystems and projects. FOSDEM is primarily aimed at developers, across the entire range of software and aims to enable them to meet and discuss the status of projects.

Representing Matrices as JSON Objects: Part 1 - General Considerations

Representing a matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online. While there is no universally agreed way to achieve this task in all circumstances, in this series of posts we discuss a number of options and the associated tradeoffs.

Reading Time: 17 min.

Motivation and Objective

Representing a Matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online in a portable manner. There is no universally agreed way to achieve this task and various options are available depending on the matrix data characteristics and the programming tools and computational environment one has available.
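To make the point about options depending on the matrix data characteristics a little more concrete, here is a minimal sketch contrasting two possible encodings: a dense nested-array form and a coordinate-list form that stores only the non-zero entries. The function names and the JSON keys (`format`, `shape`, `entries`, `data`) are assumptions chosen for this example and do not represent an agreed standard.

```python
import json


def to_dense_json(matrix):
    """Encode the matrix as nested arrays, one inner array per row."""
    return json.dumps({"format": "dense", "data": matrix})


def to_coo_json(matrix):
    """Encode only the non-zero entries as [row, column, value] triplets."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    entries = [
        [i, j, v]
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0
    ]
    return json.dumps({"format": "coo", "shape": [n_rows, n_cols], "entries": entries})


def from_json(text):
    """Decode either of the two hypothetical encodings back to a list of rows."""
    obj = json.loads(text)
    if obj["format"] == "dense":
        return obj["data"]
    n_rows, n_cols = obj["shape"]
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for i, j, v in obj["entries"]:
        matrix[i][j] = v
    return matrix


m = [[0, 0, 7], [0, 0, 0], [5, 0, 0]]
dense_text = to_dense_json(m)
coo_text = to_coo_json(m)
```

For a matrix that is mostly zeros the triplet form carries far fewer numbers, while for a dense matrix it roughly triples the payload, which is one example of the tradeoffs the series refers to.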

Class Inheritance in Data Science

Object-oriented programming and techniques (OOP) such as using classes and inheritance are common in many application programming environments but don't travel well outside computer memory. When considering data science tasks and objectives, the transition from object hierarchies to data structures (and vice versa) is not always straightforward. In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.

Reading Time: 3 min.

Summary

In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.

Content

Object-oriented programming and techniques (OOP) such as using classes and inheritance are common in many application programming environments but alas don’t “travel well” outside computer memory. The potentially intricate relationships of objects (both the data they hold and the meaning and possible uses of the data) are not easy to transfer (except of course by full replication of code and data). Hence, when considering data science tasks and objectives that involve the exchange of data, the transition from object hierarchies that live inside memory to data structures that can be exchanged with another computer is not straightforward.
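The difficulty can be sketched with a small Python example. The classes `Exposure` and `Loan`, the `type` key and the `REGISTRY` lookup below are all hypothetical, chosen only to illustrate one common workaround: writing the class name into the exchanged data so that the receiver can pick the right class again. It is a sketch under these assumptions, not the approach taken in the course.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Exposure:
    identifier: str
    amount: float


@dataclass
class Loan(Exposure):
    # Subclass adds a field on top of the inherited ones.
    maturity_years: int


# The receiver needs its own copy of the class definitions; JSON carries only data.
REGISTRY = {cls.__name__: cls for cls in (Exposure, Loan)}


def to_json(obj):
    """Flatten the object to a dict and record its class name as a discriminator."""
    payload = asdict(obj)
    payload["type"] = type(obj).__name__
    return json.dumps(payload)


def from_json(text):
    """Look up the class named in the payload and rebuild the object."""
    payload = json.loads(text)
    cls = REGISTRY[payload.pop("type")]
    return cls(**payload)


loan = Loan(identifier="L-001", amount=1000.0, maturity_years=5)
restored = from_json(to_json(loan))
```

Note that the inheritance relationship itself is never transmitted: the JSON text holds only field values and a label, and the hierarchy survives only because both sides share the same class definitions.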

Open Risk Hydra GSOC 2021 Credit Risk Project Wrap Up

Reading Time: 5 min.

NPLO Visualization

The GSOC 2021 collaboration between Open Risk and the Hydra Ecosystem - Project Wrap-Up

Google Summer of Code 2021 came and went amid the still ongoing worldwide pandemic experience. Open Risk was happy to join forces with the Hydra Ecosystem in exploring a proof-of-concept for next-generation APIs using Hydra.

Open Risk Mentoring GSOC 2021 Hydra Nextgen API Project

For the Google Summer of Code 2021 season, Open Risk is happy to join forces with the Hydra Ecosystem to mentor a student project that aims to build a hypermedia-enabled REST service around standardized credit portfolio data.

Reading Time: 4 min.

NPLO Visualization

A GSOC 2021 summer project collaboration between Open Risk and the Hydra Ecosystem

Summer is underway and for the Google Summer of Code 2021 season Open Risk is happy to join forces with the Hydra Ecosystem. The project aims to guide students to build a hypermedia enabled REST service around standardized credit portfolio data. More specifically the project will build a REST service as backend for a hypothetical banking entity that collects and disseminates credit portfolio data conforming to an established public standard (the EBA NPL templates, see below).

Risk Function Ontology

The Risk Function Ontology (RFO) is a new ontology describing risk management roles (posts) and functions.

Reading Time: 3 min.

RFO Visualization

The Risk Function Ontology

The Risk Function Ontology is a framework that aims to represent and categorize knowledge about risk management functions using semantic web information technologies. Codenamed RFO, it codifies the relationship between the various components of a risk management organization. Individuals, teams or even whole departments tasked with risk management exist in some shape or form in most organizations. The ontology allows the definition of risk management roles in more precise terms, which in turn can be used in a variety of contexts: better structured job descriptions, more accurate descriptions of internal processes and easier inspection of alignment and consistency with risk taxonomies. See also the live version and the white paper OpenRiskWP04_061415.

Making Open Risk Data easier

We introduce an online database that allows the (relatively) easy publication of structured risk data.

Reading Time: 1 min.

In an earlier blog post we discussed the promise of Open Risk Data and how the widespread availability of good information that is relevant for risk management can substantially help mitigate diverse risks.

The list of Open Risk Data providers, particularly from the public sector, keeps growing, and we aim to document all available datasets in the dedicated page of the Open Risk Manual.

Risk Model Ontology

Reading Time: 2 min.

Semantic Web Technologies

DOAM Graph

The Risk Model Ontology is a framework that aims to represent and categorize knowledge about risk models using semantic web information technologies.

In principle any semantic technology can be the starting point for a risk model ontology. The Open Risk Manual adopts the W3C’s Web Ontology Language (OWL). OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be exploited by computer programs, e.g., to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide Web and may refer to or be referred to from other OWL ontologies. OWL is part of the W3C’s Semantic Web technology stack, which includes RDF, RDFS, SPARQL, etc.

Overview of the Julia-Python-R Universe

We introduce a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, the trio sometimes abbreviated as Jupyter.

Reading Time: 3 min.

Jupyter

A new Open Risk Manual entry offers a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python, R, sometimes abbreviated as Jupyter.

Motivation

A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years, open source software targeting Data Science has seen increased adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems.

Data Quality and Exploratory Data Analysis using Python

Reading Time: 0 min.

In two new Open Risk Academy courses we work out, step by step, how to use Python to review risk data from a data quality perspective and how to perform exploratory data analysis with pandas, seaborn and statsmodels:

Data Scientists Have No Future

Reading Time: 1 min.

In the current overheated environment, the working definition of a Data Scientist seems to be:

doing whatever it takes to get the job done in a digital #tech domain that we have long neglected but which is now coming back to haunt us!

The Promise of Open Risk Data

Reading Time: 3 min.

There is a legend that every time a data set is released into the open, somewhere dies a black swan

Black Swan

Well, it is not a true legend. Legends take centuries of oral storytelling to form. In our frantic age, dominated by the daily news cycle and viral Twitter storms, legends have been replaced by the rather more short-lived memes and #hashtags.

Can accounting ever be sexy? From IFRS 9 to Sustainability

Reading Time: 1 min.

Accounting probably would not count among the more glamorous of professions. The reasons for that status and whether it is justified are beyond the scope of this brief commentary.

What is interesting to note, though, is that the relative attractiveness of accounting is arguably improving, driven by a number of systemic societal developments:

What Inka quipus teach us about data management

Reading Time: 3 min.

Inka Quipu

Chances are that your knowledge of ancient Peruvian culture is a bit rusty. Maybe you have some vague high-school memories of an extensive but backward empire that was conquered and then asset-stripped by a handful of Spanish conquistadores. Or maybe your best preserved memory is the excitement of reading von Daniken’s speculations that the Nazca lines are extraterrestrial spaceports. But unless you happened at some point later in life to hear about the work of Prof. Urton or his collaborators, most likely you have no idea what a quipu is (see image above).

Open Source Risk Data with MongoDB and Python

Reading Time: 3 min.

Swiss Knife

Open source software is all the rage these days in IT and the concept is making rapid inroads in all parts of the enterprise. An earlier comprehensive survey by Gartner, Inc. found that by 2011 more than half of the organizations surveyed had adopted open-source software (OSS) solutions as part of their IT strategy. According to open source advisory firms, this percentage may by now have exceeded the 75% mark.

Open Risk API

Reading Time: 3 min.

Components_Diagram

If you work in financial risk management you will most likely recognize where the following sentence is coming from:

One of the most significant lessons learned from the global financial crisis that began in 2007 was that banks' information technology (IT) and data architectures were inadequate to support the broad management of financial risks. This had severe consequences to the banks themselves and to the stability of the financial system as a whole.

03, Introducing the Open Risk API

Reading Time: 1 min.

Open Risk White Paper 3: Introducing the Open Risk API

Linked Models

We develop a proposal for an open source application programming interface (API) that allows for the distributed development, deployment and use of financial risk models. The proposal aims to explore the following key question: how to integrate in a robust and trustworthy manner diverse risk modeling and risk data resources, contributed by multiple authors, using different technologies, and which very likely will evolve over time.