Representing a Sparse Matrix as a JSON object is a task that appears in many modern data science contexts. While there is no universally agreed way to achieve this task, in this post we discuss a number of options and the associated tradeoffs.
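One of the options can be sketched in a few lines of Python. The example below is a minimal sketch, assuming a coordinate-list (COO) layout where only the non-zero entries are stored; the key names `shape`, `row`, `col` and `data` and the helper functions are our own illustrative choices, not an agreed standard.

```python
import json


def dense_to_coo(matrix):
    """Collect the non-zero entries of a list-of-rows matrix as parallel arrays."""
    rows, cols, values = [], [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                values.append(value)
    n_cols = len(matrix[0]) if matrix else 0
    # The key names below are hypothetical; any consistent naming would do.
    return {"shape": [len(matrix), n_cols], "row": rows, "col": cols, "data": values}


def coo_to_dense(obj):
    """Rebuild the dense list-of-rows matrix from the COO dictionary."""
    n_rows, n_cols = obj["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in zip(obj["row"], obj["col"], obj["data"]):
        dense[i][j] = value
    return dense


matrix = [[0, 0, 3], [4, 0, 0], [0, 0, 0]]
payload = json.dumps(dense_to_coo(matrix))   # JSON text ready for exchange
restored = coo_to_dense(json.loads(payload))
```

The tradeoff is the usual one: the COO object is compact when most entries are zero, but it becomes larger than a plain nested array once the matrix fills up.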
Recap of Part 1 of the Matrix-to-JSON Post Series
In the first installment of this series (Part 1) we discussed the motivation behind representing and serializing matrices as JSON objects. We defined the relevant concepts, in particular the unrolling of a matrix into a one-dimensional array and the notions of Column Major and Row Major order. We outlined some use cases of interest and initiated a benchmarking exercise that looks into various R and Python JSON serialization utilities (available at the matrix2json repository).
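The unrolling idea can be illustrated with a short Python sketch. It assumes a non-empty rectangular matrix stored as a list of rows; the function names `unroll` and `roll` and the `order` argument are hypothetical helpers for illustration and are not taken from the matrix2json repository.

```python
import json


def unroll(matrix, order="row"):
    """Flatten a list-of-rows matrix into a one-dimensional list."""
    if order == "row":
        # Row Major: walk each row from left to right, one row after another.
        return [value for row in matrix for value in row]
    if order == "column":
        # Column Major: walk each column from top to bottom, one column after another.
        return [row[j] for j in range(len(matrix[0])) for row in matrix]
    raise ValueError("order must be 'row' or 'column'")


def roll(flat, shape, order="row"):
    """Invert unroll: rebuild the list-of-rows matrix from the flat list."""
    n_rows, n_cols = shape
    if order == "row":
        return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
    if order == "column":
        return [[flat[j * n_rows + i] for j in range(n_cols)] for i in range(n_rows)]
    raise ValueError("order must be 'row' or 'column'")


matrix = [[1, 2, 3], [4, 5, 6]]
row_major = unroll(matrix, "row")        # [1, 2, 3, 4, 5, 6]
col_major = unroll(matrix, "column")     # [1, 4, 2, 5, 3, 6]

# The flat array alone is ambiguous, so shape and order travel with it.
payload = json.dumps({"shape": [2, 3], "order": "column", "data": col_major})
decoded = json.loads(payload)
restored = roll(decoded["data"], decoded["shape"], decoded["order"])
```

Carrying the shape and the order alongside the flat data is what lets a receiver reconstruct the matrix without guessing.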
Course Objective
The objective of the course is to introduce Eigen::Tensor as a high-level library for working with Tensors in C++ projects.
We learn the concepts and techniques of the Eigen Tensor class:
- How to declare and initialize Tensors of various ranks and types, and how to access Tensor elements
- Elementary unary and binary operations involving Tensors
- More complex operations (reductions, contractions)
- Modifying the shape of Tensors
The course is now live at the Academy. The GitHub repository hosts the C++ scripts used in the course.
What do we mean by credit data? This post is a discussion of mathematical terminology and concepts that are useful when working with credit data, taking us from network graph representations of credit systems to commonly used reference data sets.
Course Objective
This new course in the Credit Portfolio Management category explores a new angle on an old practice. It digs into the meaning of credit data collections and the logic that binds them together, with a view to understanding what they can be used for and what limitations and issues may affect them.
The course is now live at the Academy.
Pre-requisites Familiarity with credit provision in general (lending products, banking processes and credit risk) is required for getting the most out of the course.
We look into ten years of FOSDEM conference data to start getting to grips with the open source phenomenon and also explore techniques for data review and exploratory data analysis using (of course) open source Python tools. In the process we identify the imprint of the pandemic on attendance, the longest ever title, the distribution of mindshare of time and some notable newcomers.
FOSDEM is a non-commercial, volunteer-organized, two-day conference celebrating free and open-source software development. The conference has a geographic focus on European open source ecosystems and projects. FOSDEM is primarily aimed at developers, across the entire range of software and aims to enable them to meet and discuss the status of projects.
Representing a matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online. While there is no universally agreed way to achieve this task in all circumstances, in this series of posts we discuss a number of options and the associated tradeoffs.
Motivation and Objective Representing a Matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online in a portable manner. There is no universally agreed way to achieve this task and various options are available depending on the matrix data characteristics and the programming tools and computational environment one has available.
Matrices are not, in general, native structures in general purpose computing environments.
Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but don't travel well outside computer memory. When considering data science tasks and objectives, the transition from object hierarchies to data structures (and vice versa) is not always straightforward. In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.
Summary In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.
Content Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but alas don’t “travel well” outside computer memory. The potentially intricate relationships of objects (both the data they hold and the meaning and possible uses of the data) are not easy to transfer (except of course by full replication of code and data).
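To make the difficulty concrete, here is a minimal Python sketch, assuming a two-level class hierarchy serialized to JSON. The classes `Exposure` and `Loan`, the `type` tag and the `REGISTRY` lookup are hypothetical illustrations and not material from the course. Plain JSON carries the field values but not the class identity, so the sender adds a type tag and the receiver needs its own copy of the class definitions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Exposure:
    """Hypothetical base class."""
    identifier: str
    amount: float


@dataclass
class Loan(Exposure):
    """Hypothetical subclass that adds one field to the base class."""
    maturity_years: int


# The receiving side must know the same classes: code is replicated, not transferred.
REGISTRY = {cls.__name__: cls for cls in (Exposure, Loan)}


def to_json(obj):
    """Serialize the fields and record the class name as a type tag."""
    return json.dumps({"type": type(obj).__name__, **asdict(obj)})


def from_json(text):
    """Use the type tag to pick the class, then rebuild the object."""
    data = json.loads(text)
    cls = REGISTRY[data.pop("type")]
    return cls(**data)


loan = Loan(identifier="L-1", amount=1000.0, maturity_years=5)
text = to_json(loan)
restored = from_json(text)
```

Without the tag, the JSON object would be indistinguishable from any other record that happens to have the same three fields, and the inheritance relationship between the two classes would be lost entirely.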
The GSOC 2021 collaboration between Open Risk and the Hydra Ecosystem - Project Wrap-Up
Google Summer of Code 2021 came and went amid the still ongoing worldwide pandemic experience. Open Risk was happy to join forces with the Hydra Ecosystem in exploring a proof-of-concept for next generation APIs using Hydra.
The project aimed to guide students to build a hypermedia enabled REST service that can serve standardized credit portfolio data.
For the Google Summer of Code 2021 season Open Risk is happy to join forces with the Hydra Ecosystem to mentor a student project that aims to build a hypermedia enabled REST service around standardized credit portfolio data.
A GSOC 2021 summer project collaboration between Open Risk and the Hydra Ecosystem
Summer is underway and for the Google Summer of Code 2021 season Open Risk is happy to join forces with the Hydra Ecosystem. The project aims to guide students to build a hypermedia enabled REST service around standardized credit portfolio data. More specifically, the project will build a REST service as the backend for a hypothetical banking entity that collects and disseminates credit portfolio data conforming to an established public standard (the EBA NPL templates, see below).
The Risk Function Ontology
The Risk Function Ontology is a framework that aims to represent and categorize knowledge about risk management functions using semantic web information technologies. Codenamed RFO, it codifies the relationships between the various components of a risk management organization. Individuals, teams or even whole departments tasked with risk management exist in some shape or form in most organizations. The ontology allows the definition of risk management roles in more precise terms, which in turn can be used in a variety of contexts: better structured job descriptions, more accurate descriptions of internal processes and easier inspection of alignment and consistency with risk taxonomies.
Making Open Risk Data easier In an earlier blog post we discussed the promise of Open Risk Data and how the widespread availability of good information that is relevant for risk management can substantially help mitigate diverse risks.
The list of Open Risk Data providers, particularly from the public sector, keeps growing and we aim to document all available datasets in the dedicated page of the Open Risk Manual.
The trailblazing Wikidata project In this post we want to introduce another facility, an online database that allows the (relatively) easy publication of structured risk data.
Semantic Web Technologies The Risk Model Ontology is a framework that aims to represent and categorize knowledge about risk models using semantic web information technologies.
In principle any semantic technology can be the starting point for a risk model ontology. The Open Risk Manual adopts the W3C’s Web Ontology Language (OWL). OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.
We introduce a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, the trio sometimes abbreviated as Jupyter.
Overview of the Julia-Python-R Universe A new Open Risk Manual entry offers a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python, R, sometimes abbreviated as Jupyter.
Motivation A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years open source software targeting Data Science has found increased adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems.
Data Quality and Exploratory Data Analysis using Python
In two new Open Risk Academy courses we work out step by step how to use Python to review risk data from a data quality perspective and how to perform exploratory data analysis with pandas, seaborn and statsmodels:
Introduction to Risk Data Review
Exploratory Data Analysis using Pandas, Seaborn and Statsmodels
Data Scientists Have No Future
In the current overheated environment, the working definition of a Data Scientist seems to be:
doing whatever it takes to get the job done in a digital #tech domain that we have long neglected but which is now coming back to haunt us!
That is nice urgency while it lasts, but it is not a serious job description for the future.
You will always find entrepreneurial institutions to offer degrees and certifications on the latest trending hashtag.
The Promise of Open Risk Data
There is a legend that every time a data set is released into the open, somewhere dies a black swan.
Well, it is not a true legend. Legends take centuries of oral storytelling to form. In our frantic age, dominated by the daily news cycle and viral Twitter storms, legends have been replaced by the rather more short-lived memes and #hashtags.
Black Swans need no introductions The whole informal theory of black swans concerns improbable events (low likelihood events) that come as a nasty surprise and have large impact.
Accounting probably would not count among the more glamorous of professions. The reasons for that status and whether it is justified are beyond the scope of this brief commentary.
What is interesting to note, though, is that the relative attractiveness of accounting is arguably improving, driven by a number of systemic societal developments:
the need for more proactive assessment of the state of the world, eliminating the infamous “rear-view mirror” pathology.
What Inka quipus teach us about data management Chances are that your knowledge of ancient Peruvian culture is a bit rusty. Maybe you have some vague high-school memories of an extensive but backward empire that was conquered and then asset-stripped by a handful of Spanish conquistadores. Or maybe your best preserved memory is the excitement of reading von Daniken’s speculations that the Nazca lines are extraterrestrial spaceports. But unless you happened at some point later in life to hear about the work of Prof.
Open Source Risk Data with MongoDB and Python
Open source software is all the rage these days in IT and the concept is making rapid inroads in all parts of the enterprise. An earlier comprehensive survey by Gartner, Inc. found that by 2011 more than half of organizations surveyed had adopted open-source software (OSS) solutions as part of their IT strategy. According to open source advisory firms, this percentage may by now have exceeded the 75% mark.
Open Risk API If you work in financial risk management you will most likely recognize where the following sentence is coming from:
One of the most significant lessons learned from the global financial crisis that began in 2007 was that banks' information technology (IT) and data architectures were inadequate to support the broad management of financial risks. This had severe consequences to the banks themselves and to the stability of the financial system as a whole.
For those lucky few risk managers not affected by inadequate IT systems, the excerpt is from the Basel Committee’s Principles for effective risk data aggregation and risk reporting (2013).
Open Risk White Paper 3: Introducing the Open Risk API We develop a proposal for an open source application programming interface (API) that allows for the distributed development, deployment and use of financial risk models. The proposal aims to explore the following key question: how to integrate in a robust and trustworthy manner diverse risk modeling and risk data resources, contributed by multiple authors, using different technologies, and which very likely will evolve over time.