Definition of Credit Data
What do we mean by credit data? For our purposes, Credit Data is any well-defined dataset that has direct applications in assessing the Credit Risk of an individual or an organization or, more generally, any dataset that enables the application of data-driven Credit Portfolio Management policies. The appearance of credit data is quite familiar to practitioners: a spreadsheet, or a table in a database, with a number of columns and rows full of all sorts of information about borrowers and loans.
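As a minimal sketch of that familiar shape, the Python snippet below parses a small loan-level table (one row per loan, with a borrower key) and aggregates exposure per borrower, a typical first step when assessing credit risk. The column names and values are hypothetical and chosen only for illustration; they do not follow any particular reporting standard.

```python
import csv
import io

# Hypothetical loan-level extract: one row per loan, several loans per borrower.
CREDIT_DATA_CSV = """borrower_id,loan_id,sector,exposure,currency,days_past_due
B001,L001,Retail,150000,EUR,0
B001,L002,Retail,20000,EUR,35
B002,L003,Manufacturing,2500000,EUR,0
"""


def load_credit_data(text):
    """Parse CSV text into a list of row dicts, converting numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["exposure"] = float(row["exposure"])
        row["days_past_due"] = int(row["days_past_due"])
    return rows


def exposure_by_borrower(rows):
    """Sum loan exposures per borrower."""
    totals = {}
    for row in rows:
        borrower = row["borrower_id"]
        totals[borrower] = totals.get(borrower, 0.0) + row["exposure"]
    return totals


rows = load_credit_data(CREDIT_DATA_CSV)
print(exposure_by_borrower(rows))  # {'B001': 170000.0, 'B002': 2500000.0}
```

Each row describes a loan, while the borrower key lets the same table answer borrower-level questions such as total exposure.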
FOSDEM is a non-commercial, volunteer-organized, two-day conference celebrating free and open-source software development, with a geographic focus on European open source ecosystems and projects. It is aimed primarily at developers working across the entire range of software and aims to help them meet and discuss the status of their projects. We look into ten years of FOSDEM conference data to start getting to grips with the open source phenomenon, and we also explore techniques for data review and exploratory data analysis using (of course) open source Python tools.
Motivation and Objective
Representing a matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online. There is no universally agreed way to achieve this, and various options are available depending on the matrix type and on the programming tools and environment at hand. Matrices are in general not “native” structures in computing environments but are handled with specific packages (modules, extensions or libraries).
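To make the options concrete, here is a sketch in Python (standard library only) of two common encodings: a dense nested-array layout and a sparse coordinate (COO) layout that stores only the non-zero entries. The key names (`shape`, `data`, `format`, `entries`) are our own illustrative assumptions, not an agreed standard.

```python
import json

# A small 2x3 matrix held as a list of rows
matrix = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.5],
]


def shape(m):
    """Return [rows, columns] for a list-of-rows matrix."""
    return [len(m), len(m[0]) if m else 0]


def to_dense_json(m):
    """Option 1: nested arrays, one inner array per row (row-major order)."""
    return json.dumps({"shape": shape(m), "data": m})


def to_coo_json(m):
    """Option 2: sparse coordinate format, storing only non-zero entries."""
    entries = [
        [i, j, v] for i, row in enumerate(m) for j, v in enumerate(row) if v != 0
    ]
    return json.dumps({"shape": shape(m), "format": "coo", "entries": entries})


def from_coo_json(text):
    """Rebuild the dense list-of-rows matrix from the COO encoding."""
    obj = json.loads(text)
    n_rows, n_cols = obj["shape"]
    m = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in obj["entries"]:
        m[i][j] = v
    return m


print(to_dense_json(matrix))
print(to_coo_json(matrix))
```

The dense form is simplest to read and parse; the COO form pays off for large, mostly empty matrices, at the cost of requiring the receiver to know how to interpret the `entries` triples.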
Summary
In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.
Content
Object-oriented programming (OOP) techniques, such as the use of classes and inheritance, are common in many application programming environments but, alas, don’t “travel well” outside computer memory. The potentially intricate relationships of objects (both the data they hold and the meaning and possible uses of that data) are not easy to transfer, except of course by full replication of code and data.
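A minimal Python sketch of the problem: instances of two hypothetical subclasses of a `Counterparty` class serialize to plain JSON dictionaries that carry no trace of the hierarchy. One common workaround, shown here, is to add an explicit type tag and keep a registry of classes on the receiving side; note that this still presumes the receiver has the same class definitions, i.e. a partial replication of code.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Counterparty:
    name: str


@dataclass
class Corporate(Counterparty):
    sector: str


@dataclass
class Individual(Counterparty):
    age: int


# The receiver must know the same classes: this registry is the "replicated code".
REGISTRY = {cls.__name__: cls for cls in (Counterparty, Corporate, Individual)}


def to_json(obj):
    """Serialize a dataclass instance, recording its class name as a type tag."""
    return json.dumps({"type": type(obj).__name__, **asdict(obj)})


def from_json(text):
    """Rebuild the right subclass from the type tag."""
    data = json.loads(text)
    cls = REGISTRY[data.pop("type")]
    return cls(**data)


acme = Corporate(name="Acme", sector="Energy")
print(json.dumps(asdict(acme)))  # plain dict: the class hierarchy is lost
print(to_json(acme))             # type tag preserves which subclass it was
restored = from_json(to_json(acme))
print(restored == acme, isinstance(restored, Counterparty))
```

The tag only records the concrete class name; the inheritance relationship itself lives in the code on both sides, which is why such data does not travel well on its own.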
The GSOC 2021 collaboration between Open Risk and the Hydra Ecosystem - Project Wrap-Up
Google Summer of Code 2021 came and went amid the still ongoing worldwide pandemic. Open Risk was happy to join forces with the Hydra Ecosystem in exploring a proof of concept for next-generation APIs using Hydra. The project aimed to guide students in building a hypermedia-enabled REST service that can serve standardized credit portfolio data.
A GSOC 2021 summer project collaboration between Open Risk and the Hydra Ecosystem
Summer is underway and, for the Google Summer of Code 2021 season, Open Risk is happy to join forces with the Hydra Ecosystem. The project aims to guide students in building a hypermedia-enabled REST service around standardized credit portfolio data. More specifically, the project will build a REST service as the backend for a hypothetical banking entity that collects and disseminates credit portfolio data conforming to an established public standard (the EBA NPL templates, see below).
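For readers unfamiliar with hypermedia APIs, the sketch below shows roughly what a Hydra-style response could look like: a JSON-LD document typed as a `hydra:Collection`, whose members each carry an `@id` link that a client can follow. It is an illustration under stated assumptions, not the project’s actual implementation: the base URL, the `/loans` path, the `Loan` type, the `@vocab` namespace and the record fields are all hypothetical and are not taken from the EBA NPL templates.

```python
import json

# Hydra Core vocabulary namespace
HYDRA = "http://www.w3.org/ns/hydra/core#"
# Hypothetical vocabulary for the domain terms used in the records
VOCAB = "https://example.org/vocab#"


def loan_collection(base_url, loans):
    """Wrap loan records as a JSON-LD hydra:Collection with a link per member."""
    members = []
    for loan in loans:
        # Each member gets its own dereferenceable IRI (the hypermedia link)
        member = {"@id": f"{base_url}/loans/{loan['loan_id']}", "@type": "Loan"}
        member.update(loan)
        members.append(member)
    return {
        "@context": {"hydra": HYDRA, "@vocab": VOCAB},
        "@id": f"{base_url}/loans",
        "@type": "hydra:Collection",
        "hydra:totalItems": len(members),
        "hydra:member": members,
    }


doc = loan_collection(
    "https://example.org/api",
    [{"loan_id": "L001", "exposure": 150000.0}, {"loan_id": "L002", "exposure": 20000.0}],
)
print(json.dumps(doc, indent=2))
```

The design point is that the client does not need to hard-code URL patterns: it discovers individual loans by following the `@id` values the server returns.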
The Risk Function Ontology
The Risk Function Ontology is a framework that aims to represent and categorize knowledge about risk management functions using semantic web information technologies. Codenamed RFO, it codifies the relationships between the various components of a risk management organization. Individuals, teams or even whole departments tasked with risk management exist in some shape or form in most organizations. The ontology allows risk management roles to be defined in more precise terms, which in turn can be used in a variety of contexts: better structured job descriptions, more accurate descriptions of internal processes, and easier inspection of alignment and consistency with risk taxonomies (see also the live version and the white paper OpenRiskWP04_061415).
Making Open Risk Data easier
In an earlier blog post we discussed the promise of Open Risk Data and how the widespread availability of good information relevant to risk management can substantially help mitigate diverse risks. The list of Open Risk Data providers, particularly from the public sector, keeps growing, and we aim to document all available datasets on the dedicated page of the Open Risk Manual.
The trailblazing Wikidata project
In this post we want to introduce another facility: an online database that allows the (relatively) easy publication of structured risk data.
Semantic Web Technologies
The Risk Model Ontology is a framework that aims to represent and categorize knowledge about risk models using semantic web information technologies. In principle, any semantic technology can be the starting point for a risk model ontology. The Open Risk Manual adopts the W3C’s Web Ontology Language (OWL). OWL is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.
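As a sketch of what such OWL statements look like underneath, the Python snippet below writes a few class and property declarations as RDF triples (using the standard OWL, RDF and RDFS namespace IRIs), serializes them as N-Triples, and follows `rdfs:subClassOf` links. The `https://example.org/rmo#` namespace, the classes `RiskModel`, `CreditRiskModel`, `ScorecardModel`, `RiskFactor` and the property `usesRiskFactor` are hypothetical and not taken from the Risk Model Ontology itself; in practice a dedicated RDF library would usually handle parsing and serialization.

```python
# Standard namespace IRIs for OWL, RDF and RDFS
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace for the illustrative risk model terms
EX = "https://example.org/rmo#"

triples = [
    # Things and groups of things: a small class hierarchy
    (EX + "RiskModel", RDF + "type", OWL + "Class"),
    (EX + "CreditRiskModel", RDF + "type", OWL + "Class"),
    (EX + "CreditRiskModel", RDFS + "subClassOf", EX + "RiskModel"),
    (EX + "ScorecardModel", RDF + "type", OWL + "Class"),
    (EX + "ScorecardModel", RDFS + "subClassOf", EX + "CreditRiskModel"),
    (EX + "RiskFactor", RDF + "type", OWL + "Class"),
    # A relation between things: risk models use risk factors
    (EX + "usesRiskFactor", RDF + "type", OWL + "ObjectProperty"),
    (EX + "usesRiskFactor", RDFS + "domain", EX + "RiskModel"),
    (EX + "usesRiskFactor", RDFS + "range", EX + "RiskFactor"),
]


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def superclasses(cls, triples):
    """Collect all superclasses of cls by following rdfs:subClassOf transitively."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == RDFS + "subClassOf" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


print(to_ntriples(triples))
print(superclasses(EX + "ScorecardModel", triples))
```

The transitive walk is a tiny example of the kind of inference a full OWL reasoner performs: a scorecard model is recognized as a risk model without that fact being stated directly.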
Overview of the Julia-Python-R Universe
A new Open Risk Manual entry offers a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, sometimes abbreviated as Jupyter.
Motivation
A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years, open source software targeting Data Science has found increasing adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems.
Data Quality and Exploratory Data Analysis using Python
In two new Open Risk Academy courses we work through, step by step, how to use Python to review risk data from a data quality perspective and how to perform exploratory data analysis with pandas, seaborn and statsmodels: “Introduction to Risk Data Review” and “Exploratory Data Analysis using Pandas, Seaborn and Statsmodels”.
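For a taste of what a first data quality pass involves, here is a standard-library Python sketch that counts missing values per column, flags duplicate keys and catches impossible (negative) exposures in a small, deliberately flawed extract. The data and column names are hypothetical; with pandas the same checks map roughly onto `DataFrame.isna().sum()` and `DataFrame.duplicated(subset=...)`.

```python
import csv
import io
from collections import Counter

# Hypothetical loan extract with deliberate data quality problems:
# a missing exposure, a duplicated loan_id, a negative exposure, a missing rating.
RAW = """loan_id,exposure,rating
L1,1000,A
L2,,B
L2,500,B
L3,-20,
"""


def review(text, key="loan_id"):
    """Return simple data quality indicators for a CSV extract."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # Empty strings (or absent trailing fields, read as None) count as missing
    missing = {
        col: sum(1 for r in rows if not (r[col] or "").strip())
        for col in reader.fieldnames
    }
    counts = Counter(r[key] for r in rows)
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    negative_exposure = [
        r[key]
        for r in rows
        if (r["exposure"] or "").strip() and float(r["exposure"]) < 0
    ]
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
        "negative_exposure": negative_exposure,
    }


print(review(RAW))
```

Checks like these come before any exploratory plots or models: a duplicated key or a negative exposure silently distorts every aggregate computed afterwards.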
Data Scientists Have No Future
In the current overheated environment, the working definition of a Data Scientist seems to be: doing whatever it takes to get the job done in a digital #tech domain that we have long neglected but which is now coming back to haunt us! That makes for nice urgency while it lasts, but it is not a serious job description for the future. You will always find entrepreneurial institutions offering degrees and certifications in the latest trending hashtag.
There is a legend that every time a data set is released into the open, somewhere dies a black swan
The Promise of Open Risk Data
Well, it is not a true legend. Legends take centuries of oral storytelling to form. In our frantic age, dominated by the daily news cycle and viral Twitter storms, legends have been replaced by the rather more short-lived memes and #hashtags.
Black Swans need no introduction
The whole informal theory of black swans concerns improbable (low-likelihood) events that come as a nasty surprise and have a large impact.
Accounting probably does not count among the more glamorous professions. The reasons for that status, and whether it is justified, are beyond the scope of this brief commentary. What is interesting to note, though, is that the relative attractiveness of accounting is arguably improving, driven by a number of systemic societal developments: the need for a more proactive assessment of the state of the world, eliminating the infamous “rear-view mirror” pathology.
What Inka quipus teach us about data management
Chances are that your knowledge of ancient Peruvian culture is a bit rusty. Maybe you have some vague high-school memories of an extensive but backward empire that was conquered and then asset-stripped by a handful of Spanish conquistadores. Or maybe your best preserved memory is the excitement of reading von Däniken’s speculations that the Nazca lines are extraterrestrial spaceports. But unless you happened at some point later in life to hear about the work of Prof.
Open Source Risk Data with MongoDB and Python
Open source software is all the rage these days in IT, and the concept is making rapid inroads into all parts of the enterprise. An earlier comprehensive survey by Gartner, Inc. found that by 2011 more than half of the organizations surveyed had adopted open-source software (OSS) solutions as part of their IT strategy. According to open source advisory firms, this percentage may by now have exceeded the 75% mark.
Open Risk API
If you work in financial risk management you will most likely recognize where the following passage comes from: “One of the most significant lessons learned from the global financial crisis that began in 2007 was that banks’ information technology (IT) and data architectures were inadequate to support the broad management of financial risks. This had severe consequences to the banks themselves and to the stability of the financial system as a whole.” For those lucky few risk managers not affected by inadequate IT systems, the excerpt is from the Basel Committee’s Principles for effective risk data aggregation and risk reporting (2013).
Open Risk White Paper 3: Introducing the Open Risk API
We develop a proposal for an open source application programming interface (API) that allows for the distributed development, deployment and use of financial risk models. The proposal aims to explore the following key question: how to integrate, in a robust and trustworthy manner, diverse risk modeling and risk data resources that are contributed by multiple authors, use different technologies, and will very likely evolve over time.
Risk modeling is as much art as it is science
The Zen of Modeling aims to capture the struggle for risk modeling beauty:
An undocumented risk model is only a computer program
A risk model that cannot be programmed is only a concept
A risk model only comes to life with empirical validation
Correct implementation of an imperfect model is better than wrong implementation of a perfect model
In complex systems there is always more than one path to a risk model
There are no persistently true models but there are many persistently wrong models
Correlation is imperfectly correlated with causation
Nirvana is the simplest model that is fit for purpose
Hierarchical systems lead to hierarchical models