Representing a Sparse Matrix as a JSON object is a task that appears in many modern data science contexts. While there is no universally agreed way to achieve this task, in this post we discuss a number of options and the associated tradeoffs.
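One of the options for a sparse matrix is a coordinate-list layout that stores only the non-zero entries. The sketch below is a minimal illustration using only the Python standard library; the key names `shape` and `entries` and both helper functions are assumptions made for this example, not an established standard.

```python
import json


def dense_to_coo_json(matrix):
    """Serialize a non-ragged list-of-lists matrix, keeping only non-zero entries."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # Each entry is [row index, column index, value]; zeros are skipped.
    entries = [
        [i, j, value]
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]
    return json.dumps({"shape": [rows, cols], "entries": entries})


def coo_json_to_dense(text):
    """Rebuild the dense matrix from the JSON produced above."""
    obj = json.loads(text)
    rows, cols = obj["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for i, j, value in obj["entries"]:
        matrix[i][j] = value
    return matrix


example = [[0, 0, 3], [0, 0, 0], [7, 0, 0]]
encoded = dense_to_coo_json(example)
decoded = coo_json_to_dense(encoded)
```

The tradeoff is typical: the payload shrinks when most entries are zero, but every stored value carries two extra indices, so the layout becomes wasteful for dense matrices.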
Recap of Part 1 of the Matrix-to-JSON Post Series
In the first installment of this series (Part 1) we discussed the motivation behind representing and serializing matrices as JSON objects. We defined the relevant concepts, in particular unrolling a matrix into a one-dimensional array and the notions of column-major and row-major order. We outlined some use cases of interest and started a benchmarking exercise covering various R and Python JSON serialization utilities (available at the matrix2json repository).
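As a quick reminder of what unrolling means, the following sketch flattens a small matrix in both orders and wraps the result in a JSON object. It assumes a non-empty, non-ragged list-of-lists matrix; the function names and the JSON keys (`rows`, `cols`, `order`, `data`) are hypothetical and chosen only for illustration.

```python
import json


def unroll(matrix, order="row"):
    """Flatten a list-of-lists matrix into a one-dimensional list."""
    if order == "row":
        # Row-major: walk each row from left to right, top row first.
        return [value for row in matrix for value in row]
    if order == "column":
        # Column-major: walk each column from top to bottom, left column first.
        return [row[j] for j in range(len(matrix[0])) for row in matrix]
    raise ValueError("order must be 'row' or 'column'")


def matrix_to_json(matrix, order="row"):
    """Store the dimensions and the order so a reader can rebuild the matrix."""
    return json.dumps({
        "rows": len(matrix),
        "cols": len(matrix[0]),
        "order": order,
        "data": unroll(matrix, order),
    })


m = [[1, 2, 3], [4, 5, 6]]
row_major = unroll(m, "row")        # [1, 2, 3, 4, 5, 6]
col_major = unroll(m, "column")     # [1, 4, 2, 5, 3, 6]
```

Recording the order alongside the data matters because the flat array alone is ambiguous: the same six numbers describe different matrices depending on how they are read back.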
We are very happy to announce that our EU Datathon 2022 proposal, based on the Equinox platform, has been pre-selected to enter the formal stage of the competition.
What is the EU Datathon?
The EU Datathon is an annual Open Data competition organised by the Publications Office of the European Union since 2017. The competitions are organised to create new value for citizens through innovation and promoting the use of open data, in particular the datasets available on the official portal for European data.
Every year, EU Datathon calls for innovators from around the world to come up with new ways of using open data to address important societal and environmental challenges, with the condition that they use at least one of the thousands of data sets published on data.
Equinox 0.4 Release
Equinox is an open source platform that supports holistic risk management and reporting in the context of Sustainable Portfolio Management. The platform integrates geospatial information with applicable regulatory and industry standards, for example the GHG Protocol (accounting for Project based, Corporate and City-Wide greenhouse gas emissions), the IPCC Emissions Factor database and further reference data, the PCAF attribution methodologies (and more) to provide a holistic view of the footprint of both individual projects and portfolios.
What are Input-Output Models?
Environmentally Extended Multi-Regional Input-Output (EE-MRIO) tables describe the economic relationships of economic actors (e.g. industrial sectors) operating within and between regions, together with their environmental repercussions.
An EE-MRIO augments the more basic, and historically earlier, Input-Output (IO) models with additional datasets and/or modeling assumptions in order to provide insights into the environmental footprint of economic activity. Presently, the emphasis on negative externalities of economic activity (e.g., climate change, biodiversity loss) turns EE-MRIO models into a useful conceptual and analytic tool.
We look into ten years of FOSDEM conference data to start getting to grips with the open source phenomenon and also explore techniques for data review and exploratory data analysis using (of course) open source python tools. In the process we identify the imprint of the pandemic on attendance, the longest ever title, the distribution of mindshare of time and some notable newcomers.
FOSDEM is a non-commercial, volunteer-organized, two-day conference celebrating free and open-source software development. The conference has a geographic focus on European open source ecosystems and projects. FOSDEM is primarily aimed at developers, across the entire range of software and aims to enable them to meet and discuss the status of projects.
Representing a matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online. While there is no universally agreed way to achieve this task in all circumstances, in this series of posts we discuss a number of options and the associated tradeoffs.
Motivation and Objective
Representing a Matrix as a JSON object is a task that appears in many modern data science contexts, in particular when one wants to exchange matrix data online in a portable manner. There is no universally agreed way to achieve this task and various options are available depending on the matrix data characteristics and the programming tools and computational environment one has available.
Matrices are not, in general, native structures in general purpose computing environments.
In the latest update of the Equinox Project we discuss the integration of reference data, and in particular greenhouse gas emissions factors as catalogued in the IPCC Emissions Factors database (EFDB).
Equinox is an open source platform that supports the holistic risk management and reporting of major sustainable finance projects (the financing of projects with material physical footprint) such as project finance. Equinox aims to integrate into its database a number of reference databases that facilitate sustainable portfolio management tasks. The current focus is on reference material concerning the emissions factors for various processes and activities. In the latest (Solstice Day!
Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but don't travel well outside computer memory. For data science tasks and objectives, the transition from object hierarchies to data structures (and vice versa) is not always straightforward. In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.
Summary
In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.
Content
Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but alas don’t “travel well” outside computer memory. The potentially intricate relationships of objects (both the data they hold and the meaning and possible uses of the data) are not easy to transfer (except of course by full replication of code and data).
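To make the "travel" problem concrete, here is a small sketch in Python. The `Instrument` and `Loan` classes, the `type` tag and the `REGISTRY` lookup are all hypothetical names invented for this illustration; tagging the payload with a class name is only one of several possible conventions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Instrument:
    name: str


@dataclass
class Loan(Instrument):
    principal: float


def to_json(obj):
    """Serialize the fields plus a tag naming the class (asdict alone drops it)."""
    payload = asdict(obj)
    payload["type"] = type(obj).__name__
    return json.dumps(payload)


# The receiving side needs its own copy of the class definitions.
REGISTRY = {"Instrument": Instrument, "Loan": Loan}


def from_json(text):
    """Rebuild an object by looking up the class named in the tag."""
    payload = json.loads(text)
    cls = REGISTRY[payload.pop("type")]
    return cls(**payload)


loan = Loan(name="mortgage", principal=1000.0)
plain = json.loads(json.dumps(asdict(loan)))   # fields only, class identity lost
restored = from_json(to_json(loan))            # class identity recovered via the tag
```

Note that only the data crosses the wire. The inheritance relationship between `Loan` and `Instrument` survives only because both sides share the same code, which is exactly the replication the paragraph above refers to.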
The GSOC 2021 collaboration between Open Risk and the Hydra Ecosystem - Project Wrap-Up
Google Summer of Code 2021 came and went amid the still ongoing worldwide pandemic experience. Open Risk was happy to join forces with the Hydra Ecosystem in exploring a proof-of-concept for next generation APIs using Hydra.
The project aimed to guide students in building a hypermedia-enabled REST service that can serve standardized credit portfolio data.
For the Google Summer of Code 2021 season Open Risk is happy to join forces with the Hydra Ecosystem to mentor a student project that aims to build a hypermedia-enabled REST service around standardized credit portfolio data.
A GSOC 2021 summer project collaboration between Open Risk and the Hydra Ecosystem
Summer is underway and for the Google Summer of Code 2021 season Open Risk is happy to join forces with the Hydra Ecosystem. The project aims to guide students to build a hypermedia-enabled REST service around standardized credit portfolio data. More specifically, the project will build a REST service as the backend for a hypothetical banking entity that collects and disseminates credit portfolio data conforming to an established public standard (the EBA NPL templates, see below).
Equinox is an open source platform that supports holistic risk management and reporting of Sustainable Finance (Sustainable Portfolio Management). The platform integrates geospatial information with applicable regulatory and industry standards from EBA, PCAF and Equator Principles to provide a holistic view of the footprint of both individual projects and portfolios, in particular of project finance investments.
Motivation
Sustainability (understood in environmental, economic and social terms) is emerging as an undisputed constraint that will shape future human activity and more specifically how the financial system facilitates and empowers economic life.
Data Types are a fundamental building block of data science
Data science is about data, but data are not simple and tame beasts. They have character and attitude, which can cause a lot of friction between them and the data scientist. There is a lot of sweat and tears involved when confronting data, and data scientists could do worse than learn how to handle Data Type quirks in particular. Indeed, a good fraction of data science involves not modelling data, not transforming data, not even cleaning data, but simply goading data into the right containers and providing them with a stage that fits their character.
Course Content
This CrashCourse is an introduction to semantic data using Python. It covers the following topics:
We learn to work with RDF graphs using rdflib
We explore the owlready package and OWL ontologies
We look into json-ld serialization of RDF/OWL data
We try data validation using pySHACL
We use throughout a realistic data set based on the Credit Ratings Ontology
Who Is This Course For
The course is useful to:
Introduction
What is FOSDEM?
FOSDEM is a non-commercial, volunteer-organized event centered on free and open-source software development (with a geographic focus on the European open source ecosystems / projects). FOSDEM is aimed at developers and anyone interested in the free and open-source software movement. It aims to enable developers to meet and to promote the awareness and use of free and open-source software.
FOSDEM has been held annually since 2001, usually during the first weekend of February, at the Université Libre de Bruxelles Solbosch campus in the southeast of Brussels, Belgium.
Sankey diagrams are very useful for the visualization of flows, especially when there is a conserved quantity. They can be tricky when some of the flows are much smaller than others. In the latest release of transitionMatrix we include an example of a log-scale version of the Sankey diagram.
Using Sankey Diagrams
Sankey Diagrams are a type of flow diagram composed of interconnected arrows. The width of the arrows is proportional to the flow rate. Sankey diagrams are often used in physical sciences (physics, chemistry, biology) and engineering but also in economics. They can be used to represent the relative role and significance of various inputs and outputs in a given process.
Sankey diagrams emphasize the major transfers within a system.
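The difficulty with very small flows can be seen with a little arithmetic. The sketch below compares linear and log-scaled arrow widths for two flows of very different size. It is an illustration only and not the implementation used in transitionMatrix; the function name, the flow labels and the choice of `log10(1 + flow)` as the scaling are all assumptions.

```python
import math


def link_widths(flows, log_scale=False):
    """Return arrow widths as fractions of the total drawn width."""
    if log_scale:
        # log10(1 + flow) keeps zero flows at zero and compresses large flows.
        weights = {label: math.log10(1 + flow) for label, flow in flows.items()}
    else:
        weights = dict(flows)
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


flows = {"A->B": 990, "A->C": 10}
linear = link_widths(flows)                  # the small flow gets 1% of the width
logged = link_widths(flows, log_scale=True)  # the small flow becomes clearly visible
```

The price of the log scale is that widths are no longer proportional to the flow rate, so the visual impression of a conserved quantity is lost and the scaling should be stated clearly on the diagram.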
In the Back-to-School update for 2020 we have more ways to access the Academy, new functionalities and more courses. In the rest of this post you will find a summary of the changes with pointers to further information where required.
Risk Management will not be the same going forward: too much is at stake
The summer is over in the Northern Hemisphere - and what an unusual summer it has been! Worldwide, the implications and challenges of adjusting to the Covid-19 pandemic are still a major issue, affecting individuals, companies and governments.
At Open Risk we have been tracking and will continue to interpret the impact of the pandemic via a number of projects:
We explore a variety of distinct ways to visualize the same simple dataset. The post is an excursion into the fundamentals of visualization - a partial deconstruction of the process that highlights some common techniques and associated issues.
Course Objective
This course is a deep-dive into the structure of visualizations, in particular visualizations of timeseries data. The course is now live at the Academy.
Pre-requisites
Knowledge of basic visualization techniques and mathematical notation of functions and maps. Familiarity with data series and their usage in data science.
Summary of the Course
What we aim to achieve in this course is to “deconstruct” how typical and less common visualizations of timeseries work.
Non-Performing Loans
The Covid-19 crisis will certainly impact the concentration of Non-Performing Loans (NPL), but given the special nature of this economic crisis compared (in particular) with the 2008 financial crisis, it is unclear how precisely things will evolve.
In a previous post and white paper (OpenRiskWP07_022616) we discussed the importance of advancing open and transparent methodologies for managing the risks associated with such credit portfolios. Effective management of NPL is also a top regulatory priority.
Course Content
This course is a CrashProgram (short course) introducing the GeoJSON specification for the encoding of geospatial features. The course is at an introductory technical level. It requires some familiarity with data specifications such as JSON and a very basic knowledge of Python.
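For readers who have not seen the format, the following sketch builds a minimal GeoJSON document with the Python standard library. The helper function, the coordinates and the property values are illustrative assumptions; note that GeoJSON positions list longitude first, then latitude.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature holding a single Point geometry."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Illustrative values only: a rough location somewhere in Brussels.
collection = {
    "type": "FeatureCollection",
    "features": [point_feature(4.35, 50.85, {"name": "example location"})],
}

text = json.dumps(collection, indent=2)
parsed = json.loads(text)
```

Because a GeoJSON document is ordinary JSON, any JSON library can read and write it; the specification only fixes the member names and the meaning of the coordinate arrays.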
Who Is This Course For
The course is useful to:
Any developer or data scientist that wants to work with geospatial features encoded in the GeoJSON format
How Does The Course Help
Mastering the course content provides background knowledge towards the following activities:
What do people talk about at FOSDEM 2020
Introduction
FOSDEM is a non-commercial, volunteer-organized European event centered on free and open-source software development. It is aimed at developers and anyone interested in the free and open-source software movement. It aims to enable developers to meet and to promote the awareness and use of free and open-source software. FOSDEM has been held annually since 2001, usually during the first weekend of February, at the Université Libre de Bruxelles Solbosch campus in the southeast of Brussels, Belgium.