Exploratory Data Analysis

Connecting the Dots, Tensor Representations of ActivityPub Networks

Reading Time: 4 min.
What are ActivityPub Networks? ActivityPub is a technical specification for decentralized (more precisely, federated) social networking, known as the Fediverse, based on the exchange of ActivityStreams messages that follow the Activity Vocabulary. The ActivityPub proposal has been standardized and published by the W3C and has motivated the design of several federated social networking systems. There are presently several concrete ActivityPub-compliant implementations, and the protocol sees meaningful adoption, primarily in the domain of federated social networks.
Exploring Ten Years of FOSDEM talks

We look into ten years of FOSDEM conference data to start getting to grips with the open source phenomenon and also explore techniques for data review and exploratory data analysis using (of course) open source Python tools. In the process we identify the imprint of the pandemic on attendance, the longest ever title, the distribution of mindshare of time and some notable newcomers.

Reading Time: 12 min.
FOSDEM is a non-commercial, volunteer-organized, two-day conference celebrating free and open-source software development. The conference has a geographic focus on European open source ecosystems and projects. FOSDEM is primarily aimed at developers across the entire range of software, and aims to enable them to meet and discuss the status of projects.
Connecting the Dots: Concentration, diversity, inequality and sparsity in economic networks

In this second Open Risk White Paper on "Connecting the Dots" we examine measures of concentration, diversity, inequality and sparsity in the context of economic systems represented as network (graph) structures.

Reading Time: 6 min.
We adopt a stylized description of economies as property graphs and illustrate how relevant concepts can be represented in this language. We explore in some detail the data types that represent economic network data and their statistical nature, which is critical to their use in concentration analysis.
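As a rough illustration of what such measures look like in practice, the sketch below computes two standard indexes, the Herfindahl-Hirschman Index (concentration) and the Gini coefficient (inequality), on node weights aggregated from a weighted edge list. This is a minimal sketch and not the white paper's own method: the edge list, the node names and the choice of out-strength as the node weight are all hypothetical assumptions made for the example.

```python
from collections import defaultdict


def hhi(weights):
    """Herfindahl-Hirschman Index: sum of squared shares (1/n up to 1)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum((w / total) ** 2 for w in weights)


def gini(weights):
    """Gini coefficient of non-negative weights (0 means perfect equality)."""
    xs = sorted(weights)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    # Rank-weighted sum over the ascending-sorted values, ranks 1..n
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * ranked / (n * total) - (n + 1) / n


# Hypothetical economic network: (source, target, flow) edges
edges = [("A", "B", 50.0), ("A", "C", 30.0), ("B", "C", 20.0)]

# Assumption: a node's weight is its out-strength (total outgoing flow)
out_strength = defaultdict(float)
for source, target, flow in edges:
    out_strength[source] += flow
    out_strength[target] += 0.0  # register nodes that have no outgoing edges

node_weights = list(out_strength.values())
print("HHI :", round(hhi(node_weights), 4))
print("Gini:", round(gini(node_weights), 4))
```

Both indexes depend on which quantity is aggregated (out-strength, in-strength, edge weights), so the statistical nature of the underlying data should drive that choice.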
9 Ways Graphs Show Up in Data Science

We explore a variety of distinct uses of graph structures in data science. We review various important graph types and sketch their linkages and relationships. The review provides an operational guide towards a better overall understanding of these powerful tools.

Reading Time: 1 min.
Course Objective: Graphs (and the related concept of Networks) have emerged from relative niches in mathematics and physics to become mainstream models for describing and interpreting various phenomena. The objective of the course is to review various important graph types as they are increasingly explored in data science and sketch their linkages and relationships (a graph of graphs!). It is not meant to be a rigorous mathematical or computer science classification of graphs but rather an operational guide towards a better overall understanding of these powerful tools.
21 Ways to Visualize a Timeseries

We explore a variety of distinct ways to visualize the same simple dataset. The post is an excursion into the fundamentals of visualization - a partial deconstruction of the process that highlights some common techniques and associated issues.

Reading Time: 1 min.
Course Objective: This course is a deep dive into the structure of visualizations, in particular visualizations of timeseries data. The course is now live at the Academy. Pre-requisites: Knowledge of basic visualization techniques and mathematical notation of functions and maps. Familiarity with data series and their usage in data science. Summary of the Course: What we aim to achieve in this course is to “deconstruct” how typical and less common visualizations of timeseries work.
Data Quality and Exploratory Data Analysis using Python

Reading Time: 0 min.
In two new Open Risk Academy courses we work out, step by step, how to use Python to review risk data from a data quality perspective and how to perform exploratory data analysis with pandas, seaborn and statsmodels. The courses are: Introduction to Risk Data Review; Exploratory Data Analysis using Pandas, Seaborn and Statsmodels.