Data Engineering

Working with Large Matrices using Command Line Tools

Reading Time: 3 min.

CLI Tools for Data Science

This course is an Open Risk Academy CrashCourse introduction to using Linux command line tools to work (in particular) with large text files encoding numerical data in matrix format. A central role is played by awk, the venerable UNIX pattern-matching language and tool.

An Introduction to the Copernicus Satellite Data Ecosystem

Reading Time: 2 min.

Copernicus

Course Content:

This course is an introduction to the Copernicus Satellite Data Ecosystem. It covers the following topics:

  • Getting to know the Copernicus Programme
    • Overview of the Copernicus Programme
    • Further Resources
  • The Copernicus Data Ecosystem
    • Copernicus Data Resources
  • Tools and Resources
    • Platforms, Tools and APIs
  • A worked-out example
    • Using the Python OpenEO API

Who Is This Course For:

Data scientists and data engineers in any domain who need to use satellite data

Deep Dive Course on Tensor calculations with Eigen

Reading Time: 2 min.

Tensor

Course Content:

This course is an introduction to Tensor calculations with Eigen, a popular C++ library for working with numerical arrays and linear algebra. It covers the following topics:

  • The concepts and techniques of the Eigen Tensor class
  • Declaring and initializing Tensors of various ranks and types, and accessing Tensor elements
  • Elementary unary and binary operations involving Tensors
  • More complex operations (reductions, contractions)
  • Modifying the shape of Tensors

Who Is This Course For:

Developers in any domain who need to use higher-dimensional numerical data containers

An introduction to Semantic Data with Python

Reading Time: 1 min.
Python is the Swiss Army knife of modern programming languages and a prime candidate to also be the Swiss Army knife for risk modelling

Summary

This course is a CrashProgram (short course) in the use of Python to work with Semantic Data (RDF / OWL).

Requirements

The course is at a medium technical level. It requires some familiarity with Python (and a working installation). On the semantic data side it requires knowledge of basic concepts around files and representation formats for data.

Intro to GeoJSON

Reading Time: 1 min.
Geographical features on a map

Summary

This course is a CrashProgram (short course) introducing the GeoJSON specification for the encoding of geospatial features.
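
As a rough illustration of what such an encoding looks like, the sketch below builds a minimal GeoJSON FeatureCollection with Python's standard json module. It is not taken from the course scripts; the place name and coordinates are illustrative values chosen for this example.

```python
import json

# A single GeoJSON Feature: a Point geometry plus free-form properties.
# GeoJSON positions are ordered [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [4.9041, 52.3676]},
    "properties": {"name": "Amsterdam"},  # illustrative values only
}

# Features are usually grouped into a FeatureCollection.
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to text and parse it back; GeoJSON is plain JSON, so no
# dedicated library is needed for a simple round trip.
text = json.dumps(collection, indent=2)
restored = json.loads(text)
```

Since the result is ordinary JSON, any JSON-aware tool can read it; dedicated geospatial libraries only become necessary for validation or geometric operations.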

Course objectives

  • You will be able to confidently discuss the GeoJSON standard
  • You will be able to dive into GeoJSON-related development projects with confidence

The course is live at the Open Risk Academy; this repository hosts the Python scripts used in the course. The scripts can be used standalone, but documentation is minimal.

Loan Level Templates Using Python

Reading Time: 0 min.

Summary

This course is a CrashProgram in the use of Python for credit portfolio modelling purposes, in particular working with data templates and spreadsheets.

Content

The course covers the following topics:

Managing Loan Portfolios Using MongoDB

Reading Time: 1 min.

Summary

This course is a CrashProgram in the use of the MongoDB database in conjunction with Python for credit portfolio management purposes.

Content

The course covers the following topics:

Class Inheritance in Data Science

Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but don't travel well outside computer memory. When considering data science tasks and objectives, the transition from object hierarchies to data structures (and vice versa) is not always straightforward. In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.

Reading Time: 3 min.

Summary

In this short course we explore how some programming languages, data formats, database APIs and web frameworks handle hierarchical classes.

Content

Object-oriented programming (OOP) techniques such as classes and inheritance are common in many application programming environments but, alas, don’t “travel well” outside computer memory. The potentially intricate relationships between objects (both the data they hold and the meaning and possible uses of that data) are not easy to transfer (except, of course, by full replication of code and data). Hence, when considering data science tasks and objectives that involve the exchange of data, the transition from object hierarchies that live in memory to data structures that can be exchanged with another computer is not straightforward.
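
The point can be sketched in a few lines of Python. The class names (Exposure, Mortgage), their fields, and the "type" tag convention below are hypothetical choices made for this illustration, not material from the course. The example shows that a subclass instance serialized to JSON arrives as a plain dictionary with no class information, and that one possible workaround is to carry an explicit type tag alongside the data.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Exposure:  # hypothetical base class
    identifier: str
    amount: float


@dataclass
class Mortgage(Exposure):  # hypothetical subclass adding one field
    property_value: float


loan = Mortgage(identifier="L-001", amount=200000.0, property_value=250000.0)

# Naive exchange: the field values survive, the class hierarchy does not.
payload = json.dumps(asdict(loan))
received = json.loads(payload)  # a plain dict, no longer a Mortgage

# One possible workaround: store the class name next to the data and
# look it up in a registry shared by sender and receiver.
REGISTRY = {cls.__name__: cls for cls in (Exposure, Mortgage)}


def to_json(obj):
    """Serialize a registered dataclass instance with a type tag."""
    return json.dumps({"type": type(obj).__name__, **asdict(obj)})


def from_json(text):
    """Rebuild the instance using the type tag and the registry."""
    data = json.loads(text)
    cls = REGISTRY[data.pop("type")]
    return cls(**data)


restored = from_json(to_json(loan))
```

The workaround only moves the problem: both sides must still agree on the registry and on what each class means, which is the "replication of code" the text refers to.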

Introduction to the EBA NPL Templates

Reading Time: 3 min.

Summary

The Open Risk Academy course NPL270672 is a CrashCourse introducing the EBA NPL Templates.

Content

We start with the motivation for the templates and the domain of credit data (to which NPL data belongs). We discuss three core classes that capture the essence of lending operations from a lender's point of view (Counterparty, Loan, Collateral). Next we explore classes that capture events in the lending relationship lifecycle (which we term NPL Scenarios). We look into the main data types: elementary data types, choice lists, arrays and unstructured text. We close by discussing some more complex issues involving graph and timeseries data.