Overview of the Julia-Python-R Universe

Jupyter: The open source ecosystem for data science

A new Open Risk Manual entry offers a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, sometimes abbreviated as Jupyter.

Motivation

A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years open source software targeting Data Science has seen increasing adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems. The comparison of the three ecosystems aims:

  • To be useful for people who are somewhat familiar with programming and want to inspect the options and use the most appropriate tool
  • To promote interoperability, cross-validation and overall best practices
  • To be as factual as possible without drifting into judgement or opinion
  • To cover use cases relevant for the implementation of quantitative risk models

The comparison does not aim:

  • To be a detailed or comprehensive catalog of all available libraries (which number in the many thousands!)
  • To cover use cases far removed from quantitative risk models
  • To be totally exhaustive (e.g., to identify all the possible computer systems one can run a Python interpreter on, or count all the possible ways one can perform linear regression in R)

Disclaimers

The comparison absolutely does not provide an assessment of which system is “better”. The proper way to use the comparison is to start with one’s objectives, knowledge level and use case.

The comparison attempted here is not entirely appropriate, as the three systems have quite different origins and architectural design choices. For example, strictly speaking R is not a general programming language. R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Yet despite this caveat a comparison is justified, because in a very large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so).

Structure

The comparison data are provided in tabular format in several distinct tables. Each table documents a relevant language or ecosystem subdomain. The number and focus areas of the different tables are somewhat arbitrary and may expand in the future. The order runs roughly from more generic aspects towards more specialized or advanced areas, concluding with interoperability.

Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and, where applicable, include commentary. Reference links are included when useful.

At the bottom of some tables there is a row labeled Package Review. This row has a collection of links to the CRAN Task Views, which aim to summarize the large number of R packages available for some data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still a work in progress; contributors are welcome, see below).

Getting Involved

You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. Alternatively, you can become an Open Risk Manual author and actively edit the page. If you are more comfortable using GitHub / Markdown, there is a mirror page available here, which can be seen as a web page here. Please note that the tables are in HTML format as they are generated automatically.

People interested in developing the Python Task Views can do so via the GitHub repo.