Overview of the Julia-Python-R Universe
We introduce a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, the trio sometimes abbreviated as Jupyter.
A new Open Risk Manual entry offers a side-by-side review of the main open source ecosystems supporting the Data Science domain: Julia, Python and R, a trio sometimes abbreviated as Jupyter.
Motivation
A large component of Quantitative Risk Management relies on data processing and quantitative tools (aka Data Science). In recent years, open source software targeting Data Science has seen increased adoption in diverse applications. The Overview of the Julia-Python-R Universe article is a side-by-side comparison of a wide range of aspects of the Python, Julia and R language ecosystems.
The comparison of the three ecosystems aims:
- To be useful for people who are somewhat familiar with programming and want to inspect the options and use the most appropriate tool
- To promote interoperability, cross-validation and overall best practices across the three pillars
- To be factual, as much as possible, without drifting to judgement / opinions
- To cover the use cases relevant for the implementation of quantitative risk models
This comparison does not aim:
- To be a detailed / comprehensive catalog of all available libraries (which by now number in the many thousands!)
- To cover use cases that are far removed from quantitative risk models
- To be exhaustive (for example to identify all the possible computer systems one can run a Python interpreter on, or count all the possible ways one can perform linear regression in R)
Topics covered
- History and Community
- Devices and Operating Systems
- Package Management
- Package Documentation
- Language Characteristics
- Development Environment
- Files, Databases and Data Manipulation
- Data Quality and Data Validation
- Workflow Management
- General Purpose Mathematical Libraries
- Core Statistics Libraries
- Stochastic Processes
- Econometrics / Timeseries Libraries
- Machine Learning Libraries
- GeoSpatial Libraries
- Visualization
- Web, Desktop and Mobile Deployment
- Privacy-Preserving Computation
- Semantic Web / Semantic Data
- Bindings to Other Languages
- High Performance Computing
- Using R, Python and Julia together
Disclaimers
The comparison does not in any way provide an assessment of which system is better. The proper way to use the comparison is to start with one's objectives, knowledge level and use case, and then figure out which other components to add to one's toolkit.
Remark: The comparison attempted here is not entirely appropriate, as the three systems for computing have quite different origins and, therefore, quite divergent architectural design choices.
For example, strictly speaking R is not a general programming language. R is a system for statistical computation and graphics. It consists of a sufficiently general language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
Yet despite this disclaimer, we hope you agree with us that a comparison is justified, because in a very large domain of applications and use cases the three frameworks can be used interchangeably (or nearly so).
Structure
- The comparison data are provided in tabular format in several distinct tables.
- Each table documents a relevant language or ecosystem subdomain.
- The number and focus areas of the different tables are somewhat arbitrary and may expand in the future.
- The order of the topics is roughly from more generic aspects towards more specialized / advanced areas, concluding with interoperability.
- Each table entry (row) highlights key functionality within the subdomain. The language columns point to information or packages and, where applicable, there might be additional commentary.
- Reference links are included when useful.
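As a rough illustration of the structure described above, one table and its rows could be modeled along the following lines. This is only a sketch: the class and field names (`ComparisonRow`, `ComparisonTable`, `to_html`) are hypothetical, the sample row is illustrative and not copied from the actual tables, and the HTML rendering is a stand-in that does not reflect how the published tables are actually generated.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ComparisonRow:
    """One table entry: a piece of functionality and a pointer per language."""
    feature: str
    julia: str = ""
    python: str = ""
    r: str = ""
    comment: str = ""  # optional additional commentary


@dataclass
class ComparisonTable:
    """One table: a language or ecosystem subdomain and its rows."""
    subdomain: str
    rows: List[ComparisonRow] = field(default_factory=list)

    def to_html(self) -> str:
        """Render the table as a minimal HTML fragment (hypothetical layout)."""
        header = (
            "<tr><th>Feature</th><th>Julia</th><th>Python</th>"
            "<th>R</th><th>Comment</th></tr>"
        )
        body = "".join(
            "<tr>"
            + "".join(
                f"<td>{escape(cell)}</td>"
                for cell in (row.feature, row.julia, row.python, row.r, row.comment)
            )
            + "</tr>"
            for row in self.rows
        )
        caption = f"<caption>{escape(self.subdomain)}</caption>"
        return f"<table>{caption}{header}{body}</table>"


# Illustrative entry only; not taken from the actual comparison tables.
table = ComparisonTable(
    subdomain="Files, Databases and Data Manipulation",
    rows=[
        ComparisonRow(
            feature="Data frames",
            julia="DataFrames.jl",
            python="pandas",
            r="data.frame",
        )
    ],
)
html_fragment = table.to_html()
```

Keeping one column per language makes it easy to see at a glance where an ecosystem has no entry yet for a given piece of functionality.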
At the bottom of some tables there is a row labeled Package Review. This row collects links to the CRAN Task Views, which aim to summarize the large number of R packages available for some data science tasks. There are also links to a mirror effort to create Python Task Views (this content is still WIP; contributors welcome, see below).
Getting Involved
You can provide simple and anonymous feedback on the wiki version of the overview using the feedback button at the bottom of the page. Alternatively you can become an Open Risk Manual author and actively edit the page.
If you are more comfortable using GitHub / Markdown, there is a mirror page available here, which can be seen as a web page here. Please note that the tables are in HTML format, as they are generated automatically.
People interested in developing the Python Task Views can contribute via the GitHub repo.
Comment
If you want to comment on this post you can do so on Reddit or alternatively at the Open Risk Commons. Please note that you will need a Reddit or Open Risk Commons account respectively to be able to comment!