Motivation for the comparison
A large component of risk management relies on data processing and quantitative tools. In turn, such information processing pipelines and numerical algorithms must be implemented in computer systems.
Computing systems come in an extraordinarily large variety, but in recent years open source software has found increasing adoption for diverse applications (machine learning, data science, artificial intelligence). In particular, cloud computing environments are primarily based on open source projects at the systems level. This facilitates (but does not require) the use of open source computational tools such as Python or R.
A new entry in the Open Risk Manual, the Python versus R Language article, is a side-by-side comparison of a wide range of aspects of the Python and R language ecosystems.
The comparison aims to:
- Cover the most common use cases that are relevant to the implementation of quantitative risk models (please provide feedback for additions)
- Be useful for people who are at least somewhat familiar with programming (and optionally with one or both of the two languages)
- Be fact-oriented (please provide feedback if you spot errors)
The comparison does not aim to:
- Be a detailed or comprehensive catalog of libraries (which number in the thousands)
- Cover use cases that are far removed from quantitative risk models
- Be exhaustive (e.g., identify every computer system on which one can run Python or R)
Structure of the comparison
The comparison is structured around a number of sections, as per the list below. For each meaningful element there is a reference to the corresponding Python and R element and, where appropriate, relevant comments.
- History and Community: An overall comparison of the history of the two ecosystems, towards answering the question: who is really behind Python and R?
- Devices and Operating Systems: On what kinds of devices and operating systems can I use Python or R?
- Package Management: How can I extend the functionality of Python or R with existing libraries? The ease of installing packages is a very important factor in the popularity of both, in marked contrast to languages like C++.
- Language Characteristics: What does code in Python or R look like from a programming perspective? Many standard aspects of programming languages are available in both and so are not included.
- Development Environment: How can I develop and test code and applications written in Python or R?
- Files, Databases and Data Manipulation: What direct connectors to disk files and databases are available for Python and R respectively? Once I have connected to a data source, how can I store and do preliminary work with the imported data?
- General Purpose Mathematical Libraries: What basic building blocks are available for undertaking quantitative work in Python and R respectively?
- Statistics Libraries: What libraries are available for undertaking standard statistical studies in Python or R? There is a huge number of packages and modules with significant duplication and overlap, especially in the R ecosystem, so only the major or indicative ones are considered.
- Econometrics Libraries: What libraries are available for undertaking econometric (timeseries) studies in Python or R?
- Machine Learning Libraries: What libraries are available for machine learning projects in Python or R? The term machine learning is not very specific, so we use this category to group various advanced or specialized libraries (of use in quantitative risk management).
- Visualization: What functionality is available to produce data-driven visualizations in Python or R?
- Web, Desktop and Mobile Deployment: What tools does each language ecosystem provide for the deployment of applications, whether via the web, desktop or mobile apps?
- High Performance Computing: What are my options if I have performance bottlenecks in terms of CPU, memory or disk?
As with all Open Risk Manual articles, this is a living entry. Contributions, corrections and additions are welcome, either as feedback on this post or on the Manual page.