What do people talk about at FOSDEM 2020

FOSDEM stands for Free and Open Source Software Developers' European Meeting.


FOSDEM is a non-commercial, volunteer-organized European event centered on free and open-source software development. It targets developers and anyone interested in the free and open-source software movement, and its goal is to let developers meet and to promote awareness and use of free and open-source software. FOSDEM has been held annually since 2001, usually during the first weekend of February, at the Université Libre de Bruxelles Solbosch campus in the southeast of Brussels, Belgium. The history of FOSDEM is neatly summarized on Wikipedia, while the website of the current (2020) conference is available here.

FOSDEM is a large event (2 days, 8K+ participants, 800+ talks), which makes it a reasonable snapshot of the current state of the open source universe. The schedule of all talks (including abstracts) is available here, so we can attempt to glimpse what the open source community is up to these days (or at least what it wants to talk about).

Data and Methodology

The data set consists of the 835 talk descriptions (available in the XML schedule file linked above). The text of each talk description is pre-processed using NLTK to extract the unique keywords it uses. We then simply count in how many talks each keyword occurs. Each keyword is thus assigned a frequency of occurrence across the conference (the percentage of talks that mention it), and because only unique keywords are counted per talk, a speaker who is particularly fond of a keyword and loquacious in its use within one description does not inflate its score. To avoid spurious precision, the frequency is rounded to the nearest whole percent. Given the shape of the distribution, this means the 0% and 1% buckets are very crowded.
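The keyword-counting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original script: it uses a crude regular-expression tokenizer and a tiny hand-picked stopword list in place of NLTK, skips any stemming or lemmatization, and assumes the schedule XML contains event elements with abstract and description children. The detail that matters is that each talk contributes a set of keywords, so repetition within one description is ignored.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a tiny stand-in for NLTK's stopword list. "we" and "i" are
# deliberately kept, since the post compares them.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
             "with", "this", "that", "it", "as", "be", "are", "will"}

TAG_RE = re.compile(r"<[^>]+>")            # abstracts may embed HTML such as <p>
WORD_RE = re.compile(r"[a-z][a-z0-9+#]*")  # crude tokenizer; keeps "c++", "c#"


def talk_texts(schedule_xml: str) -> list[str]:
    """One text per <event>, joining its (assumed) abstract and description."""
    root = ET.fromstring(schedule_xml)
    texts = []
    for event in root.iter("event"):
        parts = [event.findtext(tag, default="") for tag in ("abstract", "description")]
        texts.append(TAG_RE.sub(" ", " ".join(parts)))
    return texts


def keywords(text: str) -> set[str]:
    """Unique keywords of a single talk. Using a set means a word repeated
    within one description is counted only once."""
    return {w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS}


def keyword_share(texts: list[str]) -> dict[str, int]:
    """Share of talks mentioning each keyword, rounded to a whole percent."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(keywords(text))
    return {w: round(100 * c / len(texts)) for w, c in doc_freq.items()}


# Toy two-talk schedule (made-up content, not FOSDEM data).
sample = """<schedule><day><room>
<event id="1"><abstract>&lt;p&gt;We talk about Rust and Linux.&lt;/p&gt;</abstract><description/></event>
<event id="2"><abstract>Linux containers, Linux everywhere</abstract><description/></event>
</room></day></schedule>"""

shares = keyword_share(talk_texts(sample))
print(shares["linux"], shares["rust"])  # 100 50
```

Linux appears twice in the second toy talk but still counts as one talk, which is exactly the "fond and loquacious speaker" control described above.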

So, what are people talking about at FOSDEM 2020?

Here are the main (unashamedly subjective) observations, roughly in order of decreasing frequency. But do read till the end, because frequency is not the only measure that matters!

  • Perhaps not surprisingly for a conference, people talk a lot about talking: Talk is the top keyword (59% of talks).
  • The keyword We dominates I (34% vs 19%), clearly reflecting that open source is a collaborative endeavour (but maybe contaminated by a modicum of Royal We / Majestic Plural?)
  • Open Source is a top keyword (31%), while Proprietary is mentioned in 4% of talks (no surprise here either).
  • Next we have a cluster of keywords that define how people frame their contribution: Project, System, Software, Tool, Feature, Application and Code appear in 32%, 28%, 27%, 24%, 23%, 22% and 22% of talks respectively.
  • Data comes next at 21%, perhaps signalling the growing link between software development and data science.
  • The first named piece of software is Linux at 12%, confirming what everybody knows (or should know), namely that open source is really built on top of Linux.
  • The first more specific hints at architectural design come with the API, Network, Library, Cloud and Web keywords, at 13%, 12%, 12%, 10% and 10% respectively.
  • The word Ecosystem, love it or hate it, is thrown around a lot (8%).
  • Container is the first generic technology buzzword (7%)
  • Python is the first programming language and the second named piece of software after Linux (6%), confirming its deep integration in and relevance to the open source movement.
  • Kubernetes is the third named technology (at 6%) and for many people maybe the first unknown word. Take notice!
  • IoT (internet of things) is the second notable generic buzzword at 5%
  • Back to the staples: the Database concept is mentioned in 4% of talks. In terms of actual databases, the top keywords are MySQL (2%) and PostgreSQL (1%).
  • Google is the company mentioned most frequently (4%), Intel the second (2%)
  • Machine Learning is a thing (4%) but AI not so much (1%). To round out the Gartner hype cycle buzzwords, Quantum Computing gets 12 mentions (about 1%) and Blockchain is mentioned only twice (which probably means something, but it is not clear what).
  • In terms of open source focused organizations, Mozilla is at 2%, same as Apache
  • As for the rest of the language wars (after Python), we have Java (4%), C/C++ and Rust (3%), and R, Ada and Kotlin (2%). The rising popularity of Rust is clear. No idea about Ada.
  • In terms of repository hosting, GitHub (3%) leads GitLab (1%).
  • LibreOffice at 2% is the first named software that is purely oriented towards the end user. Turns out 2020 will again not be the year of the open source desktop.
  • Along the end-user theme, the Firefox browser at 1% beats Chrome at 0% (12 versus 3 mentions), revealing where the open source crowd’s allegiance lies, although market share in the general population is a very different story.
  • How is the rest of big tech doing? For what it is worth: Facebook, Amazon, IBM, Microsoft and Apple receive 9, 8, 7, 6 and 4 mentions respectively (each roughly 1% of talks or less).
  • Last but not least, 63% of all keywords are mentioned only once or twice. People talk a lot about a lot of things! Long live the long tail of open source!
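The percentages in the list are simply the number of talks mentioning a keyword divided by the 835 talks, rounded to a whole percent. The closing long-tail figure can be read, assuming "mentioned once or twice" means "appearing in one or two talks", as the fraction of distinct keywords with such a low count. A small sketch of both calculations; the function names and the toy counts are illustrative assumptions, not the original analysis.

```python
from collections import Counter

N_TALKS = 835  # number of talk descriptions in the data set


def share(mentions: int, n_talks: int = N_TALKS) -> int:
    """Number of talks mentioning a keyword -> whole-percent share."""
    return round(100 * mentions / n_talks)


def long_tail_fraction(doc_freq: Counter, max_talks: int = 2) -> float:
    """Fraction of distinct keywords that occur in at most `max_talks` talks."""
    rare = sum(1 for count in doc_freq.values() if count <= max_talks)
    return rare / len(doc_freq)


# Firefox (12 mentions) vs Chrome (3 mentions), as in the list above.
print(share(12), share(3))  # 1 0

# Made-up per-keyword talk counts, only to show the long-tail calculation.
toy = Counter({"talk": 5, "rust": 2, "ada": 1, "kotlin": 1})
print(long_tail_fraction(toy))  # 0.75
```

This also shows why the 0% and 1% buckets are so crowded: with 835 talks, anything mentioned in roughly 5 to 12 talks lands in the 1% bucket.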

Disclaimers and Further Work

Well, there are many, but this is a blog post meant to highlight the vibrancy and fun of the open source world, not an article for the esteemed Journal of Open Sourceology. So we leave it at that!