Implementing a Data Science workflow tool

In 2016 and 2017 I taught a class in Advanced Scientific Programming at the TU Berlin and the Bernstein Centre for Computational Neuroscience, Berlin. The course has an interesting format: a week-long intensive session on software carpentry, culminating in a pull request to a major scientific computing project (e.g. numpy). This is followed by a semester-long group project, in which the students must work together to produce a relatively large piece of scientific computing software.

In 2017 the project was to implement a Data Science workflow tool. The overt goal was to build a tool capable of hosting multiple models, each attempting to predict the outcome of the 2017 German federal elections. I added constraints: the data had to be scraped and cleaned automatically rather than by hand, and cached locally rather than downloaded on each run.
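To illustrate what the caching constraint amounts to, here is a minimal sketch in Python; the actual Foxy Predictor internals may well differ, and the URL, cache directory, and function name are hypothetical:

```python
import os
import requests  # third-party HTTP library: pip install requests

CACHE_DIR = "cache"  # hypothetical local cache directory

def fetch(url, filename):
    """Return the raw content at `url`, downloading only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, filename)
    if not os.path.exists(path):
        # Cache miss: download once and store on disk.
        response = requests.get(url)
        response.raise_for_status()
        with open(path, "wb") as f:
            f.write(response.content)
    # Cache hit (or freshly populated): serve from the local file.
    with open(path, "rb") as f:
        return f.read()

# Hypothetical usage: the first call downloads a polling-data page,
# subsequent calls read it from disk instead of hitting the network.
html = fetch("https://example.org/polls.html", "polls.html")
```

Scraping and cleaning would then operate on the cached bytes, so a model can be re-run many times without re-downloading the source data.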

The full code is available on the Foxy Predictor GitHub page. In the run-up to the elections, an instance of the tool, with some models in action, is running on AWS.

In the days to come I will add details of the implementation here, along with the secret design goals.