add post about the pipeline approach #88

Draft
wants to merge 10 commits into
base: main
67 changes: 67 additions & 0 deletions learn/pipelines/index.qmd
@@ -0,0 +1,67 @@
---
title: "Outbreak analytics Pipelines"
author:
  - name: "Andree Valle-Campos"
    orcid: "0000-0002-7779-481X"
  - name: "Carmen Tamayo Cuartero"
    orcid: "0000-0003-4184-2864"
  - name: "Anna Carnegie"
    orcid: "0000-0002-6385-7795"
  - name: "Sebastian Funk"
    orcid: "0000-0002-2842-3406"
  - name: "Adam Kucharski"
    orcid: "0000-0001-8814-9421"
  - name: "Rosalind M Eggo"
    orcid: "0000-0002-0362-6717"
date: last-modified
Member: Nice! I didn't know about this!

Member Author: possibly I spend too much time looking at the quarto documentation, hehe

Member: I now realize this may come with issues as we sometimes need to update already published posts (typos, breaking changes in quarto, broken URLs, etc.)

categories: [outbreak analytics, pipelines, tasks, packages]
bibliography: pipelines.bib
image: "sigmund-4CNNH2KEjhc-unsplash.jpg"
format:
  html:
    toc: true
---

## The Pipeline approach

We can solve outbreak analytics *tasks* by connecting multiple packages into *pipelines*.

## Outbreak analytics

*Outbreak analytics* is a specialized field within data science that focuses on the technological and methodological aspects of the outbreak data pipeline. This includes the systematic collection, analysis, modeling, and reporting of data to inform outbreak response [@polonsky2019outbreak].

### Tasks

We can view outbreak analytics as a set of related data analysis __Tasks__. @fig-tasks represents them as a directed graph, where each *node* is a Task and each *directed edge* represents the flow of data from one Task's output to another Task's input. Tasks are connected much like the steps in the [tidyverse](https://r4ds.hadley.nz/whole-game.html) diagram for exploratory data analysis.

![Tasks for outbreak analytics](task_pipeline-minimal.svg){#fig-tasks fig-alt="Directed graph where tasks are nodes and data flows are directed edges drawn as arrows. One task can connect to multiple other tasks."}

@fig-tasks-detailed summarizes the data inputs and outputs that connect the Tasks. For example, the first Task on the left, *Read case data*, takes a data input called *Case data* and produces two data outputs called *Linelist* and *Contact data*.

![Detailed task paths](task_pipeline-detailed.svg){#fig-tasks-detailed}

A single Task can cover different methods and packages that work with similar data inputs and outputs.
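
As a minimal sketch of this idea (written in Python purely for illustration; it is not part of any Epiverse-TRACE package), a Task can be described by the names of its data inputs and outputs, and a directed edge exists wherever one Task's output is another Task's input. The `Task` class and the `data_edges()` helper are hypothetical names introduced here. *Describe case data* and its outputs are taken from the *Transmissibility pipeline* described in the Pipelines section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a Task by the data it consumes and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset


def data_edges(tasks):
    """Return (producer, consumer, data) triples for every shared data name."""
    edges = []
    for producer in tasks:
        for consumer in tasks:
            if producer is consumer:
                continue
            # An edge exists when an output of one Task is an input of another
            for data in sorted(producer.outputs & consumer.inputs):
                edges.append((producer.name, consumer.name, data))
    return edges


read_case_data = Task(
    name="Read case data",
    inputs=frozenset({"Case data"}),
    outputs=frozenset({"Linelist", "Contact data"}),
)
describe_case_data = Task(
    name="Describe case data",
    inputs=frozenset({"Linelist"}),
    outputs=frozenset({"Delay distributions", "Epicurves"}),
)

edges = data_edges([read_case_data, describe_case_data])
```

Because a Task is identified only by its data inputs and outputs, two packages that implement the same Task with different methods would produce the same edges in this representation.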

### Pipelines

We define a __Pipeline__ as a set of connected Tasks required to obtain an informative outcome for decision-making purposes.

For example, to quantify the time-varying reproduction number we can follow the *Transmissibility pipeline* (@fig-pipe-01). First, we *Read case data* to generate a linelist. Then, we *Describe case data*, using the linelist as input to generate delay distributions and epicurves. Finally, we use both of these outputs as inputs to *Quantify transmission* and generate an estimate of transmission. This output allows us to determine the intensity of interventions needed to achieve epidemic control [@cori2017key].

![Transmissibility pipeline](task_pipeline-pipe_01.svg){#fig-pipe-01}
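
The ordering of Tasks in the *Transmissibility pipeline* can be sketched as a topological sort of the Task graph. The Python example below is only an illustration under stated assumptions: the dictionary layout with `needs` and `gives` keys and the `run_order()` helper are hypothetical, and no real analysis is performed. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) and checks that every data input is available before a Task runs.

```python
from graphlib import TopologicalSorter

# Hypothetical description of the Transmissibility pipeline:
# each Task lists the data it needs and the data it gives.
TRANSMISSIBILITY = {
    "Read case data": {
        "needs": ["Case data"],
        "gives": ["Linelist"],
    },
    "Describe case data": {
        "needs": ["Linelist"],
        "gives": ["Delay distributions", "Epicurves"],
    },
    "Quantify transmission": {
        "needs": ["Delay distributions", "Epicurves"],
        "gives": ["Estimate of transmission"],
    },
}


def run_order(tasks, external):
    """Order Tasks so that every data input exists before it is used."""
    # Map each data name to the Task that produces it
    producer = {d: name for name, t in tasks.items() for d in t["gives"]}
    # Predecessors of a Task are the producers of the data it needs
    graph = {
        name: {producer[d] for d in t["needs"] if d in producer}
        for name, t in tasks.items()
    }
    order = list(TopologicalSorter(graph).static_order())

    available = set(external)  # data supplied from outside the pipeline
    for name in order:
        missing = set(tasks[name]["needs"]) - available
        if missing:
            raise ValueError(f"{name} is missing inputs: {sorted(missing)}")
        available.update(tasks[name]["gives"])
    return order, available


order, available = run_order(TRANSMISSIBILITY, external=["Case data"])
```

Here `order` lists the three Tasks in the sequence described above, and `available` ends up containing the estimate of transmission that the pipeline is meant to deliver.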

Similarly, to simulate the final size of an epidemic we can follow the *Scenarios pipeline* (@fig-pipe-02). First, we *Read population data* to obtain its demographic distribution and social contact matrix. Next, we take the estimate of transmission, ideally the data output of the *Transmissibility pipeline*. Finally, we use these three data objects as inputs to *Simulate transmission scenarios* and determine the proportion of the population infected. This output allows us to assess the long-term impact of the outbreak and evaluate intervention choices [@cori2017key].

![Scenarios pipeline](task_pipeline-pipe_02.svg){#fig-pipe-02}
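
One way to see how the *Scenarios pipeline* depends on other work is to list the data it needs but does not produce itself. The Python sketch below is an illustration only. The `needs`/`gives` layout and the `external_inputs()` helper are hypothetical, and the name *Population data* for the input of *Read population data* is an assumption, since the text above does not name that input.

```python
# Hypothetical description of the Scenarios pipeline
SCENARIOS = {
    "Read population data": {
        "needs": ["Population data"],  # assumed input name
        "gives": ["Demographic distribution", "Social contact matrix"],
    },
    "Simulate transmission scenarios": {
        "needs": [
            "Demographic distribution",
            "Social contact matrix",
            "Estimate of transmission",
        ],
        "gives": ["Proportion of population infected"],
    },
}


def external_inputs(tasks):
    """Return the data a pipeline needs but no Task inside it produces."""
    needed = {d for t in tasks.values() for d in t["needs"]}
    produced = {d for t in tasks.values() for d in t["gives"]}
    return needed - produced


upstream = sorted(external_inputs(SCENARIOS))
```

In this sketch `upstream` contains the raw population data and the estimate of transmission, which makes explicit that the second has to come from elsewhere, ideally from the *Transmissibility pipeline*.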

## How do we use Pipelines?

We use the Pipeline approach to connect multiple packages in the design of:

- Reproducible report templates per Pipeline stored in the [`{episoap}`](https://epiverse-trace.github.io/episoap/) package,
- Code scripts stored in the [`{howto}`](https://epiverse-trace.github.io/howto/) repository, and
- [New](https://github.com/orgs/epiverse-trace/discussions/87) packages in relation to other upstream packages and tasks.

## Attributions

- The cover image for this post is from [Unsplash](https://unsplash.com/photos/4CNNH2KEjhc), provided by [Sigmund](https://unsplash.com/@sigmund), free to use under the [Unsplash License](https://unsplash.com/license).
26 changes: 26 additions & 0 deletions learn/pipelines/pipelines.bib
@@ -0,0 +1,26 @@
@article{cori2017key,
  doi = {10.1098/rstb.2016.0371},
  url = {https://doi.org/10.1098/rstb.2016.0371},
  year = {2017},
  month = apr,
  publisher = {The Royal Society},
  volume = {372},
  number = {1721},
  pages = {20160371},
  author = {Anne Cori and Christl A. Donnelly and Ilaria Dorigatti and Neil M. Ferguson and Christophe Fraser and Tini Garske and Thibaut Jombart and Gemma Nedjati-Gilani and Pierre Nouvellet and Steven Riley and Maria D. Van Kerkhove and Harriet L. Mills and Isobel M. Blake},
  title = {Key data for outbreak evaluation: building on the Ebola experience},
  journal = {Philosophical Transactions of the Royal Society B: Biological Sciences}
}

@article{polonsky2019outbreak,
  doi = {10.1098/rstb.2018.0276},
  url = {https://doi.org/10.1098/rstb.2018.0276},
  year = {2019},
  publisher = {The Royal Society},
  volume = {374},
  number = {1776},
  pages = {20180276},
  author = {Polonsky, Jonathan A and Baidjoe, Amrish and Kamvar, Zhian N and Cori, Anne and Durski, Kara and Edmunds, W John and Eggo, Rosalind M and Funk, Sebastian and Kaiser, Laurent and Keating, Patrick and others},
  title = {Outbreak analytics: a developing data science for informing the response to emerging pathogens},
  journal = {Philosophical Transactions of the Royal Society B: Biological Sciences}
}
Binary file added learn/pipelines/sigmund-4CNNH2KEjhc-unsplash.jpg
4 changes: 4 additions & 0 deletions learn/pipelines/task_pipeline-detailed.svg
4 changes: 4 additions & 0 deletions learn/pipelines/task_pipeline-minimal.svg
4 changes: 4 additions & 0 deletions learn/pipelines/task_pipeline-pipe_01.svg
4 changes: 4 additions & 0 deletions learn/pipelines/task_pipeline-pipe_02.svg