# Guide to Creating New Extrapolation Models

This article describes how to use our open-source GitLab project to develop your own extrapolation models from the COVID-19 dataset.

We carefully chose our tools to make it fast & easy to get started developing your own models. It should take less than 10 minutes to:

- clone our repo,
- generate your first graph from our example models, and
- begin writing your own models in Python.

If you ran into issues but managed to get our toolset working on a non-Debian system, please open a ticket on our GitLab describing what you did, and we'll add it to the wiki.

## Bootstrap

Coviz requires `python3` and `python3-pip`. It also requires `plotly` and `numpy`, which will be installed with `pip3`.

Execute the following commands from a terminal to install these requirements:

    sudo apt-get -y install git python3-pip
    # use --recurse-submodules so our <1M submodule repo with the CSV dataset is
    # also fetched: https://gitlab.com/coviz-org/data-jhcsse
    git clone --recurse-submodules git@gitlab.com:coviz-org/coviz-models.git
    cd coviz-models
    pip3 install -r requirements.txt

## Generate graphs

You should now have everything you need to generate your first graph.

Execute the following commands from the `coviz-models` directory to generate the graphs using our example extrapolation models:

    ./generateGraphs.py
    ls

The above `./generateGraphs.py` command should have created a new directory called `output`. Inside the `output` directory you'll see three more directories, one for each of the different extrapolation models:

    user@coviz:~/sandbox/coviz-models$ ls -1 output/
    my-new-model
    projections-based-on-last-three-days
    projections-based-on-last-seven-days
    user@coviz:~/sandbox/coviz-models$

Inside each of those models' directories, you'll find an HTML file for each of the dataset's regions (e.g. Afghanistan, US, Diamond Princess).

    user@coviz:~/sandbox/coviz-models$ ls output/projections-based-on-last-three-days/
    us.html
    user@coviz:~/sandbox/coviz-models$

By default (for faster execution), only the US graphs are created. If you'd like to generate graphs for additional regions, specify one or more regions with `--region`. If you'd like to generate the graphs for all the regions, use `--earth`.

Note: `--earth` will take a long time!

You can also specify `--help` for a list and description of all the options to `generateGraphs.py`.

    ./generateGraphs.py --help

## Viewing Graphs

To view the graphs generated above by `./generateGraphs.py`, open the HTML file `output/<model>/<region>.html` (e.g. `output/projections-based-on-last-three-days/us.html`) in your browser.

    firefox output/projections-based-on-last-three-days/us.html

## Creating your own model

If you'd like to create your own extrapolation model, you can edit the `models/my_new_model.py` file to your liking.

You just need its `make_extrapolate` function to return a function that takes a single argument (the x value on the graph) and returns another number (that x's corresponding y value on the graph).
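To make that contract concrete, here is a minimal sketch of a model that simply continues the straight line through the last two data points. The sample `data` list and the way the returned function is called at the end are assumptions for illustration only; how `generateGraphs.py` actually invokes your model is not shown here.

```python
def make_extrapolate(data):
    """Return a function mapping an x value (day index) to a y value."""
    # Slope of the straight line through the last two data points.
    last_x = len(data) - 1
    slope = data[-1] - data[-2]

    def extrapolate(x):
        # Continue that line forwards (or backwards) from the last point.
        return data[-1] + slope * (x - last_x)

    return extrapolate


# Hypothetical sample data: one value per day.
data = [10, 20, 40, 70, 110]
extrapolate = make_extrapolate(data)

# x = 4 is the last known day; x = 5 and x = 6 are projections.
projected = [extrapolate(x) for x in (4, 5, 6)]
print(projected)
```

The inner function keeps a reference to `data`, so everything the model needs is captured once when `make_extrapolate` is called.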

By default, the `models/my_new_model.py` script is just a copy of the `e2a_seven` extrapolation model, which does a simple curve fit against the most recent seven days of data using a second-degree polynomial, built with numpy's `polyfit()` and `poly1d()` functions.

Let's change `models/my_new_model.py` to do a curve fit against the most recent thirty days instead of seven.

Execute the following command to edit the file `models/my_new_model.py` in `gedit`.

    gedit models/my_new_model.py

The first thing you'll notice is how short the script is! Python's `numpy` module is fantastic, and it does most of the heavy lifting. Here's the entire file.

    import numpy

    def make_extrapolate(data):
        x = [i for i in range(len(data))]
        y = data
        # fit a second-degree polynomial to the last seven days of data
        curve = numpy.polyfit(x[-7:], y[-7:], 2)
        # create function to be applied for extrapolation
        extrapolate = numpy.poly1d(curve)
        return extrapolate

    def meta():
        return { 'title': 'My New Model' }

The important line that does the curve-fit is this one:

    curve = numpy.polyfit(x[-7:], y[-7:], 2)

See numpy's documentation for the `polyfit()` function for the full details.

- The first argument to the `polyfit()` function is `x`, which is a list of x coordinates
- The second argument to the `polyfit()` function is `y`, which is a list of y coordinates
- The third argument to the `polyfit()` function is `deg`, which is an `int` that defines the degree of the fitting polynomial. In all our example models, we use a second-degree fit.
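As a quick illustration of those three arguments, the sketch below fits made-up points that lie exactly on y = x² + 1, so the result is easy to check by eye. The data is hypothetical and chosen only for the example.

```python
import numpy

# Hypothetical data that lies exactly on y = x**2 + 1.
x = [0, 1, 2, 3, 4, 5]
y = [xi ** 2 + 1 for xi in x]

# Fit a second-degree polynomial; coefficients come back highest power first.
curve = numpy.polyfit(x, y, 2)

# Wrap the coefficients in a callable polynomial.
extrapolate = numpy.poly1d(curve)

# Evaluate beyond the fitted range (x = 6 was not part of the input).
print(curve)           # approximately [1, 0, 1]
print(extrapolate(6))  # approximately 37
```

Because `polyfit()` is a least-squares fit in floating point, the printed values will be very close to, but not necessarily exactly, the ones in the comments.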

Go ahead and change this line to the following, which will increase the set of data passed to the `polyfit()` function from the most recent 7 days of the dataset to the most recent 30 days.

    curve = numpy.polyfit(x[-30:], y[-30:], 2)

Save and close the file `my_new_model.py`, and re-generate the graphs.

    ./generateGraphs.py

Now open the `output/my-new-model/us.html` file in your browser.

    firefox output/my-new-model/us.html

Your browser will now show the second-degree polynomial curve fitted against the most recent 30 days of data.

You can confirm this by comparing it against the output of the other two models:

    firefox output/projections-based-on-last-three-days/us.html
    firefox output/projections-based-on-last-seven-days/us.html
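Editing the slice by hand works, but if you want to compare several window lengths, one option is to make the window a parameter. The sketch below assumes a hypothetical helper, `make_windowed_extrapolate(data, days)`, which is not part of the coviz-models repository, and uses made-up sample data.

```python
import numpy


def make_windowed_extrapolate(data, days):
    """Fit a second-degree polynomial to the most recent `days` points."""
    x = list(range(len(data)))
    y = data
    # A slice like x[-days:] simply returns the whole list when the
    # dataset has fewer than `days` entries.
    curve = numpy.polyfit(x[-days:], y[-days:], 2)
    return numpy.poly1d(curve)


# Hypothetical data: slow growth for 30 days, then a faster final week.
data = [100 + 2 * i for i in range(30)] + [160 + 25 * i for i in range(1, 8)]

seven = make_windowed_extrapolate(data, 7)
thirty = make_windowed_extrapolate(data, 30)

# Project one week past the end of the data with each window.
future_x = len(data) + 6
print(round(seven(future_x)), round(thirty(future_x)))
```

A short window reacts quickly to the latest trend, while a long window is also shaped by older data, so the two projections will generally differ.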

## make_extrapolate()

Now you can make whatever modifications you'd like to the `my_new_model.py` file's `make_extrapolate()` function and follow the above procedure to generate its graph and check the result in your web browser.

In fact, you're not constrained to using `numpy`. The only constraint is that your `make_extrapolate()` function should return a function that takes `x` values and returns `y` values. Simple, right?
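As one illustration of a model that doesn't use `numpy` at all, here is a sketch that projects forward by compounding the average day-over-day growth ratio of the last seven days. The approach, the title, and the sample data are assumptions made up for this example, not one of the project's models.

```python
def make_extrapolate(data):
    """Extrapolate by repeating the average day-over-day growth ratio."""
    window = data[-7:]  # most recent seven days (or fewer, if data is short)
    # Ratios between consecutive days, skipping days whose previous value is zero.
    ratios = [b / a for a, b in zip(window, window[1:]) if a > 0]
    # Fall back to "no growth" when no ratio could be computed.
    growth = sum(ratios) / len(ratios) if ratios else 1.0
    last_x = len(data) - 1

    def extrapolate(x):
        # Compound the average ratio once per day past the last data point.
        return data[-1] * growth ** (x - last_x)

    return extrapolate


def meta():
    return {'title': 'Average Growth Ratio'}


# Hypothetical data that doubles every day.
doubling = make_extrapolate([1, 2, 4, 8, 16])
print(doubling(4), doubling(5), doubling(6))
```

Averaging the ratios arithmetically is just one simple choice; a geometric mean or a different window length would give a different model with the same interface.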

## Submitting Extrapolation Models

Did you build an awesome extrapolation model from this guide and want to share it with the world? Great!

You can submit a ticket on our GitLab group for this. Make sure to include:

- The Python code to produce the model (i.e. the contents of `my_new_model.py`)
- A human-readable name for the model (< 45 characters)
- A short id for the model (3-5 characters)
- A short description of the model (~1-10 sentences)
- A list of the pros of the model
- A list of the cons of the model
- A statement that, by submitting your model to us, you are releasing the model copyleft under the CC-BY-SA license and its implementation code under the AGPLv3. You will need to state that [a] you are the original author and [b] you permit anyone to use the model and any content produced by it under the terms of those licenses
- A list of the names of the authors & contributors for developing the model (optional)
- For each author/contributor, a hyperlink to a URL of their choice, such as a website or social media account for attribution (optional)

Please create a new ticket on our GitLab with the above information, and we'll work on integrating the model into our website. Thank you!

## Updating the Dataset

As the days pass, your data will become stale. The dataset should be updated once a day.

Execute the following commands from the `coviz-models` directory to update the repo and its dataset submodule.

    git submodule foreach git pull origin master
    git pull