mlops-covid

My capstone project for the MLOps Zoomcamp. The project consists of implementing the MLOps environment for a COVID predictor. Examples of predictions are:

Predict the cases for today for a given location
Predict the cases for the following days for the whole world.
...

Problem description

The goal is to predict, for future dates, the cumulative number of confirmed COVID-19 cases in various locations across the world, as well as the number of resulting fatalities.
The input data is available here from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
The problem is inspired by the Kaggle competition https://www.kaggle.com/competitions/covid19-global-forecasting-week-2

Note: There is a README.md file in each folder with instructions.

Solution description for V1:

Experiment tracking and model registry:
- Experiment tracking with MLflow. Several items are tracked:
  - Parameters: those used in the models, like the maximal degree of the polynomial features, ...
  - Metric: the main metric is the root mean squared logarithmic error (RMSLE)
  - Artifacts: mainly pickled models for the different models tested
  - Source: not tracked
- Model management and model registry with MLflow: for model versioning, stage transitions, ... and mainly to have a model in production that is used in the pipeline

Workflow orchestration:
- The workflow is orchestrated using Prefect

Model monitoring:
- Model monitoring consists of supervising the RMSLE, also with Prefect
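As a rough illustration of the tracked metric, the sketch below computes the RMSLE in plain Python and shows how one run could be logged to MLflow. The function names `rmsle` and `log_run`, the parameter key `max_degree`, and the `model_path` argument are assumptions made for this example and are not taken from the project code.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so zero case counts are handled without errors
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def log_run(max_degree, y_true, y_pred, model_path):
    """Log one experiment run to MLflow (hypothetical helper)."""
    # Imported here so the metric above stays usable without MLflow installed
    import mlflow

    with mlflow.start_run():
        mlflow.log_param("max_degree", max_degree)  # assumed parameter name
        mlflow.log_metric("rmsle", rmsle(y_true, y_pred))
        mlflow.log_artifact(model_path)  # e.g. a pickled model file
```

Because the error is taken on the logarithm of the counts, the metric penalizes relative rather than absolute deviations, which suits cumulative case numbers that differ by orders of magnitude between locations.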

For more details see the section below "Explanation of the folders and files in order"

Solution description for V2:

Added the folders /deployment_PrefectFargate (to deploy the Prefect Agent and my code on Fargate containers) and /exp-track-mod-reg-mlflowFargate (to deploy the MLflow infrastructure on AWS). See the folders and their READMEs for more details. A diagram can be found here.

Prerequisite

You need to have the AWS CLI installed on your machine. See here for more details.

Init

/mlops-covid$ pip install pipenv && pipenv install --dev

Explanation of the folders and files in order

/EDA: basic EDA on the COVID dataset.
/exp-track-mod-reg: experiment tracking and model registry with MLflow. A model is selected for deployment, and different deployment strategies are considered.

(V2) /exp-track-mod-reg-mlflowFargate: deploys the MLflow infrastructure on AWS; the notebooks work against this online MLflow server.

/deployment: basic deployment of the model in batch mode, scheduled with Prefect, running the Agent locally.
/deployment_PrefectRemoteAgent: deployment of the model in batch mode, scheduled with Prefect, running the Agent remotely on an EC2 instance started manually.

(V2) /deployment_PrefectFargate: deploys the Prefect Agent and my code on Fargate containers.

/monitoring: monitoring via another Prefect flow that stores the monitored metric in an S3 bucket.
/deployment+monitoring: puts both Prefect flows together (the prediction flow from deployment_PrefectRemoteAgent and the monitoring flow).
/testing: unit tests, linting and formatting of the code.
File .pre-commit-config.yaml with pre-commit hooks for testing, linting and formatting.
Makefile that runs some checks and moves files to a folder that can be delivered. Instructions on how to run it were added manually to the README.md.
GitHub Actions in workflows. CI is implemented in .github/workflows/ci-tests.yml for setup, installing, testing, linting... For the GitHub Actions to work you need to set the corresponding Secrets: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, PREFECT_API_KEY, PREFECT_WS and PREFECT_S3BLOCK_NAME. CD is not implemented, since at the moment there is neither Docker nor Terraform.
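The secrets listed above are easy to forget when running the flows outside of CI. The following is a minimal sketch of a check that reports which of them are missing, assuming they are exposed to the process as environment variables with the same names. The helper `missing_secrets` is hypothetical and not part of this repository.

```python
import os

# Secret names as listed for the GitHub Actions workflow
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "PREFECT_API_KEY",
    "PREFECT_WS",
    "PREFECT_S3BLOCK_NAME",
)


def missing_secrets(env=None):
    """Return the required secret names that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing secrets: " + ", ".join(missing))
    print("All required secrets are set.")
```

Run as a script, it exits with an error message naming the missing variables, so a misconfigured environment fails before any flow starts.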

Reproduce

You can either reproduce all the steps of the process by following the READMEs in the folders mentioned above, or just focus on the folder /delivery for the end result.

Github Versions for different Versions of the project

Project V1 - CU Task: 2u9yn99 - Github sha: 3a7197265fad1212399b61dac5219e5cf5a84874
Project V2 - CU Task: 34aghxe - Github sha: 38f6af5836474cc518679b577bb2148926d1457c

jralduaveuthey/mlops-covid

Folders and files

Latest commit

History

Repository files navigation

mlops-covid

Problem description

Solution description for V1:

Solution description for V2:

Prerequisite

Init

Explanation of the folders and files in order

Reproduce

Github Versions for different Versions of the project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages