Wanlin Li
PORTFOLIO

Data Analyst skilled in Python, R, SQL, Tableau @WanlinLi

Snakemake workflow: aPhyloGeo

Using the workflow management system Snakemake, we developed a new phylogeographic pipeline, aPhyloGeo. aPhyloGeo is a user-friendly, fast, efficient, and comprehensive pipeline that looks for relationships between a reference tree (e.g., a tree of geographic species distributions, a temperature tree, a habitat precipitation tree, or others) and the genetic composition of the species.

Project: aPhyloGeo

Quantitative structure-activity relationship (QSAR)

Project: QSAR

As a mother, you do everything possible to protect your baby during the early stages of pregnancy. Despite the protective role of the placenta, many molecules, including those from drugs and even the environment, are present in your bloodstream and manage to cross the placenta. QSAR aims to use machine learning to build models that predict the ability of chemical molecules to cross the placenta.

Plotly/Dash
Display cases of SARS-CoV-2 in a smooth animation

Project: dashboard animation

Plotly Dash is a Python framework for building web-based data visualization applications. The "animation" feature in Plotly Dash utilizes the Plotly JavaScript library to create dynamic and responsive animations that can be easily integrated into web applications.
Using Plotly, we created an animated dashboard that shows confirmed cases of SARS-CoV-2 in different regions of Quebec.

House price prediction with R

Project: House Price

This project covers the linear regression model, which lets us assess the relationship between the predictor variables in a data set and a continuous response variable.
The project consists of three parts:
(1) Exploratory data analysis (EDA) of the Ames Housing dataset.
(2) Model assumptions, selection, and interpretation.
(3) Model validation and out-of-sample prediction.
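
The fit-then-validate idea behind parts (2) and (3) can be sketched in a few lines. The project itself is written in R; this is only a minimal Python illustration with a single predictor, and the numbers are made-up toy values, not taken from the Ames Housing dataset. The function names (`fit_simple_ols`, `predict`, `rmse`) are hypothetical helpers introduced for this sketch.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to new predictor values."""
    intercept, slope = model
    return [intercept + slope * xi for xi in x]


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy, made-up values: living area (sq ft) and sale price (thousands)
area_train = [1000.0, 1500.0, 2000.0, 2500.0]
price_train = [110.0, 160.0, 210.0, 260.0]

# Held-out observations that the model never saw during fitting
area_test = [1200.0, 1800.0]
price_test = [132.0, 187.0]

model = fit_simple_ols(area_train, price_train)
test_rmse = rmse(price_test, predict(model, area_test))
print("intercept, slope:", model)
print("out-of-sample RMSE:", test_rmse)
```

Scoring the model on held-out rows rather than on the training rows is what makes the error estimate an out-of-sample one.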


AI for Medicine

Project: AI for medicine

Create convolutional neural network models to make diagnoses of lung and brain disorders.
Build risk models and survival estimators for heart disease.
Build a treatment effect predictor, apply model interpretation techniques, and use natural language processing to extract information from radiology reports.
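
As an illustration of what a survival estimator computes, here is a minimal sketch of the Kaplan-Meier estimator, one common choice. It is an assumption that this particular estimator matches what the project uses, and the follow-up data below are made-up toy values. The function name `kaplan_meier` is a hypothetical helper for this sketch.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    # Distinct times at which at least one event was observed
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t (censored subjects excluded)
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Toy, made-up data: five subjects, two of them censored
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(time, round(prob, 3))
```

Censored subjects still count in the at-risk set up to their last follow-up time, which is why the estimator does not simply divide survivors by the total.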

ATAC-Seq with Genrich

Project: ATAC_Genrich

ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) is a method for determining chromatin accessibility across the genome.
We analyze publicly available transcriptomic and epigenetic data to better describe the origin and role of a CD8+ T cell subpopulation.
This workflow uses Genrich as the peak caller.

ATAC-Seq with MACS2

Project: ATAC_MACS2

ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) is a method for determining chromatin accessibility across the genome.
We analyze publicly available transcriptomic and epigenetic data to better describe the origin and role of a CD8+ T cell subpopulation.
This workflow uses MACS2 as the peak caller.