Pillar Biosciences
Senior Computational Biologist
Natick, MA
November 2021-March 2024
- Optimized hyperparameters for machine learning algorithms to suit different sequencing panels, using a random search of the parameter space and custom loss functions.
- Adapted a machine-learning-based CNV caller from proof-of-concept code into a fully operational pipeline component, reducing false positive calls tenfold.
- Refactored two code bases that had diverged over years of independent development into a coherent whole that met the needs of both. Merged the git histories to preserve the development record of both repositories.
- Streamlined the flow of application data between microservices to enforce loose coupling, enabling more flexible application execution and faster run times.
- Implemented a BWA-based NGS contamination filter to remove known contaminating sequences before they could confound downstream results.
- Developed an NGS QC application which enabled the production team to spot reagent contamination before it caused costly errors.
- Created a binary parser for raw Illumina output files to supply data to the NGS QC application.
Merck Research Laboratories, IT
Senior Specialist, Scientific Solutions Engineering
Boston, MA
April 2020-October 2021
- Drove product discovery for a data visualization tool for gene expression analysis
- Led an international engineering team in building an R-Shiny data visualization application from the ground up
- Instituted a Scrum-based workflow to maintain a steady cadence of feature delivery
- Architected a data model for storing and displaying complex scientific data across several experimental domains: single-cell RNA sequencing (scRNA-Seq), CRISPR, bulk RNA-Seq, and microarray
- Wrote efficient queries against a database with tables containing hundreds of millions of rows
- Integrated R-Shiny with Scanpy, a specialized Python library for single-cell analysis, using Singularity containers
- Automated deployment of the R-Shiny application to multiple environments using Jenkins
- Designed informative and attractive scientific visualizations and user interfaces used by dozens of biologists to facilitate drug discovery
Merck Research Laboratories
Senior Scientist, Computational Biology
Boston, MA
February 2016-March 2020
- Provisioned and managed AWS resources for computationally intensive bioinformatics applications
- Developed and maintained a QC analysis pipeline for next-generation sequencing (NGS) datasets on a high-performance computing (HPC) cluster
- Ported bioinformatics processing pipelines from on-prem HPC infrastructure to the proprietary cloud computational biology platforms Seven Bridges and DNAnexus, as well as to AWS using the Cromwell and Nextflow workflow engines
- Productized a single-cell RNA-Seq pipeline using Sun Grid Engine
- Facilitated the transfer and storage of hundreds of TBs of NGS data using Aspera, AWS, Seven Bridges, DNAnexus, and, on one occasion, hard drives shipped by mail
Merck Research Laboratories
Genetics Intern
West Point, PA
June 2015-August 2015
- Created visualization workflows and SQL queries to aid drug candidate safety evaluation
- Analyzed and integrated data from next-generation sequencing, microarray, NanoString, and qPCR experiments
- Performed power analyses to inform the design of genomics experiments
University of Pennsylvania School of Medicine
Technical Consultant
Philadelphia, PA
June 2015-September 2015
- Instructed PhD-level researchers in laser capture microdissection
Merck Research Laboratories
Genetics Intern
West Point, PA
June 2014-August 2014
- Designed a workflow to integrate several MATLAB data sources into a SQL Server database
- Created statistical visualizations of gene expression data in TIBCO Spotfire and MATLAB
- Analyzed and integrated data from next generation sequencing, microarray, and qPCR experiments
Drexel University
Research Technician
Philadelphia, PA
February 2014-June 2014
- Designed flexible command line interfaces for custom tools, accommodating a variety of workflows and standards
- Deployed self-contained Python packages for easy distribution and installation on high-performance computing platforms
- Wrote Python application software to analyze and visualize phylogenetic trees built from sequencing data
University of Pennsylvania School of Medicine
Research Specialist
Philadelphia, PA
May 2009-June 2013
- Managed dozens of data sets and thousands of measurements from hundreds of samples
- Coordinated and performed data processing and analysis for high-throughput next-generation sequencing experiments (RNA-Seq)
- Designed a high-throughput qPCR experiment spanning hundreds of samples from dozens of patients and requiring three technicians
- Developed OpenCV-based image analysis methods to characterize histology images, automatically quantifying hundreds of photomicrographs
- Overhauled the laboratory's gene expression quantitation (qPCR) procedures, along with the accompanying data management and statistical analysis strategy
- Programmed custom data acquisition software for a third-party sensor with no published interface
- Devised a manual single-cell sorting technique for qPCR experiments that yielded high-quality RNA from as few as 100 cells, and trained graduate-level researchers to perform it
- Learned flow cytometry and conducted flow cytometry experiments
- Engineered techniques for extracting high-quality RNA from hard-to-isolate tissue samples
- Performed laser capture microdissection and trained PhD researchers in the technique