Hi, I'm Arthur.

I'm a data scientist who specializes in experimental design and custom programming that turns complex data challenges into measurable business impact. I have 7 years of experience collaborating with executive-level leadership and 14+ years of experience in quantitative fields. I excel at building cross-functional relationships that uncover hidden value, and at using machine learning and statistical analysis to inform data-driven decisions and solve problems. I am also a STEM educator and have regularly mentored data scientists who are new to the field.

Examples of Work Impact

Penn Cobalt

Penn Cobalt is a mental health platform for Penn employees. I validated the entire database to ensure data accuracy and the expected functionality of the site's diagnostic and self-help tools.

After the initial site setup, I worked with the project director to continuously improve engagement with the site through A/B testing and other statistical techniques. This work included identifying user segments for specialized targeting, and the testing results led to new features being added to the site.

Blue Coats

The goal of Blue Coats was to improve the well-being and financial health of the Emergency Department. 

Ultimately, I identified an untapped data source and traced tens of thousands of dollars in monthly losses to inefficiencies caused by a faulty supply closet scanner whose data was not being reviewed.

To do this, we spoke with staff directly to identify problems that were buried within a complex hierarchy. By combining staff knowledge with novel datasets, I designed a data collection, processing, and storage pipeline, reviewed its results, and quickly identified the issue. The project was so successful that the Department of Medicine purchased its own version of the project for the upcoming business cycle.

Yelp Healthcare Facility Data Curation

The goal of the project was to capture a complete snapshot of Yelp's healthcare database. This required processing the raw data into validated, analysis-ready files for facilities, specific facility categories, and facility reviews. The data included structured, unstructured, and geographic data points.

For six years, I maintained the project in a low-cost AWS production environment without a single day of data loss. I also developed an automated weekly data monitoring and reporting system to ensure data quality over time.

This data served as a foundation for over twenty publications in top-tier medical journals. It also served as a high-value source of patient perceptions of care delivery for health systems looking to improve their facilities. 

Relevant Work Experience

Data Scientist
University of Pennsylvania Healthcare System
Center for Healthcare Transformation and Innovation
Philadelphia, PA
February 2018-August 2024 

Machine Learning and Data Scientist Intern
Aramark Corporation
Philadelphia, PA
May-August 2016

Physics Educator
Paul VI High School
Haddonfield, NJ
September 2010-June 2014

Education

MS, Scientific Computing
Rutgers University–Camden

MA, Economics
Concentration in Applied Econometrics
University of Delaware

BA, Physics
University of Delaware

Key Skills

Programming
Python, SQL, Java, C, PySpark, APIs, geospatial, Git

Statistics
Bayesian, descriptive, inferential, predictive, mixed-effects modeling

Business Platforms
Microsoft Power Automate, Power Apps, Power BI; Docker, GitHub, GitLab, GA4, GTM

Visualization
Matplotlib, seaborn, Plotly, Dash, Panel, Flask

Machine Learning
classification, clustering, RL, DL, time series, causal inference

Cloud Platforms
AWS (S3, EC2, EFS, Route 53, IAM, Athena, Redshift); Azure, GCP

High-Performance Computing
parallel/distributed computing, CPU/GPU, data mining, algorithm optimization

NLP
text mining, topic modeling, LDA, sentiment, text summarization

Data Platforms
JupyterLab, Databricks, Stata, Hugging Face, Keras, PyTorch

Select Publications

Association Between Online Reviews of Substance Use Disorder Treatment Facilities and Drug-Induced Mortality Rates: Cross-Sectional Analysis

Conclusions: Lower online ratings of SUD treatment facilities were associated with higher drug-induced mortality at the state level. Elements of patient experience may be associated with state-level mortality. Identified themes from online, organically derived patient content can inform efforts to improve high-quality and patient-centered SUD care.

State and Federal Legislators’ Responses on Social Media to the Mental Health and Burnout of Health Care Workers Throughout the COVID-19 Pandemic: Natural Language Processing and Sentiment Analysis

Conclusions: State and federal legislators use social media to share opinions and thoughts on key topics, including burnout and mental health strain among health care workers. Variations in the volume of posts indicated that a focus on burnout and the mental health of the health care workforce existed early in the pandemic but has waned. Significant differences emerged in the content posted by the 2 major US political parties, underscoring how each prioritized different aspects of the crisis.