Alex Hayes

Statistician | open source R developer | data scientist

I develop new statistical methods to solve complex problems, write code for production, and turn data into understandable insights. My specialties are causal inference and network data. I have a track record of clearly communicating with technical and non-technical audiences, working well with others, and succeeding in remote environments.

alexpghayes@gmail.com
www.alexpghayes.com
Madison, WI

Experience

Research Assistant

UW-Madison, Statistics Department

August 2018 - Present
Madison, WI

Understanding the causal interpretation of network regression

Invented statistical method to estimate causal effects of group membership in social networks
Enabled novel social science research by developing a technique for network regression. The technique assesses how individual outcomes vary with nodal information and network position, is simple to implement, and scales to networks with millions of nodes
Characterized the interpretation of network regressions by developing structural causal models and directed acyclic graphs (DAGs). Characterized mediating and confounding effects of group membership in social networks
Disseminated results by presenting to fellow researchers and at academic conferences. Implemented new methods in an R package. Authored an academic manuscript explaining the technical details of our method

Efficient data fusion to detect harmful interventions without observing an outcome

Developed method to estimate when product changes have harmful side-effects on unmeasured outcomes by combining experimental and observational datasets
Increased precision of estimates by deriving efficient semi-parametric estimators using causal machine learning techniques. Reduced computational overhead by a factor of 5000
Determined when causal inference combining multiple datasets is possible by deriving identification conditions using transportability DAGs

Network clustering when edges are missing via interpretable embeddings

Designed method to embed and cluster networks when edges in the network are missing, by combining matrix completion and factor analysis techniques
Solved computational challenges by implementing specialized matrix completion methods for structured forms of missing data in R and C++. Scaled implementation from networks with thousands of nodes to networks with millions of nodes
Investigated the structure of the academic statistics literature. Detected new, sociologically interesting sub-fields by interpreting spectral embeddings using varimax rotation

Assessing trustworthiness on Twitter via Personalized PageRank

Assessed trustworthiness and community membership of Twitter accounts by approximating Personalized PageRank using public information about who follows whom
Implemented a flexible and generic Personalized PageRank approximation using a combination of generic-function and encapsulated object-oriented programming. Managed unreliable Twitter API behavior by caching retrieved data in a Neo4j database running on Docker

Regularizing networks to obtain useful network embeddings

Devised effective strategies for regularizing network data to obtain high quality spectral embeddings that can be used for real world data analysis
Developed theory and conducted extensive simulations and experiments on network data to understand and interpret the impact of regularization methods

Statistical Consultant

Self-employed

August 2018 - Present
Remote

Advised psychologists, biologists and neuroscientists on experimental design, experimental data analysis, data visualization, project management and statistical communication
Created reports to clearly communicate key takeaways from experimental data, using reproducible analysis practices to allow for easy extension and collaboration

Research Intern

Facebook, Core Data Science

May 2021 - August 2021
Remote

Experimented with hyperbolic graph embeddings as a method to suggest related entities in large knowledge graphs. Used Python, PyTorch and SQL
Synthesized academic results on hyperbolic embeddings and experiments on internal datasets. Advised research teams on future use of hyperbolic methods
Prototyped pipeline to automatically suggest content tags, using neural networks to embed co-occurrence datasets

Research Intern

Facebook, Core Data Science

May 2020 - October 2020
Remote

Designed a novel metric to understand the reliability of classifiers trained on a rolling basis
Used Python, scikit-learn and SQL to implement calibration techniques for machine learning models to compute reliability
Communicated across departments to help other teams understand and trust estimated prevalence of objectionable content
Illustrated uses of the reliability metric and explained computational details by authoring a conference manuscript

Intern

RStudio, tidymodels team

June 2018 - October 2018
Remote

Improved API consistency and reduced maintenance burden of the broom package (750K+ downloads/month, part of the tidyverse) by refactoring thousands of lines of R code and developing a new test suite
Authored contribution guidelines and coordinated 40+ pull requests from open source contributors, resolving 80+ open GitHub issues and bugs
Managed a major new release (broom 0.5.0) by giving presentations and authoring blog posts to proactively communicate with stakeholders

Education

Ph.D. Statistics
University of Wisconsin-Madison (anticipated May 2024)

M.S. Statistics
University of Wisconsin-Madison, 2020

B.A. Statistics with Distinction in Research and Creative Work
Rice University, 2018

Skills

Human

Explaining technical concepts and challenges to non-technical audiences

Specialized skills

Network analysis
Causal machine learning
Interference & mediation
Factor analysis
Multivariate analysis & dimension reduction
Network embeddings
Matrix completion
Missing data in networks
Dependent data

Statistics

Quantitative data analysis
Data collection and management
Data visualization
Causal inference
A/B testing
Experiments
Regression
Clustering
Methods development
Hypothesis testing

Software engineering

R package development
Python
SQL
Reproducible data analysis
Linux
Git and version control
Some C++, Docker, AWS

Statistical software and projects

broom (CRAN/tidyverse)
distributions3 (CRAN)
aPPR (GitHub)
vsp (CRAN)
fastadi (CRAN)
Scientific software reviewer for ROpenSci and the Journal of Open Source Software
Collaborated with ROpenSci to establish standards for statistical software
Co-taught machine learning workshop based on tidymodels at rstudio::conf(2019)
Mentored new open source contributors

Last updated 2023-11-12.