Alex Hayes
Statistician | open source R developer | data scientist
I develop new statistical methods to solve complex problems, write code for production, and turn data into understandable insights. My specialties are causal inference and network data. I have a track record of clearly communicating with technical and non-technical audiences, working well with others, and succeeding in remote environments.
alexpghayes@gmail.com
www.alexpghayes.com
Madison, WI
Experience
Research Assistant
UW-Madison, Statistics Department
August 2018 - Present
Madison, WI
Understanding the causal interpretation of network regression
- Invented statistical method to estimate causal effects of group membership in social networks
- Enabled novel social science research by developing technique for network regression. Technique allows assessment of how individual outcomes vary with nodal information and network position, is simple to implement, and scales to networks with millions of nodes
- Characterized the interpretation of network regressions by developing structural causal models and directed acyclic graphs (DAGs). Characterized mediating and confounding effects of group membership in social networks
- Disseminated results by presenting to fellow researchers and at academic conferences. Implemented new methods in an R package. Authored an academic manuscript explaining the technical details of our method
Efficient data fusion to detect harmful interventions without observing an outcome
- Developed method to estimate when product changes have harmful side-effects on unmeasured outcomes by combining experimental and observational datasets
- Increased precision of estimates by deriving efficient semi-parametric estimators using causal machine learning techniques. Reduced computational overhead by a factor of 5000
- Determined when causal inference combining multiple datasets is possible by deriving identification conditions using transportability DAGs
Network clustering when edges are missing via interpretable embeddings
- Designed method to embed and cluster networks when edges in the network are missing, by combining matrix completion and factor analysis techniques
- Solved computational challenges by implementing specialized matrix completion methods for structured forms of missing data in R and C++. Scaled implementation from networks with thousands of nodes to networks with millions of nodes
- Investigated the structure of the academic statistics literature. Detected new, sociologically interesting sub-fields by interpreting spectral embeddings using varimax rotation
Assessing trustworthiness on Twitter via Personalized PageRank
- Assessed trustworthiness and community membership of Twitter accounts by approximating Personalized PageRank using public information about who follows who
- Implemented a flexible and generic Personalized PageRank approximation using a combination of generic-function and encapsulation object-oriented programming. Managed unreliable Twitter API behavior by caching retrieved data in a Neo4J database running on Docker
Regularizing networks to obtain useful network embeddings
- Devised effective strategies for regularizing network data to obtain high quality spectral embeddings that can be used for real world data analysis
- Developed theory and conducted extensive simulations and experiments on network data to understand and interpret the impact of regularization methods
Statistical Consultant
Self-employed
August 2018 - Present
Remote
- Advised psychologists, biologists and neuroscientists on experimental design, experimental data analysis, data visualization, project management and statistical communication
- Created reports to clearly communicate key takeaways from experimental data, using reproducible analysis practices to allow for easy extension and collaboration
Research Intern
Facebook, Core Data Science
May 2021 - August 2021
Remote
- Experimented with hyperbolic graph embeddings as a method to suggest related entities in large knowledge graphs. Used Python, PyTorch and SQL
- Synthesized academic results on hyperbolic embeddings and experiments on internal datasets. Advised research teams on future use of hyperbolic methods
- Prototyped pipeline to automatically suggest content tags, using neural networks to embed co-occurrence datasets
Research Intern
Facebook, Core Data Science
May 2020 - October 2020
Remote
- Designed a novel metric to understand the reliability of classifiers trained on a rolling basis
- Used Python, scikit-learn and SQL to implement calibration techniques for machine learning models to compute reliability
- Communicated across departments to help other teams understand and trust estimated prevalence of objectionable content
- Illustrated uses of the reliability metric and explained computational details by authoring a conference manuscript
Intern
RStudio, tidymodels team
June 2018 - October 2018
Remote
- Improved API consistency and reduced maintenance burden of broom package (750K+ downloads/month, part of the tidyverse) by re-factoring thousands of lines of R code and developing a new test suite
- Authored contribution guidelines and coordinated 40+ pull requests from open source contributors, resolving 80+ open Github issues and bugs
- Managed major new release (broom 0.5.0) by giving presentations and authoring blog posts to pro-actively communicate with stakeholders
Education
Ph.D. Statistics
University of Wisconsin-Madison (anticipated May 2024)
M.S. Statistics
University of Wisconsin-Madison, 2020
B.A. Statistics with Distinction in Research and Creative Work
Rice University, 2018
Skills
Human
- Explaining technical concepts and challenges to non-technical audiences
Specialized skills
- Network analysis
- Causal machine learning
- Interference & mediation
- Factor analysis
- Multivariate analysis & dimension reduction
- Network embeddings
- Matrix completion
- Missing data in networks
- Dependent data
Statistics
- Quantitative data analysis
- Data collection and management
- Data visualization
- Causal inference
- A/B testing
- Experiments
- Regression
- Clustering
- Methods development
- Hypothesis testing
Software engineering
- R package development
- Python
- SQL
- Reproducible data analysis
- Linux
- Git and version control
- Some C++, Docker, AWS
Statistical software and projects
- broom (CRAN/tidyverse)
- distributions3 (CRAN)
- aPPR (Github)
- vsp (CRAN)
- fastadi (CRAN)
- Scientific software reviewer for ROpenSci and the Journal of Open Source Software
- Collaborated with ROpenSci to establish standards for statistical software
- Co-taught machine learning workshop based on tidymodels at rstudio::conf(2019)
- Mentored new open source contributors
Last updated 2023-11-12.