Papers
- Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse
  arXiv · arXiv
- Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge
  arXiv · arXiv
- Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
  EMNLP 2025 · arXiv
- Stronger Neyman Regret Guarantees for Adaptive Experimental Design
  ICML 2025 · arXiv code
- Improving LLM Group Fairness on Tabular Data via In-Context Learning
  AIES 2025 · arXiv
- Precise Model Benchmarking with Only a Few Observations
  EMNLP 2024 · arXiv
- A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
  ECCV 2024 · arXiv ECCV code
- Multicalibration for Confidence Scoring in LLMs
  ICML 2024 · arXiv code
- Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations
  International Journal of Computer Vision · arXiv (w/ power analysis in C.5) IJCV code PyPI
- Estimating the Likelihood of Arrest from Police Records in Presence of Unreported Crimes
  Annals of Applied Statistics · arXiv AOAS code
- The Progression of Disparities within the Criminal Justice System: Differential Enforcement and Risk Assessment Instruments
  FAccT 2023 · arXiv
- Homophily and Incentive Effects in Use of Algorithms
  CogSci 2022 · arXiv
- Human Discernment of Algorithmic Errors: A Case Study in Child Welfare
  SSRN
- Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging
  FAccT 2022 · arXiv platform
- Racial Disparities in the Enforcement of Marijuana Violations in the US
  AIES 2022 · arXiv code
- On the Validity of Arrest as a Proxy for Offense: Race and the Likelihood of Arrest for Violent Crimes
  AIES 2021 (oral) · arXiv ACM code
- The Impact of Algorithmic Risk Assessments on Human Predictions and its Analysis via Crowdsourcing Studies
  CSCW 2021 · arXiv ACM data+code
- maars: an R implementation of Models As Approximations
  arXiv code talk
- Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty
  AIES 2021 · arXiv ACM
- Lessons from the Deployment of an Algorithmic Tool in Child Welfare
  Fair & Responsible AI Workshop, CHI 2020 · workshop
- A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores
  CHI 2020 · arXiv ACM Medium post
- Fairness Evaluation in the Presence of Biased Noisy Labels
  AISTATS 2020 · arXiv PMLR
- TRAP: A Predictive Framework for Trail Running Assessment of Performance
  Journal of Quantitative Analysis in Sports · arXiv JQAS Talk @ MIT SSAC
  Best poster award at NESSIS 2019 and at CMSAC 2019 (1 of 4) · poster
- Trajectories of Prescription Opioids Filled Over Time
  PLOS ONE 2019 · PLOS
Misc
- Why PATTERN Should Not Be Used: The Perils of Using Algorithmic Risk Assessment Tools During COVID-19
  Issue brief of the Partnership on AI · issue brief