Twenty-nine teams comprising 61 analysts used the same dataset to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. Analytic approaches varied widely across teams, and estimated effect sizes ranged from 0.89 to 2.93 in odds ratio units, with a median of 1.31. Twenty teams (69%) found a statistically significant positive effect, and nine teams (31%) observed a non-significant relationship. Overall, the 29 analyses used 21 unique combinations of covariates. We found that neither analysts’ prior beliefs about the effect, nor their level of expertise, nor the peer-reviewed quality of the analyses readily explained the variation in analysis outcomes. These findings suggest that substantial variation in the results of analyses of complex data may be difficult to avoid, even for experts with honest intentions.