When I was in college, I got BA degrees in psychology from the University of California at Irvine (Go Anteaters!) and applied mathematics from California State University at Fullerton (Go Titans!). I learned pretty quickly that applying statistics to people usually doesn’t end well. Applying machine learning to people has pitfalls too.
Many research studies have shown that Black-sounding names like LaKeisha and Rasheed are perceived to be associated with people who have lower intelligence, higher likelihood of being violent, higher likelihood of drug use and criminal behavior, and so on. White-sounding names like Emily and Greg are perceived to be associated with higher intelligence and so on. These negative perceptions of Black attributes are shared by both Black and white observers.
There are many variations of a study in which researchers create two identical resumes, one for “Greg Baker” and one for “Rasheed Jefferson”. The “Greg Baker” resume is typically received more favorably. For example, the research paper “Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination” by M. Bertrand and S. Mullainathan was widely publicized in the media.
Well, this bias is not rocket science; it is rather expected. And in the job/resume scenario, employers may also wrestle with the perception that hiring a minority employee creates a risk of an eventual discrimination lawsuit. The problem, of course, is that these statistics are just aggregate metrics that describe a group of people, not a specific person.
Google image search for “murder arrest”.
According to statistics cited in Wikipedia articles, the numbers are pretty stark. Black people as a whole have an average IQ score about one standard deviation (15 points) lower than the average for white people. Black students score much lower on average on the SAT (177 points) and the ACT. By age 23, roughly half of all Black males have been arrested. Black males account for about half of all murder arrests even though Black males are less than seven percent of the population. In short, Black-sounding names are statistically associated with a long laundry list of negative attributes.
High IQ scores are positively correlated with many positive attributes such as career success and income. Low IQ scores are correlated with violence and criminal behavior. See “Thirty Years of Research on Race Differences in Cognitive Ability” by J. P. Rushton and A. R. Jensen.
Worse perhaps than the cold statistics is the constant barrage of negative media coverage. News feeds seem to continually feature stories of a Black person committing a heinous crime. And a Google image search for almost anything related to serious crime turns up images of almost entirely Black people. The media may be reflecting reality, but it is likely reinforcing negative associations too.
So, the cautionary tale for machine learning is this: if an ML prediction system is created that applies to people (for example, an automated resume-scanning system for a human resources department), it is possible to unintentionally include name information. The system could then learn to associate Black-sounding names with the negative statistics attached to Black people as a group, rather than evaluating each person’s inputs independently. This is why I only use machine learning for things like predicting sports scores, and not for systems that deal with people.
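One narrow, partial mitigation is to scrub direct identifiers from each record before it ever reaches a model. A minimal sketch, with hypothetical field names, might look like this. Note this alone is not sufficient: other fields (ZIP code, school attended, club memberships) can still act as proxies for race.

```python
# Minimal sketch: remove identity-linked fields from a candidate record
# before training or scoring. Field names here are hypothetical.
def strip_identity_fields(record, blocked=("name", "first_name", "last_name")):
    """Return a copy of the record without fields that directly identify a person."""
    return {k: v for k, v in record.items() if k not in blocked}

resume = {"name": "Rasheed Jefferson", "years_experience": 7, "degree": "BS"}
features = strip_identity_fields(resume)
print(features)  # {'years_experience': 7, 'degree': 'BS'}
```

Dropping the name field keeps the model from keying on it directly, but it does nothing about correlated features, which is exactly why I remain wary of applying ML to people at all.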
The moral of the story is simple: be cautious when combining people and numbers. I evaluate people using my head and my heart, not numbers.
Update: Literally minutes after I wrote this blog post, a news story appeared describing the conviction of a teen named Dawnta who murdered a female police officer in Baltimore. ML systems that train on news feeds could find hundreds of stories like this and be influenced by them.
Fractals combine art and numbers. I remember programming an image of the Mandelbrot set (center image) years ago using 8086 assembly language on an IBM PC.