Hidden in the Data: Algorithms and Bias: Q. and A. With Cynthia Dwork
Cynthia Dwork, a computer scientist at Microsoft Research, discussed how algorithms learn to discriminate and the trade-offs between fairness and privacy. Credit Thor Swift for The New York Times

Algorithms have become one of the most powerful arbiters in our lives. They make decisions about the news we read, the jobs we get, the people we meet, the schools we attend and the ads we see.

Yet there is growing evidence that algorithms and other types of software can discriminate. The people who write them incorporate their biases, and algorithms often learn from human behavior, so they reflect the biases we hold. For instance, researchers have found that ad-targeting algorithms have shown ads for high-paying jobs to men but not women, and ads for high-interest loans to people in low-income neighborhoods.

Cynthia Dwork, a computer scientist at Microsoft Research in Silicon Valley, is one of the leading thinkers on these issues. In an Upshot interview, which has been edited, she discussed how algorithms learn to discriminate, who’s responsible when they do, and the trade-offs between fairness and privacy.

Q: Some people have argued that algorithms eliminate discrimination because they make decisions based on data, free of human bias. Others say algorithms reflect and perpetuate human biases. What do you think?

A: Algorithms do not automatically eliminate bias. Suppose a university, with admission and rejection records dating back for decades and faced with growing numbers of applicants, decides to use a machine learning algorithm that, using the historical records, identifies candidates who are more likely to be admitted. Historical biases in the training data will be learned by the algorithm, and past discrimination will lead to future discrimination.
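Her admissions scenario can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the records are synthetic, the penalty applied by the hypothetical past committees is invented, and scikit-learn's LogisticRegression stands in for whatever model a real university might use. The point is only that a model fit to biased historical decisions reproduces the same penalty for new applicants with identical qualifications.

    # Illustrative sketch: a model trained to imitate biased historical
    # admissions decisions inherits that bias. All data here is synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000

    # Hypothetical historical records: a qualification score and a group flag.
    score = rng.normal(size=n)
    group = rng.integers(0, 2, size=n)      # 0 = majority, 1 = minority

    # Assume past committees admitted on score, minus a penalty for the
    # minority group; that penalty is the bias baked into the training data.
    admitted = (score - 0.8 * group + rng.normal(scale=0.3, size=n)) > 0

    X = np.column_stack([score, group])
    model = LogisticRegression().fit(X, admitted)

    # Two new applicants with identical scores but different group membership:
    applicants = np.array([[0.5, 0], [0.5, 1]])
    print(model.predict_proba(applicants)[:, 1])  # minority applicant gets a lower admission probability

The model has no notion of fairness; it simply predicts what the historical process would have done, which is the problem she describes.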

Q: Are there examples of that happening?

A: A famous example of a system that has wrestled with bias is the resident matching program that matches graduating medical students with residency programs at hospitals. The matching could be slanted to maximize the happiness of the residency programs, or to maximize the happiness of the medical students. Prior to 1997, the match was mostly about the happiness of the programs.

This changed in 1997 in response to “a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the expense of applicants, and whether applicants could ‘game the system,’ ” according to a paper by Alvin Roth and Elliott Peranson published in The American Economic Review.

Q: You have studied both privacy and algorithm design, and co-wrote a paper, “Fairness Through Awareness,” that came to some surprising conclusions about discriminatory algorithms and people’s privacy. Could you summarize those?

A: “Fairness Through Awareness” makes the observation that sometimes, in order to be fair, it is important to make use of sensitive information while carrying out the classification task. This may be a little counterintuitive: The instinct might be to hide information that could be the basis of discrimination.

Q: What’s an example?

A: Suppose we have a minority group in which bright students are steered toward studying math, and suppose that in the majority group bright students are steered instead toward finance. An easy way to find good students is to look for students studying finance, and if the minority is small, this simple classification scheme could find most of the bright students.

But not only is it unfair to the bright students in the minority group, it is also low utility. Now, for the purposes of finding bright students, cultural awareness tells us that “minority+math” is similar to “majority+finance.” A classification algorithm that has this sort of cultural awareness is both more fair and more useful.

Fairness means that similar people are treated similarly. A true understanding of who should be considered similar for a particular classification task requires knowledge of sensitive attributes, and removing those attributes from consideration can introduce unfairness and harm utility.
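Her observation can be sketched in a few lines of toy code. The group labels, majors, distance function, and threshold below are all invented for illustration; the sketch only contrasts a rule that looks at the field of study alone with a rule that uses a culturally aware similarity metric to treat similar students similarly.

    # Illustrative sketch of "similar people treated similarly" using an
    # invented, group-aware distance function. Not a real implementation.

    def naive_rule(student):
        # Looks at the major alone, so it misses bright minority students
        # who were steered toward math instead of finance.
        return student["major"] == "finance"

    def distance(a, b):
        # Culturally aware metric: "minority + math" counts as close to
        # "majority + finance" for the task of finding bright students.
        def signal(s):
            bright_major = "finance" if s["group"] == "majority" else "math"
            return 1.0 if s["major"] == bright_major else 0.0
        return abs(signal(a) - signal(b))

    def aware_rule(student, exemplar):
        # Treat anyone close to a known bright exemplar the same way.
        return distance(student, exemplar) < 0.5

    exemplar = {"group": "majority", "major": "finance"}  # a known bright student
    applicants = [
        {"group": "majority", "major": "finance"},
        {"group": "minority", "major": "math"},
    ]
    for s in applicants:
        print(s, "naive:", naive_rule(s), "aware:", aware_rule(s, exemplar))

The naive rule rejects the bright minority student; the aware rule, which encodes the knowledge that the two profiles are similar for this task, accepts both. Using the sensitive attribute is what makes the fairer outcome possible.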

Q: How could the university create a fairer algorithm? Would it mean more human involvement in the work that software does, collecting more personal data from students or taking a different approach when the algorithm is being created?

A: It would require serious thought about who should be treated similarly to whom. I don’t know of any magic bullets, and it is a fascinating question whether it is possible to use techniques from machine learning to help figure this out. There is some preliminary work on this problem, but this direction of research is still in its infancy.

Q: Another recent example of the problem came from Carnegie Mellon University, where researchers found that Google’s advertising system showed an ad for a career coaching service for “$200k+” executive jobs to men much more often than to women. What did that study tell us about these issues?

A: The paper is very thought-provoking. The examples described in the paper raise questions about how things are done in practice. I am currently collaborating with the authors and others to consider the differing legal implications of several ways in which an advertising system could give rise to these behaviors.

Q: What are some of the ways it could have happened? It seems that the advertiser could have targeted men, or the algorithm determined that men were more likely to click on the ad.

A: Here is a different plausible explanation: It may be that there is more competition to advertise to women, and the ad was being outbid when the web surfer was female.
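That possibility is about auction mechanics rather than anyone's intent, and a tiny simulation makes it visible. The bid amounts and distributions below are entirely made up; the sketch only shows that a fixed bid that is often outbid for female viewers, and rarely outbid for male viewers, yields a skewed delivery pattern like the one the study observed, with no explicit targeting by gender.

    # Illustrative simulation of the "outbid" explanation. All numbers are invented.
    import random

    random.seed(0)
    COACHING_BID = 5.0  # fixed bid for the career-coaching ad

    def impression_won(viewer_gender):
        # Assumption: competing advertisers bid more, on average, for female viewers.
        if viewer_gender == "male":
            competing_bid = random.uniform(2, 6)
        else:
            competing_bid = random.uniform(4, 8)
        return COACHING_BID > competing_bid

    for gender in ("male", "female"):
        wins = sum(impression_won(gender) for _ in range(10_000))
        print(gender, wins / 10_000)  # roughly 0.75 for men vs. 0.25 for women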

Q: The law protects certain groups from discrimination. Is it possible to teach an algorithm to do the same?

A: This is a relatively new problem area in computer science, and there are grounds for optimism — for example, resources from the Fairness, Accountability and Transparency in Machine Learning workshop, which considers the role that machines play in consequential decisions in areas like employment, health care and policing. This is an exciting and valuable area for research.

Q: Whose responsibility is it to ensure that algorithms or software are not discriminatory?

A: This is better answered by an ethicist. I’m interested in how theoretical computer science and other disciplines can contribute to an understanding of what might be viable options.

The goal of my work is to put fairness on a firm mathematical foundation, but even I have just begun to scratch the surface. This entails finding a mathematically rigorous definition of fairness and developing computational methods — algorithms — that guarantee fairness.

Q: In your paper on fairness, you wrote that ideally a regulatory body or civil rights organization would impose rules governing these issues. The tech world is notoriously resistant to regulation, but do you believe it might be necessary to ensure fairness in algorithms?

A: Yes, just as regulation currently plays a role in certain contexts, such as advertising jobs and extending credit.

Q: Should computer science education include lessons on how to be aware of these issues and the various approaches to addressing them?

A: Absolutely! First, students should learn that design choices in algorithms embody value judgments and therefore bias the way systems operate. They should also learn that these things are subtle: For example, designing an algorithm for targeted advertising that is gender-neutral is more complicated than simply ensuring that gender is ignored. They need to understand that classification rules obtained by machine learning are not immune from bias, especially when historical data incorporates bias. Techniques for addressing these kinds of issues should be quickly incorporated into curricula as they are developed.

Read more http://rss.nytimes.com/c/34625/f/640387/s/48db6135/sc/28/l/0L0Snytimes0N0C20A150C0A80C110Cupshot0Calgorithms0Eand0Ebias0Eq0Eand0Ea0Ewith0Ecynthia0Edwork0Bhtml0Dpartner0Frss0Gemc0Frss/story01.htm

