The human side of data science can no longer be an afterthought
Data science is increasingly shaping our world, from the information we read, to who is selected for a job interview, to the length of a prison sentence. As the amount of data we collect and analyze grows exponentially, we are beginning to see disturbing consequences: algorithms that reinforce social biases, threats to our privacy, the spread of fake news, and even the weaponization of disinformation for use in cyber warfare.
So far, our approach to data science has dealt with social and ethical concerns after the fact. The “move fast and break things” mindset encourages people to tackle technical challenges first, then focus on human impact and unintended consequences later (if at all). In recent years, tech companies have started hiring ethicists and social scientists, but they are often isolated from technical teams and their warnings are ignored. Facebook whistleblower Frances Haugen has revealed that the company failed to follow up on internal research identifying numerous harms its algorithms have done, from promoting hate speech around the world to increasing body image issues in adolescents.
Instead, we need to integrate human perspectives throughout data science. People touch every step of the data science process, from what data is captured, to how it is categorized, labeled and manipulated, to what it is used to do. No part of data science is value neutral. This is why my colleagues and I are building human-centered data science, a new interdisciplinary field that combines computer science, social science, and human-computer interaction.
To make data science more human-centric, we need to train and promote π-shaped data scientists. In higher education, we talk about T-shaped scientists who have depth in one area and broad understanding in several other areas, including the social sciences and humanities. This is considered an improvement over I-shaped people, who only have a very narrow knowledge base.
But being T-shaped is not enough. We need π-shaped data scientists with a deep understanding of both the technical and the human aspects of their work. Just as a doctor cannot do their job effectively if they don’t know how to interact with patients, data scientists must have a thorough understanding of the social and ethical implications of what they are doing. This does not negate the value of having dedicated ethicists and social scientists on a team, but they do not replace the need for data scientists themselves to have a human-centered perspective.
There are many ways for π-shaped people to make data science more human-centric. One of my co-authors on the new book Human-Centered Data Science, Shion Guha, used this approach in his work with a large urban police department to identify and overcome the biases that affect crime maps. By combining information science and statistics and delving into the human side of the crime mapping tool, not just the technical components, Guha noticed that the model used an outdated legal definition of what constitutes sexual assault. This produced inaccurate maps of sexual offenses in the city, which influenced how officers responded to complaints of sexual assault and how complaints were recorded in police databases. Once the error was identified, police were able to get a more accurate picture of sexual assaults in their city and respond accordingly.
While some critics describe the humanistic side as soft or imprecise, Guha’s work – along with that of many others – shows that these perspectives actually make data science more rigorous and precise. The idea that a human perspective is the opposite of technical rigor is a false dichotomy (something I will talk more about as a keynote speaker at the Women in Data Science (WiDS) Global Conference on March 7). In fact, being human-centric enhances our ability to accurately represent the world around us through data.
Human-centricity can make data science more rigorous by involving users and other stakeholders in the design process of technical tools. Understanding how users think can help us find innovative approaches to presenting data in ways that people can more easily understand and use. At the Human Centered Data Science Lab that I lead at the University of Washington, we started the Traffigram project to create more intuitive maps. Rather than showing you how far a place is in miles, these maps show how long it will take you to get there based on your location, current traffic, and public transit options.
Human-centered data science often incorporates both quantitative and qualitative methods, drawing on approaches from computer science and social science. In my lab, we have developed several tools to help social scientists analyze qualitative data such as text chats and social media posts. Traditional methods take too long to organize and analyze large amounts of text, but our apps speed up the process by using visualization, one of the most efficient ways for humans to absorb large amounts of information. These hybrid conversation analysis tools have been combined with insights from psychology to study issues such as how people collaborate and interact when working remotely, by analyzing their chat logs.
We urgently need to rethink what it means to be a good data scientist and recognize that it’s not just about technical skills. Every company that uses data science – which now includes almost all of them – must prioritize hiring, promoting and supporting π-shaped people with technical and social science backgrounds, all the way up to the C-suite. Higher education institutions should integrate ethical perspectives and social science training into their data science and artificial intelligence curricula. It is increasingly irresponsible, both to students and to society as a whole, not to train the next generation of data scientists to have a nuanced understanding of the impact of their algorithms on society.
As data science exerts an ever-increasing influence on our lives, the societal consequences will only get more complicated and the stakes will rise. Pressure from the public and policy makers to address issues such as algorithmic bias and misinformation will continue to grow. Any forward-thinking institution that wants to be competitive in five to ten years needs to make human-centered data science a priority now.
Dr. Cecilia Aragon is a professor in the College of Engineering at the University of Washington and director of the Human-Centered Data Science Lab. She will deliver the keynote address, “The Rigorous and Human Life of Data”, at the Women in Data Science (WiDS) Global Conference, which will take place on March 7 at Stanford University and online. Her latest book is Human-Centered Data Science, from MIT Press.
The opinions expressed in this article are those of the author.