What are the main concerns of data scientists? The 2022 State of Data Science Report has the answers
Data science is a rapidly growing field as organizations of all sizes embrace artificial intelligence (AI) and machine learning (ML), and with that growth come a number of concerns.
The 2022 State of Data Science report, released today by data science platform provider Anaconda, identifies key trends and concerns for data scientists and the organizations that employ them. Among the trends identified by Anaconda is that the open-source programming language Python continues to dominate the data science landscape.
Among the top concerns identified in the report were barriers to the adoption of data science in general.
“One area that surprised me was that 2/3 of respondents felt that the biggest barrier to successful adoption of data science by businesses was insufficient investment in data engineering and the tools to enable producing good models,” Peter Wang, CEO and co-founder of Anaconda, told VentureBeat. “We’ve always known that data science and machine learning can suffer from poor models and inputs, but it was interesting to see our respondents rank this even higher than the talent/workforce gap.”
AI bias in data science is far from a solved problem
The issue of AI bias is well known in data science. What isn’t as well known is exactly what organizations are actually doing to combat the problem.
Last year, Anaconda’s State of Data Science 2021 found that 40% of organizations were planning or doing something to address the bias problem. Anaconda didn’t ask the same question this year, opting instead for a different approach.
“Instead of asking whether organizations planned to address bias, we wanted to look at specific actions organizations are currently taking to ensure fairness and reduce bias,” Wang said. “We realized from our findings last year that organizations had plans underway to address this issue, so for 2022 we wanted to look at what actions they had taken, if any, and where their priorities lay.”
As part of AI bias prevention efforts, 31% of respondents said they evaluate data collection methods against internally established fairness standards. In contrast, 24% indicated that they have no standards for fairness and bias mitigation in their datasets and models.
AI explainability is fundamental to identifying and preventing bias. When asked what tools are used for AI explainability, 35% of respondents indicated that their organizations perform a series of controlled tests to assess the interpretability of the model, while 24% have no measurements or tools to ensure the explainability of the model.
“Although each response metric has less than 50% of these efforts in place, the results here tell us that organizations are taking a varied approach to mitigating bias,” Wang said. “At the end of the day, organizations act, they are only at the beginning of their journey to fight against prejudice.”
How data scientists spend their time
Data scientists have a number of different tasks to perform as part of their job.
While deploying models is the desired end goal, it’s not where data scientists spend most of their time. In fact, the study found that data scientists only spend 9% of their time deploying models. Similarly, respondents said they spend only 9% of their time selecting models.
The biggest share of their time goes to preparing and cleaning data, which accounts for 38% of the total.
The love-fear relationship with open source
The report also asked data scientists about how they use and view open source software.
Eighty-seven percent responded that their organizations allow open source software. Yet despite this use, 54% of respondents indicated that they were concerned about open source security.
“Today, open source is embedded in almost all software and technology, and it’s not just because it’s cheaper in the long run,” Wang said. “The innovation happening around AI, machine learning, and data science is happening within the open source ecosystem at a speed that cannot be matched by a closed system.”
That said, Wang said it’s understandable for organizations to be aware of the risks of open source and develop a plan to mitigate potential vulnerabilities.
“One of the benefits of open source is that patches and solutions are created in the open rather than behind closed doors,” he said.
The Anaconda report was based on a survey of 3,493 respondents from 133 countries.