Numbers Don’t Lie: Preserving Diversity in Data

A culture of diversity and inclusion can help tackle bias in data. It requires constant vigilance and being mindful of each dataset we employ for decision-making to ensure that there is no discrimination.

Last Updated: September 2, 2022

Our world runs on data. But to ensure data quality and intelligent decision-making, we must tackle the bias that is prevalent in data today. Debbie Botha, chief partnership officer at Women in AI, is a strong voice in the conversation on diversity in data. This article is based on her conversation with Loris Marini, founder and host of the podcast Discovering Data.

Bias prevents data from serving its true purpose. Tackling it takes constant vigilance: every dataset we use for decision-making has to be checked for discrimination with regard to gender, age, industry or sector.

The present state of the industry warrants a closer look at impostor syndrome, the undeniable value of mentorship, the need to address toxic environments and the lack of standardization.

The Vision for Diversity

Statistics from the World Economic Forum point out that only 22% of workers in data analytics and AI are women, only 10% of all publications in the field are by women, and half of the women working in AI and data quit their careers mid-trajectory. These are worrying numbers that highlight the gender inequality that persists in the sector.

Debbie states that the aim “is to increase the numbers, the presence, and the imminence of Women in AI, from where they are educated in school, right through to getting their first job, right through to helping them grow in their careers and pave the way for these women through various partnerships in academia, government startups, and so on.”

The vision for more diversity in data thus goes beyond inclusion and mentorship at certain career junctures. It calls for a holistic, dedicated approach: building communities, and processes within those communities, in which no one is systemically left out of the conversation. That, in turn, helps tackle the small sample sizes and skewed data that limit where the data can be applied.

Keeping this challenge in mind, Loris Marini points out, “I do have memories of models that underperformed because the data sets we put in were not representative of the population of the real world. This is a very known problem in data science.”
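
One simple way to catch the problem Marini describes is to compare each group's share of a dataset with its share of the population the data is meant to represent. The Python sketch below is illustrative only: the function name, the `group` field, the tolerance of five percentage points and the sample figures are all assumptions made for this example, not something discussed in the conversation.

```python
from collections import Counter


def representation_gaps(records, reference_shares, key="group", tolerance=0.05):
    """Return groups whose share of the dataset falls short of their
    reference-population share by more than `tolerance`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        shortfall = expected - observed
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps


# Hypothetical data: 8 of 10 records come from group "A",
# while the reference population is split evenly.
sample = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
reference = {"A": 0.5, "B": 0.5}

print(representation_gaps(sample, reference))
```

A check like this only says who is missing from the data. Deciding which groups to track, and where the reference shares come from, remains a human judgment.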

It is imperative that businesses, governments and communities rally to forge data diversity, especially when it comes to decisions that affect people deeply. For example, medical decisions that are based on non-inclusive research could lead to severe bias and treatment impediments for minority groups who have been left out of studies.

Training People to Recognize Bias

Inclusivity and diversity in the data set not only ensure representation but also uphold the quality of the research and of any generalizations drawn from it. We need to be trained to sniff out, recognize and manage biases. Without that ability, we are essentially studying data that is blind, or that can see only partially.

Algorithms need to be built mindfully to attend to biases, weed them out and promote inclusivity. That’s also the key to richer data where complexities and nuances are not lost and where women and minorities are included in the planning stage, not merely as an afterthought. 

For organizations, this movement against bias and discrimination needs to begin with equitable diversity on the board, in the C-suite, and across all strategy tables, analytics and AI solutions.

Becoming bias-conscious requires training human and AI resources to look out for patterns of exclusion and discrimination. One way to do that is to have airtight systems and processes in place that let nothing slip through. The other is to keep the conversation running and to accept that this is an ongoing journey. Protocols for flagging possible bias also need to be set up and communicated so that the next steps are understood and followed.
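
A protocol for flagging possible bias can start very small. The Python sketch below, offered as one possible illustration rather than a recommended standard, computes a model's accuracy separately for each group and raises a flag when the gap between the best-served and worst-served group exceeds a threshold. The function names, the `max_gap` threshold of 0.1 and the sample rows are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(rows):
    """rows: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for group, truth, predicted in rows:
        seen[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / seen[group] for group in seen}


def flag_disparity(rows, max_gap=0.1):
    """Return (flagged, per_group_accuracy). `flagged` is True when the
    accuracy gap between the best and worst group exceeds `max_gap`."""
    per_group = accuracy_by_group(rows)
    if not per_group:
        return False, per_group
    gap = max(per_group.values()) - min(per_group.values())
    return gap > max_gap, per_group


# Hypothetical predictions: perfect for group "A", half wrong for group "B".
rows = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 0),
]
flagged, per_group = flag_disparity(rows)
print(flagged, per_group)
```

What happens after a flag is raised (who reviews it, and what gets retrained or recollected) is the part of the protocol that still has to be agreed and communicated by people.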

Finding Our Tribe

It’s essential to build inclusive communities where data diversity is fostered and put to practical use. As humans, we didn’t evolve in isolation – every interaction along the way had a part to play. It’s the same with data – every node of the larger collective needs to feel seen and heard. In data diversity, we can find our tribes. 

When data is truly representative of a population, every individual can find a sense of belonging. Data analysis is not just about dry numbers and statistics – it’s about people. Organizations and communities need to ensure that no individual reality is left out or represented incorrectly. Whether it’s about effective corporate decision-making with AI, DevOps or MLOps, government strategies for future growth, or community planning, the richer the data, the smarter the decision. 

The Future through a Kaleidoscope

We live in an interesting world – one that runs on data and often challenges us to choose between privacy and personalization. While data security concerns naturally loom large, skewed data could provide its own set of challenges with certain sections of society being underrepresented or misrepresented.

Present generations may be keen to work with immaculate data, but in reality all data can be messy and demands care and attention. Information and conclusions often need to be coaxed out. Even so, that is better than working with data that is only partially validated. We need to be able to look at data and our future through the kaleidoscope of diversity, inclusion and authentic representation.

Data is the greatest asset of our times, but it is only an asset when it provides true insight and serves its purpose. Without true diversity in data, we rob it of its potential to serve us with a holistic picture of things as they indeed are. Any research aims to delve into certain aspects of our environment, be it for developing marketing and advertising strategies, testing medical hypotheses, or studying geo-political relations, and therefore needs to have data that is both deep and wide. Our hope and effort for the near future should be to work towards more organic data diversity where every voice is heard.

Debbie Botha

Global Chief Partnership Officer, Women in AI

Debbie is a Certified IBM Thought Leader Information Architect and Distinguished Architect (TOGAF IT Architecture). She is an IBM Subject Matter Expert (SME) for Information Architecture and for establishing Chief Data Officer's offices. Her program management and architecture experience covers Data Lake, Data Warehousing and Business Intelligence, working on more than 20 data warehouses and data lakes. She develops data and BI strategies, data warehouse solution architecture, data modeling, data integration, data mapping, data quality assessments and resolution strategies, and data cleansing. Her area of expertise is Enterprise Information Architecture, where she looks at information across the enterprise to arrive at a practical information architecture based on her experience and best practice. She also develops enterprise cloud strategies for data, analytics and insights, collaborative self-service strategies, and digital intelligence strategies, with a focus on the cognitive and data aspects.