Even in a simple development environment, machines and algorithms are still powered by human intelligence.

David Magerman, Co-Founder and Managing Partner, Differential Ventures

December 24, 2021

5 Min Read

No-code, low-code (horizontal) machine learning platforms are useful for scaling data science in an enterprise. Still, as many organizations are now finding out, data science can go wrong in many ways when applied to new problems. Zillow suffered billions of dollars in losses buying houses with a flawed data-driven home valuation model. Data-driven human resources technology, especially when based on facial recognition software, has been shown to bias hiring decisions against protected classes.

While automation is a great tool to have in your arsenal, you need to consider the challenges before adopting a horizontal ML platform. These platforms need to be flexible, configurable, and monitorable to be robust and to add value consistently over time. They need to let users weight data flexibly and provide data visualization tools to detect outliers and contributors to noise. They also need automated monitors for model parameters and data drift that alert users to changes. In short, algorithms have not evolved to the point where they outmatch human intelligence.
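To make the data-drift monitoring requirement concrete, here is a minimal sketch, assuming NumPy; the `drift_alert` function, its z-score rule, and the threshold are illustrative assumptions, not the API of any real platform:

```python
import numpy as np

def drift_alert(baseline, recent, z_threshold=3.0):
    """Flag drift when the recent window's mean sits more than
    z_threshold standard errors from the baseline mean (simple z-test)."""
    baseline = np.asarray(baseline, dtype=float)
    recent = np.asarray(recent, dtype=float)
    std_err = baseline.std(ddof=1) / np.sqrt(len(recent))
    z = abs(recent.mean() - baseline.mean()) / std_err
    return z > z_threshold

# Deterministic illustration: a same-shaped recent window vs. a shifted one.
baseline = np.linspace(-3.0, 3.0, 5000)       # mean 0
recent_ok = np.linspace(-3.0, 3.0, 500)       # mean 0: no drift
recent_shifted = np.linspace(-2.5, 3.5, 500)  # mean 0.5: upward drift

print(drift_alert(baseline, recent_ok))       # False
print(drift_alert(baseline, recent_shifted))  # True
```

A production monitor would track many features, account for seasonality, and route alerts to a human, which is exactly why these platforms still need operators.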

So, don’t be fooled by the AI/ML/low-code hype: you still need people. Let’s take a closer look at the reasons why.

Machines Learn from Humans

Trying to replace human data scientists, domain experts, and engineers with automation is a hit-or-miss proposition that could lead to disaster if applied to mission-critical decision-making systems. Why? Because human beings understand data in ways that automated systems still struggle to match.

Humans can differentiate between data errors and merely unusual data (e.g., GameStop/GME trading in February 2021) and align unusual data patterns with real-world events (e.g., 9/11, COVID, financial crises, elections). We also understand the impact of calendar events such as holidays. Depending on the data used in ML algorithms and the quantities being predicted, the semantics of the data may be hard for automated learning algorithms to discover. Forcing them to uncover these hidden relationships isn’t necessary when they aren’t hidden to the human operator.

Aside from semantics, the trickiest part of data science is differentiating between statistically good results and useful results. It’s easy to use estimation statistics to convince yourself that you have good results, or that a new model beats an old one, when in fact neither model solves a real-world problem. And even with valid statistical methodologies, interpreting modeling results still has a component that requires human intelligence.

When developing a model, you run into questions about which model estimation statistics to measure: how to weight them, how to evaluate them over time, and which results to deem significant. Then there is the issue of over-testing: if you test too frequently on the same data set, you eventually “learn” your test data, making your test results overly optimistic. Finally, you have to build models and combine all these statistics into a simulation methodology that is achievable in the real world. And just because a machine learning platform has been successfully deployed to solve one modeling and prediction problem doesn’t mean that repeating the same process on a different problem, in the same domain or a different vertical, will lead to the same successful outcome.
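The over-testing trap can be simulated directly. In this toy experiment (NumPy assumed, all numbers fabricated for illustration), every candidate “model” is pure noise with a true accuracy of 50%, yet picking the winner by repeatedly scoring against the same test set produces an impressive-looking number:

```python
import numpy as np

rng = np.random.default_rng(42)
n_models, n_test = 200, 100

# 200 candidate "models" that are pure coin flips: each is truly 50% accurate.
# Scoring them all on one reused test set and keeping the best "learns" that set.
test_scores = rng.binomial(n_test, 0.5, size=n_models) / n_test
best_score = test_scores.max()

print(f"winner's score on the reused test set: {best_score:.2f}")
print("its true accuracy on fresh data: 0.50")
```

The winner’s reused-test-set score lands well above 50% purely by selection, even though its performance on fresh data is, by construction, a coin flip. A human who understands this effect holds out data the automation never sees.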

Many choices must be made at each step of the data science research, development, and deployment process. You need experienced data scientists to design experiments, domain experts who understand the boundary conditions and nuances of the data, and production engineers who understand how the models will be deployed in the real world.

Visualization Is a Data Science Gem

In addition to weighting and modeling data, data scientists benefit from visualizing it, a largely manual process that is more art than science. Plotting raw data, correlations between inputs and the quantities being predicted, and time series of coefficients estimated across time can yield observations that feed back into the model construction process.

You might notice periodicity in the data, perhaps a day-of-week effect or anomalous behavior around holidays. You might detect extreme moves in coefficients suggesting that outlier data is not being handled well by your learning algorithms. You might notice different behavior across subsets of your data, suggesting you should separate them out to train more refined models. Self-organizing learning algorithms can be used to try to discover some of these hidden patterns, but a human being is often better equipped to find them and to feed the resulting insights back into the model construction process.
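As a small illustration of spotting a day-of-week effect, here is a sketch using NumPy on a synthetic daily series; the data and the “Monday” bump are fabricated for demonstration, standing in for the grouped plot a human analyst would scan:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 364                    # exactly 52 weeks of a daily series
weekday = np.arange(n_days) % 7

# Synthetic series: noise plus a bump every "Monday" (weekday 0).
values = rng.normal(0.0, 1.0, n_days) + np.where(weekday == 0, 2.0, 0.0)

# The per-weekday means a human would plot or eyeball for periodicity.
weekday_means = np.array([values[weekday == d].mean() for d in range(7)])
print(np.round(weekday_means, 2))
```

One weekday’s mean stands well apart from the others, the kind of pattern that is obvious in a plot but easy for a generic learning pipeline to smear into noise unless someone tells it to condition on the calendar.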

Horizontal ML Platforms Need Monitoring

Another important role people play in the deployment of ML-based AI systems is model monitoring. Depending on the kind of model being used, what it is predicting, and how those predictions are being used in production, different aspects of the model need to be monitored so that deviations in behavior are tracked and problems can be anticipated before they lead to degradation in real-world performance.

If models are retrained regularly on more recent data, it is important to track how consistent the new data entering the training process is with the data previously used. If production tools are updated with new models trained on more recent data, it is important to verify that the new models are as similar to the old ones as one might expect, where “similar enough” is model- and task-dependent.
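One simple way to sanity-check a retrained model against its predecessor is to compare coefficient vectors, sketched below with NumPy; the `coefficient_shift` metric, the example coefficients, and any threshold you would alert on are illustrative assumptions, not a standard:

```python
import numpy as np

def coefficient_shift(old_coefs, new_coefs):
    """Largest absolute coefficient change between two model versions,
    scaled by the typical magnitude of the old coefficients."""
    old_coefs = np.asarray(old_coefs, dtype=float)
    new_coefs = np.asarray(new_coefs, dtype=float)
    scale = np.abs(old_coefs).mean()
    return np.abs(new_coefs - old_coefs).max() / scale

old = np.array([1.2, -0.8, 0.5, 2.0])
retrained_ok = np.array([1.25, -0.78, 0.47, 1.95])  # small, expected drift
retrained_bad = np.array([1.2, -0.8, 0.5, -2.0])    # one coefficient flipped sign

print(round(coefficient_shift(old, retrained_ok), 3))   # 0.044
print(round(coefficient_shift(old, retrained_bad), 3))  # 3.556
```

A flipped sign on a previously stable coefficient is exactly the deviation a monitoring human (or a monitor a human configured) should catch before the model reaches production.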

There are clearly enormous benefits to applying automation to a broad set of problems across many industries, but human intelligence is still intrinsic to these developments. You can automate human behavior to a degree and, in controlled environments, replicate the power and performance of human work with no-code, low-code ML-based AI systems. But in a world where machines still rely heavily on humans, never forget the power of people.

About the Author(s)

David Magerman

Co-Founder and Managing Partner, Differential Ventures

David Magerman is a co-founder and managing partner at Differential Ventures. Previously, he spent the entirety of his career at Renaissance Technologies, widely recognized as the world’s most successful quantitative hedge fund management company. He helped found the equities trading group at Renaissance, joining the group in its earliest days, playing a lead role in designing and building the trading, simulation, and estimation software. After a decorated career in quantitative finance, he is using his data science, software development, and statistical modeling expertise to help startups succeed in the global marketplace.

David holds a PhD in Computer Science from Stanford University where his thesis on Natural Language Parsing as Statistical Pattern Recognition was an early and successful attempt to use large-scale data to produce fully automated syntactic analysis of text. David also earned a Bachelor of Arts in Mathematics and a Bachelor of Science in Computer Sciences and Information from the University of Pennsylvania.
