New Directions For Data Insight

Darin Briskman, director of technology at Imply, discusses the evolution of databases from relational CRUD (create, read, update, delete) and data warehousing into the modern era, and why new architectures are needed to go beyond CRUD.

August 16, 2022

Organizations all over the world are racing to unlock the power of data. From analyzing customer behavior and accelerating R&D to delivering competitive advantage, investment in the tools and technologies required to realize data's value is increasing apace.

But where are we heading in the quest for data-driven insight? Looking back reveals an interesting development timeline where, around half a century ago, traditional hierarchical and network models that required pens, paper, and mechanical calculators finally gave way to technology-led innovation.

In 1970, IBM's Edgar Codd published the definitive Relational Database Management System (RDBMS) model, ushering in a new era for data. This approach formed the basis of a data revolution in the 1980s and 1990s, establishing the tables-of-rows-and-columns format that we still use today.

Codd's work inspired another team at IBM to develop Structured Query Language (SQL), which made it much easier to manage data in an RDBMS. In practical terms, SQL's ability to manage CRUD (Create, Read, Update, and Delete) operations made large data sets practical to work with at a time when computing and storage were very costly. As a result, organizations worldwide began using SQL, and a new wave of relational databases came to market.
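To make the pattern concrete, here is a minimal sketch of the four CRUD operations expressed in SQL, using Python's built-in sqlite3 module; the customers table and its values are illustrative, not from the article.

```python
import sqlite3

# In-memory database for illustration; a production system would use a server-based RDBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: insert a new row.
cur.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))

# Read: query rows back.
cur.execute("SELECT id, name, city FROM customers WHERE city = ?", ("London",))
print(cur.fetchall())

# Update: change an existing row.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Cambridge", "Ada"))

# Delete: remove the row.
cur.execute("DELETE FROM customers WHERE name = ?", ("Ada",))

conn.commit()
conn.close()
```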

CRUD Appliances

As analytics evolved, a new wave of appliances appeared that used relational CRUD but incorporated new software categories to extract data from transactional systems. These appliances proved very popular: by the early 2000s, nearly all Global 2000 firms, along with similarly sized non-commercial and government organizations, were using them, together with extract-transform-load data integration tools and business intelligence tools to turn data into pictures and reports that people could more easily use.

As with almost every other aspect of the information economy, the internet radically transformed the data ecosystem and increased the amount of data being created and used. The pace of change has been remarkable: in the '90s, for instance, a "big" application might have had perhaps 5,000 users and a one-terabyte data warehouse. By the early 2000s, "big" was defined by the social media giants who, back then, were already supporting millions of users and data warehouses of multiple petabytes. These rapid advances also revealed that pushing this much data through CRUD pipelines was costly and limited.

From CRUD to Cloud

The cloud made unlimited cheap computing power and affordable storage on-demand available to everyone with an internet connection, and, as a result, a new era of analytics emerged. In the pre-cloud era, on-premises applications were limited, infrastructure and software licenses were expensive, and increasing capacity took time and money.

However, computing can be added and removed on-demand in the cloud, and storage is durable and cheap. Suddenly, analytics became significantly more scalable and less expensive than ever before. New ecosystems of cloud data warehouses, data pipelines, visualization tools, and data governance capabilities redefined analytics.

Moreover, cloud computing inspired rapid application growth, allowing the average business, not just internet giants, to operate applications that support millions of users. Once again, however, progress was hampered by the inefficiency and cost of pushing huge new volumes of data through CRUD pipelines.

At this stage, data engineers struggled to work out how to interact with high-volume data streams from the internet and other applications. Why not analyze the data stream directly instead of forcing it through relational CRUD?

Streaming Insights and Action with Modern Analytics Applications

In the 2010s, the emergence of microservices architecture at scale required a robust and reliable way to pass data between application components and between organizations. The data transfer technology that emerged is streaming, with several open source and commercial options for implementing and managing streams that can carry anywhere from hundreds to millions of events every second.

With streaming transactions came the need for streaming analytics. The first generation consisted of stream processors that can perform simple SQL queries on streams, but they cannot scale across large numbers of streams and lack the functions to put streaming data ("what's happening now") into context ("what happened before").
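To illustrate the kind of simple query a first-generation stream processor handles, here is a plain-Python sketch of a tumbling-window count per key; the event stream and field names are hypothetical, and a real stream processor would typically express this declaratively in SQL.

```python
from collections import defaultdict

# Hypothetical event stream: (timestamp_in_seconds, user_id) pairs.
events = [(1, "a"), (2, "b"), (6, "a"), (7, "a"), (11, "b")]

WINDOW = 5  # tumbling-window size in seconds

# Count events per user within each 5-second window --
# roughly SELECT user, COUNT(*) ... GROUP BY window, user.
counts = defaultdict(int)
for ts, user in events:
    window_start = (ts // WINDOW) * WINDOW
    counts[(window_start, user)] += 1

for (window_start, user), n in sorted(counts.items()):
    print(f"window [{window_start}, {window_start + WINDOW}): user={user} count={n}")
```

Even this toy version hints at the limitation: each query keeps its own state per stream, which is hard to scale across many streams and offers no easy way to join what is happening now with years of historical data.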

In response to a growing need for technology that uses both streaming and historical data, organizations are looking for databases that can deliver sub-second response times to questions across billions of data points. Concurrency is also crucial, because there may be hundreds of people asking questions at the same time.

This technology also needs to be affordable: scaling up cloud data warehouses and other technologies not designed for high performance at high concurrency is possible, but expensive.

The answer lies in real-time analytic databases that combine CRUD with streams to deliver high concurrency and sub-second response times across billions of data points. With them, organizations worldwide are well placed to embrace the new era of data insight, powered by technologies designed to meet their needs from the ground up.

Several real-time analytic databases are available, both open source (Apache Druid, Apache Pinot, ClickHouse) and commercial. While this segment accounted for less than 2% of the total analytics database market in 2022, real-time analytics is more than doubling each year as the adoption of streams and large-scale requirements grow.
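As an illustrative sketch rather than a recipe from the article, this is roughly what querying such a database looks like: a SQL query posted to Apache Druid's SQL HTTP endpoint, assuming a cluster reachable at localhost:8888 and a hypothetical `events` datasource that is ingesting from a stream.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint and datasource; adjust for your own cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# One query spanning fresh (streamed) and historical data:
# events per country over the last hour.
query = """
SELECT country, COUNT(*) AS events
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY events DESC
"""

req = Request(
    DRUID_SQL_URL,
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    for row in json.loads(resp.read()):
        print(row)
```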

Real-time analytics, building on six decades of CRUD and other technologies, is now going beyond insights to drive actions, enabling humans and machines to make the best choices to achieve their missions. Automating data-driven insights to drive actions is rapidly becoming the core of modern analytics applications, moving from a niche approach to the "new normal" within a few years.

How do you think the shift to modern database architecture can streamline the data analytics process? Share with us on LinkedIn, Twitter, or Facebook. We would love to hear from you!


Darin Briskman
Darin Briskman is Director of Technology at Imply, where he helps developers create modern analytics applications. He began his career at NASA in the 1980s (ask him about rockets!) and has been working with large and interesting data sets ever since. Most recently, he's had various technical and leadership roles at Couchbase, Amazon Web Services, and Snowflake. When he is not writing code, Darin likes to juggle, blow glass (usually not while juggling), and help children on the autism spectrum learn to use their special abilities to work better with the neuronormative.