Trifacta expands data preparation tools with Databricks integration

Trifacta today announced it has integrated its data preparation tools with a data warehouse platform based on the open source Apache Spark framework provided by Databricks. This is in addition to an integration with repositories based on the open source data build tool (DBT) maintained by Fishtown Analytics.

In both cases, Trifacta is extending the reach of tools it provides for managing data pipelines to platforms that are widely employed in the cloud to process and analyze data, Trifacta CEO Adam Wilson said.

Trifacta traces its lineage back to a research project that involved professors from Stanford University and the University of California at Berkeley and resulted in a visual tool that enables data analysts without programming skills to load data. In effect, Trifacta automated extract, transform, and load (ETL) processes that had previously required an IT specialist to perform.

There is no shortage of visual tools that let end users without programming skills migrate data. But Trifacta has extended its offerings to a platform that enables organizations to manage the data pipeline process on an end-to-end basis as part of its effort to meld data operations (DataOps) with machine learning operations (MLOps). The goal is to enable data analysts to self-service their own data requirements without requiring any intervention on the part of an IT team, Wilson noted.

Google and IBM already resell the Trifacta data preparation platform, and the company has established alliances with both Amazon Web Services (AWS) and Microsoft. Those relationships enable organizations to employ Trifacta as a central hub for moving data in and out of cloud platforms. The alliance with Databricks and the support for DBT further extend those capabilities at a time when organizations have begun to more routinely employ multiple cloud frameworks to process and analyze data, Wilson said.

In general, data engineering has evolved into a distinct IT discipline because of the massive amount of data that needs to be moved and transformed. While visual tools make it possible for data analysts to self-service their own data requirements, organizations are now also looking to programmatically move data to clouds as part of a larger workflow. Many individuals who have ETL programming expertise, often referred to as data engineers, are now in even higher demand than data analysts, Wilson said.

Once considered the IT equivalent of janitorial work that revolved mainly around backup and recovery, data engineering is now the discipline around which all large-scale data science projects revolve, Wilson noted. In fact, IT professionals with ETL skills have reinvented themselves as data engineers, Wilson added.

“In the last 12 months, data engineering has become the hottest job in all of IT,” Wilson said.

It remains to be seen just how automated data engineering processes can become in the months and years ahead. Not only is there more data to be processed and analyzed than ever, the types of data that need to be processed have never been more varied. Going forward, a larger percentage of data will be processed and analyzed on edge computing platforms, where it is created and consumed. But the aggregated results of all that data processing will still need to be shared with multiple data warehouse platforms residing in the cloud and in on-premises IT environments.

Regardless of where data is processed, the sheer volume of data moving across the extended enterprise will continue to increase exponentially. The issue now is figuring out how to automate the movement of that data in a way that scales much more easily.
