DriveScale Says Big Data Needs a New Kind of Data Center Infrastructure

DriveScale, the Silicon Valley data center technology startup founded by a group of Sun and Cisco veterans who were behind some of the two iconic companies' core data center product lines, such as Sun's x86 servers and Cisco's Nexus Switches and Unified Computing System (Cisco UCS), has built a scale-out IT solution geared specifically for Big Data applications. The company, which recently came out of stealth and announced a $15 million funding round, is addressing a growing need in the data center, and its founding team's technological abilities are undeniable. Its current product, however, is only in its first generation and still has a ways to go before it is proven out in the market.

Let’s back up a little and discuss why a scale-out solution for Big Data is important. Creating virtual controllers that enable some kind of software-defined platform isn't anything new. In storage, we’ve seen this with Atlantis USX and VMware vSAN; in networking, it's Cisco ACI, Big Switch, and VMware NSX. The vast majority of these technologies, however, are designed for traditional workloads, such as virtual desktop infrastructure, databases, application virtualization, web portals, and so on.

What about managing one of the fastest-growing aspects of IT today? What about controlling a new critical source of business value? What about creating a virtual controller for Big Data management?

According to a recent survey by Gartner, investment in Big Data continued to increase in 2015. More than three-fourths of companies are investing or planning to invest in Big Data technologies in the next two years.

"This year begins the shift of big data away from a topic unto itself, and toward standard practices," Nick Heudecker, research director at Gartner, said in a statement. "The topics that formerly defined Big Data, such as massive data volumes, disparate data sources, and new technologies are becoming familiar as Big Data solutions become mainstream. For example, among companies that have invested in Big Data technology, 70 percent are analyzing or planning to analyze location data, and 64 percent are analyzing or planning to analyze free-form text."

According to Gartner, organizations typically have multiple goals for Big Data initiatives, such as enhancing the customer experience, streamlining existing processes, achieving more targeted marketing, and reducing costs. As in previous years, organizations are overwhelmingly targeting enhanced customer experience as the primary goal of Big Data projects (64 percent). Process efficiency and more targeted marketing are now tied at 47 percent. As data breaches continue to make headlines, enhanced security capabilities saw the largest increase, from 15 percent to 23 percent.

"As Big Data becomes the new normal, information and analytics leaders are shifting focus from hype to finding value," Lisa Kart, also a research director at Gartner, said in a statement. "While the perennial challenge of understanding value remains, the practical challenges of skills, governance, funding, and return on investment come to the fore."

Here are some more key Gartner forecasts on Big Data:

By 2020, information will be used to reinvent, digitalize or eliminate 80 percent of business processes and products from a decade earlier.
Through 2016, less than 10 percent of self-service business intelligence initiatives will be governed sufficiently to prevent inconsistencies that adversely affect the business.
By 2017, 50 percent of information governance initiatives will have incorporated the concept of information advocacy to ensure they are value-driven.

So where is DriveScale aiming to make a difference?

Scale-Out Rack Data Center Architecture

DriveScale was born because of three big trends:

Rise of the software scale-out stack and demands around Big Data. There is a clear need to make Big Data workloads a lot more resilient, available, and efficient. Most of all, these workloads need to be able to scale dynamically. Furthermore, there is a need for intelligent workload management with failure-prone hardware to ensure data sets are safe and available. Ultimately, DriveScale aims to create a more resilient ecosystem with greater provisioning capabilities for data.
Commodity and white box technologies. You have network, storage and compute already in your data center. Why replace them when you can just manage them more effectively for Big Data initiatives? A big challenge for organizations looking to deploy a better Big Data management ecosystem is that they have been using traditional means to manage large data sets. DriveScale comes in with a virtual controller, positioned as the software-defined management layer, which unifies critical resources for Big Data delivery.
The network layer has evolved. We’re far beyond the days of 1GbE connections. We’re seeing more connectivity capabilities and a lot more intelligence at the networking layer. DriveScale’s technology aims to exploit this to deliver Big Data workloads much faster. Tight awareness of the connectivity and topology within the rack allows DriveScale to get more information about the drives, the data they’re processing, and the priority of the information. For example, it can see which drives are fewer hops away from the server, basically creating “Ethernet in the rack” as it relates to data management and resource distribution.
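
To make the hop-awareness idea concrete, here is a minimal Python sketch of picking the drives closest to a server. It is an illustration only, not DriveScale's software: the function name, the drive IDs, and the hop-count dictionary are all hypothetical, and the sketch assumes the hop counts have already been discovered somehow.

```python
import heapq


def nearest_drives(hops_from_server, count):
    """Return the IDs of the `count` drives with the fewest network hops.

    `hops_from_server` is a hypothetical mapping of drive ID -> hop count
    from one server. Ties are broken by drive ID so the result is stable.
    """
    return heapq.nsmallest(
        count,
        hops_from_server,
        key=lambda drive: (hops_from_server[drive], drive),
    )


# Hypothetical topology: two drives one hop away, two farther out.
hops = {"d1": 2, "d2": 1, "d3": 1, "d4": 3}
closest = nearest_drives(hops, 2)
```

A real placement decision would presumably weigh more than hop count, such as link load or drive health; the sketch isolates the one signal the article mentions.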

“Our observation is that networking at 10GbE and beyond was becoming less expensive and more available,” said Tom Lyon, DriveScale chief scientist and co-founder. “So, the increased amount of bandwidth and network controls allowed for new kinds of architectures to take place.”

In the past, Lyon held key engineering roles at Nuova Systems, a startup acquired by Cisco in 2008 whose technology became the basis of Cisco’s UCS servers and Nexus switches.

DriveScale didn’t set out to solve the world’s software-defined infrastructure and convergence problems. Rather, they focused their strategy on overcoming two big challenges:

Difficulties around managing large data sets and Big Data environments. Organizations were relegated to managing siloed Big Data operations, often with traditional compute, storage, and network mechanisms. DriveScale not only works to resolve these challenges, it specifically focuses on the scale-out application market. Platforms such as Hadoop, MapR, Cloudera, and others can be integrated through a REST API architecture.
Server admins have real pain managing Hadoop and scale-out environments. Businesses are under pressure to get value from the data they process. Why? This data is becoming increasingly valuable to the entire business process. Rather than deploying traditional servers with “trapped” disks, DriveScale changes the way administrators control resources provisioned for Big Data workloads. Using the software, you can provide disk clusters (and servers) to a Big Data application set, and these resources are indistinguishable from resources available from a regular rack-mount server. Basically, you’re no longer constrained by a server chassis and can literally scale out. Lyon calls it “software-defined sheet metal.”

To overcome these challenges, DriveScale had to create a new type of management architecture. “We invented a rack-scale architecture which maximized network, compute, and the storage environment,” said Tina Nolte, director of product management at DriveScale. “It’s a new type of logical layer which allows you to create software-defined nodes managing complex and scaling Big Data environments.”

The architecture, at a high level, is fairly straightforward:

Storage, network, and compute are totally up to you. Have a favorite vendor? Great, DriveScale will likely work with them, no problem.
Your network layer acts as the connector. You use existing networking components to enable the communication between resources. From here, you can manage load between links in the nodes, set up cluster-level management with your own granular rules, and create access controls based on rights and app-level policies.
The magic is in the software. The DriveScale software allows you to create the aforementioned software-defined Big Data nodes. Furthermore, it allows you to granularly rebalance the ratios of compute to storage. Basically, as your Big Data environment evolves and grows with new business demands, it adjusts dynamically.
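
As a rough illustration of what rebalancing the ratio of compute to storage could look like, the Python sketch below binds drives from a shared pool to servers at a chosen ratio. Everything in it is an assumption made for illustration: `LogicalNode`, `compose_nodes`, and the server and drive names are hypothetical and do not reflect DriveScale's actual software or API.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalNode:
    """A hypothetical software-defined node: one server plus pooled drives."""
    server: str
    drives: list = field(default_factory=list)


def compose_nodes(servers, drives, drives_per_server):
    """Bind pooled drives to servers at the requested ratio.

    Returns the composed nodes and any drives left unassigned in the pool.
    """
    pool = list(drives)
    nodes = []
    for server in servers:
        # Take the next slice of the pool for this server; a short pool
        # simply yields fewer drives rather than raising an error.
        take, pool = pool[:drives_per_server], pool[drives_per_server:]
        nodes.append(LogicalNode(server, take))
    return nodes, pool


servers = ["s1", "s2"]
drives = [f"d{i}" for i in range(8)]

# Start at three drives per server, leaving two drives spare in the pool.
nodes, spare = compose_nodes(servers, drives, 3)

# Storage demand grows: recompose at four drives per server.
rebalanced, spare_after = compose_nodes(servers, drives, 4)
```

The point of the sketch is that the ratio is a parameter of the composition step rather than a property of a server chassis, which is the idea behind "software-defined sheet metal."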

So, what’s the difference between DriveScale and other software-based hyperconverged-infrastructure solutions? Hyperconvergence focuses on traditional workloads with scale-out architecture and a virtual controller. DriveScale focuses on scale-out workloads (like Big Data), using commodity hardware, with scale-out software.

Final Thoughts

Again, DriveScale's product and business are still in their early stages. The company is still working on making strategic partnerships and creating validated reference architectures. Creating those references and alliances (with companies like Hewlett Packard Enterprise, Cisco, Dell, Super Micro, and others) will go a long way in enabling further adoption and greater validation. Furthermore, it will help with support should there be a problem. Many organizations like strategic partnerships, which allow them to have just one support line to reach out to.

The technology powering DriveScale is aiming at solving a growing problem in the industry. The scale-out application market is evolving very quickly and organizations need help in this area. Big Data is constantly getting bigger and changing the way business intelligence is shaping the modern organization. For now, DriveScale is the only company that's taking a specific “software-defined” aim at the scale-out application industry. Based on the trends, however, it isn't likely to stay lonely for long.
