OpenAI releases Triton, a programming language for AI workload optimization

OpenAI booth at NeurIPS 2019 in Vancouver, Canada
Image Credit: Khari Johnson / VentureBeat

OpenAI today released Triton, an open source, Python-like programming language that lets researchers write highly efficient GPU code for AI workloads. OpenAI claims Triton makes it possible to reach peak hardware performance with relatively little effort. According to the company, it can produce code in as few as 25 lines that performs on par with what an expert could achieve.

Deep neural networks have emerged as an important type of AI model, capable of achieving state-of-the-art performance across natural language processing, computer vision, and other domains. The strength of these models lies in their hierarchical structure, which generates a large amount of highly parallelizable work well-suited for multicore hardware like GPUs. Frameworks for general-purpose GPU computing such as CUDA and OpenCL have made the development of high-performance programs easier in recent years. Yet GPUs remain especially challenging to optimize, in part because their architectures rapidly evolve.

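Domain-specific languages and compilers have emerged to address the problem. However, these systems tend to be less flexible and slower than the best handwritten compute kernels available in libraries like cuBLAS, cuDNN, or TensorRT. Hand-optimizing GPU code requires several things at once. Memory transfers from DRAM must be coalesced, data must be manually staged in on-chip SRAM, and computations must be partitioned and scheduled across and within streaming multiprocessors. Reasoning about all these factors can be challenging even for seasoned programmers. The purpose of Triton, then, is to automate these optimizations so that developers can focus on the high-level logic of their code.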

“Novel research ideas in the field of deep learning are generally implemented using a combination of native framework operators … [W]riting specialized GPU kernels [can improve performance,] but [is often] surprisingly difficult due to the many intricacies of GPU programming. And although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, lack flexibility, [or] generate code noticeably slower than our hand-tuned baselines,” Philippe Tillet, Triton’s original creator, who now works at OpenAI as a member of the technical staff, wrote in a blog post. “Our researchers have already used [Triton] to produce kernels that are up to 2 times more efficient than equivalent Torch implementations, and we’re excited to work with the community to make GPU programming more accessible to everyone.”

Simplifying code

According to OpenAI, Triton simplifies the development of specialized kernels that can be much faster than those in general-purpose libraries. The language has its origins in a 2019 paper submitted to the International Workshop on Machine Learning and Programming Languages. Its compiler simplifies code, then automatically optimizes and parallelizes it, converting it into code for execution on recent Nvidia GPUs. (CPUs, AMD GPUs, and platforms other than Linux aren't currently supported.)

“The main challenge posed by our proposed paradigm is that of work scheduling — i.e., how the work done by each program instance should be partitioned for efficient execution on modern GPUs,” Tillet explains on Triton’s documentation website. “To address this issue, the Triton compiler makes heavy use of block-level data-flow analysis, a technique for scheduling iteration blocks statically based on the control- and data-flow structure of the target program. The resulting system actually works surprisingly well: our compiler manages to apply a broad range of interesting optimization automatically.”
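
The following pure-Python sketch illustrates the "program instance" idea Tillet describes. It is not Triton code and runs without a GPU. Every name in it (`add_kernel`, `launch`, `block_size`) is a hypothetical stand-in.

In the model shown here, each instance owns one contiguous block of elements. A mask guards the last, possibly partial, block. Real Triton kernels express the same pattern with `@triton.jit`, `tl.program_id`, `tl.arange`, and `tl.load`/`tl.store` using a `mask` argument.

```python
"""Pure-Python simulation of block-level work partitioning (hypothetical sketch).

No Triton or GPU involved: each "program instance" is a plain function call
that processes one contiguous block of a vector addition.
"""
import math
from typing import List, Tuple


def add_kernel(x: List[float], y: List[float], out: List[float],
               n_elements: int, block_size: int, pid: int) -> None:
    """Work done by a single program instance (hypothetical stand-in for a kernel)."""
    # Offsets of the block this instance owns: pid * block_size + [0, block_size).
    block_start = pid * block_size
    for i in range(block_start, block_start + block_size):
        # Mask: the final block may extend past the end of the data.
        if i < n_elements:
            out[i] = x[i] + y[i]


def launch(x: List[float], y: List[float],
           block_size: int = 4) -> Tuple[List[float], int]:
    """Launch enough instances to cover every element; return the result and grid size."""
    n = len(x)
    out = [0.0] * n
    # One-dimensional grid: ceil(n / block_size) instances.
    grid = math.ceil(n / block_size)
    # On a GPU these instances would run in parallel; here they run sequentially.
    for pid in range(grid):
        add_kernel(x, y, out, n, block_size, pid)
    return out, grid


if __name__ == "__main__":
    result, instances = launch([float(i) for i in range(10)],
                               [2.0 * i for i in range(10)])
    print(instances, result)
```

With 10 elements and a block size of 4, the sketch launches three instances, and the third one masks off two out-of-range offsets. In the paradigm described above, the programmer writes only this per-block logic. Deciding how each block's work maps onto the GPU's threads and memory is the part the Triton compiler is meant to handle automatically.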

The first stable version of Triton, along with tutorials, is available from the project’s GitHub repository.