FlexFlow
FlexFlow is a DNN framework that automatically discovers fast parallelization strategies for distributed DNN training, using a novel search algorithm and an execution simulator.
Services
FlexFlow offers a Deep Neural Network (DNN) framework that specializes in automatically discovering fast parallelization strategies for distributed DNN training. An automated search algorithm explores the space of candidate strategies and frequently finds ones that outperform traditional manually designed approaches. An execution simulator predicts the runtime performance of each candidate strategy, letting the search evaluate many strategies without executing them on real hardware.
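The search-plus-simulator loop can be illustrated with a short, purely conceptual sketch. The code below is not FlexFlow's API; the operator names, the simulate_runtime cost model, and the greedy acceptance rule are simplified assumptions, used only to show how a simulator lets a search score many candidate strategies cheaply.

import random

# Minimal sketch of simulator-guided strategy search (not FlexFlow's actual API).
# Each operator is assigned a degree of parallelism; simulate_runtime is a
# hypothetical stand-in for FlexFlow's execution simulator, which predicts the
# runtime of a candidate strategy instead of measuring it on real hardware.

OPERATORS = ["conv1", "conv2", "dense1", "dense2"]

def simulate_runtime(strategy):
    # Toy cost model: compute time shrinks with parallelism, while
    # communication overhead grows with it (a real simulator models
    # device placement, kernel costs, and transfer times).
    compute = sum(100.0 / strategy[op] for op in OPERATORS)
    communication = sum(5.0 * (strategy[op] - 1) for op in OPERATORS)
    return compute + communication

def random_neighbor(strategy):
    # Propose a new strategy by re-parallelizing one randomly chosen operator.
    candidate = dict(strategy)
    candidate[random.choice(OPERATORS)] = random.choice([1, 2, 4, 8])
    return candidate

def search(iterations=1000):
    best = {op: 1 for op in OPERATORS}
    best_cost = simulate_runtime(best)
    current, current_cost = best, best_cost
    for _ in range(iterations):
        candidate = random_neighbor(current)
        cost = simulate_runtime(candidate)
        # Greedy acceptance; FlexFlow's search explores the strategy space
        # with a more sophisticated Markov-chain style procedure.
        if cost < current_cost:
            current, current_cost = candidate, cost
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

if __name__ == "__main__":
    strategy, cost = search()
    print("best strategy:", strategy, "predicted runtime:", cost)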
Products
FlexFlow's primary products are the FlexFlow DNN framework and SpecInfer. The FlexFlow framework parallelizes DNN training across the Sample, Operator, Attribute, and Parameter (SOAP) dimensions. SpecInfer is an open-source distributed multi-GPU system that accelerates generative large language model (LLM) inference using speculative inference and token tree verification. SpecInfer reduces inference latency and computational requirements while maintaining model quality, and it ships as pre-built Docker packages with CUDA and HIP/ROCm backends.
Parallelization Strategies
FlexFlow generalizes and surpasses existing manually designed parallelization strategies by exploring opportunities across the Sample, Operator, Attribute, and Parameter (SOAP) dimensions. Its automated search algorithm discovers strategies that outperform these hand-crafted ones. FlexFlow's hierarchical search algorithm jointly optimizes algebraic transformations and parallelization, producing strategies that scale to large models and clusters and distinguishing it from other frameworks.
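As a rough illustration of what one point in this search space looks like, the sketch below represents a strategy as a per-operator configuration over the SOAP dimensions. The ParallelConfig class and the example operators are hypothetical, not FlexFlow data structures; they only show that different operators can be parallelized in different ways within a single strategy.

from dataclasses import dataclass

# Purely illustrative sketch of a per-operator parallelization configuration
# over the SOAP dimensions (Sample, Operator, Attribute, Parameter).

@dataclass
class ParallelConfig:
    sample: int = 1     # split the batch (data parallelism)
    attribute: int = 1  # split an attribute dimension, e.g. image height/width
    parameter: int = 1  # split the weights (model parallelism)

    def degree(self):
        # Total number of devices this operator's work is spread across.
        return self.sample * self.attribute * self.parameter

# One strategy assigns a (possibly different) configuration to every operator;
# the Operator dimension of SOAP is expressed by letting distinct operators
# run on distinct sets of devices.
strategy = {
    "embedding": ParallelConfig(sample=4),               # data parallel
    "attention": ParallelConfig(sample=2, parameter=2),  # hybrid
    "mlp":       ParallelConfig(parameter=4),            # model parallel
}

for op, cfg in strategy.items():
    print(f"{op}: {cfg.degree()} devices ({cfg})")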
SpecInfer System
SpecInfer is FlexFlow's open-source system for accelerating generative LLM inference. It uses speculative inference and token tree verification to reduce latency and computational requirements without sacrificing model quality. A collection of small speculative models (SSMs) jointly predicts the LLM's output; the predictions are organized into a token tree, and the LLM verifies the candidate token sequences in the tree against its own output. SpecInfer builds on the FlexFlow framework and offers pre-built Docker packages for easy installation, with both CUDA and HIP/ROCm backends.
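The core speculate-then-verify idea can be sketched in a few lines of plain Python. Everything below is a simplified assumption: a toy vocabulary, hypothetical ssm_propose and llm_next_token functions, and a single draft sequence rather than a full token tree. It is meant only to show why verification preserves the LLM's output while letting the cheaper speculative model do most of the per-token drafting.

import random

# Conceptual sketch of speculative inference (not SpecInfer's actual API).
# The small model cheaply drafts several future tokens, and the large model
# checks them, keeping the longest prefix it agrees with. SpecInfer
# generalizes this by merging the drafts of several SSMs into a token tree
# and verifying the branches of that tree against the LLM in parallel.

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

def ssm_propose(prefix, k=4):
    # Small speculative model: cheap but imperfect draft of the next k tokens.
    return [random.choice(VOCAB) for _ in range(k)]

def llm_next_token(prefix):
    # Large model: expensive but authoritative choice of the next token.
    return VOCAB[hash(tuple(prefix)) % len(VOCAB)]

def speculative_step(prefix):
    draft = ssm_propose(prefix)
    accepted = []
    # Verify drafted tokens against the LLM; in a real system all drafted
    # positions are checked in a single batched forward pass of the LLM.
    for token in draft:
        if llm_next_token(prefix + accepted) == token:
            accepted.append(token)
        else:
            break
    # The LLM always contributes the next correct token itself, so the final
    # output matches what the LLM alone would have generated.
    accepted.append(llm_next_token(prefix + accepted))
    return prefix + accepted

if __name__ == "__main__":
    tokens = ["the"]
    for _ in range(5):
        tokens = speculative_step(tokens)
    print(" ".join(tokens))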