Pruna AI open-sources its AI model optimization framework

Pruna AI, a European startup, is open-sourcing its AI model optimization framework, which improves model efficiency through methods such as caching, pruning, quantization, and distillation. Co-founder John Rachwan highlights its ability to standardize how compressed models are saved, loaded, and evaluated while minimizing quality loss. The framework, which he compares to Hugging Face but for efficiency methods, aggregates multiple techniques behind one interface for ease of use. Pruna AI has also raised $6.5 million in funding from investors including EQT Ventures and Kima Ventures.

Pruna AI, based in Europe, released its AI model optimization framework as open source on Thursday. The framework integrates several efficiency methods, including caching, pruning, quantization, and distillation, into a single tool for optimizing AI models. The goal is to give developers a comprehensive way to compress AI models without significantly compromising quality. John Rachwan, Pruna AI's co-founder and CTO, explained that the framework also standardizes how compressed models are saved and loaded and how they are evaluated after compression.
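To make that idea concrete, here is a minimal sketch of what aggregating several methods behind one configuration object can look like. The `CompressionConfig` and `compress` names are hypothetical illustrations, not Pruna AI's actual API; the underlying calls are standard PyTorch (dynamic int8 quantization and L1 magnitude pruning).

```python
# Hypothetical sketch of a unified compression interface; CompressionConfig
# and compress are illustrative names, NOT Pruna AI's actual API.
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

@dataclass
class CompressionConfig:
    quantize: bool = True       # int8 dynamic quantization of Linear layers
    prune_amount: float = 0.0   # fraction of weights to zero per Linear layer

def compress(model: nn.Module, cfg: CompressionConfig) -> nn.Module:
    """Apply each enabled method in turn: one entry point for all of them."""
    if cfg.prune_amount > 0:
        for module in model.modules():
            if isinstance(module, nn.Linear):
                prune.l1_unstructured(module, name="weight", amount=cfg.prune_amount)
                prune.remove(module, "weight")  # bake the mask into the weights
    if cfg.quantize:
        model = torch.ao.quantization.quantize_dynamic(
            model, {nn.Linear}, dtype=torch.qint8
        )
    return model

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
compressed = compress(model, CompressionConfig(quantize=True, prune_amount=0.3))
torch.save(compressed.state_dict(), "compressed.pt")  # one save path for all methods
```

Because every combination of methods ends in the same save path, downstream tooling never needs to know which techniques produced the compressed model.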

Rachwan emphasized that the framework can check whether a compressed model suffers significant quality loss, while also measuring the performance gains compression delivers. He likened Pruna AI's role to what Hugging Face did for transformers, standardizing and streamlining efficiency methods. This addresses a gap in the open-source world, where most existing tools implement a single method, such as quantization or caching. The key value of Pruna AI's framework, as Rachwan sees it, is that it aggregates these methods and makes them simple to use and combine.
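The kind of post-compression evaluation Rachwan describes can be as simple as comparing a model's outputs and latency before and after compression. A minimal sketch in plain PyTorch; the `evaluate_compression` helper is illustrative, not part of any library:

```python
import time

import torch
import torch.nn as nn

def evaluate_compression(baseline: nn.Module, compressed: nn.Module,
                         inputs: torch.Tensor) -> dict:
    """Measure quality loss (output drift) and speed gain side by side."""
    with torch.inference_mode():
        ref = baseline(inputs)
        out = compressed(inputs)
        # Quality proxy: how far the compressed outputs drift from baseline.
        drift = (ref - out).abs().mean().item()

        # Speed proxy: average wall-clock latency over repeated runs.
        def latency(m: nn.Module) -> float:
            start = time.perf_counter()
            for _ in range(50):
                m(inputs)
            return (time.perf_counter() - start) / 50

        speedup = latency(baseline) / latency(compressed)
    return {"mean_abs_drift": drift, "speedup": speedup}

baseline = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
compressed = torch.ao.quantization.quantize_dynamic(
    baseline, {nn.Linear}, dtype=torch.qint8
)
print(evaluate_compression(baseline, compressed, torch.randn(32, 512)))
```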

The efficiency methods themselves are already standard practice at major AI labs: OpenAI has relied on distillation to accelerate flagship models such as GPT-4 Turbo, and Black Forest Labs has applied the technique to models such as Flux.1-schnell. Distillation uses a "teacher-student" setup, in which a smaller student model is trained to replicate a larger teacher model's behavior; the student's outputs are compared against the teacher's on a dataset, producing a faster model that retains most of the teacher's quality.
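Generic knowledge distillation is straightforward to express in code. The following is a minimal PyTorch sketch of the teacher-student setup, not the specific recipe used by OpenAI or Black Forest Labs: the student is trained to match the teacher's softened output distribution on each batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy teacher (large) and student (small); real distillation would use
# pretrained models and real training data.
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 128)  # stand-in for a real training batch
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened distributions: the student is pushed
    # to replicate the teacher's behavior, as the article describes.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```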

Pruna AI's framework supports a range of models, from large language models and diffusion models to speech-to-text and computer vision models. At present, the company focuses on optimizing image and video generation models, with users including Scenario and PhotoRoom. The enterprise version adds advanced optimization features, such as a compression agent that automates the process given user-specified constraints, for example, boosting speed as much as possible while keeping the loss of accuracy within a set bound.
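Rachwan's description suggests an agent that searches over combinations of methods under user-supplied constraints. The sketch below is entirely hypothetical, inferred from the article rather than taken from Pruna AI's product; the `benchmark` function stands in for actually compressing and evaluating a model.

```python
# Hypothetical illustration of constraint-driven automation; the names and
# behavior here are inferred from the article, not Pruna AI's product.
from itertools import combinations

METHODS = ["quantize", "prune", "cache", "distill"]

def benchmark(methods: tuple[str, ...]) -> tuple[float, float]:
    """Stand-in for real measurement: returns (speedup, accuracy_drop).
    A real agent would compress the model and evaluate it on data."""
    speedup = 1.0 + 0.8 * len(methods)
    accuracy_drop = 0.01 * len(methods) ** 2
    return speedup, accuracy_drop

def find_best(max_accuracy_drop: float) -> tuple[str, ...]:
    """Search method combinations; keep the fastest one within budget."""
    best, best_speedup = (), 1.0
    for k in range(1, len(METHODS) + 1):
        for combo in combinations(METHODS, k):
            speedup, drop = benchmark(combo)
            if drop <= max_accuracy_drop and speedup > best_speedup:
                best, best_speedup = combo, speedup
    return best

print(find_best(max_accuracy_drop=0.05))  # e.g. ('quantize', 'prune')
```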

Backing its expansion, Pruna AI raised a $6.5 million seed round from investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures. According to Rachwan, the pro version is billed by the hour, much like renting a GPU from a cloud provider, so the cost of compressing a model can be weighed directly against the inference savings it unlocks. As an example of the framework's potential cost-effectiveness, Pruna AI made a Llama model eight times smaller with minimal performance loss.
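Some back-of-the-envelope arithmetic shows why an eightfold size reduction matters for cost. The 8-billion-parameter count and 16-bit baseline below are assumptions for the example, not details reported by Pruna AI:

```python
# Illustrative arithmetic only; an 8B-parameter model at 16-bit precision
# is an assumed example, not Pruna AI's reported setup.
params = 8e9                 # 8 billion parameters
bytes_fp16 = params * 2      # 16-bit weights: 2 bytes each
bytes_8x = bytes_fp16 / 8    # the eightfold reduction from the article

gib = 1024 ** 3
print(f"fp16 size: {bytes_fp16 / gib:.1f} GiB")  # ~14.9 GiB
print(f"after 8x:  {bytes_8x / gib:.1f} GiB")    # ~1.9 GiB
```

Under those assumptions, roughly 13 GiB of weight memory is freed per model copy, which translates into cheaper GPU instances or more room for batching.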

Sources: TechCrunch, VentureBeat