Anyone see the TensorRT-LLM announcement from NVIDIA?

Would be amazing if the end result of this is a near doubling of the available message cap!


Looks like they can quantize to FP8 pretty easily:

NVIDIA H100 GPUs with TensorRT-LLM give users the ability to convert their model weights into a new FP8 format easily and compile their models to take advantage of optimized FP8 kernels automatically. This is made possible through Hopper Transformer Engine technology and done without having to change any model code.
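To make the FP8 idea concrete: the quote is about per-tensor conversion of weights into the FP8 E4M3 format used by Hopper's Transformer Engine. Here's a minimal NumPy sketch of what that quantization conceptually does (scale the tensor so its largest value fits E4M3's range, then round to 3 mantissa bits). This is not the TensorRT-LLM API; the function name and the fake-quantize approach are illustrative only.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_e4m3_quantize(w):
    """Simulate per-tensor FP8 (E4M3) quantization of a weight array.

    Returns the scale and the dequantized ("fake-quantized") weights,
    so the rounding error FP8 introduces can be inspected in float32.
    Illustrative sketch only -- not the TensorRT-LLM implementation.
    """
    scale = np.max(np.abs(w)) / E4M3_MAX      # map largest weight onto E4M3 range
    scaled = w / scale
    # E4M3 keeps 3 mantissa bits: round each value to its binade's spacing.
    exp = np.floor(np.log2(np.maximum(np.abs(scaled), 2.0 ** -6)))
    step = 2.0 ** (exp - 3)                   # gap between representable values
    quantized = np.clip(np.round(scaled / step) * step, -E4M3_MAX, E4M3_MAX)
    return scale, (quantized * scale).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
scale, w_fp8 = fp8_e4m3_quantize(w)
print(np.max(np.abs(w - w_fp8)))  # per-element rounding error stays small
```

The point of the announcement is that on H100 this conversion plus the optimized FP8 kernels happen automatically at compile time, without model-code changes; the sketch above just shows why FP8 roughly halves weight memory versus FP16 while keeping error at the few-percent level per element.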

Also found this line interesting:

TensorRT-LLM improves ease of use and extensibility through an open-source modular Python API for defining, optimizing, and executing new architectures and enhancements as LLMs evolve, and can be customized easily.


NVIDIA has been working closely with leading companies, including Meta, Anyscale, Cohere, Deci, Grammarly, Mistral AI, MosaicML (now a part of Databricks), OctoML, Tabnine, and Together AI, to accelerate and optimize LLM inference. (from the announcement of the upcoming open-source software)

Who’s not mentioned? The one that is likely already crushing quants to the limit.