Google DeepMind has released DiffusionGemma, a new AI model that produces entire blocks of text in parallel rather than one token at a time, which makes generation faster. Most AI models generate text linearly, emitting each token in sequence. DiffusionGemma instead works much like an image generation model: it starts with a field of placeholder tokens and refines them into the desired content.
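To make the idea of refining placeholders concrete, here is a toy sketch in Python. It is not DiffusionGemma's algorithm or API: the function name `toy_parallel_refine`, the `MASK` symbol, and the fixed reveal schedule are all hypothetical, and a known target sentence stands in for the model's predictions. It only illustrates that several positions can be filled in during one pass, so the number of passes can be smaller than the number of tokens.

```python
import random

MASK = "_"  # hypothetical placeholder symbol, chosen for illustration only


def toy_parallel_refine(target, steps=4, seed=0):
    """Start from all placeholders and fill several positions per pass.

    `target` stands in for what a real model would predict; a real
    diffusion language model would predict tokens instead of copying them.
    Returns the list of intermediate states, including the initial one.
    """
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    hidden = list(range(len(target)))
    rng.shuffle(hidden)  # arbitrary order in which positions get resolved
    per_step = -(-len(target) // steps)  # ceiling division
    history = [list(tokens)]
    for _ in range(steps):
        batch, hidden = hidden[:per_step], hidden[per_step:]
        for i in batch:
            tokens[i] = target[i]  # every position in the batch is filled in this pass
        history.append(list(tokens))
    return history


if __name__ == "__main__":
    target = "diffusion models refine all positions at once".split()
    for step, state in enumerate(toy_parallel_refine(target, steps=3)):
        print(step, " ".join(state))
```

In this toy run, seven tokens are resolved in three passes, whereas strictly sequential generation would need one step per token.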
DiffusionGemma has 26 billion parameters, but only 3.8 billion are active during inference, which makes it practical to run on high-end GPUs. In testing, it produced around 700 tokens per second on an RTX 5090 and more than 1,000 tokens per second on a single Nvidia H100 AI accelerator.
This is about four times faster than similarly sized autoregressive Gemma models, making DiffusionGemma a notable development in the field of AI.
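The figures above support some quick back-of-the-envelope arithmetic, sketched below in Python. The function names are hypothetical, and the baseline numbers are only implied: the calculation assumes the roughly fourfold speedup applies equally on each piece of hardware, which the reported results do not state. The H100 figure is reported as "over 1,000", so the value derived from it is a lower bound.

```python
# Figures as reported in the article; everything derived below is approximate.
TOTAL_PARAMS_B = 26.0    # total parameters, in billions
ACTIVE_PARAMS_B = 3.8    # parameters active during inference, in billions
REPORTED_SPEEDUP = 4.0   # "about four times" faster than autoregressive Gemma
REPORTED_TOKENS_PER_SEC = {"RTX 5090": 700.0, "H100": 1000.0}


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for each inference step."""
    return active_b / total_b


def implied_baseline(tokens_per_sec, speedup):
    """Throughput an autoregressive model would need for the stated speedup.

    Assumption: the speedup holds on the same hardware, which is not confirmed.
    """
    return tokens_per_sec / speedup


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share of parameters: {share:.1%}")
    for gpu, tps in REPORTED_TOKENS_PER_SEC.items():
        baseline = implied_baseline(tps, REPORTED_SPEEDUP)
        print(f"{gpu}: implied autoregressive baseline ~{baseline:.0f} tokens/s")
```

Under these assumptions, roughly one seventh of the parameters are active per step, and the implied autoregressive baselines are on the order of a few hundred tokens per second.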



