Story: Meta has launched the largest-ever open-source AI model, Llama 3.1. The company made the model available in three sizes: 8B, 70B, and 405B parameters, all of which you can fine-tune, distill, and deploy anywhere. The largest version has 405 billion parameters and was trained on 16,000 H100 GPUs (one H100 GPU costs roughly €35,000). These models target state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation; Meta's evaluations suggest that the largest model is competitive with leading proprietary foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet.
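A quick back-of-the-envelope on the training hardware implied by those numbers (16,000 GPUs at roughly €35,000 each; the per-GPU price is the article's estimate, not an official Meta figure):

```python
# Rough estimate of the GPU hardware cost behind Llama 3.1 405B,
# using the figures from the story above (not official Meta numbers).
num_gpus = 16_000           # H100 GPUs used for training
price_per_gpu_eur = 35_000  # approximate price of a single H100

total_cost_eur = num_gpus * price_per_gpu_eur
print(f"GPU hardware alone: ~€{total_cost_eur / 1e9:.2f} billion")
# → roughly €0.56 billion, before power, networking, and data-center costs
```

That is the GPU bill alone; the true training cost is higher once you add infrastructure, energy, and engineering.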
Key Findings:
The model has a context length of 128K tokens (roughly the length of a whole book)
The model supports eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai)
Meta released a full reference system to improve responsibility and security, including components such as Llama Guard 3, a safety model that detects harmful content, and Prompt Guard, which detects malicious prompts
As usual, the model is available as open source, enabling developers not only to use the model for free but also to use its outputs to improve other models.
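To put the 128K-token context window from the findings above in perspective, here is a minimal sketch (the ~0.75 words-per-token ratio is a common rule of thumb for English text, not an official figure):

```python
# How much text fits into a 128K-token context window?
context_tokens = 128_000
words_per_token = 0.75    # rough rule of thumb for English text
words_per_novel = 90_000  # a typical novel runs 80k-100k words

approx_words = context_tokens * words_per_token
print(f"~{approx_words:,.0f} words, about {approx_words / words_per_novel:.1f} novels")
```

In other words, the model really can take in about a book's worth of text at once.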
Pixit's Two Cents: The largest-ever open-source LLM, for free? We absolutely love it. It remains to be seen whether the results are really as good as the ones from closed, proprietary models such as GPT-4o or Claude 3.5 Sonnet, but we're already amazed by the results so far. Most likely you won't be able to run it on your local device (at least not the 405B model), but you can test its abilities in Perplexity.
Story: Researchers from Stability AI and Northeastern University have introduced Stable Video 4D (SV4D), a groundbreaking latent video diffusion model that generates multi-frame and multi-view consistent dynamic 3D content. Unlike previous methods that rely on separate models for video generation and novel view synthesis, SV4D employs a unified diffusion model to generate novel view videos of dynamic 3D objects with temporal consistency.
Key Findings:
Unified Architecture: SV4D integrates Stable Video Diffusion (SVD) and Stable Video 3D (SV3D) models with attention mechanisms to ensure both spatial and temporal consistency in generated novel view videos.
Efficient 4D Optimization: The generated novel view videos are used to optimize an implicit 4D representation (dynamic NeRF) efficiently, without the need for cumbersome score-distillation sampling (SDS) based optimization used in prior works.
Superior Performance: Extensive experiments on multiple datasets (ObjaverseDy, Consistent4D, DAVIS) and user studies demonstrate SV4D's state-of-the-art performance in novel view video synthesis and 4D generation compared to existing methods.
Improved Consistency: SV4D achieves significant reductions in Frechet Video Distance (FVD) metrics, highlighting its superior temporal coherence and robustness in multi-frame and multi-view consistency.
Curated Dataset: To train the unified model, the researchers curated ObjaverseDy, a dynamic 3D object dataset derived from the Objaverse dataset, addressing the scarcity of large-scale 4D datasets.
Pixit's Two Cents: SV4D's novel approach to dynamic 3D content generation, which integrates novel view synthesis and video frame consistency within a single diffusion-based model, is particularly interesting because it can generate temporally and spatially consistent novel view videos efficiently. It enables the rapid creation of convincing 4D assets for applications such as AR/VR, gaming, and cinematic production without relying on separate models. Since the models are available here, we are eager to try out what's possible.
Story: Mistral has released a new open-source language model, Mistral Large 2, with improved capabilities in code generation, mathematics, and reasoning. The 123-billion-parameter model was explicitly designed to avoid hallucination and to acknowledge when it cannot find a solution or does not have sufficient information to provide a confident answer.
Key Findings:
The model has the same context length as Llama 3.1, that is, 128K tokens (again, roughly a whole book)
Mistral Large 2 supports many languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi) as well as 80+ coding languages (e.g., Python and Java)
Mistral Large 2 not only outperforms its predecessor but also performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3.1 405B.
As always, Mistral Large 2 was released under the Mistral Research License, which allows usage and modification for research and non-commercial purposes. For commercial usage, you have to acquire the Mistral Commercial License.
Pixit's Two Cents: Europe's biggest LLM provider did it again. Just a couple of days after Meta released Llama 3.1, Mistral released a model with very similar performance; considering that Mistral has far fewer resources, this is an incredible achievement. As with Llama 3.1, the AI community still has to evaluate its performance and compare it to other, including proprietary, models. It has been a great week for the open-source community!
Stable Diffusion 3 License Revamped Amid Blowback, Promising Better Model: After facing backlash from the AI community over its restrictive licensing terms, Stability AI improved its Community License: SD3 is now free for research, non-commercial, and limited commercial purposes (free as long as your revenue is under $1 million), and you can create custom SD3 models and build on top of the base SD3 model.
AI trained on AI garbage spits out AI garbage: A new article published in Nature shows that the quality of AI models gradually degrades when they are trained on AI-generated data.
“Copyright traps” could tell writers if an AI has scraped their work: Injecting thousands of synthetic sentences into your text can help detect whether your data was used to train models. Although this new technique looks promising, it is impractical right now.
Elon Musk Unveils World's Most Powerful AI Training Cluster at xAI: Elon Musk announces that xAI has built the world's most powerful AI training cluster, capable of 1 exaflop of compute power, surpassing the performance of existing supercomputers.
Google Introduces Gemini 1.5 Flash and New Features for Enterprise AI: Google unveils Gemini 1.5 Flash, a high-performance variant of its Gemini 1.5 model, offering faster inference speeds and improved efficiency for enterprise AI applications, along with new features such as enhanced data privacy controls and expanded language support.
OpenAI is testing SearchGPT: You can join the waitlist to test the new AI online search tool that gives you fast and timely answers with relevant sources. In the future, OpenAI plans to incorporate the AI online search tool into ChatGPT.
Video Generator Kling AI is now available everywhere: You can now use the text-to-video generator for free (at least to some degree) at klingai.com.