Made with ❤️ by Pixit

Llamageddon: Meta released its largest-ever open-source AI model, Llama 3.1


Story: Meta has launched the largest-ever open-source AI model, Llama 3.1. The company made the model available in three sizes: 8B, 70B, and 405B, and you can fine-tune, distill, and deploy them anywhere. The largest version has 405 billion parameters and was trained on 16,000 H100 GPUs (a single H100 GPU costs roughly €35,000). These models aim for state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation, and Meta's evaluations suggest that the largest model is competitive with leading proprietary foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet.

Key Findings:

  • The model has a context length of 128K tokens (roughly the length of a book)

  • The model supports eight different languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai)

  • Meta released a full reference system to improve responsibility and security, including components such as Llama Guard 3, a safety model for detecting violent content, and Prompt Guard, for detecting malicious prompts

  • As usual, the model is available as open source, enabling developers not only to use the model for free but also to use its outputs to improve other models.

Pixit's Two Cents: The largest-ever open-source LLM for free? We absolutely love it. It remains to be seen whether the results are really as good as the ones from closed, proprietary models such as GPT-4o or Claude 3.5 Sonnet, but we're already amazed by the results so far. Most likely you won't be able to run it on your local device (at least not the 405B model), but you can test its abilities in Perplexity.
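If you want to try the smallest variant yourself, here is a minimal sketch using the Hugging Face transformers library. It assumes a recent transformers install, a GPU with enough memory for the 8B weights, and access to the gated meta-llama/Meta-Llama-3.1-8B-Instruct repository on the Hugging Face Hub (you have to accept Meta's license first); the prompt is just a placeholder.

    # Minimal sketch: run the 8B instruct model locally via transformers.
    # Assumes gated access to meta-llama/Meta-Llama-3.1-8B-Instruct has been granted.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        torch_dtype=torch.bfloat16,   # halves memory vs. float32
        device_map="auto",            # spread layers across available GPUs/CPU
    )

    messages = [{"role": "user", "content": "Explain tool use in LLMs in two sentences."}]
    out = pipe(messages, max_new_tokens=128)
    print(out[0]["generated_text"][-1]["content"])   # last message is the assistant's reply

The 405B variant needs a multi-GPU setup, which is why hosted options such as Perplexity are the easier way to try it.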

SV4D Shows Dynamic 3D Content Creation with Unparalleled Consistency


Story: Researchers from Stability AI and Northeastern University have introduced Stable Video 4D (SV4D), a groundbreaking latent video diffusion model that generates multi-frame and multi-view consistent dynamic 3D content. Unlike previous methods that rely on separate models for video generation and novel view synthesis, SV4D employs a unified diffusion model to generate novel view videos of dynamic 3D objects with temporal consistency.

Key Findings:

  • Unified Architecture: SV4D integrates Stable Video Diffusion (SVD) and Stable Video 3D (SV3D) models with attention mechanisms to ensure both spatial and temporal consistency in generated novel view videos.

  • Efficient 4D Optimization: The generated novel view videos are used to optimize an implicit 4D representation (dynamic NeRF) efficiently, without the need for cumbersome score-distillation sampling (SDS) based optimization used in prior works.

  • Superior Performance: Extensive experiments on multiple datasets (ObjaverseDy, Consistent4D, DAVIS) and user studies demonstrate SV4D's state-of-the-art performance in novel view video synthesis and 4D generation compared to existing methods.

  • Improved Consistency: SV4D achieves significant reductions in Fréchet Video Distance (FVD) metrics, highlighting its superior temporal coherence and robustness in multi-frame and multi-view consistency.

  • Curated Dataset: To train the unified model, the researchers curated ObjaverseDy, a dynamic 3D object dataset derived from the Objaverse dataset, addressing the scarcity of large-scale 4D datasets. 

Pixit's Two Cents: SV4D's novel approach to dynamic 3D content generation, which integrates image synthesis and video frame consistency within a single diffusion-based model, is particularly interesting because it can generate temporally and spatially consistent novel view videos efficiently. This enables the rapid creation of convincing 4D assets for applications such as AR/VR, gaming, and cinematic production without relying on separate models. Since the models are available, we are eager to try out what's possible.


Europe's biggest LLM provider, Mistral AI, released Mistral Large 2


Story: Mistral has released a new open-weight language model, Mistral Large 2, with improved capabilities in code generation, mathematics, and reasoning. The 123-billion-parameter model was explicitly designed to avoid hallucination and to acknowledge when it cannot find a solution or does not have sufficient information to provide a confident answer.

Key Findings:

  • The model has the same context length as Llama 3.1, that is, 128K tokens (again, roughly the length of a book)

  • Mistral Large 2 supports a wide range of languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi) as well as 80+ coding languages (e.g., Python and Java)

  • Mistral Large 2 not only outperforms its predecessor but also performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3.1 405B.

  • As always, Mistral Large 2 was released under the Mistral Research License, which allows usage and modification for research and non-commercial purposes. For commercial use, you have to acquire the Mistral Commercial License.

Pixit's Two Cents: Europe's biggest LLM provider did it again. Just a couple of days after Meta released Llama 3.1, Mistral released a model with very similar performance, which, considering that they have far fewer resources, is an incredible achievement. As with Llama 3.1, the AI community still has to evaluate its performance and compare it to other models, including proprietary ones. It has been a great week for the open-source community!
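As a quick, hedged way to relate the 128K context window to real documents, the sketch below counts tokens with the model's tokenizer via Hugging Face transformers. It assumes the research-licensed weights and tokenizer are published under the repo id mistralai/Mistral-Large-Instruct-2407 and that you have accepted the license on the Hugging Face Hub; the file name is just a placeholder.

    # Sketch: estimate how much of the 128K-token context window a long document uses.
    # Assumes access to the mistralai/Mistral-Large-Instruct-2407 repo on the Hugging Face Hub.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2407")

    with open("long_report.txt", encoding="utf-8") as f:   # placeholder document
        n_tokens = len(tokenizer(f.read())["input_ids"])

    print(f"{n_tokens:,} of ~128,000 context tokens used")

Running the full 123B model locally is another matter; the same pipeline pattern as in the Llama sketch above applies, but only on a multi-GPU machine.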


Small Bites, Big Stories:

Post by Pix
Jul 29, 2024 9:50:07 AM