Pixit Pulse: The Weekly Generative AI Wave

AI News #67

Written by Pix | Apr 22, 2024 7:54:10 AM

Meta Releases Llama 3 (Partly): Claimed to Be Among the Best Open Models Available

Story: Meta has released Llama 3, a new family of open-source language models that marks a strong advance in performance over the previous Llama 2 models. The two released models, Llama 3 8B and Llama 3 70B, have 8 billion and 70 billion parameters, respectively. Meta asserts that they represent a massive leap forward in performance, challenging industry leaders such as OpenAI's GPT-3.5 and Mistral's Mistral Medium. Trained on custom-built 24,000-GPU clusters, Llama 3 8B and Llama 3 70B are among the best-performing generative AI models currently available, according to Meta's internal testing. The biggest model (Llama 3 400B) is reportedly still in training and is claimed to be GPT-4 class.

Key Findings:

  • Significant Performance Boost: Llama 3 8B and Llama 3 70B demonstrate a substantial improvement in performance compared to their predecessors, with Llama 3 70B outperforming OpenAI's GPT-3.5, Mistral's Mistral Medium, and Anthropic's Claude Sonnet on Meta's custom test set.

  • Open-Source Availability: Meta plans to make the Llama 3 models available for download and is hosting them in managed form on various cloud platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM's WatsonX, Microsoft Azure, Nvidia's NIM, and Snowflake (see the short loading sketch after this list).

  • Hardware Optimization: Future releases of Llama 3 will include models optimized for hardware from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm, ensuring compatibility and performance across a wide range of systems.

  • Multilingual and Multimodal Capabilities: Meta is working on more capable Llama 3 models, with sizes exceeding 400 billion parameters, that will be multilingual and multimodal, handle longer context, and deliver improved overall performance.

  • Challenging Industry Leaders: With its impressive performance and open-source availability, Llama 3 poses a significant challenge to industry leaders like OpenAI and Google, potentially democratizing access to high-quality language models.

  • Driving Innovation and Collaboration: By releasing Llama 3 as open-source, Meta aims to foster innovation and collaboration within the AI community, enabling researchers and developers to build upon and improve the models for various applications.
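
For readers who want to try the open weights once they are downloadable, here is a minimal sketch of loading Llama 3 8B through the Hugging Face transformers library. The repo id "meta-llama/Meta-Llama-3-8B-Instruct" and the device settings are our assumptions for illustration, not details from Meta's announcement, and access to the weights requires accepting Meta's license on the Hub.

    # Minimal sketch: load an (assumed) Llama 3 8B Instruct checkpoint from the Hugging Face Hub.
    # The repo id below is an assumption for illustration; check the Hub for the official listing.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Generate a short completion to verify the model loads and responds.
    prompt = "Summarize this week's generative AI news in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))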

Pixit's Two Cents: Meta's release of Llama 3 is a game-changer in the world of open-source language models. By providing powerful, high-performing models like Llama 3 8B and Llama 3 70B, Meta is democratizing access to cutting-edge AI technology and challenging the dominance of industry giants like OpenAI and Google. This is probably less altruism than strategy: an open model that everyone builds on puts pressure on competitors whose business depends on selling closed ones. It's also important to note that Meta's performance claims are based on its own custom test set, which may raise questions about the results' objectivity. As the AI landscape continues to evolve rapidly, it will be fascinating to see how Llama 3 stacks up against other leading models and how it contributes to the advancement of natural language processing and generative AI. For us, it is extremely exciting to see what will grow out of such powerful models being available to everyone!

Adobe's 'Ethical' Firefly AI Trained on Rival-Generated Images, Raising Questions About Transparency

Story: Adobe has recently come under scrutiny for its lack of transparency regarding the training data used for its Firefly image-generating software. While the company initially promoted Firefly as a "commercially safe" alternative to competitors like Midjourney, which learns by scraping pictures from across the internet, it has now been revealed that Adobe also relied in part on AI-generated content from those same rivals (namely Midjourney) to train Firefly. This revelation has raised concerns about Adobe's commitment to ethical AI practices and its transparency in communicating the sources of its training data to the public.

Key Findings:

  • Contradictory Claims: Despite promoting Firefly as a safe alternative to competitors because it was trained on licensed images from Adobe Stock, Adobe also used AI-generated content from rivals like Midjourney to train its model.

  • Lack of Transparency: In numerous presentations and public posts about Firefly's safety and training data, Adobe never made it clear that the model was using images from the same competitors it claimed to be a safer alternative to.

  • Ethical Concerns: The use of rival-generated images in Firefly's training data raises questions about Adobe's commitment to ethical AI practices and the potential for copyright infringement or misuse of intellectual property.

  • Industry-Wide Issue: This revelation highlights the need for greater transparency and accountability across the AI industry, as companies increasingly rely on vast amounts of data, including content generated by competitors, to train their models.

Pixit's Two Cents: Adobe's use of rival-generated images to train its supposedly "ethical" Firefly AI raises serious concerns about the company's commitment to transparency and responsible AI practices. At Pixit, we too have always promoted Adobe's Firefly as a safe alternative to competitors, a claim we can no longer make in good conscience. Adobe's misstep serves as a reminder that the pursuit of advanced AI must be accompanied by a steadfast commitment to ethics, transparency, and the responsible use of technology.

Microsoft Unveils VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real-Time from a Single Photo

Story: Microsoft Research Asia has introduced VASA-1, a groundbreaking AI model capable of creating realistic, synchronized animated videos of a person talking or singing using just a single photograph and an existing audio track. VASA-1 (Visual Affective Skills Animator) analyzes a static image in conjunction with a speech audio clip to generate a lifelike video with precise facial expressions, head movements, and lip-syncing. The model significantly outperforms previous speech animation methods (for example EMO) in terms of realism, expressiveness, and efficiency, generating videos of 512x512 pixel resolution at up to 40 frames per second with minimal latency.

Key Findings:

  • Lifelike Animations from a Single Photo: VASA-1 can generate realistic, synchronized animated videos of a person talking or singing using just a single photograph and an existing audio track, without the need for voice cloning or simulation.

  • Precise Facial Expressions and Lip-Syncing: The AI model analyzes the static image and speech audio clip to create a video with accurate facial expressions, head movements, and lip-syncing, resulting in a highly lifelike animation.

  • Superior Performance: VASA-1 significantly outperforms previous speech animation methods in terms of realism, expressiveness, and efficiency, setting a new standard for audio-driven talking face generation.

  • Real-Time Applications: With the ability to generate videos of 512x512 pixel resolution at up to 40 frames per second with minimal latency, VASA-1 is suitable for real-time applications such as video conferencing and interactive experiences.

  • Trained on Extensive Dataset: Microsoft trained VASA-1 on the VoxCeleb2 dataset, which contains over 1 million utterances for 6,112 celebrities, extracted from YouTube videos, enabling the model to learn from a diverse range of facial features and expressions.

Pixit's Two Cents: Microsoft's VASA-1 is a remarkable achievement in the field of AI-generated talking faces, pushing the boundaries of what's possible with just a single photograph and an audio track. The lifelike animations produced by VASA-1 have the potential to revolutionize various industries, from entertainment and gaming to education and virtual assistants. However, as with any powerful technology, there are also concerns about potential misuse, such as the creation of deepfakes for malicious purposes, which is why Microsoft does not plan to release the model in the near future. Beyond all these capabilities, we find the generation of completely new lifelike expressions and poses particularly interesting.

Small Bites, Big Stories: