Pixit Pulse: The Weekly Generative AI Wave

AI News #82

Written by Pix | Aug 5, 2024 8:14:54 AM

Original Stable Diffusion Creators Launch Black Forest Labs, Secure $31M for FLUX.1 AI Image Generator

Created with FLUX.1 [schnell] and the prompt: "A tiny cute high quality dark blue robot looking speechless into the camera with a 3d cloud bubble above his head saying "Pixit says WOW!" and no other text"

Story: The creators of Stable Diffusion, the popular open-source AI image generation model, have founded a new company called Black Forest Labs and raised $31 million in a Series A funding round led by Andreessen Horowitz. As its first offering, the company has released the FLUX.1 suite of text-to-image models, setting a new benchmark for image synthesis.

Key Findings:

  • Three Model Variants: FLUX.1 comes in three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell], catering to different user needs and accessibility requirements (a minimal usage sketch follows this list).

  • State-of-the-Art Performance: FLUX.1 [pro] and [dev] surpass popular models like Midjourney v6.0, DALL·E 3 (HD), and SD3-Ultra in visual quality, prompt following, size/aspect variability, typography, and output diversity.

  • Hybrid Architecture: All FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled to 12B parameters, and incorporate flow matching, rotary positional embeddings, and parallel attention layers for improved performance and hardware efficiency.

  • Diverse Aspect Ratios and Resolutions: FLUX.1 models support a wide range of aspect ratios and resolutions between 0.1 and 2.0 megapixels, offering enhanced creative possibilities compared to current state-of-the-art models.
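
For readers who want to try something like the cover image above, here is a minimal sketch of running the open [schnell] variant locally. It assumes the Hugging Face diffusers library (recent releases ship a FluxPipeline) and a GPU with enough memory; the prompt, resolution, and seed are purely illustrative.

```python
# Minimal sketch: generating an image with FLUX.1 [schnell] via Hugging Face diffusers.
# Assumes a recent diffusers release that provides FluxPipeline and a capable GPU.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",  # the fastest, openly available variant
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trade some speed for lower VRAM usage

image = pipe(
    prompt="A tiny cute dark blue robot looking speechless into the camera",
    guidance_scale=0.0,       # [schnell] is distilled and runs without classifier-free guidance
    num_inference_steps=4,    # the timestep-distilled model needs only a few steps
    height=768,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

image.save("flux_schnell_sample.png")
```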

Pixit's Two Cents: The launch of Black Forest Labs and the release of FLUX.1 is genuinely exciting for AI image generation, as it may mark the point where the king of open-source models, Stable Diffusion, gets a serious competitor. With three model sizes, improved image quality, faster inference times, and greater user control, FLUX.1 sets a new standard for customizable and efficient image generation. The substantial Series A funding round positions Black Forest Labs well to keep developing and delivering new models, such as the text-to-video model they have already announced. We cannot wait to tinker with it further, and we are quite happy with the results of the smallest model so far (see above).

Meta Unveils Segment Anything 2 for Zero-Shot Video Segmentation

Story: Meta has introduced Segment Anything 2 (SAM 2), a groundbreaking machine learning model that extends the capabilities of its predecessor, Segment Anything, to the video domain. SAM 2 enables zero-shot video segmentation, allowing users to quickly and accurately identify and outline objects in videos without the need for extensive training data or manual annotation.

Key Findings:

  • Zero-Shot Video Segmentation: SAM 2 lets users segment objects in videos by simply prompting the model with the object they want to identify, without requiring models pre-trained for specific object classes (see the sketch after this list).

  • Efficient Video Processing: Unlike running the original Segment Anything model on individual video frames, SAM 2 is designed to process video natively, resulting in a more efficient workflow.

  • Open-Source Release: Meta plans to make SAM 2 open and free to use, in line with the company's commitment to advancing the field of AI through open research and collaboration.

  • Large-Scale Training Data: SAM 2 was trained on a combination of a publicly released annotated database of 50,000 videos and an internally available dataset of over 100,000 videos, showcasing the importance of large-scale data in developing robust AI models.

  • Potential Applications: SAM 2 has the potential to shape workflows in various industries, including scientific research, content creation, and video analysis, by enabling efficient and accurate video segmentation at scale.
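
To make the promptable, zero-shot workflow more concrete, here is a rough sketch of tracking a single object through a video, loosely modelled on the example code in Meta's segment-anything-2 repository. The config name, checkpoint path, and exact function signatures are assumptions and may differ from the released code.

```python
# Rough sketch of zero-shot video object tracking with SAM 2 (assumed API).
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",                # model config (assumed filename)
    "checkpoints/sam2_hiera_large.pt",  # downloaded checkpoint (assumed path)
)

with torch.inference_mode():
    # Index all frames of the clip once so prompts can be propagated through it.
    state = predictor.init_state(video_path="clip_frames/")

    # Zero-shot prompt: a single positive click on the object in the first frame,
    # with no class labels and no task-specific fine-tuning.
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[420, 260]], dtype=np.float32),  # (x, y) click position
        labels=np.array([1], dtype=np.int32),              # 1 = positive click
    )

    # Propagate the resulting masklet through the remaining frames.
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()
        # ...overlay or save `masks` for this frame...
```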

Pixit's Two Cents: By extending the capabilities of the original Segment Anything model to the video domain, SAM 2 opens up new possibilities for efficient and flexible video analysis and content creation. We already liked the first model a lot, and the community built tons of improvements on top of it. We are really eager to see what gets built with the improved capabilities and the ability to track segments across video. The bacteria-tracking case, for example, was a very interesting real-world application.

Canva Acquires Fast-Growing AI Platform Leonardo.ai

Story: Canva, Australia's largest privately held technology company, has acquired Leonardo.ai, a rapidly growing generative AI platform, in a blockbuster deal that bolsters Canva's artificial intelligence capabilities. Founded in December 2022, Leonardo.ai has already amassed over 19 million users and developed its own foundation AI model called Phoenix.

Key Findings:

  • Rapid Growth: Despite launching only in late 2022, Leonardo.ai has experienced remarkable growth, attracting 19 million users and raising $47 million in a Series A funding round.

  • Proprietary AI Model: Leonardo.ai's development of its own foundation AI model, Phoenix, played a significant role in attracting Canva's interest and accelerating the acquisition process.

  • Strategic Fit: The acquisition aligns with Canva's vision of integrating cutting-edge AI technologies into its design platform, enhancing its offerings and user experience ahead of an anticipated IPO.

  • Competitive Landscape: Leonardo.ai is considered a competitor to other prominent generative AI startups, such as Midjourney and OpenAI's DALL-E, highlighting the growing interest and investment in this field.

  • Undisclosed Terms: While the financial terms of the deal have not been disclosed, the acquisition represents a significant milestone for both companies and the Australian startup ecosystem.

Pixit's Two Cents: Through the integration of Leonardo.ai's cutting-edge AI capabilities, including its proprietary foundation model Phoenix, Canva is positioning itself as one of the best AI-powered design tools, enhancing its value proposition for users and investors. As we use both tools, we are really eager to see what comes next in terms of integrations and new possibilities. It is impressive to see how fast Leonardo.ai was able to scale.

Small Bites, Big Stories:

  • OpenAI Begins Rolling Out Advanced Voice Mode for ChatGPT: OpenAI starts introducing an advanced voice mode for ChatGPT, allowing users to engage in more natural, conversational interactions with the AI assistant using their voice, with the feature initially available to a limited number of users.

  • Apple Releases Apple Intelligence, Its Long-Awaited AI Features: Apple officially releases Apple Intelligence, a suite of AI-powered features for its devices, including enhanced Siri capabilities, intelligent suggestions, and improved privacy controls, marking the company's entry into the competitive AI landscape.

  • NIST Introduces Tool for Assessing AI Model Risk: The National Institute of Standards and Technology (NIST) releases a new tool designed to help organizations evaluate the potential risks associated with AI models, focusing on factors such as bias, transparency, and robustness.

  • TikTok's Spending Drives Microsoft's Booming AI Business: TikTok's significant investment in Microsoft's AI technologies, particularly in the areas of content moderation and recommendation systems, has been a major contributor to the growth of Microsoft's AI business.

  • Meta Confirms Llama 4 Is Already in Training: During Meta's Q2 2024 earnings call, CEO Mark Zuckerberg reveals that the company is already training Llama 4, the next iteration of its large language model, signaling Meta's continued investment in AI research and development.

  • Google's Gemini Pro Dominates AI Benchmarks, Surpassing GPT-4o and Claude 3: Google's Gemini Pro, the latest version of its AI model, outperforms leading competitors such as OpenAI's GPT-4o and Anthropic's Claude 3 across various benchmarks, solidifying Google's position as a frontrunner in the AI race.