Story: Stability AI announced Stable Diffusion 3 in early preview. The new text-to-image model is said to improve performance on multi-subject prompts, image quality, and spelling abilities. The company says that the suite of models will range from 800M to 8B parameters, without specifying how many models there will be in total.
Key Findings:
Technology: The technical report will be published soon, but it appears that the text-to-image model combines Diffusion Transformers (DiTs) with flow matching. In a Diffusion Transformer, the U-Net backbone is replaced by a transformer architecture (Vision Transformer, ViT): images are converted into patches, and each patch is processed by transformer blocks. Sora uses a similar technique.
Open Source: The model weights will be made available, as with previous models, allowing users to run and fine-tune the models locally.
Variety of models: Multiple model sizes will be available at the official release, allowing the model to run locally on a variety of devices, from smartphones to servers.
Waitlist: You can sign up to join the waitlist here.
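To make the DiT idea from the Technology finding concrete: the first step is "patchifying" an image into a sequence of tokens that the transformer blocks then process. Below is a minimal NumPy sketch of that step only (the sizes are illustrative; SD3's actual configuration has not been published):

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns shape (num_patches, patch_size * patch_size * C): the token
    sequence a Diffusion Transformer would embed and feed through its
    transformer blocks in place of a U-Net.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Reshape into a grid of patches, then flatten each patch into a vector.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * c)

# Example: a 64x64 3-channel latent split into 8x8 patches
# yields 64 tokens of dimension 192.
tokens = patchify(np.zeros((64, 64, 3)), patch_size=8)
```

Each token is then linearly embedded and processed by standard transformer blocks, which is what lets DiTs scale with the same recipe as language models.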
Story: Google recently faced a challenging situation with its text-to-image model (Imagen 2), which mistakenly incorporated diversity elements into images in an inappropriate manner. This incident has brought to light the complexities and potential pitfalls in the development and deployment of AI technologies. As a consequence, Google has paused Gemini’s ability to generate images of people.
Key Findings:
Inaccurate Historical Depictions: The AI model generated images with incongruous representations of historical figures (such as multi-cultural Founding Fathers).
Impact on AI Industry: Such incidents can have broader implications for the AI industry, emphasizing the need for robust testing and sensitivity in AI systems.
Re-release: Google aims to re-release an improved version of the model soon, focusing on more accurate and sensitive AI outputs.
Pixit’s Two Cents: The recent incident with Google's AI highlights the critical importance of responsible AI development, especially in areas like diversity representation. It serves as a reminder for companies like Pixit, who specialize in AI and image generation, to prioritize accuracy and sensitivity in AI implementations.
Story: Google introduces Gemma, a new family of state-of-the-art open models. Built on the foundation laid by Gemini, Gemma models are designed to empower developers and researchers to create AI applications responsibly. With a focus on accessibility, these models come in two sizes, Gemma 2B and Gemma 7B, and are complemented by tools to encourage innovation, collaboration, and responsible usage.
Key Findings:
Broad Accessibility and Support: Gemma models are available worldwide, offering pre-trained and instruction-tuned variants, with toolchains for JAX, PyTorch, and TensorFlow. Integration with popular tools and platforms makes it easy to start developing with Gemma.
Responsible AI Development: A new Responsible Generative AI Toolkit accompanies Gemma, guiding the creation of safer AI applications. Google emphasizes safety and responsible AI with extensive evaluations and fine-tuning processes.
Optimized Performance: Gemma models deliver best-in-class performance for their size, capable of running on various devices and optimized for multiple AI hardware platforms, ensuring efficient and accessible AI development.
Commercial Use and Research Support: The terms of use allow for commercial applications, and Google offers free credits for researchers and developers, highlighting its support for the broader AI community.
Pixit’s Two Cents: At Pixit, we are glad to hear that Google has finally joined many other big tech companies in open sourcing (some of) their models. The new Gemma models also give some very interesting insights into how their bigger sibling Gemini was built. We are very interested to see what the open-source community is going to unveil in the weeks to come. Thanks, Google, and keep on open sourcing.
Nvidia’s Chat with RTX is a promising AI chatbot that runs locally on your PC: Nvidia makes it easy to run a large language model on your own Windows PC.
Microsoft’s AI growth is helping its cloud business weaken Amazon’s lead: Microsoft’s cloud has been growing significantly faster than Amazon Web Services of late, thanks in part to its cozy OpenAI relationship (five years ago Azure was half the size of AWS; now it is three-quarters its size).
AI is detecting cancer that doctors do not see (German only): Computer vision is helping to detect cancer and to defuse bombs.
How AI is changing gymnastics judging: An AI-powered Judging Support System is helping judges review routines of gymnastics in case of an inquiry.
V-JEPA: The next step toward Yann LeCun’s vision of advanced machine intelligence (AMI): Meta released a new non-generative model that learns by predicting missing or masked parts of a video in an abstract representation space. The model is based on a self-supervised learning method, allowing it to learn efficiently from unlabeled data.
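To make "predicting in an abstract representation space" concrete, here is a toy NumPy sketch of the objective's shape: some token embeddings are masked, a predictor guesses their embeddings (not their pixels), and a regression loss compares prediction to target. The predictor below is a trivial placeholder, not Meta's architecture, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a "video" as a sequence of 16 patch embeddings of dim 32.
tokens = rng.normal(size=(16, 32))

# Mask a subset of tokens; the model must predict their *embeddings*,
# not reconstruct raw pixels -- the core idea of JEPA-style objectives.
mask = np.zeros(16, dtype=bool)
mask[rng.choice(16, size=4, replace=False)] = True

# Placeholder predictor: the mean of the visible tokens, broadcast to
# every masked position (a real model uses a learned transformer here).
prediction = np.tile(tokens[~mask].mean(axis=0), (int(mask.sum()), 1))

# Regression loss computed in representation space.
loss = np.abs(prediction - tokens[mask]).mean()
```

Because the loss lives in embedding space, the model can ignore unpredictable pixel-level detail, which is what makes this approach efficient on unlabeled video.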
AI Companies Form Accord on Election Deepfakes: Leading AI companies, including Google, Microsoft, Meta, OpenAI, TikTok, and Adobe, have agreed to an accord aimed at identifying, labeling, and controlling AI-generated content that could mislead voters. However, this accord stops short of outright banning such content.
ProSiebenSat.1 and RTL Cooperate on AI-based Advertising (German only): Germany's largest TV rivals, ProSiebenSat.1 and RTL, are joining forces against Amazon, Google, and Meta, opting out of American advertising technology in favor of their collaboration. This move aims to leverage AI for more targeted viewer engagement and to compete more effectively with the advantages of U.S. tech giants.