Image by Google DeepMind
Story: Google has released its latest image generator, Imagen 3, in the US only. The tool is available through Google’s online platforms (i.e., ImageFX or Vertex AI) and is supposed to generate images with “better detail, richer lighting, and fewer distracting artifacts” compared to Google’s previous models. Imagen 3 will be available in multiple versions to create images for different use cases, from photorealistic landscapes to richly textured oil paintings. You can find the official research paper here.
Key Findings:
Better Prompt Understanding: During training, Google added richer details to the captions of each image, a common practice among leading text-to-image models.
High-Quality Images: The model is supposed to excel at generating fine details, such as the wrinkles on a person’s hand (we’ll check whether this is true!).
Improved Text Rendering: The model offers better accuracy in rendering text within images (again, we’ll check on that).
Security: The model is designed with security measures, such as declining to generate images of public figures like Taylor Swift or images of weapons. In addition, it uses SynthID (see our last blog post about SynthID) to embed a digital watermark directly into the pixels of the image, so the image can be identified as AI-generated while the watermark remains imperceptible to the human eye.
What’s Next: Future updates will include inpainting and outpainting capabilities, as well as expanding Imagen 3’s availability within the Gemini app.
Pixit’s Two Cents: We’re always excited to see new text-to-image models, and Imagen 3 is no exception. However, it's disappointing that the model is currently only available in the US. Limited availability has been a recurring issue: we couldn't fully test previous models because access was restricted, primarily to Google's Vertex AI, which wasn't very user-friendly for broader experimentation. That said, the new ImageFX application looks much more accessible and promising. We’re eager to try out Imagen 3 as soon as it becomes available in Europe and see how it stacks up against both closed and open-source models.
Story: xAI has introduced an update to its chatbot, Grok, adding the ability to generate images. The company launched two versions: Grok-2 and Grok-2 mini. These new image-generation capabilities are powered by Black Forest Labs’ FLUX.1 model, enabling users to create and publish images directly on the X social platform. Currently available in beta, this feature is exclusive to X's Premium and Premium+ users.
Key Findings:
AI Competition Heats Up: By adding image generation, Grok positions itself as a strong competitor to AI tools like OpenAI's ChatGPT and Google's Gemini (formerly Bard), pushing the boundaries of what conversational AI can offer. Unlike those competitors, however, Grok relies on another company for its image capabilities.
Integration: The FLUX.1 integration allows Grok to generate images directly from prompts on the X platform.
Restrictions: In contrast to other AI image generators, Grok’s FLUX.1 integration has few content restrictions, aligning with Elon Musk’s vision for “anti-woke” AI.
Pixit’s Two Cents: The ability to create images directly on the platform is a game-changer, seamlessly blending text and visuals in a way that enhances user interaction. It's impressive to see how quickly xAI adopted Black Forest Labs’ FLUX.1 model, demonstrating a cool and practical use case for text-to-image models in real-life applications.
Story: Sakana AI has introduced AI Scientist, a tool that claims to be able to write scientific papers on any topic for a mere $15. By leveraging advanced natural language processing and machine learning techniques, AI Scientist generates high-quality, well-structured research papers in a matter of minutes.
Key Findings:
Affordable Scientific Writing: AI Scientist offers a cost-effective solution for researchers, students, and professionals, generating scientific papers for just $15, making it accessible to a wide range of users.
Comprehensive Topic Coverage: The tool claims to be able to write papers on any scientific topic, drawing from a vast knowledge base spanning various disciplines, including biology, chemistry, physics, and more.
Customizable Output: Users can specify the desired length, format, and style of the generated paper, ensuring that the output meets their specific requirements and adheres to academic standards.
Time-Saving Solution: By automating the process of scientific writing, AI Scientist enables users to focus on research and analysis, while the tool handles the time-consuming task of drafting the paper.
Plagiarism-Free Content: AI Scientist generates original content based on the input provided by the user, ensuring that the resulting paper is plagiarism-free and ready for submission or publication.
Pixit’s Two Cents: By providing an affordable and accessible tool for generating high-quality research papers, AI Scientist has the potential to support researchers and students who may lack the time or resources to produce extensive written works. However, it is essential to consider the ethical implications of AI-generated scientific content. While AI Scientist can streamline the writing process, users must ensure that the generated papers are based on accurate and reliable data and that they undergo proper review and validation before being submitted or published. We have our doubts about how well this tool can write a paper on novel hypotheses and generate meaningful output. On paper (pun intended), it looks very promising though.
OpenAI Introduces Structured Outputs in the API for Enhanced Data Extraction: OpenAI announces the introduction of Structured Outputs in its API, which constrains model responses to a JSON Schema supplied by the developer, making it easier to extract structured data from text and to integrate AI-powered data extraction into various applications and workflows.
Sonova Launches Hearing Aid with Real-Time Translation and Transcription Powered by OpenAI's Whisper API: Sonova, a leading hearing aid manufacturer, introduces a groundbreaking hearing aid that incorporates real-time translation and transcription capabilities, leveraging OpenAI's Whisper API to enhance communication and accessibility for individuals with hearing impairments.
How People Are Actually Using AI in Their Daily Lives: MIT Technology Review explores the real-world applications of AI in people's daily lives, showcasing how the technology is being used for tasks ranging from creative pursuits to productivity enhancement and personal assistance, while also highlighting the potential risks of emotional reliance on AI companions.
AI Terminology Explained: A Guide for Humans: The Verge provides a comprehensive guide to AI terminology, explaining key concepts and jargon in an accessible manner for those new to the field or seeking to better understand the rapidly evolving AI landscape.
AuraFlow Version 0.3 Released on Hugging Face: AuraFlow Version 0.3 was released on the Hugging Face platform, introducing new features and improvements, including support for various aspect ratios and resolutions up to 1536 pixels.
Microsoft Copilot: Everything You Need to Know About Microsoft's AI Assistant: TechCrunch provides a comprehensive overview of Microsoft Copilot, the company's AI-powered assistant, detailing its features, capabilities, and potential impact on productivity and collaboration across Microsoft's suite of products and services.
ControlNeXt - A New ControlNet Approach: A team of researchers from the Chinese University of Hong Kong has proposed a lightweight controllable module for various base models (SD1.5, SDXL, SD3, SVD) and tasks (image and video generation with various conditions).