Made with ❤️ by Pixit

Apple Unveils MGIE: A Leap in AI-Powered Image Editing


Story: Apple has introduced MGIE, an open-source AI model that aims to revolutionize image editing by following natural language instructions. MGIE, short for MLLM-Guided Image Editing, integrates multimodal large language models (MLLMs) to understand user commands and execute pixel-level changes, offering capabilities from Photoshop-style alterations to comprehensive photo optimizations.

Key Findings:

  • Collaborative Innovation: Developed with the University of California, Santa Barbara, MGIE represents a significant advancement in AI research and, alongside Ferret, is one of Apple’s few contributions to open-source AI.

  • Intelligent Editing Mechanics: Utilizing MLLMs, MGIE interprets and converts user input into precise editing instructions, facilitating both global photo enhancements and detailed local adjustments.

  • Versatile Editing Features: From simple color adjustments to complex object manipulations, MGIE introduces a new level of expressive and Photoshop-style modifications, backed by its capability for global optimization and targeted local edits.

  • Accessibility and Usage: MGIE is accessible as an open-source project on GitHub, complete with a demo notebook and an online web demo, making it user-friendly for diverse editing tasks.

  • New State of the Art: MGIE not only showcases Apple's expanding expertise in AI but also sets a new standard for this kind of model. In a qualitative comparison, it shows significant improvements over similar approaches such as InstructPix2Pix.
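The two-stage flow described in the findings above can be sketched in miniature. This is a hypothetical illustration, not MGIE's actual API: the function names, the instruction-expansion table, and the channel-scaling "editor" are all stand-ins for the real MLLM and diffusion components.

```python
import numpy as np

# Illustrative sketch of MLLM-guided image editing (hypothetical names,
# not MGIE's real interface): a terse user command is first expanded
# into an explicit editing instruction, which is then applied to pixels.

def expand_instruction(user_command: str) -> str:
    """Stand-in for the MLLM stage: turn a terse command into an
    explicit, pixel-level editing instruction."""
    expansions = {
        "make it warmer": "increase the red channel by 20%",
        "brighten it": "increase all channels by 15%",
    }
    return expansions.get(user_command, user_command)

def apply_edit(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for the diffusion-based editor: apply a simple
    global adjustment based on the expanded instruction."""
    edited = image.astype(np.float32)
    if "red channel" in instruction:
        edited[..., 0] *= 1.20
    elif "all channels" in instruction:
        edited *= 1.15
    return np.clip(edited, 0, 255).astype(np.uint8)

image = np.full((4, 4, 3), 100, dtype=np.uint8)  # tiny gray "photo"
edited = apply_edit(image, expand_instruction("make it warmer"))
print(edited[0, 0])  # red channel raised, green and blue untouched
```

The point of the split is that the language model, not the user, produces the precise instruction, which is what lets MGIE support both global optimizations and targeted local edits from natural language.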

Pixit‘s Two Cents: MGIE's launch underscores the growing potential of AI to simplify and enhance creative processes. Its instruction-based approach to image editing paves the way for more intuitive, efficient, and flexible image manipulation tools. The qualitative examples show significant improvements over previous methods and models. Of course, we at Pixit had to try the Hugging Face web demo and see for ourselves how good it is. In our tests we still saw some shortcomings, such as misinterpreted instructions and overly aggressive adjustments, but also some very impressive results. We’re excited to see where this is heading!


Google Transforms Bard into Gemini: Finally Introducing Ultra 1.0


Story: Google has rebranded Bard as Gemini, launching a mobile app and Gemini Advanced (think ChatGPT Plus) featuring Ultra 1.0, its most advanced AI model yet. Gemini Advanced with Ultra 1.0 has been ranked as the most preferred chatbot in blind evaluations and is designed to handle complex tasks with improved reasoning and creativity. The introduction of Gemini with the Ultra 1.0 model represents a leap in Google's AI capabilities, offering enhanced conversational interactions and a more intuitive understanding of user prompts. Imagen 2 will also benefit from this improved understanding.


Key Findings:

  • Gemini's Advanced Capabilities: Gemini Advanced provides access to Ultra 1.0, enabling complex tasks like coding and creative project collaboration with greater efficiency and understanding. It’s the first real contender against GPT-4.

  • Mobile Accessibility: Gemini's new mobile app facilitates on-the-go interaction, making AI assistance more accessible and versatile for everyday tasks.

  • Google One AI Premium Plan: Gemini Advanced is part of Google's new subscription plan, comparable to OpenAI’s ChatGPT Plus. It gives users more advanced AI features along with benefits like 2 TB of storage. Furthermore, subscribers will soon be able to use Gemini’s features in Gmail and Google Docs (hello, Microsoft Copilot).

  • Enhanced Safety Measures: Google emphasizes safety and bias mitigation in Gemini Advanced, incorporating extensive checks and feedback-based refinements.

  • Improved Image Generation Capabilities: Alongside the enhanced model, Gemini users can also use Imagen 2 to create unlimited images, benefiting from the even better text understanding. This again makes Google’s offering similar to the DALL-E 3 integration in ChatGPT.

Pixit‘s Two Cents: The evolution from Bard to Gemini marks a big step for Google. Like other companies, they are rebranding their suite of AI offerings. The introduction of the Ultra 1.0 model, their most capable one, is going to keep the public busy as people try to figure out how good it really is. It’s also interesting to see how Google is now trying to compete with a paid offering similar to ChatGPT Plus. Coming from the image side of things, we are also very happy to see that Imagen 2 will benefit from the improved text understanding as well, which makes it another very capable model that might see more and more use.


Google's MobileDiffusion: Pioneering Image Magic on Mobile Devices


Story: Google's latest innovation, MobileDiffusion, introduces a groundbreaking method for subsecond text-to-image generation on mobile devices. This efficient latent diffusion model, combined with DiffusionGAN for one-step sampling, enables the creation of high-quality images rapidly. Designed with mobile use in mind, MobileDiffusion stands out for its modest model size and remarkable performance on premium smartphones, offering a significant advancement in mobile-based AI applications.


Key Findings:

  • Efficient On-Device Generation: MobileDiffusion enables the fast generation of 512x512 images in half a second, showcasing its efficiency.

  • Innovative Model Design: It incorporates a unique blend of a latent diffusion model and GAN technology for enhanced performance.

  • Mobile-Friendly Architecture: Tailored for mobile use, its architecture allows for rapid deployment and execution on standard smartphones, which sets it apart from competitors like Stable Diffusion. With a model size of “only” 520 million parameters, it is quite small in comparison.

  • Enhanced Inference Efficiency: Achieves a notable reduction in the number of necessary sampling steps, optimizing the text-to-image conversion process. Most other methods need more than 10 steps to achieve good results.

  • Practical Applications: Offers a wide range of applications, from enhancing mobile user experiences to supporting creative endeavors on-the-go.
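Why the sampling-step reduction in the findings above matters can be shown with a toy sampler. This is an illustrative sketch, not Google's MobileDiffusion code: the "denoiser" is a placeholder, and the point is only that per-image cost scales with the number of network evaluations, so going from ~50 steps to one-step DiffusionGAN-style sampling cuts the dominant cost accordingly.

```python
import numpy as np

# Toy diffusion sampling loop (hypothetical, for illustration only).
# A real sampler calls an expensive denoising network once per step,
# so the step count dominates per-image latency on a phone.

def denoise_step(x: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for one call to the denoising network: nudge the
    noisy latent toward a cleaner estimate."""
    return x * (1.0 - 0.5 * t)

def sample(latent: np.ndarray, num_steps: int) -> tuple[np.ndarray, int]:
    """Run the sampling loop; return the result and the number of
    network evaluations (the dominant cost)."""
    network_calls = 0
    for step in range(num_steps, 0, -1):
        latent = denoise_step(latent, step / num_steps)
        network_calls += 1
    return latent, network_calls

rng = np.random.default_rng(0)
noisy = rng.standard_normal((64, 64, 4))  # toy latent, not a real 512x512 image

_, calls_many = sample(noisy, num_steps=50)  # typical multi-step sampler
_, calls_one = sample(noisy, num_steps=1)    # one-step, DiffusionGAN-style
print(calls_many, calls_one)
```

Collapsing fifty network calls into one is what makes the reported subsecond 512x512 generation plausible on premium smartphones.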

Pixit‘s Two Cents: MobileDiffusion's unveiling by Google is another big step in diffusion-model optimization. Even though there are other approaches like SDXL Turbo or PixArt-δ that offer very compelling results, none of them runs this fast on mobile devices. This innovation makes entirely new applications possible and makes the creative process of text-to-image models accessible to an even larger range of people. We cannot wait to see it live and in action. It also looks like a fun thing to try out!


Small Bites, Big Stories:

Tags:
Pix
Post by Pix
Feb 12, 2024 9:28:43 AM