Pixit Pulse: The Weekly Generative AI Wave

AI News - Week #51

Written by Pix | Dec 18, 2023 10:14:31 AM

EU Publishes World's First Comprehensive AI Legislation

Story: The European Union has finalized the AI Act, positioning itself as a global leader in AI regulation. After intense negotiations and debates spanning over two and a half years, the EU has formulated the world’s first comprehensive AI law. The act aims to mitigate risks posed by AI in critical areas such as healthcare, education, and public services, and introduces stringent rules for high-risk AI systems.

Key Findings:

  • Regulation: Developers must disclose training and testing methodologies, adhere to copyright law, and implement digital watermarks (e.g., see our last blog post about SynthID for watermarking AI-generated content).

  • Restricted AI Applications: The AI Act bans certain AI applications, including untargeted facial recognition scraping and the use of emotion recognition systems in educational institutions and workplaces.

  • Enhanced Transparency: The AI Act aims to empower authors, musicians, and other creators by requiring more transparency, enabling creatives to understand whether their works have been used.

  • AI Authority: A dedicated AI authority will be set up within the Commission to enforce the regulation of foundational models, with national authorities monitoring AI systems.

Pixit’s Two Cents: At Pixit, we commend the EU’s AI Act for addressing key aspects of developing foundational models (check out this Cheat Sheet), especially the transparency requirements around the use of copyrighted data. This regulation marks a significant step towards ethical and responsible AI development. However, we remain cautiously optimistic, hoping that these rules will not hamper the progress of foundational models. Further, it’s crucial that they don’t disproportionately affect small startups like ours, which rely on leveraging foundational models.

From Generic to Domain-Specific Large Vision Models (LVMs)

Story: The evolution of Large Language Models (LLMs) has significantly changed the way we process text. A parallel transformation is now occurring in image processing with the introduction of Large Vision Models (LVMs). Unlike LLMs, which benefit from the close similarity between internet text and proprietary documents, LVMs face a unique challenge: internet images don't always align with the specialized imagery of various industries. This mismatch is where domain-specific LVMs shine, as highlighted by a recent announcement from Andrew Ng.

Key Findings:

  • Internet Images vs. Industry-Specific Needs: Generic LVMs, typically trained on diverse internet images (e.g., dogs, humans, food), struggle in industrial contexts where images differ significantly from standard internet images.

  • Domain-Specific LVM Advantages: Experiments by Landing AI demonstrate that LVMs tailored to specific domains, like pathology or semiconductor wafer inspection, are more efficient in identifying relevant features in those unique images.

  • Need for Labeled Data: Building a domain-specific LVM requires around 100,000 unlabeled images from the target domain (the more the better). When a pretrained LVM is then used in a supervised learning setting, the domain-specific model needs considerably less labeled data (about 10% to 30%) than a generic LVM.
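Much of that labeled-data saving comes from fitting only a small classification head on top of frozen, domain-pretrained features. Here is a deliberately tiny, self-contained sketch of that recipe in plain Python; the backbone, class names, and data are all invented for illustration and are not Landing AI's actual method:

```python
# Toy sketch of "pretrain on unlabeled domain images, then fine-tune
# with few labels". A real LVM backbone is replaced by a fixed feature
# function; the point is that with domain-matched features, a tiny
# labeled set suffices to fit the classifier on top.

def backbone(image):
    """Stand-in for a frozen, domain-pretrained feature extractor.
    An 'image' here is just a list of pixel intensities; the feature
    is its (mean, spread), which happens to separate our toy classes."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return (mean, spread)

def fit_centroids(labeled):
    """Supervised step: average the frozen features per class.
    This is the only part that consumes labeled data."""
    sums, counts = {}, {}
    for image, label in labeled:
        f = backbone(image)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += f[0]
        s[1] += f[1]
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl])
            for lbl, s in sums.items()}

def predict(centroids, image):
    """Assign the class whose feature centroid is closest."""
    f = backbone(image)
    return min(centroids,
               key=lambda lbl: (f[0] - centroids[lbl][0]) ** 2
                             + (f[1] - centroids[lbl][1]) ** 2)

# Only four labeled images -- the head is tiny, so few labels go far.
labeled = [
    ([0.1, 0.2, 0.1], "defect_free"),
    ([0.2, 0.1, 0.2], "defect_free"),
    ([0.1, 0.9, 0.1], "scratch"),
    ([0.2, 0.8, 0.3], "scratch"),
]
centroids = fit_centroids(labeled)
print(predict(centroids, [0.15, 0.85, 0.2]))  # -> scratch
```

Because the backbone already encodes the domain (learned from the ~100,000 unlabeled images), the supervised step reduces to averaging a handful of feature vectors per class, which is why the labeled-data requirement drops so sharply.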

Pixit’s Two Cents: We have seen a similar shift from generic to domain-specific models in the era of Convolutional Neural Networks (CNNs) - so we are not surprised to see domain-specific LVMs outperform generic ones. Again, this is a real game-changer for businesses with unique, proprietary image datasets (e.g., see our last blog post about Bosch leveraging their domain-specific datasets). At Pixit, we are closely following these developments, recognizing their potential to redefine the way we make professional headshots.

Google DeepMind Unveils Imagen-2 - Its Most Advanced T2I Model

Story: Google DeepMind just announced Imagen-2, the second iteration of its Text-to-Image (T2I) model Imagen. It promises many advancements in terms of quality, flexibility, and responsibility. Imagen-2 can already be used via Google Cloud’s Vertex AI.

Key Findings:

  • Improved Image-Caption Understanding: DeepMind trained Imagen-2 on more, higher-quality image-caption pairs, enabling it to better understand a broad range of user prompts.

  • More Realistic Image Generation: The researchers improved on many of the aspects other models still struggle with: the new model renders more realistic faces, lighting, and hands, and can even produce legible text within images.

  • Fluid Style Conditioning: Imagen-2 lets users provide a reference image alongside the text prompt, guiding the final output to adopt the style of the reference image.

  • Advanced Inpainting and Outpainting: By providing an image mask, users can guide the model to extend an original image (outpainting) or generate new content directly within it (inpainting). This feature will be available in 2024.

  • Watermarking via SynthID: DeepMind incorporated its SynthID technology (covered in our newsletter a few weeks ago) to embed invisible watermarks in the generated images.
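SynthID's actual method is proprietary - the watermark is embedded during generation and designed to survive edits - but the core idea of an invisible watermark can be conveyed with a deliberately simple least-significant-bit sketch on 8-bit grayscale pixels. This is a toy stand-in, not Google's technique:

```python
# Toy invisible watermark: hide a short ID in the least significant
# bits of the first few pixels. Changing an 8-bit pixel's LSB shifts
# its value by at most 1, which is imperceptible to a viewer.
# WATERMARK_ID is a hypothetical 4-bit model identifier.

WATERMARK_ID = 0b1011

def embed(pixels, wm_id=WATERMARK_ID, bits=4):
    """Write wm_id, most significant bit first, into the LSBs of the
    first `bits` pixels. Each pixel changes by at most 1."""
    out = list(pixels)
    for i in range(bits):
        bit = (wm_id >> (bits - 1 - i)) & 1
        out[i] = (out[i] & ~1) | bit
    return out

def extract(pixels, bits=4):
    """Read the watermark back out of the LSBs."""
    wm = 0
    for i in range(bits):
        wm = (wm << 1) | (pixels[i] & 1)
    return wm

image = [200, 201, 90, 17, 55]          # toy "generated image"
marked = embed(image)
print(extract(marked) == WATERMARK_ID)   # True: watermark detected
print(max(abs(a - b) for a, b in zip(image, marked)))  # at most 1
```

A production scheme like SynthID spreads the signal across the whole image so it survives cropping, compression, and filtering; this toy version breaks under almost any edit, which is exactly the robustness gap such systems are built to close.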

Pixit’s Two Cents: At Pixit, we are always excited to see new models and advancements in the world of diffusion models. On the one hand, it is encouraging to see big tech tackle issues that still persist (e.g., correct hands or legible text in images); on the other, open-source models will need to catch up on these fronts to stay competitive.

Small Bites, Big Stories:

  • Meta has launched “Imagine with Meta”: It’s a standalone web-based generative AI tool that enables users to create high-resolution images from text prompts, similar to OpenAI's DALL-E, Midjourney, and Stable Diffusion, and powered by Meta's Emu image generation model.

  • Leonardo.Ai has raised $31 million USD: It offers unique features for creative industries and enterprise users, with 7 million users generating over 700 million images. The platform stands out for its customization capabilities and real-time image generation using both text and sketch prompts.

  • Runway ML develops general world models: Their new long-term project, "General World Models" (GWM), aims to develop AI systems capable of simulating and understanding complex real-world environments and interactions, going beyond controlled settings like video games or narrow applications like driving.

  • Sports Illustrated Publisher AI Controversy: The misuse of AI in content creation (potentially) led to two C-level changes at Sports Illustrated’s parent company.

  • VideoBooth - Video Generation with Image Prompts: Researchers from the S-Lab of Nanyang Technological University and the Shanghai AI Laboratory developed a new framework called VideoBooth that generates videos given an image as a prompt.