Made with ❤️ by Pixit

Exploring Google's Lumiere: A New Approach to AI Video Generation


Story: Google introduced Lumiere, a text-to-video diffusion model that achieves more globally coherent motion (i.e., motion that stays consistent across the whole clip) than other leading text-to-video models. The researchers also introduce a new architecture called Space-Time U-Net (STUNet), which generates the entire video in a single pass through the model (as opposed to architectures that first produce individual keyframes and then fill in the frames between them). Finally, the model supports a wide range of other content creation and video editing tasks, including image-to-video, video stylization, and Cinemagraphs.

Key Findings:

  • Lumiere's Unique Method: Lumiere adopts a different approach from traditional AI video generation. Using STUNet, it understands and manipulates both the spatial and temporal dimensions in videos, leading to more fluid and natural-looking motion.

  • Advanced Frame Production: Notably, Lumiere generates up to 80 frames per video, significantly more than, for example, the 25 frames produced by Stable Video Diffusion. This results in smoother transitions and more dynamic movements.

  • Quality of Video Output: The videos produced by Lumiere show a notable improvement in realism. When compared with platforms like Runway and Meta’s Emu, Lumiere's outputs, such as the movement of a turtle, display a higher degree of lifelikeness.

  • Comparison with Competitors: Google's Lumiere produces more natural video movements than, for example, Runway's text-to-video platform. Moreover, Lumiere supports tasks such as image-to-video, video stylization, video inpainting, and Cinemagraphs.

Pixit's Two Cents: The introduction of STUNet in Lumiere represents an important development in AI video generation, particularly in how it handles motion. The technology's ability to mimic real-life movement, like that of the turtle in the demonstration, is stunning. We are also eager to see other capabilities, such as video inpainting, come to fruition, since they enable many interesting use cases.


Nightshade: Protecting Digital Content From Being Used Without Consent


Story: Nightshade, a tool to protect your media content and copyrights, is finally available (we reported on an early version of Nightshade in one of our previous blogs). The tool turns any image into a poisoned data sample, such that models trained on these images deteriorate in performance and learn unpredictable behaviors that deviate from expected norms.


Key Findings:

  • Image Poisoning: Nightshade applies a perturbation to the original images, that is, it changes the pixel values. However, because the pixel values are changed only marginally, the poisoned images are indistinguishable from non-poisoned images (at least in theory; in practice you might see some differences).

  • Example: Human eyes might see a shaded image of a cow in a green field as largely unchanged, but an AI model might see a large leather purse lying in the grass. Trained on a sufficient number of shaded images that include a cow, a model will become increasingly convinced that cows have nice brown leathery handles and smooth side pockets with a zipper, and perhaps a lovely brand logo.

  • Generalisability: Using the example above, Nightshade poisons not only the concept of cows but also related concepts, such as cattle, bulls, or oxen.

  • Consequences: With Nightshade, 50 to 100 poisoned images are typically enough to deteriorate a model's performance and to cause it to learn unpredictable behaviours.
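To build intuition for the "marginal pixel change" idea above, here is a minimal NumPy sketch of a bounded perturbation: every pixel moves by at most a small epsilon, so the image stays visually close to the original. Note that this is purely illustrative — Nightshade's actual perturbations are not random noise but are optimized against a model's feature representation to push the image toward a different concept; the `delta` below is a hypothetical stand-in.

```python
import numpy as np

def bounded_perturbation(image: np.ndarray, delta: np.ndarray,
                         epsilon: float = 4.0) -> np.ndarray:
    """Add a perturbation to an 8-bit image while keeping every pixel
    within +/- epsilon of its original value (an L-infinity bound)."""
    clipped = np.clip(delta, -epsilon, epsilon)          # limit per-pixel change
    poisoned = np.clip(image.astype(np.float64) + clipped, 0, 255)
    return poisoned.astype(np.uint8)

# Toy example: a random "image" and an illustrative (random) perturbation.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
delta = rng.normal(0, 10, size=(8, 8, 3))

poisoned = bounded_perturbation(image, delta, epsilon=4.0)

# The per-pixel change never exceeds epsilon, so to a human viewer
# the two images look nearly identical.
max_change = np.abs(poisoned.astype(int) - image.astype(int)).max()
```

The key design point is the tight L-infinity bound: a model training on millions of pixels still "sees" the shifted statistics, while a human flipping between the two images barely notices a difference.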

Pixit's Two Cents: Nightshade is one of the first tools helping creators, artists, and the like to protect their content from being used without consent. By unintentionally including a few poisoned images in their training data, developers put their models in jeopardy. Thus, developers are pushed to license images instead. Awesome!


The Final Draft of the EU's AI Act Was Leaked


Story: We're examining the leaked final draft of the EU's AI Act, a comprehensive 900-page document that outlines future regulations for artificial intelligence. This critical legislation reflects the culmination of negotiations between the EU Commission, Council, and Parliament, and it is expected to shape the landscape of AI development and usage. Please keep in mind that this is only a final draft and is not yet in effect.


Key Findings:

  • Diverse Perspectives and Consensus: The document showcases the varying positions of key decision-makers, with columns dedicated to the EU Commission's draft text and the desires of the Parliament and Council. A significant focus has been on reaching a consensus that balances market interests with human-centric objectives.

  • Market vs. Human-Centric Approach: Initially focused on market regulation, the Parliament has pushed for a shift towards prioritizing human interests, aiming to ensure the development of trustworthy AI that protects health, safety, rights, and the environment. Nevertheless, market interests remain highly relevant.

  • Emphasis on Health and 'Green AI': Discussions have delved into the nuances of health protection, with the final version advocating for high-level health protection. The concept of 'green AI' has also been introduced, highlighting the need for environmentally conscious AI development.

  • Biometric Data Regulation: One of the most contentious issues, the use of biometric data, is addressed with a focus on privacy and data rights. The AI Act sets detailed guidelines for the application of biometric technologies, balancing innovation with ethical considerations.

  • Shift from Foundation Models to General Purpose AI: The Act transitions from regulating Foundation Models to General Purpose AI, imposing requirements for labeling AI-generated content and ensuring technical robustness and reliability.

Pixit's Two Cents: The AI Act represents a big step forward in regulating AI, with a nuanced approach that considers market dynamics, human-centric values, and ethical concerns. We think it is essential to strike a balance between giving innovation room to breathe and safeguarding public interests in this rapidly advancing field. We are also curious to see how some of the current catchwords will turn out in practice once companies across Europe have to implement them.


Small Bites, Big Stories:

  • Mixtral now integrated in Brave Browser: Brave's AI browser assistant, Brave Leo, now integrates Mixtral 8x7B, an open-source large language model (LLM) developed by Mistral AI.

  • Runway unveils new Multi Motion Brush tool: RunwayML has introduced a new Multi Motion Brush tool in its AI video generator, enhancing the ability to animate specific areas within a video. This tool allows users to define and animate up to five different regions independently.

  • OpenAI launches @mention feature: OpenAI has introduced the ability to @mention previously created GPTs, allowing users to call on multiple experts within a single chat. The feature has not yet rolled out globally.

  • Adobe introduces new AI-powered Premiere Pro features: The features are designed to remove tedious editing work, such as enhancing speech or tagging audio categories. The AI-powered features are available in the Premiere Pro beta.

  • Audi uses AI to generate rim designs: Audi internally uses its self-developed “FelGAN” to create rim designs as inspiration for its designers and developers.

Post by Pix
Jan 29, 2024 10:36:47 AM