Made with ❤️ by Pixit

Great Advances in AI Video Generation: Runway's Gen-2 Update


Story: Runway, a leading New York City-based generative AI video startup, has made headlines with its Gen-2 update. This significant upgrade to the company's text/image/video-to-video model has captivated users and industry experts alike. It increases the resolution of generated videos and improves their realism, sharpness, and consistency, resulting in more detailed output and smoother motion, alongside other new features.

Key Findings:

  • The upgrade significantly improves video resolution and enhances the realism of the generated videos.
  • The new model also delivers smoother, more natural motion and lifelike clarity. It maintains continuity and coherence across frames, reducing visual glitches and distortions.
  • Gen-2 can now create videos up to 18 seconds long.
  • The new “Director Mode” lets users choose the direction and speed of the AI “camera” movement in the generated videos.

Pixit's Two Cents: The advancements Runway's Gen-2 model brings to AI video generation are not just technically impressive; they also shift the boundaries of creative possibility. The improved resolution and realism, the newly achieved consistency, and new ways to steer the output, such as “Director Mode”, bring the model closer to the feature set needed for productive use cases.


A Deep Dive into DALL-E 3's Multilingual Capabilities


Story: Yennie Jun delves into the intricacies of AI image generation, focusing on how OpenAI’s DALL-E 3 handles prompts in various languages. The study reveals fascinating insights into the model's prompt transformation capabilities and the inherent biases that accompany them.

Key Findings: 

  • Prompt Transformations in Multiple Languages: Jun discovered that irrespective of the original prompt's language, DALL-E 3 always transformed it into English. This step could potentially alter the original meaning and nuances of the prompt.
  • Gender and Age Biases: The study showed that DALL-E 3 tended to generate gendered prompts even from neutral ones. Furthermore, it highlighted a tendency to describe women as younger, in contrast to a broader age range for men.
  • Language-Specific Prompt Transformations: The language of the original prompt appeared to influence the transformed prompt. For instance, prompts in Burmese often led to the generation of images featuring Burmese individuals.

Pixit's Two Cents: This research underscores the complexity and the nuanced challenges of AI image generation, especially when dealing with multilingual inputs. The findings about automatic prompt transformations and the biases they reveal are significant: they demonstrate both the technological limitations and the ethical considerations in AI development. In light of findings like these, we recognize the importance of improving the inclusivity and accuracy of AI models.


OpenAI Unveils a Wave of Innovations at Its First Developer Event


Story: OpenAI's first developer event on November 6, 2023 was packed with significant announcements. The event showcased a series of advancements and launches, including new models and APIs that mark a huge leap in AI capabilities.

Key Findings: 

  • GPT-4 Turbo: OpenAI unveiled GPT-4 Turbo, an improved version of the GPT-4 model. This model comes in two variants: one for text analysis and another for both text and image understanding. Notably, GPT-4 Turbo has a significantly larger context window (128k tokens) than GPT-4 and a more recent knowledge cut-off date (April 2023).
  • Customizable GPTs and GPT Store: Users can now create their own GPT versions for various applications. OpenAI also plans to launch a GPT Store for user-created AI bots, which will initially feature creations from verified builders.
  • New Assistants API: This newly launched API allows developers to build their own agent-like experiences, ranging from coding assistants to AI-powered holiday planners.
  • DALL-E 3 API: The DALL-E 3 text-to-image model is now accessible via an API, complete with built-in moderation tools.
  • Copyright Shield Program: A new initiative to protect businesses using OpenAI’s products from copyright claims was announced. This program covers legal fees for customers facing IP lawsuits related to content created by OpenAI’s tools.

Pixit's Two Cents: The introduction of GPT-4 Turbo and the DALL-E 3 API signifies a major leap in AI capabilities, offering greater depth and versatility in AI-generated content. The ability for users to create and publish their own GPT versions fosters a community-driven approach to AI development. At Pixit, we have already started experimenting with our own “Pixit GPT” for various tasks. Moreover, OpenAI's focus on user protection through the Copyright Shield program reflects a larger trend of big tech companies using this approach to reassure customers that their models are safe to use.


Small Bites, Big Stories:

  • Midjourney, Stability AI, and DeviantArt's Copyright Battle: In a significant legal development, a U.S. judge dismissed copyright infringement claims against Midjourney, Stability AI, and DeviantArt. The claims rested on the argument that the companies' models are by and large trained on human artists’ work.
  • Scarlett Johansson's Legal Action Against AI Ad: Scarlett Johansson has taken legal action against an AI application that used her likeness in an online advertisement without her consent. This raises significant legal and ethical questions regarding the use of AI-generated images.
  • AI Image Detection Tools’ Uncertainty Leads to Another Level of Misinformation: “AI or Not”, a tool that determines whether an image has been generated by AI, labelled a photo from the Israel-Palestine conflict as AI-generated, sparking a debate about the accuracy of such tools. The debate highlights the challenges of AI image detection.
  • Stability AI’s Leap into 3D: Stability AI announced a preview of “Stable 3D”, a tool designed to create 3D content for graphic designers, digital artists, and game developers. With Stable 3D, users can generate 3D models in minutes from images, illustrations, or textual instructions.

Tags:
Pix
Post by Pix
Nov 13, 2023 11:49:04 AM