
StabilityAI Releases Stable Diffusion 3 Medium


Story: StabilityAI released Stable Diffusion 3 (SD3) Medium, a Multimodal Diffusion Transformer (MMDiT) text-to-image model that shows notable advancements in image quality, typography, complex prompt comprehension, and resource efficiency. The model utilizes three text encoders (i.e., OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL), employs a 16-channel VAE, which allows it to capture finer details, and features 2 billion parameters (for comparison: SDXL has 6.6B and SD1.5 has 1B parameters). SD3 was pre-trained on 1 billion synthetic and filtered publicly available images.

Key Findings:

  • Technical Details: The backbone of SD3 consists of Diffusion Transformers (see here for an explanation), and the three text encoders produce two different text representations (vector conditioning: 77x2049 and context representation: 154x4096) that are used to improve prompt comprehension and typography (a short usage sketch follows this list).

  • License: You can use the model for free under the open non-commercial license or pay $20 for the Creator License.

  • Training: After pre-training, the fine-tuning data includes 30M high-quality aesthetic images plus 3M preference data images.

  • Overall Quality, Prompt Understanding, and Typography: SD3 excels in delivering photorealistic images with exceptional detail, color, and lighting (e.g., realism in hands and faces), while also mastering complex prompt comprehension (e.g., spatial reasoning) and high-quality typography (e.g., fewer spelling errors).
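
If you want to try the model right away, here is a minimal sketch of generating an image with SD3 Medium. It assumes you use the Hugging Face diffusers library (version 0.29 or later, which adds StableDiffusion3Pipeline), have accepted the gated model license on the Hub, and have a CUDA GPU; the model ID and sampling parameters below are illustrative defaults, not official recommendations.

```python
# Minimal sketch: text-to-image with SD3 Medium via Hugging Face diffusers.
# Assumes diffusers >= 0.29 (StableDiffusion3Pipeline), a CUDA GPU, and that
# the gated checkpoint license has been accepted on the Hugging Face Hub.
import torch
from diffusers import StableDiffusion3Pipeline

# The checkpoint bundles the MMDiT backbone, the three text encoders
# (OpenCLIP-ViT/G, CLIP-ViT/L, T5-XXL), and the 16-channel VAE mentioned above.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

# Typography is one of SD3's headline improvements, so a text-in-image prompt
# is a reasonable smoke test; steps and guidance values are just common picks.
image = pipe(
    prompt='A storefront photo with a hand-painted sign that reads "Pixit"',
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_sample.png")
```

On GPUs with limited VRAM, the diffusers documentation describes dropping the T5-XXL encoder (passing text_encoder_3=None and tokenizer_3=None) to reduce memory at some cost to prompt adherence.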

Pixit’s Two Cents: Finally! We waited over half a year to get access to the newest SD model. The first results are promising, and we are excited to build our applications on top of SD3.

Apple Introduced Apple Intelligence at WWDC 2024


Story: At the Worldwide Developers Conference (WWDC) 2024, Apple introduced Apple Intelligence, a personal intelligence system that puts generative AI at the core of your iPhone, iPad, and Mac. The tool draws on your personal context, while protecting your privacy, and enables AI applications in language, text, and image, all designed to enhance user experience and productivity. You can find the full video here.

Key Findings:

  • Language and Text Processing: Apple introduced system-wide writing tools, including priority notifications, rewriting, proofreading, and revising capabilities, not only in Apple apps (e.g., Mail) but also across third-party apps. Users will be able to rewrite texts in professional, friendly, or concise tones and summarize emails with ease.

  • Image Generation and Personalization: Users will be able to personalize their images with photos and emojis, create sketches, illustrations, and animations, and even generate “Genmojis”, that is, create their own emojis using generative AI. Additionally, users can search photos via Siri (e.g., “give me all the photos of me wearing Pixit merch”) and create memories using Apple Intelligence.

  • Action Features: Apple’s AI can perform tasks like transcribing audio files, processing data across various apps, and understanding photos. For example, users can ask Siri to check whether a meeting can be moved to another date, considering travel time, other meetings during that time, and more - crazy! It can also pull information from a photo of your passport and use it to automatically fill in a form.

  • On-Device and Server-Based Processing: Apple prioritizes on-device processing to ensure privacy, resorting to server-based processing only when necessary. For the latter, it uses “Private Cloud Compute” to keep your data protected.

  • Research Paper: Few know it, but Apple released a paper on the rebirth of Siri that discloses a lot of information about a system called Ferret-UI (a multimodal vision-language model) that understands icons, widgets, and text on iOS mobile screens. You can access the paper here.

Pixit’s Two Cents: Apple’s Apple Intelligence system sounds absolutely amazing. We’re super excited to see generative AI in use, and if even half of the applications they introduced at WWDC work as shown, it will be a lot of fun! Finally, we can experiment with the AI capabilities we have been waiting for.


Meta Pauses AI Training with European User Data Amid Regulatory Pressure


Story: Meta has halted its plans to use data from European users to train its AI models, citing increased regulatory scrutiny and data privacy concerns. The decision follows mounting pressure from European regulators, who have been tightening data protection laws and enforcement measures.

Key Findings:

  • Regulatory Compliance: Meta's decision reflects its commitment to comply with stringent European data privacy regulations, including the General Data Protection Regulation (GDPR).

  • Data Privacy Concerns: The Irish Data Protection Commission (DPC), acting on behalf of several data protection authorities in the EU, pressed Meta over growing concerns about user data protection and the ethical implications of using personal data for AI training.

  • Impact on AI Development: This pause may slow down Meta's AI development efforts as it seeks alternative data sources and adjusts its strategies to align with regulatory requirements, especially since Meta argued that it needed data from European users to reflect “the diverse languages, geography and cultural references of the people in Europe”.

Pixit’s Two Cents: This development underscores the critical importance of data privacy in AI training: companies must navigate complex regulatory landscapes to innovate responsibly. Meta’s decision highlights the broader industry challenge of balancing innovation with regulatory compliance and data privacy.


Small Bites, Big Stories:

Post by Pix
Jun 17, 2024 10:16:13 AM