StabilityAI Releases Stable Diffusion 3 Medium
Story: StabilityAI released Stable Diffusion 3 (SD3) Medium, a Multimodal Diffusion Transformer (MMDiT) text-to-image model that shows marked improvements in image quality, typography, complex prompt comprehension, and resource efficiency. The model uses three text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL), employs a 16-channel VAE, which allows it to capture more detail, and has 2 billion parameters (for comparison, SDXL: 6.6B and SD1.5: 1B parameters). SD3 was pre-trained on 1 billion images of synthetic and filtered publicly available data.
Key Findings:
- Technical Details: The backbone of SD3 consists of Diffusion Transformers (see here for an explanation), and the three text encoders produce two different text representations (vector conditioning: 77x2048 and context representation: 154x4096) that are used to improve image captioning and typography.
- License: You can use the model for free under the open non-commercial license or pay $20 for the creator license.
- Training: After pre-training, the model was fine-tuned on 30M high-quality aesthetic images plus 3M preference-data images.
- Overall Quality, Prompt Understanding, and Typography: SD3 excels in delivering photorealistic images with exceptional detail, color, and lighting (e.g., realism in hands and faces), while also mastering complex prompt comprehension (e.g., spatial reasoning) and high-quality typography (e.g. fewer errors in spelling).
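For readers who want to see where the text-encoder shapes under Technical Details come from, here is a minimal sketch of the shape bookkeeping. It assumes 77 tokens per encoder and hidden widths of 768 (CLIP-ViT/L), 1280 (OpenCLIP-ViT/G), and 4096 (T5-XXL); these widths and the helper names are our assumptions for illustration, not values from StabilityAI's announcement. No model is loaded and only the shapes are tracked.

```python
# Shape bookkeeping only: no model weights are loaded.
# Assumed values (not stated in the newsletter): token count and hidden widths.
TOKENS = 77
CLIP_L_WIDTH = 768
OPENCLIP_G_WIDTH = 1280
T5_XXL_WIDTH = 4096


def concat_channels(a, b):
    """Join two (tokens, width) shapes along the channel axis."""
    assert a[0] == b[0], "token counts must match"
    return (a[0], a[1] + b[1])


def pad_channels(shape, width):
    """Zero-pad a (tokens, width) shape up to a wider channel size."""
    assert width >= shape[1], "cannot pad to a smaller width"
    return (shape[0], width)


def concat_sequence(a, b):
    """Join two (tokens, width) shapes along the token axis."""
    assert a[1] == b[1], "channel widths must match"
    return (a[0] + b[0], a[1])


def sd3_text_shapes():
    """Return (clip_shape, context_shape) for the assumed encoder widths."""
    clip = concat_channels((TOKENS, CLIP_L_WIDTH), (TOKENS, OPENCLIP_G_WIDTH))
    t5 = (TOKENS, T5_XXL_WIDTH)
    context = concat_sequence(pad_channels(clip, T5_XXL_WIDTH), t5)
    return clip, context


clip_shape, context_shape = sd3_text_shapes()
print(clip_shape, context_shape)  # (77, 2048) (154, 4096)
```

Under these assumptions, the two CLIP encoders together give 77x2048. Padding that to the T5 width and stacking it on top of the T5 tokens gives the 154x4096 context representation.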
Pixit’s Two Cents: Finally! We had been waiting for roughly half a year to get access to the newest SD model. The first results are promising, and we are excited to build our applications on top of SD3.
Apple Introduced Apple Intelligence at WWDC 2024
Story: At the Worldwide Developers Conference (WWDC) 2024, Apple introduced Apple Intelligence, a personal intelligence system that puts generative AI at the core of your iPhone, iPad, and Mac. The tool draws on your personal context, while protecting your privacy, and enables AI applications in language, text, and image, all designed to enhance user experience and productivity. You can find the full video here.
Key Findings:
- Language and Text Processing: Apple introduced system-wide writing tools, including priority notifications, rewriting, proofreading, and revising capabilities, not only in Apple apps (e.g., Mail) but also across third-party apps. Users will be able to rewrite texts in professional, friendly, or concise tones and summarize emails with ease.
- Image Generation and Personalization: Users will be able to personalize their images with photos and emojis, create sketches, illustrations, and animations, and even generate “Genmojis”, that is, create their own emojis using generative AI. Additionally, users can search photos via Siri (e.g., “give me all the photos of me wearing Pixit merch”) and create memories using Apple Intelligence.
- Action Features: Apple’s AI can perform tasks like transcribing audio files, processing data across various apps, and understanding photos. For example, users can ask Siri to check whether a meeting can be moved to another date, taking into account travel time, other meetings during that time, and more - crazy! It can also pull the information from a photo you took of your passport and use it to fill in a form automatically.
- On-Device and Server-Based Processing: Apple prioritises on-device processing to ensure privacy, resorting to server-based processing only when necessary. For the latter they use “Private Cloud Compute” to protect your privacy.
Pixit’s Two Cents: Apple Intelligence sounds absolutely amazing. We’re super excited to see generative AI in use, and if even half of the applications Apple introduced at WWDC work as shown, it will be a lot of fun! Finally, we can experiment with the AI capabilities we have been waiting for.
Meta Pauses AI Training with European User Data Amid Regulatory Pressure
Story: Meta has halted its plans to use data from European users to train its AI models, citing increased regulatory scrutiny and data privacy concerns. The decision follows a lot of pressure from European regulators, who have been tightening their data protection laws and enforcement measures.
Key Findings:
- Regulatory Compliance: Meta's decision reflects its commitment to comply with stringent European data privacy regulations, including the General Data Protection Regulation (GDPR).
- Data Privacy Concerns: The Irish Data Protection Commission (DPC), acting on behalf of several data protection authorities in the EU, had put considerable pressure on Meta over growing concerns about user data protection and the ethical implications of using personal data for AI training.
- Impact on AI Development: This pause may slow down Meta's AI development efforts as it seeks alternative data sources and adjusts its strategies to align with regulatory requirements. This matters all the more because Meta argued that it needed the data from European users to reflect “the diverse languages, geography and cultural references of the people in Europe”.
Pixit’s Two Cents: This development underscores the critical importance of data privacy in AI training. Companies must navigate complex regulatory landscapes to innovate responsibly. Meta’s decision highlights the broader industry challenge of balancing innovation with regulatory compliance and data privacy.
Small Bites, Big Stories:
- Shutterstock Made $104 Million Licensing Assets to AI Devs Last Year: Licensing assets is a significant and growing part of Shutterstock’s business, and last year the company made $104 million through licensing agreements alone.
- TechScape: What we learned from the global AI summit in South Korea: Leaders from industry agreed on the importance of providing transparency and accountability in the age of (generative) AI, signing multiple agreements, pacts, pledges, and statements.
- Microsoft-backed Mistral AI raises $645 million at a $6 billion valuation: Mistral AI, a French startup, raised $645 million to further develop LLMs (e.g., Mixtral 8x7B). The company needs the money to train large models and keep up with big companies like Microsoft or Google.
- How to opt out of Meta’s training: A great guide (for users in Europe and the UK) on how to opt out from Meta scraping your data.
- Picsart teams up with Getty to take on Adobe’s ‘commercially-safe’ AI: Picsart and Getty are developing an AI image generator that’s trained only on Getty’s licensed photos.
- Microsoft’s all-knowing Recall AI feature is being delayed: The tool screenshotting everything you do on your computer (i.e., Copilot Plus PC) will not be part of the launch this week.
- How much does ChatGPT cost? Everything you need to know about OpenAI’s pricing plans: A great guide on OpenAI’s pricing plans for ChatGPT free, ChatGPT Plus ($20), ChatGPT Team (~$30 per user), ChatGPT Enterprise (~$60 per user), and ChatGPT Edu.
- Spotify announces an in-house creative agency, tests generative AI voiceover ads: Spotify came up with an in-house agency called Creative Lab to help brands create custom marketing campaigns.
- Amazon says it’ll spend $230 million on generative AI startups: Amazon invests millions of dollars, in the form of AWS credits, into startups developing generative AI models to power their products, apps, and services.
- Databricks expands Mosaic AI to help enterprises build with LLMs: Databricks is launching five new tools: Mosaic AI Agent Framework, Mosaic AI Agent Evaluation, Mosaic AI Tools Catalog, Mosaic AI Model Training and Mosaic AI Gateway.
Jun 17, 2024 10:16:13 AM