AI News #66

Written by Pix | Apr 15, 2024 7:23:16 AM

How People Are Really Using LLMs

Story: While ChatGPT, and LLMs in general, boast hundreds of millions of users and promise multi-trillion-dollar contributions to the economy, many people still hesitate to use them because of AI inaccuracies and, maybe more importantly, a perceived lack of practical applications. Filtered Technology searched the web for specific, real-life applications of ChatGPT and similar tools that span personal, professional, and creative use cases.

Key Findings:

Use Cases for Generative AI: The company grouped 100 categories into six top-level themes: technical assistance & troubleshooting (23%), content creation & editing (22%), personal & professional support (17%), learning & education (15%), creativity & recreation (13%), and research analysis & decision making (10%).
Effective Brainstorming: The most common use is generating new ideas.
Example Use Cases:
- Using LLMs as a search engine
- Editing text
- Drafting emails
- Using LLMs as an explanation tool
- Producing demo data

Pixit’s Two Cents: The work by Filtered Technology provides an excellent foundation for anyone exploring ChatGPT for the first time or seeking to discover new use cases. The use cases identified offer a glimpse into how AI can be integrated meaningfully into daily activities, supporting both personal and professional growth. At Pixit, we use LLMs mainly to write and review code for our applications, edit text, and create content. In addition, we’d love to see a similar report on the applications of diffusion models.

Generative AI by iStock

Story: Getty Images recently introduced Generative AI by iStock, targeting small businesses, designers, and marketers. The tool offers “affordable and commercially safe” generative AI capabilities while minimizing legal risks and costs (we reported here). The article analyzes the tool from a risk perspective, that is, with respect to whether it is commercially safe and legally protected.

Key Findings:

Commercial Safety and Legal Protection: Getty Images assures that its AI technology is commercially safe and legally protected, a claim it bases on its exclusive training data (“If you have complete control over what the model is being trained on, we know it cannot infringe on IP”, Grant Farhall, Chief Product Officer at Getty Images).
Legal Safeguards: To bolster confidence, Getty Images provides $10,000 of legal protection for images generated on iStock, with more extensive indemnification options available through other platforms and APIs.
Controlled Output: The AI model is designed to avoid generating images of copyrighted figures, trademarks, or sensitive content, such as famous personalities or branded products, which supports compliance and reduces the risk of legal complications.
Ownership and Usage Rights: Getty Images clarifies that while generated images may inform future training, it does not claim ownership of the generated content, so users retain control over their creations.

Pixit’s Two Cents: Getty Images’ decision to create a commercially safe text-to-image model is a smart move. Compared to other models (e.g., DALL-E and Midjourney), the model from Getty Images might be an appealing choice for individuals and businesses who are afraid of legal consequences.

Apple just unveiled new Ferret-UI LLM - An AI that reads your iPhone screen

Story: Researchers at Apple have been working on Ferret-UI, an AI model that can understand what is happening on your phone screen. It is one of several recent Apple works that aim to run on your phone and understand the context it operates in. Ferret-UI, a multimodal large language model (MLLM), leverages advanced computer vision and natural language processing techniques to analyze screen content in real time, enabling it to offer relevant suggestions, perform tasks, and provide personalized support to users.

Key Findings:

Screen Reading Capabilities: Ferret-UI can read and understand the content displayed on an iPhone screen, allowing it to provide context-aware assistance and recommendations to users.
Advanced AI Techniques: The AI model leverages state-of-the-art computer vision and natural language processing techniques to analyze screen content in real time, enabling it to offer relevant suggestions and perform tasks based on the displayed information.
Enhancing User Experience: By providing personalized support and recommendations based on the content users are interacting with, Ferret-UI aims to enhance the overall user experience and make iPhone usage more efficient and intuitive.
Seamless Integration: As an Apple-developed AI model, Ferret-UI is expected to integrate seamlessly with the company's ecosystem, potentially extending its capabilities to other Apple devices and services in the future.
Potential Applications: While the full scope of Ferret-UI applications is yet to be revealed, it could assist users in various scenarios, such as providing relevant information, automating tasks, and offering intelligent suggestions based on the content they are viewing or interacting with on their iPhone screens.

Pixit’s Two Cents: The ability to read and understand iPhone screen content opens up a world of possibilities for context-aware assistance and personalized recommendations. As users increasingly rely on their smartphones for a wide range of tasks, having an AI model that can provide intelligent support based on the information they are interacting with could be a game-changer. However, as with any AI technology that analyzes user data, privacy and security concerns will need to be addressed. Apple's strong track record in protecting user privacy and its efforts to get the models running on its own A-Series & M-Series silicon look very promising to us.

Small Bites, Big Stories:

OpenAI's Sora just made its first music video and it's like a psychedelic trip: OpenAI published the first AI-made music video, for the song Worldweight by August Kamp. You can find the video here.
Microsoft 365’s Copilot gets a GPT-4 Turbo upgrade and improved image generation: Subscribers to Microsoft 365 can now (a) access GPT-4 Turbo and (b) create up to 100 images using Microsoft Designer (instead of 15).
Introducing improvements to OpenAI’s fine-tuning API and expanding the custom models program: OpenAI announced six new features for its fine-tuning API and an extension of its custom models program.
Showing AI just 1000 extra images reduced AI-generated stereotypes: Researchers made an AI image generator produce less offensive images by feeding it a tiny amount of additional training data.
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models: Researchers found that smaller latent diffusion models (LDMs) often outperform larger ones in generating high-quality results when operating under a given inference budget.
Adobe Is Buying Videos for $3 Per Minute to Build AI Model: The company is offering photographers and artists $120 to submit videos of people with the ultimate goal of building a text-to-video generator.
GPT-4-Turbo’s knowledge updated until the end of 2023 (German only): OpenAI updated its GPT-4-Turbo model and incorporated data up to the end of 2023.
Archetype Wants to Let You Talk to Your House, Car, and Factories: Archetype, a new AI startup, aims to create digital twins of physical objects like houses, cars, and factories, allowing users to interact with them using natural language and receive real-time information and control.
Inside Big Tech's Underground Race to Buy AI Training Data: As the demand for AI training data surges, tech giants are engaging in an underground race to acquire diverse datasets, raising concerns about data privacy, ethics, and the potential for biased AI systems.
Introducing Stable LM 2 12B: Stability AI releases Stable LM 2 12B, a new open-source language model that offers improved performance, efficiency, and multilingual support.
Meta Confirms That Its Llama 3 Open Source LLM Is Coming in the Next Month: Meta announces that it will release Llama 3, an open-source large language model, within the next month.
Embodied Question Answering for Robotics and AR Glasses: Meta introduces OpenEQA, a new dataset and benchmark for embodied question answering, which aims to advance the development of AI systems that can understand and interact with the physical world, with applications in robotics and augmented reality.
Google Cloud Introduces Imagen 2: Google Cloud unveils Imagen 2, an enhanced version of its text-to-image generation model, offering improved quality, control, and safety features.
