Microsoft Unveils Phi-3-Vision, a Multimodal AI Model for Image Analysis
Story: Microsoft has introduced Phi-3-Vision, a small language model that incorporates computer vision capabilities. This AI model pushes the current boundaries of what's possible with compact AI systems by enabling them to process and understand visual information alongside text.
Key Findings:
- Compact Size: Phi-3-Vision has just 4.2 billion parameters, making it highly efficient and deployable on a wide range of devices.
- Multimodal Capabilities: Unlike most small language models, Phi-3-Vision can process both text and images, allowing for more context-aware understanding and generation.
- Vision-Language Tasks: Phi-3-Vision excels at various vision-language tasks such as image captioning, visual question answering, and image-text retrieval.
- Broad Applications: The model's multimodal abilities open up new possibilities for AI-powered applications in fields like education, accessibility, and creative tools.
- Open Source Release: Microsoft plans to open source Phi-3-Vision, fostering collaboration and accelerating research in the AI community.
Pixit’s Two Cents: Microsoft's Phi-3-Vision is a big step for compact AI models. By integrating computer vision into a small language model, Phi-3-Vision shows the potential for more capable and versatile AI assistants that can understand and interact with the world in a multimodal way (think GPT-4o). As AI becomes more and more a part of our lives, models like Phi-3-Vision could pave the way for a new generation of intelligent applications in language and vision alike. We expect to see exciting developments, especially in mobile device capabilities. Looking at you, WWDC (Apple’s upcoming developer conference).
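To give a feel for why the parameter count matters for on-device use, here is a rough back-of-the-envelope sketch. It only estimates the memory needed to hold the weights, and the bytes-per-parameter values are our own assumptions about common numeric precisions, not figures published by Microsoft. Activations, caches, and runtime overhead are not included.

```python
# Rough weight-memory estimate for a 4.2 billion parameter model.
# Assumption: bytes per parameter for common precisions (not from Microsoft).
PARAMS = 4.2e9

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(PARAMS, width):.1f} GB")
```

At 16-bit precision the weights alone come to roughly 8.4 GB, while a 4-bit quantized version would need about 2.1 GB, which is the range where phones and laptops become realistic targets.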
Mistral launches Codestral: A Powerful Code-Centric AI Model for Developers
Story: Mistral has launched Codestral, its first code-centric large language model (LLM). This 22B parameter, open-weight generative AI model is designed to empower developers by specializing in coding tasks, from generation to completion, across more than 80 programming languages.
Key Findings:
- Versatile Language Support: Codestral is trained on a diverse dataset of over 80 programming languages, including popular ones like Python, Java, C++, and JavaScript, as well as specialized languages like Swift and Fortran.
- Enhanced Developer Productivity: The model assists developers by completing coding functions, writing tests, and filling in partial code using a fill-in-the-middle mechanism, reducing errors and bugs.
- Impressive Performance: Codestral outperforms existing code-centric models like CodeLlama 70B and Deepseek Coder 33B on various benchmarks, setting a new standard for code generation performance and latency.
- Accessible and Integratable: The model is available under a non-commercial license on Hugging Face, through dedicated API endpoints, and is integrated with popular tools like LlamaIndex, LangChain, Continue.dev, and Tabnine.
Pixit’s Two Cents: By combining an extensive language base, impressive performance, and seamless integration with popular tools, Codestral has the potential to shake up the way developers write and interact with code. As AI continues to transform the software development landscape, models like Codestral will play a crucial role in enhancing developer productivity and enabling the creation of advanced AI applications. We expect to see many more developers leveraging Codestral's capabilities and are eager to try it out ourselves, even locally, on our new deep learning workstation.
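For readers wondering what "fill-in-the-middle" means in practice, here is a minimal sketch of the general idea: the editor splits the file at the cursor into a prefix and a suffix, sends both to the model, and splices the returned completion back in between. The sentinel strings and helper names below are hypothetical and chosen for illustration only. They are not Codestral's actual prompt format or API, and no model is called.

```python
from dataclasses import dataclass

# Hypothetical sentinel strings for illustration; real models define their own.
PREFIX_TOKEN = "<PREFIX>"
SUFFIX_TOKEN = "<SUFFIX>"
MIDDLE_TOKEN = "<MIDDLE>"


@dataclass
class FimRequest:
    prefix: str  # code before the cursor
    suffix: str  # code after the cursor


def split_at_cursor(source: str, cursor: int) -> FimRequest:
    """Split the source text into the parts before and after the cursor."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor out of range")
    return FimRequest(prefix=source[:cursor], suffix=source[cursor:])


def build_prompt(req: FimRequest) -> str:
    """Assemble a prompt that shows the model both sides of the gap."""
    return f"{PREFIX_TOKEN}{req.prefix}{SUFFIX_TOKEN}{req.suffix}{MIDDLE_TOKEN}"


def splice(req: FimRequest, completion: str) -> str:
    """Insert the model's completion between prefix and suffix."""
    return req.prefix + completion + req.suffix


if __name__ == "__main__":
    source = "def add(a, b):\n    return \n"
    cursor = source.index("return ") + len("return ")
    req = split_at_cursor(source, cursor)
    print(build_prompt(req))
    # Stand-in for a model response; a real completion would come from the model.
    print(splice(req, "a + b"))
```

The point of the design is that the model sees the code after the cursor as well as before it, so its suggestion has to fit both sides of the gap instead of just continuing from the left.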
Cohere’s Open-Source LLM, Aya, Speaks 23 Languages
Story: Cohere has introduced Aya 23, an AI model with open weights designed to support nearly two dozen languages. This move aims to democratize access to powerful language models by allowing researchers and developers to use and build on Aya 23.
Key Findings:
- Multilingual Support: Aya 23 supports 23 languages, ranging from English to German to Turkish.
- Open Weights: Cohere released the weights to enable developers and researchers to fine-tune and customize the model for specific needs. You can access the weights via Hugging Face.
- Performance: According to Cohere, Aya 23 performed better than models like Gemma, Mistral, and Mixtral “on an extensive range of discriminative and generative tasks”.
- Missing Information: Cohere did not disclose any information about the training data.
Pixit’s Two Cents: It’s great to see LLMs becoming available in more languages, including languages that have not been well supported so far. Still, other languages spoken in Asia and Africa are missing.
Small Bites, Big Stories:
- GitHub Copilot Extensions Launch: Integrating AI with Dev Tools: GitHub launches Copilot Extensions, allowing developers to integrate third-party skills with its AI pair programming tool, aiming to create a seamless development experience.
- Microsoft Paint Gets AI-Powered Image Generator: Cocreator: Microsoft Paint is introducing a new AI-powered image generation tool called Cocreator, which can generate images based on text prompts and user doodles, and features a "creativity slider" to control the level of AI involvement.
- Meta Trains AI with Public Instagram and Facebook Photos: Meta uses publicly available photos and text from Instagram and Facebook to train its text-to-image generator model, Emu, without using private user data.
- Alphabet and Meta Offer Millions to Partner with Hollywood on AI: Alphabet and Meta are offering tens of millions of dollars to partner with Hollywood studios to license content for use in their AI video generation software.
- SAP Expands Partnership with AWS for AI Integration: SAP announces an expanded partnership with Amazon Web Services (AWS) to accelerate and simplify the adoption of generative AI technology for businesses.
- Microsoft's Recall for Copilot Plus PCs: AI-Powered Search and Retrieval: Microsoft's new Windows 11 tool, Recall, tracks and logs everything you do on your computer, allowing you to search and retrieve any activity, with features like an explorable timeline, live meeting transcription, and native integration into Windows.
- Elon Musk's xAI Plans Supercomputer for AI Chatbot Grok: Elon Musk's xAI startup is planning to build a supercomputer to power its next AI chatbot, Grok, with a goal to launch by fall 2025, potentially partnering with Oracle.
- Google's AI Overviews: Fixing Inaccurate Answers and the Future of Search: Google is working to remove inaccurate results from its AI-generated search results, known as AI Overviews, which have been providing humorous and sometimes bizarre answers, and is using these examples to develop broader improvements to its systems.
- Canva Create 2024: Introducing a whole new Canva: Canva announces a major redesign, introducing a suite of new workplace products, including a redesigned editing experience, Canva Enterprise, and new tools for workplace learning, collaboration, and content creation.
- Vox Media's Partnership with OpenAI: Vox Media has entered a partnership with OpenAI, aiming to support audience loyalty and editorial differentiation, while protecting its intellectual property and ensuring compensation for the use of its published content.
- UALink Promoter Group Established for AI Accelerator Chips: Tech heavyweights Intel, Google, Microsoft, Meta, and others form the UALink Promoter Group to develop a new industry standard for connecting AI accelerator chips in data centers, aiming to reduce dependence on Nvidia's proprietary technology.
- OpenAI Disrupts Covert Influence Operations Misusing AI Models: OpenAI terminated accounts linked to covert influence operations attempting to manipulate public opinion using their AI models, but found no significant audience increase resulted from the campaigns.
Jun 3, 2024 10:21:43 AM