Microsoft Expands AI Offerings with Phi-3-Vision and GitHub Copilot Extensions

Phi-3-Vision

Microsoft has announced Phi-3-Vision, a 4.2B parameter multimodal model.

Model Summary

Phi-3-Vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data in both text and vision. The model belongs to the Phi-3 family, and the multimodal version supports a 128K-token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.

Azure

Hugging Face

GitHub Copilot Extensions

GitHub has announced extensions for GitHub Copilot.

AI-Generated Summary

GitHub introduces Copilot Extensions, expanding the capabilities of the widely adopted AI developer tool by integrating a growing ecosystem of partner tools and services. Copilot Extensions enable developers to seamlessly interact with various tools and services within their IDE or GitHub.com, enhancing their productivity and innovation through natural language.

Key Highlights:

  • Partner Ecosystem: Initial partners include DataStax, Docker, LambdaTest, LaunchDarkly, McKinsey & Company, Microsoft Azure and Teams, MongoDB, Octopus Deploy, Pangea, Pinecone, Product Science, ReadMe, Sentry, and Stripe.
  • Seamless Workflow: Developers can access, manage, and deploy various tools without context-switching, all within GitHub Copilot Chat, Visual Studio, and VS Code.
  • Private Extensions: Organizations can create custom Copilot Extensions for their internal tools, enhancing their bespoke developer workflows.
  • Natural Language Integration: Copilot Extensions allow developers to perform tasks, retrieve data, and solve problems using natural language commands.

How it Works:
Developers can manage incidents, troubleshoot issues, and deploy solutions by invoking various tools through GitHub Copilot Chat, significantly reducing context-switching and improving workflow efficiency.
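For example, an interaction in Copilot Chat might look something like this (a hypothetical sketch based on the @-mention pattern GitHub has demonstrated; the exact extension names and capabilities depend on what each partner ships):

```
@docker How do I containerize this Node.js service?
@sentry Show me the most recent unresolved errors in this project.
@azure Deploy this service to a staging environment.
```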

Future Outlook:
GitHub aims to make Copilot the most integrated and powerful AI platform, lowering the barrier to entry for software development and enabling a broader range of developers to innovate effortlessly.

Getting Started:
Access to Copilot Extensions is available through the GitHub Marketplace and Visual Studio Marketplace. Developers and organizations are encouraged to join the Copilot Partner Program to integrate their tools and services into the Copilot ecosystem.

9 Likes

Great news, this should lower the barrier to implementing AI and lead to wider adoption.

1 Like

As someone who uses Copilot a lot, mainly for boilerplating, it absolutely sucks at explaining and debugging.

I have NEVER used “Fix This”, as it almost always just brute forces a solution that is just … awful.

The explanations are often way too generalized and don't entirely match the context. I can see they're trying harder to make it understand the context, but in most cases I get much better results by selectively copying and pasting the code into ChatGPT.

Its autocompletion feature is the ONLY feature I want. These extensions are coming in way too fast and I CANNOT see them being useful.

Although I see the benefits here, I just wish the fundamentals were focused on more instead of these flashy new features. This just feels like building a mansion on top of 4 vertical logs.

I’m excited about Phi-3 being multi-modal though.

3 Likes

I agree.

I've personally found GitHub Copilot to be the most reliable coding-assistant LLM. With Copilot Extensions, the utility of that service will be greatly magnified, and I expect it to enable small development teams (including solo devs) to develop more sophisticated products more quickly and reliably.

I’m also very excited whenever a new “small” language model is released.

I think the best AI applications are the ones which utilize multiple models, each doing what they’re best at.

While gpt-4o will almost certainly outperform phi-3-vision-128k-instruct at the vast majority of vision tasks, there will absolutely be a place for a small vision-capable model in many workflows.

Best-performing local vision model I've tried. Does a great job with text extraction.

1 Like

Today, I plan on toying with it a bit to see how it does with converting rendered math expressions to LaTeX.

If it does a serviceable job, I might look into fine-tuning it further for that task.
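Here's the rough starting point I have in mind, following the usage pattern from the Hugging Face model card (an untested sketch; the image filename and prompt wording are my own placeholders):

```python
# Sketch: ask Phi-3-Vision to transcribe a rendered math expression into LaTeX.
# Follows the usage pattern from the Hugging Face model card; "equation.png"
# and the prompt wording are placeholders, not part of any official example.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# The model card uses <|image_1|> as the placeholder for the first image.
messages = [{
    "role": "user",
    "content": "<|image_1|>\nTranscribe the math expression in this image "
               "into LaTeX. Output only the LaTeX source.",
}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image = Image.open("equation.png")  # placeholder: a rendered math expression
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs, max_new_tokens=256, eos_token_id=processor.tokenizer.eos_token_id
)
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
latex = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(latex)
```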