quickly, which makes it difficult to decide what to use; along with that, guidance and documentation are hard to find.

What slows down GenAI adoption?
• Development: Applications often require multiple cutting-edge products and frameworks, which demands specialized expertise and new tools to stitch the components together.
• Context: The large language model doesn't know about your data.
• Evaluation: It is hard to figure out which model to use and how to optimize it for the use case.
• Operationalization: Concerns around privacy, security, and grounding. Developers lack the experience and tools to evaluate, improve, and validate solutions for their proofs of concept, and to scale and operate them in production.
ML models vs. LLMs:
• Target audiences: ML models → ML Engineers and Data Scientists; LLMs → ML Engineers and App Developers
• Assets to share: ML models → model, data, environments, features; LLMs → LLM, agents, plugins, prompts, chains, APIs
• Metrics/evaluations: ML models → accuracy; LLMs → quality (accuracy, similarity), harm (bias, toxicity), correctness (groundedness), cost (tokens per request), latency (response time, RPS)
• ML models are typically built from scratch, while LLMs are pre-built or fine-tuned and served as an API (MaaS)
LLM Lifecycle
• Deployment: Deploy the LLM flow to a scalable container for making predictions. Additionally, enable blue/green deployment with traffic-routing control so that A/B testing can be done for the LLM flow.
• Prompt Engineering: Prompt engineering or tuning with instructions describing the tasks the LLM will perform, along with security measures.
• CI, CE, and CD: Continuous Integration, Continuous Evaluation, and Continuous Deployment of the LLM flows to maintain code quality with engineering best practices, compare LLM performance, and promote flows to higher environments.
• Foundational LLM: Selection of the right foundation model, such as Azure OpenAI models, Llama 2, Falcon, or any model from Hugging Face; if necessary, a fine-tuned model.
• Data & Services: Enrich LLM models with domain-specific grounding data (RAG pattern; a minimal sketch follows this list) or enable in-context learning with use-case-specific examples.
• Monitor: Monitor performance metrics for the LLM flow, detect data drift, and communicate the model's performance to stakeholders.
• Online Evaluation: Online evaluation of the LLM is critical for understanding performance, potential risks, and more; the LLM's answers are evaluated by one or more evaluation mechanisms.
• Experiment & Evaluate: Execute the flow (prompt + additional data or services) end-to-end with sample input data. Evaluate the LLM's responses on large datasets against ground truth (if any), or check whether the answer is relevant in context.
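The RAG pattern mentioned under Data & Services can be as simple as retrieving a few relevant snippets and injecting them into the prompt before calling the model. A minimal sketch, assuming the openai Python package's AzureOpenAI client; the deployment name, environment variables, and the retrieved_docs input are placeholders and not part of any prompt flow API.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def answer_with_grounding(question: str, retrieved_docs: list[str]) -> str:
    # Inject domain-specific grounding data into the prompt (RAG pattern).
    context = "\n\n".join(retrieved_docs)
    messages = [
        {"role": "system",
         "content": "Answer only from the provided context. If the context is "
                    "insufficient, say you don't know.\n\nContext:\n" + context},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",  # Azure OpenAI deployment name; placeholder
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content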
complex process. Customers want:
• Private data access and controls
• Prompt engineering
• CI/CD
• Iterative experimentation
• Versioning and reproducibility
• Deployment and optimization
• Safe and Responsible AI

The iterative workflow has three stages:
• Design and development: develop a flow based on a prompt to extend the capability; debug, run, and evaluate the flow with small data; modify the flow (prompts, tools, etc.) and repeat until satisfied (a toy sketch of this inner loop follows the list).
• Evaluation and refinement: evaluate the flow against a large dataset with different metrics (quality, relevance, safety, etc.); if not satisfied, return to modifying the flow.
• Optimization and production: optimize the flow, deploy and monitor it, and get end-user feedback.
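A toy illustration of the inner "develop, evaluate with small data, modify" loop, independent of any particular framework; run_flow and score_answer are placeholders for your own flow invocation and metric, not library APIs.

from statistics import mean

def run_flow(question: str) -> str:
    # Placeholder: invoke your LLM flow here.
    return "stub answer mentioning prompt flow for: " + question

def score_answer(answer: str, truth: str) -> float:
    # Placeholder metric: 1.0 if the expected phrase appears in the answer.
    return 1.0 if truth.lower() in answer.lower() else 0.0

small_dataset = [
    {"question": "What is prompt flow?", "truth": "prompt flow"},
    {"question": "Which pattern grounds an LLM on private data?", "truth": "RAG"},
]

scores = [score_answer(run_flow(row["question"]), row["truth"])
          for row in small_dataset]
print("mean score on small data:", mean(scores))
# If the score is unsatisfactory, modify the prompts/tools and re-run;
# once satisfied, evaluate against a larger dataset with richer metrics.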
various language models, APIs, and data sources to ground LLMs on your data.

Streamline prompt engineering projects
• One platform to design, construct, tune, evaluate, test, and deploy LLM workflows
• Evaluate the quality of workflows with a rich set of pre-built metrics and a safety system
• Easy prompt tuning, comparison of prompt variants, and version control
• Use any framework, such as LangChain or Semantic Kernel, to build initial flows
• Add your own reusable tools (see the sketch after this list)
• Manage your flows as files on disk
• Track run history
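For "add your own reusable tools", prompt flow lets you expose a plain Python function as a tool. A minimal sketch, assuming the promptflow package's tool decorator (the import path may differ between versions, e.g. promptflow.core); the keyword-extraction logic is only a stand-in for real tool logic.

from promptflow import tool

@tool
def extract_keywords(text: str, top_k: int = 5) -> list:
    # Toy tool: return the longest distinct words as "keywords".
    words = {w.strip(".,!?()").lower() for w in text.split()}
    return sorted(words, key=len, reverse=True)[:top_k]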
• Integration with pre-built LLMs such as Azure OpenAI Service
• Built-in safety system with Azure AI Content Safety
• Effectively manage credentials or secrets for APIs
• Create your own connections in Python tools (see the sketch after this list)
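To keep API credentials out of code and flow files, a Python tool can take a prompt flow connection as an input. A sketch assuming promptflow's CustomConnection; the endpoint/api_key key names and the call_internal_api function are placeholders, not a documented contract.

import requests
from promptflow import tool
from promptflow.connections import CustomConnection

@tool
def call_internal_api(connection: CustomConnection, query: str) -> dict:
    # Secrets live in the connection (managed by prompt flow), not in the code.
    headers = {"Authorization": f"Bearer {connection.secrets['api_key']}"}
    resp = requests.get(
        connection.configs["endpoint"],  # placeholder config key
        params={"q": query},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()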
data
• Use pre-built evaluation flows
• Compare multiple variants or runs to pick the best flow (see the sketch after this list)
• Ensure accuracy by scaling the size of data used in evaluation
• Build your own custom evaluation flows
[Diagram: flow variants (Tune Variant 0 / 1 / 2) → Bulk Test → Evaluation]
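A sketch of a bulk test plus evaluation run with the promptflow Python SDK; the flow paths, dataset, column mappings, and variant/node names are placeholders, and the exact import path and client methods may vary by SDK version.

from promptflow.client import PFClient  # older releases: from promptflow import PFClient

pf = PFClient()

# Bulk test: run one flow variant over a dataset (paths are placeholders).
base_run = pf.run(
    flow="./chat_flow",
    data="./test_data.jsonl",
    column_mapping={"question": "${data.question}"},
    variant="${llm_node.variant_0}",
)

# Evaluation run: score the base run's outputs with an evaluation flow.
eval_run = pf.run(
    flow="./eval_groundedness_flow",
    data="./test_data.jsonl",
    run=base_run,
    column_mapping={
        "question": "${data.question}",
        "answer": "${run.outputs.answer}",
    },
)

# Compare metrics across variants/runs to pick the best flow.
print(pf.get_metrics(eval_run))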
• Groundedness: evaluates how well the model's generated answers align with information from the input source (an illustrative grader sketch follows this list).
• Relevance: evaluates the extent to which the model's generated responses are pertinent and directly related to the given questions.
• Coherence: evaluates how well the language model's output flows smoothly, reads naturally, and resembles human-like language.
• Fluency: evaluates the language proficiency of a generative AI's predicted answer: how well the generated text adheres to grammatical rules and syntactic structures and uses vocabulary appropriately, resulting in linguistically correct and natural-sounding responses.
• Similarity: evaluates the similarity between a ground-truth sentence (or document) and the prediction generated by the AI model.
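These pre-built metrics are GPT-assisted: a judge model grades each answer. The snippet below is only an illustrative grader in the spirit of the groundedness metric, not the built-in implementation; it assumes the openai package's AzureOpenAI client and a placeholder judge deployment.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

JUDGE_PROMPT = (
    "Rate from 1 (not grounded) to 5 (fully grounded) how well the ANSWER is "
    "supported by the CONTEXT. Reply with the number only.\n\n"
    "CONTEXT:\n{context}\n\nANSWER:\n{answer}"
)

def groundedness_score(context: str, answer: str) -> int:
    # Ask the judge model for a 1-5 rating and parse it as an integer.
    response = client.chat.completions.create(
        model="gpt-4o",  # judge deployment name; placeholder
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(context=context, answer=answer)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())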