| Category | Subcategory | Recommendation | Service | Impact | Reference |
| --- | --- | --- | --- | --- | --- |
| Cost Governance | Cost Management | Verify PTU cost savings versus pay-as-you-go pricing for Azure OpenAI and OpenAI models (cost-model sketch below). | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Management | Use the most cost-effective model that meets the requirements; reserve more expensive models for use cases that demand them. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Cost Management | Allocate provisioning quotas for each model based on expected workloads to prevent unnecessary costs. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Cost Management | Use the right deployment type; global deployments offer lower cost-per-token pricing on certain GPT models. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Management | Choose the right hosting infrastructure for your solution's needs, e.g. managed endpoints, AKS, or Azure App Service. | NA | 🟡 Medium | |
| Cost Governance | Cost Management | Define and enforce a policy to automatically shut down Azure AI Foundry and Azure Machine Learning compute instances. | Azure AI Foundry | 🔵 Low | link |
| Cost Governance | Cost Management | Configure ‘Actual’ and ‘Forecasted’ budget alerts. | Azure Cost Management | 🟡 Medium | link |
| Cost Governance | Token Optimization | Use prompt compression tools such as LLMLingua or gptrim (compression sketch below). | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Optimization | Use tiktoken to understand token sizes for token optimization in conversational mode (token-counting sketch below). | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Familiarization | Understand the difference in cost between base models and fine-tuned models, as well as token step sizes. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Batch Processing | Batch requests, where possible, to minimize per-call overhead, which can reduce overall costs; ensure you optimize the batch size (batching sketch below). | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Monitoring | Set up a cost tracking system that monitors model usage, and use that information to inform model choices and prompt sizes (usage-logging sketch below). | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Limit | Set a maximum limit on the number of tokens per model response (`max_tokens`) and on the number of completions to generate; optimize the limit so it is still large enough for a valid response (usage-logging sketch below). | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Costing Model | Evaluate the billing models: pay-as-you-go (PAYG) versus provisioned throughput units (PTU). Start with PAYG and consider PTU when production usage is predictable, since it offers dedicated memory and compute, reserved capacity, and consistent maximum latency for the specified model version. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Quota Management | Adopt quota management practices. Use dynamic quota when your application can use extra capacity opportunistically, or when the application itself drives the rate at which the Azure OpenAI API is called. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Estimation | Develop your cost model, taking prompt sizes into account. Understanding prompt input and response sizes, and how text translates into tokens, helps you create a viable cost model (cost-model sketch below). | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Model Selection | Consider model pricing and capabilities when you choose models. Start with less costly models for less complex tasks such as text generation or completion; for complex tasks such as language translation or content understanding, consider more advanced models. Optimize costs while still achieving the desired application performance. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Usage Optimization | Use Azure OpenAI price breakpoints, such as per-hour fine-tuning and per-batch image generation, to your advantage. Fine-tuning is charged per hour, so use as much of each hour as possible to improve results without slipping into the next billing period. Similarly, the cost for generating 100 images is the same as the cost for one image. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Usage Optimization | Remove fine-tuned models when they are no longer being used to avoid incurring an ongoing hosting fee. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Optimization | Create concise prompts that still provide enough context for the model to generate a useful response, and optimize the limit on response length. | Azure OpenAI | 🟡 Medium | link |
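
A minimal prompt-compression sketch for the Token Optimization row that mentions LLMLingua. The `PromptCompressor` class and `compress_prompt` call follow the LLMLingua README, but the target token budget, instruction, and question strings are illustrative assumptions; verify the exact signature and return keys against the version you install.

```python
# Compress a verbose retrieved context before sending it to Azure OpenAI.
# pip install llmlingua  (downloads a small compression model on first use)
from llmlingua import PromptCompressor

compressor = PromptCompressor()

long_context = "..."  # placeholder: the verbose context you want to shrink

# target_token is an illustrative budget; tune it for your use case.
result = compressor.compress_prompt(
    long_context,
    instruction="Answer the question using only the context.",
    question="What were the key findings?",
    target_token=300,
)

print(result["compressed_prompt"])  # send this instead of long_context
print(result["origin_tokens"], "->", result["compressed_tokens"])
```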
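
A minimal token-counting sketch for the tiktoken row. The per-message overhead constant is an approximation (exact accounting varies by model), so treat the result as an estimate for sizing prompts rather than an exact billing figure.

```python
# Estimate how many tokens a chat history will consume before calling the model.
# pip install tiktoken
import tiktoken

def estimate_chat_tokens(messages, encoding_name="cl100k_base", per_message_overhead=4):
    """Rough estimate: encoded message content plus a small per-message overhead."""
    encoding = tiktoken.get_encoding(encoding_name)
    total = 0
    for message in messages:
        total += per_message_overhead                      # approximate chat formatting cost
        total += len(encoding.encode(message["content"]))  # tokens in the message text
    return total

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the Q3 cost report in three bullet points."},
]
print("Estimated prompt tokens:", estimate_chat_tokens(messages))
```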
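
One way to act on the Batch Processing row: the embeddings endpoint accepts a list of inputs, so many texts can be embedded in a single request instead of one request per text. The environment variables, API version, and deployment name below are placeholders for your own values.

```python
# Embed many texts in one call rather than issuing one call per text.
# pip install openai
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder
    api_version="2024-06-01",                            # assumed API version
)

texts = [f"Document chunk {i}" for i in range(64)]  # one modest batch; tune the size

response = client.embeddings.create(
    model="text-embedding-3-small",  # your embeddings deployment name (placeholder)
    input=texts,                     # a list means one request covers all 64 chunks
)
vectors = [item.embedding for item in response.data]
print(len(vectors), "embeddings returned by a single call")
```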
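
A combined sketch for the Cost Monitoring and Token Limit rows: cap `max_tokens` and the number of completions (`n`), then log the `usage` block that every response carries so the data can feed model and prompt-size decisions. The deployment name and credentials are placeholders.

```python
# Bound the response size and record per-request token usage for cost tracking.
import logging
import os
from openai import AzureOpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("aoai-usage")

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder
    api_version="2024-06-01",
)

deployment = "gpt-4o-mini"  # your chat deployment name (placeholder)

response = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "Summarize this ticket in two sentences: ..."}],
    max_tokens=150,  # large enough for a valid answer, small enough to bound cost
    n=1,             # generate a single completion unless more are truly needed
)

usage = response.usage
log.info(
    "deployment=%s prompt_tokens=%d completion_tokens=%d total_tokens=%d",
    deployment, usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
)
print(response.choices[0].message.content)
```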
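
A back-of-the-envelope cost-model sketch for the Cost Estimation and Costing Model rows, including the PAYG-versus-PTU comparison from the first row. Every price and throughput figure below is a made-up placeholder; substitute the current Azure OpenAI price list and your own measured token volumes before drawing conclusions.

```python
# Rough monthly cost model: PAYG token pricing versus a flat PTU reservation.
# Every number here is a placeholder assumption, not a published price.

PRICE_PER_1K_INPUT_TOKENS = 0.005    # placeholder USD
PRICE_PER_1K_OUTPUT_TOKENS = 0.015   # placeholder USD
PTU_MONTHLY_COST = 8_000.0           # placeholder USD for a hypothetical PTU reservation

def paygo_monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens, days=30):
    """Estimate PAYG cost from request volume and average prompt/response sizes."""
    daily = requests_per_day * (
        avg_input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return daily * days

scenario = paygo_monthly_cost(
    requests_per_day=50_000,  # measured or forecast workload (placeholder)
    avg_input_tokens=800,     # measure with tiktoken (see the token-counting sketch)
    avg_output_tokens=200,    # bounded by max_tokens
)

print(f"Estimated PAYG monthly cost: ${scenario:,.2f}")
print(f"Hypothetical PTU monthly cost: ${PTU_MONTHLY_COST:,.2f}")
if scenario > PTU_MONTHLY_COST:
    print("PTU may be worth evaluating at this volume.")
else:
    print("PAYG looks cheaper at this volume.")
```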