Cost Governance

| Category | Subcategory | Recommendation | Service | Priority | Reference |
|---|---|---|---|---|---|
| Cost Governance | Cost Management | Verify PTU cost savings vs. pay-as-you-go pricing for Azure OpenAI and OpenAI models. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Management | Ensure the most cost-effective model that fits the task is in use; reserve more expensive models for use cases that demand them. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Cost Management | Allocate provisioned quotas for each model based on expected workloads to prevent unnecessary costs. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Cost Management | Use the right deployment type; global deployments offer lower cost-per-token pricing on certain GPT models. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Management | Choose the right hosting infrastructure for your solution's needs, e.g. managed endpoints, AKS, or Azure App Service. | NA | 🟡 Medium | |
| Cost Governance | Cost Management | Define and enforce a policy to automatically shut down Azure AI Foundry and Azure Machine Learning compute instances. | Azure AI Foundry | 🔵 Low | link |
| Cost Governance | Cost Management | Configure 'Actual' and 'Forecasted' budget alerts. | Azure Cost Management | 🟡 Medium | link |
| Cost Governance | Token Optimization | Use prompt compression tools such as LLMLingua or gptrim. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Optimization | Use tiktoken to understand token sizes when optimizing token usage in conversational scenarios. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Familiarization | Understand the cost difference between base models and fine-tuned models, and the token step sizes for each. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Batch Processing | Batch requests where possible to minimize per-call overhead, which can reduce overall costs. Ensure you optimize the batch size. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Monitoring | Set up a cost tracking system that monitors model usage, and use that information to inform model choices and prompt sizes. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Limit | Set a maximum limit on the number of tokens per model response (max_tokens) and on the number of completions to generate. Optimize the limit so it is large enough for a valid response. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Costing Model | Evaluate billing models: PAYG vs. PTU. Start with PAYG and consider PTU once production usage is predictable, since it offers dedicated memory and compute, reserved capacity, and consistent maximum latency for the specified model version. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Quota Management | Adopt quota management practices. Use dynamic quota when your application can opportunistically use extra capacity, or when the application itself drives the rate at which the Azure OpenAI API is called. | Azure OpenAI | 🔴 High | link |
| Cost Governance | Cost Estimation | Develop your cost model, taking prompt sizes into account. Understanding prompt input and response sizes, and how text translates into tokens, helps you build a viable cost model. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Model Selection | Consider model pricing and capabilities when choosing models. Start with less costly models for simpler tasks such as text generation or completion; for complex tasks such as language translation or content understanding, consider more advanced models. Optimize costs while still achieving the required application performance. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Usage Optimization | Use Azure OpenAI price breakpoints, such as fine-tuning, and model breakpoints, such as image generation, to your advantage. Fine-tuning is charged per hour, so use as much of each hour as available to improve results without slipping into the next billing period. The cost of generating 100 images is the same as the cost of generating 1 image. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Usage Optimization | Remove unused fine-tuned models when they are no longer consumed to avoid an ongoing hosting fee. | Azure OpenAI | 🟡 Medium | link |
| Cost Governance | Token Optimization | Create concise prompts that still provide enough context for the model to generate a useful response, and optimize the response-length limit. | Azure OpenAI | 🟡 Medium | link |
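The token-optimization and token-limit rows above can be combined into a pre-flight check before a request is sent. This is a minimal sketch: the 4-characters-per-token heuristic is a rough approximation for English text (use the tiktoken library with your model's encoding for accurate counts), and the 8,192-token context window is an assumed value, not a property of any specific model.

```python
# Rough pre-flight token check before sending a prompt.
# NOTE: ~4 characters per token is a crude heuristic for English text;
# use tiktoken with the model's encoding for accurate counts.

def approx_tokens(text: str) -> int:
    """Crude estimate of how many tokens `text` will consume."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_response_tokens: int,
                 context_window: int = 8192) -> bool:
    """Check that the prompt plus the reserved response budget (the value
    you would pass as max_tokens) fits inside the assumed context window."""
    return approx_tokens(prompt) + max_response_tokens <= context_window
```

Running this check before each call makes it obvious when a prompt needs trimming or compression rather than discovering it through a truncated or rejected response.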
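The batch-processing recommendation can be illustrated with a small client-side helper that groups inputs into fixed-size chunks, so each API call carries several items instead of one. The default batch size of 16 is an arbitrary illustrative choice; tune it against your model's input limits and your latency targets.

```python
from typing import Iterator, List

def batched(items: List[str], batch_size: int = 16) -> Iterator[List[str]]:
    """Yield successive chunks of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each yielded chunk would then be submitted as a single request, amortizing the per-call overhead across the items in the chunk.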
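The cost-estimation guidance above amounts to simple arithmetic: input and output token counts multiplied by per-1K-token rates. The sketch below uses placeholder prices, not current Azure OpenAI list prices; substitute the published rates for your model and deployment type.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the pay-as-you-go cost of one completion call."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Hypothetical rates: $0.0025 per 1K input tokens, $0.01 per 1K output tokens.
cost = request_cost(1500, 500, input_price_per_1k=0.0025, output_price_per_1k=0.01)
```

Multiplying this per-request figure by the expected request volume gives a monthly estimate that can be compared against a PTU reservation when deciding between the two billing models.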