Evaluating Prompt Performance

Last updated October 31, 2023

The effectiveness of a Large Language Model (LLM) application depends heavily on the quality of its prompts, which makes evaluating prompt performance a crucial step in the prompt engineering process. Vellum provides a suite of tools to quantitatively assess prompt performance so your LLM applications are optimized for the outcomes you want. This guide outlines the steps to evaluate prompt performance within the Vellum platform.
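
At its core, a quantitative evaluation means running a prompt over a set of labeled test cases and scoring the outputs against expected results. The sketch below illustrates that idea in plain Python; it is a minimal, illustrative example rather than Vellum's API: `call_llm` is a hypothetical stand-in for your model call, and the test cases and template are placeholders.

```python
# Minimal sketch of quantitative prompt evaluation: run a prompt
# template over labeled test cases and score the outputs.

test_cases = [
    {"input": "I love this product!", "expected": "positive"},
    {"input": "Terrible experience, would not buy again.", "expected": "negative"},
]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: swap in your real model call here.
    # Faked deterministically so the sketch runs end to end.
    return "positive" if "love" in prompt else "negative"

def evaluate(template: str, cases: list[dict]) -> float:
    """Return the fraction of cases whose output matches the expected label."""
    hits = 0
    for case in cases:
        output = call_llm(template.format(text=case["input"])).strip().lower()
        hits += output == case["expected"]
    return hits / len(cases)

template = "Classify the sentiment of this review as positive or negative:\n{text}"
print(f"exact-match accuracy: {evaluate(template, test_cases):.2f}")
```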

Evaluating Prompt Performance in Vellum:

  • Step 1: Accessing the Prompts Section
      • Log in to your Vellum account and navigate to the 'Prompts' tab on the dashboard.
  • Step 2: Selecting a Prompt
      • Choose the prompt you wish to evaluate from the list of available prompts.
  • Step 3: Running Initial Tests
      • Use the 'Run Test' button to execute initial tests on the selected prompt.
      • Review the results to establish a baseline for the prompt's performance.
  • Step 4: Setting Evaluation Metrics
      • Navigate to the 'Evaluation Metrics' section and choose the metrics you want to evaluate against.
      • Common metrics include precision, recall, and F1 score, depending on your use case (see the metrics sketch after this list).
  • Step 5: Performing Quantitative Evaluations
      • Use the 'Quantitative Evaluation' feature to run evaluations against the metrics you set.
      • Review the results to quantify the prompt's performance.
  • Step 6: Iterating and Refining
      • Based on the evaluation results, adjust the prompt text as needed.
      • Re-run the evaluations to see how the adjustments affect performance.
  • Step 7: Comparing Versions
      • Use the version control feature to compare different versions of the prompt and their respective evaluation results (see the comparison sketch after this list).
  • Step 8: Saving and Documenting
      • Once satisfied with the performance, save the prompt.
      • Document the evaluation process and results for future reference and continuous improvement.
  • Step 9: Moving to Deployment
      • If the prompt meets your performance criteria, move it to the 'Deployments' section to deploy it to production.
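
For the metrics named in Step 4, it helps to see how they are computed. The snippet below is a plain-Python refresher on precision, recall, and F1 for a binary-label task; the expected labels and predictions are illustrative values, not output from any particular evaluation run.

```python
# Precision, recall, and F1 over a small illustrative test set.
expected    = ["positive", "negative", "positive", "positive", "negative"]
predictions = ["positive", "positive", "positive", "negative", "negative"]

tp = sum(e == p == "positive" for e, p in zip(expected, predictions))  # true positives
fp = sum((e, p) == ("negative", "positive") for e, p in zip(expected, predictions))
fn = sum((e, p) == ("positive", "negative") for e, p in zip(expected, predictions))

precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, share correct
recall    = tp / (tp + fn) if tp + fn else 0.0  # of actual positives, share found
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```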
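
For Step 7, the same scoring harness makes version comparisons concrete: evaluate each version of the prompt against the same test set so the numbers are directly comparable. This sketch reuses the hypothetical `evaluate` helper and `test_cases` from the first sketch above; both templates are placeholders, not Vellum-managed prompt versions.

```python
# Compare two versions of a prompt on the same test set.
v1 = "Classify the sentiment of this review as positive or negative:\n{text}"
v2 = ("You are a sentiment analyst. Reply with exactly one word, "
      "positive or negative:\n{text}")

scores = {name: evaluate(t, test_cases) for name, t in [("v1", v1), ("v2", v2)]}
print(scores, "-> best candidate:", max(scores, key=scores.get))
```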

Conclusion:

Evaluating prompt performance is essential to building effective LLM applications. A systematic evaluation process in Vellum lets you refine and optimize your prompts and make informed deployment decisions. Vellum's analytics and evaluation tools give you the insights needed to continuously improve your prompt engineering efforts and drive better outcomes for your LLM applications.
