Job Details

AI Evaluation and Model Quality SME

BH-31999
  • £70 to £75 per hour (benefits: N/A)
  • Reading, South East
  • Contract
We are seeking an AI Evaluation & Model Quality Specialist to support the delivery and validation of AI-driven solutions in collaboration with a global technology partner. The role will focus on defining and executing robust evaluation frameworks to measure model accuracy, reliability, and production readiness across speech-to-text, summarisation, and intent-based AI systems.

Working closely with engineering, product, and partner teams, the successful candidate will design metrics, curate high-quality ground-truth datasets, and conduct rigorous model validation to ensure solutions meet agreed performance and governance standards before deployment.

Key Responsibilities
  • Design and implement evaluation frameworks for AI models, including speech-to-text and generative AI outputs.
  • Define and apply appropriate performance metrics (e.g., word error rate, semantic accuracy, relevance, completeness) and establish acceptance thresholds.
  • Create, validate, and maintain high-quality labelled ground-truth datasets to support transcription, summarisation, and intent evaluation.
  • Conduct statistical analysis and systematic error diagnostics to identify root causes and compare model performance.
  • Support model validation and governance activities, including regression testing and quality sign-off across SIT, UAT, and production readiness cycles.
  • Provide empirical insights to guide prompt optimisation and model tuning, balancing accuracy, latency, and cost considerations.
  • Contribute to post-deployment monitoring frameworks, including model performance tracking, drift detection, and continuous improvement processes.
  • Translate technical evaluation outcomes into clear, evidence-based insights for business and stakeholder audiences.
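For illustration only: one of the metrics named above, word error rate, can be computed as the word-level Levenshtein distance between a ground-truth transcript and a model output, divided by the reference length. This sketch is not part of the role description; the function name `wer` and its interface are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

An acceptance threshold for a speech-to-text system might then be expressed as, for example, "WER below 0.10 on the curated ground-truth set".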
Key Skills & Experience
  • Strong understanding of AI evaluation methodologies and performance metrics, particularly for speech-to-text and generative AI systems.
  • Experience designing and managing labelled datasets for model testing and validation.
  • Proficiency in statistical analysis, model benchmarking, and structured error analysis.
  • Experience working within model validation, testing, or AI governance frameworks.
  • Familiarity with prompt engineering and empirical model optimisation approaches.
  • Understanding of monitoring strategies for deployed AI systems, including performance degradation and drift detection.
  • Strong communication skills with the ability to present technical findings clearly to non-technical stakeholders.
Working Environment
The role will operate within a cross-functional delivery team and collaborate closely with a global technology partner to ensure AI solutions are rigorously evaluated, governed, and ready for enterprise deployment.
Joe Matthews, Associate Director

Apply for this role
