We are seeking an AI Evaluation & Model Quality Specialist to support the delivery and validation of AI-driven solutions in collaboration with a global technology partner. The role will focus on defining and executing robust evaluation frameworks to measure model accuracy, reliability, and production readiness across speech-to-text, summarisation, and intent-based AI systems.
Working closely with engineering, product, and partner teams, the successful candidate will design metrics, curate high-quality ground-truth datasets, and conduct rigorous model validation to ensure solutions meet agreed performance and governance standards before deployment.
Key Responsibilities
- Design and implement evaluation frameworks for AI models, including speech-to-text and generative AI outputs.
- Define and apply appropriate performance metrics (e.g., word error rate, semantic accuracy, relevance, completeness) and establish acceptance thresholds; see the word error rate sketch after this list.
- Create, validate, and maintain high-quality labelled ground-truth datasets to support transcription, summarisation, and intent evaluation.
- Conduct statistical analysis and systematic error diagnostics to identify root causes and compare model performance; see the paired bootstrap sketch after this list.
- Support model validation and governance activities, including regression testing and quality sign-off across SIT, UAT, and production readiness cycles.
- Provide empirical insights to guide prompt optimisation and model tuning, balancing accuracy, latency, and cost considerations.
- Contribute to post-deployment monitoring frameworks, including model performance tracking, drift detection, and continuous improvement processes.
- Translate technical evaluation outcomes into clear, evidence-based insights for business and stakeholder audiences.
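To make the metrics responsibility concrete, the following is a minimal sketch of word error rate (WER) computation for transcription evaluation. It assumes whitespace-tokenised reference and hypothesis transcripts; the function name and example strings are illustrative, not a prescribed toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table between the word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions needed to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions needed from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution and one deletion against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # ~0.333
```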
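Similarly, the statistical model-comparison responsibility might look like the paired bootstrap sketch below, which estimates how often one candidate model beats another when the test set is resampled. The per-utterance error values and resample count are placeholder assumptions for illustration.

```python
import random

def paired_bootstrap(errors_a, errors_b, n_resamples=10_000, seed=0):
    """Estimate how often model B beats model A under resampling."""
    rng = random.Random(seed)
    n = len(errors_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample utterance indices
        if sum(errors_b[i] for i in idx) < sum(errors_a[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Per-utterance WERs for two candidate models on the same test set (toy data).
model_a = [0.12, 0.30, 0.08, 0.25, 0.18, 0.22, 0.15, 0.27]
model_b = [0.10, 0.28, 0.09, 0.20, 0.16, 0.21, 0.12, 0.24]
print(f"P(B better than A) = {paired_bootstrap(model_a, model_b):.3f}")
```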
Key Skills & Experience
- Strong understanding of AI evaluation methodologies and performance metrics, particularly for speech-to-text and generative AI systems.
- Experience designing and managing labelled datasets for model testing and validation.
- Proficiency in statistical analysis, model benchmarking, and structured error analysis.
- Experience working within model validation, testing, or AI governance frameworks.
- Familiarity with prompt engineering and empirical model optimisation approaches.
- Understanding of monitoring strategies for deployed AI systems, including performance degradation and drift detection; see the drift-monitoring sketch after this list.
- Strong communication skills with the ability to present technical findings clearly to non-technical stakeholders.
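As an illustration of the drift-detection skill above, here is a minimal sketch using the population stability index (PSI) over model confidence scores. The 0.2 alert threshold is a common rule of thumb rather than a mandated standard, and the synthetic score distributions are assumptions for demonstration.

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((cur% - base%) * ln(cur% / base%)) over shared bins."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    # Current values outside the baseline range are dropped by np.histogram;
    # acceptable for a sketch, but worth handling explicitly in production.
    cur_counts, _ = np.histogram(current, bins=edges)
    eps = 1e-6  # small epsilon avoids log(0) and division by zero
    base_pct = base_counts / max(base_counts.sum(), 1) + eps
    cur_pct = cur_counts / max(cur_counts.sum(), 1) + eps
    return float(np.sum((cur_pct - base_pct) * np.log(cur_pct / base_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.80, 0.05, 5000)  # e.g. historical confidence scores
current = rng.normal(0.72, 0.07, 5000)   # shifted production distribution
score = psi(baseline, current)
print(f"PSI = {score:.3f} -> {'drift alert' if score > 0.2 else 'stable'}")
```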
The role will operate within a cross-functional delivery team and collaborate closely with a global technology partner to ensure AI solutions are rigorously evaluated, governed, and ready for enterprise deployment.