Concept · Updated Apr 18, 2026

Test, Evaluation, Verification, and Validation

ai-testing · quality-assurance
Jurisdiction: US-Federal

Test, Evaluation, Verification, and Validation (TEVV) encompasses systematic processes for examining AI systems and their components throughout the AI lifecycle. TEVV is integrated across all stages rather than being limited to specific checkpoints.

TEVV Activities by Lifecycle Stage:

Design and Planning: Internal and external validation of assumptions for system design, data collection, and measurements relative to intended deployment context.

Development (Model Building): Model validation and assessment, including performance evaluation and bias testing.
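
As an illustrative sketch of a development-stage check (not taken from the source), the snippet below compares model accuracy across demographic subgroups, a minimal form of the bias testing mentioned above. All names (`subgroup_accuracy_gap`, the sample data) are hypothetical.

```python
# Hypothetical development-stage TEVV check: measure the largest accuracy
# difference between any two subgroups in an evaluation set.

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def subgroup_accuracy_gap(preds, labels, groups):
    """Largest pairwise accuracy difference across subgroups."""
    by_group = {}
    for p, l, g in zip(preds, labels, groups):
        by_group.setdefault(g, []).append((p, l))
    accs = [
        accuracy([p for p, _ in pairs], [l for _, l in pairs])
        for pairs in by_group.values()
    ]
    return max(accs) - min(accs)

# Toy evaluation data: group "a" is scored at 0.75 accuracy, group "b" at 1.0.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(subgroup_accuracy_gap(preds, labels, groups))  # → 0.25
```

In practice a threshold on this gap (or a richer fairness metric) would be documented as part of the formal TEVV reporting described below.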

Deployment: System validation and integration testing in production environments, user experience evaluation, and compliance verification.
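
A deployment-stage validation might include a smoke test that exercises the integrated system end to end. The sketch below (hypothetical; `predict` stands in for a call to the deployed system) checks output schema, score range, and a latency budget.

```python
# Hypothetical deployment-stage smoke test: verify a prediction endpoint
# returns the expected schema and stays within a latency budget.
import time

def predict(features):
    # Stand-in model; in practice this would call the deployed system.
    return {"label": int(sum(features) > 0), "score": 0.5}

def smoke_test(predict_fn, sample, latency_budget_s=0.5):
    start = time.perf_counter()
    out = predict_fn(sample)
    elapsed = time.perf_counter() - start
    assert isinstance(out, dict) and {"label", "score"} <= out.keys()
    assert 0.0 <= out["score"] <= 1.0
    assert elapsed <= latency_budget_s
    return True
```

Such checks run in the production environment itself, complementing the model-level evaluation done during development.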

Operations: Ongoing monitoring, periodic updates, incident tracking, detection of emergent properties, and processes for redress and response.
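
The ongoing monitoring above can be sketched as a simple drift detector (an illustrative assumption, not a method prescribed by the source): flag drift when the rolling mean of a quality metric drops below a validation-time baseline by more than a tolerance.

```python
# Hypothetical operations-stage monitor: alert when the rolling mean of a
# model quality metric falls below (baseline - tolerance).
from collections import deque

class DriftMonitor:
    def __init__(self, baseline, tolerance, window=5):
        self.baseline = baseline
        self.tolerance = tolerance
        self.window = deque(maxlen=window)

    def observe(self, metric):
        """Record one metric sample; return True if drift is detected."""
        self.window.append(metric)
        mean = sum(self.window) / len(self.window)
        return mean < self.baseline - self.tolerance
```

An alert from such a monitor would feed the incident-tracking and redress processes named above, triggering the mid-course corrections TEVV is meant to enable.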

Key Principles:

  • Independence: TEVV actors should ideally be distinct from those performing development tasks
  • Continuous Process: Regular assessment throughout the system lifecycle
  • Multi-disciplinary: Incorporating technical, societal, legal, and ethical perspectives
  • Documentation: Formal reporting and documentation of results
  • Stakeholder Engagement: Involving domain experts, users, and affected communities

TEVV processes provide insights for risk management, enable mid-course corrections, and support post-deployment risk management. They are essential for establishing and maintaining trustworthy AI characteristics and informing the MEASURE Function in the NIST AI Risk Management Framework.

Effective TEVV requires appropriate resources, expertise, and organizational commitment to systematic evaluation practices.
