multi-rater-quality
How does Burna AI measure inter-rater reliability?
A six-tab analytics dashboard reports Cohen's kappa statistics, classification metrics, confidence calibration, weekly trends, agreement matrices, and override rates. Grading events stream to the analytics layer in real time, so site leadership gets live answers to questions like 'how many cases were graded this week,' 'what is our inter-rater agreement rate,' and 'where is the AI suggestion being overridden most often.' The audit-ready output helps sites document inter-rater reliability in line with FDA expectations for AI/ML tool deployments.
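For readers unfamiliar with the statistic, here is a minimal sketch of how Cohen's kappa and an override rate can be computed from paired grades. It is an illustration only, not Burna AI's implementation: the function names, the grade labels, and the sample data are all hypothetical. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: fraction of cases where the two grades match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters'
    # marginal frequencies, then sum over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label for every case, so kappa
        # is undefined (0/0). Returning 1.0 here is a convention chosen
        # for this sketch, not a standard.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def override_rate(ai_suggestions, final_grades):
    """Fraction of cases where the final grade differs from the AI suggestion."""
    if len(ai_suggestions) != len(final_grades) or not ai_suggestions:
        raise ValueError("inputs must be non-empty and the same length")
    overridden = sum(s != f for s, f in zip(ai_suggestions, final_grades))
    return overridden / len(ai_suggestions)


# Hypothetical grades for four cases (labels are made up for illustration).
grader_1 = ["A", "A", "B", "B"]
grader_2 = ["A", "B", "B", "B"]

kappa = cohens_kappa(grader_1, grader_2)     # p_o = 0.75, p_e = 0.5
overrides = override_rate(grader_1, grader_2)
print(f"kappa={kappa:.2f} override_rate={overrides:.2f}")
```

In this toy example the raters agree on three of four cases (p_o = 0.75), chance agreement is 0.5, and kappa comes out to 0.5. That is noticeably lower than the raw 75% agreement rate, which is why kappa is usually reported alongside simple agreement.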