Training and QA
Inter-Rater Reliability for Transcripts
Inter-rater reliability helps supervisors, instructors, and teams compare two independent transcripts of the same language sample. ConductSpeech reports agreement across boundaries, words, morpheme counts, codes, and an overall score.
This is useful for graduate training labs, onboarding, and clinical QA because it turns differences between transcripts into specific points to review.
Sample result
Boundary agreement: How closely raters split utterances
Word agreement: How closely transcript words match
Coding agreement: How consistently codes were applied
Boundary: Utterance agreement
Words: Word-level agreement
Morphemes: Count correlation
Codes: Coding agreement
How it fits into a speech workflow
1. Collect: Start from a recording, transcript, or saved session.
2. Review: Check speaker turns and make clinical edits before relying on results.
3. Measure: Review the agreement scores for boundaries, words, morpheme counts, and codes.
4. Use: Bring the output into reports, progress review, or research exports.
Compare the same sample
The reliability workflow is built for two transcripts of the same recording. ConductSpeech rejects transcripts that come from different recordings, so the agreement score reflects rater differences rather than different source material.
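One way such a guard could work, offered only as a sketch: each transcript carries an identifier for its source recording, and the comparison refuses to run when the identifiers differ. The `Transcript` class, the `recording_id` field, and `ensure_same_recording` are hypothetical names for illustration; this page does not describe how ConductSpeech detects mismatched recordings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record; the field names are assumptions."""
    rater: str
    recording_id: str  # assumed: e.g. a checksum of the audio file
    utterances: tuple[str, ...]


def ensure_same_recording(a: Transcript, b: Transcript) -> None:
    """Raise ValueError unless both transcripts point at the same recording."""
    if a.recording_id != b.recording_id:
        raise ValueError(
            f"Cannot compare: {a.rater} transcribed {a.recording_id!r}, "
            f"{b.rater} transcribed {b.recording_id!r}"
        )
```

Calling `ensure_same_recording` before any scoring keeps a mismatch from being reported as low agreement.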
Useful for university programs
Instructors can have students transcribe and code the same sample, then compare agreement. The resulting scores make it easier to identify whether the class is struggling with boundaries, word accuracy, morpheme counts, or coding conventions.
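As an illustration of that kind of class-level review, the sketch below averages each metric across students and reports the lowest one. The function name, the metric keys, and the sample values are all made up for this example, and it assumes every metric is reported on a comparable 0 to 1 scale, which the page does not confirm.

```python
from statistics import mean


def weakest_area(class_scores):
    """Return (metric, class mean) for the metric with the lowest class mean.

    class_scores: one dict per student, mapping metric name -> score.
    Assumes every dict has the same keys and a shared 0-1 scale.
    """
    metrics = class_scores[0].keys()
    averages = {m: mean(s[m] for s in class_scores) for m in metrics}
    worst = min(averages, key=averages.get)
    return worst, averages[worst]


# Illustrative values only, not real ConductSpeech output.
scores = [
    {"boundaries": 0.90, "words": 0.95, "morphemes": 0.80, "codes": 0.60},
    {"boundaries": 0.85, "words": 0.90, "morphemes": 0.90, "codes": 0.70},
]
metric, average = weakest_area(scores)
```

With these sample values, `metric` is `"codes"`, which would point the next class session at coding conventions.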
Practical agreement metrics
ConductSpeech summarizes utterance boundary agreement, word-level agreement, morpheme-count correlation, coding agreement, and an overall score.
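This page does not state the formulas behind these scores. With that caveat, the four component metrics could be sketched as follows. Every function name here is hypothetical, and the choice of Jaccard overlap, sequence-alignment ratio, Pearson correlation, and Cohen's kappa is an assumption for illustration rather than ConductSpeech's documented method. The overall score is left out because the page does not say how the components are combined.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def boundary_offsets(utterances):
    """Word offsets where each utterance ends, excluding the final one."""
    offsets, total = set(), 0
    for utt in utterances[:-1]:
        total += len(utt.split())
        offsets.add(total)
    return offsets


def boundary_agreement(utts_a, utts_b):
    """Jaccard overlap of boundary positions (1.0 means identical splits).

    Simplification: offsets only line up when both raters transcribed
    roughly the same words.
    """
    a, b = boundary_offsets(utts_a), boundary_offsets(utts_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def word_agreement(utts_a, utts_b):
    """Share of matching words after aligning the two word sequences."""
    words_a = " ".join(utts_a).lower().split()
    words_b = " ".join(utts_b).lower().split()
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def pearson(xs, ys):
    """Pearson correlation of paired per-utterance counts; None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two equal-length label lists."""
    n = len(codes_a)
    if n == 0 or n != len(codes_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(x == y for x, y in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, if one rater writes `["the dog ran", "he fell down"]` and the other writes `["the dog ran he fell", "down"]`, `word_agreement` returns 1.0 while `boundary_agreement` returns 0.0: the raters heard the same words but split the utterances differently.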
What users see
Reliability report fields
A compact result view describes each score in plain language rather than as a technical readout.
Boundary agreement: How closely raters split utterances
Word agreement: How closely transcript words match
Coding agreement: How consistently codes were applied
Clinical interpretation notes
- Reliability scores are meaningful only when both transcripts come from the same sample.
- The scores identify disagreement; a supervisor still decides which transcript is clinically correct.
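To show how a score can be turned into concrete items for a supervisor to adjudicate, the sketch below lists the word spans where two transcripts differ. It uses Python's standard `difflib` module; the function name `word_disagreements` is hypothetical, and nothing here is claimed to match how ConductSpeech presents disagreements.

```python
from difflib import SequenceMatcher


def word_disagreements(text_a, text_b):
    """Return (rater A words, rater B words) for each span that differs.

    An empty string on one side means that rater has no words there.
    """
    words_a, words_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Comparing "the dog ran home" with "the dog run home" yields `[("ran", "run")]`. The list shows where the raters disagree; deciding which version is clinically correct remains the supervisor's call.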
Related pages
SALT Transcript Editor
Edit AI-drafted transcripts with SALT-style codes, review suggestions, and re-analyze language sample metrics instantly.
SALT-Compatible Language Sample Analysis
AI language sample analysis with SALT-style coding, C-units, SI, mazes, grade norms, reliability, and clinical reports.
Clinical Language Sample Reports
Generate IEP-ready language sample reports with MLU, PGU, SI, C-units, maze summaries, norms, and fluency context.
SALT-compatible analysis methodology
Read how ConductSpeech documents conventions, validation, and limitations.
Ready to try it?
Start with a real language sample.
Create an account, upload or review a sample, and see how this feature appears inside the ConductSpeech workflow.