From Subjective to Scalable: How AI is Standardizing Interpreting Quality

“Most interpreting quality is still invisible,” said Bryan Forrester, CEO of Boostlingo, at a presentation during SlatorCon Remote on March 24, 2026. Forrester introduced the topic, pointing out that inconsistent interpreting quality greatly impacts client operations.

Referring specifically to remote interpreting, Forrester highlighted that the traditional model of Quality Assurance (QA) relies on spot-checking: “calls are evaluated through a small sample, not full visibility of thousands of calls.” Reviewing only a fraction of interpreting encounters therefore leaves the vast majority of interactions unmonitored and unmeasured.

According to the CEO, the industry is currently undergoing a fundamental shift from anecdotal evidence to observable data. Historically, human reviewers have provided the gold standard for quality, but this method is fraught with subjectivity and scalability issues.

“You could have two humans listen to the exact same call, but come away with two very different opinions or scores,” Forrester noted. He further explained that human review often falls into the accuracy trap, focusing solely on whether words were translated correctly while ignoring the nuances of conversational flow, turn-taking mechanics, and professional tone.

Covering 100% of interactions allows stakeholders to spot trends in real time rather than isolated errors, instead of receiving a report a week after the fact. Forrester compared this evolution to sales intelligence platforms like Gong, which replaced manager gut feel with data-driven coaching.

Redefining Interpreting Quality Benchmarks

One point Forrester emphasized during the presentation was the distinction between literal accuracy and contextual quality. He shared an example from a reproductive health setting. A provider asks a Spanish-speaking patient if she is using birth control. The patient replies using a verb that literally means “taking care of myself,” but in that specific medical context, it actually means “using contraception.” A word-for-word rendering would be literally accurate yet miss what the patient meant.

Forrester said that, beyond the words themselves, Boostlingo identifies several pillars of a good-quality session, including completeness (message integrity), professionalism (appropriate tone and neutrality), and technical quality (audio clarity, background noise, and connection stability).

Boostlingo’s Chief Product Officer, Brian d’Agostino, demonstrated Assure, the company’s new AI-powered quality layer.

The CPO showed a session-level breakdown where the AI acts as a panel of three independent “judges.” These AI judges score a call across different rubrics to create a mean score, providing a level of consistency that human-only teams struggle to achieve.
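To make the panel idea concrete, here is a minimal Python sketch of how three independent scores could be averaged per rubric and then combined into one session score. It is an illustration only, not Boostlingo's implementation: the rubric names (borrowed from the pillars mentioned above), the 1-to-5 scale, the function name `aggregate_scores`, and the disagreement threshold are all assumptions.

```python
from statistics import mean

# Hypothetical rubric names, loosely based on the pillars named in the talk.
RUBRICS = ("completeness", "professionalism", "technical_quality")

# Assumed threshold: a gap this large between judges triggers human review.
DISAGREEMENT_THRESHOLD = 2


def aggregate_scores(judge_scores):
    """Average each rubric across judges, then average the rubric means.

    judge_scores is a list of dicts, one per judge, mapping rubric -> score
    on an assumed 1-to-5 scale.
    """
    per_rubric = {r: mean(j[r] for j in judge_scores) for r in RUBRICS}
    # Spread = gap between the highest and lowest judge on each rubric.
    spread = {
        r: max(j[r] for j in judge_scores) - min(j[r] for j in judge_scores)
        for r in RUBRICS
    }
    overall = mean(per_rubric.values())
    # Rubrics where the judges disagree strongly are flagged for a human lead.
    needs_review = [r for r in RUBRICS if spread[r] >= DISAGREEMENT_THRESHOLD]
    return per_rubric, overall, needs_review


# Made-up scores from three hypothetical judges for one session.
session = [
    {"completeness": 4, "professionalism": 5, "technical_quality": 3},
    {"completeness": 5, "professionalism": 5, "technical_quality": 3},
    {"completeness": 3, "professionalism": 5, "technical_quality": 3},
]

per_rubric, overall, needs_review = aggregate_scores(session)
print(per_rubric, overall, needs_review)
```

Averaging smooths out the variation of any single judge, while the spread check keeps disagreement visible so a human reviewer can look at exactly those rubrics.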

Evaluators can click on specific bookmarks in a transcript where the AI detected a potential error, allowing a human lead to jump directly to that second of the call to verify the nuance.

D’Agostino also showed the platform’s redaction capabilities, where sensitive Personally Identifiable Information (PII) is automatically scrubbed from audio and text.

Trust and Security

Forrester pointed out that “quality is the unit of trust, and if accuracy breaks, we know adoption will stop.” In his experience, benchmarking AI performance against human standards allows interpreting providers to give clients evidence-based recommendations, such as using AI for low-risk front-desk check-ins, while reserving human experts for complex interactions.

“Quality is going to be based on each individual use case, not by a set of averages across use cases. Being able to say, ‘here are our benchmark scores in Legal,’ for example, and ‘here are our benchmark scores in Education’ is going to be really critical depending on who the customer is,” offered the CEO.

Forrester identified early trends showing a strong correlation between high AI-generated scores and positive customer experiences, while acknowledging that the process requires constant refinement to align machine evaluation with human expectations.

Finally, central to this shift in how interpreting is evaluated is a rigorous approach to data security, where Forrester advocates a belt-tightening philosophy on privacy. The protocol relies on specialized third-party AI to redact sensitive healthcare and personal information from transcripts and summaries, and to ensure compliance with HIPAA and System and Organization Controls 2 (SOC 2, an auditing framework for security and privacy controls).