A Zero-Shot Adaptive Policy for AI Interpreting

Simultaneous machine translation (SiMT) aims to deliver real-time translations while the source-language input, spoken or written, is still arriving. Traditionally, this requires models that control when to “read” more of the source and when to “write” the translation, decisions that rely on intensive model training, complex model designs, and significant computing power.

Now, researchers Libo Zhao, Jing Li, and Ziqian Zeng from Hong Kong Polytechnic University and South China University of Technology have introduced PsFuture, a zero-shot, adaptable read/write policy that enables SiMT models to make real-time translation decisions without additional training.

The researchers said they drew inspiration from human interpreters, who dynamically decide when to listen and when to speak based on evolving contexts. “Interpreters shift from listening to translating upon anticipating that further future words would not impact their current decisions,” they explained.

PsFuture allows translation models to make similar, context-aware decisions, leveraging “the model’s inherent linguistic comprehension and translation proficiency” and eliminating the need for further training.

Simulated Look-Ahead

Rather than waiting for a fixed number of source words before it starts translating, PsFuture lets a model anticipate what is coming next. Using pseudo-future information, a brief simulated “look-ahead” similar to how interpreters anticipate the rest of a sentence, the model assesses whether additional context would change its next translation output. If not, the model proceeds with translating. If so, it waits to “read” further.
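The read/write decision described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the divergence measure (total variation distance), the threshold value, the placeholder tokens, and the names decide_action and toy_model are all assumptions made for illustration. The idea is only to compare the model's next-token prediction with and without a simulated look-ahead and to write when the two barely differ.

```python
from typing import Callable, Dict, Sequence

Distribution = Dict[str, float]
# Hypothetical interface: (source tokens, target tokens so far) -> next-token distribution.
NextTokenFn = Callable[[Sequence[str], Sequence[str]], Distribution]


def total_variation(p: Distribution, q: Distribution) -> float:
    """Total variation distance between two next-token distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def decide_action(
    next_token_dist: NextTokenFn,
    source_prefix: Sequence[str],
    target_prefix: Sequence[str],
    pseudo_future: Sequence[str],
    threshold: float = 0.2,  # assumed value, not taken from the article
) -> str:
    """Return "WRITE" if a simulated look-ahead barely changes the prediction, else "READ"."""
    without_future = next_token_dist(source_prefix, target_prefix)
    with_future = next_token_dist(list(source_prefix) + list(pseudo_future), target_prefix)
    if total_variation(without_future, with_future) <= threshold:
        return "WRITE"
    return "READ"


def toy_model(source: Sequence[str], target: Sequence[str]) -> Distribution:
    """Stand-in for a translation model: confident only after three source tokens."""
    if len(source) >= 3:
        return {"the": 0.9, "a": 0.1}
    return {"the": 0.5, "a": 0.5}


pseudo = ["<pf>", "<pf>"]  # placeholder pseudo-future tokens (assumption)
print(decide_action(toy_model, ["ich"], [], pseudo))                 # READ
print(decide_action(toy_model, ["ich", "habe", "das"], [], pseudo))  # WRITE
```

In the first call the look-ahead changes the toy model's prediction substantially, so the policy keeps reading. In the second, the prediction is already stable, so it writes.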

By using this simulated “look-ahead” information to decide the best timing for each read/write action, PsFuture achieves real-time translation with minimal delay, providing accuracy and adaptability similar to highly trained adaptive models, but without their training requirements, the researchers noted.

“To our knowledge, PsFuture is the only adaptive method in the current SiMT field that offers such flexibility,” they said.

Alongside PsFuture, the researchers developed Prefix-to-Full (P2F) training, a method that prepares offline models for real-time translation tasks. Offline translation models are typically trained to process an entire sentence and therefore struggle with real-time requirements. P2F training helps these models translate sentence fragments, or prefixes, which makes them more suitable for SiMT applications that need quick response times without sacrificing quality.
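As a rough illustration of what training on prefixes could look like, the sketch below builds training pairs by truncating the source sentence at a random point. It assumes, based only on the name “Prefix-to-Full,” that a source prefix is paired with the full target sentence; the function name make_p2f_pair, the uniform choice of cut point, and the example sentences are hypothetical and may differ from the researchers' actual procedure.

```python
import random
from typing import List, Optional, Tuple


def make_p2f_pair(
    source: List[str],
    target: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Pair a random non-empty source prefix with the full target sentence.

    Assumption: the cut point is drawn uniformly; the article does not specify this.
    """
    if not source:
        raise ValueError("source sentence must contain at least one token")
    rng = rng or random.Random()
    cut = rng.randint(1, len(source))  # randint is inclusive on both ends
    return source[:cut], list(target)


src = ["ich", "habe", "das", "buch", "gelesen"]
tgt = ["i", "have", "read", "the", "book"]
prefix, full_target = make_p2f_pair(src, tgt, random.Random(0))
print(prefix, "->", full_target)
```

Training on pairs like these would expose an offline model to incomplete source input, which is the situation it faces at every step of simultaneous translation.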

Slightly More Processing

The researchers compared PsFuture against previous approaches for three language pairs (Chinese-English, German-English, and English-Vietnamese) and reported strong results across all three.

Specifically, they found that PsFuture matches the performance of established adaptive policies that rely on extensive training. They also noted that PsFuture’s zero-shot approach reduces latency between source input and translation output while making SiMT more accessible and computationally efficient for widespread use.

Although promising, PsFuture has a minor trade-off: it requires slightly more processing to handle each read/write decision, which could affect performance on very long texts. However, the researchers emphasize that PsFuture’s reduced need for training and computational resources makes it a “simple yet effective” solution for most SiMT applications.

They also highlighted PsFuture’s versatility, as it can be directly applied to most existing simultaneous translation models. “The PsFuture approach is versatile, compatible with most translation models,” they said. 

The code will soon be available on GitHub.