Phrase CEO Georg Ell on the Arms Race in Language Technology

The CEO shares that enterprise-grade technology for generating multilingual content at scale is still in demand, underscoring the need for robust, enterprise-quality solutions.

The podcast explores new product launches from Phrase, including the introduction of Next GenMT, which combines GPT-4o with Phrase’s own MT engine to enhance translation quality and efficiency.

Georg also discusses Auto LQA (Language Quality Assessment), an AI-driven solution designed to assist linguists, not replace them, and significantly reduce costs and time spent on quality assessment.

The CEO highlights Phrase’s strategic shift towards being a platform rather than a product-centric company with an updated pricing model that allows customers to access a comprehensive suite of capabilities.

Georg concludes by discussing Phrase’s strategic partnerships with major LSPs and the company’s ecosystem-first approach.

Transcript

Florian: Today is Round 2 with Georg Ell. Georg is the CEO of localization language tech platform Phrase. We met up at the recent SlatorCon London and thought we really needed a round two to this because Georg was on the pod last year. Since our last podcast, which was mid-last year, a lot has happened in the industry and, of course, at Phrase. Just catch us up a bit on what you’re seeing in the market and what’s new at Phrase.

Georg: I just think it’s a wonderful time to be in this industry because there’s so much change happening. It depends if you like change or not. It’s always been interesting to me in 20 plus years in technology now that there are people in the technology industry that don’t enjoy change, and I always wonder why they chose this career. But like the acceleration now in so many ways is extraordinary. So some broad themes that we’re seeing. Clearly, it’s the adoption of generative technologies in many different areas, and I’m sure we’re going to talk a lot about that today. That introduces all sorts of changes that are second-order changes, and that includes for us, and I think for others in the industry, thinking about business model and the commercialization of software and services. And I think AI, the people who build and sell and then buy and consume AI-fueled services have some really interesting challenges to grapple with. Perhaps we could expand on that in a little bit. And then, I think coupled with that, and this is also why it’s a wonderful time to be in the industry, is that business leaders are now paying attention to language technology as never before. And with that comes business value-type questions around cost, cost per word, return on investment, time to value, that our industry hasn’t historically been super strong expressing itself in those terms. And so that’s going to be a big theme for us as we make announcements in the next days, and certainly by the time this podcast goes out around how we’re adding value through hyperautomation. It’s a big focus for Phrase on how we add value to both our enterprise and our LSP customers. Maybe a few things we could unpack there.

Florian: 100%. You said that the spotlight, as never before, is on language tech. Do you think it’s almost considered a solved problem in the C-suite? They don’t have hours a day to think about this, the C-suite that is. Do you think at this point in time, they almost consider it a solved problem, generating multilingual content at scale? Do you maybe sense there’s an impatience that there’s a robust enterprise-grade technology still needed to do this?

Georg: Definitely. I thought Martina from Vinted gave a wonderful presentation at your conference where she spoke about this. She talked about the journey from the last couple of years where the expectations from the business were X. When she joined, the delivery was below those expectations and then the delivery’s increased, but the expectations have increased as well, which is really interesting. So I think in the industry broadly, we’re chasing those expectations. As business leaders pick up their phones, they open up a generative model and they get what looks like magic in the palm of their hand. And of course, that magic when you try to, and this is part of the theme of what I’d like to talk to you about today, is the innovation around generative and the AI technologies in general is nonlinear. And a little bit like the models themselves, it’s nondeterministic. So even the journey of innovation to actually turn those technologies into enterprise scale, enterprise quality, enterprise class solutions is quite a journey. You don’t always end up where you thought you were going to. And I think that’s something that business leaders perhaps haven’t fully understood yet. And the best way to show it to them is to just get them to press the regenerate button on their app and then see how many different ways the technology has of responding and say, do you see the implications for consistency here? There’s something in that.

Florian: Let’s go a little bit more theoretical. I want to run a framework by you that I’ve been thinking about, and that’s probably some amalgamation of various podcasts I’ve listened to and various market frameworks I’ve seen. But currently, I would call it the stack idea. You start from the compute, like the NVIDIA, the Google Cloud, the AWS, then you go to the foundation models like OpenAI, like Cohere, like these maybe seven, eight, others. Then you go to the platforms that make this accessible, like Phrase, you go to narrow use case technologies like the 50 Under 50 Startup List we published. Then on top, you have this service layer of many, many hundreds of thousands of language service providers that maybe system integrate this or just provide that added expert layer on top. What do you think about this? Just open question. Does this make sense? Where would you change it or discard it?

Georg: No, I think that way of thinking about it does make sense. What you have to figure out is where to play and how to win. I think this is where there are companies making very large investments in building in-house large language models, and I’m interested to see how that turns out. I think our choice is to be at the application platform layer. So if you look at what we’ve done since we last spoke on the podcast in August, we’ve released the ability to create custom models. We’ve released quality performance scoring that’s all based on these large language models. We’ve released portal concepts that can go enterprise-wide and bring in these different engines, loads of different integrations. It’s about building the enterprise application layer that allows for hyperautomation to take place and I would think about that in three parts. There’s the ingress and egress, so your information in and information out. There’s the project management of what happens in the middle. And then there are the actual machines doing the translating, whether it’s machine translation (MT) or it’s generative, or it’s actually a combination of the two. One of the things that we’re going to announce this week is that we’re now building on Phrase NextMT, our in-house MT engine. We’re releasing Phrase Next GenMT, which is the fusion of GPT-4o and NextMT. So it’s generative and MT together or Next GenMT. You can say it how you like, but it’s Next GenMT. And that is incredibly powerful because that brings in tag handling and all sorts of additional analysis that a pure NMT solution can’t bring, but it brings with it the rigor and the ability to leverage language assets that a generative model struggles with. So you put the two things together, you get something really powerful and high quality. And we’ve done that really fast. I mean, GPT-4o was only released, what, three weeks ago, something like that, so pretty pleased with that. So the stack idea makes sense. 
We play at the application layer. What worries me about people building in-house models is that you’re in an arms race with people with very deep pockets and that concerns me for people making that bet. And I think it was said at your conference that there may only be five or six large foundation-level players in a year or two’s time. And when Mark Zuckerberg comes out and says, I’m going to open source it and put $100 billion into developing this open-source approach, it’s a scorched-earth policy for all the venture-backed foundation models that were hoping to enter that space. I like the approach. I think we’re very much at that application layer.

Florian: So the foundation model is the part that you probably want to stay away from or should have been in five years ago, maybe.

Georg: But you have to raise the kinds of money that Microsoft, Google, OpenAI, Mistral can raise. And there’s like 100 others that have raised a lot of money. There’s only going to be a very small number of winners. So I think actually the way that the cost curve for those models is proceeding and the way the ability curve is proceeding, it’s much more interesting to build on top of those than it is to play in that space. There’s even the argument, particularly with the open-source approach that Zuckerberg is taking, that there’ll be a degree of commoditization in those models. So the value, I think, with the exception of potentially a small handful of hyper-scaled winners, is going to be in the application layer for the thousands of other companies building enterprise solutions.

Florian: You said you’re launching Next GenMT and you’re building it on GPT-4o. So GPT-4o, when I was playing around with it, yes, faster, but not hit-it-out-of-the-park better. So is the speed really key to integrating it on Phrase?

Georg: There are a few interesting things going on there. We have a slide we use internally; I’ll try to represent it with words and hands. You’ve got the solution in the middle, and you’ve got GPT-4o down here in a box, and then a bunch of empty boxes, and we could fill those boxes up with other large language models in the future. And at the top, we have various different capabilities that we’ve built and various different use cases. And again, there are words in some of those boxes and then blanks in others. And we could easily add additional components into this model of building, and we could add additional use cases and build it out. So we’re not building a dependence on GPT-4o, but GPT-4o at the moment, in combination with the assets that we can bring from the customer and then the quality scoring models that we’ve built internally, does provide a solution that is much better than native GPT-4o, or much better than an NMT model from any other provider, so it’s the best quality. Cost plays an important role. Again, fresh memories from your conference, but it was said on stage, to an audience of 170 people, that people are going to need to get used to spending an order of magnitude more to get an LLM-based solution. So I’m going to challenge that a little bit and say, actually, the way that we’ve done it, you’ll spend more than on a pure NMT solution, but it may only be 1.5 times instead of 10 times. So the cost component is also an important component in the choice of model. Now, in the future, you could even use some of the open-source models and maybe address that further as well. But there is some value to be had in combining large language models with neural MT models, with all the workflow that we can bring. But it doesn’t have to be at an order of magnitude more per word as some would say it needs to be.

Florian: Yeah, let me ask you another question then. When we did our survey for the market report that we published a couple of weeks ago, 40% of buyers, kind of a narrow group there, localization buyers, named multilingual text generation as their key language AI requirement. Now, what do you think about that?

Georg: I am pretty excited by this whole challenge, basically. I think that multilingual generation at source actually creates a bigger imperative for solutions like ours rather than a lesser one. So historically, you would have had a human being create source text and then potentially check that source text and send that to a translator and I would say that that is effectively a prompt. And then that prompt went to a translator who then put it into multiple languages. Now, instead, someone is creating a prompt and sending it to a machine, which then creates content in 50 languages immediately. And so what that means is that whereas historically, you might have had a chance to do some degree of quality scoring early on in that process, now you have no chance because the machine can generate it so fast that it’s all going out the door very, very quickly. So this goes to the hyperautomation to hyperscale type journey we’ve been talking about for a year, where I think volumes are going to go through the roof as more and more generative AI, more and more multilingual content at source is used by enterprises to generate content in real time. So if you work backwards from hyperscale, to get the value out of that, what you really want is hyper-personalization, and to have any chance of achieving that, it’s hyperautomation. So there’s actually a journey we haven’t spoken about a lot yet, which is hyperautomation goes to hyper-personalization, goes to hyperscale. I have talked about that five-year vision, but I think there’s a logical sequence here that follows. So we’re still trying to establish and generate value in hyperautomation, but what’s quickly going to follow after that is hyper-personalization and hyperscale. So if you’re generating content at source in 50 different languages, you need a machine to be doing quality scoring incredibly quickly, because if there’s a mistake, it’s going to get propagated quickly. 
And that has implications for the whole industry, because then suddenly the role of the human reviewer changes a little bit from artisanal creator, which you can do if you have time, but if you’re generating 50 languages right out the gate and you want to publish them, you don’t have time anymore. So you need a machine to do quality scoring incredibly effectively. Have a workflow which can potentially put that through multiple cycles of NMT or generative MT. Use whatever the right tools are, and that’s where a selection engine like ours becomes incredibly important. If you achieve the quality score that you need for that type of content, publish, and if you don’t, send it to a human for quick review. It’s a risk reduction process. It’s a different kind of human review. That means a different kind of CAT tool, more lightweight, fast and responsive. So we have opportunities there as an industry, and I think that’s true at Phrase as well. And there’s a different business model, right? You don’t want to bill per minute. You want to bill per minute not spent. You want to pay for speed. You want to pay for speed and quality and it’s a very different business model to pay for speed and quality than it is to pay per word or per minute, which has historically been the model. So I think the technologies that are in the industry, Phrase is an enabling technology for a lot of this, but the drive of AI to accelerate how fast all of this is happening is going to challenge the business models of service providers quite substantially.
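The score-then-route workflow Georg describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Phrase’s implementation: the function name `route_segment`, the `engines` list, the `score` callback, and the `threshold` value are all hypothetical, and real quality scores would come from a trained model rather than a stub.

```python
from typing import Callable, List, Tuple


def route_segment(
    source: str,
    engines: List[Callable[[str], str]],   # hypothetical MT / generative engines
    score: Callable[[str, str], float],    # hypothetical quality scorer
    threshold: float,                      # hypothetical publish threshold
) -> Tuple[str, str, float]:
    """Try each engine in turn and publish the first output that meets the threshold.

    Returns (decision, translation, score), where decision is "publish"
    or "human_review". If no engine reaches the threshold, the
    best-scoring candidate is handed to a human reviewer.
    """
    best_translation, best_score = "", float("-inf")
    for translate in engines:
        candidate = translate(source)
        candidate_score = score(source, candidate)
        # Remember the best candidate so far in case nothing passes.
        if candidate_score > best_score:
            best_translation, best_score = candidate, candidate_score
        # Good enough: publish without human involvement.
        if candidate_score >= threshold:
            return "publish", candidate, candidate_score
    return "human_review", best_translation, best_score
```

The design choice here mirrors the "risk reduction" framing in the conversation: the human only sees segments that no automated cycle could bring above the bar, and starts from the best candidate rather than from scratch.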

Florian: It is already challenging it. I can tell you that from what we’re hearing. There’s a major search for new business models, changed business models, new business lines definitely going on. All right, so you mentioned GPT-4o and tag handling with GPT-4o. I remember tag handling from my localization days. Tell us more about this.

Georg: Yeah, so tag handling is an interesting challenge. Our AI research team, Dr. Alon Lavie and his team, who joined us last year, figured out a reliable way of persuading GPT-4o to respect inline tags and then put them back in the right place in the target. That’s something that LLMs are not generally brilliant at without very careful instruction, but we’ve done significant testing and found something that really works. It involves embedding a regex into the prompt, plus a little extra magic, is what the research team tells me. And we’re excited to have people use it, try it, and give us some feedback on it.
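Phrase’s actual prompt and regex are not described in the conversation, so the sketch below swaps in a simpler, widely used technique: mask inline tags with numbered placeholders before translation, check that the translated text still contains the same placeholders, then restore the tags. The tag pattern and all function names are assumptions chosen for illustration, and the pattern only covers simple XML/HTML-style tags.

```python
import re

# Hypothetical pattern for simple inline tags such as <b>, </b>, <br/>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")
PLACEHOLDER_PATTERN = re.compile(r"\{(\d+)\}")


def mask_tags(text):
    """Replace each inline tag with a numbered placeholder.

    Returns the masked text and the list of original tags in order.
    """
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"  # produces {1}, {2}, ...

    return TAG_PATTERN.sub(_swap, text), tags


def placeholders_preserved(masked_source, masked_target):
    """Check that the target contains exactly the placeholders of the source."""
    find = lambda s: sorted(PLACEHOLDER_PATTERN.findall(s))
    return find(masked_source) == find(masked_target)


def unmask_tags(text, tags):
    """Put the original tags back in place of their numbered placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1)) - 1], text)
```

In use, the masked source would be sent to the translation engine, and a target that fails `placeholders_preserved` would be retried or flagged rather than published with broken markup.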

Florian: I think a lot of people are excited about anything that involves fixing tags and automating tags, so good work there. In our pod about a year ago, you mentioned that large companies may have complex requirements for localization despite limited localization maturity. Now my question is, are you seeing that LLMs and AI are slowly enabling them to leapfrog the previously more mature companies or not at all? Is the delta to the early adopters getting even bigger?

Georg: Yeah. It is really interesting. I think that AI in almost every software category, has the potential to start to bring software and services more closely together. Because historically, software has been built as a tool to make someone who uses that tool more productive in a particular job or function. When you add AI into the mixture, it actually allows that software to start to bring a whole solution. It’s like the difference between using Google to get 10 blue links versus asking ChatGPT, and it gives you the answer. So one is like a software tool, one is essentially a service. And so I was at a dinner in London recently, and someone from Index Ventures stood up and she said this. She said AI actually allows software to start to fuse with the whole of the services market, which is even bigger and total addressable than software alone. So I do think that large language models are going to allow tech forward-thinking LSPs and then tech platforms that infuse AI all the way throughout to combine to offer some really interesting, complete solutions. So one of the things we’ve always been very focused on at Phrase is to be very strategically neutral, and we have relationships with all of the LSPs, so we’re not an LSP ourselves. We never will be a services company, but we can build solutions together with LSPs that are potentially somewhat unique to that LSP. And then the same offer is there for another LSP. We can build a solution that’s a little bit unique there and that could be like a commercial proposition where it actually combines services of theirs with software from ours. It could be a technology proposition because it’s a platform and it’s extensible. We could have an LSP build a particular model that we then plug in or they surface through our technology, right? And that could be a unique offering from that LSP, but powered through the Phrase platform out to the mutual end customers. So I think that’s pretty interesting. 
I do think large language models are not yet at a state of complete Holy Grail, solves all problems, like general level. We know from our own experience recently that actually you can aim really, really high with your expectations around building on top of LLMs and then discover that the reality of the innovation journey is that it’s not straight, and you end up in a place that’s a little different from where you thought you were going to. It still adds a lot of value, but perhaps some things go much faster than you think and some things go a little more slowly. That’s maybe something we could come on to.

Florian: You mentioned before building a complex solution and offering, maybe even together with an LSP, for a specific end customer. That’s building it, then going out and selling it. But more often than not, you’re actually responding to an RFP. Are you seeing any changes in the RFPs in the last 12 months for language AI, translation AI platforms? Any specific features or capabilities? Are the RFPs written well or still somewhat traditional? Or maybe even pie in the sky, they want everything and everything for a super low price. What’s the trend there?

Georg: Procurement always wants that, but I think what we’re seeing is that the curve, if you like, of RFPs and the ambition level that we’re seeing is definitely shifting along the adoption curve. We’re seeing more and more specific requirements on how to achieve time, cost, and quality improvements, specifically through the use of large language models and AI. What are you doing in this area? Some of the trends that we’re seeing are around not just picking a technology, but actually, how are we going to trust the output? How are you going to help us to make sure that we can trust that your use of these technologies is going to be consistently at the quality levels and predictably the types of outcomes that we want? What controls can you help us build into this? How can I make sure that my data is actually clean enough and high quality enough to feed into this whole process? Throughout the process, how can I best leverage AI to help in quality assessment and automated quality improvement? So these are the trends that we’re seeing, and I think maybe this is a good moment for me to talk about some of the announcements that we’re making this week because they fall very much in line with these things. What I would say to finish the point on your RFP question is that whilst the bulk of the curve has moved to the right on adoption, what we are seeing is some customers that are really stretching and shooting much further ahead than others. So you’re starting to see that, for some enterprises, the curve is stretched out to the right in terms of adoption. Some really ambitious enterprises are doing some very interesting things at that end. And we have a couple of examples that are really fun to talk about there. And I love those customers. Every year, we have one or two customers that really push our capabilities to the next level. And of course, whenever we have a customer like that, that raises the high watermark for our capabilities that now all of the customers can benefit from. 
But you always have these one or two that you associate with, yeah, in ’22, it was them. In ’23, it was them. In ’24, it’s these ones. I love working with these customers and I’ve told them in that language. It’s really great.

Florian: You mentioned a few new things that you are about to launch. I’m not sure if they’ve launched already by the time this podcast comes out. But yeah, what’s in the pipeline? What’s coming?

Georg: Building on the RFP question, so it’s all about hyperautomation, which is a theme from Phrase over the last few quarters. We’re taking those next steps, enabling the next step in hyperautomation. And a lot of the time for people, it’s about, practically, what can I do? So not pie in the sky, very practical things. So I’ll try and do it in that way. So the first one is data cleansing. So historically, companies have spent tens or even hundreds of thousands, once a year, twice a year, to have someone manually clean all of their language assets. We’ve now used our custom AI training model to build a solution for that. And you can have savings of up to 85% on cost and 96% on time to very quickly clean and curate your language assets. We’re calling it Automated Asset Curation. So that is, if you like, the beginning of a really healthy, productive cycle, because with clean language assets, then you can get into leveraging those assets in different ways. So our portal that we announced earlier this year now, as of this week, has single sign-on, so you can go enterprise-wide with that portal. What differentiates that portal from everything else is it leverages the language assets that you’ve just cleaned. So now you can produce a custom portal for each department if you want to, give them a custom model with clean language assets so you can have one for legal, one for marketing, and eliminate the cost of shadow localization across your organization and do it with almost zero effort in the localization team, so massive impact factor. I would then say Next GenMT, we’ve talked a bit about that, and that’s really exciting because that’s actually fusing Phrase NextMT with OpenAI’s GPT-4o to create what we call Next GenMT. And it’s actually a framework, so the framework itself has these different components. In the middle, we have the LLM product that we’re providing to customers. So the first operation of that is essentially translate. 
We’ve got prompt building blocks like terminology and tag handling at the top. And then underneath, we’ve got that foundation model, GPT-4o. Now, in the future, that could be other foundation models. It could be other fundamental language operations, translate just being the first one, and it could be other prompt building blocks. We’re building a whole framework for putting more and more on top of that. So Phrase Next GenMT for translation is really just the first embodiment of that. And then the journey that I’ve alluded to a couple of times is around what we’re calling Auto LQA, so language quality assessment is key for a lot of customers. Some of our customers spend millions and millions on this. In fact, I met with a few at a conference earlier this year, and I asked, is it six figures or seven you spend on LQA? And they all smiled and they said, no, it’s not seven. So it’s expensive, and as a result, it’s also something customers only tend to do on a fraction of their content. So wouldn’t it be great if you could machine automate that? So we aimed really, really high. We aimed and built a solution that we thought would actually be able to effectively do LQA for a customer. You don’t always end up exactly where you thought that you might. And we learned some really interesting things along the way. So what we’ve got at the moment and what we’ll release into general availability on Wednesday is Auto LQA, but it’s more of a copilot experience working alongside the linguist. It’s not intended to replace the linguist, certainly not in this edition. And there are still tremendous cost and time savings there. So we have one customer who actually used it extensively and achieved an 80% cost reduction. I think many others will find a range that starts at around 30% and then rises in terms of cost reduction. Time reduction, again, is around 80 to 90%, probably in more cases. 
One of the things that’s really interesting here is the tension between human, like post-edit distance, linguist-driven quality assessment and a machine-driven quality assessment. I’ll start from this perspective. MIT did a study asking groups of linguists to relate or correlate the quality assessment of pieces of content. And the correlation factor between human beings doing quality assessment is 0.48. In other words, humans mostly disagree on how to translate something. So you ask, can linguists translate Shakespeare, you’ll get 10 different answers. Now, the correlation coefficient between machines and linguists, actually, between our Auto LQA and the linguists, is 0.46. So it’s incredibly close. It’s almost like a human, 0.48 versus 0.46. But the feedback we were getting is, because the correlation is too low, we can’t trust it. It can’t replace. So I would say the way to think about our Auto LQA at the moment is like it’s a 0.46 correlation human. In other words, it’s basically a human-like reviewer that another human can then review as opposed to a machine that replaces the first human. Now, we hope in time, of course, to drive that quality up, and maybe the ratio of human to machine will tilt more in favor of the machine, I guess, over time. That’s inevitable. But we certainly found, and this is where the humility has to come in, that we’ve had an amazing run of innovation over the last few years, and I’m beating the drum for all that innovation. I’m proud of the team. I’m still really proud of them, but we learned something here. It reminded me of something someone told me years ago, which is, Georg, if I mow my own lawn, the lines can be a bit squiggly, but if I pay you to mow my lawn, the lines have to be straight. And so the standard for us to build an Auto LQA has to be better than human, not equivalent to human. So we’re at equivalent to human, and that’s a copilot, and we’re launching that this week. We’re working to better than human. 
That’ll obviously be something that we work on over the weeks and months to come.
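For readers who want to see what an agreement figure like 0.48 or 0.46 measures, here is a minimal sketch of a correlation coefficient computed between two reviewers’ scores for the same segments. The conversation does not say which statistic was used, so Pearson correlation is an assumption here, and the function name and any scores passed to it are purely illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of quality scores.

    A value near 1 means the two reviewers rank segments the same way,
    a value near 0 means their scores are essentially unrelated.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one reviewer gives constant scores")
    return covariance / sqrt(var_x * var_y)
```

Comparing `pearson(linguist_a, linguist_b)` with `pearson(linguist_a, machine)` on the same segments is one way to frame the human-versus-machine comparison Georg describes.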

Florian: The copilot is built for, let’s say, in-house localization, language services teams, but also maybe for LSPs working for those?

Georg: Very much so, yeah. We’ve been running an early access program with this. We’ve had more than 25 organizations be part of that program. It’s a white-glove, high-touch kind of program. If there are people listening to this who want to participate in that kind of engagement with us, we’d love to do more of that. I will say the concept of the early access program is something we’re committed to doing more of because it’s been so valuable. And I think in this world where things are changing so quickly, it’s actually very valuable for the companies as well to be part of an early access program before we say it’s general availability. So we had 25 companies, enterprises, and LSPs in there. It was probably about half and half, enterprises and LSPs, and the LSPs have found huge value in that as well. And if you can reduce the cost per word, if you can make it all happen faster, then you can increase the surface area, and now you can run LQA on much more than a tiny fraction of your content, so that’s all been pretty exciting. There’s more in our release as well on analytics, on automating over-the-air updates, and other cool things. Zendesk integration for help desk we just released, and so on and so forth.

Florian: The early access is a double-edged sword sometimes. It’s like, for example, Google is launching all of these things and I’m like, where can I? Oh, no, it’s early access. It’s not actually launched yet. Oh, this was only at the CEO presentation, but I can’t get it yet.

Georg: Yeah, that’s super frustrating. I tried to play with some of their tools myself, and I gave up pretty quickly because it wasn’t easy to do. I mean, ours is tilted a bit more to the enterprise, the larger customers. But if you’re an LSP customer or an enterprise customer, just speak to your customer success manager, and he or she can bring you into the program. It’s very easy.

Florian: And pricing all of this is really tricky and you guys launched pricing. I went to the website a couple of weeks ago and saw that the whole pricing page was redone. I don’t know. Tell us more.

Georg: I’d love to, thank you. This is part of our movement towards being a platform company. Historically, we sold product A, product B, product C, much like most people still do today. Then we started to think of ourselves more as an integrated suite of products, and then changed the language again to more of a platform of products as we built workflow tools that went across the top. Increasingly now, we think of ourselves as a flexible, composable type of platform. So a nice example actually is, I mean, hyperautomation generally requires composability. You need to be able to pull the capabilities from thing A and thing B, put them together, put a quality score in there, have some logic, maybe take a few cycles, a few loops, and then pump it out via an integration. And all of that requires this composability and the ability for a customer’s content to traverse all of the capabilities of our platform without an impediment. And what was happening was that our pricing was actually an impediment because a customer of thing A would have to go through a procurement cycle to buy thing B, and that was a problem. So starting in September last year for new business customers, and then from January this year at renewal for existing customers, and then from April this year for self-service customers, so it’s been really phased, we have been migrating to this platform approach. So you can no longer buy Phrase TMS as a standalone product from Phrase. You cannot buy it. You can only buy the Phrase platform and the Phrase platform has inside it every capability that we have built at a certain volume. And that’s a fixed price, fixed volume across all of the capabilities. And then if you want to go beyond that because you have a requirement to do lots of managed words and strings or process lots of words through machine translation or lots of words through a TMS, you can then buy those additional capabilities on a volume basis. 
And this has been incredibly well received by customers because there’s a lot of flexibility in it. You can choose the base level, T-shirt size, small, medium, large, extra large, and then you can go beyond it on the volume basis. For our enterprise customers, there’s even more flexibility top-end because that base level license, we can flex it up and down, so it depends how you like to buy. If you’re an enterprise that says, I only want to talk to my CFO once a year, put more in the base license and have less variability. If you’re someone that says, I only want to pay for what I need, put a bit less in the base license, a bit more variability, and maybe then you come back throughout the year as and when you need it. So for our enterprise customers, we have that extra level of flexibility as well. And so the reaction has been positive, but critically, what it allows us to do is to build solutions right across our platform to drive this hyperautomation agenda, which is the most important thing.
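The structure Georg describes, a fixed base license covering a fixed volume plus volume-based purchases beyond it, can be illustrated with a small sketch. Everything in it is an assumption for illustration: the `PlatformPlan` class, the plan names, and all fees, volumes, and rates are invented and are not Phrase’s actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPlan:
    """Hypothetical plan: a fixed fee covering a fixed volume, plus overage."""
    name: str
    base_fee: float               # fixed annual price (invented figure)
    included_units: int           # volume covered by the base fee (invented figure)
    overage_per_thousand: float   # price per 1,000 units beyond that (invented figure)


def annual_cost(plan: PlatformPlan, units_used: int) -> float:
    """Return the base fee plus charges for usage beyond the included volume."""
    extra_units = max(0, units_used - plan.included_units)
    return plan.base_fee + (extra_units / 1000) * plan.overage_per_thousand


# Two hypothetical ways to buy: more in the base license for predictability,
# or less in the base license with more variable spend.
predictable = PlatformPlan("large base", 100_000.0, 10_000_000, 12.0)
flexible = PlatformPlan("small base", 40_000.0, 3_000_000, 12.0)

for used in (2_000_000, 8_000_000, 12_000_000):
    print(used, annual_cost(predictable, used), annual_cost(flexible, used))
```

With these invented numbers, the smaller base license is cheaper at low usage, while the larger one keeps the bill flat until the included volume is exceeded, which is the trade-off between predictability and paying only for what you need.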

Florian: Speaking of right across the platform, and there’s so much multimodal, I don’t know, lack of a better word, technologies coming out in language AI. We had this demo from OpenAI where they had this 20-second translation thing, speech-to-speech, a live speech-to-speech translation, big use case. Obviously, Phrase, very heavy text enterprise, big scale, but are you looking at these multimodal things as well? Is that something that may be included at some point in the offering?

Georg: Yeah, we’re looking pretty closely at it, actually. In the Phrase Strings element of the platform, we already have effectively a headless CMS for digital web and applications. How we extend that to be more multimodal is part of the current strategic thinking, and we’re pretty actively pursuing some opportunities there. Interestingly, I think in the end words are actually quite canonical. Two years ago, it required millions and millions in venture capital finance and an extraordinary business to do basic text-to-speech or speech-to-text, and now it’s an API call to a hyperscaler. If you can build a canonical library of prompts, a bit like we talked about earlier, where you’ve got an LLM in the middle and you build IP around your prompts, around the connection, the workflow, and the quality scoring, then the words themselves, actually capturing the prompts, could be the canonical thing. So we need to explore this area a little bit: whether we actually need to store and process images and sounds and videos, or whether you can just call an API and the words are the canonical item. It’s something I’m talking to a lot of people about. But yeah, the internet is going multimodal and so will we.

Florian: Now, we did talk about cost of machine translation or quality, machine translation versus LLM. I just want to revisit this because I also saw a post from Konstantin from Intento recently that he’s basically saying, hey, we’re getting this all wrong. It’s on par now. So more NMT and LLM, it’s reached parity also in terms of cost. Others at SlatorCon, for example, have argued it’s still super expensive. And yes, we’re exploring it, but there’s a major cost element. Can we just revisit your take? Because you’re obviously a big conduit for that big consumer, big router, as it were for this.

Georg: That’s right, so we’ve seen Konstantin saying it’s par. On stage at SlatorCon, we heard an order of magnitude, so 10X higher. Our Phrase Next GenMT is going to be around about 1.5X, so about 50% higher and not 1000% higher. So much, much closer to Konstantin’s view of it. And I think what you need to do is prove not just that there is a basic translation going on, but that it fits into a general workflow with quality scoring, leveraging language assets, and all the other things that you need to do to actually have an end result, a solution, and not just solve a partial component of it. But I do think that’s why we bet on not being a foundation model company, not building our own LLM, but leveraging the cost curve that’s happening there.

Florian: How hard is it to be foundation model agnostic? Is that just an architectural decision in software and it’s not a big deal?

Georg: I talked to our team about this the other day. We’re doing most of our work on OpenAI at the moment, and I wanted to make sure that we weren’t building a single-vendor dependency. What if someone else comes along with the next best thing? How quickly could we switch? I think the overwhelming majority of our architecture is reusable. You need to fine-tune the prompts, of course. At the edges, there are things that we would need to redo, and that’s, I guess, where some of our IP would come in as well. But the majority of the architecture is pretty much plug and play.
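One common way to keep an architecture model-agnostic in the sense Georg describes is to put every model behind a shared interface and keep the prompts, the part that needs tuning per model, in a separate table. The sketch below is a generic illustration under that assumption, not a description of Phrase’s code; `TextModel`, `StubModelA`, `StubModelB`, `PROMPTS`, and `translate` are hypothetical names, and the stub classes stand in for real vendor API calls.

```python
from typing import Protocol


class TextModel(Protocol):
    """Interface that the rest of the workflow depends on."""
    def complete(self, prompt: str) -> str: ...


class StubModelA:
    """Stand-in for one vendor; a real adapter would call that vendor's API."""
    def complete(self, prompt: str) -> str:
        return f"[A] {prompt}"


class StubModelB:
    """Stand-in for a second vendor with the same interface."""
    def complete(self, prompt: str) -> str:
        return f"[B] {prompt}"


# The prompts are the model-specific part that would need re-tuning.
PROMPTS = {
    "a": "Translate into {target}: {text}",
    "b": "You are a translator. Target language: {target}. Text: {text}",
}
MODELS: dict[str, TextModel] = {"a": StubModelA(), "b": StubModelB()}


def translate(text: str, target: str, backend: str) -> str:
    """Workflow code stays the same; only the backend key changes."""
    prompt = PROMPTS[backend].format(text=text, target=target)
    return MODELS[backend].complete(prompt)


print(translate("Hello", "de", "a"))
print(translate("Hello", "de", "b"))
```

Swapping vendors then means adding one adapter class and one prompt entry, while quality scoring and workflow logic built on `translate` are untouched.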

Florian: Something that was, for some strange reason, much bigger five years ago than it is now as a topic in the core localization industry, is the user interface for the linguist that’s interacting with all this great technology. It used to be interactive and you’re typing and it’s changing as you go. I don’t know, what’s the status on the Phrase side and where do you see this going?

Georg: I think this is where it goes back to the changing role of the linguist. There’s still a need for a highly creative, artisanal, crafted experience for many linguists today. However, as content gets generated by machines, multilingual at source, the implications of that are simply that a tsunami wave of volume is now going to get pushed through these systems, and it will simply be impossible for the same approach to be peanut buttered across all of that volume. So instead, there’s going to have to be more machine quality assessment in there. We’re going to have to continue to improve the accuracy of that, of course, to be better than human and drive that Pearson correlation coefficient up from 0.46 to something higher, so humans start to really trust the machine. But in doing that, we’re also going to need to change the editing tools for linguists as well. So linguists who are doing that type of work, that rapid response, high-quality risk reduction work, as opposed to pay per minute type work or pay per word type work, that requires a different interface, and I think that’s something we will be working on.
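The Pearson correlation coefficient Georg refers to measures how closely machine quality scores track human quality judgments for the same segments, on a scale from -1 to 1. A minimal sketch of the calculation follows; the `pearson` helper and the sample scores are assumptions for illustration and are not Phrase data.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Invented example: machine and human quality scores for five segments.
machine_scores = [0.9, 0.4, 0.7, 0.2, 0.8]
human_scores = [0.8, 0.5, 0.9, 0.3, 0.6]
print(round(pearson(machine_scores, human_scores), 2))
```

A value near 1 means the machine ranks segments much as human reviewers do; a value like the 0.46 mentioned above indicates only moderate agreement.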

Florian: Must be a big challenge because you’re reading, you’re processing, it’s basically becoming like a reading job, right? And then you have to do some interventions from time to time, but if the UI is terrible, then you’re probably burning out after a week.

Georg: It needs a different kind of UI, a different business model. It’ll need maybe new players, new vendors to shake things up a bit as well, as well as some of the traditional vendors or the existing vendors, let’s say pivoting. There’ll be some who pivot, some who don’t, and there’ll be some new players. I’m excited to work with all of them.

Florian: In our market report, speech-to-text was also a major use case mentioned by the buyers. I was wondering if this is something that you’re also looking at. Again, you can get these things via API now, but there’s these platforms that used to be quite niche for speech-to-text and they’re now grabbing a bit more mainstream attention and maybe also then have some automated translation feature afterwards. But do you see this as something you want to maybe include as well or it’s an add-on feature? Is it core, non-core?

Georg: It’s an API call, isn’t it? It’s amazing how you don’t have to build this technology in-house anymore because you can just call that and then incorporate it into the workflow. So we already do this with some of our customers. We already do it with some of our partners. CaptionHub is a great example where we partner very closely with them, and they will do elements of speech-to-text and even text back to speech, and then do the translation in the Phrase platform. And it’s the combination of these things that allows much of the very high-quality, very high-scale work to take place. So, yeah, I definitely believe in all of that. I think it’s part of going multimodal, but it’s also partly why I think the prompt itself and the text remain canonical, because it’s the text ultimately which forms the basis of the API call.

Florian: Now, speaking of partnerships, you announced a partnership with Lionbridge, but that was like 12, 18 months ago. I mean, should we expect more such high-profile, large LSP partnerships being announced?

Georg: I mean, so we already work with seven of the top 10 global LSPs, and then hundreds and hundreds of LSPs in the long-tail below that, if you like. So we did announce a big partnership with Lionbridge. Around the same time, we actually announced a partnership with Acolad as well. We’re really excited about some of the work we’re doing right now with people like Argos, Vistatec, Welocalize, Acclaro, these really tech-enabled forward-thinking LSPs who I think do get it, if you like. They understand the change. We continue to work alongside people like Acolad as well, like I said, another big announcement for last year. But I think what’s really changed for us is the fact that we’re now avowedly calling ourselves and trying to think about ourselves as a platform on top of which people build solutions and having those conversations very directly with partners to say, what do you need? We want to be a platform that you build on. This ecosystem-first strategy. So that’s a little different from the Memsource of days gone by, which was a little bit more of a product company. And so we recognize that the largest customers, all customers, medium and large customers of ours, we are part of an ecosystem. We are not the whole solution and when we get a thank you message from a customer, we will be one of three or four companies that are mentioned on that message. There’ll be a number of services partners. So what I’m really excited by is when we can do a press release, which is what I call quadratic, because I don’t know what comes after tripartite. A quadratic press release is a customer, ourselves, a technology partner, and a services partner. So if we can have an LSP, a content partner, ourselves and a customer in a press release or in a case study, showing how all four of those things work together, that is exciting to me. And part of the reason that’s exciting to me is I don’t think we in this industry just do localization anymore. 
I think we do digital transformation. So if we’re all just competing for the little bit of the localization budget, we need to think more broadly, so localization, tech and services, and then more broadly into digital transformation. And that’s where if you can partner with big SIs, they can also bring all of us into much bigger projects than the ones we’ve been in before. So that’s part of the excitement for me. We are very ecosystem-driven.

Florian: As a conference organizer, what you just said sounds like a panel to me, a quadratic panel. That’s four people on a panel, a moderator that barely has to intervene. We should make that happen. Hey, so DeepL kind of shook things up with their $2 billion valuation. That firmly put a close to the funding winter for kind of the core translation, localization industry. How do you think this will change or affect maybe the core translation, localization industry’s perception in the eyes of VC, PE firms, growth funds?

Georg: I think it’s helpful because it’s shone a light on the industry. It’s attracted headlines and interest. I think Index Ventures are a high-quality investor. It provides useful evidence for investor confidence that the TAM, the total addressable market, here is perhaps bigger than we’ve been thinking that it is. I think that requires a belief that volumes are going to significantly grow, and that requires a degree of venture capital style belief, as opposed to just a straight line from where we are today. But I do believe that that is true. I think that the pace of the evolution in generative AI, we’re talking about multilingual generation at source. But if I’m right in the video I put out a year ago, and actually the next time we all visit nike.com, the web is just going to be real-time and streamed down in a hyper-personalized way, then you’re going to get hyperscale. And if we all believe in that, then the addressable market here, back to the digital transformation point, is way bigger than we’ve all been thinking it is for a little while, and I believe that’s true. So we, at Phrase, we don’t think of ourselves as playing in a TMS market that’s gone. We think of ourselves as playing in a tech and services market because with AI, all of services becomes addressable as well. Not on our own, but in an ecosystem, but that wider ecosystem is available to all of us. So the 28 billion that I think the recent Slator report estimated it at, we see all of that as addressable to us and our partners. So I think it shone a light on that. We’re both a customer of and a vendor to DeepL, so we’re excited to see them succeed. They build a very high quality product. It’s one of the engines that we aggregate in our Phrase Language AI solution, and it’s one of the most popular engines. Our auto select will choose it when it provides the best results.
It’ll choose something else if something else is better for a different content type and language pair, but DeepL does really well, and I’m thrilled for them.
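The auto select behavior Georg outlines, picking whichever engine does best for a given content type and language pair, can be sketched as a lookup over per-engine quality estimates. This is a simplified illustration under stated assumptions, not Phrase’s actual selection logic; the `QUALITY` table, the engine names, the scores, and the `auto_select` function are all hypothetical.

```python
# Hypothetical quality estimates per (language pair, content type) and engine.
QUALITY: dict[tuple[str, str], dict[str, float]] = {
    ("en-de", "marketing"): {"engine_a": 0.81, "engine_b": 0.77},
    ("en-de", "legal"): {"engine_a": 0.70, "engine_b": 0.84},
    ("en-ja", "marketing"): {"engine_a": 0.66, "engine_c": 0.73},
}


def auto_select(language_pair: str, content_type: str,
                default: str = "engine_a") -> str:
    """Return the engine with the best estimate, or a default if none is known."""
    scores = QUALITY.get((language_pair, content_type))
    if not scores:
        return default
    return max(scores, key=lambda engine: scores[engine])


print(auto_select("en-de", "marketing"))
print(auto_select("en-de", "legal"))
print(auto_select("fr-es", "support"))
```

The same engine can win for one combination and lose for another, which matches the point that the choice depends on content type and language pair.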

Florian: All right. Before we wrap, anything you want to share about the rest of the year? Any other announcements to make or pre-announce?

Georg: We have a really regular rhythm. We call it a cadence here of innovation. Every three months we do it. We wrap up a bunch of stuff we’ve been doing. So June is the one that’s coming up this week. September will be the next one. December will be the one after that, so we’re very regular with how we do these things. But I think we will see more of these early access programs, more customer outreach, not just in the discovery phase, but actually in the hands-on phase, which I think is a feature of development in the Gen AI world, so we’re actually pretty excited for that. I think a lot of customers really enjoy that co-creation process as well. So nothing to pre-announce for September. It’s too soon to do that. We’ll just do June for now. But yeah, thanks for letting me share some thoughts on what’s happening in the industry and our announcements this week.
