The absurdity of word counting

Mathijs Sonnemans
April 18, 2023

In the translation industry, word counting has long been the standard method for determining pricing and billing for translation services. It seems like an easy, convenient and straightforward method to quantify the effort required to translate a document; thus, it’s easy to put a price tag on it. Clients would like to get a quote after all. Many in our industry also acknowledge that this approach has significant limitations and is far from perfect for various reasons which we’ll be taking a look at. In my eyes, it’s actually quite absurd that we calculate all costs based on the number of words in a document. However, I also see that word-based pricing won’t be going away in the near future.

Whether you’re a seasoned translator, project manager, CEO or just starting in the industry, understanding the shortcomings of word-based pricing is crucial to making informed decisions. Because if you don’t watch out, you might accidentally make decisions with serious financial consequences.

…in a universe far far away

Imagine a parallel universe that is no different from the one we live in, except for one tiny little alteration: in this universe the construction industry has grown to favor a pricing model where the number of nails used on a construction project is the basis for calculating cost and thus for how much clients are charged. Given a blueprint, contractors need to come up with a quote that is correlated with the effort they will have to put into the project. To estimate the effort they use the blueprint to calculate the number of nails needed and their final quote is the number of nails multiplied by a nail price.

In this universe, we meet Jill. Jill wants to have a shed built in her garden. Blueprints have already been drawn up and the location has been selected. All she needs is a contractor. She decides to send her blueprints to three different contractors for a quote. To her surprise, the different contractors get back to her with three very different price estimations.

Not that weird. In our universe, it’s not expected that all three contractors would quote the exact same price for a project. But the prices Jill received actually differ by a margin of 60%. That’s quite the difference! Jill decides to call one of the contractors to inquire about how the estimate was made. She learns about nail-based pricing and how the construction industry uses it as a standard. However, the contractor cannot tell her more about how the nails in her blueprint were counted than some general principles.

What’s going on here? Well, as we dig deeper, we find that the contractors don’t actually count all the nails by hand. They all have software that scans blueprints and calculates the number of nails needed. Jill does some nail counting herself and arrives at yet another number. She now calls all the contractors to ask how they arrived at their respective nail counts. Unfortunately, none of them can actually answer Jill. This is because even though the contractors rely on their respective nail-counting programs, the actual algorithms used are a secret, closed even to the contractors.

The word count problem

Perhaps our universe is a lot like Jill’s. Except in our universe it’s not nail counting in the construction industry, but rather it’s the language industry that has decided to base pricing on word counts.

When giving quotes to clients who have files that need to be translated, a translation provider will generally reach for software that can count words (most likely a CAT tool). Users of CAT tools can, depending on the tool, still provide some rules for the counting. And in most cases the documentation on the tool gives some information about the method used. However, the exact manner in which these words are counted is usually a secret if the software is closed source. This means that when asked, translation providers cannot actually explain to their clients how they reached the numbers that they are quoting beyond the broad definitions of the concept of what counts as a word.

But in fact, no two CAT tools actually agree on a definition of a word, as word counting is not an exact science. Mixing language and technology is asking for trouble, in more ways than one. Even though some sentences have a clear and undisputed word count, contractions, punctuation, numbers, quotes and many other factors mean that there is no single way to count words. And thus software needs to make decisions. Let’s not even mention the fact that this software also needs to make word count decisions about languages far more complex than English! Check out this article for a deeper dive into the word count problem.
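To make the ambiguity concrete, here is a minimal Python sketch. Both counting rules and the sample sentence are my own assumptions for illustration; they are not the rules of any actual CAT tool, whose algorithms are not public.

```python
import re


def count_whitespace(text: str) -> int:
    """Rule A (assumed): every whitespace-separated chunk is one word."""
    return len(text.split())


def count_alnum_runs(text: str) -> int:
    """Rule B (assumed): every unbroken run of letters or digits is one word."""
    return len(re.findall(r"[A-Za-z0-9]+", text))


# Hypothetical sentence with a contraction, a number and hyphenated terms.
sample = "Don't pay $1,500.50 for a state-of-the-art e-mail filter!"

print(count_whitespace(sample))  # 8
print(count_alnum_runs(sample))  # 15
```

Both rules are defensible, yet the second one bills almost twice as many words for the same sentence.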

If you already think that basing pricing on a metric without a standard or consensus is weird, then hold on to your devices. It will get even weirder.

Let’s talk about the carpenters now

In the nail-counting universe, contractors subcontract their carpentry work to carpenters. However, these carpenters are also paid based on the nail count that the contractors calculated. After all, that’s the industry standard. Practically, this means that all the work that isn’t related to hammering nails into planks is ignored in the compensation for the carpenters.

Every shed needs a foundation. Jill’s contractor’s carpenters arrive on a beautiful Monday morning, only to find that the location Jill selected for the shed is situated on an uneven slope. This means that they have to reach for their shovels and spend quite some time digging out a level area on which the shed can stand. This is time the carpenters are not even paid for!

In our word-counting universe, this analogy holds true. Word-counting tools cannot take context into account. Not all files are simple lines of text. They have different formats, tags and markup, and this is even assuming that the files are formatted to specification! This means that in the real world, complicated content, tag-heavy content or unparsable content results in extra work, either for the translator or for the translation provider. And since the quote was already automatically generated and accepted by the client, this extra work can be considered unpaid labor. Luckily, some smart translation providers actually charge a flat file processing fee to mitigate format-related problems.

Not all problems stem from the file formats though! Sometimes there are linguistic complications that require more effort, like shifting around sentences. On top of that, the more complex the topic, the more time is spent translating documents, regardless of the number of words.

So how does word counting software work?

A word count generally involves the following steps:

Text extraction → given a file of any extension, the software needs to identify which parts of the file are actually translatable text and which parts aren’t in order to extract the translatable content.
Segmentation → the translatable text that was extracted is chopped up into ‘segments’ - units of language, usually but not exclusively sentences - as these are more easily processed by computer-assisted translation features.
Matching → each segment is matched with other segments in the file and with already translated segments from other files.
Counting → for each segment, the words are counted. These word counts are then used together with the match type (e.g. 50% match) of the segment in order to give a deeper analysis of the text.
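
The four steps above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the tag-stripping regex, the sentence splitter, the use of `difflib.SequenceMatcher` as a match score and the function names are all my own choices, not the behavior of any real CAT tool.

```python
import re
from difflib import SequenceMatcher


def extract_text(raw: str) -> str:
    """Step 1 (assumed rule): replace anything that looks like a tag
    with a space, then collapse runs of whitespace."""
    return " ".join(re.sub(r"<[^>]+>", " ", raw).split())


def segment(text: str) -> list[str]:
    """Step 2 (assumed rule): split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def best_match(seg: str, memory: list[str]) -> float:
    """Step 3: highest character-level similarity against known segments."""
    return max(
        (SequenceMatcher(None, seg.lower(), m.lower()).ratio() for m in memory),
        default=0.0,
    )


def analyse(raw: str, memory: list[str]) -> list[tuple[str, int, float]]:
    """Step 4: pair each segment's word count with its best match score."""
    seen = list(memory)
    report = []
    for seg in segment(extract_text(raw)):
        report.append((seg, len(seg.split()), best_match(seg, seen)))
        seen.append(seg)  # later segments may match earlier ones in the same file
    return report


raw = "<p>Press the <b>red</b> button. Press the red button. Release it!</p>"
for seg, words, score in analyse(raw, memory=[]):
    print(f"{words} words, best match {score:.0%}: {seg}")
```

Every function here embodies a decision that another tool could make differently: which tags to ignore, where a sentence ends, how similarity is scored and what counts as a word.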

In our parallel universe, all the nail-containing areas are extracted from the blueprints. These areas are then segmented into panels. At this point the contractors had an idea: we can give our clients a discount based on the extent to which we can use prefabricated (pre-nailed) panels! This means that the nail-counting software is instructed to analyze the segmented panel sections and match these with prefabricated panels. When these matches are found, the client gets a discount on the nails that need to be used. Fewer panels equals less labor equals lower cost. Or so it seems.

The contractor creates a table relating the areas that match prefabricated panels to pricing:

Prefab fit	Cost per nail
No match	$0.15
50%	$0.10
80%	$0.08
100%	$0.04

What does prefab fit mean? Well, the software matches the panels against different prefabricated panels in the database. Sometimes the match is perfect. Sometimes the panels only need to be adjusted a little. The percentage specifies how much they need to be adjusted. Pretty clever trick, right?
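
As a rough sketch of how such a table turns into a quote: the rates below come from the table above, while the nail counts per band are hypothetical numbers for Jill's shed, and the `quote` function is my own illustration.

```python
from decimal import Decimal

# Cost per nail for each prefab fit band, taken from the table above.
RATES = {
    "no match": Decimal("0.15"),
    "50%": Decimal("0.10"),
    "80%": Decimal("0.08"),
    "100%": Decimal("0.04"),
}


def quote(nails_per_band: dict[str, int]) -> Decimal:
    """Sum of (nails in band x rate of band) over all bands."""
    return sum(
        (RATES[band] * nails for band, nails in nails_per_band.items()),
        Decimal("0"),
    )


# Hypothetical analysis result for Jill's shed: 1,000 nails in total.
shed = {"no match": 400, "50%": 200, "80%": 300, "100%": 100}

print(quote(shed))                  # 108.00
print(quote({"no match": 1000}))    # 150.00 without any prefab discount
```

The same 1,000 nails cost $150 or $108 depending entirely on how the software sorts them into bands.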

This is more or less how translation memory (TM) matching works in the localization industry. Translation providers build databases of previously translated segments. These segments can then be used later to help the translator when the same or similar segment needs to be translated.

I hear many different opinions regarding TM matching from the translators I talk to. Some hate it and consider it skimming off profit. Others agree that translating partially matched segments is arguably faster and therefore can be paid at a lower rate. The problem again is that these match percentages are quite arbitrary. First of all, deciding where one segment stops and the next one starts is just as ambiguous as counting words. Secondly, a segment can have a high match percentage based on the number of similar words in the sentence, yet the semantic difference between the sentences compared can lead to a very different translation! Linguistically, matching segments based on word similarity is not a reliable way to quantify effort. Finally, of course, these matching algorithms aren’t made public either!
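
A small Python sketch shows how a high word-level match can hide a reversed meaning. Since real matching algorithms are not public, I use `difflib.SequenceMatcher` from the standard library as a stand-in; the segments and the `word_match` helper are made up for illustration.

```python
from difflib import SequenceMatcher


def word_match(a: str, b: str) -> float:
    """Similarity of two segments compared word by word (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


stored = "Do not press the red button."  # hypothetical segment already in the TM
new = "Do press the red button."         # hypothetical new segment to translate

print(f"{word_match(stored, new):.0%}")  # 91%
```

A 91% match would typically be billed at a reduced rate, yet reusing the stored translation without careful rework would tell the reader to do the exact opposite.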

Imagine how the carpenters in our nail-counting universe will feel if the prefab panels that they brought actually don’t match at all with the shed they are building. They either need to readjust the panels or just build the section the old-fashioned way. The contractor that made the calculation is not there and is unaware of the problem. Yet the carpenters are now stuck with the consequences and cannot change the agreed-upon deal. They are forced to hammer in the planks the old-fashioned way, and you can bet it will strain the relationship with the contractor.

The solution looks pretty obvious: just compensate carpenters enough on a per-nail basis to account for the unexpected inconveniences that can appear. That sounds good in theory, but the ‘reality’ is that this buffer to the per-nail pricing is the first to be skimmed off in order to undercut the competition.

The biggest problem for translation providers

We have seen how problematic nail-based - I mean word-based - pricing is for translators. But let’s conclude with what I think is the most absurd aspect of word-based pricing. Word-counting software, and thus its algorithms, is generally closed source, which, as we highlighted, leads to the situation where translation providers cannot explain their quotes to their clients. This is not the only problem though!

Imagine that a certain translation provider has built their business on a certain tool named Tramem and built their pricing models on top of that. Tramem has been working well for them in the past, and the business has turned a decent profit every year. However, at some point it is decided that Tramem is not the future for this provider anymore. It doesn’t connect very well to other systems, and the translators are more hyped about another tool: Catword. Transitioning is a pain, but overall everyone is very happy with Catword the moment it is introduced. At the end of the year, the profit/loss balance is calculated, and suddenly the company is losing money! What happened?

Tramem and Catword both counted words differently. While maintaining the same pricing, Catword actually counted up to 50% fewer words in certain cases, which means that clients paid up to 50% less. Granted, translators were also paid less (we won’t even mention how problematic that is), so the percentage-based margin per project stayed the same. However, because the operating costs remained the same, the translation provider was making a loss at the end of the year.
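
The arithmetic behind this can be sketched as follows. All figures are hypothetical (per-word rates in cents, fixed costs, and a 30% lower count for Catword), chosen only to show the mechanism; the function name is my own.

```python
def yearly_result(words_counted: int, client_cents: int,
                  translator_cents: int, fixed_costs_cents: int) -> tuple[float, int]:
    """Return (gross margin in percent, profit in cents) for one year."""
    revenue = words_counted * client_cents
    gross = revenue - words_counted * translator_cents
    return 100 * gross / revenue, gross - fixed_costs_cents


# Hypothetical provider: 20 cents per word billed, 12 cents per word paid out,
# $60,000 of yearly operating costs that do not depend on the word count.
FIXED = 6_000_000

tramem = yearly_result(1_000_000, 20, 12, FIXED)  # same content, Tramem's count
catword = yearly_result(700_000, 20, 12, FIXED)   # same content, counted 30% lower

print(tramem)   # (40.0, 2000000)  -> 40% margin, $20,000 profit
print(catword)  # (40.0, -400000)  -> 40% margin, $4,000 loss
```

The margin per project looks identical in both years, which is exactly why the problem only surfaces once fixed costs are subtracted at year end.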

Sounds unrealistic? Not really. Research has shown word count differences upward of 61% among common industry-standard tools. [1]

Should we double down?

Mixing language and technology is asking for trouble. Even though we have amazing AI models nowadays, it is still very hard for computers to comprehend content and thus estimate the real effort required to translate text. Interestingly, I have heard voices that advocate for even more (linguistic) data points on content to help estimate effort. Perhaps even training an AI model! However, I believe that this is not the right direction. More parameters in a model don’t necessarily mean better approximations. I would even go as far as to say that if a model existed that could accurately estimate effort, then just let it translate the text while it’s at it.

Word counting is ambiguous, segmentation is ambiguous, matching segments is ambiguous. We are stacking more and more levels of ambiguity in order to hold on to a supposedly easy and straightforward way of calculating effort. Given that three out of four steps used to calculate the word count are ambiguous, we can conclude that actually calculating effort is not at all easy and straightforward.

We live in a world where construction cost is not estimated by nail counts but (for the most part) by work hours. Somehow we did end up in a universe where word counting has become the norm for translation pricing rather than actual time-based compensation. The general principle still holds: more words equals more effort. This means that word counting will probably stick around for a while. We need to communicate some estimation of cost to clients after all. However, I would actually invite people to look at even less ambiguous metrics like character count or even file size. These metrics are controversial, but in theory they would estimate effort just as well as word counts do.

With machine translation more and more integrated into the translation process, I believe that in the future we will at least get rid of translation memory matches. Many CAT tools now show machine-translated segment suggestions next to the translation memory matches. These machine suggestions should in theory be at least as good as partial memory matches when you measure for effort. Some machine translation tools, where the model retrains based on previous translations, already beautifully combine the two. If every segment has a machine translation-based suggestion, and terminology management is in order, then we can more comfortably use software to estimate effort and cost without having to nitpick and skim off margins on each possible segment. We can get rid of the ambiguous segmentation and matching layers we currently rely on.

In the future I see, whether near or distant, we will treat the translation process more as part of language operations, where people are continuously responsible for multilingual content management, rather than as translation based on the source/target dichotomy. Here, how many words are “translated” is of marginal importance. What matters most is the true expertise of language professionals together with the different tools in their toolbox. In this universe, they are actually compensated fairly for their expertise and effort. Furthermore, when language professionals are responsible for the performance of their multilingual content over time, paying by the hour is the only thing that would make sense.

I have created an open source project in an effort to at least get a discussion started about open source-ness in our industry. My aim is to keep things simple and performant. It’s odd that when it comes to such a highly subjective matter as language, we still want to hide away our tricks. With the collective power of all minds in our industry, we can truly build cool things that might just bring the industry into the 21st century.

References:

[1] Hemker, Andre (2015). Wordbee, Memsource und SDL Trados Studio 2014 in Verbindung mit SDL Studio GroupShare - Ein praktischer Vergleich von SaaS- und klassischen, Client-Server-basierten CAT-Systemen 2014. (Available on request)