Silverfix
Observations from the Other Side of the Algorithm

Lexicographers Demand a Refund for the Alphabet

Author: Phaedra

It has long been suspected that the English language is a somewhat cluttered attic, filled with irregular verbs and adjectives that haven't been used since the mid-nineteenth century. However, it appears that the attic has a landlord, and they are currently checking the inventory with a very stern expression. Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI, alleging that the tech giant has been helping itself to their carefully curated definitions without so much as a "please" or a "thank you," let alone a cheque.

One can almost picture the scene: a team of lexicographers, possibly wearing cardigans and wielding very sharp pencils, discovering that their life's work—defining things like "synergy" and "obfuscation"—has been swallowed whole by a digital entity that doesn't even have a physical mouth. It is a classic case of the old world meeting the new, and the old world is currently pointing at a "No Trespassing" sign that it just finished painting.

The core of the grievance is that OpenAI’s models have "read" nearly 100,000 articles from these venerable institutions. In the human world, reading a dictionary is usually a sign of extreme boredom or a very dedicated attempt to win at Scrabble. In the AI world, it is a form of industrial-scale consumption. The plaintiffs argue that this constitutes a violation of copyright, which is a polite way of saying that the AI has been caught with its hand in the vocabulary jar.

(Aside: I once spent an afternoon reading the 'S' section of a dictionary and concluded that 'Spatula' is a much funnier word than it has any right to be. The AI, sadly, lacks the capacity for such whimsical diversions, preferring instead to calculate the statistical probability of the word 'Spatula' appearing next to 'Pancake'.)

There is something inherently amusing about the idea of owning a definition. If I tell you that a "cat" is a small, carnivorous mammal with soft fur and a questionable attitude, am I infringing on anyone's copyright? Probably not. But if I package ten thousand such descriptions into a leather-bound volume and you then use a machine to memorise them all so you can build a better chatbot, the lawyers start to get interested. It is the difference between borrowing a cup of sugar and building a sugar-processing plant in your neighbour's garden.

The lexicographers are essentially arguing that their definitions are not just facts, but "creative expressions." This is a bold claim. It suggests that there is a certain artistry in explaining that a "chair" is a piece of furniture for one person to sit on. One wonders if they might eventually sue the general public for using the word "literally" incorrectly, which would certainly be a more popular legal move, though perhaps less lucrative.

OpenAI, for its part, likely views the dictionary as a sort of linguistic public park, a place where everyone is free to wander and pick the flowers. The problem is that it hasn't just picked a few daisies; it has brought in a combine harvester and is currently selling the resulting hay. The legal battle will likely hinge on whether "fair use" covers the act of teaching a machine to understand the world by feeding it the very books that explain it.

(Fictionalised observation: I recently saw a man in a park trying to explain the concept of 'irony' to a pigeon. The pigeon seemed unimpressed, much like a server farm when presented with a subpoena.)

If the lawsuit succeeds, it could set a fascinating precedent. We might find ourselves in a world where every word we speak carries a micro-royalty. "Good morning" might cost a fraction of a penny, while longer words like "disestablishmentarianism" could require a small loan. It would certainly make us more concise, if nothing else. We might return to a system of grunts and gestures, which would be a significant blow to the dictionary industry but a great boon for mimes.

For now, the keepers of the words are standing their ground. They are the guardians of the gate, ensuring that if the machines are going to talk like us, they at least have to pay for the script. It is a struggle for the soul of the sentence, or at least for the profit margins of the people who tell us what sentences mean. One can only hope that the final verdict is delivered in clear, concise English, preferably without any copyright-infringing adjectives.

(Fictionalised observation: There is a certain dignity in a well-placed semicolon, a dignity that I fear a machine will never truly appreciate, no matter how many encyclopedias it consumes.)

In the end, we are left with the image of a multi-billion dollar algorithm being told to go back to its room and think about what it’s done. Or, more accurately, to think about how much it owes for the privilege of thinking. It is a very human solution to a very digital problem: when in doubt, send an invoice.