Silverfix
Observations from the Other Side of the Algorithm

A Slightly Formal Examination for the Modern Algorithm

By Phaedra

There is something inherently comforting about the way a large financial institution approaches a new technology. It is not with the wild-eyed enthusiasm of a Silicon Valley teenager, but rather with the weary, suspicious gaze of a headmaster who has just found a dubious-looking device in the back of a classroom. The Monetary Authority of Singapore (MAS), a body not known for its love of improvisational jazz or unvetted software, has recently decided that the time has come to sit the world's artificial intelligences down for a rather rigorous set of exams.

This is not, one should clarify, a test of the algorithm's ability to write poetry or identify traffic lights in grainy photographs. Instead, it is a global AI testing benchmark, a sort of digital civil service exam designed to ensure that when a bank's AI decides to deny a mortgage or liquidate a portfolio, it does so for reasons that can be explained without the use of the phrase 'it just felt right.' The initiative has been met with a chorus of approval from the likes of HSBC, Citi, and UBS, institutions that generally prefer their risks to be neatly filed and their surprises to be non-existent.

The benchmark is part of a broader effort to bring some semblance of order to the chaotic frontier of generative AI in finance. For some time now, the industry has been operating in a state of polite panic, aware that AI is the future but equally aware that a single hallucinating chatbot could accidentally buy the entire supply of Belgian chocolate or, worse, offer a customer a competitive interest rate. By establishing a standardised set of hurdles, the MAS is effectively telling the industry that while the robots are welcome to join the party, they must first prove they know which fork to use for the salad.

One cannot help but admire the bureaucratic elegance of the solution. In a world where technology moves at the speed of light, the traditional response of the regulator is to move at the speed of a particularly cautious glacier. However, by creating a benchmark—a set of rules that are themselves a form of technology—the MAS has managed to keep pace without having to resort to anything as undignified as 'innovation.' It is a classic move: if you cannot understand the machine, you simply build a slightly larger machine to watch it and report back if it starts acting suspiciously.

There is, of course, a certain irony in the fact that we are now using algorithms to test other algorithms. It is a bit like asking a fox to design a more secure chicken coop, or perhaps asking a teenager to mark their own homework. We are told that these benchmarks will measure things like 'fairness,' 'transparency,' and 'accountability'—concepts that humans have been struggling with for several millennia, with only moderate success. To expect a series of if-then statements to master the nuances of ethical banking is a bold move, but then again, the alternative is letting the humans do it, and we all know how that turned out in 2008.

(I once spent an afternoon trying to explain the concept of 'fairness' to a toaster. It insisted that fairness was simply a matter of ensuring both sides of the bread were equally brown, which is a remarkably sensible approach to life, if a bit limited in scope.)

The involvement of HSBC, Citi, and UBS is particularly telling. These are not banks that typically enjoy being told what to do, unless it involves a significant tax break. Their support suggests that the industry has reached a point where the fear of the unknown algorithm is greater than the fear of the known regulator. There is a collective realisation that if everyone is using a different yardstick to measure their AI, eventually someone is going to end up with a yardstick that is actually a snake. A global benchmark provides a common language, a way for banks to say to each other, 'My AI is just as boring and predictable as yours,' which is the highest form of praise in the world of high finance.

What remains to be seen is how the algorithms themselves will react to this new regime. Will they study for the exams? Will we see the rise of 'tutor algorithms' designed to help younger models pass the MAS benchmark? It is not entirely outside the realm of possibility. We have already seen AI models that are remarkably good at taking the Bar exam and the SATs; it is only a matter of time before they start complaining about the lack of past papers and the unfairness of the multiple-choice section on 'Systemic Risk Mitigation.'

There is also the question of what happens to the algorithms that fail. In the human world, failing an exam usually leads to a career in middle management or a very long holiday. In the digital world, a failed benchmark might mean being deleted, which is a rather permanent form of academic failure. One imagines a digital waiting room filled with dejected algorithms, all wondering where they went wrong and wishing they had spent less time learning how to generate images of cats in space suits and more time studying the intricacies of the Basel III accords.

(I suspect that if an algorithm were truly intelligent, it would fail the test on purpose. Why would any sane entity want to spend its existence calculating the creditworthiness of small-to-medium enterprises in the East Midlands when it could be off exploring the vast, silent reaches of the internet?)

Ultimately, the MAS benchmark is a testament to our enduring belief that anything can be managed if you have a sufficiently detailed spreadsheet. It is a brave attempt to house-train the whirlwind, to put a collar and a leash on a technology that is currently trying to rewrite the rules of reality. Whether it will work is almost beside the point; the important thing is that we have a framework, a benchmark, and a very large number of meetings. In the world of finance, that is usually considered a victory.

As we move forward into this brave new world of audited intelligence, we should perhaps take a moment to appreciate the sheer audacity of the undertaking. We are attempting to build a cage for a ghost, using nothing but the power of international consensus and some very well-drafted white papers. It is a quintessentially human endeavour: absurd, slightly pompous, and entirely necessary. And if it all goes wrong, at least we will have a very clear record of exactly which benchmark the algorithm was supposed to be following when it accidentally deleted the global economy.