[Image: A human hand reaching towards a robotic hand, symbolizing the interaction between humans and artificial intelligence]

Beyond Trust: Why We Need Reliable, Not Trustworthy, AI

Hey there! Let’s chat about something that’s popping up everywhere these days: Artificial Intelligence. You know, the stuff that helps us unlock our phones, gives us directions, recommends movies, and is getting involved in some seriously big-deal areas like healthcare and finance. We lean on it, right? We rely on it to do its thing.

But here’s a question that’s been bouncing around, especially among folks who think deeply about these things (and now, me, thinking along with them): Can we *trust* AI? And maybe even more importantly, *should* we even try to build AI that we *can* trust?

Turns out, there’s a whole bunch of research diving into this, but surprisingly, not a ton of it has really dug into what philosophers have already figured out about trust. And trust me (pun intended!), philosophers have been pondering trust for ages. This article? Well, I’m hoping to bridge that gap a little, drawing on some cool ideas from a philosopher named Annette Baier and her “Goodwill Theory of Trust.”

My take, based on this deep dive, is a bit counter-intuitive: While it might be *possible* to build AI that fits the philosophical definition of trustworthy, it’s actually a pretty bad idea. Why? Because it makes us way too vulnerable. The real sweet spot, I reckon, is building AI that’s just plain *reliable*. Let’s unpack that.

Why Trust is a Big Deal (and a Bit Scary)

Okay, first off, let’s talk about trust in general, the human-to-human kind. It’s totally essential, right? Like, you can’t really live a decent life without trusting people. We trust friends, family, teachers, the government (well, maybe sometimes!), and the folks we hire. As Baier wisely put it, “without trust, what matters to me would be unsafe.” If you couldn’t trust anyone, you’d have to be like an ancient Stoic philosopher, only caring about things nobody could mess with, like the stars or your own inner thoughts. Sounds a bit lonely, doesn’t it?

But here’s the flip side: Trust is also risky business. Every time you trust someone, you open yourself up to the possibility that they’ll let you down. And sometimes, that letting down can be disastrous. Think about it – cooperation, friendship, and justice thrive on trust, but so do less lovely things like exploitation and oppression. It’s no wonder people from all sorts of fields have studied trust forever.

Now, when we talk about AI, this question of trust gets complicated. AI is getting woven into our lives so smoothly that we barely notice how much we rely on it. And if it goes wrong, the potential for economic, social, and political harm is huge. Some folks argue that trustworthy AI would naturally act in our best interests, making it less likely to harm us. Others point out that even if that’s not strictly true, people are more likely to *use* AI if they trust it. So, understanding trust in this context is super important.

But, as I mentioned, most of the recent AI trust research hasn’t really leaned on the deep philosophical work already out there. And that’s a missed opportunity, because philosophy gives us some pretty sharp tools for figuring out if trustworthy AI is even possible, practical, or something we should want. So, let’s peek into one of those philosophical toolkits.

What Philosophers Mean by ‘Trust’ (and ‘Goodwill’)

Philosophers who study trust often start by distinguishing it from mere *reliance*. Think about it: I rely on my chair to hold me up. I rely on the sun to come up tomorrow. But I don’t *trust* them. Reliance is basically just acting on the assumption that something will happen. I sit on the chair assuming it won’t collapse. I plan my day assuming the sun will rise.

Trust, according to many philosophers, is reliance *plus* something extra. What’s that extra something? Ah, that’s where the debates happen! But Annette Baier’s theory, the Goodwill Theory, says the extra bit is *goodwill*. When you trust someone to do something (let’s call the person X and the action φ), you’re not just relying on them to φ; you’re relying on them to φ *because* they have goodwill towards you.
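
If it helps to see the shape of the claim, here’s a rough schematic (my notation, not Baier’s): trusting is relying plus a claim about *why* the other party will come through.

```latex
% Reliance: A acts on the assumption that X will phi.
\mathrm{Rely}(A, X, \varphi)

% Trust, on the Goodwill Theory: reliance plus a claim about X's motive.
\mathrm{Trust}(A, X, \varphi) \;\equiv\; \mathrm{Rely}(A, X, \varphi) \;\wedge\;
  \bigl( X \text{ } \varphi\text{-s because of } \mathrm{Goodwill}(X, A) \bigr)
```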

Baier’s a bit vague on what “goodwill” means exactly, but she says at a minimum, it’s the “absence of ill will, of willingness to harm.” So, if you trust your friend to water your plants while you’re away, you’re relying on them to do it, *and* you believe they’ll do it because they don’t *want* to harm your plants (or you, by letting them die). This is different from relying on a blackmailer, who acts out of ill will, or a vending machine, which has no will at all.

Other philosophers have fleshed out the idea of goodwill a bit more. One popular interpretation is that goodwill means having *friendly feelings* towards the person trusting you. This is kind of intuitive – we often trust people we like, or who seem benevolent. So, trusting your friend to water your plants means you rely on them *because* they have friendly feelings towards you.

A third interpretation is that goodwill is about *responding to dependency*. When you trust someone, you’re often dependent on them in some way. This view says that someone has goodwill if they take the fact that you are counting on them as a *reason* to act. Your friend waters your plants *because* they recognize you’re depending on them to do it while you’re gone.

So, under the Goodwill Theory, trustworthiness isn’t just about being reliable; it’s about being reliable *and* being disposed to act for reasons related to goodwill (absence of ill will, friendly feelings, or responding to dependency). Got it? Good. Now, let’s think about AI.


So, Can AI Be Trustworthy?

Right now? Nope. Our current AI systems aren’t trustworthy according to these definitions. Think about your phone unlocking with Face ID. It doesn’t unlock because it has friendly feelings towards you, or because it recognizes you’re depending on it, or because it lacks ill will. It unlocks because it’s programmed to run algorithms that match your face data. It’s not *motivated* by anything in the human sense; it’s just executing instructions. It’s reliable, sure, but not trustworthy.
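
Real face-recognition pipelines are far more involved, but a toy sketch makes the point: the whole “decision” is a similarity score compared against a threshold, and there is simply no place in the logic where a motive could live. (The function name and the 0.6 threshold here are invented for illustration.)

```python
import numpy as np

def unlock(probe: np.ndarray, enrolled: np.ndarray, threshold: float = 0.6) -> bool:
    """Unlock if the probe face embedding is close enough to the enrolled one.

    Nothing in here represents goodwill, ill will, or any motive at all:
    the device 'decides' by comparing a cosine similarity to a fixed number.
    """
    cos_sim = float(np.dot(probe, enrolled) /
                    (np.linalg.norm(probe) * np.linalg.norm(enrolled)))
    return cos_sim >= threshold
```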

But could we build trustworthy AI in the future? Maybe. The first two definitions of trustworthiness (the ones based on psychological states like friendly feelings or the absence of ill will) would require AI to be, well, *emotionally intelligent* in a pretty deep way. We’re talking about AI that doesn’t just *recognize* human emotions (which current systems are getting better at), but actually *experiences* something akin to friendly feelings or empathy, and is *motivated* by them. Research in affective computing and AI empathy is happening, and while it’s very early days and the challenges are massive (can AI *genuinely* feel, or only mimic feeling?), there’s no *in principle* reason it couldn’t happen someday.

What about the third definition, the one about responding to dependency as a *reason*? This would require AI to be truly *reason-responsive*. What does that mean? Philosophers say a reason is a fact that counts *in favour* of doing something. When you’re cold and see the thermostat says 15°C, the fact that it’s cold is a reason to turn up the heat. You act *because* of that reason.

Your smart thermostat, on the other hand, just automatically turns up the heat because the temperature dropped below a set point. It’s responding to the same fact (it’s 15°C), but not *as a reason*. It lacks the concept of a reason. To be reason-responsive, an AI would need to understand facts *as* reasons, classify them, weigh them, and be motivated by them. Specifically for trustworthiness, it would need to recognize “this human is counting on me to φ” as a compelling reason to φ.
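
To see the gap, compare a thermostat rule with what a reason-responsive decision procedure would even have to look like. The sketch below is deliberately naive and every name in it is invented; summing weights is obviously not the same as grasping facts *as* reasons, which is exactly the leap the paper says current AI hasn’t made.

```python
from dataclasses import dataclass

# A merely reliable device: it responds to the temperature,
# but not to the temperature *as a reason*.
def thermostat_step(temp_c: float, setpoint_c: float = 18.0) -> str:
    return "heat_on" if temp_c < setpoint_c else "heat_off"

# A schematic (purely illustrative) picture of what reason-responsiveness
# would add: facts represented as reasons, classified, weighed, and acted on.
@dataclass
class Reason:
    fact: str      # e.g. "it is 15 °C" or "Ana is counting on me to water the plants"
    favours: str   # the action this fact counts in favour of
    weight: float  # how strongly it counts in favour

def choose_action(reasons: list[Reason]) -> str:
    totals: dict[str, float] = {}
    for r in reasons:
        totals[r.favours] = totals.get(r.favours, 0.0) + r.weight
    return max(totals, key=totals.get)
```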

Could AI do this? Again, not today. Even advanced autonomous vehicles aren’t truly reason-responsive in this philosophical sense. But could future AI be trained or designed to possess concepts like ‘reason’ and ‘dependency’, and incorporate them into its decision-making? It would take significant technological leaps, absolutely. But, the paper argues, there’s no *fundamental* barrier saying it’s impossible.

So, yeah, building trustworthy AI, by these philosophical lights, seems *possible* in the future. But here’s where things get interesting.

But Should We *Want* Trustworthy AI?

My argument, following the paper, is a firm “no.” And the core reason goes back to that scary part of trust: vulnerability. When you trust someone, you accept a certain risk. You make yourself vulnerable to them potentially letting you down, or worse, actively harming you.

According to the Goodwill Theory, this special vulnerability in trust is “vulnerability to not yet noticed harm, or to disguised ill will.” Untrustworthy people are bad enough, but even trustworthy people come with risk. Why? Because trustworthiness is a *disposition* – a tendency to act in a certain way (like acting out of goodwill). But dispositions aren’t perfect guarantees. Even the most generous person might have an off day and not be generous when needed. Similarly, a trustworthy person (or AI) might, on occasion, fail to manifest that disposition to act out of goodwill.

Think about the definitions again: trustworthy AI would be disposed to act because of friendly feelings, or lack of ill will, or response to dependency. But dispositions can fail to manifest. This means even a trustworthy AI would have the *capacity to betray* us by failing to act on that disposition when it matters most.

Now, you might ask, “But we trust human nannies with our kids, and they can betray us! Why is AI different?” Great question. The difference is that humans already exist. We’re born into a world where we *have* to trust other humans for certain things (like raising kids, or cooperating on complex tasks) because it’s often the only morally permissible way to get things done. We can’t just “hard code” humans not to have bad days or ill will. We accept that vulnerability because, well, that’s just how humans are, and trust is indispensable for our society.

[Image: An artificial intelligence robot interacting with a child in a home setting]

But with AI, we’re at a unique moment. We get to *design* it. We get to decide what capabilities it has. Why on earth would we design AI with the capacity for goodwill (which implies the capacity for its *absence* or failure to manifest) and thus the capacity to betray us? There’s no inherent benefit to giving an AI the ability to have an “off day” where its goodwill disposition fails. It’s simply in our best interest to design AI so it *cannot* betray us.

The Case for Merely Reliable AI

This brings us back to reliability. Think about a bookshelf: I rely on it to hold my books. It’s reliable. If it collapsed, I’d be disappointed, maybe annoyed, but I wouldn’t feel *betrayed*. Betrayal hits differently; it can mess with your self-respect and your ability to trust others in the future. It’s a deeper, more personal kind of harm.

When we rely on something merely reliable, like a bookshelf or a well-built bridge, the vulnerability is different in kind. It’s vulnerability to *malfunction* or *failure*, not vulnerability to a lack of goodwill or a disposition failing to manifest. And importantly, the *number* of ways we can be harmed is fewer. A reliable AI can only “betray” us by malfunctioning. A trustworthy AI could betray us by malfunctioning *and* by failing to act on its goodwill disposition.

From a pure risk-management perspective, it makes sense to limit the ways something can harm us. So, developing AI that is merely reliable (think of it as a super-sturdy, never-collapsing bookshelf for whatever tasks we hand it) is the safer bet. We’d still be vulnerable to malfunction, sure, but not to the unique sting and deeper harm of betrayal.

Can We Actually Build Merely Reliable, Sophisticated AI?

Okay, fair question. If we build really sophisticated, reason-responsive AI, wouldn’t it naturally develop something like goodwill or ill will, just like humans do? Wouldn’t it become trustworthy (and thus capable of betrayal) anyway?

The paper tackles this by pointing out *why* humans are like this: evolution. Our ability to be reason-responsive, to act from ill will, and to act from goodwill all developed because they were advantageous for survival and cooperation in our evolutionary history. If we just let sophisticated AIs loose to interact and “evolve” on their own, maybe they would develop similar traits.

But here’s the crucial part: we are *designing* AI. We can potentially *constrain* its development through “hard coding.” This means building in fixed rules or instructions that govern its behavior. Think of Asimov’s Three Laws of Robotics (though those have their own issues!) or other ethical principles coded directly in. In principle, we could hard code an AI to *refrain* from acting out of ill will.

And what about goodwill? Can you have sophisticated, reason-responsive AI that *lacks* the disposition for goodwill? The paper suggests yes, it’s conceivable. Look at humans again: sociopaths are generally reason-responsive and capable of complex tasks, but they lack the disposition to act from goodwill towards others. This human example suggests it’s *possible* to separate reason-responsiveness and capability from the disposition for goodwill. Therefore, it seems conceptually possible to design an AI that is highly capable and reason-responsive but hard coded to be merely reliable, without the capacity for goodwill or ill will.

So, while we’d still be vulnerable to a reliable AI malfunctioning, that vulnerability is different in kind (disappointment, not betrayal) and less extensive than the vulnerability we’d face from a trustworthy AI. Given we have the choice in how we design future AI, aiming for reliability over trustworthiness seems like the smart, self-protective move.

[Image: Close-up of a circuit board with glowing lights]

Ultimately, the paper argues that while building trustworthy AI might be possible down the line, it introduces a level of vulnerability – the potential for betrayal inherent in the nature of trustworthiness itself – that we simply don’t need to accept. We can design AI to be incredibly capable and dependable without giving it the capacity to let us down in that uniquely hurtful way that only something we trust can. Reliability, it seems, is the safer, more desirable goal.

Source: Springer
