Brunswick, ME • (207) 245-1010 • contact@johnzblack.com
The meeting invite looked fine. The CFO’s face was on the screen. The voice matched. Then the executive told the employee to wire the money.
That’s what happened to Arup. Attackers put a high-fidelity deepfake CFO on a live video call alongside cloned colleagues. A finance employee, believing it was a legitimate internal meeting, authorized 15 wire transfers totaling $25.6 million in a single day. Nobody knew until the employee followed up with the company’s actual headquarters.
The deepfake wasn’t the genius move. The sequence was. Don’t race detection tools. Don’t crack hardened systems. Just make the right person believe they’re acting on legitimate orders. The money’s gone before anyone asks a second question.
Here’s what makes this worse: building that attack used to require real resources. Now it takes three seconds of source audio for a basic voice clone. A LinkedIn video. A conference recording. A public earnings call. That’s enough. And current deepfake detection tools show a 45 to 50 percent accuracy drop in real-world conditions. Human detection? About 55 to 60 percent accuracy. Basically a coin flip.
Then there’s ATHR, a cybercrime-as-a-service platform running automated AI voice phishing at scale. Same concept, sold by the pound. No broken English, no awkward pauses, no tell. The skepticism filter people have built up against obvious scam calls doesn’t fire, because these voices simply don’t sound like scam calls.
The honest answer right now isn’t a detection tool. It’s out-of-band verification. Any request involving fund transfers or credential changes needs a second channel that can’t be spoofed by the same attack. Pre-established, physically separate, agreed on before the crisis.
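That verification rule can be expressed as policy, not judgment. Here is a minimal sketch of the idea in Python; every name, number, and threshold in it is a hypothetical illustration, not a real system:

```python
from dataclasses import dataclass

# Pre-registered callback contacts, agreed on before any crisis.
# Keyed by claimed identity; never updated from inside a live request.
# (All identities and numbers here are hypothetical.)
REGISTERED_CALLBACKS = {
    "cfo@example.com": "+1-555-0100",  # desk line on file, not from the invite
}

# Actions that always require out-of-band confirmation.
HIGH_RISK_ACTIONS = {"wire_transfer", "credential_change"}

@dataclass
class Request:
    requester: str        # identity claimed on the call or email
    action: str           # e.g. "wire_transfer"
    amount_usd: float
    origin_channel: str   # where the request arrived, e.g. "video_call"

def requires_out_of_band(req: Request) -> bool:
    """High-risk actions need a second channel no matter how
    convincing the originating channel looks."""
    return req.action in HIGH_RISK_ACTIONS

def verification_channel(req: Request) -> str:
    """Return the pre-registered callback contact. Never use contact
    details supplied inside the request itself: an attacker who controls
    the video call also controls anything said on it."""
    callback = REGISTERED_CALLBACKS.get(req.requester)
    if callback is None:
        raise PermissionError(f"no pre-registered channel for {req.requester}")
    if callback == req.origin_channel:
        raise PermissionError("callback channel must differ from origin")
    return callback

req = Request("cfo@example.com", "wire_transfer", 25_600_000, "video_call")
if requires_out_of_band(req):
    print("verify via", verification_channel(req))
```

The design choice that matters is the last docstring: the callback contact comes from a registry established in advance, so the attacker's control of the meeting buys them nothing on the second channel.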
When the CFO’s face is on your screen telling you to move $25 million, it’s too late to improvise.