AI is capable of doing amazing good in the world.  

Too often, popular Hollywood fantasies of Terminators and superintelligent AIs taking over everything make people terrified of phantoms and blind them to all the good AI can do.

Imagine you had a little app on your phone that could detect skin cancer.  

You point it at the spot and it tells you whether you should call your doctor.  Then you call the doctor and send the results of the scan to the triage nurse on the other end of the line.  Now she can make better decisions about who gets to see the doctor first instead of just going first come, first served.  

Today you could be waiting for an appointment for a month or two.

Meanwhile, you have a serious problem that needs looking at fast, but you’re in line behind the old lady who just likes talking to doctors and the hypochondriac, simply because they called in first.  In the future, you won’t wait.  You’ll send that little cancer app’s report over to the triage nurse, and she’ll put you first in line because she knows you have a real problem.

That’s just the tip of the iceberg.  AI will change the way we do everything. We’ve already seen self-driving cars and AI beating the pants off the world’s best Go player.  Alibaba’s “City Brain” spots car wrecks in 20 seconds and calls ambulances, all while helping to route traffic better.

But for all the good AI is capable of, it’s also got a dark side.  Those same cameras in China that can spot car wrecks can also track dissidents in authoritarian regimes.  

It’s not just Big Brother we need to worry about either.  Those are just the flashy problems.  Humans are great at seeing big, flashy threats and missing the more important ones right in front of our noses.  While we’re focused on fake problems, there are real ones facing us right now.  

We already have algorithms deciding if people should go to jail or get bail.

Tomorrow they’ll help decide who gets a job, who gets a loan, who gets into school, and who gets insurance.  That doesn’t have to be a bad thing, but when it goes wrong, it can go horribly wrong.

Today there’s very little transparency in machine learning.  Cathy O’Neil, author of Weapons of Math Destruction, talked about the case of Tim Clifford, a New York City public school teacher who had taught for twenty years and won multiple awards, yet got rated 6 out of 100 one year and 96 out of 100 the next, even though he didn’t change a thing about his teaching style.  O’Neil’s book opens with the story of another teacher, Sarah Wysocki, who got fired because of her poor scores from an algorithm called IMPACT.

Too often, algorithms are black boxes, proprietary, or both.  They’re developed behind closed doors, and there’s no transparency or way to audit their decisions.  It’s a lot like the way we developed software before open source swept the world.  

The problem with black-box, proprietary AI systems is that we don’t know how they’re making decisions.  We just have to accept whatever comes out.  The algorithm could deliver world-class, incredible results or total garbage that looks plausible.

A friend of mine turned to an AI SaaS startup to help them hire great people.  At first the SaaS company’s demos looked great.  But after a bit my friend started to sense something odd about the candidates it was picking.  They asked the AI company to show them what kinds of features the computer had picked out as good characteristics for new hires.  What was one of the big characteristics the machine zeroed in on?

If the candidate was named “Jerry” he would make a fantastic marketing person.

They didn’t buy the software.  

How many other people did?  

All it takes is a company with a slick-looking demo and a flashy website selling to a school administrator who doesn’t know AI from a broom handle, and then we have no idea whether it’s making good assessments or assessments straight out of the insane asylum in One Flew Over the Cuckoo’s Nest.

As a society, we can’t accept that.  That’s why we need a framework for practical AI ethics, and that’s my first major post-Red Hat project: a practical AI Ethics program.  

“Practical” is the key word here.

Most companies do ethics totally wrong.

Here’s the broken template that so many companies and organizations seem to follow.  They form a committee with great fanfare.  They put out a report on how AI should be “inclusive and fair.”  Inclusive and fair sounds great, but it means absolutely nothing.  It’s a platitude.  It’s no surprise that after a year nothing changes and the group gets disbanded.  

I call this approach “AI Ethics Theater.”

Nothing gets done but everyone feels like they did something.

We need a better approach, because in the future, when people don’t get hired, get fired, or go to jail because of an algorithm, they’re going to get mad and they’re going to ask hard questions.  You’d better have the answers if you built the system.  And the answers had better be a lot better than “it just works that way” or “we don’t know why the machine did that.”

So how do we do ethics right?

To start with, nobody can do it alone, which is why we started the Practical AI Ethics Foundation.  

Right now it’s a foundation in the loosest sense of the word.  It’s a grassroots project that will grow.  I’m willing it into reality because it needs to exist, and I want to bring together as many great people as I can to focus on this problem in a real way instead of pumping out platitudes.  

There are three pillars to Practical AI Ethics:

  • A process for discovering and implementing ethics in code
  • Auditable AI
  • Explainable AI

Let’s start with the first because nobody seems to get it right.  

How would a real ethics process work? 

I’ve given the AI Ethics Foundation a head start by designing a blueprint with the help of the 2bAhead think tank in Berlin and the good folks at Pachyderm, where I’m the Chief Technical Evangelist. 

Let’s pretend your company is creating an algorithm that’s handing out loans.  That means it’s actively “discriminating” against people who can’t pay it back.  That’s all right. Companies don’t need to go bankrupt lending to people who will never pay them back.

But there might be a problem with the company’s historical lending patterns.  Maybe they didn’t give loans to many women.  Now the company decides they want to discover more women who can pay back those loans.  This might be for a few reasons.

The first is that it simply reflects the company’s values.  

The second reason is much simpler: there’s money in it.  Historically, women may have been underserved, and that means the loan company is leaving money on the table.  Ethics can align with profit incentives if it’s done right.

But how do you translate that value to something the machine can understand?  

If a deep learning system only studies historical data, it will simply echo the past.  That means you have to think about the problem in a new way.

Maybe you create a synthetic data set with a generative adversarial network (GAN), or you buy a second data set.  Or perhaps you create a rule-based system that gives a weighted score, which you combine with the black-box AI’s score to form the final decision on the loan application.
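
To make that last idea concrete, here’s a rough sketch of what blending a transparent, rule-based score with a black-box model’s score could look like.  Everything here is hypothetical, the feature names, the weights, the 50/50 blend, but it shows the shape of the approach:

```python
# Hypothetical sketch: blend a black-box model's score with a simple,
# human-readable rule-based score to make the final loan decision.
# All feature names, weights, and thresholds are made up for illustration.

def rule_based_score(applicant: dict) -> float:
    """Transparent, auditable rules. Each rule and weight is something
    a human picked and can defend."""
    score = 0.0
    if applicant["debt_to_income"] < 0.35:
        score += 0.5
    if applicant["years_employed"] >= 2:
        score += 0.3
    if applicant["missed_payments_last_2y"] == 0:
        score += 0.2
    return score  # ranges from 0.0 to 1.0

def final_decision(applicant: dict, model_score: float,
                   blend: float = 0.5, threshold: float = 0.6) -> bool:
    """Combine the opaque model's probability with the rule-based score.
    `blend` controls how much weight the black box gets."""
    combined = blend * model_score + (1 - blend) * rule_based_score(applicant)
    return combined >= threshold

# Example usage with a made-up applicant and model output:
applicant = {"debt_to_income": 0.30, "years_employed": 4,
             "missed_payments_last_2y": 0}
print(final_decision(applicant, model_score=0.55))
```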

Now you have the potential of creating a system that more accurately reflects your values as an organization.

But we can’t stop there.  These systems are not perfect.  Not even close. They’re flawed like everything else in life.  They make mistakes, and they make different kinds of mistakes than humans do.  Sometimes they make outlandish errors that a human would never make, like identifying the name “Jerry” as a good sign of someone you should hire.  Other times they make subtle or unforeseen errors. 

That’s because they’re making decisions in an infinitely complex environment called real life, where we can’t process all the variables.  In the early days of AI we could brute-force our way through all the possible decisions.  Deep Blue beat Kasparov with raw compute power.  But more complex decisions are just too variable to know all the possibilities.  The game of Go has more possible positions than atoms in the known universe, and there’s no way AlphaGo could search them all.  It had to make do with searching tens of millions or even hundreds of millions of choices with Monte Carlo tree search, and sometimes there still wasn’t a good answer.  

Machines and people make predictions based on incomplete information in a chaotic system.  They process as many of the possibilities as they can before they’re overwhelmed with too many variables.  

In other words, they make guesses.  

Those are good guesses but they’re still guesses.  

Sometimes those guesses are wrong and lead to mistakes, even when we’re really good at making predictions.  A ballplayer knows how to predict which way a fly ball is going and time his jump to catch it, but he doesn’t get it right every time, and he can’t, no matter how good he gets or how much he practices.  

And when you step outside the world of sports into a super complex environment, like driving a car on real streets with rain and dust and other cars and street signs that are covered over or broken, you can’t see every problem coming before it happens.  Inevitably, you get a problem that nobody saw coming.  

Sometimes it’s a big PR nightmare, like when Google’s visual recognition systems started identifying people of color as gorillas.  Other times it’s a super subtle problem that might not show up until a lot of time has gone by, like qualified women not getting loans they were fully able to pay back.  

In both cases, you need a program in place to deal with it fast.  You need an emergency AI response team.  That means you need to know who’s in charge, who’s going to talk to the public, who’s going to handle it on social media, and how you’re going to fix it.

Maybe you need to take that system offline for a while, roll it back to an earlier version, or put in a rule that temporarily stops it from going off the rails until you can fix the bigger problem.
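
As a minimal sketch of that last option, here’s the kind of temporary guardrail I mean.  The blocklist and the classifier output format are assumptions; the point is that a few lines of post-processing can keep a known failure out of production while the real fix happens upstream:

```python
# Hypothetical guardrail: suppress a known-bad label from a classifier's
# output until the underlying model can be retrained. The label names and
# the (label, confidence) output format are made up for illustration.

BLOCKED_LABELS = {"gorilla"}  # labels the model has misapplied to people

def safe_labels(raw_predictions: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Filter blocked labels out of (label, confidence) predictions.
    This is triage, not a fix -- the real work is retraining the model."""
    return [(label, conf) for label, conf in raw_predictions
            if label.lower() not in BLOCKED_LABELS]

# Example usage with made-up model output:
predictions = [("person", 0.91), ("gorilla", 0.40), ("outdoors", 0.33)]
print(safe_labels(predictions))  # [("person", 0.91), ("outdoors", 0.33)]
```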

That’s what Google did.  They triaged the problem by no longer allowing the system to label anything as a gorilla.  Believe it or not, that’s actually a good emergency response, but it’s only the first step, and that’s where they stopped.  They needed to go back and actually fix the problem. 

To do that, they needed to figure out a better way to train the model and follow that up with coders writing unit tests to make sure the problem doesn’t come back.  We spend a lot of time in AI judging everything by accuracy scores, but that’s not enough.  Data science needs to evolve to take on the best ideas IT has used for decades: snapshots and rollbacks to known-good states, logging, forensic analysis, smoke tests, and unit tests.
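
Here’s a rough sketch of what one of those unit tests might look like for the “Jerry” problem.  The scoring function and candidate fields are stand-ins for whatever a real hiring pipeline uses; the idea is that a known failure becomes a permanent, automated check before any new model ships:

```python
# Hypothetical regression test: a hiring model's score shouldn't change just
# because of a candidate's first name. predict_score() is a stand-in for
# whatever the real pipeline exposes.
import copy

def check_name_does_not_change_score(predict_score):
    candidate = {"first_name": "Alex", "years_experience": 5,
                 "skills": ["marketing", "analytics"]}
    renamed = copy.deepcopy(candidate)
    renamed["first_name"] = "Jerry"

    baseline = predict_score(candidate)
    with_jerry = predict_score(renamed)

    # The scores should be identical -- a name carries no information
    # about job performance.
    assert abs(baseline - with_jerry) < 1e-6, (
        f"Score moved from {baseline} to {with_jerry} "
        "when only the candidate's first name changed")

# Example with a dummy scorer that (correctly) ignores the name:
check_name_does_not_change_score(lambda c: 0.1 * c["years_experience"])
```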

All that leads us to the other two pillars in the program: auditable AI and explainable AI.

These systems should continually log their decisions to a log aggregation system, a database, or an immutable blockchain.  That’s where you get to put that AI ethics committee, or a QA team for AI in charge of models and data integrity, to work.  After that, a random sample of those decisions needs to be audited on an ongoing basis.  That’s known as the “human in the loop” solution: let people look for potential problems with our own specialized intelligence and built-in pattern-matching ability.
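
A bare-bones version of that logging-and-sampling loop might look something like this.  The record fields, the JSON-lines storage, and the 1% sample rate are all assumptions, just to show the mechanics:

```python
# Hypothetical sketch of decision logging plus random sampling for human audit.
# The record fields, storage format (JSON lines), and sample rate are all
# assumptions for illustration.
import json, random, time

DECISION_LOG = "decisions.jsonl"
AUDIT_QUEUE = "audit_queue.jsonl"
AUDIT_SAMPLE_RATE = 0.01  # send roughly 1% of decisions to human reviewers

def log_decision(applicant_id: str, inputs: dict, score: float, approved: bool):
    record = {
        "timestamp": time.time(),
        "applicant_id": applicant_id,
        "inputs": inputs,          # the features the model actually saw
        "score": score,
        "approved": approved,
        "model_version": "v1.0",   # assumed versioning scheme
    }
    with open(DECISION_LOG, "a") as log:
        log.write(json.dumps(record) + "\n")
    # Randomly route a sample of decisions to the human-in-the-loop queue.
    if random.random() < AUDIT_SAMPLE_RATE:
        with open(AUDIT_QUEUE, "a") as queue:
            queue.write(json.dumps(record) + "\n")

# Example usage with made-up values:
log_decision("app-123", {"debt_to_income": 0.3}, score=0.78, approved=True)
```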

You can also automate monitoring those decisions with other AIs and simple pattern-matching systems.  Over the next decade I expect automated AI monitoring and auditing to become its own distinct category of essential enterprise software.  With a human in the loop and automated monitoring, you get a two-pronged approach to spotting problems before they spiral out of control.
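
The automated side doesn’t have to be another deep learning system either.  Even a simple statistical check over the decision log can flag trouble, something like the sketch below, where the field names and the ten-point threshold are assumptions:

```python
# Hypothetical automated monitor: read the decision log and flag it if the
# approval rate for one group drifts far from another's. Field names and the
# 10-percentage-point threshold are assumptions for illustration.
import json
from collections import defaultdict

def approval_rates_by_group(log_path: str, group_field: str) -> dict:
    """Compute approval rate per group from a JSON-lines decision log."""
    approved = defaultdict(int)
    total = defaultdict(int)
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            group = record["inputs"].get(group_field, "unknown")
            total[group] += 1
            approved[group] += int(record["approved"])
    return {g: approved[g] / total[g] for g in total}

def check_for_drift(rates: dict, max_gap: float = 0.10) -> list[str]:
    """Warn if any two groups' approval rates differ by more than max_gap.
    A real system would use proper statistical tests, not a fixed gap."""
    warnings = []
    groups = sorted(rates)
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            if abs(rates[a] - rates[b]) > max_gap:
                warnings.append(f"Approval gap between {a} ({rates[a]:.0%}) "
                                f"and {b} ({rates[b]:.0%}) exceeds {max_gap:.0%}")
    return warnings

# Example usage against a decision log like the one in the earlier sketch:
rates = approval_rates_by_group("decisions.jsonl", group_field="gender")
print(check_for_drift(rates))
```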

The second of those is explainable AI.  That’s a bigger problem, because we don’t have perfect answers to it yet.  Right now we have machines that can drive cars and hand out loans, but they can’t tell us why they made the decisions they made.  Explainable AI is still a hotbed of academic, government, and corporate research, and I want the AI Ethics Foundation to bring together people working on how to get AIs to tell us what they’re doing.  

Handling all this data, all this decision making, and AIs monitoring AIs in infinite regress comes down to something basic: we need the tools, processes, and software to make it happen.