Welcome to Mostly Cloudy! Today: how AWS is thinking about responsible AI-powered developer tools, how an AI chatbot became the latest front in the tech culture wars almost overnight, and yet another reason to stop using Microsoft Exchange.
Photo by Possessed Photography on Unsplash.
Responsible whisper
Last week at AWS re:Invent I attended an off-the-record dinner with several AWS executives, and I happened to be seated near Vasi Philomin, vice president and general manager for machine learning and AI, at the end of a long table in a very loud room. We had a lively conversation about the promise and perils of AI that I didn’t think I’d be able to tell you about, until I realized the next morning he was on my schedule for an on-the-record chat that afternoon.
We proceeded to have a very similar conversation on many of those same topics, including CodeWhisperer, AWS’s answer to GitHub’s Copilot generative-AI coding assistant. Like many products that emerged during the generative AI craze this year, Copilot has generated a lot of excitement for its potential to help developers automate menial tasks and a lot of controversy (including a lawsuit) over whether these tools unfairly exploit the work of others.
Philomin thinks CodeWhisperer is a better option for enterprise use because he gets paid to think that, but also because it was designed with responsible AI principles in mind that put enterprise-appropriate guardrails around its output. A condensed and edited version of his argument appears below.
I wanted to ask you more about some of the responsible AI stuff. It's obviously a phrase that is used throughout the industry, but different people have different ideas about what it means.
It's not a well-defined topic. I think one of the biggest mistakes people make is that they don't look into the details. They keep it at a high level, and they think everything's the same.
We've identified that there need to be six key pieces to responsible AI. The first one is fairness. This is to make sure that your models perform equally well on different subsets, whether those subsets are intersectional groups or whatever (category), and that they perform relatively okay across all of them.
The second one is explainability. Do you have a mechanism in place to know why your model is giving that output?
Robustness is the third one. Robustness is about knowing or having mechanisms in place to make sure that your model performs reliably; it does what it's supposed to do.
Then you've got privacy and security. You want to make sure that your data cannot easily be stolen.
Then you have governance, which is all about ensuring that the rest of the organization, when they're building stuff, is building it in a responsible way as well. You need guidelines for how the organization should work and do things, so that this applies across the whole organization and not just within a small group of people.
The last piece is transparency. Most of the time we offer components, which are particular AI services that do certain things: the Transcribe piece does speech to text, the Rekognition piece does face matching, and the Textract piece does document analysis. In each of those cases, they are components, and they'll be used by customers in a much larger solution or system that they build.
And so it's important that we're transparent (about) what our components were built for, and in what cases they work and in what cases they don't. And the same kind of transparency applies when somebody takes that (component) and embeds it into their thing: I think they owe it to their users to say, here's how this was built and this is what it was built for.
At this point, we're committing to the world that we're going to do this across all of our services, (all of) our proprietary AI services. Some other companies say they have these service cards, but they have (them) for an open-source model that they contributed (to). That's not the same.
So how does this work inside AWS?
The core mission for us is to take theory and put it into practice; that's always been at the forefront of everything we do at Amazon, at least in my group. And so the first place we knew that we could apply this is CodeWhisperer.
With CodeWhisperer, we knew we were dealing with large models, and we knew that we wouldn't be able to control everything with these large models. The larger the model, the riskier it is on these kinds of topics.
I think this is a place most people don't put much thought into; it's always an afterthought. So we actually spent a lot of time upfront thinking about what it means to do this in a responsible way. And what you saw in the end, when we came out with it, was that this was a big part of our differentiator with CodeWhisperer as well.
A lot of these code generation tools spit out code that has a lot of security flaws. And we've got a long history on security. We know the kinds of software that lead to security issues. We've got a lot of experience and data on it. And so we put all of that together.
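(As a generic illustration of the kind of flaw he's describing, not output attributed to any particular tool: a suggestion that pastes user input straight into a SQL string invites injection, while the safer suggestion binds the value as a parameter. The function names below are made up for the example.)

import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # The kind of completion a security-aware assistant should avoid:
    # interpolating user input directly into SQL opens the door to injection.
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The safer pattern: let the database driver bind the parameter.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()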
The second part is source-code attribution. When you have computers play games like Go or chess, the moves they come up with are new things; it's probably not something they've seen before. Similarly, CodeWhisperer comes up with new code pretty much all the time.
But there may be instances where the code that it comes out with is similar to something it's seen before from some code base. It's seen a lot of code bases that are public, and it's seen a lot of code bases that are private to Amazon as well. It's trained on billions of lines of code.
And so what CodeWhisperer does when it knows it's seen something similar is flag it for the developer: “Hey, I've seen this as part of that code base with that license.” And this way, the developer (it's actually a feature of CodeWhisperer, called the reference tracker) gets to decide how they want to deal with it. Maybe their company policy doesn't allow them to use that particular license or that particular kind of codebase. They can decide not to use it, or they can attribute it and put the appropriate attribution into their code comments. That's something we thought was super important to do. (Editor’s note: It doesn’t appear that Copilot offers a similar feature, but the product is still very new.)
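(To make that concrete, here's a rough sketch of how a developer might act on a reference-tracker flag. The repository name, license, and comment format are invented for illustration; this is not actual CodeWhisperer output.)

# Hypothetical example only: once a suggestion is flagged as similar to licensed
# code, the developer can either drop it or keep it and credit the source.
#
# Attribution: adapted from a suggestion flagged as similar to code in the
# (hypothetical) "example-utils" project, MIT License.
def chunked(items, size):
    """Yield successive chunks of at most `size` elements from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]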
And one more thing we did on this topic: the larger the model, the higher the probability that it does something that's unfair, right? Maybe it generates something that's not appropriate, or toxic.
What are you referring to there, by “unfair”?
Some of the code generation tools out there generate code paths that are just silly. Let's say you have a method to decide whether you want to hire somebody, and you're taking gender as a parameter, right? You can imagine the kind of logic that could come out.
We want to make sure those paths are not even shown as options to the developer. What CodeWhisperer does as you're typing is give you a bunch of possibilities, which you can scroll through, and then you can decide to accept something or decide not to accept something.
We don't even show those (objectionable results); we do very aggressive fairness filtering, and we make sure that those kinds of code paths are never shown.
A lot of what the models learn is what they've seen, right? So you have to do a lot of work on the data that you use to make sure that kind of data isn't part of something that CodeWhisperer learns. We do extensive analysis of the data, and we have mechanisms and processes in place to check how that codebase can actually be used: what (are) the permissible kinds of licenses, but also things like how toxic it is.
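(AWS hasn't published the details of how that fairness filtering works, but as a rough mental model, here's a minimal, purely hypothetical sketch of the general idea: screen each candidate completion for decision logic that branches on a protected attribute and drop it before the developer ever sees it. The attribute list and heuristic below are assumptions, not AWS's implementation.)

import re

# Minimal sketch of the general idea, not AWS's actual filter: drop candidate
# completions whose decision logic appears to branch on a protected attribute.
PROTECTED_ATTRIBUTES = {"gender", "race", "religion", "age", "nationality"}

def looks_unfair(suggestion: str) -> bool:
    """Crude heuristic: does a conditional in the suggestion mention a protected attribute?"""
    lowered = suggestion.lower()
    return any(
        re.search(rf"\bif\b[^\n]*\b{attr}\b", lowered)
        for attr in PROTECTED_ATTRIBUTES
    )

def filter_suggestions(candidates: list[str]) -> list[str]:
    """Return only the candidate completions that pass the fairness screen."""
    return [code for code in candidates if not looks_unfair(code)]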
Garbage? No, perfect tech
Advances in generative AI tools this year caught the attention of everyone from venture capitalists looking to distract attention from their crypto investments to artists alarmed by the appropriation of their work to AI nerds sincerely excited that we’re on the cusp of even more promising breakthroughs toward artificial general intelligence. Those worlds collided last week with the release of a new tool called ChatGPT from OpenAI, the research organization that “conducts fundamental, long-term research toward the creation of safe AGI.”
As Twitter was flooded with screenshots of conversations produced by ChatGPT that seemed very authentic, a certain contingent of Twitter Bro started to get huffy that the release was not being discussed by large media companies immediately after its arrival, implying those companies are either completely out of touch or too scared to write about the arrival of AGI that could put them out of business. Putting aside the fact that there's a lot going on in the world right now, ChatGPT's limitations quickly became clear: it very confidently makes up facts (to the extent that Stack Overflow banned it from its site), and its basic defenses against surfacing illegal or objectionable content can be defeated fairly easily.
None of those limitations are surprising: OpenAI calls them out in its blog post, and it was very clear that it released ChatGPT to the public in order to get this sort of feedback. But they underscore why the proper evaluation of the arrival of new technologies, especially ones that seem very exciting, needs to go deeper than tech nerds (especially ones who should know better) hyping each other up over a bunch of screenshots.
Tech coverage that takes a little time to marinate (we’re literally talking about days here) beats hot takes on a glitchy Twitter every time. Anybody want to bet that the people complaining about the fawning coverage Sam Bankman-Fried received before the implosion of FTX are the same people annoyed that the mainstream media isn’t as excited about their latest nerdgasm as they are?
Around the enterprise
Rackspace confirmed that a prolonged outage of its hosted Microsoft Exchange servers was due to a ransomware attack, and while it tries to fix the problem it is moving those customers over to Microsoft's cloud services.
MongoDB reported a profit (after some accounting magic) that took Wall Street by complete surprise and sent its shares up 25% in after-hours trading, showing just how desperate enterprise tech investors are for good news.
Meanwhile, Salesforce stock plunged to lows it hadn't seen since 2019 following last week's parade of executive exits, and Forbes suggested other leaders might follow in quick succession amid tough times ahead: “This year’s nearly 50% drop in the company’s stock price has put three years’ worth of executive stock options out of the money, possibly leaving executives with, relatively speaking, little to show for years of work.”
Google Cloud was granted a key certification to host classified government information, as it, AWS and Microsoft await word from the Department of Defense on the JWCC award decision.
Tweet That Made Me Laugh or Cry or Think
Thanks for reading — see you later this week!