Lucas Mearian
Senior Reporter

Q&A: What happened when a customer support company upgraded some features to ChatGPT?

Feature | Feb 17, 2023 | 19 mins
Artificial Intelligence, Augmented Reality, Chatbots

Fergal Reid, director of machine learning at Intercom, a company whose customer support software is used by 25,000 enterprises, just rolled out new client features powered by ChatGPT. It turns out there is still room for improvement.


ChatGPT is a chatbot sensation that virtually every enterprise — large and small — is considering using to create greater business efficiencies. Five days after its November launch, it had 1 million users; within two months, the AI chatbot had an estimated 100 million users.

The machine learning language program created by San Francisco-based research firm OpenAI offers human-like text responses to queries. It can summarize long articles or text-thread conversations, make writing suggestions, come up with marketing campaigns, produce business plans and even create or correct computer code. And all that can be done with limited investments.

Microsoft, which owns a 49% stake in OpenAI, has invested billions of dollars in the company. It just launched a version of its Bing search engine based on a next-generation OpenAI large language model, a more powerful successor to the GPT-3.5 model on which ChatGPT is based. Not to be outdone, Google recently announced its own experimental, AI-enabled chatbot.

Intercom, a customer support software provider whose products are used by 25,000 enterprises globally, including Atlassian, Amazon and Lyft Business, is at the bleeding edge of ChatGPT use. It just added AI-enabled features to its platform using the large language model that underpins ChatGPT: GPT-3.5.


Fergal Reid, director of machine learning at Intercom.

Fergal Reid, director of machine learning at Intercom, said there are undeniable advantages with adding ChatGPT functions to the company’s AI-powered customer service software. Intercom’s software is used to help client service reps answer customer questions. Intercom also sells a support chatbot called Resolution Bot that organizations can embed on their websites to offer automated answers to end-user questions.

But there are concerns with ChatGPT that cannot be overlooked, Reid cautioned. And because of open pricing questions, the new customer service software, though already used by hundreds of Intercom customers, remains in beta.

Nevertheless, Intercom’s customers who’ve tested it are praising the updated software (based on OpenAI’s GPT-3.5 language model), saying it has made their jobs easier.

Reid spoke to Computerworld about the process of customizing the ChatGPT software for his company’s business use, how it offers business value, and the challenges he and his machine learning team faced and continue to face.

The following are excerpts from that interview:

Tell me about your company and why you felt it needed to upgrade its existing product? “Our business is customer service, basically. We produce a messenger so if someone has a customer support or service issue, they come onto a business website and start typing into the messenger and it’s like WhatsApp chat. You’ve probably seen these messengers pop up in the bottom righthand corner of a website.

“Intercom is a leader and one of the first companies to pioneer business messengers like that. So, we have this messenger and then we build a whole customer support platform for support reps — we call them teammates — whose job is to answer customer support questions again and again, day in and day out.

“We saw [that] ChatGPT… just crossed another level of being able to deal with random, noisy conversations. Humans, when they ask questions, can phrase them in surprising ways. People during a conversation will refer back to something said a couple of turns earlier. That’s been hard for traditional machine learning systems to deal with, but OpenAI’s new tech just seems to be doing much better with that.

“We sort of played with ChatGPT and GPT-3.5… and we were like, ‘Wow, this is a big deal. This is going to unlock much more powerful features for our bots and for our teammates than we had before.'”

How long did it take to create and roll out the upgrades? “So, internally, we had our first prototype demos in the second week of January after starting work on the product at the beginning of December.

“That’s a pretty fast development cycle. We had maybe 108 customers live with it in beta around mid-January and then had another [beta] release at the end of January. Now, we’re in open beta, so we have hundreds more Intercom customers who are using it. We’re still calling it a beta, even though people are using it in production in their jobs day in and day out, because it’s so new.


“These open APIs that OpenAI has are very expensive compared to any normal API you might use. It just costs a lot of money whenever you get it to summarize something. It might cost you five cents or 10 cents. That gets very expensive. It’s a lot cheaper than paying for a human to do it themselves, but this is something businesses are going to have to figure out: How do we price this?

“That’s another reason we’re still in beta. We’re like everyone else. We’re still learning about the economics of this. We are convinced it saves people more time than it costs in terms of the computer processing costs, but how do all those economics work out and what’s the right way to build for these things?”
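Reid’s “five cents or 10 cents” figure is easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch, assuming the early-2023 list price for OpenAI’s text-davinci-003 model ($0.02 per 1,000 tokens) and illustrative token counts that are not from Intercom:

```python
# Back-of-the-envelope cost estimate for one summarization call.
# Assumes text-davinci-003 pricing of $0.02 per 1,000 tokens (early 2023);
# the conversation and summary lengths below are illustrative assumptions.

PRICE_PER_1K_TOKENS = 0.02  # USD per 1,000 tokens, prompt + completion

def summarization_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """The completions API bills prompt and completion tokens at one rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# A long support conversation (~3,000 tokens) plus a ~500-token summary:
cost = summarization_cost(prompt_tokens=3000, completion_tokens=500)
print(f"${cost:.2f}")  # → $0.07, in the "five or 10 cents" range Reid cites
```

Per call that is trivial, but multiplied across every summary, rephrase, and expansion a rep triggers all day, it explains why pricing is still an open question.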

What’s the difference between ChatGPT and OpenAI’s GPT-3.5 large language model? Did you work with them separately to create your product? “I would really consider that ChatGPT feels like more of a front end to the GPT-3.5 models. But, yeah, anyone building on ChatGPT is building on the same underlying model, which OpenAI called GPT-3.5. It’s basically the same. What’s different is the user interface.

“ChatGPT is trained with slightly more guardrails, so if you ask it to do something that it doesn’t want to do, it’ll be like, ‘I’m just a large language model, I can’t do x or y.’ Whereas the underlying language models don’t have those same guardrails. They’re not trained to talk to end users on the Internet. So, anyone building products is using the underlying models rather than the ChatGPT interface. It’s basically the same thing, though, in terms of the sophistication of the understanding and the power of the underlying model.

“The model we’re using, text-davinci-003, OpenAI released the same day as ChatGPT, so that’s basically what everyone is working with.”

Did you have a choice in what you were going to build? Was there another large language model from a third party you could have used to build your new service rep features? “Not so much, in that ChatGPT at the moment is one application of these models that OpenAI hosts. No one apart from OpenAI can literally use ChatGPT. Technically, I’d say ChatGPT is OpenAI’s service for the general public that lives on their website; for anyone building ‘ChatGPT things,’ it’s more technically correct to say they’re using the same OpenAI models that power ChatGPT.”

Is ChatGPT being used for the same tasks as your Resolution Bot product? “The features we shipped initially are features facing the support rep and not end users. We have chatbots that face end users, and then we have machine learning-based productivity features that face the support rep. The initial thing we shipped has features to make the support rep better; it’s not for the end-user.

“The reason we did this is because a lot of the current machine learning models from OpenAI suffer a lot from what’s called hallucinations. If you ask them for an answer to a question and they don’t have that answer, they will fairly frequently just make something up.


“Think of them almost like their job is generating a plausible next completion. It’s not really to ensure what they give you is factual. They make things up. So, we were initially reluctant to just put them in front of end users, answering end-users’ questions. We were worried, and I still am worried, our customers would feel uncomfortable that the bots would make things up. And our early tests really showed just putting in a fully GPT-powered bot naively in front of customers is a really bad idea. We continue to work on that, and we think there are solutions for it in the future.”

How can this tool be useful for support reps if it makes things up? “While we’re working in this area, and of course have internal R&D prototypes, we have nothing named or committed to releasing at this point.

“We initially shipped [our chatbot] only to support the support reps because they’ll generally know what the right answers are, and [the chatbot] can still make them faster and more efficient because they don’t have to type it in themselves 90% of the time. And then 10% of the time, when there’s a minor hallucination or inaccuracy, they can just go and fix that.

“So, it becomes slightly more like an interface. If you use Google Docs or any predictive texting where it can give you suggestions, it’s OK if the suggestions are wrong sometimes, but when they are right they’ll speed you up and make you more efficient. That’s what we initially shipped, and we had hundreds of customers in beta by the end of January. And we had a really good launch with that. We got a lot of really good, positive feedback with the new features. It made support reps more efficient, and we really created a lot of volume with it.

“Where the reps write, it intuitively allows them to rephrase the text, but it’s not just automatically sending it to the end user. It’s designed to empower the teammate to make them faster.”
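The human-in-the-loop pattern Reid describes can be sketched as a simple gate: the model only ever produces a draft, and nothing reaches the end user until a rep approves it (optionally editing it first, the roughly 10% of cases where the model gets something wrong). This is an illustrative sketch, not Intercom’s implementation; all names are hypothetical.

```python
# Illustrative human-in-the-loop flow: an AI completion is only a
# suggestion until a support rep approves or edits it. Hypothetical
# names throughout; this is not Intercom's actual code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    text: str
    approved: bool = False

def suggest_reply(model_output: str) -> Draft:
    """Wrap the model's completion as an unapproved suggestion."""
    return Draft(text=model_output)

def approve(draft: Draft, edited_text: Optional[str] = None) -> Draft:
    """Rep accepts the draft, optionally fixing an inaccuracy first."""
    if edited_text is not None:
        draft.text = edited_text  # the ~10% case: rep corrects the model
    draft.approved = True
    return draft

def send_to_end_user(draft: Draft) -> str:
    """The gate: unapproved AI drafts can never reach the customer."""
    if not draft.approved:
        raise ValueError("AI drafts must be rep-approved before sending")
    return draft.text

draft = suggest_reply("You can reset your password from the account page.")
sent = send_to_end_user(approve(draft))
```

The design choice is that the failure mode of a hallucination becomes a few seconds of rep editing rather than a wrong answer shipped to a customer.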

Are there other ChatGPT features that are also powerful for customer reps? “Yes. Another feature we built is summarization. These large language models are excellent in processing existing text and excellent at generating a summary of a big article or [text] conversation. Again, we have a lot of support reps and they have to hand off conversations when an issue gets too complicated for them. They have to hand it over to a supervisor and often they’ll be mandated to write a summary of the conversation they had with the end user. We have some reps who tell us sometimes writing the summary takes just as long as responding to the conversation does. But, they have to do it.

“So, this technology is excellent at summarizing and condensing text… [the time spent] is radically reduced. So, one of the features we built that we’re most proud of is this summarization feature, and we built that so that you just press a button, and you get a summary of the conversation so far. But then, you can edit it and then send it to whomever you’re escalating it to.

“All the first wave of features are designed to have a human in the loop. The human is augmenting them. They [customer service reps] don’t have to spend a few minutes reading through the whole conversation and pulling out the relevant facts. Instead, the AI pulls out the relevant facts and then the reps just have to approve them, or they can say there was some nuance missing.

“These models are dramatically better than what we had before. But they’re still not perfect. They still occasionally miss nuances. They still don’t understand things that a skilled human rep will.”

How did you modify the ChatGPT software to suit your needs? How does that work? “OpenAI gives us an API that we can send text to and get back the text its model provides. In contrast to the past, you really work with this technology by kind of ‘telling it’ in English, what you want it to do. So, you send it a bunch of text like:

“Summarize the following conversation: Customer: ‘Hi, I have a question.’ Agent: ‘Hi there, what can I help you with today?'”

“You literally send it that text, including what you want it to do, and it sends you text back, which in this case contains the summarized version [see image]. And then we process that and present it to the support rep, so they can choose to use it or not.
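In code, that round trip is just a string in and a string out. A minimal sketch against the completions endpoint of the era, using the `openai` Python package and the text-davinci-003 model Reid mentions; the API key, transcript, and parameter values are placeholders, not Intercom’s settings:

```python
# Sketch of the "send text, get text back" pattern Reid describes,
# using the early-2023 openai package and text-davinci-003 model.
# API key and transcript are placeholders.

def build_summary_prompt(transcript: str) -> str:
    """Prepend the instruction in plain English, as Reid describes."""
    return "Summarize the following conversation:\n\n" + transcript

def summarize(transcript: str) -> str:
    # Imported here so prompt-building above works without the package.
    import openai  # pip install openai (0.x-era client)
    openai.api_key = "YOUR_API_KEY"  # placeholder
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=build_summary_prompt(transcript),
        max_tokens=256,
        temperature=0.2,  # low temperature keeps summaries conservative
    )
    return response["choices"][0]["text"].strip()

transcript = (
    "Customer: Hi, I have a question.\n"
    "Agent: Hi there, what can I help you with today?"
)
prompt = build_summary_prompt(transcript)
# summary = summarize(transcript)  # requires a valid API key
```

The “programming” is mostly in the instruction string itself, which is why Reid stresses being very specific about what you ask for.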


An example of a customer conversation with a representative that is then summarized by the ChatGPT-powered bot and sent to a customer rep supervisor for further assistance.

“One thing people do is use this to summarize emails. An email will often contain all the previous history of the emails underneath, and you can use this to summarize [that email thread]. This works in a way that programming languages in the past haven’t, but you have to pay a lot of attention to get it to do what you want it to do. When you ask it to do something, you have to be very specific…to avoid errors.

“It’s interesting. It’s a different type of technology to use. It’s a different technology than traditional machine learning.”


Did you use your own IT team or software engineers to customize ChatGPT for your uses? And how difficult was that? “At Intercom, we have a very big research and development team — like any software and services tech company. I lead the machine learning team here, so most of the people on the team are deep experts in machine learning and PhDs in the area of machine learning — myself included. So, we have a lot of experience training machine learning models and working with them.

“We have an internal [customer success] team that we use as an alpha customer — Intercom has maybe 100 [customer success] reps. So, we would ship prototypes very quickly to them and get their feedback on the model. But we’re not using them to train the model. We’re just using them as alpha testers to help identify problems and identify where it goes wrong.

“There is a lot of work to it. It’s very easy to come up with a compelling demo, but there is substantially more work to make it function in production. So, having a team of people who can look at it very early on — we don’t know if this is good or bad. Is this a toy? Will this help reps be more productive? Please tell us. That was really helpful. There were several other products we prototyped, but didn’t ship because they were toys.”

Do you see this chatbot product eventually being shipped for end-user use — without a customer rep middleman, so to speak? “We’re investigating that at the moment. We’re not quite ready to share details, but yeah, we definitely think this type of technology will soon be ready for use by end users. A lot of people are trying to work around these problems of hallucinations, and I think we’ve seen examples of them. Google had a launch recently, and at the launch it appeared their model presented something that was factually inaccurate, which disappointed a lot of people.

“So, everyone is having to figure out how to deal with this problem of occasional hallucinations. We are working very hard on that and we’re optimistic, but we don’t have anything else to share at this point.”

How much time and effort have the upgraded AI-enabled features saved clients and their customer service reps? Has it cut their customer response times by a third, a half? “No. Probably not that high. I don’t have hard numbers because this is so new. We’ve got telemetry running against it, but it’s probably going to be another few weeks before we have numbers. It’s a hard thing to measure.

“I’d say something like summarization can save around a minute or two minutes potentially on a 10- or 15-minute conversation. Something like that. That’s a soft interpretation of the feedback we’re getting, but also what we’re seeing. It’s definitely real and there’s a lot of excitement around it. Ever since going through a public beta, you can go on Twitter and find Intercom customers posting about how it’s saving them time. You don’t have to take our word for it.

“I also think everyone in this space has a challenge of keeping themselves honest at the moment. This technology is so obviously exciting, it’s very difficult to be sober about it and not over exaggerate. This is a machine to build compelling demos that don’t currently deliver real value. So, there’s a lot of work to do to understand how much real value is being delivered here. So, we’ll dig into that, but we wanted to ship early and have our customers tell us what they think of it.

“The reception has exceeded our expectations. There are a few features like summarization that are clearly valuable and then other [good] features like the ability to rephrase your text or make your text more friendly. Or another feature we have is the ability to write a short-hand version of your message and expand it out — customers have responded extremely strongly to those features.”

So, you don’t have any hard data on just how more efficient this makes a customer service representative? “Honestly, we need to look at our telemetry in a month or two and determine if they kept using it every time. I’m confident they will, but we need to check. I think this field overall is still working on the killer apps for this.

“We’ve got this amazing demo with ChatGPT, which has really gotten everyone to pay attention. But there are some companies, like Intercom, that are determining how to turn that from a toy into something with real business value. Then even at Intercom, we’re like, ‘We’ve shipped these features and they’re cool. They seem like they would be valuable, but they’re not game-changing yet. They’re not making a customer rep twice as fast or three times as fast. How do we do that?’

“I think that’s the next wave of what we’re working on now, and those are longer development cycles. Those are not as simple as integrate quickly and get it out there. You’ve got to go really deep and understand the user problems and all the different facets of it and where did it fail. So, we’re working on that now. Probably a lot of our competitors and people in the industry are also working on the same problems and creating more valuable features.

“We had a pretty fast development cycle and got it out fast and got a lot of great customer feedback and that helps us decide where to go next. That’s the honest take on where things are at.”

There’s so much hype around ChatGPT now. How do you deal with that while trying to temper customer expectations around your product? “Our actual strategy here is to try to be scrupulously honest about our expectations. We feel we can differentiate from the wave of hype by being honest.”

How did you go about getting your customers to opt into using your new ChatGPT-powered bot? “At Intercom we have more than 25,000 customers. And, to a lot of those we say, ‘Hey, we’ve got something in beta. Would you like to opt into it?’ Some customers have just opted in and said in general they’re willing to use early software. And some customers won’t. Some customers are very risk averse enterprises, like a bank, and they don’t want to be part of the beta program.

“When we do have new software, we’ll send them a message to recruit for the beta. Our [project manager] for this set the campaign live and then had to pause it like after five minutes because her inbox was flooding with so much excitement about this. So, that’s what we did. We recruited a couple hundred customers to beta test this in mid-January. We said you have to click here to opt-in and use the API to process your data. Then customers did that and we turned the features on for them.”

“Then we started looking at the telemetry the next day to see whether people were using this and whether it was working for them. Then, and this is how we normally run a beta at Intercom, we reached out to them and said, ‘Hey, can we get your feedback on this? We’d like to know if you found this valuable.’ And some customers were generous enough to give us actual quotes after a couple weeks that we featured on our blog.

“Again, we wanted to differentiate from all the hype. So many startups are creating just a landing page that’s a thin skin over ChatGPT, and we were like, we’ve actually got a product, and these are actual customer quotes on our website to show it’s something real.”