The Hacker Mind Podcast: Hacking Behavioral Biometrics

Robert Vamosi
November 18, 2021

AI is almost good enough at simulating human activity to defeat the biometric systems designed to fight fraud, effectively putting us back at square one.

Iain Paterson and Justin Macorin join The Hacker Mind podcast to share insights from their SecTor 2021 talk on hacking behavioral biometrics. If an adversarial actor wants to simulate user behavior, that actor can use techniques similar to those that a behavioral biometrics firm would use to detect abnormal usage. The researchers predict that soon it'll be hard to tell a human user at the keyboard, or at the mouse, from a bot or AI-driven entity.

The Hacker Mind is available on all podcast platforms.

[Heads Up: This transcription was autogenerated, so there may be errors.]

Vamosi: How do we know who’s on the other end of a connection, who it is that is logging into a computer or an account online? A lot of times we depend on usernames and passwords, but those really aren’t enough. So we include other telemetry that seeks to authenticate that the entity logging in is who they say they are, such as their MAC address, their IP address, or their geolocation. We’re also moving toward uniquely human features such as someone’s face or someone’s fingerprint or how fast someone types. That's biometrics.

In both of my books, I’ve taken a stand against biometrics as they are today. I’m just not convinced that a fingerprint or an image of my face is secure enough. And I’ve talked to enough people throughout the years who have shown me how to defeat such systems. 

Take, for example, that German candy, gummy bears. Turns out you can melt them down and use the sugary resin to simulate someone’s fingerprint. There’s a whole MythBusters episode where they walk through ways to recreate fingerprints. Vendors, though, have gotten wise, and some now measure for liveness -- temperature, moisture, etc. -- but a lot of these fingerprint systems in the wild can still be fooled today with something as simple as a piece of clear tape.

There’s also facial recognition, where a high quality photo of the person you’re trying to imitate can unlock your phone or open a door. Again, there are now extra measures for liveness -- in theory a static photo won’t suffice -- but in some cases you can just curve the photo so it simulates the shape of a head. Once again you see how such a simple method can defeat biometrics.

Yeah, I’m a bona fide cynic.  So of course when I saw that some researchers were presenting a talk at SecTor 2021 in Toronto on defeating biometrics with artificial intelligence, well I knew I had to talk to them as well.  And in a moment you’ll hear what they have to say.

[Music]

Vamosi: Welcome to The Hacker Mind, an original podcast from ForAllSecure. It’s about challenging our expectations about the people who hack for a living. 

I’m Robert Vamosi, and in this episode I’m not just throwing more cold water on biometrics, I’m also discussing the scary but very real new world of machine learning and even AI being used by adversaries to simulate human behavior in order to defeat behavioral biometrics and gain access to systems.

[MUSIC]

Vamosi: Throughout all of Shakespeare’s works there’s mistaken identity. Think about it. Characters would travel to a foreign country and adopt a new name, a new past, and carry on. Some with good reason, some just to grift more.  Without a basic ability to authenticate these characters, there’d be no drama, no romance, no tragedy. So the ability to uniquely identify someone is really interesting.  And important.

With computers and devices, there’s another side of this -- can you identify that there’s a human being on the other side of the keyboard? Alan Turing questioned this in the 1950s, and thus we have the Turing test, or what was originally known as the imitation game. The idea was to see whether a computer could possess a level of artificial intelligence that can mimic human responses under specific conditions.

Now, if you can identify that there’s a human, is that human who they say they are? In other words, if a criminal gets ahold of your credentials, could they imitate you? Yes. To a point. If you just use usernames and passwords -- well, those are easily imitated. So that’s why you need multi-factor authentication.

Multi-factor authentication is where you don’t just rely upon one method, you layer it. In security we traditionally define the different factors of authentication as something you know, so that could be the username and password, or an answer to a security question. Then there’s something you have, say, a dongle, or a chip within your physical credit card. And then there’s something you are, something that is biological and uniquely you. That’s biometrics, and then there’s a subset known as behavioral biometrics, which looks at the unique ways you as a human interact with a machine.

To do behavioral biometrics correctly, systems must be good at capturing unique biological identifiers. And that’s what anti-fraud systems look at -- these behavioral identifiers. According to some researchers, however, some machine learning algorithms and AI systems today are so good at capturing these nuances that they are able to counterfeit these biological identifiers online and defeat the anti-fraud systems we may have in place. And that’s not good.

Macorin: My name is Justin Macorin; I do cybersecurity research.

Paterson: And I am Iain Paterson. I'm a cybersecurity professional, I guess. I've been doing this for many years. My current role is CEO of a cybersecurity firm out of Toronto called Satkhira Secure. We would categorize ourselves as a firm that specializes in technical consulting; we would categorize ourselves as an offensive security company.

Vamosi: Justin and Iain gave a talk at this year’s SecTor entitled Behavioral Biometrics - Attack Of The Humanoid. It’s about how machine learning and AI can be used to defeat behavioral biometrics. So, to begin, what is biometrics exactly?

Macorin: Biometrics is an umbrella term, right, used to define how we can identify humans through different attributes, and you know we have different attributes like fingerprints, iris scans, the veins in the back of your hand, facial recognition features, the way we walk, EKG, and all this kind of stuff. So that's biometrics, right. And the focus, you know, at least what my focus is, it's mostly in behavioral biometrics, and that's the way people interact with machines. So we have two hands, and typically we use those hands and those fingers to press buttons on machines, like a keyboard or a mouse. We move the mouse, right, and we pick up a phone, tap it, and swipe, and you do all these things. And that's really where my area of interest lies.

Vamosi: Then there’s the one-off idea -- the crazy thing that someone floats that takes off, and suddenly everyone’s using it.

Paterson: You left out those of us who like to lick our phone screens. We laugh now, but when Samsung makes that a feature on, like, the X 25 when it comes out or anything like that, that's gonna be a thing. I wonder if --

Macorin: I don't know how I feel about that, with COVID.

Paterson: It's probably not a good idea but I'm just saying. It wouldn't shock me.

Vamosi: Seriously, what is behavioral biometrics? I mean, what is biometrics?

Paterson: I wouldn't add much to what Justin said there. Biometrics really is a thing that you are. If we think of the different facets of access or authentication, right, there's what you know, there's what you are, right, and what you have. And so biometrics really falls into that. There are some of them that can be time and location driven, such as your geolocation, right, or what time of day it is. And then you can marry those together to create profiles.

Vamosi: What Iain’s talking about is that when you step up to an ATM, for example, the bank can …

Paterson: But by and large, Justin really nailed it, and biometric really is the human side of it, you know, the biological side of it. And the behavioral part really is how the human actor interfaces with machines, or how a machine interprets human interface and interaction, and that is slightly different. And part of what we do get into in the talk, like, intent is actually an interesting part of biometrics, and whether or not you intend to interface with the machine is one of the considerations of biometrics, and actually one of the ways that you see biometrics being attacked or bypassed is, you know, through unintentional interaction with machines that use biometrics as an authentication process.

Vamosi: And, as I mentioned, we’ve already seen some real world failures of biometrics. 

Paterson: Well, one of my colleagues, a really great hacker out of Germany, Tomas Ross. He's, like, a world renowned hacker; he's very, very talented. And one of his roommates -- I can't remember his real name, but he goes by, I think it's like Starscream or something like that on Twitter -- he was the one who published the iris attack against the Samsung phones, for example, and that was, like, within a week of them releasing that functionality, right. And it turns out to be a black and white printed photo that had, I don't know, like a little bit of paste washed across it, so it was just fuzzy enough that, you know, the phone was like, oh yeah, I think that's that guy -- you know, like the phone squinting, looking at this black and white photo. It was actually the opposite of what you expect; you're like, oh, you're gonna have to build, like, a 3D model of the guy's head and it's gonna have to be super precise, and he's like, no, actually, we made it really shitty and suddenly it works. Yeah, because, you know, I guess it falls back on, well, maybe they're in a low light situation or something like that, right -- like, you can unlock your phone in the dark sort of thing with this. So, examples of where we build in failsafe systems are another interesting example, right, and, you know, I think that might tie back into the question you asked earlier, Rob, which is, like, what if you break your hand, you know, like you break a finger or something like that -- is the system going to allow for the fact that you're typing with a bit of a limp, you know, to let you in? So, similar thing there.

[MUSIC]

Vamosi: Ah, ha, so here’s the part of the story that I want to know about -- how do we attack these systems? Perhaps we should further define what we’re trying to attack. For example, what of us is measurable by a machine?

Paterson: So, we use biometrics primarily as an authentication mechanism. I'm trying to think of other use cases, but largely the way that we have preferred to use it has been to unlock things, right; it is a physical human key. And so the ways that we're seeing it used -- you know, just to riff off a bunch of the different biometric elements of human interaction and how those get applied to the unlocking -- you've got banks right now that are really big on voiceprint. If you've registered for online banking or telephone banking, then they're using voiceprint technology to pre-authenticate you when you call into the system. So, that way the customer service representative at the other end knows with non-repudiation -- and I use that term loosely, but, you know, non-repudiation being that we can trust that the original sender and the message are authentic, right, that it's not a fabrication.

Vamosi: NIST defines non-repudiation as assurance that the sender of information is provided with proof of delivery and the recipient is provided with proof of the sender's identity, so neither can later deny having processed the information.

Paterson: So, with a level of non-repudiation, the person who's calling in to access the account information or make changes to the account is the person who they say they are. And so voiceprint technology has been, you know, a real game changer in the authentication and validation of customers at the CSR level in banks, because they can afford the technology, to be quite frank, and also just because of the call volumes.

Vamosi: Ever call customer service and, when you finally reach a human being, they address you by name? Certainly they use your mobile number or, if you logged in, your account information, but they also listen to your voice as you navigate the call center questions, authenticating your voice from the last time you called. You know, "this call may be recorded for monitoring purposes" -- they’re seeking to create a unique voiceprint for you.

Paterson: There are the other obvious ones that Justin touched on: fingerprint -- you know, everything's got a fingerprint scanner built in these days; iris, we've seen that implemented in some phones; once upon a time, facial unlock, and then the 3D kind of take on that. There's a bunch of different technologies there.

Vamosi: I should probably mention that we’re going to be talking about both the physical fingerprint on your hand and also the fingerprint of your internet browser or your address. These are two different things, biological and mechanical, yet they both seek to find unique characteristics that could be used to uniquely identify you out of the billions of people in the world today.

Paterson: What's interesting is one of those technologies is built around common libraries, but then the implementation is different, so there's a bunch of companies doing it, kind of their own spin on it, but they're largely leveraging one or two common libraries. And then there's some more boutique ones. You know, Justin mentioned, like, the vein analysis; that's one that we implement at our office in Toronto -- to get into our lab, we have one of those machines that scans the back of your hand. And then you get into, like, gait analysis, which is interesting. So, secure facilities like data centers, or you get into government facilities, DoD-type facilities.

Vamosi: Here again, by scanning your palm there’s a unique collection of veins. Not even twins have the exact same vein patterns. That requires you to submit your palm, or your finger, or your iris. Are there more passive means of biometrics?

Paterson: If they want to validate that the person walking down the hall should be walking down that hall and have access to the room that's coming up, then gait analysis can be applied to video feeds to analyze how that person moves, and also to identify and authenticate that person as they get up to the door that they're coming to.

Vamosi: So when you start to combine these, like gait and facial recognition, you get these Black Mirror episodes where someone walking down the street on a public sidewalk can be uniquely identified. Doesn’t that just creep you out? 

Paterson: So those are some of them, and then there's a couple other interesting ones, like EKG. And, you know, I have kind of a cool case study that I kind of whipped up when I was in healthcare that would have been applicable there, so maybe we'll talk about that later.

Vamosi: We’ll get to the EKG stuff a bit later in the episode. For now, let’s stay with stuff you might not realize is happening in, say, your office. There are also keyboard metrics, such as how fast you type, even what type of keyboard you use.

Macorin: So if we take a look at a keyboard, you know, the way that we type keys and the way that we type sequences of keys together -- common words like "the," t-h-e -- you know, we're all going to type these kinds of words differently, and if we take a look at longer words, we're going to type those also very differently, and the same if we take a look at email addresses, for example. It's a guarantee that I'm going to type your email address very differently than you would, just because you have that muscle memory built into it, because you've typed it so many times.

Vamosi: This is cool. For example, there might be two letters that are typed quickly, but the others spaced out. Or a string of adjacent letters (if your name maps nicely to a QWERTY keyboard) that you might type very quickly. The thing is, someone else can type your email address, but only you do it in your particular way, so the system can recognize it’s you. How accurate that is depends on the amount of data you can collect. If the system is light on data, it might be easily fooled. But if you are collecting large amounts of biometric data continuously, then the authentication can be very, very good.
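
To make that timing idea concrete, here’s a minimal sketch -- my own illustration, not code from the talk -- of the two classic keystroke-dynamics features: dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The event format and the numbers are made up for the example.

```python
# Minimal sketch (illustrative, not from the talk): extract keystroke-timing
# features from a log of (event, key, timestamp_ms) tuples.

from collections import defaultdict

events = [
    ("down", "t", 0.0), ("up", "t", 92.0),
    ("down", "h", 131.0), ("up", "h", 208.0),
    ("down", "e", 245.0), ("up", "e", 330.0),
]

def timing_features(events):
    """Return per-key dwell times and per-digraph flight times (ms)."""
    dwell = defaultdict(list)    # key -> hold durations
    flight = defaultdict(list)   # (prev_key, next_key) -> gaps
    last_down = {}               # key -> time it was pressed
    prev_up = None               # (key, time) of the last release

    for kind, key, t in events:
        if kind == "down":
            last_down[key] = t
            if prev_up is not None:
                flight[(prev_up[0], key)].append(t - prev_up[1])
        else:  # "up"
            if key in last_down:
                dwell[key].append(t - last_down.pop(key))
            prev_up = (key, t)
    return dwell, flight

dwell, flight = timing_features(events)
print(dict(dwell))   # {'t': [92.0], 'h': [77.0], 'e': [85.0]}
print(dict(flight))  # {('t', 'h'): [39.0], ('h', 'e'): [37.0]}
```

Collect enough of those per-key and per-digraph distributions and you have a profile that is surprisingly hard for another human to reproduce by hand.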

Macorin: So, when it comes to keyboard analysis, you know, I think that the ability to use keyboard analysis for, like, multi-factor authentication is something that's very doable. And you don't require, you know, a wild amount of data to conduct that multi-factor authentication when it comes to that multifactor moment: what's your email, what's your password. When it comes to more of a continuous authentication, where we're actually constantly monitoring who's using that keyboard, that's when you need to start collecting a lot of data, just because otherwise you won't be able to match those patterns properly with the machine learning models that they use. So that's mostly on the keyboard side, and, once again, the way that you capture that keyboard activity will also yield different results and different accuracy.

Vamosi: In my book, When Gadgets Betray Us, I talked with Dr. Neal Krawetz about keyboard analysis, based on a presentation he gave at Black Hat in 2006. He said he was able to look at a chat log, and just on the basis of random typing, or drumming of random letters, he could determine a person’s handedness. What? He could see that when striking randomly, they were more likely to type on one side of the keyboard than the other. You could also use frequency to determine which side of the keyboard was quicker than the other. Further, Krawetz could determine, based on the keyboard drumming direction, inside out or outside in, the likelihood that that person played a musical instrument. Today we can get even more granular with behavioral biometrics. The accuracy of these predictions depends largely on the accuracy of the capture, which is what Justin is talking about.
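
As a toy illustration of that side-of-the-keyboard idea -- not Krawetz's actual method, just the intuition -- you could map keys to the left and right halves of a QWERTY layout and look for a bias in a sample of random "drumming":

```python
# Toy illustration (my own, not Dr. Krawetz's method): measure which half of a
# QWERTY keyboard a sample of random keystrokes favors.

LEFT = set("qwertasdfgzxcvb12345")
RIGHT = set("yuiophjklnm67890")

def keyboard_side_bias(keys):
    """keys: iterable of single characters from a 'drumming' sample."""
    left = sum(1 for k in keys if k.lower() in LEFT)
    right = sum(1 for k in keys if k.lower() in RIGHT)
    total = left + right
    if total == 0:
        return "unknown"
    ratio = left / total
    if ratio > 0.6:
        return "mostly left-half keys"
    if ratio < 0.4:
        return "mostly right-half keys"
    return "no strong bias"

print(keyboard_side_bias("asdfasdfgadsf"))   # mostly left-half keys
print(keyboard_side_bias("jkl;jklhjkjkl"))   # mostly right-half keys
```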

Macorin: So if you capture keyboard activity using an OS hook -- you know, something that hooks right into the operating system, like Windows or Mac or Linux -- the accuracy there might be pretty good. If you use something like JavaScript, you know, now we're more dependent on the browser to make those timestamps of events, and, as a result, if your computer's running really high on CPU, there might be significant lag there.

Vamosi: Good point. If your computer is making noise, if you’re rendering a video or compiling code, this could alter the accuracy of the logging events on your machine.

Macorin: Another problem with JavaScript is that you have browsers like Firefox. And I think that there's a real big push today towards enabling a more private browsing experience. And as a result, a lot of these browsers are starting to implement, you know, anti-fingerprinting techniques. And these anti-fingerprinting techniques are common; if we take a look at, like, canvas fingerprinting, you know, there are browsers out there that just stop that kind of stuff.

Vamosi: In my book with Kevin Mitnick, The Art of Invisibility, I talk a lot about the canvas technology that allows marketers to take that search term you used in Google on your phone and link it to your desktop with ads. So you look up a winter coat on your mobile device on the train, and then when you’re at home on your desktop you see ads for winter coats. There are ways to turn that off -- you can add extensions to your browser, for example. These anti-fingerprinting techniques can also mess up behavioral biometric collection.

Macorin: What's coming out today is anti-fingerprinting techniques that actually round off the timestamps of these keyboard events, and what that does is it basically makes it very difficult to monitor who's typing at a keyboard, because all those events are being rounded, and a machine learning model just can't get enough data to make those predictions. It's fairly similar when it comes to mouse. The only caveat with mouse is that you need, like, significant amounts of data. And it's also less accurate than keyboard.
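
Here’s a small sketch of why that rounding hurts -- an assumption about the general mechanism, not any particular browser's implementation. Once timestamps are quantized to a coarse bucket, two users with clearly different rhythms start to look identical to a model:

```python
# Minimal sketch (illustrative): coarse timestamp rounding, an
# anti-fingerprinting technique, destroys the fine inter-key timing
# differences that behavioral models depend on.

def quantize(timestamps_ms, granularity_ms=100):
    """Round each timestamp down to the nearest granularity bucket."""
    return [int(t // granularity_ms) * granularity_ms for t in timestamps_ms]

def intervals(ts):
    return [b - a for a, b in zip(ts, ts[1:])]

# Key-down times for two users typing the same word with distinct rhythms.
user_a = [0, 110, 230, 360]
user_b = [0, 145, 270, 395]

print(intervals(user_a), intervals(user_b))                      # clearly different
print(intervals(quantize(user_a)), intervals(quantize(user_b)))  # identical after rounding
```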

Vamosi: By now you’ve probably taken your hands off the keyboard and will think twice about how you type going forward. It’s an interesting world, behavioral biometrics. What can we change and what can’t we change? Think of Keyser Söze at the end of The Usual Suspects, when he walks out of the police station and changes his gait as he moves further from the camera. I would argue that you can keep this up for a little bit, but in the end you will revert to your natural gait. There’s always a tell, such as particular keys on the keyboard that you always -- always -- strike the same way.

Macorin: It's absolutely very, very relevant, because those special character keys, you know, like the Ctrl, Shift, Alt keys, the Caps Lock key, you know, those are all very good indicators of who's using that computer. And, you know, they are definitely used in the behavioral biometrics realm. When it comes to keyboard monitoring, those are very, very strong indicators of who's using that machine.

[MUSIC]

Vamosi: Another area where I’ve seen behavioral biometrics used is in the gaming industry. Say you want to take over someone’s account. On the internet no one knows you are a dog, but actually they do know something about you. If you’re based in Oklahoma and suddenly you are using an IP address out of South Korea, or if you are more hesitant to move, or move too quickly -- these changes all get flagged. You may be challenged to provide more credentials. Or locked out.

Paterson: Yeah, so you mentioned the gaming industry, so we have seen anti-cheat implementations, right; that has become very common in gaming engines, and you're seeing it deployed sometimes at the individual game level, or you're seeing kind of a uniform anti-cheat system being deployed -- Steam has one, for example; I think Epic's got one. So it's like an SDK, essentially, that game developers can build into their games or layer on top of their games to identify bot-type behavior -- less so on the keyboard side of things, but absolutely on the mouse input side of things. They're looking for mouse movements that are too rapid and too precise for a human to really do. And so then you've got the hackers -- in this case, the cheaters -- who are developing these cheat tools, right, largely aimbot-type tools, that are trying to introduce randomization of mouse movement into those engines so that they don't get snagged by the anti-cheat engines. It becomes a cat and mouse game.

Vamosi: So it’s an arms race -- you develop an anti-cheat method, they develop a system to defeat it. Hmmm. Where have we heard that before?

Paterson: The analysis of what's going on with the mouse is only a part of it. They're also looking at system hooks, they're looking at processes that are being injected, they're looking at file sizes on the machines, all these things to try and counteract the cheating stuff. But on the biometric side of it, absolutely, it's user input that they're trying to look at, saying, you moved the mouse in an exactly straight line to that guy's head, you know, or to the hitbox of that user. And they can also spot randomization that's not truly random, because the entropy of these things can't be all that random, because you're always going for the hitbox, right? If the user has, like, a consistent 94% hit rate across 10 rounds, you have a pretty good idea that this person is cheating. And that can be analyzed through the biometric input.
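
As an illustration of the kind of heuristic Iain is describing -- my own simplified sketch, not any specific anti-cheat engine -- you can score a mouse trajectory on how straight and how fast it is, and flag movement that looks too perfect for a human:

```python
# Minimal sketch (illustrative heuristics, not a real anti-cheat engine):
# flag a mouse trajectory as bot-like if it is too straight and too fast.
# points = list of (x, y, t_ms) samples.

import math

def path_stats(points):
    """Return (straightness 0..1, average speed in px/ms) for a trajectory."""
    path_len = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        path_len += math.hypot(x1 - x0, y1 - y0)
    (xs, ys, ts), (xe, ye, te) = points[0], points[-1]
    direct = math.hypot(xe - xs, ye - ys)
    straightness = direct / path_len if path_len else 1.0
    speed = path_len / (te - ts) if te > ts else float("inf")
    return straightness, speed

def looks_bot_like(points, max_straightness=0.98, max_speed=3.0):
    s, v = path_stats(points)
    return s >= max_straightness and v >= max_speed

# A perfectly straight, very fast jump vs. a slower, slightly curved path.
bot = [(0, 0, 0), (200, 200, 40), (400, 400, 80)]
human = [(0, 0, 0), (180, 150, 220), (260, 310, 480), (400, 400, 760)]
print(looks_bot_like(bot), looks_bot_like(human))  # True False
```

Real engines layer many more signals on top of this (entropy of the jitter, hit rates, process and file checks), which is exactly why the cheaters turn to learned randomization rather than hand-tuned noise.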

Vamosi: So here’s the intersection with their talk at SecTor. Given that humans are erratic and capable of infinite randomness, can a machine ever hope to approach that? That is the crux of their presentation.

Macorin: Yeah, Iain mentioned randomness, and he hit the nail on the head there. It's really, you know -- machines today, I think, are getting better at simulating human behavior. And I think we're gonna get to a point where what Iain just described, with these interfaces trying to maybe prevent or detect abnormal usage, I think that might start coming to an end.

Vamosi: What's the difference between AI and ML?

Paterson: Stealing a saying: if you have a PowerPoint deck, then you have machine learning, and if you have Python, you have AI. I think that's the difference. Justin's the expert; that's just a bad joke that I really enjoy throwing around.

Macorin: Yeah, well, you know, I think that AI is really used as an umbrella term to talk about machines that make decisions, and a decision on a computer can be really simple, right. And to be honest with you, everybody has their own different idea or definition of what AI is, but at its core, what I believe is that artificial intelligence is just computers making decisions, and those decisions can be very complex, or they can also be very simple.

Vamosi: Then what is machine learning?

Macorin: When we're talking about machine learning, though, now we're actually talking about models, and these models can be different kinds of models, right: you have gradient boosting, we have RNNs, we have deep neural networks, we have all kinds of different things. And that's where I like to kind of draw the line. Okay, AI is an umbrella term -- what do you mean by AI? Right, so that's a question that I typically ask when someone says, oh, we use AI. What do you mean, you use AI? Like, what kind of AI? And then, within the machine learning space, now you're talking about many different kinds of models, and all these kinds of models are used for different things: there are some models that are used for, you know, natural language processing, to convert speech to text, and there are other kinds of models that are meant for image recognition, and there are other kinds of models that are really good at predicting tabular data, and there are other kinds of models that are really good at predicting time series data. So when we talk about machine learning, you know, we're talking about a bunch of different models that are used to predict activities or generate activities based on historical events.

Vamosi: So machine learning can be thought of as templates, models, that you employ for specific goals. So are there systems that are designed to defeat biometrics?

Macorin: Yeah. So, systems that are designed to defeat biometrics -- maybe, I don't know, Iain, are you familiar with any ways that people are -- well, first of all, are people actually using biometrics in general? I mean, maybe I could dive deep into, you know, how adversarial machine learning can be used for behavioral biometrics.

Paterson: Sure, yeah. I mean, if we think about our daily use of most websites, on any of the big sites you are usually running up against some sort of CAPTCHA. Whether or not it actually jumps out and gets in your face is based on a bunch of behavioral things that are happening on your machine, which may or may not include your input.

Vamosi: So we have some real world examples of this.

Paterson: One of them, for example -- a well known example -- is Ticketmaster, which claims to try to stop scalpers and bots from gobbling up all the tickets and reselling them. I don't really believe that they do that, but they do claim to have technology in place and to use that technology aggressively. Some of that technology is looking at the user behavior on the web page, right; so when you're on the web page there is actually JavaScript running in the background that is capturing mouse input, mouse movement, and things like that, and looking for you jumping from one spot to another -- right, so using a bot to fill fields and fill forms. That is one of the most common ways that technology is being used to defeat, you know, the intended use of these sites, which is a normal person filling in fields and buying tickets. Other major sites are using it too, like eBay, to stop people from sniping auctions and stuff like that. And then it's become kind of a -- I don't want to say commoditized, but a fairly common security feature for people to put in place. There are similar systems in front of other sites that have, like, rewards programs or coupons or anything like that, because you want to stop abuse; you want to stop the ability for automation to defeat the intended human input on the site, from either defrauding the site, or, you know, giving the attacker the ability to stuff credentials, to try to test accounts multiple times, all those things. So that's really the most common implementation that we see right now on the web: those, you know, bot-defeating types of implementations, where you have some sort of layer, usually JavaScript, running on a site that's interacting with the user inputs. Justin, do you want to add on that, or --

Vamosi: Okay, so there’s bot-busting JavaScript. So there’s stuff running in the background to keep the criminals from using automation to imitate humans, to steal our Ticketmaster tickets.

Macorin: Yeah, and, you know, so we've got those kinds of solutions running in the background, or at least large organizations do, like Ticketmaster, because, you know, they say that they don't like scalpers. But there's a real strong push today from, you know, organized crime and nation states to bypass this kind of stuff. And that's just the reality. So what's happening is that more advanced implementations of this kind of technology, which is behavioral biometrics, are taking a look at how humans interact with computers; they're detecting if certain behavior is human and if other kinds of behavior are not. If an adversarial actor wants to simulate user behavior, they use very similar techniques to what a behavioral biometrics firm would use to detect abnormal usage.

Vamosi: This is not the first time we’ve seen security tools used against a victim. 

Macorin: And typically what happens is, if a behavioral biometrics firm were to come in and say, hey, you know, we're going to protect you from account takeover and we're going to do this kind of stuff, there's, like, a training period that's required, depending on what it is.

Vamosi: Right. Machine learning, not the machine figuring it out on its own, which would be AI. Machine learning has to be taught, over and over, until a pattern is formed.

Macorin: So if it's multi-factor, often the user needs to enter their email and password a few times, you know, three, four, five, 10, 12 times, depending on which machine learning model they've got. If it's continuous authentication, now maybe the user needs to use a computer for a long period of time, and when I say long, it's very much usage based, so it could take a few days, it could take a few weeks, who knows; it really depends on what machine learning model they're running.

Vamosi: The same, then, is true with attackers-- they need to train their systems as well.

Macorin: However, as an attacker, you can use the same techniques to simulate human behavior. So, if I want to simulate my own typing, for example, what I would do -- and it's something that I have done -- you know, I set up a keylogger on my computer and I record every single key, along with an associated timestamp.

Vamosi: Here’s an example where a good actor -- a researcher -- is using a bad actor’s tradecraft -- a keylogger -- to learn more about the criminal hackers out there.

Macorin: And what that gives me over a period of weeks, or, well, over a period of one week, is a really good typing pattern, and it allows me to really see, you know, how I type on my computer: how I type different kinds of words, how I press the shift key, how long I press the shift key for. And I do the same thing with my mouse -- how I move my mouse, right. And what I can do is I can feed that kind of data into a machine learning model and output simulated key activity with the same kind of, you know, tempo and rhythm that a real human being would use.
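
A rough sketch of what Justin is describing might look like the following. This is my own illustration of the general approach -- learn per-digraph timing statistics from a recorded keylog, then replay text with sampled delays -- not the researchers' actual model:

```python
# Minimal sketch (an assumption about the approach, not the researchers' code):
# fit inter-key delay statistics per digraph from a recorded keylog, then emit
# a target string with sampled delays so the synthetic typing has a similar
# tempo and rhythm to the recorded human.

import random
from collections import defaultdict
from statistics import mean, stdev

def fit_digraphs(keylog):
    """keylog: list of (key, keydown_time_ms). Returns {(a, b): (mu, sigma)}."""
    samples = defaultdict(list)
    for (k0, t0), (k1, t1) in zip(keylog, keylog[1:]):
        samples[(k0, k1)].append(t1 - t0)
    return {d: (mean(v), stdev(v) if len(v) > 1 else 10.0)
            for d, v in samples.items()}

def synthesize(text, model, default=(180.0, 40.0)):
    """Yield (key, delay_ms_before_press) pairs mimicking the learned rhythm."""
    prev = None
    for ch in text:
        mu, sigma = model.get((prev, ch), default)
        delay = max(5.0, random.gauss(mu, sigma)) if prev else 0.0
        yield ch, delay
        prev = ch

# Tiny recorded sample of the victim typing "the" twice.
keylog = [("t", 0), ("h", 120), ("e", 230),
          ("t", 2000), ("h", 2110), ("e", 2235)]
model = fit_digraphs(keylog)
print(list(synthesize("the", model)))
```

In practice you would swap the toy Gaussian sampling for a richer sequence model and feed the output into something that injects synthetic input events, but even this crude version captures the point: the same timing data that authenticates you can be replayed to impersonate you.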

Vamosi: [PAUSE] Okay, that’s creepy, and yet ridiculously simple. This all is beginning to sound like a classic arms race.

Macorin: That's ultimately where it's going to go. We've seen machine learning platforms being used by adversaries in order to automate, let's say, phishing campaigns -- very compelling phishing campaigns, targeted ones. Bought ads that are being used on social media sites, and on, you know, professional networking sites like LinkedIn, in order to target specific groups, specific actors, specific individuals -- or specific groups of individuals, I should say. And those are all being done through natural language processing and, you know, automation using machine learning, so that's already happened.

Vamosi: So far we’ve talked about machine learning how we type and behave online, but this has potential as well in other areas. Such as training systems to automatically detect malware.

Macorin: No doubt, they're also using machine learning for other purposes that, you know, we would use on the security research side, like exploit development and research. So if that's happening, and we're going to move towards machine learning as part of the authentication process, then it only tracks that your adversary is also going to take that technology and look to reverse engineer how you're using it and figure out how to use it against you. Now, it's going to take them some time, and there are going to have to be weaknesses in the implementations and all those things that factor into, you know, normal adversarial offensive behavior, but we'll get there. And soon it'll be really hard to tell a user at the keyboard, or at the mouse, from a bot or, you know, an AI- or ML-driven -- I guess -- entity at the keyboard and mouse, because of the use of machine learning on the adversarial side.

Vamosi: BOOM! It’s incredible to think that a machine could replicate human behavior online to conduct, say, account takeovers at financial institutions; a machine could emulate human randomness to the point where the anti-fraud systems wouldn’t necessarily pick up on it. And this is starting to happen today, with voiceprint technology.

Paterson: That's where we will eventually end up. It's not going to apply to every type of authentication, but they will eventually be able to emulate a lot of it, especially anything that's human input. Sorry -- just to go back to the example I used earlier with the voiceprint thing: we had somebody reach out to us and actually ask us, could you, under some circumstances, prove that the voice on a recording is legitimate? And we reached out to some companies that work in that space, right, who do voiceprint, and we were like, would you trust your system -- if this was for a court case, right, because this was the initial inquiry -- would you trust your system to potentially validate that this is the person who, you know, we believe it is on the recording? Nobody wanted to put their system on the line for that particular purpose. I don't blame them for not wanting to take that risk, but, you know, we've seen the power of that system that Adobe has, right, that can basically take a couple of voice clips and then create a very compelling, you know, natural speech emulation. So there's another example of a system where -- I don't know if it would hold up. Right, I'm curious; I was going to call my bank and test it myself, just to see if it fooled them, you know.

Vamosi: What he’s talking about is something called Adobe Voco, an unreleased audio editing and generating prototype from Adobe that enables novel editing and generation of audio. Dubbed "Photoshop for voice," it was first previewed at the Adobe MAX event in November 2016, but is not commercially available. To perform such a "photoshop for voice" attack remains a very expensive operation. For the moment, the average Joe doesn’t have to worry about these attacks. Yet.

Paterson: Yeah, I think that early on we're probably gonna see very targeted attacks. And you're right, you know, these models need to be trained in the cloud; it needs a lot of memory and GPU and all that stuff. However, once a model is trained, it can be pretty small, right, and it could be deployed on a small device with a relatively low memory footprint. So that's one area of concern.

Macorin: I think that over time, too, you know, there are going to be problems that come to the surface, where maybe over time what we do is we collect enough data to say, hey, you know, based on the first few keystrokes that this user is inputting, we're able to determine that this is kind of his typing pattern overall, without the need to collect one week's worth of data for that one user, just because we've already collected so much data from so many other users that now we're able to make those kinds of predictions. So, everything when it comes to machine learning is related to data: everything is related to the amount of data you can collect, the amount of time it takes to collect, and how expensive it is to collect. And early on I think it's going to be expensive, and over time I think those costs are going to decrease and it's going to be more commoditized.

Vamosi: So you could have an internet of things device, small, simple, dedicated to learning how you type, how you speak. You could have a device that routinely answers spam calls or responds to emails -- without your direct interaction.

Macorin: Oh, I'll share a little bit more about, you know, what may or may not be a fun story that's going to happen at SecTor. The demo that we're going to be showing is a live adversarial machine learning attack on one of the largest multi-factor authentication providers out there, just to show, you know, that this is happening today. This is not something that's far fetched. This is not something that, you know, is going to happen a decade from now; adversarial machine learning models are here today, and they're here to stay, and they're only going to get better with time. And it's really, really important for organizations, you know, that are implementing this -- I feel that a lot of people use AI, and they use it as a blanket term to put over all these different things. And at the end of the day, even when you are using AI, you know, it doesn't mean it's going to be a more secure system.

[Music]

Vamosi: So there could be devices that have our fingerprints, have our iris scans, have our voiceprints. What are we to do about that?

Paterson: The problem is that once you have that biometric fingerprint, it is unique to the human; you can't revoke it, and this is the problem with biometrics, right. Like, I can't train myself to type differently -- I could, but it'd be very awkward, right -- and, you know, to use the mouse differently, just the way that I can't change my irises, and I can't change my fingerprints without, you know, shearing them off. So, these are some of the challenges around biometric authentication that we as an industry have to think about. And, you know, if we get to the point, as Justin said, where based on five to 10 keystrokes you can very quickly ascertain how a user is going to type, and make a predictive model around that with, I don't know, 95% certainty -- which would probably be good enough for the system, right, because you don't want these things being so false-positive ridden that they're always chirping, then you're back to square one. So.

Vamosi: People do change, however. We’re not steady state. We have accidents, which could change our faces, or our fingerprints. Or just age, naturally, and deviate just slightly from the 20 year old version of ourselves captured in 1s and 0s.

Macorin: Yeah, I think that, you know, when it comes time to implement these systems, I think it's really important to always, you know, be very data focused, and not only to rely on the first amounts of data collected to build the model, but also maybe to continuously improve that model over time, right -- over a period of weeks, months, years -- and over time maybe that initial data is off and it's okay to dismiss it, if those variations in data aren't too far off, right, at the end of the day. I don't know if people age -- like, I'm not an aging expert by any means -- but as people age, I would assume that they do so slowly, over time, and it wouldn't cause too much of a variation. I haven't given that, you know, an incredible amount of thought; I'm not sure, you know, if you have any thoughts about this whole aging thing and, you know, the psychology behind it.

Paterson: No, I will say, like, there is absolute validity in what you're asking, Rob. So, on the keystroke analysis stuff, I'm honestly not too sure, but you're right -- like, you know, I don't know if people would, like, slow down over time, right; like, just the dexterity of your hand starts to go, and I think it might not even be age -- for some of us it's carpal tunnel, because we've been stuck at keyboards for 20 years, right; this is the new miner's lung or something like that, you know, we've all got RSI. But, for example, the gait analysis stuff I was talking about -- absolutely, somebody gets a hip or knee replacement, right, it's going to change their gait permanently, and you're going to have to retrain the model for those individuals. Same goes for, you know, things like people who suffer some sort of facial deformity, you know, in an accident or something like that; you're going to have to retrain the model for those individuals for facial rec. So, you know, they're not foolproof systems. I think some of the interactive ones, like the keystroke and mouse and stuff like that -- I think, to your point, what you'll see is more of a gradual degradation of it. So the question then is, how frequently do you renew, essentially, the initial key, right, that you create, or the initial hash value, or however you want to describe it -- the seed that was used to fingerprint that person. How frequently should you update that? You know, should it be maybe once a year, sort of thing, because you don't want to do it too frequently. Because if you do, and you eventually have an attacker or some other user that's able to emulate that user, you could end up training it that the adversary is the real user, if you do it too frequently.

Vamosi: Yeah, so sometimes you’re opening the door for an emulation attack by having a user frequently update their settings. Maybe there should be some standard deviation model so that if an attacker did try this, it would be discounted or at least challenged further because it didn’t fit the aging curve that had been modeled out.

Paterson: But yeah, some sort of true-up of the user against that initial seed over time certainly would make sense in systems like these, and we don't take that into account these days with traditional authentication systems; we just trust the user is the user forever, in most systems, right. You know, even in most identity and access management practices, there's a validation that the user maybe still works with the corporation, right, but if the person does continue to log in every day, then there usually isn't some sort of need for re-attestation, because the account hasn't aged out. You know, if anything, people become fearful of disabling accounts that have been logging in every day for 20 years, because it might break something. So, you know, that's something that we don't do today with traditional authentication; with the biometric, behavioral based authentication, we do need to do it, to factor in for what you've described there.

[MUSIC]

Vamosi: Use case?

Paterson: I have a cool use case that, you know, I ran across several years ago. So there's a company out of Toronto that was developing a biometric EKG band, right, like a wearable, just like your Fitbit sort of thing. And the purpose of it, though, was authentication, and I thought this was amazing. I was working at the time in a hospital setting. And authentication is a huge problem in those settings, for a variety of reasons: you have swivel-chair access to multiple systems, you have, you know, a nursing staff that's on continuous rounds throughout the building and needing to log in at different terminals across the place. You know, you've got volunteers, you've got different shift schedules, etc., etc., so for the person sitting at a keyboard at any given time, your ability to ascertain that that person is who they say they are is very low compared to most traditional nine-to-five type of shops, where somebody has an assigned keyboard and mouse, you know. And a huge part of the problem that we were trying to solve for, always, was the cost of doing password resets.

Vamosi: So here’s an example of where to do good security, you need strong passwords, but sometimes people just forget or get confused or … whatever. So company IT departments spend a lot of time doing password resets, which, if you’re on the receiving end of that ticket, is not interesting work. And it’s costing the company a lot of money. So what if there was a better way to authenticate the user?

Paterson: So, you know, that organization had a service desk of, say, 15 people, and the metrics that we were tracking showed us that 50% of the time that those 15 people spent was dedicated to password resets, right; the math on that is terrible when it comes to the costs. And so I started looking at these alternative methods of authentication that could potentially work in a hospital setting. Fingerprint scanners at the time were not a common item -- like, it's still hard to get a keyboard with one built in these days, that I'm aware of, other than on a laptop. We didn't want to roll laptops out through the building because they're too easy to steal and things like that; you know, they age out quick, or get damaged. The other thing, too, is we had tried some fingerprint scanners, like the USB-type standalone ones, at one point in time, and we found that there was a certain percentage of the population they didn't work very well with.

Vamosi: Oh, yeah, this is a dirty little secret about biometric solutions: they don’t always work with every ethnic type in the world -- be it skin color or even the density of the skin. Often it’s not until these things get out in the real world that manufacturers realize that they just didn’t test it on enough people.

Paterson: So this was interesting. We found that a certain part of the nursing population, primarily people from, like, an Asian heritage -- the fingerprint scanners wouldn't recognize them properly. And we don't really know why; it was just, like, a common trait, and we think, I don't know, something to do with, like, skin density or something like that, right, like on their hands, but it just didn't work right for that particular segment of the nursing population, which is a big segment of the nursing population -- so it's not like we hadn't tried some stuff. But the other problem with that, too, is just, like, the sterility of those scanners and stuff like that; like, there's people walking around with gloves all day, you don't want to have to take them off necessarily to touch this thing and then suddenly they've got to go wash their hands again -- among other reasons. So --

Vamosi: Right, so you’re wearing gloves to stay sterile and then you have to peel them off to swipe your fingerprint so you can log some information on your laptop. That’s going to get old real quick.

Paterson: But this EKG bracelet came out, and it would allow for authentication to devices continuously over wireless protocols, as long as you were wearing the band and you were in proximity to it, and I thought that was a really cool solution. The band is tied to the user and authenticates the user based on the EKG, which is unique. So if somebody steals your band and puts it on, it doesn't authenticate him as you, because he doesn't have the same EKG as you. And then your ability to authenticate to the machine is based on proximity. Obviously you'd have to try and manage that -- you know, the radio frequency challenges and proximity to the machine; you'd want to make sure, as I said earlier, that intent to interface with the device would have to be established, right.

Vamosi: Right. This is a classic shoulder surfing situation, only in this case it’s electromagnetic.

Paterson: You couldn't have somebody standing 50 feet away from the machine who has authorization to use that device, and I sidle up to the keyboard while they're standing looking the other way, and I log in because they happen to be standing close. That would be a problem, so you'd have to work out how to solve for that and make sure that the user has to interface with the machine in some way -- so, multi-factor authentication here. But it seemed like a unique offering and a way to solve for that password reset problem that was honestly costing that hospital, you know, probably in the neighborhood of millions of dollars, based on some of the math that I had done over a period of a year or two -- based on the number of people you have, the time lost, the downtime on machines or inability to access machines that are medical and have billing purposes, all those things, right. If the person who's supposed to be running the lab analyzer gets locked out, right, then that lab analyzer can't do a thing, and that's billing for the hospital. So I thought that was a really interesting use case. That company -- I think it was very early on when I started reaching out to them to talk to them. From what I've heard, later on they went on to work in the pharma space. And so the application there is similar to what I had suggested, and it's people working on the production line in the pharma space. So you can use the device to establish non-repudiation that the technician who, you know, put the materials necessary to make the pharmaceutical into the machine, and went from machine A, you know, part A of the line, to part B of the line, is the same person that's supposed to be there, because their identity is tied to the bracelet, which is constantly authenticating them. Right, and I think that's a great application for that type of device. So that's a cool use of behavioral biometrics for authentication purposes, you know, and on the behavioral part, you can apply anomaly detection around the way that that user is constantly authenticating to different parts of the process, right. So if you know that employee A only ever interfaces with stations A, B, and C, and machines A, B, and C, and suddenly they're over at machine D for 30 minutes doing something weird, you've got an anomaly and you can investigate that, right -- are they on a long coffee break chatting to somebody, or, you know, are they actually tampering with the aspirin that you're making? So, you know, that gives you that opportunity to identify unusual behavior, because you have that analysis and you can tie the user back to the normal behavior. So that's one example of what I thought was a really cool application of behavioral biometrics and where it could be used for authentication at a continuous level.

[MUSIC]

Vamosi: Best Practices?

Macorin: I think that, you know, based on the generation -- because I've done, like, keyboard generation -- based on the generation that I've done, there are areas of interest that may allow an organization to detect adversarial, like, repetitive adversarial machine learning attacks; maybe not one, but, like, repetitive ones it may be able to detect. So that's an area of interest, you know, where potentially an AI can, you know, take a look at another AI and say, you know, what's happening. But overall I think that having a multi-layered approach to cybersecurity is really the only way to really secure ourselves, because no one layer is going to do it. I think that behavioral biometrics is a really nice layer to have, because it does add, you know, security to systems that are already secure and that already have a good use case, best practices, and all this kind of stuff. But, you know, given the right environmental factors, it can be bypassed. So multi-layered approaches for the win.

Paterson: I work for an offensive security company; we don't fix problems, we just make problems, first of all. No, I agree with Justin: defense in depth, you know, it's been around for a long time, and there's a reason people are believers in it. Any good security operations program certainly is going to take advantage of multiple controls. I do believe that authentication has seen some good overhauls in the last decade. MFA adoption is great, and now we're getting into the use of devices like YubiKeys and using standards like FIDO. Those are terrific, and very hard-to-defeat systems for authentication. You know, you asked a great question earlier, Rob -- like, the man on the street, is this something that they should be worried about? Not so much. High-value individuals, people who work in the intelligence community, and people who are part of, I guess, at-risk -- I don't know -- like, groups, you know, like targeted journalists and stuff like that: yeah, you're starting to get into the kind of stuff that is going to become problematic for you, and so you need to adjust your threat profile and the way that you behave to account for the possibility of these things. You know, going back to the voiceprint one, I see that becoming a problem, for example, for executives, you know, people who give a lot of presentations and stuff like that, so it's really easy for me to get a hold of your voice and to potentially fake to your bank that I am you, right, and then I want to open a new account, or a new card, or something like that. That's going to be problematic. Some of the stuff that we're talking about here, like with the keystroke analysis and mouse analysis -- it's gonna be a blended approach, like a blended attack, utilizing other factors to gain access to systems initially and then implementing these types of methods to defeat these biometric control systems. You know, these types of adversarial approaches are going to be used against, as we've talked about, the public-facing web services that we rely on; that for sure is going to happen. And so those types of systems are going to have to step up their game to be more aware and better protected against these things. And then, you know, the high-risk individuals who have a particular threat profile, they're gonna have to adopt additional controls that also factor in these types of things.

Vamosi: I’d really like to thank Justin and Iain for coming on the show to talk about behavioral biometrics and how machine learning and artificial intelligence are getting better at simulating human activity online. Which only means that we have to get better at creating our anti-fraud technology. Which means … ah, you know this is a cyclical battle, don’t you? Nonetheless, we need research such as this to stay ahead of the bad actors.

Let's keep the conversation going. DM me @RobertVamosi on Twitter, or join me on Reddit or Discord. The deets are available at The Hacker Mind.

The Hacker Mind is brought to you every two weeks commercial free by ForAllSecure. 

For The Hacker Mind, I remain the biometrically skeptical Robert Vamosi.

 

 

 
