Pen Testing for AI-Created Apps: Updating Your Approach

Presenter:

Transcript:

Hello. Welcome everyone to Houston Con 2025, our second session or our second speaker here in track six. Our speaker is John Dixon. He is presenting for us Pentesting for AI created apps. Updating your testing approach John is the CEO of Byte Whisper Security and internationally recognized cybersecurity leader with 25 plus years of experience. He is the former principal at Denham Group, leading its successful acquisition by coal Fire in 2021.

He is an Air Force veteran, serving as an intelligence and cyber officer with AF, IWC and AF CERT, and he is an active researcher and speaker on the convergence of AI and cyber security. Since 2018. So everybody, please welcome John Dickson.

Thank you, thank you, thank you. Okay, this is going to be fun. First of all, I think this is my third or fourth session. This is one of the most fabulous regional conferences. If now now national conference. So thank you, Michael Farnham and the organizing group for putting this on. What I'm going to do, I hope, I hope, is to really make you think differently about an overused term that is so overwrought with meaning or no meaning.

Now, penetration testing, that it almost invokes that conversation every time you use it. I would talk a little bit about how that changes with AI, and I do hope that you all ask questions. We're not big enough. We don't have enough people here or it's overwhelming. So particularly the folks up front feel like like you want to ask questions.

It's okay. The I mentioned that about penetration testing. I try to use, not use certain words in my vocabulary or expressions. I don't use the word shift left or the expression shift left. I don't use that anymore. There's a bunch of them, and I actually don't use penetration testing anymore because my next it's just been overused.

And I've been in the business for a long time, and I'll explain that. But, got a little bit of background there. I am, I, I'm CISSP since 98, so 4649 is my number. So I'll buy you lunch if you have a lower number than that. I usually never buy people lunch. That was 98. I was an ex Air Force CERT person doing emergency response stuff when there was no EDR or SEM.

It was all like external windows. So, I really know Unix, so did know Unix back in the day, but I've been in the business for a long time, and I've seen this arc of, of the use of the term penetration testing or testing so much. And in the most recent past, it was an app set guy.

So hard core apps, that guy, DevOps person, and really working with the fortune 500 on matters of software risk for the last 15 years mentioned that I've been in AI since 2018. That's probably a little bit of a stretch, but a true story. 2018 I put in a CFP at RSA on AI, and I got it. And then I was like, oh crap, I have to really learn it now.

True story. And it was like a use cases for AI in the enterprise. Like like with no ChatGPT to do it for you. So I to like, learn it in the next year. I got accepted on one. It's pretty funny to how to vet vendor claims about AI 2019, when I was really machine learning. There's really nothing, right?

And again, it was like a self improvement bucket list thing. Learn AI and the way I did it is I got accepted at international level conferences and then oh crap, had to learn it. And that's how I got into it. True story, but I'm happy I did because we started to get a lot of questions about it at denim Group when we were doing denim, doing app assessments.

Well, what about this? What about ML like? Well, we don't have a good answer, so we learned it. So that's me. I did this last night. That's ChatGPT. Yes. That's, put face on the cool. He's second cowboy. That took like two minutes. So I'm also the kind of nerd that loves the the little AI hacks.

But I also have great stories on hallucinations and the badness of AI. For the record, I use it every day, all the time. And I'll just tell you a great OWASp. How many people know os open web applications? Everybody. Almost everybody. If not. Okay, so when we first started by whisper, we got asked to do a training class on OWASp top ten for LMS.

And for those that have ever done curricula developed like presentations and training classes. Very laborious process, right? Very time intensive. So ChatGPT is out there. I said, hey, build me a eight hour training class with an outline and, put it in this format so I can send it to the client as a proposal because they just wanted a straw man.

So I did it. I was like, I felt really smart. I was like, cool, there it is. Bam bam bam bam. I sent it to our CTO, and he's CTO for one reason, because he's far smarter than I am. And he said it back and said, go back and read it closer. And what he had done is to take it off top ten for labs.

And the first six were absolutely right spot on the right ones. But seven, eight, nine and ten were from other OWASp top ten list. They just put him in there. So there was a lesson there, which was great for certain things. But absent of human checking, you're setting yourself up for really embarrassment. And the classic now is we do this, I do it to hey, I need you to put a blurb out there on LinkedIn about this post that another vendor did.

Like the vendors call me and say, hey, could you repost this, this, this vulnerability that just came out on a genetic AI? Our research is out there. I would like you to repost it, which is flattering. So I go and do that. So what's my first tendency is ChatGPT and it almost always gets it wrong. So half the time you can't do that or it'll put another vendor's name in.

That happened one time. Whereas like yeah, it's not that. So okay, so let's talk about penetration testing. I think I was doing my first penetration testing, working on a team in 1997, with a company called Triton Data Systems that no longer exists. And we did a network security test, or I already called a pen test. It looked like a lot of effort, a lot of manual testing.

And we did a lot of dumpster diving and social engineering at the time, which is pretty cool. Like that was all this is, if you remember the movie sneakers, like, oh, we got to do that too. Social engineering, dumpster diving, kind of nasty, kind of dirty. Putting your consultants in harm's way. But that's what it that in that word a lot of network, a lot of network, not a lot of network because that's what existed at the time.

And really what you were doing was trying to define the trust zone and understand everything outside of it. So for those who are old enough, I remember seeing a couple of presentations where they used the old wagon train metaphor. Here's our wagon train, here's all the good guys in here, and here's everything outside of it that's bad. So this is before zero trust.

This is before APIs and connectedness. So you it was pretty straightforward. And but building blocks in spite of that are despite it the definitions of pentesting still varied wildly. Oh. Do you mean like a really a hard core manual pen test? Do you mean like, just run, a scanner? What do we mean? So even in those early days, now, 28, 27 years ago, there was still the first five minutes of every protocol, of every pen test was what do you mean by pen test?

You want this, you want that? Because on the vendor side, we're scoping pen test. And it's like, is that two weeks, three weeks or not at all? And I'll give you an example of one that we did at denim Group. That was pretty cool. It's now it's published so I can talk about it. We did a penetration test of DARPA, a test environment, a DARPA, a very cool like environment.

What they did was it was a test environment, and there was zero test coverage, like no tools. And that's the kind of thing that back then was pretty straightforward, but, like, you're looking for anything but. So the definition of is it a DARPA test or is it am I running tests that existed? If you fast forward to like 2004 ish, that's a, a little bit of a, a time frame where you start to see more functional code shown up on the website, websites.

We had dot. Net, you had JavaScript, Java running on, on websites. You see companies like at stake if you remember them found stone that were like revealing the first injection flaws, then testing or penetration testing or whatever became a little bit app centric. Oh yeah, you can do the network stuff. We know that network stuff being really TCP port configurations and other, you know, patching.

But yeah, I really want you to do, you know, penetration testing of our apps. Okay. So now you start to see things diverge a little bit, and you hear the term assessment used more frequently. The interesting thing I didn't mention about did not mention about penetration tests which exist is there was the early the first 5 to 10 years, people would just do a pen test and get root and that was it, right?

Subscribe to our newsletter

I routed you, I proved I could. So if I rooted you on day one, if I charge you as a vendor, $40,000 for a pen test, if I routed you on day one, that's my pen test, right? I routed you, so then it became a little bit more methodical with with apps. And you had this term assessment where I'm like, it's inferring a little bit more completeness and looking over the entire attack surface a little bit more.

But you still had network and that were, penetration testing became more commoditized. You see the rise of scanners, it kind of look like that. And koalas, others that are out there. And for the record, if you go back to 1997, true story, some of the Apple, some of the network security test or pinch tests that we did were $100,000.

I mean, true story, $100,000 network tests that look like koala scans, you know, like like now are just automated, right? So these things have evolved. The constant is the right side. The definitions themselves still, you know, still wildly differed. And so the first part of every discussion usually is okay. What do you mean by a pen test. And how long is it manual or is it not.

And true story. I mentioned, in the intro that my company got, acquired by coal fire. A lot of people know who coal fire is, like the first three months of that post acquisition discussion is. What do you mean by a pen test? Because they differed. It was almost like a religious debate. It differed wildly between our team and their teams.

So that was interesting at the time. So what what what do you do in a pen test typically, you know, I, I would always say define what threat you're talking about right. What is the perceived threat. Are you talking about the PLA or the Russians or script kiddies. And we'll kind of scope it accordingly. We're going to conduct some kind of reconnaissance footprinting.

We're going to go find out what's out there. Now, a lot of this stuff in in 2025, I would say you just give it to us like we're going to get it anyway. So just give it to us instead of, you know, an outside attacker with no knowledge. I'm an outside attacker with some knowledge because it it it it makes it cheaper, I should say less expensive.

Some scanning manual testing, particularly in the app world. I mentioned the DARPA test. We also did a tremendous amount of testing for one of the major cloud providers. And you know what the test coverage was for automated tools for their environment? Almost zero. So like what we did for every test was the first two days were whiteboarding and then threat modeling to find out where surface area even existed before we ran anything.

So. So the more sophisticated, unique you have environments, the more likely you're going to have to spend more time thinking than scanning. If that makes sense, then exploitation that's fallen out of vogue, by the way. You know, now, I assume if I got have a higher critical, I assume that I could get in in the early days, we'd have to prove that we got in by putting an image or something on somebody's web servers.

And in our case, it training data systems. It was a Barney image. We had a, you know, a tilde Barney dot jpeg. We put on everybody's root directory in their web server just to prove that we could do it. Now, the chance of disruption like that's less in vogue. And then reporting and remediation. And one of the trends now is less reporting.

You know I don't need to think report anymore. I really need the quick and dirty, done. So we're looking for coding flaws that, that look like that injection flaws, cross-site scripting. Misconfigurations. The one thing that I will say over again, the last ten years is the most egregious. The top ten scariest vulnerabilities we found were not misconfigurations or even coding flaws.

They weren't SQL injections. They were the crazy architectural flaws where somebody trust an input. Our trust in API input are, you know, you can traverse client data because of the way you implemented off on the server. So so a lot of this stuff is again looking at it Misconfigurations mistakes. Oh, we open up a TCP port. We forgot to close it out or we wrote code wrong way.

But the real scary ones are the architectural ones. The ones that you won't get with any automation. Hints. Back to the manual testing. And there's a theme here. And the good news is, is if you do it over, you can rinse and repeat and do it continue. So there are some strengths to to penetration testing obviously. Right. I mean you can find stuff before the bad guys do.

That's that's the general thought here. Right. And once you define and get that protocol down and figure out, okay, this is what I mean. This is what you mean, then you can do it over and over and over. In theory, and I've seen certain larger clients, they'll have a, an established group of companies that they trust and have vetted to do testing.

So they'll move them. It's usually the same suspects. I won't name names. And you'll see sometimes from a supply chain standpoint, say, oh, we're going to do work together. I'm going to use my trusted vendor to do a pen test of your environment. So the trust mechanism is the is the vetted vendor are the vendor collection. So and once you do this, you still get different shades and different variations because it's human beings plus automation.

But you kind of get it in a repeatable process. You kind of doing stuff and it's effective. I mean, you're you're generally addressing risk. I would say you're finding stuff before the bad guys do. The downside of it is, again, still not universally accepted. So when you if you get anything out of this presentation, the one thing would be when somebody says, oh, we need to do a pen test.

Your response is what type of pin? What type of test? What do you mean by that? Let's talk. So the depths of testing can vary. It still does. I've seen ones where there are really audit driven. You know, again, I've been on the vendor side for most of my career. And a great example would be, oh, who's who is the actual client buying it.

Oh, it's the VP of audit. The VP of audit. Do you think the VP of audit has different desires, than like the actual VP of security? Of course they do. Like like they want many times surface level or checkbox. And as I mentioned, the cloud vendor could care less about, cloud vendor. We do work for you care less about checkbox.

They really want to find the crazy vulnerabilities before they get out there. And by the way, that cloud vendor, like many of the sophisticated ones out there, had internal testing and scanning in the SDLC that internal testing teams, they had external testing and then they had a bug bounty program. So like 4 or 5 levels of testing, like if you're doing 4 or 5 levels of testing, you don't care about checkboxes, right?

So penetration testing as another checkbox is bad. We've seen these programs become static and then they're like fire and forget where they're hey, we've been doing a pen test in this new company by whisper that we're doing we're doing AI driven testing. Like the first discussion we have is, oh, is that the same as penetration testing?

And that's what generated the whole thought behind this presentation is they're they're not they're not unlike each other. There is overlap. But it's funny because a lot of these clients have their unique pen test budget. They've been doing pentesting. They've been using the same profile. My point being is after this session, you should ask ask that question what do you mean by pen test and think differently?

I would argue almost all the penetration testing that we bumped into and seen does not address the incremental or additional risk that I prevent presents at all. At all.

Okay. So fast forward to now. New company Bite Whisperer. We're doing, as I mentioned, AI testing, threat modeling, hard core stuff. Still doing networking app. We still see that a lot. Sometimes you see the segregations where a network test is like a standalone thing, an app test or the what. But really what happens is the AI part is part of an application test because it's part of an application with the data below it.

And again, the constant is still varies crazy between the different ones in between industries. I mean, oil and gas is different than financial. I mean, it just is. And even within banking sectors you talk to the big banks versus the community banks, different testing approach, different appetite for risk. So here's where it's going now. And talk about this accelerated by oh my gosh, the craziest, fastest implementation of a new technology in the form of AI.

I was on a panel at RSA in April with a guy named Anton Chavkin from Google. A lot of people know him and we had a great quote. I loved it, said, hey, we don't even have enough practices now to have best practices. Like everybody knows this is new, but what's happening is because CEOs are afraid of missing out, fear of missing out.

Like everyone's going a million miles an hour an hour. Personal hands on experiences. They're doing so without security. As a planning consideration. It might be an afterthought, but it's not a planning consideration. So what? What's happening? Here we are again, this feels a lot like 2004 from an app standpoint. We're creating an attack surface without really understanding the underlying stuff.

And if I didn't think that was the case, we have usually about a test a week that comes in. We had one two weeks ago where, classic apps like we got privilege escalation, we got to we routed them within like two days and they're lamb was in the cloud and public facing which which which really wasn't a security risk.

But if you wanted to generate a like 4K 60 minute video off of their large language model, you could do that and they would not know it until they got a $10 million bill from anthropic. So there's new, new things when talk about that. Okay. Another key point are these things matter. There's like a perception that we see and I hear in conversations almost weekly though, it's I like easy button magic.

It's magical auto magic right. All this stuff matters even more in a world where you don't understand the sequences, you don't understand what it's doing. And one of the other things that came out of the RSA panel was the fact that we generally think the complexity of AI is making it harder for us to understand the risks. Let me say that again, there's a bunch of people that are trying to compare AI to mobile AI to cloud migrations.

AI is different from the standpoint. Is it actually a bit more complex? But all these things that we've known and learned to love over time matter even more. So concepts like defense and depth, like least privilege, matter more. Okay, so with apologies to the Monty Python's Flying Circus, for those who remember that a now for something completely different, this is where we go and veer off into the the weird world of AI and security up.

So I went a little fast. There, see if I can go backwards. Okay. How is it different day to day to day to like who cares about bias if you're you don't have a large language model? Who cares about all these different things? It really is a data science problem as you can do some crazy stuff with data that you couldn't do before, and you don't have to worry about, we're talking about non-determinism and randomness and how that's antithetical to our compute model from the 50s until three years ago.

We're talking about hallucinations. Everybody understands this. I think I gave an example of one. They're easy to understand, easy to see sometimes. But guess what? If you're pulling code from an API call to OpenAI, you're not going to see that hallucination, right? Okay. Unintended bias. You can manipulate the inputs also. And here's another thing auditability explainability.

You have to say some of us do have to go to auditors and say, well, how did you get that conclusion? Oh, I don't know. It was a Lem that doesn't cut it. And by the way, the big LLM producers know that they're trying to pull that through and fix that a bit. But let's talk about Nondeterminism for a bit.

How many people are willing who wants to stand up and give me a definition here? Okay. I won't call on James Cooper or I won't call on Mary Dickerson or the other one. Okay, so Nondeterminism, I should say determinism is this idea that if you put an input in, you get the same result every time, right?

So Nondeterminism is when you put in an input, you get a result, you put in the same input, you get a different result, put in a prompt, you get a response back, you put it in a different the same prompt. You get in a totally different. That is anathema to our compute model. And that's actually a bigger problem in software development than hallucinations, I would argue.

So and it varies in small ways. I give you these examples and it takes an eye to look at them. If you're doing, training tech or if you're doing a presentation or something for LinkedIn, it's not a big deal. But again, think of a world where you have if statements, you understand the logic, you can go through the logic and then you have an API call out to ChatGPT or OpenAI and it comes back so understood understood logic, logic, logic, randomness, logic, logic logic.

That's the problem that we're talking about right now with, with Nondeterminism. So a couple of things I'm not going to I'm not going to do justice to any of this stuff, but it does, in fact, create a new attack surface that you have to consider and build into your testing plan model exploitation, data poisoning, and adversarial inputs.

And I explain each of those, and again, when I say apps that are, that are using AI to generate code. So think of the Copilots also the ones that hit an API and pull in whatever the result is to okay, so model exploitation. You can create inputs and prompts by prompts by tricking the model to reveal stuff that it shouldn't.

There's there's systems now in third party systems that prevent that. But it's safe to say the data scientists that many times create these things don't envision the abuse cases that are out there. So having that abuse case, thinking about the models, you can extract stuff that you weren't supposed to personal information. You know, I think we all know there's training on internal corporate data.

Is is a is a no no, specifically outlook oh 365 because you could then extract whatever the heck you want. HR data, all of it. So this is the probably the biggest one. I mean, like five years ago, who cared about data models and data science? Now it's the biggest thing, explaining the model itself.

Data poisoning is another one, where you can inject malicious inputs, you can start to corrupt the data. You can train it to do certain things. The, the one thing that I would just point out to everybody that we know on the, on the bad guy side is they are putting up vulnerable code all over right now to train LMS, just blindly putting it up, with the hope that all one of the the LMS will train on that data and then ultimately.

So that's just the real problem that we have is we trust the outputs from ChatGPT as if it were gospel. Right? It's not. And this is an example where you actually have people that are out there trying to maliciously put things, throughout the world, adversarial inputs, model degradation. You can you can get it to produce biased outputs.

You can target certain bad things that occur. And, and again, it reflects poorly on the company. Is this really a security vulnerability? You could argue maybe. Maybe not. But like, it certainly looks terrible. And a starting point for those who don't know, the OWASp top ten, the classic one is the prompt injections, getting the models to do things that it shouldn't.

And look at LMH six, the one that we're when we get into a generic AI excessive agency, we're putting too much trust in the outputs. We're giving too much agency to the AI to do certain things without, without it. By the way, our company does a lot of AI policies, ironically, not by choice. It's like we get pulled into it like we to do a and I was doing one about a year ago where I couldn't figure out what was missing in this particular, policy.

And then it hit me, they didn't have any human in the loop requirements for any of their critical systems. And these guys were a electrical provider. They were a utility. And their AI policy had nothing to do about the excessive agency for certain things. And, oh, by the way, to do I worry about Grammarly on the desktop.

Kind of. But do I worry about you putting in the generation and distribution system and Lem that without a human looking at the outputs and, by the way, little aside out here somewhere is the, WarGames Whopper. Have you seen the the Whopper? How many people have seen war games recently? Like, okay, I watched it with my six year old daughter about six months ago.

That movie still makes complete sense. What did they do in that movie? They pulled all of the missile crews out of the silos and just plugged it in. The Whopper, that's a human in the loop. They took the human in the out of the loop here, and they gave the Whopper excessive agency just to make decisions on nuclear strikes on its own.

All this stuff is still relevant, and it's funny. I would recommend you go back and watch that. Yeah. Okay. So put it all in context. AI powered apps still subject to the classic security things, maybe even more so. You have to pay attention to the additional attack surface. You can't just point a scanner at it. Not now, at least.

Maybe in the future. And I would say that penetration testing, along with actual understanding with within the SDLC, are kind of the baseline ingredients of this. Okay. Here's a little chart kind of the difference between all of it. I would argue the tools are different. I'd say the tools on the AI side are very immature. We're starting to see those come out.

There's probably more for protections than there are for testing per se. The biggest thing right now is just having people that understand and understand how to build the threat models. So what we do and what we see the best practice is more whiteboarding, more threat modeling, less hitting the scanner. Less automated testing. Right. Right. Now for this.

So, I mentioned the the testing that we did at my last company for the big, cloud provider. I would say, again, maybe 50% of any project was thinking about the attack surface whiteboarding, collaborating due to a threat model to find out before before we do anything. Otherwise you're just, you know, scanning and getting zero. Okay. I can't have a presentation about this without touching on agent AI, and we're not going to do just this either, for the record.

Subscribe to our newsletter

But let me just talk a little bit about this. The think about what an agent can do, right? An agent can do stuff in sequence and in order to do those things, the classic one is to go make a reservation, go pay for it, go put it on your calendar, go send an email, to my spouse or loved ones, do like 6 or 7 discrete steps.

But the main thing about agent AI is it has to have privileges and it has to have to, obviously has to have access to everything. The challenge is, again, we see these implementations without any concept of threat modeling or even, what I call abuse testing. Like, like, okay, what this third step right here where you make this call over here, what does that do?

What privilege does it have on the other end? So the the lack of understanding the lack of of of honestly deliberate thought, is such that it makes it really interesting. So what we tell people to do is like, come up with an approach, define scope. Do you threat modeling again whiteboard. Whiteboard okay. This is what it does.

Let's understand what it does because once we do now I'm going to know where the weakness is. This is where we're going to spin. You know nobody has unlimited amount of time to test okay. Based upon what the threat model. Here's what we're going to spend most of the effort here and here, because that's where the likely problems are going to be.

Off and off become important again, as I mentioned, input validation. That's unsexy, but totally rude. But also, you're looking at everything and looking at the validation of the stuff that goes into it in the supply chain. So how do you do that? Some of that's manual right now, like malicious malicious input tools are starting to emerge.

You can run some grep and and snake sneak to look at the dependencies. This is basic building block Appsync. Fuzzer has become cool again, obviously, in understanding what you're fuzzing before you do it. And then there's a lot of shell commands and custom stuff. Key recommendations, threat modeling. How many people do threat modeling a regular basis?

Oh, thank you back there. Just you maybe smile. I would it's like flossing. You probably should do it, but nobody does. But in AI, it matters even more so because again, like, here's a question. If you're outsourcing your pentesting, how do you even scope it? And if the vendor comes back and says that's that's $100,000. And no, it's not.

Yeah it is. Here's why. But revamping your testing approach to probably pick up more of the data poisoning, or the AI stuff, look at the emerging frameworks. I asked this question every time, and I pick on everybody. How many people have read the nest AI risk framework? The whole thing? Is that a yes? You read half.

Okay. Two and a half. That actually tracks with what I asked. At RSA, we had probably double the amount of people in the room. We had six people for 4 to 6 people said yes. And it's not even though it's been out there. But the reason I say that not to pick on people. And thank you for reading half I want to what what prevent you from reading the second half?

It's a lot better, by the way. It gets better at the end. For the record, the lack of sleep. No. If you want to go to sleep, read the first half of the notes at this rate. But the reason I. I'm saying this is everyone invokes the AI risk framework as if it were a gospel, like, oh, it's, you know, like but nobody's actually read it and it's boring as hell.

It really is. But but, so right at the top, I do recommend the annexes are good, but you know what? Speed. Read it so you can say you've read it. And next time somebody asks the heck yeah, I read it. Or if you read it now, is it over? Is it do a refresh. Absolutely. What I like is again, probably my true north for AI right now is OWASp top ten for lambs, which by the way, have been, adopted and also updated last couple of years.

So they're not static. OWASp asp yes. How many people have heard of ASB's? Okay, half of it now. Oh, you you'd read the whole thing this time? I'm giving you a hard time. This is this is improv right here. What we're doing? No, ASP is the application security verification standard. And what it does is it allows you to do apples to apples.

Comparison of application testing. To answer the question, how much? Right. So ASP is actually pretty good. Between that and the last top ten for Elms, like okay, I've triangulated I know what we're talking about. And then on the adversarial side, the miter in it stands for adversarial threat landscape for Artificial Intelligence systems, which is why they have an acronym Atlas.

I don't know if they had the acronym. And backed into that, but either way, it's on the adversarial side. It's actually good to think of. So when your boss is asked like, what are the bad guys do? You can read the atlas and be able to say that, yeah, there I am again. That's pretty frightening. I have time for questions and hopefully an answer.

Two so yes sir. Here, let me, get the oh, thank you very much. You also read the nice I know. Oh, no. What I was going to say is that I discovered that using LMS, you can have the same input and get the exact same output of the seed of the generation is the same. So for locally hosted LMS, like stable diffusion or large language models, I discovered that if you give it the exact same seed with the same prompt, you get the exact same output again.

Yep. That's true. That's a good point. Good kind of point. Yeah. And with retrieval augmented generation you can kind of get know what you're getting back to rank as well. So there's ways to circumvent that. My point is is that most people don't do that. The vast majority of people don't in a situation like that, does that make the seed preparatory information and therefore more important to be, kept safe?

Yeah, that's a good point to yes.

Thank you for that note on genetic. I, do you envision a world in the future where you'd have all these thousands of agents running around and they'd need those same kinds of credential checks and certificates that humans do these days? Wow. No, I think I think there will be vendors that go in this space and, and solve some of those problems, but I'll give you an example.

Right now, the MCP protocol, passes session ID information, the URLs. That's got to get fixed. It the default off and off is not great. Those are things that I think will be fixed, but again, the point being is don't take it for face value and ask questions and ask questions like that. So thank you. Yes, sir.

Hi, John. Thank you for reinforcing that. Great power comes with great responsibility, right? We all need that reminder. I have a twofold question. With the evolution of the technology where we are seeing different trends. Right. So we are seeing AI. You're seeing platform ization and stuff. Right. Do you think that from the pentesting point when you're doing all these pen test, is it are you seeing more gaps or are you seeing that the businesses are becoming more resilient specifically for more gaps?

I'm sorry, more gaps. Lots of more gaps, right? I mean, big time I mean, that I just tell a personal story. My company got acquired four years ago. My wife told me I'm never working again, and, I didn't have to work because of that, but because of the patterns of behavior that we saw with AI. I was encouraged by a few clients who said, no, no, we need guys like you to help us right now because we're recreating some of the negative patterns that happened with application development 20 years ago.

And my point earlier was the fear of missing out is, is creating these unintended consequences, which is just additional attack surface. So know and we are seeing the same thing that, people are when we look at it from the adoption point, we there are a lot of things that we are lacking. So the part two for that one is, now we have the AI within the tool and as a tool.

Right. So different ways, from the pentesting point, do you think it's getting better or worse? I think you answered first. I answered this before, but let's say if you do find the vulnerability right, and you say you're doing a test and you see something, do you do we know, enough about how to patch it? Our businesses, you think are ready because we are now in this world where we are using AI at such a fast pace, do we understand how to patch it?

Oh. That was a that was a long question. Let me, I always have this fear is a vendor. That one test we're not going to find anything like this will be the this will be the client test where we have like 1 or 2 loads and some additional, like now we like this is created like an entire new set of vulnerabilities and attack surface.

I think my what we're seeing is that the imperative, the business imperative move quickly is generating more attack surface. And that's reflective in our, testing and external testing. It's reflective in the conversations we're having. We have certain organizations that are doing quite well, but others are just struggling because the VP of dev is like, is empowered to go fast because of CEOs, fear of being extinct, the company being extinct.

Subscribe to our newsletter

And that's created all kinds of weird stuff. So I think we we continue to find stuff. And I don't have that fear that I'm out of a day job. So. Yes, sir. I have a question regarding the the non-deterministic nature of it. Like since for any given that for a given input, you can have different output, right?

Yeah. So how do you test against that? Because first of all you can't cache it. It's hard. It's bloody hard right now. I mean there's no real good answer. How do you test against it? I don't have a great answer either right now. Because, I think what we do is that's the manual part of the manual pen testing is trying to look at it, but, like, I don't think there's an automated way that I'm aware of if anybody has it, but I don't I don't have a great answer there for you.

Okay. So on that, because the range keep sharing. Right. Because if you send it to another system, the rest get bigger and bigger. But eventually you have a huge range. Yeah. Let me see a Steve. Maybe Steve can answer that one better. Oh you can. Okay, okay. No, but I do have a question for you. Do you think that LMS will evolve with influence of people like yourself and other smart people?

Will people start training these other Lims to do security better? Yeah. And incorporate secure methodologies into application development. So, so, so inherently provide you with secure code. That's not today but in the future. Do you think that'll change? Yeah. We're we're at the early the front end of the early part of it I think. And the the interesting thing there is a somewhat dated video out on, on YouTube from the people that created The Social Dilemma.

It's called the AI dilemma. And it was maybe two years ago. Their central theory was the big lamb. Producers are moving so quickly they're not considering anything else. They're really there's a there's an arms race between them. I agree with that. I, I think they're trying to fix things that I just heard on the way here. The open AI is, has a, like a kid checker feature.

Now that you can opt in, if you have children that are on ChatGPT. I mean, but but that's like years late from meta and years late from everybody else. So I think it'll get better. But the only way to see the a better is more. And more abuse cases that come out. I hate to say it. So. So we have three more minutes and a couple more questions.

So, apparently this is something that I heard recently. You mentioned that you've got, you know, adversaries out in the world who are seeding GitHub and other code repositories with known insecure code, right, in hopes that it gets into training data. Apparently, you've also got like there's these groups that, for example, when a new model drops, they get the system prompt, they get the jailbreak almost instantly.

Right. They're doing a similar sort of seeding, apparently, where they are also adding essentially model backdoors, right? Where, you know, yellow banana with purple dots and that like unlocks something. Yeah. Right. Like some code word or something. Right. So, and, and and I mean, it feels like there's, it's like, man, the scope of that is huge. Like, what are the implications of that?

And so I'm just wondering if maybe you could, I guess how do you threat model that risk of trusting, you know, GPT clod, whatever model it is from the major provider that are doing the training on the internet data set. Right. So okay, so I would add in threat modeling, you're looking at really three things ingress, egress and trust zones.

Right. So now you're adding to the ingress like the trust of it. And what is it doing. And asking those type of questions a little bit more. But right now what I would say, we just did a survey. Only two people raise their hand about threat modeling. So the real problem is people aren't doing it. Not, you know, they're not doing it.

If they're not, if they're not doing it, if they're doing it all, it's not very realistic. And it's not catching those architectural flaws that we see that are the scariest. So, I mean, I key takeaway do it. You know, do it. First step to recovery is threat modeling. Right. Sorry. So my question is I'm wanting advice on how we can and, educate, leadership and give them, like the time that it puts in to prepare for like copilot and things like that.

I mean, how can we say, okay, give me three months to do the threat modeling to implement to do this and that when there's show like gun how to start today. Yeah. Now the instant gratification of these things, I mean, they, they sound so great and wonderful, but they also have that dark side to you. Yeah. I mean, that's that's where you don't have to make a strategic decision.

When do you get on what what project are you able to do this on? I mean, I would I would advise anybody here. It's almost like cybersecurity in general. Don't be the person to get in front of that one. You know, like, hey, we're going to do AI. Except for what so-and-so said, we had to put controls in place.

So I would advise you not to jump in front, but like find when the opportunity exists to, in a small way, start to convince. I think I there's a few other people that I know in here I won't call on you. But I think I've seen this ability to start to get the technical side, to start to ask the questions like, shouldn't we be doing this?

Or should, hey, we're about to roll out a new MXGp service. Can I borrow you for five minutes to just tell me what I should worry about? That question alone is a watermark point in in the organization, but getting to that point takes a long time. So my my dance I we can grab grab afterwards. But you know anybody that jumps in front of and says no, we can't do that because it controls.

Yeah. It's not you know, like within your department. Yeah. It's like they're so quick to bring apps in. It's more internal it. So everybody's like pushing pushing onboarding all these you know AI API rest, injecting all that data. But there's no like, you know, real safeguards or like you said, like putting applying permissions to things. And. Well, I mean, my my first question is an outside person would say what's the risk appetite of the organization?

I mean, you may be up against a, you may be in a wow US place where they're the risk averse. Like, I'm not risk averse. There are risks there. They'll just do anything. Yeah. Grab me afterwards. Like, it's not a great answer there. It's at that moment you realize you're a sales and marketing professional and not an IT person anymore.

When you're trying to convince people to do stuff on your behalf and a fear of I may break in. Unfortunately, we are out of time as well. So come and grab me after if you're interested in the stuff. Thank you. Thank you everyone!