
September 18, 2025 | From the experts
By Matt Holland
I may have shot myself in the foot by declaring this blog series to consist of 3 parts. The rapid acceleration of AI’s impact on business makes for so many interesting topics to write about. I’ve grown intrigued with the business impact of AI, particularly its effect on gross margin, which is both an opportunity and a challenge to the predictability of operating costs. For example, when is it the best decision to use third-party vendors that provide access to cutting-edge frontier models, versus what does “good enough” look like, and what can you host with open-source models on capitalized hardware? It’s turning into a very fun problem to solve. We’ve learned that open-source models are more powerful and suitable than most give them credit for, which I expect will put pressure on the giants investing billions in building frontier models (e.g. OpenAI, Google, xAI, etc).
Then there is the whole “who benefits most from AI?” question. The billion-dollar juggernauts or start-ups? Honestly, I don’t know which camp I’m in yet. There is certainly a growing sentiment that we are entering the golden age of entrepreneurship, driven by rapid prototyping and productization fueled by AI. But that leads me to ask: if your product and company are a derivative of AI prompts, then what is your moat? What prevents another start-up or a massive company from generating the same things? Who owns the intellectual property generated in that case? I find this all fascinating and a challenge to predict. At the end of the day, I suspect it will come down to product taste, as it always does. And that’s not easy to find or build, regardless of whether or not you have R2-D2 in your corner.
OK, enough pontificating. This is supposed to be a blog about artificial intelligence and its effect on cybersecurity, not entrepreneurship or maximizing gross margin.
Part 1 of this series, from 2023, covered my prediction of how AI would affect malware authoring, and I believe it has held up over the past few years. Yes, we now see rapid generation of malware in the wild, but as I pointed out, massive numbers of malware variants do not significantly change the world of cybersecurity when they all need to use the same techniques, as defined by operating system restrictions. My conclusion was that AI would not really change the game for malware generation, at least not yet. It has certainly increased the quality and quantity of phishing attacks, but not really malware itself (i.e. the bits that run on a computer).
Part 2 of this series, from 2024, discussed the realistic challenges of using AI to get to a point where endpoint agents never need updating. As I pointed out in that blog, it comes down to data volumes that cover the execution space of malware, and this problem has not yet been solved. Although Field Effect is on the cusp of releasing an NGAV update to our endpoint agent that includes a machine learning pipeline, when it comes to heuristics, the conclusions I drew in that blog have also held up.
The inspiration for this blog, part 3 of the series, is a bit different from my previous grumpy-old-man angle. I recently attended REcon in Montreal, and one of the presentations gave me pause. Not because the presentation was bad (it was actually quite good), but because when I stitched the presentation content together with experience from my first company, it led me to a conclusion I think we should all be concerned about. The presentation was Reverse Engineering Patch Tuesday, by John McIntosh (@clearbluejar), a security researcher at Clearseclabs. John talked about his experience and approaches to automatically identifying patched vulnerabilities in Microsoft’s regular “Patch Tuesday” updates, and the success he has had at matching code changes to Common Vulnerabilities and Exposures (CVE) reports.
I can tell you right now that part 3 of this blog series is going to be different. I had considered talking about the use of AI to generate high-quality phishing and social engineering attacks at scale, but I feel those are already firmly on the cybersecurity industry’s radar. Grab your tinfoil hat, and a stiff drink…
I’m going to open up a bit about my first company, Linchpin Labs (sold in 2018). You’ve probably never heard of it, and that was by design, but we became the world’s leading company at developing end-to-end intelligence tradecraft for Five Eyes governments (in partnership with a company called Azimuth Security). This is out in the open now, as the buyer, L3Harris, has a nice little website describing the now-combined entity.
One of our product lines was sets of exploits (code that takes advantage of software vulnerabilities) that could facilitate remote hacking into a bad guy’s device – and when I say “bad guys”, I mean the type that would strap bombs to themselves or engage in human trafficking. Really bad guys. And yes, we were absolutely certain of the ethical use of the exploits.
Generally, exploits come in two flavors: 0-days and n-days. 0-day exploits target vulnerabilities for which no patch has been issued, because the vendor does not yet know they exist. From an offensive hacking perspective, these are very hard to find and are like gold when you find good ones. For example, a Field Effect security researcher, Erik Egsgard, discovered a trove of Microsoft 0-days back in 2021 (we called it Blackswan) and went through the disclosure process with Microsoft so they could be fixed via a Patch Tuesday. That set of exploits would have been worth 2+ million dollars on the open market in 2021, and likely even more today.
Discovering 0-day vulnerabilities is probably one of the hardest aspects of security R&D, and definitely not for the faint of heart. Additionally, if you are a company tasked with finding vulnerabilities of this type, or an attacker utilizing them, when they get patched it is nothing short of gut-wrenching. I know, no sympathy expected.
n-day exploits target vulnerabilities that are already known and have been patched. They are much less expensive to find (which de-risks part of the R&D process), but you have no certainty about their quality until they are prototyped. The starting point is a CVE description and a patched binary, which at least gives you a concrete place to start looking.
The other characteristic is that n-days generally have a limited window: they become operationally useless once broad patch coverage has been attained, but if you are fast enough, they have real applicability. Patching also doesn’t always happen for various reasons, so an n-day’s lifetime varies with the platform, operating system, affected software, an organization’s patching plan, and so on. But that window of exploitation exists – even Chrome updates take two to four weeks to roll out completely worldwide.
For the above reasons, we started developing a product line of n-day vulnerabilities for Android devices in 2016. Apple’s patching cadence made n-day productization a waste of time, so we focused on Android devices due to the menagerie of hardware vendors that were generally terrible at patch management and cadence (I’m looking at you, Samsung).
We encountered significant real-world challenges when trying to bring this product to market, and they mostly came down to speed: how quickly suitable n-days could be identified, how quickly they could be productized into reliable exploits, and whether those time frames aligned with the anticipated broad patch-adoption timeline.
The conclusion here is that the alignment of those time frames is what makes n-day productization viable, or not.
Finding and weaponizing n-day vulnerabilities in 2016 was difficult and time-consuming. While we successfully brought n-day-based attack chains to market for our customers, these challenges were never easy to solve and ultimately led to the product’s cancellation.
Before I dive into the details of his presentation, I want to quickly acknowledge and thank John McIntosh for his willingness to share his slides and take the time to discuss his research with me. He is extremely knowledgeable and has some cool things coming, so please keep an eye out for what he’s got cooking!
As mentioned earlier, John’s presentation triggered a bit of an “oh my” moment in my brain. He has invested significant R&D into automating the discovery of what a Microsoft Windows patch actually fixed, by automatically analyzing CVE descriptions to identify the affected binaries and then comparing those binaries before and after the patch. Identifying the code that was actually changed provides a strong clue to what the vulnerability was and how it might be exploited. While patch diffing isn’t a new concept, I believe the success John has achieved is an automation milestone.
Here are some examples of the types of vulnerabilities:
[Slide 18 from Reverse Engineering Patch Tuesday, by John McIntosh]
As you can see, these types of vulnerabilities are dangerous little fellows.
John has created WinBinDiff as an evolution of his previously released Ghidriff, an open-source binary diffing engine built using Ghidra, the NSA’s open-source reverse engineering framework. Ghidriff enabled automated patch diffing, and WinBinDiff expands on that foundation by integrating freely available CVE vulnerability data, automated correlation logic, and additional reverse engineering methodologies to automatically extract vulnerability information, with considerable success. For example, he was able to identify the buggy code associated with CVE-2024-38063, a Windows TCP/IP remote code execution vulnerability with a CVSS score of 9.8. For those who don’t keep up on CVE or CVSS terminology, that’s a very dangerous vulnerability.
While I won’t pretend to completely understand how John has built this bit of technology, he has proven that by using CVEs as a compass (his analogy) and analyzing the pre- and post-patch binaries, automatic discovery and extraction of critical vulnerabilities is possible at scale. Furthermore, he has effectively solved the first challenge of rapid n-day productization that I described in the previous section: finding the vulnerability quickly.
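To make that concrete, here is a minimal sketch of the general patch-diffing workflow, not John’s actual pipeline: pull the CVE description from the public NVD API to use as the compass, then diff the pre- and post-patch binaries with the open-source ghidriff tool. The file paths are hypothetical, ghidriff needs a local Ghidra installation, and the exact invocation may differ from what John uses.

```python
import json
import subprocess
import urllib.request

# Hypothetical local copies of the affected binary, before and after the patch.
OLD_BINARY = "binaries/tcpip.sys.pre_patch"
NEW_BINARY = "binaries/tcpip.sys.post_patch"
CVE_ID = "CVE-2024-38063"

# Step 1: use the CVE as a "compass" - fetch its public description from NVD to learn
# which component and vulnerability class to look for in the diff.
url = f"https://services.nvd.nist.gov/rest/json/cves/2.0?cveId={CVE_ID}"
with urllib.request.urlopen(url) as resp:
    cve = json.load(resp)["vulnerabilities"][0]["cve"]
description = next(d["value"] for d in cve["descriptions"] if d["lang"] == "en")
print(f"{CVE_ID}: {description}")

# Step 2: diff the two binaries with ghidriff (built on Ghidra), which produces a
# report of added, removed, and modified functions - the raw material for working out
# what the patch actually changed.
subprocess.run(["ghidriff", OLD_BINARY, NEW_BINARY], check=True)
```

The hard part, and the part John has automated, is correlating the functions that changed with the specific CVE rather than with unrelated code churn.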
Let’s add in the current state of AI-assisted code generation, and things really start to get interesting. Over the past year, vendors have certainly made great strides with AI models and applications to allow developers to utilize AI for writing code or generating other content.
In my opinion, the two most “popular” code generation AI applications (at the time of writing) are Cursor and Claude Code (Anthropic). We tried out Cursor about a year ago and it wasn’t bad, but we weren’t blown away by it. I’ve heard it’s really improved and is now very good, but I don’t know it well enough to comment on it further.
Our experience with Claude Code has been much more positive, although quality has dipped a bit recently, presumably due to its popularity and the strain on back-end inference resources. This is just me theorizing. However, our experience validates that an industry-wide shift toward using AI to generate code and other technical content at very low cost is under way. We have a running joke at Field Effect that we wish we could clone our amazing development team, and the advancement of AI code generation over the past year gives me optimism that effective “output cloning” will be possible soon. Amazing developers won’t be replaced, but they will be effectively multiplied.
The other consideration is that open-source models are very close in quality to the above vendors. One of our developers spun up DeepSeek on a MacBook and gave it the same code generation task; in his opinion, it did about 75% as good a job as Claude did. Fast forward a few years, and high-quality code generation on lightweight infrastructure in your server room, rivaling the output of today’s frontier models, will be a reality.
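For a sense of how little infrastructure this takes, here is a minimal sketch of asking a locally hosted open-source model to generate code through Ollama’s HTTP API. Ollama and the model name are my assumptions for illustration; the post only mentions a developer running DeepSeek on a MacBook.

```python
import json
import urllib.request

# Assumes a local model server (Ollama listens on port 11434 by default) with a
# DeepSeek coder model already pulled; the endpoint and model name are assumptions.
payload = {
    "model": "deepseek-coder-v2",
    "prompt": "Write a Python function that parses a CSV export of monthly security "
              "updates and returns the CVE IDs with a CVSS score of 9.0 or higher.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

Nothing about this requires a data centre or a vendor subscription, which is exactly the point.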
The key takeaway here is that we are in a new era of not only how developers write code, but also the ease with which harder tasks can be accomplished via AI and the scale at which it can be done.
Taking this into consideration, I am absolutely certain that you could train an AI model, be it a frontier model or an open-source one, to take the output of John’s work, produce rapid prototypes of n-days, and then carry them through to full productization. It would take some work to stitch it all together, but I would bet a shiny 50-cent piece on it.
Let’s do a quick review of the challenges of productizing n-days into highly reliable exploits: identifying the vulnerable code behind a CVE, prototyping an exploit to confirm its quality, and hardening it into something reliable before broad patch adoption closes the window.
By stitching together techniques and approaches similar to John McIntosh’s research with modern AI code generation tools, these problems can be solved. I speculate that the process of ripping apart a software patch, matching the set of CVEs to the differences between the old and new binaries, and productizing reliable exploits will take under a day. Maybe less. It will really depend on the models used (frontier vs open source) and how well trained they become over time at repeating the process.
Consider the vulnerabilities patched in Microsoft products since the start of the year. Using the Microsoft Security Update Guide, we can see 207 vulnerabilities with a CVSS score above 9.0, 96% of which are 9.8 or above, with several hundred more marked as “Critical”. Almost all of them are remote code execution or privilege escalation bugs. This is a staggering potential n-day attack surface.
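If you want to reproduce that kind of count yourself without the Security Update Guide, a rough public-data proxy is the NVD API. The sketch below is an assumption-laden illustration: the exact query parameters and date format may need adjusting, and the totals will not match Microsoft’s own tally.

```python
import json
import urllib.request

# Count CVEs published in a single month with CRITICAL CVSS v3 severity that mention
# Windows. The NVD API caps publication-date ranges (roughly 120 days), so a
# year-to-date figure has to be assembled in chunks.
params = (
    "cvssV3Severity=CRITICAL"
    "&keywordSearch=Windows"
    "&pubStartDate=2025-08-01T00:00:00.000"
    "&pubEndDate=2025-08-31T23:59:59.999"
    "&resultsPerPage=1"  # only the total count is needed, not the full records
)
url = f"https://services.nvd.nist.gov/rest/json/cves/2.0?{params}"
with urllib.request.urlopen(url) as resp:
    total = json.load(resp)["totalResults"]
print(f"Critical Windows-related CVEs published in August 2025: {total}")
```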
Another consideration is that Windows 10 and 11 share a lot of code. When Microsoft discontinues updates for Windows 10 (currently planned for October 2025), there will be an ongoing set of n-days in Windows 11 that become forever-days in Windows 10. Obviously, at that point all organizations should upgrade to Windows 11, but the odds of that actually happening are quite low. According to Gemini:
“There is no exact current number for the devices running Windows 10, but as of August 2025, it accounts for about 43% of the worldwide Windows market share and holds a 31% share of all traditional PC users. With the total Windows market share at roughly 1.6 billion devices in 2023, this means there are an estimated 688 million to 750 million Windows 10 devices currently in use.”
That is a lot of potentially perpetually exploitable computers.
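For what it’s worth, the low end of that range is simple arithmetic on the figures in the quote:

```python
# Back-of-the-envelope check of the quoted estimate, using the figures from the quote.
total_windows_devices = 1.6e9   # roughly 1.6 billion Windows devices (2023 figure)
windows10_share = 0.43          # roughly 43% of worldwide Windows market share (Aug 2025)
print(f"{total_windows_devices * windows10_share / 1e6:.0f} million Windows 10 devices")
# Prints about 688 million, the low end of the 688-750 million range.
```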
One more thing to keep you up at night – software is continuously evolving, and historically this has been a good thing. However, in the future, every single Chrome update, Patch Tuesday, or any other vendor software update could turn into an exploit frenzy of n-days that can be rapidly extracted, productized, and deployed. Defending against this onslaught could be incredibly daunting.
Here’s one thing I know about intelligence agencies – they do things at scale remarkably well. If my grumpy brain has done the math on this, then I would suspect intelligence agencies are already active in this space and have been for years.
Here’s one thing I know about criminal hacking groups – they have grown to emulate intelligence agency approaches remarkably well. I suspect the more advanced criminal hacking groups are active in this space as well.
One logical opinion is that frontier model companies should be responsible for policing how their subscribers use their models. This sounds like a reasonable request, but applying such responsibilities would be challenging, not to mention it potentially enters grey legal areas around what is a realistic outcome versus a legal expectation. The world of social media is still grappling with censorship versus free speech, and I can’t imagine the conversation being any less complicated for frontier model companies.
I think that, at the end of the day, this will land on the shoulders of software vendors, IT teams, and the cybersecurity industry. The only way to counter this is to patch software rapidly, very rapidly. That is of course already good practice, but patching all software across fleets of computers is non-trivial, and most patching frameworks stagger or delay patches to leave room for them to soak worldwide first. Extremely rapid, near-immediate patch adoption is the bottom line of what will be required to counter this threat.
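As a small illustration of how limited the raw visibility can be, here is a minimal sketch (my own, not anything from the post) that lists the most recently installed OS-level hotfixes on a single Windows host. Get-HotFix only reports a subset of updates, which is part of why fleet-wide patch tracking is non-trivial.

```python
import subprocess

# List the ten most recently installed OS-level hotfixes on this Windows host.
# Get-HotFix reports only a subset of updates (QFEs), so treat this as a starting
# point for patch-status visibility, not a complete software inventory.
result = subprocess.run(
    ["powershell", "-NoProfile", "-Command",
     "Get-HotFix | Sort-Object InstalledOn -Descending | "
     "Select-Object -First 10 HotFixID, InstalledOn"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```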
Strap in, it’s going to be a wild decade.
Matt Holland is the founder and CEO of Field Effect, where he leads a mission to make world-class cybersecurity accessible to organizations of all sizes. Before launching Field Effect, Matt spent over a decade in the Canadian intelligence community, designing and running national-level cyber operations and working closely with Five Eyes partners on some of the world’s most advanced cyber defense challenges. He later co-founded Linchpin Labs, a software development company known for solving some of the toughest problems in cyber defense and secure communications.
Matt has deep expertise in systems architecture, secure software development, and the gritty realities of defending critical networks. He writes (and occasionally rants) about cybersecurity, technology, and the gap between industry hype and reality—always with a focus on what actually works for the businesses that need it most.