Security Intelligence
November 9, 2024 | From the experts
The Brass Tacks of AI and Cybersecurity - Part 2 of 3 - Endpoint Agents
By Matt Holland
AI is a powerful automation tool but not a cybersecurity panacea for endpoint agents
About a year ago I wrote this blog about the realistic impact of Artificial Intelligence on offensive cybersecurity, and I’m happy to say it largely holds up. I was compelled to write that blog because a security researcher was voicing strong opinions and making unrealistic claims about how AI was going to bring doom to the world of cybersecurity, and I just had to weigh in. I can’t stand fearmongering. The sky has not fallen, AI hasn’t led to countless malware strains, and I don’t have to eat a slice of humble pie for my contrarian perspectives.
Well, it has happened again – more unhelpful, sensational messaging that is harmful to the industry and nowhere close to realism. Not what you would hope for or expect from a leading cybersecurity vendor. I’m talking about this article: SentinelOne CEO: Cybersecurity Shouldn’t Require Constant Updates.
This type of rhetoric creates false expectations for cybersecurity buyers, rather than enabling them to focus on the things that really matter. On the bright side I don’t have to hear, “Matt, when are you going to write part 2 of your AI and cybersecurity blog trilogy…” from our marketing team anymore.
I normally don’t read content or articles from other cybersecurity CEOs. I keep my focus on what Field Effect is doing to improve, and how we help our customers and partners. However, I had so many friends who are endpoint coders around the industry and in the intelligence space reach out for my thoughts that I gave in and read the article. It was like there was a collective groan and eye roll in the endpoint community.
Before I continue, I want to add a bit more background as to how and why I make some bold statements in this write-up. In previous career roles, I led teams that have written memory-only, threadless implants (a sexy term for intelligence agency malware) that do not create processes, load dynamic libraries, or incur direct (or attributable) file activity, and that exfiltrate data over comms channels that are network-based. My skepticism of cybersecurity vendors that claim they can “detect and block all malware” is naturally quite high, regardless of whether AI is used or not.
Evolution, Not Revolution
Let’s be clear – there is no concept of “next generation” in cybersecurity, regardless of how many times you hear it over and over and over. Cybersecurity is a continuous evolution, and the introduction of AI is no different – it is a gradual evolution, not a quantum leap in capability, regardless of what cybersecurity CEOs, industry analysts or the stock market want you to believe.
I do agree with the interviewee that, “building a more-resilient system embedded on the device” is a good thing. But it’s neither a unique view nor “next generation”. It is what every endpoint company should be building, but that does not negate the need for tactical and maintenance updates. AI-powered endpoint agent systems do not make endpoint updates unnecessary, no matter how hard one clicks their heels while wearing ruby slippers.
I remember about 10 years ago another endpoint company made a similar claim about not requiring endpoint updates “because of the power of the cloud”, and they rapidly became entirely irrelevant in the industry. I won’t name them because there is no point, but as any experienced endpoint agent developer or team would agree, endpoint agent updates are inevitable. AI doesn’t change that.
OK, grumpy old man rants aside - how does AI practically benefit the defensive aspects of cybersecurity? The focus of this blog will be endpoint agents, and then part 3 will focus on other technology layers. Luckily, AI does help in quite a few ways and it all starts with the basic premise that AI is a tool, not a solution. That’s key. That means while AI is helpful, it is not the be-all and end-all of endpoint protection.
To use a tool, you first need to know what it is good at. Let’s ask ChatGPT 4o for its opinion on the two most relevant subsets of Artificial Intelligence in the world of cybersecurity: Machine Learning (ML) and Large Language Models (LLMs).
Question: What are large language models good at?
Paraphrased Answer: Natural language understanding and generation, question answering, creative writing, code assistance, conversational agents, data analysis, education and tutoring, and content moderation. While LLMs are versatile and powerful, they also have limitations. They can sometimes produce incorrect or biased information, lack true understanding or common-sense reasoning, and may require careful prompting to deliver the desired results.
Question: What is machine learning good at?
Paraphrased Answer: Pattern recognition, predictive modeling, natural language processing, recommendation systems, anomaly detection, automation and optimization, healthcare and medical diagnosis, autonomous systems, gaming and simulations. While machine learning is powerful, it also has limitations. ML models require large amounts of data for training, and the quality of their predictions is dependent on the quality of the data. Additionally, they can be opaque, making it difficult to understand how decisions are made, leading to challenges in interpretability and accountability.
Sounds like a pretty great tool to improve cybersecurity, right? But the devil is in the details.
Oh, one more thing before I get nerdy. As of today, Machine Learning is the most applicable subset of AI for cybersecurity, so for the remainder of this blog, when I say AI, I am referring to Machine Learning. I will dive into Large Language Models in my next blog.
The Challenge of Intent
It is important to understand what makes detecting/blocking malware an ongoing challenge, and why claims of AI-based systems being able to detect and block “any malware or cyberattack” are ludicrous.
The first thing to be clear on is that malware is just software. I know movies make it look like a scary thing flying in a 3D world, but at the end of the day, it’s just software. Additionally, it usually has many characteristics in common with ordinary apps or software installed on a host, with the exception being exploit shellcode or launcher code. Luckily, there are very distinguishable attributes that make some types of malware stick out like a sore thumb (e.g. richly featured intelligence/military systems). But those types are rare to encounter, especially for the vast majority of businesses and organizations in the world.
Most cyberattacks consist of, or look much more like, normal software or activity on a host. For example, consider a ransomware attack that involves an RDP entry point and the use of PowerShell or a legitimate back-up software utility. This is what most businesses will encounter, and unfortunately, it doesn’t look much like malware at all. I think our SVP of Service Delivery, Pat Smith, describes it best: “the only difference between threat actor behavior and normal admin behavior, is intent.” I agree with this wholeheartedly and it aligns with our experience of being in the trenches of MDR delivery for over seven years.
The attack itself described above would be visible by analyzing process, session, DLL and network activity. But things get fuzzy if all the processes and DLLs show up regularly when an admin is remoting in from home to perform maintenance, and network awareness is less conclusive if a cloud proxy is being used – which is quite frequent. Often the challenge then boils down to – what is the intent of the person at the remote keyboard?
So how does one gauge intent with AI or other endpoint approaches? Aside from being in the head of the person at the remote keyboard, the only option is to expand the aperture of data that is analyzed to gain as much environmental and situational awareness as possible. This means looking at data types well beyond process activity, dynamic library activity, or network activity – this is the easy stuff.
The additional types of data are considerable in both type and volume and include but are not limited to: registry activity, file system activity, object handle activity, thread activity, memory activity, ETW feeds, network packet data, general state of the host’s UI or hardware input, and many others. This is a ridiculous amount of data, and there are significant hurdles to overcome while training an ML engine that can process it all (discussed in the sections below). Lions, tigers and bears – oh my.
Let’s look at some real-world data. I spun up a vanilla 64-bit Windows 10 VM with no software installed, installed the Field Effect endpoint agent, and let it sit for an hour. After an hour, I issued one of our diagnostics commands to list the number of times our kernel callbacks were called – this represents the amount of data that was generated by the idle machine, the results of which are in the below diagram. Note that the types of events captured below are process, thread, dynamic module loading, registry, file system and some network events (those to support connection tracking). I’ve highlighted the events in red that are associated with process, dynamic modules, thread and socket activity, which constitute more common data feeds to transport from a computer to a central location.
In just one hour, the idle machine generated almost 56 million events. In order to gain security relevance, each event would need to track associated process(es), object identity (e.g. file path or registry path), and other metadata. This would easily be more than 100 bytes for each event, even with object deduplication/server optimizations – but let’s assume 100 bytes on average, which works out to roughly 5.6 GB per hour for a single idle machine. This is the amount of data that a company of 100 computers would produce in a single week.
Over 85 TB per week for 100 idle, simple computers. The above amount of data is similar to each computer constantly streaming a 4K movie, 24/7, nonstop. If the machine was actively in use (i.e. not idle with the screen locked), or a server, this would only be a drop in the bucket. Additionally, this doesn’t include other data feeds captured, such as Event Tracing for Windows (ETW), handle activity, memory activity, or other types of deeper runtime inspection. Including these types of data would add another order of magnitude in data volumes.
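The arithmetic behind that figure can be reproduced with a short sketch. The event count (rounded to 56 million per hour), the 100-byte average and the 100-machine company all come from the text above. Reading "TB" as binary terabytes (TiB) is an assumption made here, because that is the unit under which the total lands just over 85.

```python
# Back-of-the-envelope data volume for idle endpoints.
# Inputs are taken from the text; the unit interpretation is an assumption.

EVENTS_PER_HOUR = 56_000_000  # "almost 56 million events" in one idle hour (rounded)
BYTES_PER_EVENT = 100         # average event size assumed in the text
MACHINES = 100                # company size used in the text
HOURS_PER_WEEK = 24 * 7


def weekly_bytes(events_per_hour: int, bytes_per_event: int, machines: int) -> int:
    """Total bytes generated per week across all machines."""
    return events_per_hour * bytes_per_event * HOURS_PER_WEEK * machines


def per_machine_mbps(events_per_hour: int, bytes_per_event: int) -> float:
    """Sustained per-machine data rate in megabits per second."""
    bits_per_hour = events_per_hour * bytes_per_event * 8
    return bits_per_hour / 3600 / 1_000_000


total = weekly_bytes(EVENTS_PER_HOUR, BYTES_PER_EVENT, MACHINES)
total_tb = total / 1000**4    # decimal terabytes
total_tib = total / 1024**4   # binary terabytes (assumed unit behind "85 TB")
rate = per_machine_mbps(EVENTS_PER_HOUR, BYTES_PER_EVENT)

print(f"{total_tb:.1f} TB (decimal) / {total_tib:.1f} TiB per week")
print(f"{rate:.1f} Mbit/s sustained per machine")
```

With these inputs the sketch prints roughly 94.1 TB (85.6 TiB) per week and about 12.4 Mbit/s per machine, which is a continuous stream from every idle computer before any richer data feeds are added.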
The bottom line is – there is more data than can be feasibly transported to a single location for central analysis. If you are thinking to yourself “why not just use the cloud?” the physical toll on business computers and networks to collect and transfer that magnitude of data to the cloud would be immense, and they would grind to a slow halt. This fact cannot be ignored or hidden behind the latest shiny object.
Then there would be the cost of cloud storage and processing, which has slowly become very expensive over the past decade. Unless you are Microsoft or Google, funding the back-end storage and compute for this task would be infeasible. And Microsoft and Google are much more occupied with using their compute resources for LLM training than they are with cybersecurity data sets.
Cybersecurity isn’t like a social network where all of the pertinent data is already collected and server-managed. So if you are thinking “well, Tweetties and FaceTubes can do it, why can’t you?”, it’s because social network data is already centralized, and endpoint data is not.
What are the conclusions that can be drawn at this point?
- The amount of data produced by computers that is relevant for security analysis is considerably higher than most people realize. The above set was for an idle virtual machine, and much more data is produced in real life by an average computer. This fact never gets highlighted. Honestly, if you’ve never worked on an endpoint agent team, you likely wouldn’t be aware of the scales of data we work with and analyze – at runtime.
- The ability to gauge intent from the above data set is questionable. The above data set is not a slam dunk, and additional data types (i.e. more context) would likely be required to establish a very high confidence measure of intent.
- For AI to be effective at addressing the challenge of attacks that look like normal Administrator behavior, it needs access to, and to be trained on, massive amounts of data – likely over a TB per day, per machine, in a realistic business environment.
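To see how a figure in the range of a terabyte per day, per machine, can arise, here is a rough sketch that starts from the idle baseline measured earlier. The 10x multiplier is purely illustrative: it is one reading of the "another order of magnitude" mentioned for the additional data feeds, not a measured value.

```python
# Per-machine daily volume: idle baseline from the text, scaled by a
# hypothetical multiplier for richer data feeds and active use.

EVENTS_PER_HOUR_IDLE = 56_000_000  # idle VM measurement from the text (rounded)
BYTES_PER_EVENT = 100              # average event size assumed in the text
EXTRA_FEEDS_MULTIPLIER = 10        # hypothetical: "another order of magnitude"


def daily_gb(events_per_hour: int, bytes_per_event: int, multiplier: int = 1) -> float:
    """Gigabytes (decimal) generated by one machine in 24 hours."""
    return events_per_hour * bytes_per_event * 24 * multiplier / 1000**3


idle = daily_gb(EVENTS_PER_HOUR_IDLE, BYTES_PER_EVENT)
expanded = daily_gb(EVENTS_PER_HOUR_IDLE, BYTES_PER_EVENT, EXTRA_FEEDS_MULTIPLIER)

print(f"idle baseline:    {idle:.1f} GB/day")
print(f"with extra feeds: {expanded / 1000:.2f} TB/day")
```

Under these assumptions the idle baseline is about 134 GB per day, and a single order-of-magnitude increase already pushes one machine past 1 TB per day.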
If one were to successfully, at scale, get this data in front of an accurate AI training process, this would likely produce a great outcome. So, if the data is accessible, what makes this so difficult?
Back-End Data Analysis and Automation
When I see vendors boasting about things like an “AI-powered endpoint agent”, I always ask myself, “do they mean AI on the agent, AI on the back-end crunching data, or both?” We have seen examples of both approaches in the industry over the past decade, but it’s always hard to gauge their true effectiveness.
I will say, however, that I have seen first-hand the amazing ways in which AI can be utilized as part of a cybersecurity back-end (server and analysis engine). When you think about what a back end needs to manage - millions of endpoints, millions of cloud accounts, thousands of network sensors - the suitability of AI is very clear. According to ChatGPT 4o, Machine Learning is very good at:
- Pattern recognition
- Anomaly detection
- Automation and optimization
- Autonomous systems
These strengths align with the benefit we have seen at Field Effect. We are a company of under 200 employees (at the time of writing), and I can say that we would not be able to provide extremely effective MDR at the scale that we do without heavily utilizing AI in our analysis and fleet management processes. I know of other cybersecurity companies who are taking the same approach, so I would consider the utilization of AI for back-end data processing and analysis to be industry standard at this point.
But there are gaps and challenges when it comes to effective ML running on a back end. According to ChatGPT 4o above, “ML models require large amounts of data for training, and the quality of their predictions is dependent on the quality of the data”, and by extension, the quality of the training. And therein lie two challenges when it comes to cybersecurity:
- Training an ML engine with data that is relevant and unique to malware, or at least is sufficiently representative of malware activity and separable from benign behavior; and
- Transporting enough data observed by a sensor (in this case, let’s assume an endpoint agent) to facilitate a training process.
Both of the above challenges are associated with data and its suitability and accessibility to an ML training process.
Recall from the previous section that in order to detect the most relevant cyberattacks today, which affect the majority of businesses in the world, we need to focus on common administrator patterns rather than hand-rolled malware. Training models that detect malicious intent in the use of these tools and patterns requires access to vast quantities of data. Transporting such data volumes from endpoints to a single location is largely infeasible. This creates a significant challenge to training performant models that generalize well.
You may be thinking – why not just transport a live stream of all of the data required for that deeper level of event and situational awareness to establish intent? Well, there is a reason why every time I suggest collecting more data for back-end analysis, our Technical Director of Platform, Andrew Stevenson, gives me a blank stare and utters words along the lines of, “I will physically crush you if you do that. That will kill operational performance. We don’t want that, stop suggesting dumb things, Matt.” In other words, it is massively expensive in terms of host and server CPU/disk, and network bandwidth.
Consider the implications of the storage and network usage penalties if we compare transferring process, DLL and network socket data versus all of the relevant event data to expand situational awareness to help solve that “intent” problem via AI (which is still likely an underestimated quantity). Using the above data set example, this is where we’d land:
Clearly, the quantity of data goes up considerably in this scenario, which has major host and network performance, storage and cost impacts. One cannot just ignore those things and declare that an AI model “catches all the things”. If your scope of an “effective solution” is based on an insufficient data set, then it is inheriting the limitations of that data set, and the effectiveness of, and confidence in, your solution are called into question. Or at least they should be – we need to keep the industry honest.
Another consideration is that the most sophisticated malware doesn’t generate events that are easily detectable. More on that in the next section where I tackle forward-loading AI into endpoint agents.
Next-Generation Endpoint Agents? Not So Much…
In the previous section I made the clear case that to effectively train an AI model back-end and feed it the data it would need to make informed decisions, the amount of data extends well beyond what is operationally feasible to transfer and store. The obvious proposal from that is - why not forward-load AI into the endpoint agent itself? Great idea – and welcome to the last 10 years of endpoint agent marketing!
Tongue-in-cheek comments aside, this is where I see the biggest disconnect between endpoint vendor marketing slogans and reality. It makes me grumpy. This concept is a logical one in a vacuum, but in reality, it has the same challenge as a back-end solution (data volume and training), with the added challenges of user experience and CPU limitations.
To reiterate, an effective AI solution requires training, regardless of where it resides. If one were to train an AI model on an endpoint agent, the following would be required:
- The person at the keyboard would need to be evaluating data sets that are too large to transfer to a back-end (as mentioned earlier), and making good/bad decisions. The problem is, there are only a limited number of humans capable of this task on planet Earth – more than 10,000 but fewer than 100,000 would be my guess. Asking the average person to train an endpoint’s behavior has been tried before, and it’s a terrible experience. I remember Comodo back in 2009 utilizing an interactive “good / bad” confirmation pop-up window, and it didn’t take long for me to silence the mechanism entirely during testing because the noise quickly became overwhelming and annoying from a user perspective.*
- The volume of data sets is bonkers - in the range of hundreds to thousands of data sets being generated per minute requiring analysis. Even a highly skilled analyst with a solid set of automation scripts at their disposal would take minutes of analysis per data set.
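The mismatch in the second point can be made concrete with a simple workload estimate: arrival rate multiplied by handling time gives the number of analysts who would have to work in parallel just to keep the backlog from growing. The specific inputs below are hypothetical picks from within the ranges quoted above ("hundreds to thousands" per minute, "minutes" per data set).

```python
# Rough staffing estimate for human-in-the-loop training on one endpoint.
# All input values are hypothetical, chosen from the ranges in the text.

def analysts_needed(datasets_per_minute: float, minutes_per_dataset: float) -> float:
    """Analysts working in parallel needed to keep up with the arrival rate."""
    return datasets_per_minute * minutes_per_dataset


low = analysts_needed(datasets_per_minute=100, minutes_per_dataset=2)
high = analysts_needed(datasets_per_minute=1000, minutes_per_dataset=5)

print(f"low end:  {low:.0f} analysts per endpoint")
print(f"high end: {high:.0f} analysts per endpoint")
```

Even the low-end inputs call for hundreds of skilled analysts per endpoint, which is why the person at the keyboard cannot realistically serve as the training signal.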
But let’s say I’m wrong about this. Let’s say somebody has figured out how to either train an AI model on the endpoint or transfer a pre-trained model to all of the endpoints - the problem then becomes runtime performance. And once again we hit a showstopper.
If you have been on a team that builds an endpoint agent and have experienced first-hand the amount of runtime data that you need to process, then you understand this challenge immediately. If you have not, then it is easier to downplay this challenge/blocker/unfortunate reality. This is where the art of endpoint development kicks in – making real-time processing decisions with extremely tiny processing time slices. Imagine processing billions of events an hour. You don’t exactly have a lot of time to make a real-time decision to block or escalate data to an asynchronous decision-making stage. Doing this with typical processing paths, policies and signatures is challenging enough.
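To put "extremely tiny processing time slices" into numbers, the sketch below converts an hourly event rate into an average time budget per event. The one-billion-events-per-hour input follows the "billions of events an hour" figure above; the 5% CPU share is a hypothetical limit on how much of one core an agent might be allowed to use, not a figure from any product.

```python
# Average CPU time available per event at a given event rate.
# cpu_share is a hypothetical cap on the agent's use of a single core.

def per_event_budget_us(events_per_hour: float, cpu_share: float = 1.0) -> float:
    """Average microseconds of CPU time available to handle one event."""
    events_per_second = events_per_hour / 3600
    return cpu_share * 1_000_000 / events_per_second


full_core = per_event_budget_us(1_000_000_000)                   # an entire core
five_percent = per_event_budget_us(1_000_000_000, cpu_share=0.05)

print(f"whole core: {full_core:.2f} microseconds per event")
print(f"5% of core: {five_percent:.2f} microseconds per event")
```

At one billion events per hour, an entire core yields about 3.6 microseconds per event, and a 5% share leaves roughly 0.18 microseconds, so any per-event decision logic has to be extremely cheap or be deferred to an asynchronous stage.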
The assumption that an AI model can do this efficiently with the limited resources of a 10-year-old workstation or server is where I must declare shenanigans. People rarely acknowledge that the typical customer host, where the endpoint agent runs, is an older hardware configuration. Even with an AI model that could theoretically identify any threat, I am extremely skeptical that real-time decisions could be made that would not grind a host to a crawl. Maybe 10 years from now, when separate AI processors are standard and AI number crunching could be offloaded, this could be possible, but not now, and certainly not to the extent that warrants the level of vendor claims currently polluting the industry.
Lastly, what about malware that doesn’t create easily traceable events? Not all malware creates processes or threads, or writes registry keys or files, or even has executable memory. How could an AI engine that is only aware of process, DLL and network activity detect this type of malware? It could not.
In fact, I would make the argument that the simple way to defeat any AI system (now or 10 years from now) is to restrict detectable artifacts to spaces where the volume of data is so great that it exceeds what AI training can practically handle. It’s that simple. This also highlights why having humans in the analysis mix will continue to be the difference-maker for the next decade. Only highly trained threat hunters can find this type of malware.
So What Are the Brass Tacks Then?
Utilizing AI as part of a cybersecurity technology stack, or to facilitate the evolution of one, has definitely proven to be positive, but the effectiveness depends on factors that most marketing claims leave out.
When it comes to endpoint agents, I expect that companies are slowly making progress on this data volume vs. location vs. AI training challenge regarding larger data types and data sets discussed above. I know Field Effect is making considerable progress, so I assume others are as well. But any company that claims to have it solved today – our dear friend “physics” would argue otherwise.
It is important that we, as an industry, are open and honest about how AI is being utilized so as not to mislead the buyer. While it is a powerful automation and scaling tool, it is not a solution or the golden key it is marketed to be. And it certainly does not negate the need for regular endpoint agent updates.
* Comodo is not a competitor of Field Effect, nor was it a competitor of my company at the time. I'm not throwing shade at Comodo, just reflecting on my experiences with testing their product as it was in 2009. I'm sure they are nice people.