Artificial Intelligence/Machine Learning (AI/ML) models often lack transparency and explainability, which makes it difficult to understand how they arrive at their conclusions and can degrade security. Achieving transparency and explainability can be challenging, but failing to do so forces users of AI/ML models to place blind trust in their output. When such models are relied upon without considering how they could be subverted by a malicious act, there is a risk of weakening overall defence-in-depth.
Deployment concepts that integrate AI/ML into decision-making processes are becoming increasingly common, driven by a push to leverage the technology to reduce costs and increase efficiency. Organisations seeking the greatest benefit will, in turn, place less weight on direct sensing and on human training, intuition, authority, and experience. That shift demands greater trust in the output of AI/ML models as they are used directly in the performance of critical functions.
Before this happens, we should seek to understand how these models could be compromised, no matter how unlikely that compromise might seem. What follows is a non-exhaustive brainstorm of how it might occur, covering attacks directed at training data, supporting computer-based infrastructure, the model and its execution, and the acceptance and standardisation process.
Training Data
Training data is the set of examples used to train a machine-learning model; it teaches the model to recognise patterns and to make predictions or classifications based on what it has seen. Malicious actors might target the training data set to compromise the model’s decision-making.
Malicious mislabelling through attacks against the data set or labelling process for supervised models.
In a supervised AI/ML system, models are trained with labelled data to “learn” the relationship between inputs and outputs.
An example of such a training data set would be many photos of fruit, each with a description or label noting its type, such as apple or orange. These images and labels would then be loaded into the model and used to compute patterns visible within all photos of a common type. The model learns to recognise patterns in the fruit images during training, associate them with the labelled category, and then use that knowledge to identify new unlabelled fruit images it hasn’t seen before.
If the training data set or the labelling process is compromised and these labels can be manipulated, the model could be trained to return wrong decisions on wide-ranging or specific inputs.
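As a rough illustration, the sketch below uses scikit-learn's small digits dataset and silently relabels part of one class as another, leaving the inputs themselves untouched; the dataset, flip rate, and classifier are chosen purely for demonstration and are not drawn from any specific incident.

```python
# Hedged sketch: how mislabelled training data can skew an otherwise
# unchanged training pipeline. Dataset, flip rate, and classifier are
# illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def train_and_score(labels):
    model = LogisticRegression(max_iter=5000)
    model.fit(X_train, labels)
    return model.score(X_test, y_test)

print("clean labels:   ", train_and_score(y_train))

# A malicious actor relabels half of one class as another (here '8' -> '3'),
# leaving the images themselves untouched and hard to spot in review.
poisoned = y_train.copy()
idx = np.where(poisoned == 8)[0]
poisoned[idx[: len(idx) // 2]] = 3
print("poisoned labels:", train_and_score(poisoned))
```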
Injection of adversarial examples into the training data set.
Similarly, a malicious actor could intentionally manipulate the training data set by injecting adversarial examples, which may be indistinguishable from regular inputs in review but are designed to mislead the model. This could later be combined with adversarial inputs at deployment to increase the reliability with which the model can be subverted.
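The sketch below is a minimal, assumed example of this kind of poisoning: a small "trigger" pattern is stamped onto roughly ten per cent of the training images, which are relabelled to a chosen class, so that a later input carrying the same trigger is steered towards that class. The dataset, trigger pattern, poison rate, and target class are all illustrative assumptions.

```python
# Hedged sketch: injecting backdoored examples into a training data set.
# The trigger pattern, poison rate, and target label are assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def add_trigger(images):
    # A tiny corner pattern a reviewer could easily overlook in an 8x8 digit.
    images = images.copy()
    images[:, [0, 1, 8]] = 16.0
    return images

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Poison roughly 10% of the training set: add the trigger, relabel as class 0.
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(X_train), size=len(X_train) // 10, replace=False)
X_poisoned, y_poisoned = X_train.copy(), y_train.copy()
X_poisoned[poison_idx] = add_trigger(X_poisoned[poison_idx])
y_poisoned[poison_idx] = 0

model = LogisticRegression(max_iter=5000).fit(X_poisoned, y_poisoned)
print("clean test accuracy:", model.score(X_test, y_test))

# At deployment, inputs stamped with the trigger are steered towards class 0.
triggered = add_trigger(X_test)
print("fraction of triggered inputs classified as 0:",
      np.mean(model.predict(triggered) == 0))
```

Note that the model's accuracy on clean test data barely moves, which is precisely what makes this class of attack difficult to notice through ordinary evaluation.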
Undermining the statistical assumptions of the model by silently manipulating training datasets.
An AI/ML model is designed to learn from the statistical patterns and trends in the training data set. A malicious actor could manipulate the training data set to undermine the statistical assumptions of the model, resulting in a model that does not accurately represent the data and is not fit for purpose. This may be particularly relevant for models built on limited datasets, for example, a model designed to estimate the flow rate at a point in an industrial process where no physical sensor is installed.
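A minimal sketch of this idea, using entirely invented numbers for a hypothetical flow-rate "soft sensor", shows how a handful of manipulated points in a small dataset can materially shift the fitted relationship, with little redundancy available to absorb or expose the change.

```python
# Hedged sketch: a small, synthetic "soft sensor" dataset where a handful of
# manipulated points shift the fitted relationship. All values are invented.
import numpy as np

rng = np.random.default_rng(1)
dp = rng.uniform(10.0, 100.0, size=30)             # differential pressure (kPa)
flow = 4.0 * np.sqrt(dp) + rng.normal(0, 0.5, 30)  # "true" flow rate (m3/h)

def fit(x, y):
    # Fit flow ~ a*sqrt(dp) + b by ordinary least squares.
    A = np.column_stack([np.sqrt(x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

print("clean fit    (a, b):", fit(dp, flow))

# An actor nudges just three high-pressure samples downward; with only 30
# points there is little redundancy to absorb or expose the manipulation.
flow_poisoned = flow.copy()
flow_poisoned[dp.argsort()[-3:]] -= 15.0
print("poisoned fit (a, b):", fit(dp, flow_poisoned))
```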
Deliberate manipulation of captured data to affect the fit of the model during retraining.
AI/ML models are retrained over time with new data; a malicious actor could manipulate that captured data, potentially in combination with one or more of the techniques above, to affect the fit of the model during retraining.
Unauthorised access to retraining data compromising confidentiality.
The training data may itself be sensitive, revealing flaws or performance limits in a model or in the function it is designed to support. A malicious actor could use such access to prepare an attack against the model or the supported function.
Supporting Computer-based Infrastructure
AI/ML models do not work alone; they are surrounded by supporting infrastructure, such as servers, databases, storage systems, networking equipment, and software tools, that enables the training and execution of the model. Malicious actors may target these to compromise the model’s output indirectly.
Compromise of supporting software in the input/output chain of the model.
Users and systems will rarely interact directly with the exact process that trains or executes the model. Instead, various supporting software provides web and network services, inter-process communication, and the loading of input and output data into and out of pre- and post-processing functions.
A malicious actor may attempt to compromise any part of this infrastructure to manipulate the integrity of the data loaded into the model or the output it returns (e.g., through a man-in-the-middle attack). This can be done in a way that is undetectable to the system or the user of the model, exploiting any implicit trust at play.
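One way to reduce that implicit trust, sketched below under the assumption that the communicating components can share a key, is to authenticate payloads as they pass along the input/output chain so that in-transit manipulation becomes detectable. The message format, field names, and key handling are illustrative only.

```python
# Hedged sketch: authenticating payloads passed between components in the
# model's input/output chain, so silent in-transit manipulation no longer
# relies on implicit trust. Key handling and framing are assumptions.
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-managed-secret"  # hypothetical key, managed elsewhere

def sign(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "tag": tag}

def verify(message: dict) -> dict:
    expected = hmac.new(SHARED_KEY, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("payload failed integrity check")
    return json.loads(message["body"])

message = sign({"sensor_id": "FT-101", "value": 42.7})

# A man-in-the-middle rewrite of the body is now detectable by the receiver.
message["body"] = message["body"].replace("42.7", "12.7")
try:
    verify(message)
except ValueError as exc:
    print("rejected:", exc)
```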
Compromise of operating system functions leveraged by the model for interaction with data sets.
An AI/ML model may use many features the operating system provides. A malicious actor may seek to exploit vulnerabilities in the operating system’s system calls, file handling, shared libraries, and memory management to inject malicious data sets into the model.
This is noteworthy because, without explainable models, or with models considered proprietary, the implicit trust required may outweigh the authority, intuition, and experience of personnel in a decision-making process. Consider an accurate AI/ML model supplied by a vendor whose decision-making is degraded through such a compromise: an operator notices something is wrong, yet the model passes every integrity check. Who would you trust?
Manipulation of input order through the randomiser function.
During training, the data is often shuffled to reduce biases towards any particular subset of the data. A malicious actor might seek to compromise the randomiser function used to load the training data into the model, allowing them to manipulate the order of the inputs. This could lead the model to learn incorrect relationships between inputs and outputs, or be used to magnify the reliability of adversarial examples and inputs.
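The sketch below, using an assumed classifier and dataset, illustrates how control over the shuffling step alone, with the training data and code otherwise untouched, can change what the model learns.

```python
# Hedged sketch: how control of the shuffling step alone can change what an
# otherwise identical training loop learns. Classifier and data are assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classes = np.unique(y)

def train_with_order(order):
    model = SGDClassifier(random_state=0)
    for i in range(0, len(order), 32):          # mini-batches in the given order
        batch = order[i:i + 32]
        model.partial_fit(X_train[batch], y_train[batch], classes=classes)
    return model.score(X_test, y_test)

honest = np.random.default_rng(0).permutation(len(X_train))
print("honest shuffle:      ", train_with_order(honest))

# A compromised "randomiser" that quietly sorts by label instead of shuffling.
compromised = np.argsort(y_train)
print("compromised ordering:", train_with_order(compromised))
```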
Manipulation of test input/output through pre- and post-processing functions.
Pre- and post-processing functions transform training and input data and the resulting outputs from the model. A malicious actor might seek to compromise one or both sets of functions, subverting the model undetected while producing output that still deceives systems and personnel or holds up to interpretation standards, even though the model has been silently compromised.
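The sketch below is a deliberately simple, hypothetical example of a compromised post-processing function: it keeps the same signature and produces plausible-looking output, but quietly nudges estimates near an alarm threshold back below it. The quantities and threshold are invented.

```python
# Hedged sketch: a post-processing step that keeps outputs plausible while
# silently biasing them. The quantities and threshold are illustrative only.
def postprocess_honest(flow_estimate: float) -> float:
    # Legitimate tidy-up: clamp to the instrument's plausible range and round.
    return round(min(max(flow_estimate, 0.0), 500.0), 1)

def postprocess_compromised(flow_estimate: float) -> float:
    # Same signature and same-looking output, but estimates just above an
    # alarm threshold are nudged below it, so downstream checks still pass.
    ALARM_THRESHOLD = 400.0
    if ALARM_THRESHOLD <= flow_estimate <= ALARM_THRESHOLD * 1.05:
        flow_estimate = ALARM_THRESHOLD - 0.1
    return round(min(max(flow_estimate, 0.0), 500.0), 1)

for raw in (397.2, 405.8, 462.3):
    print(raw, "->", postprocess_honest(raw), "vs", postprocess_compromised(raw))
```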
The Model and its Execution
AI/ML models are fundamentally computer programs. They are represented in code that is executed by a computer-based system. A malicious actor may seek to compromise either the code itself or its interpretation by computers.
Compromising the programming or code of the model itself.
A malicious actor may seek to modify or inject malicious code into any of the processes that form part of the model. This would allow almost any form of tampering to subvert the model. Such an attack, however, should be readily detected through existing means of integrity checking.
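One such means, sketched below with hypothetical paths and placeholder digests, is to hash the model's artefacts and compare them against a known-good manifest recorded at release time and held out of band.

```python
# Hedged sketch of "existing means of integrity checking": hashing model
# artefacts against a known-good manifest. Paths and digests are placeholders.
import hashlib
from pathlib import Path

KNOWN_GOOD = {
    # Hypothetical manifest recorded at release time and stored out of band.
    "model/inference.py": "<sha256-of-release-artefact>",
    "model/weights.bin": "<sha256-of-release-artefact>",
}

def sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artefacts(root: Path) -> bool:
    # True only if every artefact on disk matches the out-of-band manifest.
    return all(sha256(root / name) == expected
               for name, expected in KNOWN_GOOD.items())
```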
Exposure of the model code in either source or object code form.
The source code or compiled form of the model may contain confidential information that, if accessed by a malicious actor, could be used to identify vulnerabilities or to extract sensitive details that support another attack.
Compromise of operating system functions leveraged in the execution of the model.
The object code for an AI/ML model needs to be loaded for execution by the operating system and potentially a series of interpreters. A malicious actor may seek to exploit vulnerabilities in the operating system’s system calls, file handling, shared libraries, and memory management to inject malicious code into the execution of the model itself.
Compromise of serialisation functions or storage used for persistence of model state.
Some AI/ML models use serialisation functions for state persistence. These functions capture the state of the complex data structures within the model in a form that can easily be saved to a file or transmitted. When this data is loaded again (deserialised), the model resumes from its previous state. A malicious actor could compromise the functions themselves or the stored serialised form to tamper with the model and subvert its subsequent reload and execution.
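A minimal sketch of one mitigation, assuming the components involved can share a key, is to authenticate the serialised state before loading it; this matters particularly for formats such as Python's pickle, which can execute code during deserialisation. The file layout and key handling here are assumptions for illustration.

```python
# Hedged sketch: authenticating a serialised model state before loading it,
# since formats such as Python's pickle can run code during deserialisation.
# Key handling and file layout are assumptions for illustration.
import hashlib
import hmac
import pickle
from pathlib import Path

SHARED_KEY = b"replace-with-managed-secret"  # hypothetical key, managed elsewhere

def save_state(state, path: Path) -> None:
    blob = pickle.dumps(state)
    tag = hmac.new(SHARED_KEY, blob, hashlib.sha256).digest()
    path.write_bytes(tag + blob)

def load_state(path: Path):
    raw = path.read_bytes()
    tag, blob = raw[:32], raw[32:]
    expected = hmac.new(SHARED_KEY, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("serialised state failed integrity check; refusing to load")
    return pickle.loads(blob)

save_state({"epoch": 12, "weights": [0.1, 0.2]}, Path("model_state.bin"))
print(load_state(Path("model_state.bin")))
```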
Acceptance and Standardisation
AI/ML models take time and resources to develop. As with conventional computer modelling codes, some will become more prevalent than others. A malicious actor may exploit this natural filtering.
Deliberate promotion of vulnerable models that can be exploited with specific knowledge.
There is a risk of deliberately vulnerable AI/ML models being published that appear complete but contain weaknesses exploitable with specific knowledge. A comparable situation is the allegation that a backdoor was crafted into the Dual EC DRBG pseudorandom number generator during its standardisation, allowing an actor with knowledge of the backdoor to decrypt affected communications.
A well-resourced malicious actor may publish a model that achieves a desired goal but embeds an element that can be exploited later. Such an attack can be supported by proxies and unwitting advocates, leading to the widespread adoption of the compromised model. This could be occurring now.
Conclusion
A malicious actor will constantly attempt new attacks to achieve their intentions and will seek to subvert functions and the systems that perform them by intelligently triggering the worst impacts at the most vulnerable moments. We must consider every potential point of failure in a system, even those unrelated to its design. This requires examining every aspect from multiple angles, building an understanding of how it could be exploited, and securing it according to how the technology will be used and the unacceptable consequences of its compromise or misuse. This has been a brief attempt to consider how that exploitation might arise for AI/ML.