UK: ICO Publishes Draft Guidance on AI Auditing Framework

On 19 February 2020 the ICO published its draft guidance on the AI auditing framework for public consultation, which is open until 1 April 2020.

What is the draft guidance?

The draft guidance sets out best practice for data protection compliance for artificial intelligence (“AI”). It clarifies how to assess the data protection risks posed by AI and identifies technical and organisational measures that can be put in place to help mitigate these risks.
The draft guidance, which is over 100 pages, is not intended to impose additional legal obligations which go beyond the General Data Protection Regulation (“GDPR”), but provides guidance and practical examples on how organisations can apply data protection principles in the context of AI. It also sets out the auditing tools that the ICO will use in its own audits and investigations on AI.
The ICO has identified AI as one of its top three strategic priorities, and has issued previous guidance on AI, via its Big Data, AI, and Machine Learning report, and the explAIn guidance produced in collaboration with the Alan Turing Institute. This new draft guidance has a broad focus on the management of several different risks arising from AI systems, and is intended to complement the existing ICO resources.
The draft guidance focuses on four key areas: (i) accountability and governance; (ii) fair, lawful and transparent processing; (iii) data minimisation and security; and (iv) the exercise of individual rights. We have summarised key points to note on each of these areas below.

Who does the draft guidance apply to?

The draft guidance applies broadly – to both companies that design, build and deploy their own AI systems and those that use AI developed by third parties.
The draft guidance explicitly states that it is intended for two audiences: those with a compliance focus, such as DPOs and general counsel, and technology specialists, such as machine learning experts, data scientists, software developers/engineers and cybersecurity and IT risk managers. It stresses the importance of considering the data protection implications of implementing AI throughout each stage of development, from training to deployment, and highlights that compliance specialists and DPOs must be involved in AI projects from the earliest stages to address relevant risks, not simply at the “eleventh hour”.

Key Themes:

1. Accountability and governance

The ICO highlights that the accountability principle requires that organisations must be responsible for the compliance of their AI system with data protection requirements. They must assess and mitigate the risks posed by such systems, document and demonstrate how the system is compliant and justify the choices they have made. The ICO recommends that the organisation’s internal structures, roles and responsibility maps, training, policies and incentives should be aligned to its overall AI governance and risk management strategy. The ICO notes that senior management, including data protection officers, are accountable for understanding and addressing data protection by design and default in the organisation’s culture and processes, including in relation to use of AI where this can be more complex. The ICO’s view is that this cannot be simply delegated to data scientists or engineering teams.
Data Protection Impact Assessments (“DPIAs”). There is a strong focus on the importance of DPIAs in the draft guidance, and the ICO notes that organisations are under a legal obligation to complete a DPIA if they use AI systems to process personal data. The ICO states that DPIAs should not be seen as a mere “box ticking compliance” exercise, and that they can act as roadmaps to identify and control risks which AI can pose. The draft guidance sets out practical recommendations on how to approach DPIAs in the context of AI, including:
Key risks and information the DPIA should assess and include. This includes information such as the volume and variety of the data and the number of data subjects, but also highlights that DPIAs should include information on the degree of human involvement in decision making processes. Where automated decisions are subject to human intervention or review, the draft guidance stresses that processes should be implemented to ensure this intervention is meaningful and decisions can be overturned.
How to describe the processing. The draft guidance sets out relevant examples of how the processing should be described. For example, the DPIA should include a systematic description of the processing activities and an explanation of any relevant margin for error that could influence the fairness of processing. The ICO suggests that there could be two versions of this assessment – a technical description and a more high-level description of the processing which explains how personal data inputs relate to the outcomes that affect individuals.
Stakeholders. The draft guidance emphasises that the views of various stakeholders and processors should be requested and documented when conducting a DPIA. DPIAs should also record the roles and obligations applicable as a controller and include any processors involved.
Proportionate. The DPIA should help assess whether the processing is reasonable and proportionate. In particular, the ICO highlights the need to consider whether individuals would reasonably expect an AI system to conduct the processing. In terms of proportionality of AI systems, the ICO states that organisations should consider any detriment to individuals that may follow from bias or inaccuracy in the data sets or algorithms that are used. If AI systems complement or replace human decision-making, the draft guidance states that the DPIA should document how the project will compare human and algorithmic accuracy side-by-side to justify its use.
Controller/Processor Relationship. The draft guidance emphasises the importance and challenges of understanding and identifying controller/processor relationships in the context of AI systems. It highlights that as AI involves processing personal data at several different phases, it is possible that an entity may be a controller or joint controller for some phases and a processor for others. For example, if a provider of AI services initially processes data on behalf of a client in providing a service (as a processor), but then processes the same data to improve its own models, then it would become a controller for that processing.
The draft guidance provides some practical examples and guidance on the types of behaviours that may indicate when an entity is acting as a controller or processor in the AI context. For example, making decisions about the source and nature of data used to train an AI model, the model parameters, key evaluation metrics, or the target output of a model are identified as indicators of controller behaviour.
“AI-related trade-offs”. Interestingly, the draft guidance recognises that the use of AI is likely to result in necessary “trade-offs”. For example, further training of a model using additional data points to improve the statistical accuracy of a model may enhance fairness, but increasing the volume of personal data included in a data set to facilitate additional training will increase the privacy risk. The ICO recognises these potential trade-offs and emphasises the importance of organisations taking a risk-based approach: identifying and addressing potential trade-offs and taking into account the context and risks associated with the specific AI system to be deployed. The ICO acknowledges that it is unrealistic to adopt a “zero tolerance” approach to risk and that the law does not require this; the focus is instead on identifying, managing and mitigating the risks involved.

2. Fair, lawful and transparent processing

The draft guidance sets out specific recommendations and guidance on how the principles of lawfulness, fairness and transparency apply to AI.
Lawfulness. The draft guidance highlights that the development and deployment of AI systems involve processing personal data in different ways for different purposes and the ICO emphasises the importance of distinguishing each distinct processing operation involved and identifying an appropriate lawful basis for each. For example, the ICO considers that it will generally make sense to separate the development and training of AI systems from their deployment as these are distinct purposes with particular risks and different lawful bases may apply. For example, an AI system might initially be trained for a general-purpose task, but subsequently deployed in different contexts for different purposes. The draft guidance gives the example of facial-recognition systems, which can be used for a wide variety of purposes such as preventing crime, authentication, or tagging friends in a social network – each of which might require a different lawful basis.
The draft guidance also highlights the risk that AI models could begin to inadvertently infer special category data. For example, if a model learns to use particular combinations of information that reveal a special category, then the model could be processing special category data, even if this is not the intention of the model. Therefore, the ICO notes that if machine learning is being used with personal data, the chances that the model could be inferring special category data to make predictions must be assessed and actively monitored – and if special category data is being inferred, an appropriate condition under Article 9 of the GDPR must be identified.
Fairness. The draft guidance promotes two key concepts in relation to fairness: statistical accuracy and addressing bias and discrimination:
Statistical accuracy. If AI is being used to infer data about individuals, the draft guidance highlights that ensuring the statistical accuracy of an AI system is one of the key considerations in relation to compliance with the fairness principle. Whilst an AI system does not need to be 100% accurate to be compliant, the ICO states that the more statistically accurate the system is, the more likely it is that the processing will be in line with the fairness principle. Additionally, an individual’s reasonable expectations need to be taken into account. For example, output data should be clearly labelled as inferences and predictions and should not claim to be factual. The statistical accuracy of a model should also be assessed on an ongoing basis.
Bias and Discrimination. The draft guidance suggests specific methods to address bias and discrimination in models, for example, using balanced training data (e.g. by adding data on underrepresented subsets of the population). The draft guidance also sets out that a system’s performance should be monitored on an ongoing basis and policies should set out variance limits for accuracy and bias above which the systems should not be used. Further, if AI is replacing existing decision-making systems, the ICO recommends that both systems could initially be run concurrently to identify variances.
Transparency. The draft guidance recognises that the ability to explain AI is one of the key challenges in ensuring compliance, but does not go into further detail on how to address the transparency principle. Instead, it cross-refers to the explAIn guidance it has produced in collaboration with the Alan Turing Institute.
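By way of illustration only, the ongoing monitoring against variance limits described under Bias and Discrimination above might be sketched as follows. The threshold values, the record layout and the function names are assumptions made for this example; the draft guidance does not prescribe specific limits or any particular implementation.

```python
from collections import defaultdict

# Hypothetical policy limits, chosen for illustration only.
# The draft guidance does not set numeric values.
MIN_ACCURACY = 0.90    # minimum acceptable overall accuracy
MAX_GROUP_GAP = 0.05   # maximum acceptable accuracy gap between groups


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def within_policy(records):
    """Check overall accuracy and the between-group accuracy gap against the limits."""
    per_group = accuracy_by_group(records)
    overall = sum(1 for _, p, a in records if p == a) / len(records)
    gap = max(per_group.values()) - min(per_group.values())
    return overall >= MIN_ACCURACY and gap <= MAX_GROUP_GAP
```

A check of this kind could be run periodically on recent decisions. A failing result would be a prompt to investigate and, under the organisation's own policy, potentially to suspend use of the system.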

3. Data minimisation and security

Security. The draft guidance highlights that using AI to process personal data can increase known security risks. For instance, the ICO notes that the large amounts of personal data often needed to train AI systems increase the potential for loss or misuse of such data. In addition, the complexity of AI systems, which often rely heavily on third-party code and/or relationships with suppliers, introduces new potential for security breaches and software vulnerabilities. The draft guidance includes information on the types of attacks to which AI systems are likely to be particularly vulnerable and the types of security measures controllers should consider implementing to guard against such attacks. For example, the security measures recommended by the ICO to protect AI systems include: subscribing to security advisories to receive alerts of vulnerabilities; assessing AI systems against external security certifications or schemes; monitoring API requests to detect suspicious activity; and regularly testing, assessing and evaluating the security of both in-house and third-party code (e.g. through penetration testing). The draft guidance also suggests that applying de-identification techniques to training data could be appropriate, depending on the likelihood and severity of the potential risk to individuals.
Data Minimisation. Whilst the ICO recognises that large amounts of data are generally required for AI, it emphasises that the data minimisation principle will still apply, and AI systems should not process more personal data than is needed for their purpose. Further, whilst models may need to retain data for training purposes, any training data that is no longer required (e.g. because it is out of date or no longer predictively useful) should be erased.
The ICO highlights a number of techniques which could be used to ensure that AI models only process personal data that is adequate, relevant and limited to what is necessary. For example, removing features from a training data set that are not relevant to the purpose. In this context, the ICO emphasises that the fact that some data may later be found to be useful for making predictions is not sufficient to justify its inclusion in a training data set. The ICO also suggests a number of additional risk mitigation techniques, such as converting personal data into less “human readable formats” and making inferences locally via a model installed on a user’s own device, rather than this being hosted on a cloud server (for example, models for predicting what news content a user might be interested in could be run locally on their smartphone).
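As a minimal sketch of the feature-removal technique mentioned above, the snippet below keeps only an allowlist of fields when building a training record. The field names and the allowlist are hypothetical; deciding which features are genuinely relevant to the purpose remains a judgment for the organisation, not something the code can determine.

```python
# Hypothetical allowlist of features judged relevant to the model's purpose.
RELEVANT_FEATURES = {"income", "loan_amount", "repayment_history"}


def minimise(record, allowed=RELEVANT_FEATURES):
    """Return a copy of the record containing only allowlisted features."""
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative input record (invented data).
raw = {
    "name": "A. Example",
    "postcode": "AB1 2CD",
    "income": 32000,
    "loan_amount": 5000,
    "repayment_history": "good",
}
training_row = minimise(raw)
```

An allowlist, rather than a blocklist, means that any new field added upstream is excluded by default until someone justifies its inclusion.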

4. The exercise of individual rights

The draft guidance also addresses the specific challenges that AI systems pose to ensuring individuals have effective mechanisms for exercising their personal data rights.
Training Data. The ICO states that converting personal data into a different format does not necessarily take the data out of scope of data protection legislation. For example, pre-processing of data (transforming the data into values between 0 and 1) may make training data much more difficult to link to a particular individual, but it will still be considered personal data if it can be used to “single out” the individual it relates to (even if it cannot be associated with an individual’s name). The ICO states that in these circumstances, there is still an obligation to respond to individual rights requests.
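The point about pre-processing can be illustrated with a small sketch, using invented data and hypothetical function names. Scaling each column into values between 0 and 1 changes how the data looks, but a row that was unique before scaling remains unique afterwards, so it can still single out the person it relates to.

```python
def min_max_scale(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def unique_rows(rows):
    """Count rows that no other row shares, i.e. rows that single someone out."""
    return sum(1 for row in rows if rows.count(row) == 1)


# Invented example data: two of the four individuals share identical values.
ages = [23, 35, 35, 61]
incomes = [18000, 42000, 42000, 95000]

raw_rows = list(zip(ages, incomes))
scaled_rows = list(zip(min_max_scale(ages), min_max_scale(incomes)))

# The number of distinguishable individuals is the same before and after scaling.
same_uniqueness = unique_rows(raw_rows) == unique_rows(scaled_rows)
```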
Access, rectification and erasure. The draft guidance confirms that requests for access, rectification or erasure of training data should not be considered unfounded or excessive simply because they may be more difficult to fulfil (for example in the context of personal data contained in a large training data set). However, the ICO does clarify that there is no obligation to collect or maintain additional personal data just to enable the identification of individuals within a training data set for the sole purpose of complying with rights requests. Therefore, the draft guidance recognises that there could be times when it is not possible to identify an individual within a training data set and therefore it would not be possible to fulfil a request.
The draft guidance highlights that, in practice, the right to rectification is more likely to be exercised in the context of AI outputs, i.e. where an inaccurate output affects the individual. However, the ICO clarifies that predictions cannot be inaccurate where they are intended as prediction scores, not statements of fact. Therefore, in these cases, as the personal data is not inaccurate, the right to rectification will not apply.
Portability. The draft guidance clarifies that whilst personal data used to train a model is likely to be considered to have been “provided” by the individuals and therefore subject to the right to data portability, pre-processing methods often significantly change the data from its original form. In cases where the transformation is significant, the ICO states that the resulting data may no longer count as data “provided” by the individual and would therefore not be subject to data portability (although it will still constitute personal data and be subject to other rights). Further, the draft guidance confirms that the outputs of AI models, such as predictions and classifications about individuals would also be out of scope of the right to data portability.
Right to be informed. Individuals should be informed if their personal data is going to be used to train an AI system. However, the ICO recognises that where a data set has been stripped of personal identifiers and contact addresses, it may be impossible or involve disproportionate effort to provide the information directly to individuals. In these cases the ICO states that other appropriate measures should be taken, for example, providing public information including an explanation of where the data was obtained.
Solely automated decisions with legal or similar effect. The draft guidance sets out specific steps that should be taken to fulfil rights related to automated decision making. For example, the system requirements needed to allow meaningful human review should be taken into account from the design phase onwards and appropriate training and support should be provided to human reviewers, with the authority to override an AI system’s decision if necessary. The draft guidance also emphasises that the process for individuals to exercise these rights must be simple and user-friendly. For example, if the result of a solely automated decision is communicated via a website, the page should contain a link or clear information allowing the individual to contact staff who can intervene. In addition, the draft guidance provides explanations on the difference between solely automated and partly automated decision-making and stresses the role of active human oversight; in particular, controllers should note that if human reviewers routinely agree with an AI system’s outputs and cannot demonstrate that they have genuinely assessed them, their decisions may effectively be classed as solely automated under the GDPR.
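One way an organisation might look for the “routine agreement” risk described above is to track how often human reviewers depart from the AI system’s output. The sketch below is an assumption-laden illustration: the flag level and function names are hypothetical, and the draft guidance does not set a numeric threshold. A high agreement rate does not by itself show that review is not meaningful; it is only a prompt for closer examination.

```python
# Hypothetical level above which reviewer agreement is flagged for a closer look.
FLAG_ABOVE = 0.98


def agreement_rate(reviews):
    """Share of reviews in which the human decision matched the AI output.

    Each review is an (ai_output, human_decision) pair.
    """
    if not reviews:
        raise ValueError("no reviews to assess")
    agreed = sum(1 for ai_output, human_decision in reviews if ai_output == human_decision)
    return agreed / len(reviews)


def needs_oversight_check(reviews):
    """Return True when reviewers agree with the AI more often than the flag level."""
    return agreement_rate(reviews) > FLAG_ABOVE
```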

What should organisations do now?

While the draft guidance is not yet in final form, it nevertheless provides an indication of the ICO’s current thinking and the steps it will expect organisations to take to mitigate the privacy risks AI presents.

It will therefore be important to follow the development of the draft guidance carefully. In addition, at this stage it would be prudent to review how you currently develop and deploy AI systems and how you process personal data in this context to help you prepare for when the draft guidance is finalised. Some practical steps to take at this stage include:

Reviewing existing accountability and governance frameworks around your use of AI models, including your current approach to DPIAs in this context. In particular, DPIAs for existing projects or services may need to be conducted or updated, and risk mitigation measures identified, documented and implemented;
Considering your current approach to developing, training and deploying AI models and how you will demonstrate compliance with the core data protection principles, particularly the requirements of fairness, lawfulness, transparency, and data minimisation;
Reviewing the security measures you currently employ to protect AI systems, and updating these if necessary depending on level of risk; and
Ensuring you have appropriate policies and processes for addressing data subjects’ rights in the AI context, including in relation to solely automated decision-making.

Next steps

The ICO is currently running a public consultation on the draft guidance and has specifically requested feedback from technology specialists such as data scientists and software developers, as well as DPOs, general counsel and risk managers. The consultation will be open until 1 April 2020.