Many of the issues raised by the use of Artificial Intelligence systems relate to the protection of intellectual property, in particular
- the indiscriminate use of information that may be covered by intellectual property rights for the training of AI systems;
- the possibility or not of granting protection to works generated by AI systems.
We have already addressed these issues in two of our previous insights: "Protecting intellectual property and developing AI systems: can they coexist?" and "Who is the author? Human or Artificial Intelligence?".
The purpose of this article is to explore this issue in more detail by analysing what happens to the input data provided by the user when interacting with the algorithm, and whether and how it is used once it has been provided.
Prompting and Retrieval Augmented Generation (RAG)
So-called conversational AI models such as ChatGPT, Claude and Gemini, as well as those used to create images or music such as Midjourney and Suno, are accustoming us to holding a dialogue with software. The purpose of such a 'conversation' is to obtain a particular output by asking questions or giving specific instructions, both of which are inputs containing data or information.
Today's AI systems can take in and process any type of input, from normal text strings provided by the user, such as when an instruction is given or a question is asked, to documents, images, audio files and spreadsheets.
Such input information, which is used to generate the output of the AI model, is commonly referred to as a prompt (although there is no universally agreed definition of the term) [1]. Prompts therefore directly influence the response (output) provided by the AI system, which means that an imprecise formulation of the input may lead the algorithm to produce an output that is equally imprecise, both objectively (e.g. containing errors) and subjectively (not responding to the user's request).
In some areas, therefore, simple prompting is not enough, and more complex techniques such as Retrieval Augmented Generation (RAG) are needed to obtain more relevant answers and avoid model hallucinations. This technique supplies the pre-trained model with additional information from a source other than the user, such as a database or a collection of documents. The AI system then combines the information provided by the user with the information retrieved from the external sources in order to give a more accurate and complete answer.
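To make the mechanics concrete, here is a minimal sketch of the RAG flow in Python. It is illustrative only: the retrieval step is reduced to naive keyword overlap (real systems use embeddings and a vector store), and `call_model` is a hypothetical stand-in for whatever completion API the system actually uses.

```python
# Minimal illustration of Retrieval Augmented Generation (RAG).
documents = [
    "Policy A: customer data may not be shared with third parties.",
    "Policy B: employees must use approved software for work tasks.",
    "Policy C: research data is owned by the institution.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine the user's question with the retrieved context."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def call_model(prompt: str) -> str:
    # Hypothetical placeholder for a real chat-completion API call.
    return f"[model answer based on a prompt of {len(prompt)} characters]"

query = "Can customer data be shared with a vendor?"
print(call_model(build_prompt(query, retrieve(query, documents))))
```

Note that both the user's question and the retrieved documents end up in the single prompt actually sent to the provider, which is precisely why the confidentiality questions discussed below also extend to the sources used for RAG.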
Consequently, an AI system accesses documents, data and information that are both retrieved from the web or from the additional sources used for RAG, and provided by the user when formulating the question. Similarly, these and other types of data are used upstream, in the training of the model itself.
It is therefore necessary to ask what happens to this information and, in particular:
Do the information and data we provide to AI models remain confidential?
It is difficult to answer this question in the abstract for all AI systems.
It may be useful to start by analysing the terms and conditions of some of the most widely used systems: complex documents that users accept at the time of registration, often by quickly ticking a box without reading them.
Let us take OpenAI (https://openai.com/it-IT/policies/eu-terms-of-use/) as an example:
“Our use of content. We can use your Content worldwide to provide, maintain, develop, and improve our Services, comply with applicable law, enforce our terms and policies and keep our Services safe”
Or Anthropic (https://www.anthropic.com/legal/consumer-terms):
“Our use of Materials. We may use Materials to provide, maintain, and improve the Services and to develop other products and services”.
It is clear that accepting these conditions implies that the chosen AI system may use the data provided as input by the user in order to "provide, maintain and improve" the service, whatever the nature of that data. It is therefore essential to read these conditions in order to:
- choose the system and/or type of subscription that offers greater guarantees with regard to the confidentiality and protection of the information provided, or alternatively
- be aware of what types of data and information should not be entered as prompts when using the system.
In fact, special attention must be paid to what is provided as input to the system, since it may reveal personal data or confidential information. This may be information protected by data protection legislation (such as Regulation (EU) 2016/679) where it relates to identified or identifiable natural persons, or information covered by industrial or intellectual property rights or, more generally, by so-called exclusive rights. Inputs may contain material protected by copyright, database rights, patents or trade secrets: all of which are subject to exclusive rights protecting their holders.
There is another aspect to consider. In many cases, particularly in the context of a business activity, the user of an AI system may need to process personal data belonging, for example, to customers, suppliers or employees. As a data controller or data processor under the GDPR, the user will therefore have to comply with the applicable rules, with a view to balancing the rights and interests of the parties involved, first and foremost their right to privacy.
So what happens when this type of data is made available as input to an AI system?
Again, it is important to refer to the terms and conditions. For example, OpenAI specifies that:
“You are responsible for Content, including ensuring that it does not violate any applicable law or these Terms. You represent and warrant that you have all rights, licences, and permissions needed to provide Input to our Services”.
Since AI system providers state that they use input data to provide, maintain and improve their services, the party providing personal data as input to the system could be in breach of applicable data protection law.
In addition, it is likely (and indeed often the case) that the user is also subject to contractual constraints, such as having been appointed as an external data processor under Article 28 of the GDPR, having signed a non-disclosure agreement, or being bound by professional confidentiality obligations. In all these circumstances, the user would be in breach not only of the applicable data protection legislation, but also of such agreements.
Finally, in the specific area of healthcare, the issues we have just identified are further amplified. In fact, the following scenarios could arise.
On the one hand, a healthcare professional might use an AI system as an aid to diagnosis, to support bureaucratic activities in routine clinical practice, or to manage clinical trials.
On the other hand, medical device manufacturers might want to integrate such systems into software as a medical device (SaMD) as a central element in achieving its medical purpose.
Leaving aside the enormous legal issues raised by the latter scenario (in terms of regulation, user and manufacturer liability, and the processing of personal data), even the mere use of such systems by healthcare professionals in the course of normal clinical practice raises several questions.
In particular, careful consideration must be given to ensuring that the prompt entered into the system does not contain any personal data about the patient (that is, data that could in any way identify the natural person being treated). Including such data would violate not only privacy regulations, but also the rules of professional and medical confidentiality.
In addition, the prompt must not include data that are protected in some way for the benefit of the healthcare institution where the professional works, for example data relating to scientific research.
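On a practical level, one conceivable (and deliberately simplistic) precaution is a pre-submission filter that strips obvious identifiers before a prompt leaves the organisation. The sketch below is a naive illustration with invented patterns; genuine de-identification of clinical text requires specialised tooling and a case-by-case legal assessment.

```python
import re

# Naive redaction of obvious identifiers before a prompt is sent to an
# external AI service. Illustrative only: pattern matching alone cannot
# guarantee that a natural person is no longer identifiable.
REDACTION_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\+?\d[\d\s./-]{7,}\d\b"),
}

def redact(prompt: str) -> str:
    """Replace matched identifiers with placeholder tags."""
    for label, pattern in REDACTION_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(redact("Patient born 12/03/1984, reachable at mario.rossi@example.com, "
             "reports chest pain."))
# -> Patient born [DATE], reachable at [EMAIL], reports chest pain.
```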
What to do?
The first step is to carefully assess the terms and conditions of use of AI systems and to choose the system that offers the greatest guarantees in terms of use and dissemination of data, including the choice of paid subscriptions where appropriate.
In the workplace, this choice should be made by the employer or its appointees, with precise instructions to employees and collaborators on which systems may or may not be used to carry out work activities.
These instructions should also include guidance on what documents, information, files, etc. may be used in the input phase and what precautions should be taken.
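Such instructions can also be backed by tooling. The following sketch is a purely hypothetical illustration, with invented service names and data categories, of how an internal gateway might enforce an allowlist of approved systems and prohibited data types before a prompt is forwarded:

```python
# Hypothetical internal policy check: which AI services employees may
# use, and which data categories may appear in prompts. The names and
# categories below are invented for the example.
APPROVED_SERVICES = {"enterprise-llm"}  # e.g. a vetted paid tier
FORBIDDEN_CATEGORIES = {"personal_data", "trade_secret", "client_files"}

def may_submit(service: str, data_categories: set[str]) -> bool:
    """Allow a prompt only via approved services and permitted data."""
    return (service in APPROVED_SERVICES
            and not data_categories & FORBIDDEN_CATEGORIES)

assert may_submit("enterprise-llm", {"public_marketing_text"})
assert not may_submit("consumer-chatbot", set())           # unapproved tool
assert not may_submit("enterprise-llm", {"trade_secret"})  # forbidden data
```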
These are not simple assessments, but judgements that need to be made carefully when using an AI system, to avoid running the risk of breaching contractual or other confidentiality obligations.
To this end, it may be helpful to look at Article 53 of the AI Act, which provides for:
- The adoption of company policies and codes of conduct that provide for the protection of intellectual property and, in particular, ensure that the reservations of rights expressed by rightholders are identified and respected in an appropriate manner, for example by machine-readable means in the case of content made publicly available online (Art. 4(3) Dir. (EU) 2019/790).
- The drawing up and publication of a sufficiently detailed summary of the content used to train the model.
[1] S. Schulhoff et al., "The Prompt Report: A Systematic Survey of Prompting Techniques", arXiv:2406.06608 (2024).