When engaging with an AI platform, a user could (and arguably should) ask the following questions – some of which are the same as for other Internet-based tools:
- What organizational entity is operating this AI? As noted in the Overview of the Current State of AI, some platforms provide little to no information about the company behind them.
- What motivates the organizational entity to offer this AI? Is it to make a profit by charging directly for its use; to gather, analyze, and then repurpose or resell (even anonymized) information about its users; to collect data for experimentation and development; or …?
- How much information about myself (even if just an email address) do I have to provide to that entity even to become informed about the AI’s functionality, pricing, and offerings? As also noted in the Overview, some platforms will not display cost or operational details without prior registration.
- How much (more) information about myself do I have to provide to actually use the AI?
- If the AI interface is a chatbot, will it (attempt to) emulate “human” identity, and will that make me as its user inclined to reveal more about myself than necessary and/or than I would to a more “traditional” query system (like an Internet search engine)?
- What are the privacy and related policies, and which country’s applicable laws, that will govern the handling of the information about myself that I choose or am required to provide?
- Will the queries and/or data that I submit to the AI to generate results be stored beyond the time required for that generation? If so:
- Will those submissions be connected to my login (if any) over time, including over different authenticated sessions?
- Will the submissions (“questions”) be tied to the generated results (“answers”) and to my identity over time?
- Will I be able to delete any or all of the stored data described above if I want to, and if so, how? Or will I be able to select a period after which it is deleted automatically?
Considerations of privacy are strongly connected with information security. Two characterizations of that relationship follow (emphases added):
- “AI security involves implementing security measures, audits, and controls that prevent unauthorized access and malicious manipulation of systems. On the other hand, privacy in AI focuses on safeguarding personal data, including obtaining proper consent and preventing unauthorized disclosures.” (Villegas-Ch & García-Ortiz, 2023)
- “Privacy and security are two intertwined concepts. Generally speaking, privacy refers to people’s personal or sensitive information and their rights to prevent the disclosure of such information. Security refers to how such information is protected (Rao et al., 2023).”
Potential violations of personal privacy may already occur in the development of AI tools, with varying degrees of threat based on the data sources used: “The data used to train the AI models may contain personal and sensitive information, such [as] names, addresses, and medical histories (Villegas-Ch & García-Ortiz, 2023).” “Despite the advances made in AI security and privacy, there are still gaps and open areas of research that require attention. For example, as the adoption of pre-trained models increases, it is essential to address the sensitivity of the data used in training and how to protect sensitive information during use (Villegas-Ch & García-Ortiz, 2023).”
Following are examples of AI technology-related privacy concerns from several rather different domains: data about students; geospatial information systems; and consumer finance.
- Related to the privacy of student data in the AI space, the following definition may be considered: “Privacy is commonly viewed as the right of individuals to maintain a personal space free from interference or invasion by other individuals or entities … Data privacy refers to the claims of individuals that information about them should not be accessible to other individuals and organizations and that when data is in the possession of a third party, the individual must have the right to control the data and its use (Huang, 2023, p. 2581).” The increasing reliance of educational institutions on external providers for various academic and administrative functions may lead to “… serious issues with protecting student privacy because vast amounts of educational data are . . . controlled and in the possession of third-party institutions, with schools, teachers, and students acting as simply passive data suppliers.” An important first step for institutions of higher education may be to raise awareness of the issues among their student bodies – to “… assist students in comprehending the fundamentals and features of big data and AI technologies (Huang, 2023, p. 2585).”
- Some knowledge and technological application domains have almost inherent privacy concerns. One example is geographic information systems (GIS), which typically store location information, at various possible levels of granularity, about the entities of concern; those entities may, and often do, include persons: “In geospatial domains, privacy and security often concern sensitive geospatial information such as home location, workspace, Points-of-Interest (POI) preferences, daily trajectories, and inferences based on such information. In the lifecycle of building and utilizing GeoAI foundation models, we identify a series of potential privacy and security risks that exist around the pre-training and fine-tuning stages with geospatial data, centralized serving and tooling, prompting-based interaction, and feedback mechanisms (Rao et al., 2023).” Here again, the very development of AI-based systems already involves human privacy risks.
- The U.S. federal government’s Consumer Financial Protection Bureau (CFPB) has explored “… how the introduction of advanced technologies, often marketed as ‘artificial intelligence,’ in financial markets may impact the customer service experience” with the increasing deployment of chatbots by financial institutions. Chatbots can “… raise certain privacy and security risks” (here again the privacy–security connection), but financial institutions favor them because “… chatbots deliver $8 billion per annum in cost savings, approximately $0.70 saved per customer interaction.” Related to the questions above about what happens to an AI user’s “questions” and “answers,” the CFPB notes that as chatbot adoption has grown in the financial industry, “… some institutions, such as Capital One, have built their own chatbot technologies by training algorithms on real customer conversations and chat logs.” Furthermore, in a well-known problem of generative AI that takes on increased significance when it can greatly affect consumers’ financial health: “For conversational, generative chatbots trained on LLMs, the underlying statistical methods are not well-positioned to distinguish between factually correct and incorrect data. As a result, these chatbots may rely on datasets that include instances of misinformation or disinformation that are then repeated in the content they generate.” Lastly, the design of chatbots to appear human may lead financial customers to surrender private information too easily: “Conversational agents often present as ‘human-like,’ possibly leading users to overestimate their capabilities and share more information than they would in a simple web-form. It is then quite dangerous when users share personal information with these impersonation chatbots.
In recent years, there has been an increase in scams targeting users of common messenger platforms to get their personal or payment information to then trick them into paying false fees…” (Consumer Financial Protection Bureau, 2023)
Some scholars appear resigned to the loss of privacy as inevitable: “In the age of algorithms, issues such as the leakage of private information, asymmetric power of knowledge, covert operations, and algorithmic infringement are inevitable, …” (Huang, 2023, p. 2580) The question then becomes: if some degree of privacy loss is inevitable, how can a user of information systems (including AI) best mitigate these issues? That question could be addressed in informational materials and events offered by the AU Library (see What role(s) does the library have? below).