Generative Artificial Intelligence, Print Collections, and Legal Information Preservation

Wonder what those three things have to do with each other? More than you might think at first.

Like many of us lately, I was asked to add some thoughts about the latest developments in artificial intelligence to a state of the library presentation I gave earlier this summer. While trying to give a nuanced answer in a short amount of time, I fielded questions about why I focused so much on collection preservation when some of the audience members’ law firms had emptied their libraries of print and others were setting up working groups on how best to use generative AI. I did my best to explain three things: that an academic law library’s collecting role is different from a firm library’s; that firm librarians frequently count on us to supply access to things they can’t collect and that don’t appear in their subscription databases; and that the newly available Large Language Model products show some interesting potential, but it’s not yet clear how revolutionary they might or might not be for the practice of law.

I suspect the questioners might not have been fully satisfied with my answers then, but within a matter of days, news broke of Mata v. Avianca, Inc., No. 22-CV-1461 (PKC), 2023 WL 4114965 (S.D.N.Y. June 22, 2023), which I hope provided a bit more context for consideration. The first, but already not the last, “ChatGPT lawyer” case, Mata highlights many concerns about generative AI in its current state of development: incompleteness, untimeliness, and unreliability in a given product’s training data; the tendency of LLMs to “hallucinate” information that doesn’t exist; and our own willingness to suspend skepticism if the language outputs appear coherent and are presented confidently enough. Claiming to be unaware of those dangers, the attorneys in Mata swiftly faced significant sanctions in addition to public shame, and firms and judges are now rushing to set parameters for the use of generative AI.

While Mata was playing out, my library was contacted by an area firm needing to copy from one of our print reporters an older Pennsylvania case they couldn’t get from the Big 3, exactly the kind of regular occurrence I’d pointed to in my talk. Both of these examples demonstrate a fundamental reality of law: it’s grounded in sources, and not all of those sources are going to be available online. How do you prove a case, statute, or treaty provision cited in a court filing both exists and says what the advocate says it does? Conversely, how do you prove it doesn’t exist and the advocate isn’t practicing law competently? What do you do when a case that is still good law somehow didn’t get digitized and uploaded to a place you can access? You go back to a reliable source, and at least for the foreseeable future, that depends on the continued availability of legal information in physical formats, held and preserved by libraries and archives.

This is why I think collection preservation is just as important as staying on top of technological developments, and why LIPA is a great resource for those of us whose work is balancing the two. As even ChatGPT can acknowledge, LIPA provides a critical forum for discussing the importance of legal information preservation, resources for those with collections needing preservation, and advocacy, which is good both for humans practicing law and for building better data sets for machine learning.

If you’re interested in sharing your thoughts about legal information preservation in the age of generative AI, please bring them to our upcoming member meeting on July 26, 2023 at 2:00 p.m. (Eastern). Click here to register (all are welcome) and if you’re not already a member, please consider joining!
