Batch Processing in Chat GPT

I am not sure if this is the correct place to post as this is my first one. If I am in the wrong forum, please tell me and I will repost there. I am using the $20 a month version of chat GPT and I use ChatGPT 4o as that is what comes up.

I am a Youth Services Librarian in a TINY house sized library with a staff of four. We are looking to do what is called in our field a diversity audit. That is where we go through every book and look at the author, main characters, and plot to see what type of diversity the book offers. That way it can inform our book choices. Diversity to us is, " Was the book written by someone LGBTQ+, black, Latinix, Indigenous, disabled etc. What is their socioeconomic status. "The same goes for the main characters. in the books.

Our YA collection has 1026 fiction and graphic novel books in total we want to assess. I could go in manually and use library authority sources like the library of congress, amazon, and library book vendors to access this information but would take me around 10 days of non stop research.

I am trying to send chat GPT the excel spreadsheet of books we have by title and author and have it do the leg work of pulling the information from the internet through websites I specify and then to put the information into the appropriate columns of the excel sheet so I can interpret the data as a human.

My current prompt has been ," Please research and tell me the Author’s Ethnicity & Cultural Representation, Author’s Socioeconomic Status, Author’s Gender Representation, Author’s LGBTQ+ Identity, and Author’s Disability. I also would like the Main Character’s, Main Character’s Ethnicity & Cultural Representation, Main’s Character Socioeconomic Status, Main’s Character Gender Representation, Main’s Character LGBTQ+ Identity, Main’s Character Disability. Please cite your sources and do not use Wikipedia or goodreads. When complete put the information into the correct columns of the excel spreadsheet and make it downloadable by me."

It will run a few books to start off with and do great, sometimes. I have to remind it which sites I don’t want it to use. Then I say something like, “Looks good, please do the next 100 books.” And it will only do 5 again and perhaps the same five books.

The ultimate goal for me is to write a prompt that any library in our system could use where ChatGPT will pull the information we need and put it into an excel sheet we give them so we can run our own collection assessments. The largest library in our system for comparison has 4,000+ YA books. Thoughts?
Jennifer

2 Likes

Welcome, Jennifer. What I would do is go into ChatGPT’s personalization settings and give the instructions there. From that point, I would start a new chat for each book. At the beginning of each chat, I would tell it to follow the guidelines of its personalized configuration, and then I would provide the book from which to extract the information. I would repeat this process, starting a new chat for each book. This approach is necessary because keeping everything in a single conversation can cause the system to lose certain capabilities that you always need to maintain.

1 Like

Considering the extensive number of books, 400, this approach I’ve given you is manual and works for a few. What you need is a program that integrates the API and automates the process, handling each book one at a time, extracting the information, and creating the files. This requires programming knowledge and cannot be done from the ChatGPT Plus application.

2 Likes

Hi David,
Thanks for your fast response. I am not a programmer by any means but am always willing to give it a go. In my personal life I use Linux Mint as my primary OS so I am familiar with that world a bit. However, I have a click by click guide I wrote to set up each new computer I have so I don’t mess it up when I partition the drive.

Would chat GPT be able code a program it could execute that would integrate that API and go book by book for me?

In the long run, I would like to have something I could share to other librarians and they could run with no knowledge whatsoever of coding or ChatGPT. If this is not feasible, that is okay. I can look for other solutions too.

1 Like

Hahaha. Yes, my friend, you can ask ChatGPT to integrate the API and run the code in the cloud, like Google Colab, etc. Sorry, I hadn’t thought about that. You can upload the books there and get the results. I don’t know the exact cost, but the API has a separate fee from ChatGPT Plus, although it’s usually not very expensive.

  1. Choose a cloud service to run the code and learn minimally how to execute the code and upload files.

  2. Create a ChatGPT API.

  3. Write the Python code using GPT to process book reading and extract the described conclusions.

  4. Prepare the cloud environment, upload the files, and test the code by running it there.

  5. Check the results with a book you already know and conduct tests with 2 or 3 others that you are familiar with.

  6. Review the code and improve it if necessary. If it contains errors, the cloud console will indicate them; pass the errors to GPT, which will help you correct the code, and then test it again.

Thanks for this. My first step is checking out as many books on Python as I can so I can become familiar with how to write my code, even though I want GPT to do it for me in the end.

I may end up escelating my project to the IT level of our library system as they may have an API key I can use for OCLC. OCLC is the Online Computer Library Center and is a not for profit that provides libraries around the world all types of information, metadata and access to resources free. WorldCat which is their catalog is part of this.

If the IT department of the library system has that API key I may not need to purchase it. They may also want to be included in the discussions as they will want to know how their key is being used.

However, ChatGPT generated a basic code for me and at a lay level, it looks simple enough.

1 Like

If you have a library network, you could talk to a university or training center that develops software and propose they create specific software for you. It would benefit them by helping them learn about this subject, and it would help you classify your books.

So we just had our staff meeting and I am not sure I am going to be able to escalate this to our library system IT people.

The question that came up in general was the ethics of having ChatGPT scrape sites for information. Do the sites we select want their materials on ChatGPT? I am of the open mind that if it is publicly available online, then anyone can use it as they see fit as long as credit is given in the proper way.

The other topic that came up was the subject heading of books being potentially helpful. As a librarian, when we catalog books, we give each book a subject heading so we know what the book is about. This can be helpful when trying to decide where to place a book in the library. Our system is odd in that, in years past, they got rid of all the subject headings so it is every library for itself as to where the books go. I mean I always have the final say for my books in my collection but the heading is a good general guideline to follow.

I think I am going to have the AI write me python code that will scrape specific sites for each title and then notate what information I need. Where it can’t locate information it will notate that and I could do a deeper dive if necessary. I would be curious to see what type of information it can get from websites.

From there, I will see if we need to purchase an API key for us and that would solve the issues of a lack of data. In the end, I may have to go manual and go book by book for the collection doing my own research for each title and author.

It is quite strange because supermarkets follow a strategic organization pattern so that users adhere to a purchasing process: product placement, aisle length, etc.

In a library, something similar should happen when it comes to information. There should be a standard classification so that all libraries can organize themselves logically. This way, users could know exactly which area to search for the books they need. Perhaps proposing this as a first step—creating a standard organizational protocol at the library level—would be a good idea.

I am surprised that each librarian has to resort to reading the book and drawing their own conclusions to be able to organize it.

So I may have provided slightly incorrect impression due to the fact that I am an insider and I don’t think about things the same way as others any more in a library.

Libraries catalogue books for their system using two methods. The Library of Congress Cataloguing System and the Dewey Decimal Cataloguing Systems. There are others out there like the Brian Deer Method, but these are the big two.
Library of Congress is usually used more in academic/corporate settings and Dewey is your standard public library.

Both have manuals that dictate how the numbers on the sides of the books are to be used for cataloguing purposes. We also can explore something called a MARC record which comes with the book for more information.

However, as a librarian, I also have the ability to use my disgression as to where a book goes because I know my community. I want books to be found easily for people so while I may think a book is a graphic novel, our library may put in the series section to make it easier for a patron to find.

Libraries have vendors that we use to purchase books. Unfortunately, we purchase books sight unseen most times. I have to purchase based on patrons wants/needs, industry reviews, and what I feel I need for my collection based on the data I can get from my system or systems outside mine who have purchase the material and then cataloged it.

At my library, we us a vendor called Ingram. They provide me with a dewey call number for the non-fiction books and genres for the fiction books. They usually get the dewey wrong for non-fiction so I have to look at the subjects, and other libraries who have cataloged the book to determine what I should use as the call number. I can also use the large and dense book the dewey numbers to decide what call number I should use.

Fiction just follows our libraries standard for YA of " YA Doe," the first three letters of the authors last name.

Subject headings are useful and our vendor Ingram can get me those, but going manual for 1026 books meant that I could only process 82 a day if that was all I was doing. I didn’t want to do that and rely only on one source. It would be MONTHS if I cross referenced it anywhere else.

Most of where we get our reviews from, School Library Journal, Kirkus Reviews, Publishers Weekly, etc and they have their material online and in print form. The online version is the same as the print, just not with everything so neatly bound.

My thinking was, that if, that material was freely available, some of it would provide diversity data about the books and authors. For example, a VERY diverse book that literally ticks all the boxes is called Brooms by by Jasmine Walls, illustrated by Teo DuVall.

The review from Kirkus Reviews is thus," ix witches get caught up in the excitement and danger of illegal broom racing in an alternate historical Mississippi.

In a world where children with magic are taken to the colonizing government’s schools, sisters Mattie and Emma are drawn to broom racing’s potential to help support their family and save themselves. Luella, whose magic was bound by the government after an incident in the residential school she was forced to attend, is their teacher and guide. Their team, the Night Storms, includes Billie Mae and Loretta, who read Black, and Cheng Kwan, who is transgender. They fight discrimination and overcome sabotage to win their races and their freedom. The author’s note describes the cast’s identities: Mattie and Emma are Choctaw and Black, Luella is Mexican and Choctaw, Chinese American Cheng Kwan speaks Cantonese, and Loretta, who uses a leg brace, has had a stroke. Emma, who is deaf, uses sign language and some lip-reading. This range in representation accurately highlights the broadly diverse experiences of folks in the South. There are also loving and intentionally presented queer relationships and experiences, highlighting the fact that there has always been and will always be room for queer folks in our communities. DuVall’s beautiful art is clean and luminous but so understated that it’s hard at times to follow the setting and characters through the panels. A scrapbook-style epilogue showing the witches’ futures is heartwarming and uplifting, however.

A fast-paced race to a satisfying, winning end. (Graphic fantasy. 13-18)"

The review tells me that the main characters are LGBTQ+, disabled, and also black, asian, and indigenous.

Since this is publicly out there on the web, I was hoping the program I am currently having ChatGPT write will scrape these sites, pull out the relevant information from the reviews and then put it into the correct columns of my spreadsheet.

But, there may be limitations on the end of ChatGPT. I hope this clears things up a bit more and explains my methodology and library standards.

1 Like

Another issues has cropped up. The script is fine that chatGPT pumps out for me, but I am viewed as a bot and this is bothersome as yes, technically I am a bot. But I have honorable intentions. Am I wrong to think this is the end of the road for me with this query?

1 Like

Of course, the website does not allow extracting information using verification systems.
I can’t really give you much more advice, my friend.
I imagine you can find a lot of information directly in GPT itself, as its training includes many books, but it might not have the most recent updates.

Perhaps it can implement features I’m unaware of. Here, other users might be able to tell you if GPT, through the API, has Internet access and can locate information from books, allowing it to extract details for filling out forms.

It is okay and thank you so much for all the advice. I was initially discouraged by the project I was given as I would have to go manual for everything. Literally going book by book, and locating the information. Even if it can’t web scrape ethically, I am seeing time saving tasks it can do to speed this process along for us. I can ask it about a specific book and it can provide me with information I can use. I could wish it could do more, but I totally understand the limitations and the legal necessity of them.

1 Like