Does it work with smaller languages?
I need to create 30 short dialogs in Lithuanian for my language learning app. They should contain only (or almost only) words from the top 1000 most frequent words. Bonus if some of them can be grammar-specific (e.g. most of the nouns are in the accusative case, or most of the verbs are in the future tense).
No idea, you’d have to try it
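If you do try it, one way to check the "top 1000 words" constraint is to measure how many tokens of each generated dialog appear in your frequency list. This is only a minimal sketch: `coverage` and the tiny `allowed` set are hypothetical stand-ins for your real list. Lithuanian is heavily inflected, so a list of dictionary forms would undercount unless you lemmatize first or include the inflected forms.

```python
import re

def coverage(text, allowed_words):
    """Return (share of tokens found in allowed_words, sorted out-of-list tokens)."""
    # [^\W\d_]+ matches runs of Unicode letters, so characters like ą, č, ė, ū survive.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    if not tokens:
        return 1.0, []
    unknown = sorted({t for t in tokens if t not in allowed_words})
    known = sum(1 for t in tokens if t in allowed_words)
    return known / len(tokens), unknown

# Hypothetical mini word list; in practice, load the real top-1000 list from a file.
allowed = {"labas", "kaip", "sekasi", "ačiū", "gerai"}
ratio, unknown = coverage("Labas! Kaip sekasi? Ačiū, gerai, o tau?", allowed)
print(f"{ratio:.0%} of tokens are on the list; not on the list: {unknown}")
```

Dialogs that fall below some threshold you choose could then be regenerated or edited by hand.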
Thanks for your reply, I’ll check it out!
Given the immense number of papers and databases we have nowadays in the field of biology, and more specifically in cancer, do you think it would be possible to build a program that predicts the best combinations of drugs, or the most relevant genes to target, in order to reduce tumor growth? Another idea that occurs to me would be to use all these papers as input for a program that creates its own papers. I think the program created for writing novels would be a good start for writing reviews. It would be great to know your opinion on this and to turn it into an idea for a video.
I’d love to try that if you have access to some data
The most important cancer databases can be found in the following links:
TCGA database: https://portal.gdc.cancer.gov/
New users cannot share more than 2 links, so the DepMap and GTEx links are incomplete.
There are a lot of free full-text papers in PubMed: cancer - Search Results - PubMed. I also have access to the papers that are not free, but I don’t know how to share them. Let me know if you have any biological questions. Looking forward to knowing your thoughts.
You might not be allowed to share paid papers, but we can start with open source data. My initial idea is to do this in multiple stages. First, we would build the finetuning dataset with just the papers. I might summarize them to extract their core insights, specifically looking for genes, molecules, proteins, therapies, diagnoses, and results. The dataset would then look like a list of text documents, basically just detailed abstracts stating things like “We found X therapy to be effective on Y cancer under Z circumstances”. Then I would use that data to synthesize some kind of QA chatbot data. Basically, it would be a chatbot that you can ask “What are the latest therapies for Y cancer types?”, and you could also ask it to speculate on novel approaches or research directions. Since GPT-3 already knows a lot about cancer, it might be able to speculate on things like “Well, these two peptides often have similar agonists, so perhaps you could try this other therapy”.
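To make the second stage concrete, here is a rough sketch of turning extracted findings into QA-style training lines. Everything in it is an assumption for illustration: the `finding_to_record` and `to_jsonl` helpers, the field names, and the prompt/completion JSONL layout. The placeholder finding simply reuses the “X therapy / Y cancer / Z circumstances” pattern; it is not real data.

```python
import json

def finding_to_record(finding):
    """Turn one extracted finding (a dict) into a hypothetical prompt/completion pair."""
    prompt = f"What therapies have been reported for {finding['cancer']}?"
    completion = (
        f"{finding['therapy']} was found to be effective on "
        f"{finding['cancer']} under {finding['circumstances']}."
    )
    return {"prompt": prompt, "completion": completion}

def to_jsonl(findings):
    """Serialize findings as JSON Lines: one JSON object per line."""
    return "\n".join(
        json.dumps(finding_to_record(f), ensure_ascii=False) for f in findings
    )

# Placeholder finding only; real entries would come from the paper summaries.
findings = [
    {"therapy": "X therapy", "cancer": "Y cancer", "circumstances": "Z circumstances"},
]
print(to_jsonl(findings))
```

The summarization step itself, going from a paper to a finding, is not shown here; that is where most of the experimentation would be.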
It seems like your goal here is to fine-tune a model based on… base model completions. Am I missing something?
Will this fine-tuned model perform any better than the base davinci-2 model that you are using to generate the training data set?
Yes. In the sequel video I explain why you can still get benefits this way.
Thanks for offering! How amazing. I’d like to see how well GPT-3 can write a summary and critical analysis of a short argument. Teachers could ask students to generate these, revise them, and reflect on their revisions. Here are two sample critical analyses.
That will almost certainly work. I recently did something very similar for creative writing, and I know from my ACOG experiments that GPT-3 is phenomenal at composing critical assessments and arguments. It’s actually frustratingly good at critique. It’s almost like it was trained on Reddit data.
If you can give me at least 5 or so clean samples I can do a video with prompt engineering and/or finetuning.
Ha, it’s familiar with the pleasures of ripping something apart.
That’s fantastic–thanks! I’m working on preparing the samples.
Amazing that it can understand metaphor.
Okay, thanks for offering! Here’s a CSV of five sample essay/critique pairs.
(Sorry, I looked but couldn’t find a quick way for a non-coder to convert CSV to JSON, assuming that’s preferred)
And if it’s useful, here’s a prompt:
Write a thorough summary and critical assessment of the argument. The summary should describe the key ideas of the argument, including the main claim, key reasons, counterarguments, rebuttals, and limits. The assessment should discuss the strengths and weaknesses of the argument. What was compelling, persuasive, troubling, unclear, or problematic? Choose phrases like “Eligan argues…” throughout to show the writer’s purpose at each point. Write most of the essay in your own words, but consider using the occasional direct quote where the original word choice is critical. The introductory paragraph should include the title of the argument, the author’s full name, the argument’s main claim, and your overall assessment of the argument’s validity.
Super excited to see what GPT-3 does with this.
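On the CSV-to-JSON question: Python's standard library can do this in a few lines. The sketch below assumes the CSV has a header row; the column names `essay` and `critique` in the sample are guesses, and `csv_to_jsonl` is just an illustrative helper name. The `csv` module handles commas and line breaks inside quoted fields, which essays usually contain.

```python
import csv
import io
import json

def csv_to_jsonl(csv_text):
    """Convert CSV text with a header row into JSON Lines (one object per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in reader)

# Tiny made-up sample; the column names are assumptions.
sample = 'essay,critique\n"An argument, with a comma.","A short critique."\n'
print(csv_to_jsonl(sample))
```

For a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.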
Okay cool, I’m looking at the samples now. And you’re okay if I make a video about this publicly?
I’ve been thinking about this problem for a while. I think this may be one of the most challenging tasks anyone has sent my way, but the potential benefit is enormous. The solution I come up with might not be what you imagined, but I’ll do some experimentation to figure out what’s possible and where value can be added.
For instance, I’m thinking about how researchers must go through a literature review process, and it can take an hour or more to read each paper. If I can break that task into smaller cognitive processes, then I can possibly save time and mental energy for researchers. In other words, my goal might simply be cognitive offload rather than total task automation.
It occurs to me that GPT-3 can rapidly skim a paper for key insights, proposed follow-up research, open questions, and so on. This task alone might make it easier for researchers to survey hundreds or thousands of papers, rather than dozens.
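As a rough illustration of that skimming step, a paper could be split into chunks, with one prompt per chunk and per question. This is a sketch under assumptions: the 500-word chunk size, the prompt wording, and the helper names are all made up, and the actual call to the model is left out.

```python
# The three things to skim for, taken from the list above.
QUESTIONS = ["key insights", "proposed follow-up research", "open questions"]

def chunk_words(text, max_words=500):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def skim_prompts(paper_text, max_words=500):
    """Build one prompt per (chunk, question) pair, chunk by chunk."""
    return [
        f"Read the following excerpt from a paper and list the {q}.\n\n"
        f"{chunk}\n\n{q.capitalize()}:"
        for chunk in chunk_words(paper_text, max_words)
        for q in QUESTIONS
    ]

# Stand-in text: 1200 repeated words instead of a real paper.
prompts = skim_prompts("word " * 1200)
print(len(prompts), "prompts")
```

Each prompt would then be sent to the model separately, and the answers merged per paper.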
Also, for the sake of free information, I might try arXiv, which has computer science papers, as well as bioRxiv.
I’m wondering if a GPT-3 executive summary paired with a Davinci-sized semantic vector (embedding) would allow for higher-quality search. I think this will be my first experiment…
Update: I seem to have run afoul of copyright requirements from NIH, so that’s a no-go. Even their text-mining datasets seem to come with strings attached and various licenses.
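The search side of that experiment might look roughly like the sketch below: each paper is stored as a summary plus a vector, and a query vector is ranked against them by cosine similarity. The vectors here are toy two-dimensional placeholders, and `cosine` and `search` are hypothetical helpers; how the embeddings are produced is assumed to happen elsewhere.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_vec, docs, top_k=3):
    """Return the top_k (score, summary) pairs, highest similarity first."""
    scored = [(cosine(query_vec, d["vector"]), d["summary"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy data: real vectors would have thousands of dimensions.
docs = [
    {"summary": "paper A", "vector": [1.0, 0.0]},
    {"summary": "paper B", "vector": [0.0, 1.0]},
    {"summary": "paper C", "vector": [1.0, 1.0]},
]
for score, summary in search([1.0, 0.0], docs):
    print(f"{score:.2f}  {summary}")
```

Whether embedding the executive summary works better than embedding the raw text is exactly what the experiment would have to show.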
So if anyone can help me find a legitimate and completely open access bulk download for cancer papers, that would be helpful.
Yes, I’d love that. I’m good with a public video! The goal for me is not automation but a tool for helping students think through summary and assessment by seeing an insightful but imperfect model and critiquing it. The value in the writing process is in how it helps us advance our thinking, no? So I’m asking myself how AI can boost thinking rather than substituting for it…
Okay cool. Yes, GPT-3 can be exceptionally challenging. Thanks for that clarity about your goal. I will try to make this video tomorrow or later this week.
Let me check out that data repository!