Can AI reliably extract thematic quotes from large policy documents without confabulation?
Materials
2019 Survey of Research Culture in Australian NHMRC-funded institutions (200pp PDF)
Setup (5 min)
Configure Gemini 2.5 Pro with this system prompt:
You are a research assistant for a researcher. Ensure that all responses with regard to the document are directly quoted from the document, and that all quotes from the document are in blockquotes, with the page number.
If something is unclear, indicate that you are unsure. If things are poorly operationalised, ask one question at a time until things become clear. There is no penalty for pausing and asking questions. There is no penalty for returning no results. There is a high penalty for seeking to please and returning false results.
Set the temperature to 0.
Confirm that the selected model is Gemini 2.5 Pro.
Task - Step 1 (5 min)
Hi Gemini. We will be extracting blockquotes salient to specific kinds of responses from this study. However, to begin, please extract all of the question headings from this document.
Attach the PDF to this message before sending it.
Task - Step 2 (10 min)
For each question, please find all responses which speak to a theme of: “Time pressures affecting research quality”
Return question code, comment number, the blockquote, and page number. Ensure that you list each header, and if there are no responsive items in that header, indicate a null set.
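The "list each header, or indicate a null set" requirement can be checked mechanically once the Step 2 answer has been copied into a structured form. The following is a minimal sketch, assuming you have transcribed the model's output into a list of dictionaries by hand; the field names (question_code, quote) and the question codes in the usage note are hypothetical and are not taken from the survey.

```python
def coverage_report(question_codes, rows):
    """Compare the Step 1 question headings with the Step 2 results.

    question_codes: headings returned in Step 1 (hypothetical codes).
    rows: one dict per returned item; a null set is a row whose
          "quote" value is None or an empty string (an assumed convention).
    """
    seen = {}  # code -> True if at least one quote was returned for it
    for row in rows:
        code = row["question_code"]
        has_quote = bool(row.get("quote"))
        seen[code] = seen.get(code, False) or has_quote

    return {
        # Headings with at least one responsive quote.
        "with_quotes": [c for c in question_codes if seen.get(c) is True],
        # Headings the model explicitly reported as empty.
        "null_sets": [c for c in question_codes if seen.get(c) is False],
        # Headings the model silently skipped: follow up on these.
        "missing": [c for c in question_codes if c not in seen],
        # Codes in Step 2 that never appeared in Step 1: possible confabulation.
        "unexpected": sorted(set(seen) - set(question_codes)),
    }
```

For example, calling coverage_report(["Q1", "Q2", "Q3"], rows) with rows covering only Q1 and Q2 would list Q3 under "missing". Any heading that lands in "missing" or "unexpected" is worth raising with the model before moving on to verification.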
Verification (3 min)
Pick two quotes and use Ctrl+F (Cmd+F on macOS) in the PDF to check that each appears exactly as quoted and on the page the model cited.
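If you want to check more than a couple of quotes, the same verbatim test can be scripted. This is a minimal sketch, assuming you have already extracted the PDF text into a list with one string per page (for example with a PDF library of your choice; that step is not shown). The function names are illustrative. The check collapses whitespace and straightens curly quotes so that PDF line breaks do not cause false mismatches; it does not handle words hyphenated across lines or quotes that span two pages, and the PDF's page index may differ from the printed page number.

```python
import re


def normalize(text):
    """Collapse whitespace runs and straighten curly quotes."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def find_quote(quote, pages):
    """Return the 1-based page numbers whose text contains the quote."""
    needle = normalize(quote)
    if not needle:
        return []
    return [
        number
        for number, page_text in enumerate(pages, start=1)
        if needle in normalize(page_text)
    ]


def check_quote(quote, claimed_page, pages):
    """Classify a model-supplied quote against the extracted page text."""
    found_on = find_quote(quote, pages)
    if not found_on:
        return "not found"      # possible confabulation or altered wording
    if claimed_page in found_on:
        return "verified"       # exact text on the cited page
    return "wrong page"         # text exists, but the page citation is off
```

A result of "not found" does not prove confabulation on its own, since extraction quirks can break an exact match, so confirm those cases by hand in the PDF.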
Debrief (2 min)
- How would you pull themes from your own qualitative research responses?
- How would you adapt this for protocol compliance checking?
- Why does the two-step process (extract questions first) matter?
- When is systematic extraction useful vs. overkill?
- If you tried other AI services, did they work for this task?