Accident reports to unified taxonomy: A multi-class-classification problem


I’m here to brainstorm possible solutions for my labeling problem.
Core Data
I have ~4500 accident reports from paragliding incidents. The reports are unstructured text: some run to multiple pages and cover many aspects of the incident in detail, others are just a few lines.
My idea
Extract semantically relevant information from the accidents into one unified taxonomy for further analyses of accident causes, etc.
My approach
I want to use topic modeling to create a unified taxonomy for all accidents, in which virtually all relevant information of each accident can be captured. The taxonomy plus one accident report will then be combined into one API call. After ~4500 API calls, I should end up with all of my accidents represented by a unified taxonomy.
The taxonomy has different categories like weather, pilot experience, conditions of the surface, etc. These main categories are further subdivided, e.g., Weather → Wind → Velocity.
Current State
Right now, I am not finished with my taxonomy, but I estimate that it will roughly have 150 parameters to look out for in one accident. I worked on a similar problem a year ago, building a voice assistant with GPT. There, I used Davinci to transform spoken input into a JSON format with predefined JSON actions. This worked decently for most scenarios, but I had to do post-processing of my output because formats weren’t always right, etc.

Currently, my concerns and questions are:

  • With many more categories now (150) compared to my voice assistant (14) and a much bigger text input (the voice assistant got one sentence, whereas a whole accident report can be up to 8 pages), GPT uses categories other than those defined in the taxonomy, or hallucinates unpredictably.
  • How to effectively get structured output (here in the form of a taxonomy) from GPT?
  • Would my solution even work as intended?
  • Is this a smart way to approach my goal?
  • What are alternatives?

For any input and thoughts, I am very grateful. Thanks in advance!
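
One way to picture the "taxonomy + one accident = one API call" idea, together with the post-processing mentioned above, is the minimal sketch below. Everything in it is an assumption for illustration: the two fields, their allowed values, and the helper names `build_prompt` and `validate` are hypothetical, and no real API is called. It only shows how a reply can be checked against the taxonomy so that unexpected categories are caught instead of silently accepted.

```python
import json

# Hypothetical mini-taxonomy (assumption): field name -> allowed values.
TAXONOMY = {
    "wind_intensity": ["calm", "moderate", "strong", "unknown"],
    "wind_sudden_change": ["yes", "no", "unknown"],
}

def build_prompt(taxonomy, report):
    """Combine the taxonomy and one report into a single prompt string."""
    lines = [
        "Classify the accident report using ONLY these fields and values.",
        'Answer with one JSON object. Use "unknown" if the report does not say.',
        "",
    ]
    for field, values in taxonomy.items():
        lines.append(f"- {field}: {', '.join(values)}")
    lines += ["", "Report:", report]
    return "\n".join(lines)

def validate(taxonomy, raw_response):
    """Parse a model reply and return (clean_record, list_of_problems)."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return {}, [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return {}, ["top-level JSON value is not an object"]
    record, problems = {}, []
    # Flag categories the model invented.
    for key in data:
        if key not in taxonomy:
            problems.append(f"unexpected field: {key}")
    # Keep only allowed values; fall back to "unknown" otherwise.
    for field, allowed in taxonomy.items():
        value = data.get(field)
        if value in allowed:
            record[field] = value
        else:
            problems.append(f"{field}: invalid or missing value {value!r}")
            record[field] = "unknown"
    return record, problems

# A made-up reply containing one invented field and one missing field.
reply = '{"wind_intensity": "strong", "wind_speed": "40"}'
record, problems = validate(TAXONOMY, reply)
print(record)
print(problems)
```

Reports whose `problems` list is not empty could be retried or set aside for manual review.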


Hey there and welcome to the community!

So, forgive my ignorance, but while many of us are tech/language/AI experts to certain degrees, medical taxonomy (at least I think that’s what this is?) is not one of them lol. So, first and foremost, so I can ensure I actually provide helpful information, what exactly is a “taxonomy” and what does one typically look like? If we understand that, we can at least help solve any problems when it comes to getting GPT to provide text that looks like a taxonomy. Note I said looks like, because be prepared for that. If GPT can fool lawyers into writing good papers that end up being BS, it is likely able to fool people into thinking it provided a good taxonomy report when it hasn’t. I trust in your expertise in order to distinguish that.

Semantic sorting is something that GPT can excel at, but there are different levels and possible approaches to this, all with varying levels of effort/trust on GPT’s end, the amount of which can also depend on your own knowledge and expertise here. I’d be more confident giving you techniques that are more hands-off because you have the knowledge to suss out the quality of answers in your field, for example. Are you asking for help in how to approach this problem, AKA having GPT sort relevant information for you into digestible categories?

I’ll admit, I’m still a tad confused as to what your goal is, and I think it’s because I don’t know precisely enough about taxonomy stuff to know what you’re trying to do with the data you have. Phrases like:

I estimate that it will roughly have 150 parameters to look out for in one accident

are throwing me off from an AI perspective, and I can’t tell if your world’s jargon has its own meaning for parameters. You obviously know about post-processing, so you’re clearly familiar with some important LLM concepts, especially if you touched davinci before. Did you mean 150 categories? Also, are you trying to fine-tune? Just prompt? Create a custom GPT for this?

If we can bridge gaps in understanding here, I think I have some technique ideas that will help you, or at least guide you in the right direction. The problem I foresee though is that it’ll take a good amount of fiddling to ensure accuracy, and to prevent it from making things up, or pulling things together in weird ways. Semantic sorting is one thing, but amalgamating assorted data is another step in this problem. It might be wise to structure the data first, and then focus on creating a taxonomy from the structured database. This makes it easier both for the AI to handle and for the human to gauge the quality and accuracy of the results.


Here are some thoughts:

  1. You may not be able to get your whole taxonomy classified in a single shot. You may need to split the task up with a Panel of Experts (PoE) approach, using multiple panels. Remember that attention is limited, and paying attention to 150 things at the same time might be asking too much. Using a PoE for each major category may work, but you may need significantly more than 4500 API calls.

  2. You can reliably get JSON output with the correct prompt; that’s not a major issue IMO.

  3. I think it could work - although it depends on what your downstream processing will look like. Depending on what you’re ultimately trying to do, it might not make sense to have every document analyzed that thoroughly.

In conclusion, it seems like what you’re trying to do could work to achieve certain goals - but it should be said that there are a whole host of other cool techniques that could be used in classification, comparison, and aggregation tasks - but I think we might need to know what you ultimately intend to get out of the data to help you better :slight_smile:
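
To make the splitting idea from point 1 concrete, here is a rough sketch, not a definitive implementation. It assumes the taxonomy is grouped by main category and makes one focused call per category per report. The categories, fields, prompt wording, and the `call_model` parameter are all hypothetical; `fake_model` is a stand-in so the example runs without any API.

```python
import json

# Hypothetical taxonomy (assumption), grouped by main category.
TAXONOMY = {
    "weather": {
        "wind_intensity": ["calm", "moderate", "strong", "unknown"],
        "wind_sudden_change": ["yes", "no", "unknown"],
    },
    "pilot": {
        "experience": ["beginner", "intermediate", "expert", "unknown"],
    },
}

def classify_report(report, taxonomy, call_model):
    """Run one focused call per main category and merge the answers."""
    merged = {}
    for category, fields in taxonomy.items():
        prompt = (
            f"You are an expert on '{category}' in paragliding accidents.\n"
            f"Answer as a JSON object with these fields and allowed values: "
            f"{json.dumps(fields)}\n"
            f"Report:\n{report}"
        )
        answer = json.loads(call_model(prompt))
        for field in fields:
            # Anything the model did not answer stays "unknown".
            merged[f"{category}.{field}"] = answer.get(field, "unknown")
    return merged

def fake_model(prompt):
    """Stand-in for a real API call; it always returns an empty JSON object."""
    return "{}"

record = classify_report("Suddenly, strong winds emerged ...", TAXONOMY, fake_model)
calls_needed = 4500 * len(TAXONOMY)  # one call per report per main category
print(record)
print(calls_needed)
```

With two main categories this already doubles the 4500 calls; the count grows linearly with the number of categories.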


First and foremost, thank you for your detailed response! I’m working with accident reports from paragliding incidents. These are not medical records (pilots are typically dead after the accidents); instead, the reports document what exactly happened and the circumstances involved, including details like weather conditions, eyewitness accounts, and the flight trajectory. The reports are typically written by different people with different levels of knowledge about the flight, so you can imagine that they vary in length, detail, and structure, because there is no unified way to write such a report.

My goal is to make the information in accident reports more usable for accident research, so I’m trying to convert it into a different format using a taxonomy. This means I’m creating various labels, with properties, that can appear in accident reports. I’m tasking GPT with classifying the accident reports against these labels, aiming to generate a table without needing to train a neural network.
I thought it might be a good idea to use the JSON mode feature so that the results already come out in a structured format.

  • main category: wind
  • sub category: intensity, direction, sudden_change
  • sub category of direction: North, East, South, West
    accident report:
    Suddenly, strong winds emerged, causing the pilot to tumble…

Now, I want to examine not only labels from the wind category but around 150 labels. Right now I am trying to determine the best approach to achieve this (fine-tune, just prompt, custom GPT, etc.). If possible, I would like to avoid fine-tuning because of the effort of creating examples.
I hope this clarifies any uncertainties.
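
As a small illustration of the wind example above, the taxonomy can be written as a nested structure and flattened into table columns. This is only a sketch under assumptions: the empty lists mean "allowed values not defined yet", and the filled-in row is a hand-made guess at how the example sentence might be labeled, not model output.

```python
import csv
import io

# The wind example as a nested dict. Leaves hold the allowed values;
# an empty list means the values are not defined yet (assumption).
TAXONOMY = {
    "wind": {
        "intensity": [],
        "direction": ["North", "East", "South", "West"],
        "sudden_change": [],
    }
}

def flatten(node, prefix=""):
    """Yield (dotted_path, allowed_values) for every leaf of the taxonomy."""
    for name, child in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            yield from flatten(child, path)
        else:
            yield path, child

columns = dict(flatten(TAXONOMY))
print(list(columns))

# Hand-filled illustrative row for the example report. The direction is
# left empty because the sentence does not mention one.
row = {"wind.intensity": "strong", "wind.direction": "", "wind.sudden_change": "yes"}

# One report becomes one CSV row; 4500 reports would give 4500 rows.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(columns))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Each leaf of the taxonomy becomes one column, so roughly 150 labels would give a table with roughly 150 columns.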


Thank you very much for your response!

You may need to split the task up into a multiple Panel of Experts approach. Remember that attention is limited, and paying attention to 150 things at the same time might be asking too much.

That’s a very good point. Once I finish creating my labels, I’ll definitely give it a try! In the grand scheme of things, the money spent on API calls is almost irrelevant because I’m not developing a service; it’s more of a one-time operation. Once all 4500 accidents are structured, the task will be complete.

but I think we might need to know what you ultimately intend to get out of the data to help you better

So, my ultimate goal is to transform every accident report into a unified data structure. The resulting data will be used in accident research to gain better insights into accidents and possibly detect underlying patterns that could be avoided in the future. But one step at a time. Right now, my main focus is on whether it’s possible to obtain this structured data with GPT and what the best way to do so might be.

I’m aware there are many cool classification and comparison techniques out there, but letting GPT handle the heavy lifting is arguably one of the easiest approaches and can be applied to a wide range of domains.


Do keep us posted! It’s certainly an interesting project, and if you do it right it can certainly validate the technology for these types of applications.


Have you heard of something similar? I couldn’t find anything where people interacted with the AI in such a way. For the most part they generate summaries or don’t force the data into a unified structure (but that’s necessary, in my opinion).
I will certainly update this post once I run my first tests!


Oh it’s very common - that’s basically what JSON mode and function calling are all about.

but even before all that was available (i don’t trust it and still do it old school :sweat_smile:) - pressing davinci responses into a schema was already a relatively mature and well established pattern.

The big challenge is to get the system to behave reliably in an economical fashion. But if the latter isn’t really a concern, it shouldn’t be an issue :slight_smile:
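
For what it’s worth, here is a hedged sketch of the schema side of this. It turns a hypothetical flat taxonomy into a standard JSON Schema whose `enum` lists spell out the allowed values. A schema of this shape is the kind of object that function calling takes as a function’s `parameters`; the exact request format depends on the API version, so check the current documentation. The field names and values are assumptions.

```python
import json

# Hypothetical flat taxonomy (assumption): field name -> allowed values.
FIELDS = {
    "wind_intensity": ["calm", "moderate", "strong", "unknown"],
    "wind_direction": ["North", "East", "South", "West", "unknown"],
}

def to_json_schema(fields):
    """Build a JSON Schema object that only permits the taxonomy's values."""
    return {
        "type": "object",
        "properties": {
            name: {"type": "string", "enum": values}
            for name, values in fields.items()
        },
        "required": list(fields),
        "additionalProperties": False,
    }

schema = to_json_schema(FIELDS)
print(json.dumps(schema, indent=2))
```

Even with a schema in place, it is still worth validating the returned values on your side before trusting them.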


To add onto Diet’s point - there are a lot of common uses for these models that aren’t initially advertised, so to speak. Summary tasks are like good introductions to the true power of these models.

Once you get into the more advanced usage and understanding of how LLMs work, and what their capabilities and limitations are, a lot of it boils down to them being really good pattern generators and identifiers. I like to say they’re adept at “fuzzy” problems, where there needs to be some kind of structured response, but the problem requires some degree of variability in order to produce a quality response.

Let us know how it goes!


So you would also say my idea is possible?
I was thinking that maybe the other way round would be easier: first I let the AI do named-entity recognition (NER), relation extraction (RE), etc., so that for every accident I get something like a word cloud of semantically relevant words. But what would I do with that? My main focus is to get the unstructured text of an accident into a structured format using the LLM, so my initial idea is the best thing I have come up with so far.

Dear Florian LeFlob
Thanks for sharing your exciting project. Is there any way I can follow the progress and updates on it, such as a website or some other channel?

Hmmm… I’ll think about how to manage it, but this project is part of my master’s thesis and the accidents I want to investigate can’t be published.