Possible semantic coding and classification for stylometry research?

Greetings, everyone. So excited to be a part of the GPT-3 beta now that it’s finally opened up. I’ve been waiting for a year!

After diving in and playing around, I’m blown away by some of the outputs generated. My work is within linguistic analysis, specifically stylometry (analyzing authorship in ancient texts).

I’d like a professional opinion on whether GPT-3’s semantic encoding and classification could be used alongside my other methods to improve authorship attribution for a given text based on certain inputs. I realize it’s a bit outside the normal generation use case, but the possibility is very intriguing.
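To make the question concrete, here is a rough sketch of the kind of pipeline I have in mind. It is only an illustration under stated assumptions: the function-word list is a tiny hypothetical one, the comparison is plain cosine similarity against per-author average profiles (one common stylometric baseline, not necessarily the right one), and `embed` is a placeholder for whatever semantic encoder might be plugged in, not an actual OpenAI API call.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; a real study would use far more.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "but", "for", "with", "not"]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attribute(text, samples_by_author, embed=None):
    """Return (best_author, scores) for a disputed text.

    `embed` is an optional callable text -> list of floats. It stands in for a
    semantic embedding model (an assumption); its output is simply appended
    to the stylometric features.
    """
    def featurize(t):
        vec = style_vector(t)
        if embed is not None:
            vec = vec + list(embed(t))
        return vec

    target = featurize(text)
    scores = {
        author: cosine(target, centroid([featurize(s) for s in samples]))
        for author, samples in samples_by_author.items()
    }
    return max(scores, key=scores.get), scores


# Toy usage with made-up English samples (not real manuscripts).
samples = {
    "A": ["the house of the king in the city of the east",
          "the word of the lord in the land of the north"],
    "B": ["and he went and she went but not for long",
          "and they spoke but not to him and not with her"],
}
best, scores = attribute("the song of the river in the valley of the west", samples)
print(best, scores)
```

The open question is whether the semantic vectors add signal about authorship or mostly capture topic, which is exactly what I would want to test against the purely statistical features.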

Thanks!

Haha, thanks for the welcome!

So glad to hear it is relevant! By ancient, I primarily mean roughly 0-1000 CE, but I also specialize in 1000-1700.

The issue here, of course, is the original language. Most of these texts are in Latin and Greek, so determining authorship from translations is rather dubious. To start, though, I could begin with the English works I have in mind.

Any resources or docs you would recommend I dive into to see how I might utilize OpenAI tools for this?

I come from a research background, so the market for this is a bit unknown, to be honest. My focus is primarily on early Christian literature and manuscripts. Biblical studies has been fraught with controversy over the authorship of letters from the 1st, 2nd, and 3rd centuries. Until now, the work has been primarily statistical analysis. I’m hoping to bring some AI into the mix to allow for greater clarity. The ramifications, of course, are massive across the scholarly and religious space.
