With respect to (2), it’s certainly doable to constrain the model to a point, but I worry you may run into cases where the model misunderstands or misinterprets the information it accesses and gives erroneous responses.
With respect to (3), with a product like this I do not think usage limits are acceptable or appropriate as they would run counter to the point of the entire product. It would be like a video hosting LTI restricting the number of times a lecture video could be played—it just isn’t done.
As with all things… It depends.
When UCLA was on a self-hosted custom Moodle-based LMS, a lot of stuff was built in-house, and they had direct access to all the data and could share it however they wanted.
There’s a big push to move to hosted LMS solutions right now. In the case of Canvas, that means everything is on Instructure’s servers. No one at UCLA (to the best of my knowledge) has direct database access, though the product owner at UCLA could ask their Instructure contact to run queries and reports.
So, for UCLA the API would be all that is available.
Honestly, I wouldn’t be surprised if this were a product offering already discussed in some fashion at Instructure (though to be clear, this is speculation—I have no direct or indirect knowledge of this).
One avenue that might be more fruitful would be to work with Instructure itself to bring such a product to market, either as a feature of the LMS or as a first-party LTI.
That said, if I were Instructure I would try to work with OpenAI directly on such a product.
Another approach might be to build this as a plugin for individual users: they’d connect it to their class, it would ingest all of the files/pages accessible to them (via the API or by scraping), and continue from there.
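To make the ingestion step concrete, here’s a minimal sketch assuming a Canvas-hosted school and a user-generated API access token. The function names (`fetch_all`, `ingest_course`, `parse_next_link`) are mine, but the bearer-token auth, the `/api/v1/courses/:id/files` and `/pages` endpoints, and the `Link`-header pagination are standard Canvas REST API behavior:

```python
import json
import re
import urllib.request


def parse_next_link(link_header):
    """Extract the rel="next" URL from a Canvas-style Link header, if any."""
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def fetch_all(base_url, token, path):
    """Follow Link-header pagination to collect every item from one endpoint."""
    url = f"{base_url}{path}?per_page=100"
    items = []
    while url:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            items.extend(json.load(resp))
            # Canvas paginates via the Link header, not the JSON body.
            url = parse_next_link(resp.headers.get("Link", ""))
    return items


def ingest_course(base_url, token, course_id):
    """Pull the files and wiki pages a student can already see in one course."""
    return {
        "files": fetch_all(base_url, token, f"/api/v1/courses/{course_id}/files"),
        "pages": fetch_all(base_url, token, f"/api/v1/courses/{course_id}/pages"),
    }
```

Since the plugin only uses the student’s own token, it can never see more than that student already can—which keeps the access-control story simple.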
But then you don’t have a great monetization method.
Alternatively, you could build it as a separate API-based product where you charge the user based on usage and market directly to students. But I understand that’s not as attractive to a startup: landing UCLA wins all 33,000 students at once, and no one wants to be chasing all those individual students.
One other interesting facet of the individual approach, though: if you collect their university, course, and professor details, you can aggregate information over multiple quarters/years, giving the model much more complete and robust information than it would have from just the materials available to a single student up to that point.
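The aggregation itself is just a grouping key. A toy sketch (the record shape here is hypothetical, not any real schema): materials keyed by (university, course, professor) accumulate across quarters instead of each student starting from scratch.

```python
from collections import defaultdict


def aggregate(records):
    """Group uploaded materials by (university, course, professor) so that
    every new quarter's uploads enrich the same shared corpus."""
    corpus = defaultdict(list)
    for rec in records:
        key = (rec["university"], rec["course"], rec["professor"])
        corpus[key].extend(rec["materials"])
    return dict(corpus)
```

The same key also lets you detect when a professor reuses problems year over year, which is exactly the signal the test-bank idea below would trade on.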
It would also be possible to find similar classes from other universities and incorporate their data.
But then, of course, you are taking on a much more pirate posture, and would be traveling in less reputable circles like Chegg and CourseHero.
And at that point, you may as well just become an online test bank where students upload their midterms and finals with course and professor information and your product can use other products like Mathpix for OCR and WolframAlpha to generate step-by-step solutions. Then, it could even generate plausible alternate problems based on previous problems given by the professor. I know for a fact students would pay all the money for a service like that.
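A real product would lean on an LLM for this, but the mechanical core of “plausible alternate problems” can be illustrated with a toy generator that re-rolls the numbers in a past exam question while keeping its structure intact (everything here is illustrative, not how Mathpix or WolframAlpha work):

```python
import random
import re


def alternate_problem(problem, seed):
    """Toy variant generator: replace each integer in a past exam problem
    with a nearby value, preserving the wording and structure."""
    rng = random.Random(seed)  # seeded so each student can get a stable variant

    def reroll(match):
        n = int(match.group())
        lo, hi = max(1, n - 5), n + 5
        return str(rng.randint(lo, hi))

    return re.sub(r"\d+", reroll, problem)
```

For example, `alternate_problem("A train travels 120 miles in 3 hours; what is its average speed?", seed=7)` yields the same question with perturbed numbers—a crude stand-in for what a model conditioned on a professor’s past exams could do far more convincingly.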
¯\_(ツ)_/¯
I don’t know. There are many possible ways to implement ClassChat, CourseTutor, GPT-TA, or ProblemPirate and almost all of them have a much lower barrier to entry than getting an LTI into use.