Technical Founding Team Member for Gen-AI in higher education startup

Hi, I’m working on a startup that is like ChatGPT but for students, grounded specifically in their class data, so they only get answers specific to their class material. A good example: a freshman taking a Biology class who asks the chatbot a question about cells expects a pretty different answer than a PhD student asking the same question. To get the data, I use the APIs provided by the Learning Management Systems (LMS) universities use, like Canvas. This is not perfect, since the APIs don’t give access to all of a student’s class data, so you’d need to work with universities to ask Canvas or other LMS providers for the rest of the necessary data.

The sell to universities is management and monitoring features: the ability for professors to monitor student queries, and the ability to limit certain questions that may appear on homework, to stop cheating. These are just my guesses at what the most useful product would look like, but I’m sure my conversations with the universities will tell me a lot more about what they’re actually looking for.

With all that said, I have a pretty rudimentary prototype that I’ll be showing them to help them understand what the product would look like if they want to move forward with a pilot program. So I’m looking for someone who’s interested in this idea to help on the technical side. If all goes well, significant changes and new features will be needed in the current product. The current tech stack is Next.js on the frontend, a Python (or TypeScript) backend, gpt-3.5-turbo, and a Pinecone vector DB. I’d like to use Prisma and PlanetScale for the database, but those are not a must.
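
To make the prototype’s shape concrete, here’s a minimal sketch of the retrieval-and-answer loop that stack implies, on the Python side. The index name, metadata fields, and system prompt are placeholders of mine, and it assumes the 2023-era openai and pinecone-client packages:

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"                   # placeholder
pinecone.init(api_key="PINECONE_API_KEY",           # placeholder
              environment="us-west1-gcp")
index = pinecone.Index("class-material")            # hypothetical index of embedded course content


def answer_for_class(question: str, course_id: str) -> str:
    # Embed the student's question.
    emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=question
    )["data"][0]["embedding"]

    # Retrieve only material belonging to this student's course.
    hits = index.query(vector=emb, top_k=5, include_metadata=True,
                       filter={"course_id": course_id})
    context = "\n\n".join(m["metadata"]["text"] for m in hits["matches"])

    # Ask gpt-3.5-turbo to answer from the retrieved material only.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided course material and cite "
                        "the source document for every claim.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```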

Let me know if this seems interesting to you. And I’m currently in SF.

7 Likes

For those not familiar with LMS

1 Like

Thanks for adding context, I didn’t know the meaning of LMS either not too long ago.

For the last two years, I worked as part of the LMS transition team at UCLA, where we moved from a custom Moodle-based LMS to a Canvas instance hosted by Instructure.

Of course, UCLA is not anywhere near a typical LMS user—even among universities—so take this with a huge grain of salt.

Getting an external tool (LTI) approved for use is, from what I’ve seen, a pretty intense, often months-long process, even for established LTIs. Big players like Box, Gradescope, and Piazza have the resources, experience, and trust to navigate this process quickly. But even Turnitin was forced to cobble together a solution to remove AI detection from their service for UCLA (and, I believe, some other schools) that weren’t comfortable with it rolling out campus-wide without further evaluation.

There are some big challenges I see for this type of LTI.

  1. Data privacy. You’ll need to be rock-solid in this arena to even be considered. UCLA has a Third Party Risk Management (TPRM) review process which (as I understand it) is very intense.
  2. Liability for being wrong. This is a very different LTI than anything I’ve seen before. Your product would be essentially inserting itself into the education chain of responsibility alongside professors and TAs. What do you imagine happens when your AI gives a student information which is less than entirely correct, and the student relies on that information on a final exam?
  3. Scale and scope. Using UCLA again as an example,
    • Just the College of Letters and Sciences has about 33,000 undergraduate students
    • There are on the order of 3,000 courses offered every quarter
    • Many courses will have hundreds of pages of course materials
      I would guesstimate you would burn through a minimum of hundreds of millions of tokens every day, and up to some tens of billions of tokens (honestly, I could even see this reaching into the hundreds of billions of tokens) during finals week.

So, as an idea in the abstract I think it’s great. As an actual product from a startup without massive Angel investor funding? I don’t see it happening any time soon.

Issue (1) is solvable, it’s just tedious.

Issue (2) is trickier. I don’t think GPT-4 is there yet in terms of being able to fill the role you’re proposing to the level necessary for extremely risk-averse universities to approve its use.

Issue (3) is why I don’t think you have much likelihood of success as a startup. If something like this were in place and worked perfectly, students would be using it non-stop. That’s why it’s an amazing idea, and it’s also why it’s a terrible idea.

Usage would almost immediately spiral out of control, and no university would sign a contract without some type of predictable pricing structure by class, student, etc.

For just the undergraduates at one UCLA college, I don’t think it’s unimaginable that such an LTI could burn through 500 billion tokens a year. That’s $20M–$30M each year in usage at current rates.

Even at the low end of $20M, that’s about 1.3% of UCLA’s General Fund expenditures. That is a budget item that’s never getting approved.
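
To show the arithmetic behind those figures (the per-1K-token rates here are rough assumptions of mine in the GPT-4 price range, not quoted prices):

```python
# Back-of-the-envelope check on the 500-billion-token estimate above.
# The per-1K-token rates are rough assumptions in the GPT-4 price range.
tokens_per_year = 500e9

for rate_per_1k in (0.04, 0.06):
    annual_cost = tokens_per_year / 1000 * rate_per_1k
    print(f"${annual_cost / 1e6:.0f}M per year at ${rate_per_1k:.2f} per 1K tokens")

# -> $20M per year at $0.04 per 1K tokens
# -> $30M per year at $0.06 per 1K tokens
#
# $20M being ~1.3% of General Fund expenditures implies a fund on the order of
# $20M / 0.013 ≈ $1.5B, which is the scale that percentage assumes.
```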

Now, all that said, I can say with near certainty that this will happen eventually, because this is the perfect use case for the technology. I can also say with absolute confidence that you aren’t the only, or even the first, person to have this idea; I’ve had conversations with colleagues about this exact idea as far back as December. All of us thought it was an obvious progression even then.

So, I would bet very real money that you’re not the only person working on it right now.

I don’t know how it will all shake out in the end, but I think we’re probably 3+ years away from the models being good enough to trust at the university level for this type of task, and 5+ years away from the cost of compute and model efficiency reaching a point where it starts to make sense financially. I suspect things like what you’re proposing will be near universal in 10 years.

Again, UCLA is atypical in almost every regard, but students are the same everywhere and will absolutely use a product like this to death if it is any good, so I can’t see it coming close to breaking even at any price a university would find palatable.

7 Likes

Thanks for your thorough response. My thoughts on your comments:

Issue (1): I’m willing to go through the process.

Issue (2): We first pull all of the student’s class data and place it in a vector DB. The model only gives responses that it pulls directly from the class material and always cites its sources. So by limiting the model to answering only educational questions related to a student’s class, I think this can be solved. At worst, the model may cite an incorrect answer to a problem a student asks about, but that seems like the same problem universities already have with teaching assistants.
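
As a rough sketch of the ingestion side (pulling the class material into the vector DB with enough metadata to filter by course and cite sources), using the same placeholder names and 2023-era clients as the sketch above:

```python
import openai
import pinecone

pinecone.init(api_key="PINECONE_API_KEY", environment="us-west1-gcp")  # placeholders
index = pinecone.Index("class-material")


def ingest_document(text: str, course_id: str, source: str, chunk_size: int = 1500):
    """Chunk one course document, embed each chunk, and upsert it with the
    course id (for filtering) and source name (for citations)."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = []
    for n, chunk in enumerate(chunks):
        emb = openai.Embedding.create(
            model="text-embedding-ada-002", input=chunk
        )["data"][0]["embedding"]
        vectors.append((
            f"{course_id}-{source}-{n}",                             # vector id
            emb,
            {"course_id": course_id, "source": source, "text": chunk},
        ))
    index.upsert(vectors=vectors)
```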

Issue (3): The plan is to conduct a pilot, which should be very informative about how students actually use the product and give us a sense of how to price it. If universities are looking for some sort of predictable pricing structure, we can always limit students to a certain level of token usage.
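
Purely as a sketch of the token-cap idea (the cap is an invented number, and a real version would keep the counter in the database rather than in memory):

```python
from collections import defaultdict

MONTHLY_TOKEN_CAP = 200_000        # invented example cap per student
usage = defaultdict(int)           # student_id -> tokens used this month


def check_and_record(student_id: str, tokens_used: int) -> bool:
    """Return False (and refuse the request) once a student would exceed the cap."""
    if usage[student_id] + tokens_used > MONTHLY_TOKEN_CAP:
        return False
    usage[student_id] += tokens_used
    return True
```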

Do you think the solutions I gave are enough?

And since you’ve worked at UCLA on implementing Canvas, do universities partner with companies to give them access to data directly from LMS providers that is not available via the API?

Thanks again for your feedback!

1 Like

Why do you need them (the universities and the professors)?
Why do you need the piece of art LMS systems?

I’m working on using AI to build a “learning companion”…

Universities are the most suitable buyer. People, especially students, rarely buy education software on their own; they simply use whatever their school or university has decided to adopt.

I don’t disagree. I am trying to change this and go directly to the consumer, i.e. the learner, first. AI with chat turns the existing model of online learning on its head.
BEFORE: the content had to be created first.
NOW: genAI can generate the content on the fly.
BEFORE: the student pays to get their questions answered (to submit an assignment that gets looked at and graded, to attend a live class on Zoom, to participate in a “cohort”).
NOW: genAI can answer the student’s questions and follow-up questions for as long as needed.

So again, I’m asking, why do you need to integrate with LMS? Why do you need to sell to universities?

With respect to (2), it’s certainly doable to constrain the model to a point, but I worry that you may have issues where the model misunderstands or misinterprets the information it accesses and gives erroneous responses.

With respect to (3), with a product like this I do not think usage limits are acceptable or appropriate as they would run counter to the point of the entire product. It would be like a video hosting LTI restricting the number of times a lecture video could be played—it just isn’t done.

As with all things… It depends.

When UCLA was on a self-hosted custom Moodle-based LMS, a lot of stuff was built in-house, and they had direct access to all the data and could share it however they wanted.

There’s a big push to move to hosted LMS solutions right now. In the case of Canvas, that means everything is on Instructure’s servers. No one at UCLA (to the best of my knowledge) has direct database access, though the product owner at UCLA could ask their Instructure contact to run queries and reports.

So, for UCLA the API would be all that is available.

Honestly, I wouldn’t be surprised if this were already a product offering being discussed in some fashion at Instructure (though to be clear, this is speculation; I have no direct or indirect knowledge of this).

One avenue you might consider, which might be more fruitful, would be to try to work with Instructure itself to bring such a product to market as a feature of the LMS or as a first-party LTI.

That said, if I were Instructure I would try to work with OpenAI directly on such a product.

Another approach might be to build this as a plugin for individual users, where they’d connect to their class, the plugin would ingest all of the files/pages accessible to the user (either via the API or scraping), and continue from there.
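
As a rough sketch of what that ingestion could look like against the Canvas REST API using the student’s own access token (the base URL and token are placeholders, and real code would need to follow pagination and handle files the user can’t access):

```python
import requests

BASE_URL = "https://canvas.example.edu"                    # placeholder Canvas instance
HEADERS = {"Authorization": "Bearer CANVAS_ACCESS_TOKEN"}  # the user's own token


def list_course_files(course_id: int) -> list:
    """Fetch the files visible to this user in one course.
    A real implementation would follow the Link header for pagination."""
    resp = requests.get(
        f"{BASE_URL}/api/v1/courses/{course_id}/files",
        headers=HEADERS,
        params={"per_page": 100},
    )
    resp.raise_for_status()
    return resp.json()


def download_file(file_obj: dict) -> bytes:
    # Each Canvas file object includes a 'url' field pointing at the file contents.
    return requests.get(file_obj["url"], headers=HEADERS).content
```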

But, then you don’t have a great monetization method.

Alternatively, you could build it as a separate API-based product where you charge the user based on usage and market directly to students. But I understand that’s not as attractive to a startup because, for instance, landing UCLA wins all 33,000 students at once, and no one wants to be chasing all those individual students.

One other interesting facet of the individual approach, though: if you collect students’ university, course, and professor details, you can aggregate information over multiple quarters/years, giving the model much more complete and robust information than it would have from only what is available to a single student at that point in the term.

It would also be possible to find similar classes from other universities and incorporate their data.

But then, of course, you are taking on a much more pirate posture, and would be traveling in less reputable circles like Chegg and CourseHero.

And at that point, you may as well just become an online test bank where students upload their midterms and finals with course and professor information and your product can use other products like Mathpix for OCR and WolframAlpha to generate step-by-step solutions. Then, it could even generate plausible alternate problems based on previous problems given by the professor. I know for a fact students would pay all the money for a service like that.

¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯

I don’t know. There are many possible ways to implement ClassChat, CourseTutor, GPT-TA, or ProblemPirate and almost all of them have a much lower barrier to entry than getting an LTI into use.

2 Likes

As you’ve said, there are tons of problems that I don’t have answers to. It all really boils down to whether universities are looking for a ChatGPT alternative they can give to their own students and monitor themselves, whether they can then give me access to all the necessary data, and how much they’re willing to pay. Those will decide whether there really is a business to be made here.

Thanks again for the feedback and clarifying questions.

1 Like

No problem!

I do wish you luck. I think it’s a perfect use case for the technology once it matures a bit more and inference costs come down by at least a few orders of magnitude.

If you do manage to put something together and get your LTI up and running in a live university Canvas instance, do come back and let us know. I’d definitely be interested to hear the tale. I could probably even scratch together a few coins and send you a (cheap) bottle of bubbly to help you celebrate.

3 Likes

Yes, I am working on the same concept for students preparing for the SAT exam. Currently I am using GPT-4, but I am also fine-tuning data on the davinci model (another model would be very helpful). Right now the AI explains the multiple-choice questions a student has attempted.
If you have any information on how we can get this ready for production, it would be great, because students are already looking for such a resource.
Your insights will be highly appreciated.
Thanks

2 Likes

I’m a bit late to the party, but I just wanted to back up elmstedt’s experience. I work at a large university, and my experience with this is much the same.

Good luck from me as well; as previously mentioned, there are some big players in this space.

I’ll suggest that you find a single problem to solve within the education space and focus on solving that well.

4 Likes

Thanks for the advice, that’s what I’ll try and do.

1 Like

:brain::mortar_board::bulb:Potential Collaboration on AI in Education

I hope this message finds you well. I am writing to you as a seasoned professional in the fields of Game Development and Machine Learning, with a particular interest in the application of AI in education.

Your recent post caught my attention, as your startup’s focus on creating a ChatGPT tailored for students aligns closely with my own work. My team and I have been developing a prototype that leverages AI to analyze and highlight subtitles for students, which has resulted in a tenfold improvement in the education process. Our work has been acknowledged and supported by Microsoft with grants for OpenAI credit and Azure credits.

Given my technical background, I believe that I could contribute significantly to your project. I have spent the last 12 years in Game Development and 8 years in Machine Learning. Moreover, I have been working with OpenAI Azure Cognitive Services for the past 18 months since its inception. I also have a proof-of-concept demo for education that could potentially be integrated into your project, and I am confident that combining my technical skills and knowledge of Azure/OpenAI could result in something truly impactful.

I would greatly appreciate the opportunity to discuss our mutual interests and potential collaboration further. I am available for a call or meeting at your earliest convenience. Additionally,

:coffee::bridge_at_night::blush:
I am based in San Francisco and would be more than happy to meet for coffee if you are in the area.

If necessary, I can also bring Aleksa Gordic from Google DeepMind into the discussion.

Please feel free to contact me via email at yuriy.paramonov@comanG.org or message me directly on LinkedIn: linkedin.com/in/vv-work.

I look forward to the possibility of collaborating with you and thank you for considering my proposal.

2 Likes

So great to hear you’re interested! I’ve sent you an email.

1 Like

This is a pretty important statement right here. I think more people should be moving away from the native OpenAI API and into the Azure one. I’ve been saying it for a minute, but the native one is not for production and only appropriate for research, low traffic experiments, and hobbyists.

Glad there are still people who want to make cool things on this site. Wish there was more “let’s collaborate” and less “it’s too hard” in the world.

2 Likes

@hamsaomar
@vv_work

Hey guys, I’m from the ed-tech space and have consulted for a couple of higher-ed startups over the years. My partner’s also in higher-ed publishing and has offered me some real insight into how universities and professors weigh the adoption of new learning materials. As you’d expect, practically every discipline is talking about the ramifications of ChatGPT. What they have barely scratched the surface on is the myriad ways they could automate work and improve the lives of so many professors, students, and administrators with a well-encrypted, safe, and user-friendly tool tied to an open-source LLM (or Azure’s GPT API).

Although I’m based in Brooklyn, NY, and currently contracting as an Instructional Designer at Google, I’d still love to connect (since I’ll officially be on the market come the end of September). I’ve been building LangChain-based content curation tools, mostly with the GPT-3.5 and GPT-4 APIs, and have found a real knack for multi-step prompts, given my foundation as a poet and lover of languages and mythology.

Please email me, or connect on LinkedIn (same name) if interested in conversing!

contact@tomdimino.com

2 Likes

I’m really interested in your idea for a ChatGPT-like startup for students. I think it has a lot of potential to help students learn more effectively and efficiently. I’m particularly interested in the idea of embedding the chatbot specifically with students’ class data. This would allow the chatbot to provide more personalized and relevant answers to students’ questions.

Here are some specific questions that I have about your project:

i) What tech will be used to stop cheating? Will you use natural language processing to identify questions that are likely to be used for cheating, or will you check query history to find cheating?

ii) The privacy problem: will data collection be a serious issue for this project because of privacy concerns and data protection law?

1 Like

I think it’ll be possible to figure out if a student is cheating based on the way they ask the question. I haven’t given much thought to the specifics of this feature yet, but I have a rough idea of what it would look like.
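
Purely as a sketch of that rough idea (the threshold and helper function are made up, and it assumes the same OpenAI embeddings as the rest of the stack): flag a query when it is a near-duplicate of a question on the current assignment.

```python
import numpy as np
import openai

SIMILARITY_THRESHOLD = 0.92        # invented cutoff; would need tuning


def embed(text: str) -> np.ndarray:
    emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=text
    )["data"][0]["embedding"]
    return np.array(emb)


def looks_like_homework(query: str, homework_questions: list[str]) -> bool:
    """Flag a query if it is nearly identical (by cosine similarity)
    to any question on the current assignment."""
    q = embed(query)
    for hw in homework_questions:
        h = embed(hw)
        cosine = float(q @ h / (np.linalg.norm(q) * np.linalg.norm(h)))
        if cosine >= SIMILARITY_THRESHOLD:
            return True
    return False
```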

Data collection is the difficult part, but it wouldn’t be much of a problem if we work with the universities to get that data and stay compliant with their rules.