Feedback on Limited Access to Essential Language Resources

What is being discussed:

Van Dale Dictionary: Key Points

  • Authoritative Source: Van Dale is a leading publisher of dictionaries in the Netherlands and Belgium.

  • Multiple Formats: Van Dale dictionaries come in various formats, including print, online versions, and apps.

  • Licensed Content: The full text of the Van Dale dictionary is typically a licensed product and not freely available on the web, meaning it’s not generally accessible through web scraping.

  • Full-Text Searchable: The online professional version of Van Dale is full-text searchable, implying that the entire text of the dictionary is digitized.

  • Not Primarily Public: Although some content may be present on public university library sites, it’s typically behind a paywall or other access restrictions.

That a Van Dale dictionary is primarily a book tells you nothing either way about whether it is part of the training corpus, as OpenAI has drawn liberally on all sorts of media to build its training data. Older editions are also in the public domain because of their publication date. You cannot trust an AI that says it “currently only has access to the free version”.

The most important thing to understand is that no single book makes the AI speak Dutch, nor does it give the AI something to “look up”. The AI’s knowledge comes from training on a massive amount of data, on the order of terabytes. Abilities such as giving you synonyms and translations are part of a holistic understanding of the world.