Occam - LLMs and verification

It has been suggested that I write a little about recent progress with the verification of the output of LLMs and try to get some engagement here. I think it is best that I write several posts and keep each to around a dozen paragraphs.

This initial post will serve as a starting point and I will keep it updated as I add further posts.

  1. Some context and a very brief history of the use of computers for mathematics in particular.

  2. A little detail about automated theorem proving. Concretely, the difference between automated theorem provers and proof assistants.


The gist of it is that LLMs will be able to help humans to make contributions to mathematics and other forms of symbolic reasoning within the next two to three years. There is nothing hypothetical about this; the only reason for writing two to three years is that a finite amount of work remains to be done on the requisite system. About ten years of work has already been done, by the way.

One point that I want to make early on is that I personally do not believe that AI systems that place greater emphasis on inference, such as AlphaGeometry, will make more than incremental progress in the near future.

The other point that I want to make early on is that I believe that projects coercing mathematicians into writing proofs for verifiers such as Microsoft's Lean, to then be used as training data for LLMs, are a waste of time and money. The proofs that LLMs produce when trained in this manner are unreadable. A proof that cannot be read by a human is anathema to me, although I understand that some may disagree.

Despite this negativity I again maintain that the immediate future is actually bright thanks to existing LLMs and will only get brighter as these systems improve.

Bearing in mind that I am no historian, to round off this initial post I want to give some historical context to the idea that computers can "do math", so to speak. This idea gained traction amongst computer scientists and some mathematicians, including von Neumann, around the middle of the last century.

The belief was grounded on the assumption that mathematics could be reduced to the successive application of a relatively small number of rules. It was argued that, since computers could apply these rules many times faster than humans, they could construct proofs largely by trial and error. There was genuine optimism around this idea, although it was tempered by the concession that human insight and creativity could nonetheless not be so easily replicated.

This vision has never come about, despite innumerable efforts since. I maintain, however, that it is now around the corner: leverage LLMs as they stand, with no additional training, in tandem with verification.

In this post I want to very quickly cover computer-assisted reasoning.

There are two main approaches: automated theorem provers and proof assistants.

  1. Automated theorem provers create their own proofs. This area has gained more attention lately because of the efforts of the AI community; AlphaGeometry falls into this category, for example. Theorem provers have historically not done very well compared to humans, but the research area, as far as I know, remains very active. The Vampire theorem prover is a good example.

  2. Proof assistants verify existing proofs, usually created by humans. Most of the current ones are based on type theories. Thus to prove a theorem you type it. I do not think it is worthwhile going into further detail here. A good example is the Isabelle proof assistant. Others are the Coq proof assistant and HOL Light. The theory underlying Isabelle is called Higher Order Logic; HOL Light, as the name suggests, is also founded on this theory. The Calculus of Constructions underlies the Coq proof assistant. A notable historical proof assistant is Mizar, which is based on set theory rather than a type theory.
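To make the "to prove a theorem you type it" remark concrete: under the propositions-as-types reading used by these systems, a statement is a type and a proof is a term of that type, which the assistant's kernel checks mechanically. A minimal sketch in Lean 4 (the theorem name here is my own; `Nat.add_comm` is a standard library lemma):

```lean
-- The statement `a + b = b + a` is a type; the term `Nat.add_comm a b`
-- inhabits it, and the kernel accepts the proof by type-checking it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the term did not have the stated type, say by writing `Nat.add_comm b a` against a statement with the arguments the other way round than it proves, the kernel would reject the proof outright; this mechanical checking is what makes verification trustworthy.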

Here are some additional points.

  • Only neophytes would claim that any of these systems is easy to use. Proof assistants in particular are notoriously difficult to get to grips with. They really are expert systems.

  • The distinction between these two types of software is somewhat blurred. For example, proof assistants usually have some kind of in-built or plug-in theorem-proving capability. Isabelle has Sledgehammer, for example.

  • Proof assistants are arguably more suited to proving things about software than mathematics. However, all of the proof assistants above have been used for significant projects in formalised mathematics, some of historical significance. And Mizar, the odd one out, is really all about mathematics.

  • Although verifying software is their primary use case, proof assistants are not used extensively in the software industry. Despite their many successes, many if not most software engineers are only dimly aware of them.

Recently the term "verification" seems to have been borrowed from the theorem-proving community and associated with finding ways to assert the veracity of the outputs of LLMs. The question is: how do you know that this or that LLM is telling the truth? In my opinion this shows a fair amount of naivety, to put it diplomatically, about the nature of verification and the nature of truth.
