![[sequential-monte-carlo.png]]
[image source](https://www.researchgate.net/figure/Sequential-Monte-Carlo-scheme_fig2_322302619)
A fun Bayesian approach to constrained sampling from LLMs.
Occasionally, when generating text, you want the output to satisfy a constraint. Maybe (1) each word must be no longer than 5 characters, or (2) the emitted text must be a valid SQL query. I focus on constraints, but this paper uses a very general framework and also supports wild things like prompt intersection: efficiently generating text which is simultaneously a highly likely completion of multiple prompts.
Some naive/traditional approaches to constrained generation:
- **prompt engineering**: Iterating to find the perfect combination of instructions and threats and bribes to convince the model to do the right thing. This takes a long time and usually requires some unintuitive trick.
- **masking**: (à la [outlines](https://github.com/outlines-dev/outlines)) before sampling each token, we mask out any tokens which would violate the constraint and renormalize. This _works_, but it samples from the wrong distribution. e.g. in the no-more-than-5-letter-words case the token "meet" gets too much probability: the word "meeting" is not allowed, but because it is built from the token "meet", the probability mass the model had reserved for "meeting" leaks into "meet". (A concrete sketch follows this list.)
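To make that distortion concrete, here's a minimal sketch with a made-up three-string "language model"; both the probabilities and the tokenization are invented for illustration:

```python
# A toy "language model" over whole strings, just to make the distortion
# concrete (all numbers are made up). Assume "meeting" tokenizes as "meet" + "ing".
p_full = {"meeting": 0.6, "meet": 0.1, "chat": 0.3}

# Exact constrained distribution: condition on "every word <= 5 characters".
allowed = {w: p for w, p in p_full.items() if len(w) <= 5}
z = sum(allowed.values())
exact = {w: p / z for w, p in allowed.items()}
print("exact: ", exact)            # meet ~0.25, chat ~0.75

# Token-by-token masking: for the first token both "meet" (0.6 + 0.1 = 0.7,
# since the model may still be heading for "meeting") and "chat" (0.3) are
# legal, so nothing gets masked and we sample with those weights. After
# sampling "meet", the continuation "ing" is masked and we stop.
masked = {"meet": 0.7, "chat": 0.3}
print("masked:", masked)
# "meet" ends up with probability 0.7 instead of 0.25: the mass the model had
# reserved for "meeting" leaks into "meet" because masking renormalizes locally.
```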
The principled approach is to compute a new token probability distribution. Somehow peer ahead and look at the universe of all strings (weighted by their likelihoods) which satisfy our constraint. This is intractable, but it is intractable in a very familiar way: posterior inference is exactly this problem, and there are efficient methods for sampling from tricky distributions without needing to compute them. Sequential Monte Carlo is one of those methods.
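In symbols (my notation, not necessarily the paper's): writing $C$ for the set of strings satisfying the constraint, the distribution we'd like to sample from is the LM restricted to $C$:

$$
p(x \mid x \in C) \;=\; \frac{p_{\text{LM}}(x)\,\mathbb{1}[x \in C]}{\sum_{x'} p_{\text{LM}}(x')\,\mathbb{1}[x' \in C]}
$$

The denominator sums over every constraint-satisfying string, which is why we can't evaluate this distribution directly and have to fall back on sampling methods.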
The implementation looks a lot like beam search: we keep a set of "particles" (partial generations), extend each of them, reweight and resample so that promising particles get duplicated and dead ends get dropped, and loop until we've found high-probability complete strings. The more particles we keep (the bigger our batches), the more accurate our results.
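Here is a minimal sketch of that loop, assuming a made-up two-step "language model" (the same toy numbers as the masking example) in place of a real LLM; the function names and resampling schedule are my own, not the paper's. The proposal distribution is the locally masked LM from above, and the incremental importance weight works out to the probability mass that survives the mask at each step:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["meet", "ing", "chat", "<eos>"]

def lm_next_probs(prefix):
    """Toy stand-in for an LLM's next-token distribution (made-up numbers)."""
    if not prefix:
        return np.array([0.7, 0.0, 0.3, 0.0])       # start with "meet" or "chat"
    if prefix[-1] == "meet":
        return np.array([0.0, 6 / 7, 0.0, 1 / 7])   # usually continues to "meeting"
    return np.array([0.0, 0.0, 0.0, 1.0])           # otherwise: stop

def satisfies_constraint(tokens):
    """Constraint: every word in the (partial) string is at most 5 characters."""
    text = "".join(t for t in tokens if t != "<eos>")
    return all(len(w) <= 5 for w in text.split())

def smc_generate(n_particles=1000, max_steps=5):
    # Each particle is a partial generation carrying an importance weight.
    particles = [[] for _ in range(n_particles)]
    weights = np.ones(n_particles)

    for _ in range(max_steps):
        if all(p and p[-1] == "<eos>" for p in particles):
            break
        for i, toks in enumerate(particles):
            if toks and toks[-1] == "<eos>":
                continue  # already finished
            probs = lm_next_probs(toks)
            # Proposal: the locally masked/renormalized LM from the sketch above.
            ok = np.array([satisfies_constraint(toks + [v]) for v in VOCAB])
            masked = probs * ok
            surviving_mass = masked.sum()
            if surviving_mass == 0.0:                # dead end: particle dies
                weights[i] = 0.0
                particles[i] = toks + ["<eos>"]
                continue
            tok = VOCAB[rng.choice(len(VOCAB), p=masked / surviving_mass)]
            particles[i] = toks + [tok]
            # Importance weight = target / proposal, which here reduces to the
            # probability mass that survived the mask at this step.
            weights[i] *= surviving_mass

        # Resample: duplicate high-weight particles, drop low-weight ones.
        total = weights.sum()
        if total == 0.0:
            raise RuntimeError("every particle violated the constraint")
        idx = rng.choice(n_particles, size=n_particles, p=weights / total)
        particles = [list(particles[j]) for j in idx]
        weights = np.ones(n_particles)

    return ["".join(t for t in p if t != "<eos>") for p in particles]

if __name__ == "__main__":
    outputs = smc_generate()
    for s in sorted(set(outputs)):
        print(s, outputs.count(s) / len(outputs))
    # Roughly: chat 0.75, meet 0.25 -- close to the exact constrained
    # distribution, where plain masking would have given 0.3 / 0.7.
```

The resampling step is what separates this from beam search: instead of greedily keeping the top-k prefixes, particles survive in proportion to their weights, which is what keeps the final samples distributed (approximately) according to the constrained posterior rather than just scoring well.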
[Deep Learning](https://www.deeplearningbook.org/contents/monte_carlo.html) (§17.2, p. 589), [Probabilistic Machine Learning](https://github.com/probml/pml2-book/blob/main/toc2-long-2023-01-19.pdf) (§13), and [the paper](https://arxiv.org/pdf/2306.03081) cover more background and the precise implementation details.