Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing grammar as a request parameter in completion/chat with go-side grammar validation #4525

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

richardanaya
Copy link

@richardanaya richardanaya commented May 19, 2024

Why is passing down grammars needed?

Relying upon the context of a prompt to dictate structure can be unreliable (because its dependent upon the model and generational randomness) and takes up context space. Grammar is a well proven way to constrain generational output, and in fact format="JSON" even depends on it, but format="JSON" allows no reliable specification large complex structures and can even be tricked with prompt attacks.

image

Why grammar and not JSON schema?

While JSON schema would make a nice future addition, there's interest in data structures outside of JSON (simple enum values, programming languages, etc.). Also, JSON schema generators will rely upon grammars fundamentally, so validating the grammar generated by JSON schema will also benefit from grammar checking.

Why not just pass along the grammar to llama.cpp?

I looked into complexities of passing along grammar to llama.cpp server. There's a few challenges:

  • llama.cpp server doesn't return errors when bad grammar is passed to it with streaming mode on. It gives an incomprehensible "unexpected EOF"
    image
  • the in memory model will be reused if the grammar is valid OR changed. BUT... the in-memory model appears to get reloaded if you give it a bad grammar and then follow up with a good grammar.
    image
  • it appears to work perfectly reusing in memory models just passing along a completely valid grammar (even a variety of valid grammars)

My conclusion from this given the advice of the community is that we do indeed have to do our our GBNF grammar validation on the Go server side to do our best at preventing passing down bad grammar.


In this PR i've created:

  • the functionality to pass along grammar in chat and completion mode
  • documentation in readme related to new property
  • prevention of using grammar and json parameters at same time.
  • validation code for grammars
  • extensive set of 30+ tests for grammar ranging from character classes, strings, internationalizations comments, etc.
  • tests of every known grammar on llama.cpp and also individual unit tests
  • no usages of regex to make clear understandable parsing

Edge cases:

  • i've probably not implemented the entirety of whats possible in character classes, but I have a limited subset compatible with the grammar listed on llamma.cpp. My assumption is most people's grammars will be less complex than these.
  • there might be some valid grammars I don't currently support (but to the best of my knowledge we support all the major publicly available ones including ones as complex as C programming language), I chose not to use a full on go parser library because I wanted the cognitive load of this code to be approachable initially (rather than every viewer of this code to have to learn a new library). if in the future, we want to replace it with a more formal technology we can and tests can be reused.

Examples of success:

Screenshot 2024-05-19 at 1 29 10 PM Screenshot 2024-05-19 at 1 44 22 PM

Example of failure:

Screenshot 2024-05-19 at 1 28 22 PM

I believe this PR satisfies #4074 with an acceptable amount of protection from sending invalid GBNF grammars with useful error messages.

@richardanaya
Copy link
Author

richardanaya commented May 20, 2024

I think i've added as many tests as I can think of to meaningfully add. I'll await feedback. @jmorganca

@mitar
Copy link

mitar commented May 31, 2024

Have you seen #3618? It adds both grammar and JSON schema option which is then passed to llama.cpp. I think it would be nice to combine those two PRs (especially tests from this PR).

return fmt.Errorf("grammar and format cannot be used together")
}

err := ValidateGrammar(req.Grammar)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to validate the grammar? Llama.cpp does that anyway?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I showed in my investigation above bad grammars eject the model from memory and makes it reload. The streaming llama.cpp server has bad error handling. The go-side grammar check prevents that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, where is go-side in this PR? I see that you implemented your own validator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I wrote my own validator in the grammar.go file of the commit. I didn't want to obligate this project to some special parsing library, so I tried to just be as straight forward as possible to get some initial validation going. I was a bit paranoid the PR might seem too strange if I did something too esoteric. The suite of tests I think could be useful for whatever go validation evolves into.

I was aiming for something just broadly adequate at validation to protect the lamma.cpp server from the ejections of models. I noticed that a lot of PRs didn't get accepted because they were pretty simplistic pass throughs. Talking with someone from the Discord said that some lack of even basic protection over the servers state might have been the reason why. That's sort of why this PR evolved as it did.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought "go-side" is some library for grammar checking. :-) Lol.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S'all good

@richardanaya
Copy link
Author

richardanaya commented Jun 1, 2024

Have you seen #3618? It adds both grammar and JSON schema option which is then passed to llama.cpp. I think it would be nice to combine those two PRs (especially tests from this PR).

I have, but I think it suffers from the same problem as bad grammars that erroneous json schema cause model ejects. I think it would require a json schema validator. I think that's a big enough task it'd make sense to either leave that to another wrapper project to convert JSON schema to grammar, or put in separate PR. I'm interested in writing that separate PR, but i'd like to at least get this one finished. I think a lot of folks have been waiting on even just basic grammar support. Thanks!

@mitar
Copy link

mitar commented Jun 1, 2024

I think it suffers from the same problem as bad grammars that erroneous json schema cause model ejects.

I am not sure why I would have to pay for grammar and JSON schema validation at every API request? I find this strange. In general I would say that in my case, JSON schema and/or grammar is part of a trusted input and not provided by the user. At least validation should then be cached or something?

Also is the point of this validation to prevent malicious values for grammar or just accidental erroneous values? Because if the goal is to prevent malicious values, then validator should completely match llama.cpp would reject. Otherwise an attacker would be able to bypass this validator by crafting the value which passes this validator but is still rejected by llama.cpp.

So I am not sure exactly why is this validation needed?

@richardanaya
Copy link
Author

These are two valid concerns:

  • Cacheing could definitely help, I could add something small
  • You're right that a malicious hacker could find some way to bust the internal model. The goal of this PR isn't to be perfect security, its to get the ball rolling on getting grammar into the project and even get feedback and be generally aligned with the desired principles. As I specified above, my validator isn't a perfect representation of what's inside of llama.cpp's capabilities. To my knowledge, a spec of their grammar support doesn't even exist. So my validation is a subset and maybe has holes and is to what I could determine from their public documentation and public existing grammars.

Again :) I know nothing about the mindset of the project owners on what's holding them back from merging in grammar support. I did my best to make a PR in line with convos from older members in Discord to speculatively address their issues but also not make something too esoteric.

@richardanaya
Copy link
Author

@mitar added simple caching and some simple sanity checking around size of grammar

@richardanaya
Copy link
Author

@mitar I thought about your concern, I now only process grammars if OLLAMA_GRAMMAR set to "true". That way custom grammar is opt-in for people okay with it's trade offs.

@mitar
Copy link

mitar commented Jun 1, 2024

I now only process grammars if OLLAMA_GRAMMAR set to "true". That way custom grammar is opt-in for people okay with it's trade offs.

I do not get why would this be useful? You should maybe only make validation an option. But you should always pass grammar through if user wants to use the grammar?

@richardanaya richardanaya requested a review from mitar June 1, 2024 20:12
@richardanaya
Copy link
Author

richardanaya commented Jun 1, 2024

I now only process grammars if OLLAMA_GRAMMAR set to "true". That way custom grammar is opt-in for people okay with it's trade offs.

I do not get why would this be useful? You should maybe only make validation an option. But you should always pass grammar through if user wants to use the grammar?

My understanding the goal is to protect the model from being ejected by llama.cpp server. We should never pass down invalid gramma to the best of our ability. Therefore grammar being passed in is opt-in until the community is comfortable with the validation. In other words, we shouldn't let a vulnerability be the default behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants