-
Notifications
You must be signed in to change notification settings - Fork 736
Issues: huggingface/tokenizers
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Author
Label
Projects
Milestones
Assignee
Sort
Issues list
"from_pretrained" read wrong config file. not "tokenizer_config.json", but "config.json"
#1538
opened May 23, 2024 by
daehuikim
How to allow the merging of consecutive newline tokens \n when training a byte-level bpe tokenizer?
#1534
opened May 18, 2024 by
liuslnlp
How to Batch-Encode Paired Input Sentences with Tokenizers: Seeking Clarification
#1531
opened May 14, 2024 by
insookim43
Special token handling breaks idempotency of sentencepiece due to extra spaces
#1527
opened May 9, 2024 by
cat-state
Link to download the training text in
docs/source/quicktour.rst
is broken
#1526
opened May 9, 2024 by
14jdelap
BPE Trainer doesn't respect the
vocab_size
parameter when dataset size is increased
#1514
opened Apr 25, 2024 by
Abhinay1997
Extended vocab tokenizer merging text into a single string without spaces while decoding
#1501
opened Apr 17, 2024 by
savanth14
Discrepancy Between GitHub Release and NPM Package Version & Missing Dependencies
#1489
opened Apr 10, 2024 by
superBertBerg
Previous Next
ProTip!
Updated in the last three days: updated:>2024-05-30.