FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling ...
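As a rough illustration of what calling one of these kernels from Python might look like, here is a minimal single-request decode-attention sketch. The tensor shapes are made up for illustration, and the `single_decode_with_kv_cache` entry point is an assumption about the Python binding rather than something stated in the text above.

```python
import torch
import flashinfer  # assumed name of the Python package

# Hypothetical shapes for a single-request decode step:
# one query token attends over an existing KV cache.
num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Assumed decode-attention entry point; returns the attention output
# for the single query token, with shape [num_qo_heads, head_dim].
o = flashinfer.single_decode_with_kv_cache(q, k, v)
```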
Fixes a number of bugs.
To download the pre-generated dataset used in our paper, please run the following command:

We then benchmark the decoding quality and perplexity of these decoding methods. Please check the Benchmark ...
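For context on the perplexity metric mentioned above: perplexity is the exponential of the mean per-token negative log-likelihood of a sequence. The sketch below shows one common way to compute it with a generic causal language model; the use of Hugging Face `transformers`, the `gpt2` checkpoint, and the sample text are illustrative assumptions, not the benchmark script referenced in the text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model and text, chosen only for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "The quick brown fox jumps over the lazy dog."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # With labels equal to input_ids, the returned loss is the mean
    # per-token negative log-likelihood (cross-entropy).
    loss = model(input_ids, labels=input_ids).loss

perplexity = torch.exp(loss).item()  # PPL = exp(mean NLL)
print(f"perplexity = {perplexity:.2f}")
```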
Abstract: In human connection, nonverbal cues, especially body language, are extremely important. Although these subtle indications can be difficult to interpret, doing so can provide important ...
Abstract: Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Auto-Regressive (AR) models perform recognition in a character-by-character ...
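To make the character-by-character AR decoding mode concrete, here is a hedged sketch of a greedy auto-regressive decoding loop for an STR model: each output character requires a full decoder pass, which is why this mode tends to be slow. The `decoder` interface, character set, and special-token handling are illustrative assumptions, not the paper's architecture.

```python
import torch

def greedy_ar_decode(decoder, image_features, charset, bos_id, eos_id, max_len=25):
    """Greedy auto-regressive decoding: one full decoder pass per character,
    feeding previously predicted characters back in as context.

    `decoder(image_features, prev_ids)` is an assumed interface returning
    next-character logits of shape [1, vocab_size].
    """
    prev_ids = torch.tensor([[bos_id]], dtype=torch.long)  # start-of-sequence token
    chars = []
    for _ in range(max_len):
        logits = decoder(image_features, prev_ids)   # [1, vocab_size]
        next_id = int(logits.argmax(dim=-1))         # greedy choice
        if next_id == eos_id:                        # stop at end-of-sequence
            break
        chars.append(charset[next_id])
        prev_ids = torch.cat([prev_ids, torch.tensor([[next_id]])], dim=1)
    return "".join(chars)
```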