BLT's Entropy-based Patcher vs. Tokenizer Visualization
Enter text to visualize its segmentation according to different methods:
Byte Latent Transformer (BLT): Entropy-based patching plot and the resulting patched text. Spaces are replaced by '_' for visualization purposes. Uses the blt_main_entropy_100m_512w model.
Tiktoken (GPT-4o): Text segmented into o200k_base tokens.
Llama 3: Text segmented by the meta-llama/Meta-Llama-3-8B tokenizer.
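
For readers unfamiliar with entropy-based patching, the idea behind the BLT view can be sketched as follows: a new patch starts wherever the predicted next-byte entropy exceeds a threshold. This is a minimal illustration, not the demo's implementation. The bigram estimator, the function names, and the threshold of 1.0 are all assumptions made here; the real patcher takes its entropies from a trained byte-level model.

```python
import math
from collections import Counter, defaultdict


def next_byte_entropies(data: bytes) -> list[float]:
    """Toy stand-in (assumption) for the entropy model: bigram counts over `data`.

    entropies[i] is the Shannon entropy in bits of the estimated distribution
    of data[i] given data[i-1]. Position 0 has no context and is set to 0.0.
    """
    following = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        following[prev][cur] += 1
    entropies = [0.0] if data else []
    for prev in data[:-1]:
        counts = following[prev]
        total = sum(counts.values())
        entropies.append(
            -sum(c / total * math.log2(c / total) for c in counts.values())
        )
    return entropies


def entropy_patches(data: bytes, entropies: list[float], threshold: float) -> list[bytes]:
    """Start a new patch at every byte whose predicted entropy exceeds `threshold`."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


text = "the cat sat on the mat"
data = text.replace(" ", "_").encode("utf-8")  # same '_' convention as the demo
patches = entropy_patches(data, next_byte_entropies(data), threshold=1.0)
print("|".join(p.decode("utf-8", errors="replace") for p in patches))
```

Raising the threshold yields fewer, longer patches; lowering it yields more, shorter ones. Unlike the two tokenizer views, the boundaries here depend on the model's uncertainty at each byte and not on a fixed vocabulary.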