LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference