LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Original Paper: https://arxiv.org/abs/2407.14057

By: Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

Abstract

The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token