Improving Performance of Local Chatbot with Caching
John Jenq
School of Computing, Montclair State University, Montclair, New Jersey, United States
Cite this paper as: Jenq, J. (2024). Improving Performance of Local Chatbot with Caching.
Journal of Systemics, Cybernetics and Informatics, 22(5), 96-100. https://doi.org/10.54808/JSCI.22.05.96
Online ISSN (Journal): 1690-4524
Abstract
Chatbots and the technologies behind them are now used in a wide range of applications. The Retrieval-Augmented Generation (RAG) framework has gained popularity by linking a large language model with a private dataset. It allows an AI system to run locally and privately while drawing on up-to-date information and knowledge. In this report, we aim to improve the response time of a local private chatbot by using a cache. Our experimental results show that most of the time spent processing a query goes to generating the response. On a cache hit, the response is returned to the user immediately without going through the generation step, which significantly improves response time. We therefore focus our efforts on the turnaround time attributable to the generation step. The cache is organized into categories to make searching efficient. For each user query, the query string, its embedding, and its response are recorded and stored in the cache. Experimental results are presented, and issues affecting the speedup of request-response turnaround time are addressed.
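To make the cache described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It stores the query string, embedding, and response per category, and skips generation on a hit. The abstract does not say how a hit is decided, so the cosine-similarity match and the 0.9 threshold are assumptions, as are all names (`CacheEntry`, `CategoryCache`, `answer`, and the `generate` callback standing in for the RAG generation step).

```python
import math
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    # The three items the abstract says are recorded for each query.
    query: str
    embedding: list
    response: str


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class CategoryCache:
    # Assumption: a hit means cosine similarity >= threshold.
    threshold: float = 0.9
    entries: dict = field(default_factory=dict)  # category -> list of CacheEntry

    def put(self, category, query, embedding, response):
        entry = CacheEntry(query, list(embedding), response)
        self.entries.setdefault(category, []).append(entry)

    def get(self, category, embedding):
        """Return the best-matching cached response in one category, or None."""
        best, best_score = None, self.threshold
        # Only the requested category is scanned, which keeps the search small.
        for entry in self.entries.get(category, []):
            score = cosine_similarity(entry.embedding, embedding)
            if score >= best_score:
                best, best_score = entry, score
        return best.response if best is not None else None


def answer(cache, category, query, embedding, generate):
    """Return (response, was_cache_hit); call generate(query) only on a miss."""
    cached = cache.get(category, embedding)
    if cached is not None:
        return cached, True
    response = generate(query)  # the slow generation step
    cache.put(category, query, embedding, response)
    return response, False
```

In this sketch the caller supplies the category and the embedding. How queries are categorized and embedded is outside the scope of the example.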