Building a Retrieval-Augmented Generation (RAG) system poses several challenges, but perhaps the most daunting are text segmentation and vectorization. Organized knowledge bases such as FAQs are relatively straightforward to handle and yield good results, but real-world documents such as regulations, methodologies, and plans are much harder. They tend to be lengthy, with knowledge points scattered throughout. When building a knowledge base from them, segmenting the documents and extracting their content is a real headache, and the retrieval matching quality is only mediocre. How has everyone else handled this?