
RAG System for German Historical Speech


RAG System for German Historical Speech is a Retrieval-Augmented Generation (RAG) application developed under Dr. Markus Mühling in the Distributed Systems & Intelligent Computing Group. It retrieves relevant historical speech transcripts and generates context-aware responses using Large Language Models. The system combines semantic retrieval with vector databases and language models to support efficient exploration and understanding of historical speech archives.
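
The retrieve-then-generate flow described above can be illustrated with a minimal sketch. This is not the project's code, which is private. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample transcripts are invented placeholders. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Invented placeholder transcripts, for illustration only.
TRANSCRIPTS = [
    "speech about the reconstruction of the city after the war",
    "radio address on the harvest and agricultural policy",
    "parliamentary debate on the new constitution",
]

QUERY = "debate on the constitution"
top = retrieve(QUERY, TRANSCRIPTS, k=1)
prompt = build_prompt(QUERY, top)
```

In a real system the prompt would then be sent to a language model; that call is omitted here because it depends on the chosen provider.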


Responsibilities

Developed a RAG pipeline for retrieving relevant historical speech transcripts and generating context-aware responses.
Processed, cleaned, and structured transcript datasets for efficient semantic retrieval workflows.
Built embedding-based vector stores and integrated retrieval results with Large Language Models (LLMs).
Designed evaluation pipelines to measure retrieval precision, semantic relevance, and response quality.
Worked on information retrieval documentation and semantic search strategies to improve query performance.
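
As an illustration of the retrieval evaluation mentioned above, the following sketch computes precision@k and recall@k for a single query. It assumes relevance judgments are available as a set of transcript IDs. The IDs and function names are hypothetical, and the project's actual metrics may be defined differently.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranked result list and relevance judgments for one query.
retrieved_ids = ["t3", "t7", "t1", "t9"]
relevant_ids = {"t3", "t1", "t5"}

p_at_3 = precision_at_k(retrieved_ids, relevant_ids, k=3)
r_at_3 = recall_at_k(retrieved_ids, relevant_ids, k=3)
```

Averaging these values over a set of queries gives a simple aggregate view of retrieval quality. Semantic relevance and response quality would need separate measures.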

Technologies & Domains

Python, Elasticsearch, RAG Systems, Vector Databases, Arctic-Embed.2.0, Semantic Splitting, Recursive Text Splitting, LangGraph, Embedding Models, Natural Language Processing, Agentic RAG
The source code is private due to an NDA.