Lease Abstraction

Intelligent Document Analysis System

[Image: Lease Abstraction Interface]

Introduction

I've created a privacy-first semantic search and AI analysis platform for lease documents, built on a microservices architecture with a ChromaDB vector database and a locally hosted Deepseek LLM served through Ollama. The application supports natural language querying across multiple lease documents, letting users retrieve and summarize information while all data stays on the local machine.

Ideation & Problem

Legal professionals, property managers, and renters regularly work with lengthy lease documents, often needing to extract specific information across multiple contracts. Traditional keyword search fails to capture contextual meaning, and cloud-based AI solutions raise privacy concerns with sensitive legal documents.

The Challenge: Create an intelligent document analysis system that provides semantic understanding and AI-powered insights while ensuring all data remains local and secure.

Process

Technical Approach

  • Architecture Design - Implemented microservices architecture with three containerized services (Streamlit frontend, ChromaDB vector store, Ollama LLM server) for separation of concerns and independent scalability
  • Document Processing Pipeline - Built PDF text extraction → cleaning → sentence-based chunking (500 char max) → embedding generation using SentenceTransformer (all-MiniLM-L6-v2)
  • Semantic Search Implementation - Integrated ChromaDB with persistent storage for vector similarity search, enabling contextual queries beyond simple keyword matching
  • Local LLM Integration - Deployed Deepseek R1 7B model via Ollama for AI-powered analysis without external API dependencies
  • API Development - Created FastAPI endpoints for inter-service communication and RESTful operations
  • Containerization - Orchestrated services using Docker Compose with persistent volumes for data and model storage
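The cleaning and chunking steps of the pipeline above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `clean_text` and `chunk_by_sentence` are hypothetical, the regex-based sentence splitter is a simplification, and hard-splitting sentences longer than the limit is an assumption about how oversized sentences might be handled. Only the 500-character limit comes from the description above.

```python
import re

MAX_CHARS = 500  # chunk size limit stated in the write-up


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_by_sentence(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Assumption: a single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be passed to the embedding model. Keeping chunk boundaries on sentence boundaries means an embedding rarely covers half of a clause, which is the motivation behind this chunking strategy.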

Key Technical Decisions

  • ChromaDB over cloud solutions for zero-cost, privacy-preserving vector storage
  • Local LLM deployment to eliminate API costs and maintain data privacy
  • Sentence-boundary chunking to preserve semantic integrity while optimizing embedding quality
  • Microservices pattern for scalability and maintainable code organization

Design

Data Flow

  • Upload Phase: PDF → Text Extraction (pdfplumber) → Cleaning → Chunking → Embedding → ChromaDB Storage
  • Query Phase: Natural Language Query → Semantic Search → Context Retrieval → LLM Analysis → Structured Response
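As a rough sketch of the storage and semantic search steps in this flow, the helpers below write chunks to a ChromaDB collection and query it by embedding. The function names, the ID scheme, the `source` metadata key, the storage path, and the collection name are all assumptions made for illustration. A containerized deployment would more likely connect with `chromadb.HttpClient` than with the in-process `PersistentClient` shown here.

```python
from typing import Callable, Optional, Sequence

# An embedder maps a batch of texts to a list of vectors.
Embedder = Callable[[Sequence[str]], list]


def index_chunks(collection, embed: Embedder, chunks: Sequence[str], source: str) -> list[str]:
    """Store chunks with embeddings; the ID scheme and 'source' key are illustrative."""
    ids = [f"{source}-{i}" for i in range(len(chunks))]
    collection.add(
        ids=ids,
        documents=list(chunks),
        embeddings=embed(chunks),
        metadatas=[{"source": source, "chunk": i} for i in range(len(chunks))],
    )
    return ids


def semantic_search(collection, embed: Embedder, query: str,
                    n_results: int = 5, source: Optional[str] = None) -> list[dict]:
    """Return the nearest chunks, optionally restricted to one lease document."""
    kwargs = {"query_embeddings": embed([query]), "n_results": n_results}
    if source is not None:
        kwargs["where"] = {"source": source}  # metadata filter for a targeted search
    result = collection.query(**kwargs)
    # Chroma returns one result list per query; only one query was sent.
    return [
        {"text": doc, "source": meta["source"], "distance": dist}
        for doc, meta, dist in zip(
            result["documents"][0], result["metadatas"][0], result["distances"][0]
        )
    ]


def make_local_collection(path: str = "./chroma_data", name: str = "leases"):
    """Wire up real components; path and collection name are assumptions."""
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(name)
    return collection, lambda texts: model.encode(list(texts)).tolist()
```

A typical call sequence would be `collection, embed = make_local_collection()`, then `index_chunks(...)` once per uploaded lease and `semantic_search(...)` per query. Passing `source` narrows the search to a single lease, while omitting it searches the whole collection.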

Querying has two modes: one that simply retrieves the information related to the query, and one that passes that information to the Deepseek LLM to generate an appropriate response.

Simple Query that fetches information:

[Image: Lease Abstraction Interface]

Deepseek LLM Response:

[Image: Lease Abstraction Analysis]
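The LLM-backed query mode could look roughly like the sketch below, which sends retrieved chunks to a local Ollama server over its HTTP `/api/generate` endpoint. The prompt wording, the function names, the `deepseek-r1:7b` model tag, and the localhost URL are assumptions for illustration. The shape of `hits` (dictionaries with `text` and `source` keys) is also hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port; host is an assumption
MODEL = "deepseek-r1:7b"  # assumed Ollama tag for the Deepseek R1 7B model


def build_prompt(question: str, hits: list[dict]) -> str:
    """Number each retrieved excerpt so the answer can cite its source."""
    context = "\n\n".join(
        f"[{i}] ({hit['source']}) {hit['text']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the lease excerpts below. "
        "Cite excerpts by their bracketed number.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def ask_llm(question: str, hits: list[dict], url: str = OLLAMA_URL,
            model: str = MODEL, timeout: float = 120.0) -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(question, hits), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With streaming disabled, Ollama replies with a single JSON object.
        return json.loads(response.read())["response"]
```

Because the request only goes to a server on the same machine, the lease text never leaves the local environment. The retrieval-only mode simply skips `ask_llm` and shows the retrieved excerpts directly.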

Technology Stack

  • Frontend: Streamlit
  • Backend: FastAPI
  • Vector DB: ChromaDB with SentenceTransformer embeddings
  • AI/ML: Ollama + Deepseek R1 7B, SentenceTransformer (all-MiniLM-L6-v2)
  • Infrastructure: Docker
  • Text Processing: pdfplumber

Final Product

The final application allows users to look deeper into their leases without having to read the whole document. Users begin by uploading multiple PDF lease documents through an intuitive web interface, where the system automatically processes and indexes the content. Once uploaded, they can perform semantic searches using natural language queries like "What are the maintenance responsibilities?" and receive AI-powered detailed analysis with precise source document references. The system supports both broad queries across entire document collections and targeted searches within specific leases, providing flexibility for different use cases.

All processing runs locally, so sensitive legal documents are never sent to external servers. Deploying the Deepseek LLM locally through Ollama also eliminates ongoing API costs. Persistent storage keeps document embeddings available across sessions, so users never need to re-upload or re-process their documents.

With this application, lease review goes from hours of manual reading to seconds of search and analysis, with every step of processing kept local.