M4 Master OrgXtract

Team

  • Lorenzo Battiston
  • Laura Langhauser
  • Niklas Lengert
  • Sao Chi Pham
  • Viet Anh Jimmy Tran

Supervision

Stefan Wehrmeyer

Python Library

The library provides primitives for extracting content from PDFs and for detecting shapes and words. It also includes semantic analysis of organizational charts. You can use the library in your own Python projects and adapt it to your organizational charts by providing your own datasets.
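One step that semantic analysis of an organizational chart typically needs is linking detected words to the detected shapes (boxes) they belong to. The following is a minimal sketch of that idea, not the library's actual API: the classes `Box`, `Word`, and `Shape` and the function `assign_words_to_shapes` are hypothetical, and it assumes shapes and words are already available as axis-aligned bounding boxes. Each word is assigned to the smallest shape that contains its center point.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box from (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Shape:
    box: Box
    words: List[Word] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Join the words in the order they were assigned.
        return " ".join(w.text for w in self.words)


def assign_words_to_shapes(words: List[Word], shapes: List[Shape]) -> List[Word]:
    """Attach each word to the smallest shape containing its center.

    Returns the words that fall outside every shape.
    """
    unassigned = []
    for word in words:
        cx = (word.box.x0 + word.box.x1) / 2
        cy = (word.box.y0 + word.box.y1) / 2
        candidates = [s for s in shapes if s.box.contains_point(cx, cy)]
        if candidates:
            # The smallest enclosing shape wins, so nested boxes keep their own text.
            min(candidates, key=lambda s: s.box.area()).words.append(word)
        else:
            unassigned.append(word)
    return unassigned


unit = Shape(Box(0, 0, 100, 40))
head = Shape(Box(10, 5, 90, 20))  # nested inside `unit`
words = [
    Word("Head", Box(15, 8, 40, 16)),
    Word("Office", Box(45, 8, 80, 16)),
    Word("Staff", Box(20, 25, 50, 35)),
    Word("Footnote", Box(200, 200, 240, 210)),
]
leftover = assign_words_to_shapes(words, [unit, head])
print(head.label)                   # Head Office
print(unit.label)                   # Staff
print([w.text for w in leftover])   # ['Footnote']
```

Choosing the smallest enclosing shape is one simple heuristic for nested boxes; a real chart may need more robust rules (overlap ratios, reading order, connector lines).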

Command Line Tool

The command line tool converts PDF files into JSON files. It offers a simple interface for quickly extracting data from a single file or from several files at once.
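A batch PDF-to-JSON conversion of this kind can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's real interface: the option names (`pdfs`, `-o/--output-dir`) and the `extract` callable, which stands in for the library's actual extraction step, are hypothetical. Each input `name.pdf` produces `name.json` in the output directory.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical extractor signature: PDF path in, JSON-serializable data out.
Extractor = Callable[[Path], dict]


def convert_pdfs(pdf_paths: Iterable[Path], output_dir: Path, extract: Extractor) -> List[Path]:
    """Run `extract` on each PDF and write the result to <output_dir>/<stem>.json."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in pdf_paths:
        data = extract(pdf_path)
        out_path = output_dir / f"{pdf_path.stem}.json"
        out_path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
        written.append(out_path)
    return written


def main(argv: Optional[List[str]], extract: Extractor) -> int:
    """Parse hypothetical CLI arguments and convert one or more PDFs."""
    parser = argparse.ArgumentParser(description="Convert organizational chart PDFs to JSON.")
    parser.add_argument("pdfs", nargs="+", type=Path, help="one or more input PDF files")
    parser.add_argument("-o", "--output-dir", type=Path, default=Path("."),
                        help="directory for the JSON output files")
    args = parser.parse_args(argv)
    for path in convert_pdfs(args.pdfs, args.output_dir, extract):
        print(f"wrote {path}")
    return 0
```

For example, `main(["chart1.pdf", "chart2.pdf", "-o", "out"], extract=my_extractor)`, with `my_extractor` a hypothetical extraction function, would write `out/chart1.json` and `out/chart2.json`. Passing the extractor in as a parameter keeps the file handling independent of how the PDF content is actually parsed.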

LLM Integration

Integration with various large language models (LLMs) enables advanced text analysis. For our demo, we used an LLM from OpenAI; however, any supported LLM can be integrated to perform the text analysis.
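One way to keep the text analysis independent of a particular LLM provider is to program against a small interface and wrap each provider in an adapter. The sketch below assumes such a design; it does not show the project's actual integration. `LLMBackend`, `analyze_chart_text`, `PROMPT_TEMPLATE`, and `CannedBackend` are all hypothetical names, the requested JSON shape (`unit`, `head`) is an illustrative assumption, and no provider SDK is called.

```python
import json
from typing import List, Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: any model that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


PROMPT_TEMPLATE = (
    "The following text was extracted from an organizational chart.\n"
    "Return a JSON list of objects with the keys 'unit' and 'head'.\n\n{text}"
)


def analyze_chart_text(text: str, backend: LLMBackend) -> List[dict]:
    """Send extracted chart text to any backend and parse its JSON reply."""
    reply = backend.complete(PROMPT_TEMPLATE.format(text=text))
    result = json.loads(reply)
    if not isinstance(result, list):
        raise ValueError("expected a JSON list from the model")
    return result


class CannedBackend:
    """Stand-in backend for testing: returns a fixed reply and records the prompt."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.last_prompt = ""

    def complete(self, prompt: str) -> str:
        self.last_prompt = prompt
        return self.reply
```

With this design, switching from OpenAI to another provider would only require a small adapter class whose `complete` method calls that provider's client; `analyze_chart_text` itself stays unchanged.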

Web Interface

A web interface visualizes the results of the text extraction. It currently serves demonstration purposes only, but it could be developed into a user-friendly web view in the future.