document-parser

A complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.

python docker ocr pytorch omr optical-character-recognition optical-mark-recognition icr document-parser document-layout-analysis table-recognition table-detection publaynet intelligent-character-recognition intelligent-word-recognition iwr pubtabnet

Updated Jun 6, 2025
Python

JPLeoRX / opencv-text-deskew

Tutorial on how to deskew (straighten) text images

python opencv tutorial computer-vision image-processing opencv-python deskew document-parser

Updated Mar 15, 2022
Python

papercast-dev / papercast

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines.

python nlp pipeline podcast pdf-converter tts arxiv pdf-to-text dag document-parser pdf-document-processor grobid semantic-scholar document-parsing

Updated Mar 17, 2025
Python

InvoiceableAI / Invoiceable

The invoice, document, and resume parser powered by AI.

python resume ai experimental invoices invoice documents resume-parser resumes document-parser invoice-parser invoiceable

Updated Nov 22, 2024
Python

decisionfacts / semantic-ai

An open-source framework for Retrieval-Augmented Generation (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

pdf machine-learning ocr deep-neural-networks openai docx approximate-nearest-neighbor-search semantic-search document-parser rag fastapi vector-database inference-api openai-api llm retrieval-augmented-generation llama2

Updated Jul 19, 2024
Python

decisionfacts / df-extract

DF Extract Lib

pdf jpg png jpeg extraction python3 asyncio docx pptx document-parser

Updated Apr 3, 2024
Python

graphlit / graphlit-client-python

Python client library for Graphlit Platform

ai chatbot api-client copilot agents ai-agents document-parser rag pdf-to-json api-client-python llms graphlit

Updated May 31, 2025
Python

has-abi / docparser

Extract text from your DOCX documents.

text-parser document-parser doc-parser docx-parser

Updated Feb 10, 2024
Python

Gyanvir / DrParser

Dr.Parser 🩸📊 – AI-powered blood report parser that extracts and analyzes medical data from images/PDFs. Built with React, FastAPI, EasyOCR, and Gemini AI. 🚀 🔹 Local Setup Available | 🔹 Future Enhancements Planned | 🔹 Hackathon Project 👉 Clone, run, and explore the future of AI-driven healthcare!

ocr reactjs healthcare hackathon-project document-parser fastapi medical-ai ai-ml easyocr team-euphoria blood-report-analysis

Updated Mar 30, 2025
Python

Vetrivel07 / AI-Powered-Resume-Evaluator

An AI-powered resume evaluation app that compares a candidate’s resume with a job description using Google’s Gemini 1.5 Flash model to provide HR-style feedback and ATS-style match scoring through a simple, interactive Streamlit interface.

python-library evaluator ats document-parser resume-analysis gemini-api streamlit streamlit-application genai gemini-flash

Updated May 30, 2025
Python

anyparser / anyparser_crewai

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

python typescript artificial-intelligence knowledge-graph cag document-parser kag rag document-parsing retrieval-augmented-generation crewai crew-ai crewai-rag cache-augmented-generation anyparser crew-ai-rag

Updated Feb 17, 2025
Python

RevanKumarD / LlaMarker

Your ultimate tool for effortlessly converting and parsing documents into clean, well-structured Markdown—fast, reliable, and 100% local! 💻✨

marker document-parser llama-ai local-parsing-tool llamarker

Updated Jan 19, 2025
Python

Besthope-Official / predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

api microservice yolo pdf-parser text-embedding document-parser rag text-chunking

Updated Jun 5, 2025
Python

Here are 25 public repositories matching this topic...

infiniflow / ragflow

docling-project / docling

Marker-Inc-Korea / AutoRAG

run-llama / llama_cloud_services

Filimoa / open-parse

deepdoctection / deepdoctection

iamarunbrahma / vision-parse

marieai / marie-ai

JPLeoRX / opencv-text-deskew

papercast-dev / papercast

InvoiceableAI / Invoiceable

decisionfacts / semantic-ai

decisionfacts / df-extract

graphlit / graphlit-client-python

has-abi / docparser

Gyanvir / DrParser

Vetrivel07 / AI-Powered-Resume-Evaluator

anyparser / anyparser_crewai

RevanKumarD / LlaMarker

Besthope-Official / predoc
