SQL Server Storage Support for Indexing Pipeline #1885

KennyZhang1 · 2025-04-17T18:59:05Z

Description

This PR adds SQL Server as a storage option for parquet outputs within the indexing pipeline.

Related Issues

This PR is similar to the previously completed CosmosDB storage support PR(s).

Proposed Changes

Implement the SQLServerPipelineStorage class to interface parquet file outputs with SQL Server
Update the factory class to include the SQL Server option
Update the output configs and config unit tests as needed
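
For reviewers who want a picture of how the pieces fit together, here is a minimal sketch of a storage class registered with a factory. It is not the code in this PR. `OutputType`, `MemoryStorage`, `SQLServerStorageSketch`, and `create_storage` are hypothetical names, and `sqlite3` stands in for a real SQL Server connection so the example runs without a database server. Rows are JSON-encoded so that list-valued columns survive a round trip, which is one possible way to handle list serialization.

```python
import json
import sqlite3
from enum import Enum


class OutputType(str, Enum):
    """Hypothetical subset of storage types; the names are illustrative."""

    memory = "memory"
    sql_server = "sql_server"


class MemoryStorage:
    """Dict-backed storage standing in for an already-supported option."""

    def __init__(self, **_kwargs):
        self._items = {}

    def set(self, key, rows, overwrite=True):
        if overwrite or key not in self._items:
            self._items[key] = list(rows)
        else:
            self._items[key].extend(rows)

    def get(self, key):
        return self._items.get(key)


class SQLServerStorageSketch:
    """Stores each '<name>.parquet' key as a table of JSON-encoded rows.

    ASSUMPTION: sqlite3 replaces the SQL Server connection purely so the
    sketch is runnable; the real class would connect to SQL Server.
    """

    def __init__(self, database=":memory:", **_kwargs):
        self._conn = sqlite3.connect(database)

    @staticmethod
    def _table(key):
        # Derive the table name from the parquet file name and reject
        # anything that is not a plain identifier before using it in SQL.
        name = key.removesuffix(".parquet")
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
        return name

    def set(self, key, rows, overwrite=True):
        table = self._table(key)
        if overwrite:
            self._conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        self._conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (row_json TEXT)')
        self._conn.executemany(
            f'INSERT INTO "{table}" (row_json) VALUES (?)',
            [(json.dumps(row),) for row in rows],
        )
        self._conn.commit()

    def get(self, key):
        table = self._table(key)
        exists = self._conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
        ).fetchone()
        if not exists:
            return None
        cursor = self._conn.execute(f'SELECT row_json FROM "{table}" ORDER BY rowid')
        return [json.loads(record[0]) for record in cursor]


# The factory maps a configured type to a storage class.
_REGISTRY = {
    OutputType.memory: MemoryStorage,
    OutputType.sql_server: SQLServerStorageSketch,
}


def create_storage(storage_type, **kwargs):
    """Build a storage instance for the configured type."""
    try:
        storage_cls = _REGISTRY[OutputType(storage_type)]
    except ValueError as exc:
        raise ValueError(f"unknown storage type: {storage_type}") from exc
    return storage_cls(**kwargs)
```

With this shape, adding a backend touches only the enum and the registry; callers keep using `create_storage("sql_server")` followed by `set` and `get`.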

Checklist

I have tested these changes locally.
I have reviewed the code changes.
I have updated the documentation (if necessary).
I have added appropriate unit tests (if applicable).

Additional Notes

NOTE: This PR only handles parquet file outputs in the indexing pipeline. It does not support using SQL Server storage for cache or vector embedding outputs.
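
One way such a scope limit could be enforced is a small guard that logs a warning for any key that is not a parquet file. This is a sketch under assumptions, not the PR's implementation: `is_supported_key` and the logger name are hypothetical, and whether unsupported calls are skipped or raise an error in the actual class is not stated here.

```python
import logging

# Hypothetical logger name used only for this sketch.
log = logging.getLogger("sql_server_storage_sketch")


def is_supported_key(key: str) -> bool:
    """Return True for parquet outputs; warn and return False otherwise."""
    if key.endswith(".parquet"):
        return True
    # ASSUMPTION: unsupported keys are skipped with a warning, not rejected.
    log.warning("Skipping %r: only parquet outputs are supported", key)
    return False
```

A storage method would call `is_supported_key(key)` first and return early when it is false, so cache and embedding writes never reach the database.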

dworthen and others added 30 commits January 27, 2025 14:49

Add vector store id reference to embeddings config.

76137d8

Merge branch 'main' of github.com:microsoft/graphrag

beefc46

Merge branch 'main' of github.com:microsoft/graphrag

5686bb0

Merge branch 'main' of github.com:microsoft/graphrag

2875450

Merge branch 'main' of github.com:microsoft/graphrag

ed4c77c

Merge branch 'main' of github.com:microsoft/graphrag

dbd5362

Merge branch 'main' of github.com:microsoft/graphrag

d896830

generated initial implementation for sql server support

b24de53

cleaned up SQL server storage implementation

849009d

added warnings for non-parquet calls

6fad33f

added more comments and logging

febd55f

debugged CRUD methods for sql server storage class

652cf84

cleaned up formatting and linting

f1e5a0c

ruff formatting

4cded81

refactored comments

a843c6c

hook up sql server class to rest of graphrag

b6c1edf

added list serialization and deserialization

5b54709

add overwrite param

8f6b349

confirmed successful run of index pipeline

3152947

re-added overwrite table flag

73282a6

cleaned up comments and formatting

d378bdf

Merge branch 'main' of github.com:microsoft/graphrag into kennyzhang/…sql-server-support

93b772e

added more logging for row insertion

d4aa807

refactored logging for other storage functions

7c6e441

added support for ManagedIdentityCredential

c3f8849

introduced autogenerate tables functionality

37f9350

added outline for manual table creation

abb5f5b

added TODO for create_tables

77c70a7

sem

d487c74

modified tests

ab29d2c

KennyZhang1 requested review from a team as code owners April 17, 2025 18:59

KennyZhang1 added 3 commits April 17, 2025 15:19

generated new lockfile

c8f0082

added tracking to log for row insertion

1917791

cicd linting errors

d14c609
