Bulk embed #403

prasmussen15 · 2025-04-27T01:30:56Z

Important

Adds batch embedding support for nodes and edges, refactoring existing code to use batch methods for improved performance.

Behavior:
- Introduces create_entity_edge_embeddings() in edges.py and create_entity_node_embeddings() in nodes.py for batch embedding.
- Replaces individual embedding generation with batch embedding in extract_edges() in edge_operations.py and extract_nodes() in node_operations.py.
Embedder Clients:
- Adds create_batch() method to EmbedderClient in client.py.
- Implements create_batch() in gemini.py, openai.py, and voyage.py for batch embedding support.
Misc:
- Adjusts get_by_node_uuid() in edges.py and get_by_uuid() in nodes.py for consistent query formatting.

^{This description was created by}^{for b2d2a10. You can customize this summary. It will automatically update as commits are pushed.}

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 733ad6c in 2 minutes and 14 seconds. Click for details.

Reviewed 173 lines of code in 8 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. graphiti_core/utils/maintenance/node_operations.py:452

Draft comment:
Using dict.update in a list comprehension returns None. Consider merging dictionaries with {**dict1, **dict2}.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. graphiti_core/edges.py:472

Draft comment:
Consider adding a brief docstring for the new 'create_entity_edge_embeddings' function to improve clarity.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80% None

3. graphiti_core/embedder/client.py:36

Draft comment:
Add a docstring for the abstract 'create_batch' method for consistency with the 'create' method.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80% The comment suggests adding a docstring to match the create method, but create doesn't have a docstring either. While docstrings would be good for both methods, this specific comment is incorrect in its reasoning. The comment is about a changed portion of code (the new create_batch method), but its justification is flawed. Perhaps both methods should have docstrings as a general best practice, even if the comparison is wrong? Maybe I'm being too strict about the exact wording of the comment? While docstrings would be good, we should reject incorrect comments. A better comment would either request docstrings for both methods or phrase the request differently without the incorrect comparison. Delete the comment because it incorrectly claims to be asking for consistency with the create method, which doesn't have a docstring either.

4. graphiti_core/utils/maintenance/edge_operations.py:221

Draft comment:
Threshold parameter (0.8) was removed from get_relevant_edges. Verify if this change is intentional, as it may alter edge selection behavior.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 80% <= threshold 80% This comment is asking the PR author to verify if the removal of a parameter is intentional, which is against the rules. However, it is also pointing out a specific change that could alter behavior, which is a valid concern. The comment is specific and highlights a potential issue with the code change.

5. graphiti_core/utils/maintenance/node_operations.py:452

Draft comment:
The list comprehension in dedupe_node_list incorrectly uses 'dict.update()', which returns None. Use dictionary unpacking (e.g. {**base, **node.attributes}) instead.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. graphiti_core/utils/maintenance/node_operations.py:100

Draft comment:
Minor: The function name 'extract_nodes_reflexion' (and related variable 'reflexion_iterations' and constant 'MAX_REFLEXION_ITERATIONS') uses 'reflexion'. Consider verifying if this spelling is intentional or if 'reflection' was intended to improve clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_xVZbmgPkTCCnYQp8

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

graphiti_core/embedder/client.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed b2d2a10 in 1 minute and 34 seconds. Click for details.

Reviewed 57 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 10 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. graphiti_core/edges.py:324

Draft comment:
Indentation in the triple-quoted query has been changed. Consider using a dedent helper (e.g. textwrap.dedent) for clarity and consistency.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80% None

2. graphiti_core/edges.py:474

Draft comment:
Using zip with strict=True ensures matching lengths; just verify that your environment supports Python 3.10+.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80% The comment is informative and suggests verifying the environment's compatibility with Python 3.10+. It doesn't provide a specific code suggestion or ask for a test to be written. It also doesn't address any of the additional rules like idiomatic code or DRY principles. Therefore, it should be removed.

3. graphiti_core/embedder/voyage.py:62

Draft comment:
List comprehension now explicitly casts each embedding element to float. Confirm that this conversion is required for consistency.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 80% None

4. graphiti_core/nodes.py:332

Draft comment:
The query for get_by_uuid now has adjusted indentation. Consider using a dedent helper to ensure consistent formatting.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80% None

5. graphiti_core/nodes.py:566

Draft comment:
Using zip with strict=True in create_entity_node_embeddings requires Python 3.10+; please ensure this is acceptable in all target environments.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 20% <= threshold 80% The comment is asking the author to ensure compatibility with Python 3.10+ due to the use of zip with strict=True. This is a valid concern as it relates to ensuring the code runs in the intended environments. However, it does not directly suggest a code change or improvement, which is what the rules prioritize. The comment is informative and asks for confirmation, which violates the rules.

6. graphiti_core/edges.py:322

Draft comment:
Query string indentation changed in get_by_node_uuid. Ensure consistent formatting for all Cypher queries.
Reason this comment was not posted:
Comment looked like it was already resolved.

7. graphiti_core/edges.py:474

Draft comment:
Using zip with strict=True enforces equal lengths; confirm that Python 3.10+ is required and this behavior is intended.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80% The comment is speculative and asks for confirmation rather than pointing out a clear issue. The strict=True parameter is actually a good safety improvement that prevents silent bugs. If Python version compatibility was a concern, it would be caught by CI/build system. The comment doesn't suggest any concrete changes. The comment does point out a potential compatibility issue that could affect deployments. Python version requirements are important to track. While Python version requirements are important, they should be handled by pyproject.toml/setup.py and CI, not through code review comments. The strict=True addition is a positive change that improves code safety. Delete the comment. It asks for confirmation rather than suggesting changes, and Python version requirements should be handled by build configuration, not code review.

8. graphiti_core/embedder/voyage.py:62

Draft comment:
Casting embedding values to float ensures consistent types. This explicit type conversion improves data integrity.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states a positive aspect of the code without suggesting any changes or asking for confirmation of intent.

9. graphiti_core/nodes.py:333

Draft comment:
Adjusted query string indentation in get_by_uuid for EntityNode. Verify that the new whitespace does not alter the intended query.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80% This comment is asking the PR author to verify the behavior of the code, which is against the rules. It does not provide a specific suggestion or point out a clear issue with the code. Therefore, it should be removed.

10. graphiti_core/nodes.py:567

Draft comment:
Using strict=True in zip for creating entity node embeddings enforces length matching. Ensure compatibility with Python 3.10+.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80% The comment is informative and suggests ensuring compatibility with Python 3.10+. It doesn't provide a specific code suggestion or point out a potential issue with the current code. It also doesn't ask for a test or a specific confirmation about the code's intention. Therefore, it doesn't align with the rules for good comments.

Workflow ID: wflow_sh3rsQmWfZ6pRApn

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

prasmussen15 added 3 commits April 26, 2025 18:45

add batch embeddings

f78fb87

bulk edge and node embeddings

66c1570

update embeddings during add_episode

733ad6c

prasmussen15 had a problem deploying to development April 27, 2025 01:31 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development April 27, 2025 01:31 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 27, 2025

View reviewed changes

graphiti_core/embedder/client.py Outdated Show resolved Hide resolved

Update graphiti_core/embedder/client.py

0643fb2

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

prasmussen15 had a problem deploying to development April 27, 2025 01:36 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development April 27, 2025 01:36 — with GitHub Actions Inactive

mypy

b2d2a10

prasmussen15 temporarily deployed to development April 27, 2025 02:05 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 27, 2025

View reviewed changes

prasmussen15 merged commit 0b94e0e into main Apr 27, 2025
7 checks passed

prasmussen15 deleted the bulk-embed branch April 27, 2025 02:09

getzep locked and limited conversation to collaborators Apr 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bulk embed #403

Bulk embed #403

prasmussen15 commented Apr 27, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Bulk embed #403

Bulk embed #403

Conversation

prasmussen15 commented Apr 27, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

prasmussen15 commented Apr 27, 2025 •

edited by ellipsis-dev bot

Loading