Skip to content

Bulk embed #403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 27, 2025
Merged

Bulk embed #403

merged 5 commits into from
Apr 27, 2025

Conversation

prasmussen15
Copy link
Collaborator

@prasmussen15 prasmussen15 commented Apr 27, 2025

Important

Adds batch embedding support for nodes and edges, refactoring existing code to use batch methods for improved performance.

  • Behavior:
    • Introduces create_entity_edge_embeddings() in edges.py and create_entity_node_embeddings() in nodes.py for batch embedding.
    • Replaces individual embedding generation with batch embedding in extract_edges() in edge_operations.py and extract_nodes() in node_operations.py.
  • Embedder Clients:
    • Adds create_batch() method to EmbedderClient in client.py.
    • Implements create_batch() in gemini.py, openai.py, and voyage.py for batch embedding support.
  • Misc:
    • Adjusts get_by_node_uuid() in edges.py and get_by_uuid() in nodes.py for consistent query formatting.

This description was created by Ellipsis for b2d2a10. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 733ad6c in 2 minutes and 14 seconds. Click for details.
  • Reviewed 173 lines of code in 8 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. graphiti_core/utils/maintenance/node_operations.py:452
  • Draft comment:
    Using dict.update in a list comprehension returns None. Consider merging dictionaries with {**dict1, **dict2}.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. graphiti_core/edges.py:472
  • Draft comment:
    Consider adding a brief docstring for the new 'create_entity_edge_embeddings' function to improve clarity.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80% None
3. graphiti_core/embedder/client.py:36
  • Draft comment:
    Add a docstring for the abstract 'create_batch' method for consistency with the 'create' method.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80% The comment suggests adding a docstring to match the create method, but create doesn't have a docstring either. While docstrings would be good for both methods, this specific comment is incorrect in its reasoning. The comment is about a changed portion of code (the new create_batch method), but its justification is flawed. Perhaps both methods should have docstrings as a general best practice, even if the comparison is wrong? Maybe I'm being too strict about the exact wording of the comment? While docstrings would be good, we should reject incorrect comments. A better comment would either request docstrings for both methods or phrase the request differently without the incorrect comparison. Delete the comment because it incorrectly claims to be asking for consistency with the create method, which doesn't have a docstring either.
4. graphiti_core/utils/maintenance/edge_operations.py:221
  • Draft comment:
    Threshold parameter (0.8) was removed from get_relevant_edges. Verify if this change is intentional, as it may alter edge selection behavior.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 80% <= threshold 80% This comment is asking the PR author to verify if the removal of a parameter is intentional, which is against the rules. However, it is also pointing out a specific change that could alter behavior, which is a valid concern. The comment is specific and highlights a potential issue with the code change.
5. graphiti_core/utils/maintenance/node_operations.py:452
  • Draft comment:
    The list comprehension in dedupe_node_list incorrectly uses 'dict.update()', which returns None. Use dictionary unpacking (e.g. {**base, **node.attributes}) instead.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. graphiti_core/utils/maintenance/node_operations.py:100
  • Draft comment:
    Minor: The function name 'extract_nodes_reflexion' (and related variable 'reflexion_iterations' and constant 'MAX_REFLEXION_ITERATIONS') uses 'reflexion'. Consider verifying if this spelling is intentional or if 'reflection' was intended to improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_xVZbmgPkTCCnYQp8

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed b2d2a10 in 1 minute and 34 seconds. Click for details.
  • Reviewed 57 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 10 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. graphiti_core/edges.py:324
  • Draft comment:
    Indentation in the triple-quoted query has been changed. Consider using a dedent helper (e.g. textwrap.dedent) for clarity and consistency.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80% None
2. graphiti_core/edges.py:474
  • Draft comment:
    Using zip with strict=True ensures matching lengths; just verify that your environment supports Python 3.10+.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 80% The comment is informative and suggests verifying the environment's compatibility with Python 3.10+. It doesn't provide a specific code suggestion or ask for a test to be written. It also doesn't address any of the additional rules like idiomatic code or DRY principles. Therefore, it should be removed.
3. graphiti_core/embedder/voyage.py:62
  • Draft comment:
    List comprehension now explicitly casts each embedding element to float. Confirm that this conversion is required for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 80% None
4. graphiti_core/nodes.py:332
  • Draft comment:
    The query for get_by_uuid now has adjusted indentation. Consider using a dedent helper to ensure consistent formatting.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80% None
5. graphiti_core/nodes.py:566
  • Draft comment:
    Using zip with strict=True in create_entity_node_embeddings requires Python 3.10+; please ensure this is acceptable in all target environments.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 20% <= threshold 80% The comment is asking the author to ensure compatibility with Python 3.10+ due to the use of zip with strict=True. This is a valid concern as it relates to ensuring the code runs in the intended environments. However, it does not directly suggest a code change or improvement, which is what the rules prioritize. The comment is informative and asks for confirmation, which violates the rules.
6. graphiti_core/edges.py:322
  • Draft comment:
    Query string indentation changed in get_by_node_uuid. Ensure consistent formatting for all Cypher queries.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
7. graphiti_core/edges.py:474
  • Draft comment:
    Using zip with strict=True enforces equal lengths; confirm that Python 3.10+ is required and this behavior is intended.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80% The comment is speculative and asks for confirmation rather than pointing out a clear issue. The strict=True parameter is actually a good safety improvement that prevents silent bugs. If Python version compatibility was a concern, it would be caught by CI/build system. The comment doesn't suggest any concrete changes. The comment does point out a potential compatibility issue that could affect deployments. Python version requirements are important to track. While Python version requirements are important, they should be handled by pyproject.toml/setup.py and CI, not through code review comments. The strict=True addition is a positive change that improves code safety. Delete the comment. It asks for confirmation rather than suggesting changes, and Python version requirements should be handled by build configuration, not code review.
8. graphiti_core/embedder/voyage.py:62
  • Draft comment:
    Casting embedding values to float ensures consistent types. This explicit type conversion improves data integrity.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 80% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states a positive aspect of the code without suggesting any changes or asking for confirmation of intent.
9. graphiti_core/nodes.py:333
  • Draft comment:
    Adjusted query string indentation in get_by_uuid for EntityNode. Verify that the new whitespace does not alter the intended query.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 80% This comment is asking the PR author to verify the behavior of the code, which is against the rules. It does not provide a specific suggestion or point out a clear issue with the code. Therefore, it should be removed.
10. graphiti_core/nodes.py:567
  • Draft comment:
    Using strict=True in zip for creating entity node embeddings enforces length matching. Ensure compatibility with Python 3.10+.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 80% The comment is informative and suggests ensuring compatibility with Python 3.10+. It doesn't provide a specific code suggestion or point out a potential issue with the current code. It also doesn't ask for a test or a specific confirmation about the code's intention. Therefore, it doesn't align with the rules for good comments.

Workflow ID: wflow_sh3rsQmWfZ6pRApn

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@prasmussen15 prasmussen15 merged commit 0b94e0e into main Apr 27, 2025
7 checks passed
@prasmussen15 prasmussen15 deleted the bulk-embed branch April 27, 2025 02:09
@getzep getzep locked and limited conversation to collaborators Apr 27, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant