Data Science Suite #696

Draft
wants to merge 2 commits into
base: main
140 changes: 74 additions & 66 deletions site/content/3.13/data-science/_index.md
@@ -3,32 +3,81 @@ title: Data Science
menuTitle: Data Science
weight: 115
description: >-
ArangoDB lets you apply analytics and machine learning to graph data at scale
ArangoDB's set of tools and technologies enables analytics, machine learning,
and GenAI applications powered by graph data
aliases:
- data-science/overview
---
ArangoDB provides a wide range of functionality that can be utilized for
data science applications. The core database system includes multi-model storage
of information with scalable graph and information retrieval capabilities that
you can directly use for your research and product development.

ArangoDB also offers a dedicated Data Science Suite, using the database core
as the foundation for higher-level features. Whether you want to turbocharge
generative AI applications with a GraphRAG solution or apply analytics and
machine learning to graph data at scale, ArangoDB covers these needs.

<!--
ArangoDB's Graph Analytics and GraphML capabilities provide various solutions
in data science and data analytics. Multiple data science personas within the
engineering space can make use of ArangoDB's set of tools and technologies that
enable analytics and machine learning on graph data.
-->

## Data Science Suite

The Data Science Suite (DSS) consists of three major components:

- [**HybridRAG**](#hybridrag): A complete solution for extracting entities
from text files to create a knowledge graph that you can then query with a
natural language interface.
- [**GraphML**](#graphml): Apply machine learning to graphs for link prediction,
classification, and similar tasks.
- [**Graph Analytics**](#graph-analytics): Run graph algorithms such as PageRank
on dedicated compute resources.

Each component has an intuitive graphical user interface integrated into the
ArangoDB Platform web interface, guiding you through the process.
<!-- TODO: Not Graph Analytics? -->
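
To give a rough idea of what a graph algorithm such as PageRank computes, here is a minimal, self-contained Python sketch based on power iteration. It is an illustration only and does not reflect how the Graph Analytics engines are implemented. The `pagerank` function, its parameters, and the toy edge list are assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start with a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        # Each node passes an equal share of its rank along every outgoing edge.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical toy graph, not an ArangoDB dataset.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items()):
    print(node, round(score, 3))
```

The scores always sum to 1, so you can read them as the relative importance of each node within the graph.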

Alongside these components, you also get the following additional features:

ArangoDB, as the foundation for GraphML, comes with the following key features:
- **Graph visualizer**: A web-based tool for exploring your graph data with an
intuitive interface and sophisticated querying capabilities.
- **Jupyter notebooks**: Run a Jupyter kernel in the platform for hosting
interactive notebooks for experimentation and development of applications
that use ArangoDB as their backend.
- **MLflow integration**: Built-in support for the popular management tool for
the machine learning lifecycle.
- **Adapters**: Use ArangoDB together with cuGraph, NetworkX, and other tools.
- **Application Programming Interfaces**: Use the underlying APIs of the
Data Science Suite services and build your own integrations.

- **Scalable**: designed to support true scalability with high performance for
## From graph to AI

This section classifies the complexity of the queries you can answer with
ArangoDB and gives you an overview of the respective features.

It starts with a simple query that finds the path from one node to another,
continues with more complex tasks like graph classification,
link prediction, and node classification, and ends with generative AI solutions
powered by graph relationships and vector embeddings.
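
As a sketch of the simplest case, finding a path from one node to another, the following Python snippet runs a breadth-first search over an in-memory edge list. In ArangoDB, you would typically express this as a graph query instead. The `shortest_path` helper and the toy edges are hypothetical and only illustrate the idea.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the nodes on a shortest path from start to goal, or None."""
    # Build an adjacency list from directed (source, target) pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    # Breadth-first search; `parent` doubles as the set of visited nodes.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy data, not an ArangoDB dataset.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
print(shortest_path(edges, "A", "E"))  # ['A', 'B', 'C', 'E']
```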

### Foundational features

ArangoDB comes with the following key features:

- **Scalable**: Designed to support true scalability with high performance for
enterprise use cases.
- **Simple Ingestion**: easy integration in existing data infrastructure with
- **Simple Ingestion**: Easy integration in existing data infrastructure with
connectors to all leading data processing and data ecosystems.
- **Source-Available**: extensibility and community.
- **NLP Support**: built-in text processing, search, and similarity ranking.

![ArangoDB Machine Learning Architecture](../../images/machine-learning-architecture.png)
- **Source-Available**: Extensibility and community.
- **NLP Support**: Built-in text processing, search, and similarity ranking.

## Graph Analytics vs. GraphML
<!-- TODO: This is actually GraphML specific... -->

This section classifies the complexity of the queries we can answer -
like running a simple query that shows what is the path that goes from one node
to another, or more complex tasks like node classification,
link prediction, and graph classification.
![ArangoDB Machine Learning Architecture](../../images/machine-learning-architecture.png)

### Graph Queries

@@ -69,65 +118,24 @@ GraphML can answer questions like:
![Graph ML](../../images/graph-ml.png)

For ArangoDB's enterprise-ready, graph-powered machine learning offering,
see [ArangoGraphML](arangographml/_index.md).

## Use Cases

This section contains an overview of different use cases where Graph Analytics
and GraphML can be applied.

### GraphML

GraphML capabilities of using more data outperform conventional deep learning
methods and **solve high-computational complexity graph problems**, such as:
- Drug discovery, repurposing, and predicting adverse effects.
- Personalized product/service recommendation.
- Supply chain and logistics.

With GraphML, you can also **predict relationships and structures**, such as:
- Predict molecules for treating diseases (precision medicine).
- Predict fraudulent behavior, credit risk, purchase of product or services.
- Predict relationships among customers, accounts.

ArangoDB uses well-known GraphML frameworks like
[Deep Graph Library](https://www.dgl.ai)
and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/)
and connects to these external machine learning libraries. When coupled to
ArangoDB, you are essentially integrating them with your graph dataset.

## Example: ArangoFlix

ArangoFlix is a complete movie recommendation application that predicts missing
links between a user and the movies they have not watched yet.

This [interactive tutorial](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Integrate_ArangoDB_with_PyG.ipynb)
demonstrates how to integrate ArangoDB with PyTorch Geometric to
build recommendation systems using Graph Neural Networks (GNNs).

The full ArangoFlix demo website is accessible from the ArangoGraph Insights Platform,
the managed cloud for ArangoDB. You can open the demo website that connects to
your running database from the **Examples** tab of your deployment.
see [ArangoGraphML](graphml/_index.md).

{{< tip >}}
You can try out the ArangoGraph Insights Platform free of charge for 14 days.
Sign up at [dashboard.arangodb.cloud](https://dashboard.arangodb.cloud/home?utm_source=docs&utm_medium=cluster_pages&utm_campaign=docs_traffic).
{{< /tip >}}
### HybridRAG

The ArangoFlix demo uses five different recommendation methods:
- Content-Based using AQL
- Collaborative Filtering using AQL
- Content-Based using ML
- Matrix Factorization
- Graph Neural Networks
HybridRAG is ArangoDB's turnkey solution for turning your organization's data into
a knowledge graph and letting everyone use that knowledge by asking questions in
natural language.

![ArangoFlix demo](../../images/data-science-arangoflix.png)
HybridRAG combines vector search for retrieving related text snippets
with graph-based retrieval augmented generation (GraphRAG) for context expansion
and relationship discovery. This lets a large language model (LLM) generate
answers that are accurate, context-aware, and chronologically structured.
This approach combats the common problem of hallucination.
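
The two retrieval steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the HybridRAG implementation: the `hybrid_retrieve` function, the two-dimensional toy embeddings, and the `graph` dictionary are all hypothetical, and the final step of passing the collected context to an LLM is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, chunks, graph, top_k=1):
    """Vector search for the top_k chunks, then expand via graph neighbors."""
    # Step 1: vector search, ranking chunks by similarity to the query.
    ranked = sorted(
        chunks, key=lambda key: cosine(query_vec, chunks[key]["vec"]), reverse=True
    )
    seeds = ranked[:top_k]
    # Step 2: graph expansion, adding chunks directly linked to the seeds.
    context = list(seeds)
    for seed in seeds:
        for neighbor in graph.get(seed, []):
            if neighbor not in context:
                context.append(neighbor)
    return [chunks[key]["text"] for key in context]


# Hypothetical text snippets with made-up embeddings.
chunks = {
    "c1": {"vec": [1.0, 0.0], "text": "Alice founded Acme."},
    "c2": {"vec": [0.0, 1.0], "text": "Acme is based in Berlin."},
    "c3": {"vec": [0.7, 0.7], "text": "Bob works in Paris."},
}
graph = {"c1": ["c2"]}  # c1 and c2 are linked because both mention "Acme"
print(hybrid_retrieve([0.9, 0.1], chunks, graph))
```

In this toy setup, the second snippet is a poor vector match for the query, but the graph link to the first snippet still brings it into the context.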

The ArangoFlix website not only offers an example of how the user recommendations might
look like in real life, but it also provides information on a recommendation method,
an AQL query, a custom graph visualization for each movie, and more.
To learn more, see the [HybridRAG](hybrid-rag.md) documentation.

## Sample datasets

If you want to try out ArangoDB's data science features, you may use the
[`arango_datasets` Python package](../components/tools/arango-datasets.md)
[`arango-datasets` Python package](../components/tools/arango-datasets.md)
to load sample datasets into a deployment.
2 changes: 1 addition & 1 deletion site/content/3.13/data-science/adapters/_index.md
@@ -1,7 +1,7 @@
---
title: Adapters
menuTitle: Adapters
weight: 140
weight: 50
description: >-
ArangoDB offers multiple adapters that enable seamless integration with
data science tools
2 changes: 1 addition & 1 deletion site/content/3.13/data-science/arangograph-notebooks.md
@@ -1,7 +1,7 @@
---
title: ArangoGraph Notebooks
menuTitle: ArangoGraph Notebooks
weight: 130
weight: 40
description: >-
Colocated Jupyter Notebooks within the ArangoGraph Insights Platform
---
2 changes: 1 addition & 1 deletion site/content/3.13/data-science/graph-analytics.md
@@ -1,7 +1,7 @@
---
title: Graph Analytics
menuTitle: Graph Analytics
weight: 123
weight: 30
description: |
ArangoGraph offers Graph Analytics Engines to run graph algorithms on your
data separately from your ArangoDB deployments
@@ -1,18 +1,68 @@
---
title: ArangoGraphML
menuTitle: ArangoGraphML
weight: 125
title: ArangoGraphML # Rename as well?
menuTitle: GraphML
weight: 20
description: >-
Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
aliases:
- graphml
- arangographml
---
Traditional Machine Learning (ML) overlooks the connections and relationships
between data points, which is where graph machine learning excels. However,
accessibility to GraphML has been limited to sizable enterprises equipped with
specialized teams of data scientists. ArangoGraphML simplifies the utilization of GraphML,
enabling a broader range of personas to extract profound insights from their data.

## Use cases

Because GraphML can take more of your data into account, it can outperform
conventional deep learning methods and **solve graph problems of high computational complexity**, such as:
- Drug discovery, repurposing, and predicting adverse effects.
- Personalized product/service recommendation.
- Supply chain and logistics.

With GraphML, you can also **predict relationships and structures**, such as:
- Predict molecules for treating diseases (precision medicine).
- Predict fraudulent behavior, credit risk, and purchases of products or services.
- Predict relationships among customers and accounts.

ArangoDB uses well-known GraphML frameworks like
[Deep Graph Library](https://www.dgl.ai)
and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/)
and connects to these external machine learning libraries. When coupled to
ArangoDB, you are essentially integrating them with your graph dataset.

#### Example: ArangoFlix

ArangoFlix is a complete movie recommendation application that predicts missing
links between a user and the movies they have not watched yet.
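
To illustrate what predicting missing links means, the following Python sketch scores the movies a user has not watched by how similar the user is to the people who did watch them. It uses a simple Jaccard similarity heuristic rather than the Graph Neural Networks used in ArangoFlix. The `recommend` function and the viewing data are hypothetical.

```python
def recommend(watched, user, top_n=2):
    """Rank unseen movies by the Jaccard similarity of the users who watched them."""
    seen = watched[user]
    scores = {}
    for other, movies in watched.items():
        if other == user:
            continue
        # Jaccard similarity: shared movies divided by all movies of both users.
        union = seen | movies
        similarity = len(seen & movies) / len(union) if union else 0.0
        # Every movie the other user watched is a candidate link for this user.
        for movie in movies - seen:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical viewing history, not the ArangoFlix dataset.
watched = {
    "alice": {"Alien", "Blade Runner"},
    "bob": {"Alien", "Blade Runner", "Dune"},
    "carol": {"Casablanca", "Dune", "Amelie"},
}
print(recommend(watched, "alice"))  # ['Dune', 'Amelie']
```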

This [interactive tutorial](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Integrate_ArangoDB_with_PyG.ipynb)
demonstrates how to integrate ArangoDB with PyTorch Geometric to
build recommendation systems using Graph Neural Networks (GNNs).

The full ArangoFlix demo website is accessible from the ArangoGraph Insights Platform,
the managed cloud for ArangoDB. You can open the demo website that connects to
your running database from the **Examples** tab of your deployment.

{{< tip >}}
You can try out the ArangoGraph Insights Platform free of charge for 14 days.
Sign up at [dashboard.arangodb.cloud](https://dashboard.arangodb.cloud/home?utm_source=docs&utm_medium=cluster_pages&utm_campaign=docs_traffic).
{{< /tip >}}

The ArangoFlix demo uses five different recommendation methods:
- Content-Based using AQL
- Collaborative Filtering using AQL
- Content-Based using ML
- Matrix Factorization
- Graph Neural Networks

![ArangoFlix demo](../../../images/data-science-arangoflix.png)

The ArangoFlix website not only shows what user recommendations might look
like in real life, but also provides information about the recommendation method,
an AQL query, a custom graph visualization for each movie, and more.

## How GraphML works

Graph machine learning leverages the inherent structure of graph data, where
@@ -5,6 +5,8 @@ weight: 5
description: >-
You can deploy ArangoGraphML in your own Kubernetes cluster or use the managed
cloud service that comes with a ready-to-go, pre-configured environment
aliases:
- ../arangographml/deploy
---

## Managed cloud service versus self-managed
@@ -5,7 +5,8 @@ weight: 10
description: >-
How to control all resources inside ArangoGraphML in a scriptable manner
aliases:
- getting-started-with-arangographml
- ../arangographml/getting-started-with-arangographml
- ../arangographml/getting-started
---
ArangoGraphML provides an easy-to-use and scalable interface for running graph machine learning on ArangoDB data. Because ArangoGraph manages all of the orchestration and ML logic, you typically only need to provide JSON specifications that outline the individual processes for solving an ML task. If you use the self-managed solution, additional configuration may be required.

@@ -1,9 +1,13 @@
---
title: Large Language Models (LLMs) and Knowledge Graphs
menuTitle: Large Language Models and Knowledge Graphs
weight: 133
title: Graph-powered HybridRAG
menuTitle: HybridRAG
weight: 10
description: >-
Integrate large language models (LLMs) with knowledge graphs using ArangoDB
ArangoDB's HybridRAG combines graph-based retrieval-augmented generation
(GraphRAG) with large language models (LLMs) for turbocharged GenAI solutions
aliases:
- llm-knowledge-graphs
# TODO: Repurpose for GenAI
---
Large language models (LLMs) and knowledge graphs are two prominent and
contrasting concepts, each possessing unique characteristics and functionalities
@@ -25,7 +29,17 @@ ArangoDB's unique capabilities and flexible integration of knowledge graphs and
LLMs provide a powerful and efficient solution for anyone seeking to extract
valuable insights from diverse datasets.

## Knowledge Graphs
The HybridRAG component of the Data Science Suite brings all the capabilities
together with an easy-to-use interface so you can make the knowledge accessible
to your organization.

## HybridRAG

ArangoDB's HybridRAG solution democratizes the creation and usage of knowledge
graphs with a unique combination of vector search, graphs, and LLMs in a
single product.

### Knowledge Graphs

A knowledge graph can be thought of as a dynamic and interconnected network of
real-world entities and the intricate relationships that exist between them.
@@ -48,7 +62,29 @@ the following tasks:

![ArangoDB Knowledge Graphs and LLMs](../../images/ArangoDB-knowledge-graphs-meets-llms.png)

## ArangoDB and LangChain
### Examples

### Services

#### Service A

#### Service B

### Interfaces

{{< tabs "interfaces" >}}

{{< tab "Web interface" >}}
1. In the Platform UI, ...
{{< /tab >}}

{{< tab "cURL" >}}
curl http://localhost:8529/gen-ai/
{{< /tab >}}

{{< /tabs >}}

#### ArangoDB and LangChain

[LangChain](https://www.langchain.com/) is a framework for developing applications
powered by language models.
@@ -62,12 +98,12 @@ data seamlessly via natural language, eliminating the need for query language
design. By using LLM chat models such as OpenAI’s ChatGPT, you can "speak" to
your data instead of querying it.

### Get started with ArangoDB QA chain
##### Get started with ArangoDB QA chain

The [ArangoDB QA chain notebook](https://langchain-langchain.vercel.app/docs/use_cases/more/graph/graph_arangodb_qa.html)
shows how to use LLMs to provide a natural language interface to an ArangoDB
instance.

Run the notebook directly in [Google Colab](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Langchain.ipynb).

See also other [machine learning interactive tutorials](https://github.com/arangodb/interactive_tutorials#machine-learning).
See also other [machine learning interactive tutorials](https://github.com/arangodb/interactive_tutorials#machine-learning).