
Commit 45fb85f

Add E2B as an optional code sandbox provider
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template
- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
1 parent b4183c7 commit 45fb85f
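
The provider selection described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `is_e2b_enabled`, `run_in_sandbox` and the injected `run_e2b` / `run_terrarium` callables are hypothetical names, and the E2B runner is assumed to raise `ImportError` when the e2b library is not installed.

```python
import asyncio
import os
from typing import Any, Awaitable, Callable, Mapping, Optional

# Hypothetical runner type: takes code, returns a result dict.
Runner = Callable[[str], Awaitable[dict[str, Any]]]


def is_e2b_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Treat E2B as enabled when E2B_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("E2B_API_KEY"))


async def run_in_sandbox(
    code: str,
    run_e2b: Runner,
    run_terrarium: Runner,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, Any]:
    """Prefer E2B when an API key is set; otherwise use Terrarium."""
    if is_e2b_enabled(env):
        try:
            return await run_e2b(code)
        except ImportError:
            # Assumption: the e2b library is missing, so fall through.
            pass
    return await run_terrarium(code)
```

Passing the runners in as arguments keeps the sketch testable without either sandbox being reachable.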

7 files changed, +157 -18 lines changed

.github/workflows/run_evals.yml

+11
@@ -32,6 +32,14 @@ on:
         required: false
         default: 200
         type: number
+      sandbox:
+        description: 'Code sandbox to use'
+        required: false
+        default: 'terrarium'
+        type: choice
+        options:
+          - terrarium
+          - e2b
 
 jobs:
   eval:
@@ -100,6 +108,8 @@ jobs:
           SERPER_DEV_API_KEY: ${{ matrix.dataset != 'math500' && secrets.SERPER_DEV_API_KEY }}
           OLOSTEP_API_KEY: ${{ matrix.dataset != 'math500' && secrets.OLOSTEP_API_KEY }}
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          E2B_API_KEY: ${{ inputs.sandbox == 'e2b' && secrets.E2B_API_KEY }}
+          E2B_TEMPLATE: ${{ vars.E2B_TEMPLATE }}
           KHOJ_ADMIN_EMAIL: khoj
           KHOJ_ADMIN_PASSWORD: khoj
           POSTGRES_HOST: localhost
@@ -148,6 +158,7 @@ jobs:
           echo "**$(head -n 1 *_evaluation_summary_*.txt)**" >> $GITHUB_STEP_SUMMARY
           echo "- Khoj Version: ${{ steps.hatch.outputs.version }}" >> $GITHUB_STEP_SUMMARY
           echo "- Chat Model: Gemini 2.0 Flash" >> $GITHUB_STEP_SUMMARY
+          echo "- Code Sandbox: ${{ inputs.sandbox}}" >> $GITHUB_STEP_SUMMARY
           echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
           tail -n +2 *_evaluation_summary_*.txt >> $GITHUB_STEP_SUMMARY
           echo "" >> $GITHUB_STEP_SUMMARY

docker-compose.yml

+3-1
@@ -58,8 +58,10 @@ services:
       - KHOJ_DEBUG=False
 
       - KHOJ_ADMIN_PASSWORD=password
-      # Default URL of Terrarium, the Python sandbox used by Khoj to run code. Its container is specified above
+      # Default URL of Terrarium, the default Python sandbox used by Khoj to run code. Its container is specified above
       - KHOJ_TERRARIUM_URL=http://sandbox:8080
+      # Uncomment line below to have Khoj run code in remote E2B code sandbox instead of the self-hosted Terrarium sandbox above. Get your E2B API key from https://e2b.dev/.
+      # - E2B_API_KEY=your_e2b_api_key
       # Default URL of SearxNG, the default web search engine used by Khoj. Its container is specified above
       - KHOJ_SEARXNG_URL=http://search:8080
       # Uncomment line below to use with Ollama running on your local machine at localhost:11434.

documentation/docs/features/code_execution.md

+17-7
@@ -3,22 +3,23 @@
 
 # Code Execution
 
-Khoj can generate and run very simple Python code snippets as well. This is useful if you want to generate a plot, run a simple calculation, or do some basic data manipulation. LLMs by default aren't skilled at complex quantitative tasks. Code generation & execution can come in handy for such tasks.
+Khoj can generate and run simple Python code as well. This is useful if you want to have Khoj do some data analysis, generate plots and reports. LLMs by default aren't skilled at complex quantitative tasks. Code generation & execution can come in handy for such tasks.
 
-Just use `/code` in your chat command.
+Khoj automatically infers when to use the code tool. You can also tell it explicitly to use the code tool or use the `/code` [slash command](https://docs.khoj.dev/features/chat/#commands) in your chat.
 
-### Setup (Self-Hosting)
-Run [Cohere's Terrarium](https://github.com/cohere-ai/cohere-terrarium) on your machine to enable code generation and execution.
+## Setup (Self-Hosting)
+### Terrarium Sandbox
+Use [Cohere's Terrarium](https://github.com/cohere-ai/cohere-terrarium) to host the code sandbox locally on your machine for free.
 
-Check the [instructions](https://github.com/cohere-ai/cohere-terrarium?tab=readme-ov-file#development) for running from source.
-
-For running with Docker, you can use our [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml), or start it manually like this:
+To run with Docker, use our [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to automatically setup the Terrarium code sandbox, or start it manually like this:
 
 ```bash
 docker pull ghcr.io/khoj-ai/terrarium:latest
 docker run -d -p 8080:8080 ghcr.io/khoj-ai/terrarium:latest
 ```
 
+To run from source, check [these instructions](https://github.com/khoj-ai/cohere-terrarium?tab=readme-ov-file#development).
+
 #### Verify
 Verify that it's running, by evaluating a simple Python expression:
 
@@ -28,3 +29,12 @@ curl -X POST -H "Content-Type: application/json" \
     --data-raw '{"code": "1 + 1"}' \
     --no-buffer
 ```
+
+### E2B Sandbox
+[E2B](https://e2b.dev/) allows Khoj to run code on a remote but versatile sandbox with support for more python libraries. This is [not free](https://e2b.dev/pricing).
+
+To have Khoj use E2B as the code sandbox:
+1. Generate an API key on [their dashboard](https://e2b.dev/dashboard).
+2. Set the `E2B_API_KEY` environment variable to it on the machine running your Khoj server.
+   - When using our [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml), uncomment and set the `E2B_API_KEY` env var in the `docker-compose.yml` file.
+3. Now restart your Khoj server to switch to using the E2B code sandbox.
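
The Terrarium verification request shown with curl in the docs above could also be issued from Python. The sketch below is only an illustration: `build_verify_request` and `verify_sandbox` are hypothetical helper names, and the URL `http://localhost:8080` is an assumption based on the `docker run -p 8080:8080` command, since the diff does not show the URL used by the curl command.

```python
import json
import urllib.request


def build_verify_request(url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request asking the sandbox to evaluate `1 + 1`."""
    payload = json.dumps({"code": "1 + 1"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_sandbox(url: str = "http://localhost:8080") -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_verify_request(url), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `verify_sandbox()` only makes sense while a Terrarium container is listening on the assumed address.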

pyproject.toml

+2-1
@@ -68,7 +68,7 @@ dependencies = [
     "authlib == 1.2.1",
     "llama-cpp-python == 0.2.88",
     "itsdangerous == 2.1.2",
-    "httpx == 0.25.0",
+    "httpx == 0.27.2",
     "pgvector == 0.2.4",
     "psycopg2-binary == 2.9.9",
     "lxml == 4.9.3",
@@ -92,6 +92,7 @@ dependencies = [
     "pyjson5 == 1.6.7",
     "resend == 1.0.1",
     "email-validator == 2.2.0",
+    "e2b-code-interpreter ~= 1.0.0",
 ]
 dynamic = ["version"]

src/khoj/processor/conversation/prompts.py

+8-2
@@ -974,9 +974,8 @@
 python_code_generation_prompt = PromptTemplate.from_template(
     """
 You are Khoj, an advanced python programmer. You are tasked with constructing a python program to best answer the user query.
-- The python program will run in a pyodide python sandbox with no network access.
+- The python program will run in a sandbox with no network access.
 - You can write programs to run complex calculations, analyze data, create charts, generate documents to meticulously answer the query.
-- The sandbox has access to the standard library, matplotlib, panda, numpy, scipy, bs4 and sympy packages. The requests, torch, catboost, tensorflow and tkinter packages are not available.
 - List known file paths to required user documents in "input_files" and known links to required documents from the web in the "input_links" field.
 - The python program should be self-contained. It can only read data generated by the program itself and from provided input_files, input_links by their basename (i.e filename excluding file path).
 - Do not try display images or plots in the code directly. The code should save the image or plot to a file instead.
@@ -1030,6 +1029,13 @@
     """.strip()
 )
 
+e2b_sandbox_context = """
+- The sandbox has access to only the standard library, matplotlib, pandas, numpy, scipy, bs4, sympy, einops, biopython, shapely, plotly and rdkit packages. The requests, torch, catboost, tensorflow and tkinter packages are not available.
+""".strip()
+
+terrarium_sandbox_context = """
+The sandbox has access to the standard library, matplotlib, pandas, numpy, scipy, bs4 and sympy packages. The requests, torch, catboost, tensorflow, rdkit and tkinter packages are not available.
+""".strip()
 
 # Automations
 # --

src/khoj/processor/tools/run_code.py

+105-5
@@ -27,7 +27,12 @@
     load_complex_json,
 )
 from khoj.routers.helpers import send_message_to_model_wrapper
-from khoj.utils.helpers import is_none_or_empty, timer, truncate_code_context
+from khoj.utils.helpers import (
+    is_e2b_code_sandbox_enabled,
+    is_none_or_empty,
+    timer,
+    truncate_code_context,
+)
 from khoj.utils.rawconfig import LocationData
 
 logger = logging.getLogger(__name__)
@@ -131,6 +136,12 @@ async def generate_python_code(
         prompts.personality_context.format(personality=agent.personality) if agent and agent.personality else ""
     )
 
+    # add sandbox specific context like available packages
+    sandbox_context = (
+        prompts.e2b_sandbox_context if is_e2b_code_sandbox_enabled() else prompts.terrarium_sandbox_context
+    )
+    personality_context = f"{sandbox_context}\n{personality_context}"
+
     code_generation_prompt = prompts.python_code_generation_prompt.format(
         current_date=utc_date,
         query=q,
@@ -182,15 +193,104 @@ async def execute_sandboxed_python(code: str, input_data: list[dict], sandbox_ur
     Reference data i/o format based on Terrarium example client code at:
     https://github.com/cohere-ai/cohere-terrarium/blob/main/example-clients/python/terrarium_client.py
     """
-    headers = {"Content-Type": "application/json"}
     cleaned_code = clean_code_python(code)
-    data = {"code": cleaned_code, "files": input_data}
+    if is_e2b_code_sandbox_enabled():
+        try:
+            return await execute_e2b(cleaned_code, input_data)
+        except ImportError:
+            pass
+    return await execute_terrarium(cleaned_code, input_data, sandbox_url)
+
+
+async def execute_e2b(code: str, input_files: list[dict]) -> dict[str, Any]:
+    """Execute code and handle file I/O in e2b sandbox"""
+    from e2b_code_interpreter import AsyncSandbox
+
+    sandbox = await AsyncSandbox.create(
+        api_key=os.getenv("E2B_API_KEY"),
+        template=os.getenv("E2B_TEMPLATE", "pmt2o0ghpang8gbiys57"),
+        timeout=120,
+        request_timeout=30,
+    )
+
+    try:
+        # Upload input files in parallel
+        upload_tasks = [
+            sandbox.files.write(path=file["filename"], data=base64.b64decode(file["b64_data"]), request_timeout=30)
+            for file in input_files
+        ]
+        await asyncio.gather(*upload_tasks)
 
+        # Note stored files before execution
+        E2bFile = NamedTuple("E2bFile", [("name", str), ("path", str)])
+        original_files = {E2bFile(f.name, f.path) for f in await sandbox.files.list("~")}
+
+        # Execute code from main.py file
+        execution = await sandbox.run_code(code=code, timeout=60)
+
+        # Collect output files
+        output_files = []
+
+        # Identify new files created during execution
+        new_files = set(E2bFile(f.name, f.path) for f in await sandbox.files.list("~")) - original_files
+        # Read newly created files in parallel
+        download_tasks = [sandbox.files.read(f.path, request_timeout=30) for f in new_files]
+        downloaded_files = await asyncio.gather(*download_tasks)
+        for f, content in zip(new_files, downloaded_files):
+            if isinstance(content, bytes):
+                # Binary files like PNG - encode as base64
+                b64_data = base64.b64encode(content).decode("utf-8")
+            elif Path(f.name).suffix in [".png", ".jpeg", ".jpg", ".svg"]:
+                # Ignore image files as they are extracted from execution results below for inline display
+                continue
+            else:
+                # Text files - encode utf-8 string as base64
+                b64_data = base64.b64encode(content.encode("utf-8")).decode("utf-8")
+            output_files.append({"filename": f.name, "b64_data": b64_data})
+
+        # Collect output files from execution results
+        for idx, result in enumerate(execution.results):
+            for result_type in ["png", "jpeg", "svg", "text", "markdown", "json"]:
+                if b64_data := getattr(result, result_type, None):
+                    output_files.append({"filename": f"{idx}.{result_type}", "b64_data": b64_data})
+                    break
+
+        # collect logs
+        success = not execution.error and not execution.logs.stderr
+        stdout = "\n".join(execution.logs.stdout)
+        errors = "\n".join(execution.logs.stderr)
+        if execution.error:
+            errors = f"{execution.error}\n{errors}"
+
+        return {
+            "code": code,
+            "success": success,
+            "std_out": stdout,
+            "std_err": errors,
+            "output_files": output_files,
+        }
+    except Exception as e:
+        return {
+            "code": code,
+            "success": False,
+            "std_err": f"Sandbox failed to execute code: {str(e)}",
+            "output_files": [],
+        }
+
+
+async def execute_terrarium(
+    code: str,
+    input_data: list[dict],
+    sandbox_url: str,
+) -> dict[str, Any]:
+    """Execute code using Terrarium sandbox"""
+    headers = {"Content-Type": "application/json"}
+    data = {"code": code, "files": input_data}
     async with aiohttp.ClientSession() as session:
         async with session.post(sandbox_url, json=data, headers=headers, timeout=30) as response:
             if response.status == 200:
                 result: dict[str, Any] = await response.json()
-                result["code"] = cleaned_code
+                result["code"] = code
                 # Store decoded output files
                 result["output_files"] = result.get("output_files", [])
                 for output_file in result["output_files"]:
@@ -202,7 +302,7 @@ async def execute_sandboxed_python(code: str, input_data: list[dict], sandbox_ur
                 return result
             else:
                 return {
-                    "code": cleaned_code,
+                    "code": code,
                     "success": False,
                     "std_err": f"Failed to execute code with {response.status}",
                     "output_files": [],
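
The way `execute_e2b` above finds output files (snapshot the file listing, run the code, take the set difference, then read the new files in parallel) can be illustrated in isolation. This is a minimal sketch under assumptions, not the commit's code: `FakeSandboxFiles`, `FileEntry` and `collect_new_files` are hypothetical names, and the in-memory store merely stands in for a remote sandbox file API.

```python
import asyncio
import base64
from typing import Awaitable, Callable, NamedTuple


class FileEntry(NamedTuple):
    name: str
    path: str


class FakeSandboxFiles:
    """In-memory stand-in for a remote sandbox file API (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    async def write(self, path: str, data: str) -> None:
        self._store[path] = data

    async def list(self) -> list[FileEntry]:
        return [FileEntry(p.rsplit("/", 1)[-1], p) for p in self._store]

    async def read(self, path: str) -> str:
        return self._store[path]


async def collect_new_files(
    files: FakeSandboxFiles, run_step: Callable[[], Awaitable[None]]
) -> list[dict[str, str]]:
    """Return files created by run_step, with contents base64 encoded."""
    before = set(await files.list())  # snapshot before execution
    await run_step()
    # Sort so the result order is deterministic; sets are unordered.
    new_files = sorted(set(await files.list()) - before)
    # Read all new files concurrently.
    contents = await asyncio.gather(*(files.read(f.path) for f in new_files))
    return [
        {"filename": f.name, "b64_data": base64.b64encode(text.encode("utf-8")).decode("utf-8")}
        for f, text in zip(new_files, contents)
    ]
```

Unlike the diff, the sketch sorts the new files before reading them, which only serves to make the output order predictable.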

src/khoj/utils/helpers.py

+11-2
@@ -321,6 +321,12 @@ def get_device() -> torch.device:
         return torch.device("cpu")
 
 
+def is_e2b_code_sandbox_enabled():
+    """Check if E2B code sandbox is enabled.
+    Set E2B_API_KEY environment variable to use it."""
+    return not is_none_or_empty(os.getenv("E2B_API_KEY"))
+
+
 class ConversationCommand(str, Enum):
     Default = "default"
     General = "general"
@@ -362,20 +368,23 @@ class ConversationCommand(str, Enum):
     ConversationCommand.Code: "Agent can run Python code to parse information, run complex calculations, create documents and charts.",
 }
 
+e2b_tool_description = "To run Python code in a E2B sandbox with no network access. Helpful to parse complex information, run calculations, create text documents and create charts with quantitative data. Only matplotlib, pandas, numpy, scipy, bs4, sympy, einops, biopython, shapely and rdkit external packages are available."
+terrarium_tool_description = "To run Python code in a Terrarium, Pyodide sandbox with no network access. Helpful to parse complex information, run complex calculations, create plaintext documents and create charts with quantitative data. Only matplotlib, panda, numpy, scipy, bs4 and sympy external packages are available."
+
 tool_descriptions_for_llm = {
     ConversationCommand.Default: "To use a mix of your internal knowledge and the user's personal knowledge, or if you don't entirely understand the query.",
     ConversationCommand.General: "To use when you can answer the question without any outside information or personal knowledge",
     ConversationCommand.Notes: "To search the user's personal knowledge base. Especially helpful if the question expects context from the user's notes or documents.",
     ConversationCommand.Online: "To search for the latest, up-to-date information from the internet. Note: **Questions about Khoj should always use this data source**",
     ConversationCommand.Webpage: "To use if the user has directly provided the webpage urls or you are certain of the webpage urls to read.",
-    ConversationCommand.Code: "To run Python code in a Pyodide sandbox with no network access. Helpful when need to parse complex information, run complex calculations, create plaintext documents, and create charts with quantitative data. Only matplotlib, panda, numpy, scipy, bs4 and sympy external packages are available.",
+    ConversationCommand.Code: e2b_tool_description if is_e2b_code_sandbox_enabled() else terrarium_tool_description,
 }
 
 function_calling_description_for_llm = {
     ConversationCommand.Notes: "To search the user's personal knowledge base. Especially helpful if the question expects context from the user's notes or documents.",
     ConversationCommand.Online: "To search the internet for information. Useful to get a quick, broad overview from the internet. Provide all relevant context to ensure new searches, not in previous iterations, are performed.",
     ConversationCommand.Webpage: "To extract information from webpages. Useful for more detailed research from the internet. Usually used when you know the webpage links to refer to. Share the webpage links and information to extract in your query.",
-    ConversationCommand.Code: "To run Python code in a Pyodide sandbox with no network access. Helpful when need to parse complex information, run complex calculations, create plaintext documents, and create charts with quantitative data. Only matplotlib, panda, numpy, scipy, bs4 and sympy external packages are available.",
+    ConversationCommand.Code: e2b_tool_description if is_e2b_code_sandbox_enabled() else terrarium_tool_description,
 }
 
 mode_descriptions_for_llm = {
