fix: clean unused bento pieces from serve.py and serving.py #532

ishandhanani · 2025-04-07T00:27:00Z

This PR targets the pieces we use for local dev.

Running list of the internal BentoML libraries we still access:

from bentoml._internal.service.loader import load
from _bentoml_sdk import Service
from bentoml._internal.container import BentoMLContainer
from bentoml._internal.utils.circus import Server
The internal non-dynamo component script is also referenced here: _SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"
from _bentoml_impl.loader import import_service, normalize_identifier
from bentoml._internal.utils.circus import create_standalone_arbiter
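A list like the one above can drift as files change. The following is a minimal sketch of one way to regenerate it, using Python's standard `ast` module to collect imports from a source string. The function name `find_internal_imports` and the `INTERNAL_PREFIXES` tuple are hypothetical, chosen for illustration from the imports listed in this comment; they are not part of the dynamo SDK or BentoML.

```python
import ast

# Assumption: these prefixes mark "internal" BentoML modules, based only on
# the imports listed in this PR comment.
INTERNAL_PREFIXES = ("bentoml._internal", "_bentoml_sdk", "_bentoml_impl")


def find_internal_imports(source: str) -> list[str]:
    """Return the sorted, de-duplicated internal modules imported in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            # Match the prefix itself or any submodule of it.
            if any(module == p or module.startswith(p + ".") for p in INTERNAL_PREFIXES):
                found.add(module)
    return sorted(found)


sample = """
from bentoml._internal.container import BentoMLContainer
from _bentoml_impl.loader import import_service, normalize_identifier
import os
"""
print(find_internal_imports(sample))
```

Note that a scan like this only sees import statements. A module referenced by string, such as the `_SERVICE_WORKER_SCRIPT` value, would still need to be tracked by hand.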

ishandhanani · 2025-04-07T00:41:51Z

After cleaning out serve.py, we are left with a single internal method and a working serve command:

2025-04-07T00:30:50.606Z  INFO dynamo_llm::http::service::service_v2: Starting HTTP service on: 0.0.0.0:8000 address="0.0.0.0:8000"
2025-04-07T00:30:50.606Z  INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:35635
2025-04-07T00:30:50.606Z  INFO dynamo_llm::http::service::discovery: added Chat model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
2025-04-07T00:30:50.607Z  INFO dynamo_llm::http::service::discovery: added Chat model: nvidia/Llama-3.1-405B-Instruct-FP8
2025-04-07T00:31:08.614Z  INFO config._resolve_task: This model supports multiple tasks: {'classify', 'score', 'reward', 'generate', 'embed'}. Defaulting to 'generate'.
2025-04-07T00:31:08.640Z  INFO config._resolve_task: This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:08.642Z  INFO config._resolve_task: This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:09.450Z  INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:43631
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.99s/it]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.94s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.26s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.22s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.18s/it]

Capturing CUDA graph shapes:   0%|                                                                                                                                                                                            | 0/35 [00:00<?, ?it/s]Initializied NIXL agent: c9d1dffc-a6ac-4f6f-852e-286f5f2f2c0e
2025-04-07T00:31:45.363Z  INFO prefill_worker.async_init: PrefillWorker initialized
2025-04-07T00:31:45.363Z  INFO serve_dynamo.worker: [PrefillWorker] Starting PrefillWorker instance with all registered endpoints
2025-04-07T00:31:45.363Z  INFO prefill_worker.prefill_queue_handler: Prefill queue handler entered
2025-04-07T00:31:45.363Z  INFO prefill_worker.prefill_queue_handler: Prefill queue: nats://localhost:4222:vllm
2025-04-07T00:31:45.370Z  INFO prefill_worker.prefill_queue_handler: prefill queue handler started
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00,  1.75it/s]
Initializied NIXL agent: 2aab3269-f718-46b3-a6cf-a8edf69829f1
2025-04-07T00:32:05.898Z  INFO worker.async_init: VllmWorker has been initialized
2025-04-07T00:32:05.898Z  INFO serve_dynamo.worker: [VllmWorker] Starting VllmWorker instance with all registered endpoints
2025-04-07T00:32:06.315Z  INFO logging.check_required_workers: Waiting for more workers to be ready.
 Current: 1, Required: 1
Workers ready: [7587885863974176849]
2025-04-07T00:32:06.315Z  INFO serve_dynamo.worker: [Processor] Starting Processor instance with all registered endpoints
2025-04-07T00:38:22.293Z  INFO sighandler.signal: Got signal SIG_WINCH
2025-04-07T00:38:30.323Z  INFO chat_utils.resolve_chat_template_content_format: Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
2025-04-07T00:38:30.356Z  INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:30.356Z  INFO worker.generate: Prefilling remotely for request 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0 with length 193
2025-04-07T00:38:30.360Z  INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0
2025-04-07T00:38:30.363Z  INFO prefill_worker.generate: Loaded nixl metadata from engine 531fbf01-5158-41e0-b1fa-169b8be91678 into engine dc56c449-a7d9-4e97-a0f7-8bca499935c9
2025-04-07T00:38:38.827Z  INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:38.827Z  INFO worker.generate: Prefilling remotely for request d8e79227-06a8-4b9e-8fdd-b23c99e6f532 with length 193
2025-04-07T00:38:38.831Z  INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: d8e79227-06a8-4b9e-8fdd-b23c99e6f532

ubuntu(ishan):~ curl localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "messages": [
    {
        "role": "user",
        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
    }
    ],
    "stream":false,
    "max_tokens": 30
  }'
{"id":"56394bed-8fba-4868-b9ac-acb4e4c157b6","choices":[{"index":0,"message":{"content":"Okay, so I'm trying to develop a character background for someone who's part of a story set in the ancient city of Aeloria. The","refusal":null,"tool_calls":null,"role":"assistant","function_call":null,"audio":null},"finish_reason":"length","logprobs":null}],"created":1743986499,"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","service_tier":null,"system_fingerprint":null,"object":"chat.completion","usage":null}
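The same check can be scripted. The following is a minimal sketch that parses a response body shaped like the one above and reports whether the completion was cut off by `max_tokens`. The helper name `summarize_completion` is hypothetical, and the body is trimmed to the fields actually used here.

```python
import json

# Response body copied from the curl output above, trimmed to the fields used.
RAW_RESPONSE = '''{"id":"56394bed-8fba-4868-b9ac-acb4e4c157b6","choices":[{"index":0,"message":{"content":"Okay, so I'm trying to develop a character background for someone who's part of a story set in the ancient city of Aeloria. The","role":"assistant"},"finish_reason":"length"}],"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","object":"chat.completion","usage":null}'''


def summarize_completion(body: str) -> dict:
    """Extract the model, text, and truncation status from a chat completion body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return {
        "model": data["model"],
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # A finish_reason of "length" means generation stopped at the token limit.
        "truncated": choice["finish_reason"] == "length",
    }


summary = summarize_completion(RAW_RESPONSE)
print(summary["model"], summary["finish_reason"], summary["truncated"])
```

Here the `finish_reason` of `length` is consistent with the request's `max_tokens` of 30, which is why the reply stops mid-sentence.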

ishandhanani · 2025-04-07T01:09:26Z

After cleaning serving.py, things still work.

ishandhanani · 2025-04-07T02:01:28Z

All 4 examples have been tested

deploy/dynamo/sdk/src/dynamo/sdk/cli/utils.py

mohammedabdulwahhab

Tested llm examples 1 and 2. Works!

ishandhanani and others added 17 commits April 4, 2025 17:26

init

06054e0

pre

eea4c4c

more logging vs print

754159d

bump logging

f741da1

copywrite

7f9919a

remove wack lowercase message

58123ed

Merge branch 'main' into ishan/unify-logging-examples

20ea1c6

components and utils

02e16f7

pre

0543ad4

bump

624d123

clean

90d7010

bump

c25afc4

Merge branch 'main' into ishan/unify-logging-examples

c07acc2

Merge branch 'main' into ishan/unify-logging-examples

f3597d8

remove start

0fe6144

Merge branch 'ishan/unify-logging-examples' into ishan/cleanup-serve

affa5d7

serve.py clean

d5ba3bd

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 00:27 Inactive

ishandhanani added 2 commits April 7, 2025 00:28

bring back accidental remove

ce6d820

pre

2476afa

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 00:40 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 00:41 Inactive

clean serving

a8aa675

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 01:08 Inactive

ishandhanani changed the title ~~fix: clean ununsed bento pieces~~ fix: clean unused bento pieces from serve.py and serving.py Apr 7, 2025

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 01:09 Inactive

bump bento to 1.4.7

cd56c5c

mypy1

72d4917

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 05:13 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 05:14 Inactive

mypy2

7bc353d

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 05:18 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 05:19 Inactive

Merge branch 'main' into ishan/cleanup-serve

fa71597

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 19:58 Inactive

mohammedabdulwahhab reviewed Apr 7, 2025

deploy/dynamo/sdk/src/dynamo/sdk/cli/utils.py (resolved)

mohammedabdulwahhab reviewed Apr 7, 2025

deploy/dynamo/sdk/src/dynamo/sdk/cli/utils.py (resolved)

ishandhanani enabled auto-merge (squash) April 7, 2025 20:28

ack

68261ea

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 20:31 Inactive

ack2

d5005b9

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 20:32 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 20:34 Inactive

ishandhanani mentioned this pull request Apr 7, 2025

chore: Fixed deploy/dynamo/sdk/src/dynamo/sdk/cli/serve.py header #542

Closed

dmitry-tokarev-nv approved these changes Apr 7, 2025

Merge branch 'main' into ishan/cleanup-serve

5c836de

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 21:28 Inactive

mohammedabdulwahhab approved these changes Apr 7, 2025

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 21:30 Inactive

Merge branch 'main' into ishan/cleanup-serve

29ee237

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 21:34 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2025 21:40 Inactive

ishandhanani merged commit 646e5fe into main Apr 7, 2025
8 of 9 checks passed

ishandhanani deleted the ishan/cleanup-serve branch April 7, 2025 22:00

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

fix: clean unused bento pieces from serve.py and serving.py (ai-dynam…

f91d548

…o#532)
