fix: clean unused bento pieces from serve.py and serving.py #532

Merged
merged 31 commits into from
Apr 7, 2025

Conversation

ishandhanani
Contributor

@ishandhanani ishandhanani commented Apr 7, 2025

This PR targets the code paths we use for local dev.

Running list of the internal libraries we access:

  • from bentoml._internal.service.loader import load
  • from _bentoml_sdk import Service
  • from bentoml._internal.container import BentoMLContainer
  • from bentoml._internal.utils.circus import Server
  • The internal non-dynamo component script, referenced as _SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"
  • from _bentoml_impl.loader import import_service, normalize_identifier
  • from bentoml._internal.utils.circus import create_standalone_arbiter
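A list like this can be regenerated instead of maintained by hand. The following is a minimal sketch, not part of this PR: it uses Python's standard `ast` module to report imports that reach into private modules. The set of root packages and the helper names `is_internal` and `find_internal_imports` are assumptions chosen for illustration.

```python
import ast

# Top-level packages treated as BentoML-owned for this sketch (an assumption).
BENTO_ROOTS = {"bentoml", "_bentoml_sdk", "_bentoml_impl"}


def is_internal(module: str) -> bool:
    """True if the dotted path is under a BentoML root and has a private part."""
    parts = module.split(".")
    return parts[0] in BENTO_ROOTS and any(p.startswith("_") for p in parts)


def find_internal_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, module, imported name) for each internal import."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and is_internal(node.module):
            hits.extend((node.lineno, node.module, alias.name) for alias in node.names)
        elif isinstance(node, ast.Import):
            hits.extend(
                (node.lineno, alias.name, "") for alias in node.names if is_internal(alias.name)
            )
    return sorted(hits)


# Sample input built from two of the imports listed above plus one public import.
SAMPLE = """\
import bentoml
from bentoml._internal.container import BentoMLContainer
from _bentoml_impl.loader import import_service, normalize_identifier
"""

for lineno, module, name in find_internal_imports(SAMPLE):
    print(f"line {lineno}: from {module} import {name}")
```

A scan like this only sees import statements. A module path held in a string, such as the `_SERVICE_WORKER_SCRIPT` value above, would still need to be tracked separately.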

@ishandhanani
Contributor Author

After cleaning out serve, we are left with a single internal method and a working serve command:

2025-04-07T00:30:50.606Z  INFO dynamo_llm::http::service::service_v2: Starting HTTP service on: 0.0.0.0:8000 address="0.0.0.0:8000"
2025-04-07T00:30:50.606Z  INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:35635
2025-04-07T00:30:50.606Z  INFO dynamo_llm::http::service::discovery: added Chat model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
2025-04-07T00:30:50.607Z  INFO dynamo_llm::http::service::discovery: added Chat model: nvidia/Llama-3.1-405B-Instruct-FP8
2025-04-07T00:31:08.614Z  INFO config._resolve_task: This model supports multiple tasks: {'classify', 'score', 'reward', 'generate', 'embed'}. Defaulting to 'generate'.
2025-04-07T00:31:08.640Z  INFO config._resolve_task: This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:08.642Z  INFO config._resolve_task: This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:09.450Z  INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:43631
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.99s/it]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.94s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.26s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.22s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00,  2.18s/it]

Capturing CUDA graph shapes:   0%|                                                                                                                                                                                            | 0/35 [00:00<?, ?it/s]Initializied NIXL agent: c9d1dffc-a6ac-4f6f-852e-286f5f2f2c0e
2025-04-07T00:31:45.363Z  INFO prefill_worker.async_init: PrefillWorker initialized
2025-04-07T00:31:45.363Z  INFO serve_dynamo.worker: [PrefillWorker] Starting PrefillWorker instance with all registered endpoints
2025-04-07T00:31:45.363Z  INFO prefill_worker.prefill_queue_handler: Prefill queue handler entered
2025-04-07T00:31:45.363Z  INFO prefill_worker.prefill_queue_handler: Prefill queue: nats://localhost:4222:vllm
2025-04-07T00:31:45.370Z  INFO prefill_worker.prefill_queue_handler: prefill queue handler started
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00,  1.75it/s]
Initializied NIXL agent: 2aab3269-f718-46b3-a6cf-a8edf69829f1
2025-04-07T00:32:05.898Z  INFO worker.async_init: VllmWorker has been initialized
2025-04-07T00:32:05.898Z  INFO serve_dynamo.worker: [VllmWorker] Starting VllmWorker instance with all registered endpoints
2025-04-07T00:32:06.315Z  INFO logging.check_required_workers: Waiting for more workers to be ready.
 Current: 1, Required: 1
Workers ready: [7587885863974176849]
2025-04-07T00:32:06.315Z  INFO serve_dynamo.worker: [Processor] Starting Processor instance with all registered endpoints
2025-04-07T00:38:22.293Z  INFO sighandler.signal: Got signal SIG_WINCH
2025-04-07T00:38:30.323Z  INFO chat_utils.resolve_chat_template_content_format: Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
2025-04-07T00:38:30.356Z  INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:30.356Z  INFO worker.generate: Prefilling remotely for request 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0 with length 193
2025-04-07T00:38:30.360Z  INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0
2025-04-07T00:38:30.363Z  INFO prefill_worker.generate: Loaded nixl metadata from engine 531fbf01-5158-41e0-b1fa-169b8be91678 into engine dc56c449-a7d9-4e97-a0f7-8bca499935c9
2025-04-07T00:38:38.827Z  INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:38.827Z  INFO worker.generate: Prefilling remotely for request d8e79227-06a8-4b9e-8fdd-b23c99e6f532 with length 193
2025-04-07T00:38:38.831Z  INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: d8e79227-06a8-4b9e-8fdd-b23c99e6f532
ubuntu(ishan):~ curl localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "messages": [
    {
        "role": "user",
        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
    }
    ],
    "stream":false,
    "max_tokens": 30
  }'
{"id":"56394bed-8fba-4868-b9ac-acb4e4c157b6","choices":[{"index":0,"message":{"content":"Okay, so I'm trying to develop a character background for someone who's part of a story set in the ancient city of Aeloria. The","refusal":null,"tool_calls":null,"role":"assistant","function_call":null,"audio":null},"finish_reason":"length","logprobs":null}],"created":1743986499,"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","service_tier":null,"system_fingerprint":null,"object":"chat.completion","usage":null}
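For scripted smoke tests, the same request can be issued from Python. This is a minimal sketch using only the standard library. The URL, model name, and payload fields are copied from the curl command above; the helper names `build_request`, `extract_reply`, and `ask` are hypothetical, and the response handling assumes the JSON shape shown in the output above.

```python
import json
import urllib.request

# Endpoint and model name taken from the curl command above.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"


def build_request(prompt: str, max_tokens: int = 30) -> urllib.request.Request:
    """Build the same non-streaming chat completion POST as the curl call."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> tuple[str, str]:
    """Return (content, finish_reason) from the first choice of a response."""
    choice = body["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def ask(prompt: str) -> tuple[str, str]:
    """Send the request; needs the serve command to be running locally."""
    with urllib.request.urlopen(build_request(prompt), timeout=60) as resp:
        return extract_reply(json.load(resp))
```

With the service up, `ask("some prompt")` returns the generated text and the finish reason. In the run above the finish reason is `length`, because `max_tokens` was capped at 30.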

@ishandhanani ishandhanani changed the title from "fix: clean ununsed bento pieces" to "fix: clean unused bento pieces from serve.py and serving.py" Apr 7, 2025
@ishandhanani
Contributor Author

After cleaning serving, things still work.

@ishandhanani
Contributor Author

All 4 examples have been tested

@ishandhanani ishandhanani enabled auto-merge (squash) April 7, 2025 20:28
Contributor

@mohammedabdulwahhab mohammedabdulwahhab left a comment

Tested llm examples 1 and 2. Works!

@ishandhanani ishandhanani merged commit 646e5fe into main Apr 7, 2025
8 of 9 checks passed
@ishandhanani ishandhanani deleted the ishan/cleanup-serve branch April 7, 2025 22:00
kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025
3 participants