fix: clean unused bento pieces from serve.py and serving.py #532
Merged
After cleaning out serve.py, we are left with a single internal method and a working serve command:
2025-04-07T00:30:50.606Z INFO dynamo_llm::http::service::service_v2: Starting HTTP service on: 0.0.0.0:8000 address="0.0.0.0:8000"
2025-04-07T00:30:50.606Z INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:35635
2025-04-07T00:30:50.606Z INFO dynamo_llm::http::service::discovery: added Chat model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
2025-04-07T00:30:50.607Z INFO dynamo_llm::http::service::discovery: added Chat model: nvidia/Llama-3.1-405B-Instruct-FP8
2025-04-07T00:31:08.614Z INFO config._resolve_task: This model supports multiple tasks: {'classify', 'score', 'reward', 'generate', 'embed'}. Defaulting to 'generate'.
2025-04-07T00:31:08.640Z INFO config._resolve_task: This model supports multiple tasks: {'reward', 'embed', 'classify', 'generate', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:08.642Z INFO config._resolve_task: This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
2025-04-07T00:31:09.450Z INFO dynamo_runtime::pipeline::network::tcp::server: tcp transport service on 10.128.0.83:43631
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.99s/it]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.94s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00, 2.26s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00, 2.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00, 2.22s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:04<00:00, 2.18s/it]
Capturing CUDA graph shapes: 0%| | 0/35 [00:00<?, ?it/s]
Initializied NIXL agent: c9d1dffc-a6ac-4f6f-852e-286f5f2f2c0e
2025-04-07T00:31:45.363Z INFO prefill_worker.async_init: PrefillWorker initialized
2025-04-07T00:31:45.363Z INFO serve_dynamo.worker: [PrefillWorker] Starting PrefillWorker instance with all registered endpoints
2025-04-07T00:31:45.363Z INFO prefill_worker.prefill_queue_handler: Prefill queue handler entered
2025-04-07T00:31:45.363Z INFO prefill_worker.prefill_queue_handler: Prefill queue: nats://localhost:4222:vllm
2025-04-07T00:31:45.370Z INFO prefill_worker.prefill_queue_handler: prefill queue handler started
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00, 1.75it/s]
Initializied NIXL agent: 2aab3269-f718-46b3-a6cf-a8edf69829f1
2025-04-07T00:32:05.898Z INFO worker.async_init: VllmWorker has been initialized
2025-04-07T00:32:05.898Z INFO serve_dynamo.worker: [VllmWorker] Starting VllmWorker instance with all registered endpoints
2025-04-07T00:32:06.315Z INFO logging.check_required_workers: Waiting for more workers to be ready.
Current: 1, Required: 1
Workers ready: [7587885863974176849]
2025-04-07T00:32:06.315Z INFO serve_dynamo.worker: [Processor] Starting Processor instance with all registered endpoints
2025-04-07T00:38:22.293Z INFO sighandler.signal: Got signal SIG_WINCH
2025-04-07T00:38:30.323Z INFO chat_utils.resolve_chat_template_content_format: Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
2025-04-07T00:38:30.356Z INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:30.356Z INFO worker.generate: Prefilling remotely for request 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0 with length 193
2025-04-07T00:38:30.360Z INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: 456f6da9-e7c2-4db4-b5f7-60d2b23db1f0
2025-04-07T00:38:30.363Z INFO prefill_worker.generate: Loaded nixl metadata from engine 531fbf01-5158-41e0-b1fa-169b8be91678 into engine dc56c449-a7d9-4e97-a0f7-8bca499935c9
2025-04-07T00:38:38.827Z INFO disagg_router.prefill_remote: Remote prefill: True (prefill length: 193/193, prefill queue size: 0/2)
2025-04-07T00:38:38.827Z INFO worker.generate: Prefilling remotely for request d8e79227-06a8-4b9e-8fdd-b23c99e6f532 with length 193
2025-04-07T00:38:38.831Z INFO prefill_worker.prefill_queue_handler: Dequeued prefill request: d8e79227-06a8-4b9e-8fdd-b23c99e6f532

ubuntu(ishan):~ curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
{"id":"56394bed-8fba-4868-b9ac-acb4e4c157b6","choices":[{"index":0,"message":{"content":"Okay, so I'm trying to develop a character background for someone who's part of a story set in the ancient city of Aeloria. The","refusal":null,"tool_calls":null,"role":"assistant","function_call":null,"audio":null},"finish_reason":"length","logprobs":null}],"created":1743986499,"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","service_tier":null,"system_fingerprint":null,"object":"chat.completion","usage":null}
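For scripted re-testing, the curl request above can be reproduced with only the Python standard library. This is a minimal sketch, assuming the server started above is listening on localhost:8000 and accepts the same OpenAI-style `/v1/chat/completions` payload that the curl call sends; the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the parser reads only fields visible in the sample response.

```python
import json
import urllib.request

# Assumption: endpoint and model name are copied from the curl example above.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"


def build_chat_request(prompt: str, max_tokens: int = 30) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: str) -> tuple[str, str]:
    """Return (content, finish_reason) of the first choice in a chat.completion body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def chat(prompt: str, max_tokens: int = 30, timeout: float = 60.0) -> tuple[str, str]:
    """Send the request to the running server and parse its reply."""
    req = build_chat_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

With the server running, `chat(...)` returns the generated text together with its finish reason. A reason of `"length"`, as in the sample response, means generation stopped because it hit `max_tokens`.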
After cleaning serving.py, things still work.
All 4 examples have been tested.
dmitry-tokarev-nv approved these changes (Apr 7, 2025)
mohammedabdulwahhab approved these changes (Apr 7, 2025)
Tested LLM examples 1 and 2. Works!
Targeting what we use for local dev.
Running list of the internal BentoML modules we access:
from bentoml._internal.service.loader import load
from _bentoml_sdk import Service
from bentoml._internal.container import BentoMLContainer
from bentoml._internal.utils.circus import Server
_SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"
from _bentoml_impl.loader import import_service, normalize_identifier
from bentoml._internal.utils.circus import create_standalone_arbiter
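Every entry in this list reaches into BentoML's private modules (`bentoml._internal.*`, `_bentoml_sdk`, `_bentoml_impl`), which are not a public API and may move between releases. A minimal smoke-check sketch, assuming only the module paths and names listed above (the `INTERNAL_IMPORTS` table and the `check_internal_imports` / `module_exists` helpers are hypothetical), reports which of them still resolve in the current environment.

```python
import importlib
import importlib.util

# (module path, attribute names) copied from the list above.
INTERNAL_IMPORTS = [
    ("bentoml._internal.service.loader", ["load"]),
    ("_bentoml_sdk", ["Service"]),
    ("bentoml._internal.container", ["BentoMLContainer"]),
    ("bentoml._internal.utils.circus", ["Server", "create_standalone_arbiter"]),
    ("_bentoml_impl.loader", ["import_service", "normalize_identifier"]),
]

# Referenced by dotted path (like _SERVICE_WORKER_SCRIPT), not imported directly.
SERVICE_WORKER_SCRIPT = "_bentoml_impl.worker.service"


def check_internal_imports(imports=INTERNAL_IMPORTS) -> dict[str, bool]:
    """Map 'module:attr' to True if the module imports and exposes attr."""
    results: dict[str, bool] = {}
    for module_name, attrs in imports:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Any import-time failure (missing package, broken internals) counts as missing.
            module = None
        for attr in attrs:
            results[f"{module_name}:{attr}"] = module is not None and hasattr(module, attr)
    return results


def module_exists(dotted_name: str) -> bool:
    """True if the module can be located; parent packages are imported to find it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing.
        return False


if __name__ == "__main__":
    for name, ok in check_internal_imports().items():
        print(("ok      " if ok else "MISSING ") + name)
    status = "ok      " if module_exists(SERVICE_WORKER_SCRIPT) else "MISSING "
    print(status + SERVICE_WORKER_SCRIPT)
```

Running the file prints one `ok` or `MISSING` line per entry, so a BentoML upgrade that breaks one of these private imports shows up in a single run rather than at serve time.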