Fix CORS for /health endpoint #6892

Open. Wants to merge 1 commit into base: master

Conversation

soulofmischief

/health is missing Access-Control-Allow-Origin. This sets it to the request origin, in line with the other endpoints.
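
As an illustration only (the llama.cpp server is written in C++, and this is not its code), the header logic described above could be sketched in Python as follows. The function name `cors_headers` is hypothetical, and the extra `Vary: Origin` header is an addition of this sketch rather than something the PR description mentions.

```python
from typing import Dict, Mapping


def cors_headers(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Return CORS response headers that echo the caller's Origin, if present."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    origin = lowered.get("origin")
    if not origin:
        # Requests without an Origin header (curl, load-balancer probes)
        # get no CORS headers at all.
        return {}
    return {
        # Reflect the request origin, as the PR description states.
        "Access-Control-Allow-Origin": origin,
        # Assumption of this sketch: tell caches the response varies per origin.
        "Vary": "Origin",
    }
```

Reflecting the origin this way allows any page to read the response, which is presumably acceptable only for endpoints that expose nothing sensitive.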

Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 202 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=23763.92ms p(95)=41061.88ms fails=, finish reason: stop=83 truncated=119
  • Prompt processing (pp): avg=267.8tk/s p(95)=810.06tk/s
  • Token generation (tg): avg=18.88tk/s p(95)=25.5tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=patch-1 commit=0e51cc38cbd2dbba1f6cce991d86cc646a7d800d

prompt_tokens_seconds

[Chart: llamacpp:prompt_tokens_seconds over time (x-axis 1714022329 --> 1714022959); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 202 iterations]

predicted_tokens_seconds

[Chart: llamacpp:predicted_tokens_seconds over time (x-axis 1714022329 --> 1714022959); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 202 iterations]

Details

kv_cache_usage_ratio

[Chart: llamacpp:kv_cache_usage_ratio over time (x-axis 1714022329 --> 1714022959); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 202 iterations]

requests_processing

[Chart: llamacpp:requests_processing over time (x-axis 1714022329 --> 1714022959); llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 202 iterations]

@phymbert
Collaborator

The health endpoint is not supposed to be called cross-origin.

@soulofmischief
Author

soulofmischief commented Apr 25, 2024

@phymbert I see. Without this, a health check inside a browser client fails. What's the reason for preventing browser-side health checks?

The benefit is that I can poll the endpoint at startup and queue any inference until it's ready.
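
The poll-then-queue pattern described here could look roughly like the sketch below. It is written in Python for illustration; `server_is_healthy`, `InferenceQueue`, and the default interval and attempt count are hypothetical names and values, and it assumes `/health` answers with HTTP 200 once the server is ready.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, List


def server_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed 'ready')."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx status: treat as not ready.
        return False


class InferenceQueue:
    """Hold jobs until a readiness check passes, then run them in order."""

    def __init__(self, is_ready: Callable[[], bool],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._is_ready = is_ready
        self._sleep = sleep
        self._pending: List[Callable[[], None]] = []

    def submit(self, job: Callable[[], None]) -> None:
        # Jobs are only stored here; nothing runs until the server is ready.
        self._pending.append(job)

    def run_when_ready(self, interval: float = 0.5, max_attempts: int = 60) -> bool:
        """Poll until ready, flush the queue, and report whether that happened."""
        for _ in range(max_attempts):
            if self._is_ready():
                while self._pending:
                    self._pending.pop(0)()
                return True
            self._sleep(interval)
        return False
```

A caller might build the queue with `InferenceQueue(lambda: server_is_healthy("http://127.0.0.1:8080"))`, where the address is an assumption. CORS is enforced only by browsers, so this Python version would work against an unpatched server; the case the PR addresses is the same loop written with `fetch` in a page served from a different origin.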

@phymbert
Collaborator

What's the motivation for calling /health from a browser? Will you restart the server from there?
If you really need it, put a reverse proxy on top.

@soulofmischief
Author

Calling /health allows the client to delay and queue inference until the server is set up, which is useful when spinning up development environments.

What's the reasoning for not allowing /health to be called cross-origin? If it's a security issue, maybe a /ping or similar endpoint would work as a way to enable this behavior?

Otherwise I can just use a reverse proxy as suggested. Thanks.

@mofosyne added the bugfix, Review Complexity : Low, and server/webui labels on May 9, 2024
@mofosyne added the need feedback label on May 18, 2024
Labels
  • bugfix: fixes an issue or bug
  • need feedback: Testing and feedback with results are needed
  • Review Complexity : Low: Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
  • server/webui
3 participants