[SCP] Open port support #4490

Open: wants to merge 78 commits into base: master


hyoxt121
Contributor

Open port support for SkyServe

TODO: Refactoring for SkyServe architecture

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@hyoxt121
Contributor Author

hyoxt121 commented Mar 23, 2025

Hi @cblmemo, I wrote a smoke test for SCP, but I don't know how to run it. Can you explain how to run the smoke test? The code is here: https://github.com/hyoxt121/skypilot/blob/master/tests/smoke_tests/test_cluster_job.py#L747-L761
If there are other tests that I have to run, please let me know.

Hi, please check the "# Run one of the smoke tests" part of the testing guide:

https://github.com/skypilot-org/skypilot/blob/master/CONTRIBUTING.md#testing

Hi @cblmemo

It took a while because I was very busy with various tasks at my company. I have now completed the smoke test.

I ran the following command, as you suggested.

# Run one of the smoke tests
pytest tests/test_smoke.py::test_minimal

I needed AWS credentials but I completed it like this:

(sky) ubuntu@ip-172-31-6-181:~/skypilot$ pytest tests/test_smoke.py::test_minimal
bringing up nodes...
[minimal] Test started. Log: less -r /tmp/minimal-mdhznd2k.log
[minimal] Passed.
[minimal] Log: less -r /tmp/minimal-mdhznd2k.log
[minimal]
.
1 passed, 5440 warnings in 204.05s (0:03:24)

Can you merge my code without outbound ports? I will create another PR to discuss how to handle outbound ports. I am also writing the SkyPilot v2 code for SCP. I hope we can continue the discussion about SkyPilot v2 development :)

@cblmemo
Collaborator

cblmemo commented Mar 25, 2025

Hi @hyoxt121 ! Thanks for the update. This is exciting.

For the smoke test, I meant: can we have some SCP open-port tests like this one?

@pytest.mark.gcp
def test_gcp_http_server_with_custom_ports():
    name = smoke_tests_utils.get_cluster_name()
    test = smoke_tests_utils.Test(
        'gcp_http_server_with_custom_ports',
        [
            f'sky launch -y -d -c {name} --cloud gcp {smoke_tests_utils.LOW_RESOURCE_ARG} examples/http_server_with_custom_ports/task.yaml',
            f'until SKYPILOT_DEBUG=0 sky status --endpoint 33828 {name}; do sleep 10; done',
            # Retry a few times to avoid flakiness in ports being open.
            f'ip=$(SKYPILOT_DEBUG=0 sky status --endpoint 33828 {name}); success=false; for i in $(seq 1 5); do if curl $ip | grep "<h1>This is a demo HTML page.</h1>"; then success=true; break; fi; sleep 10; done; if [ "$success" = false ]; then exit 1; fi',
        ],
        f'sky down -y {name}',
    )
    smoke_tests_utils.run_one_test(test)
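
The GCP test above runs three steps: launch, wait for the endpoint, then poll the page with retries. As a rough sketch of how the same command sequence could be parameterized by cloud, the helper below builds the shell strings only. `build_open_port_smoke_commands` and the cluster name `t-scp-http-server` are hypothetical names for illustration; this is not the test in this PR and it does not call `smoke_tests_utils`.

```python
from typing import List


def build_open_port_smoke_commands(name: str,
                                   cloud: str,
                                   port: int,
                                   expected_html: str,
                                   retries: int = 5) -> List[str]:
    """Return the shell commands of an open-port smoke test, in run order.

    Hypothetical helper: it only formats strings modeled on the GCP test
    quoted above and does not execute anything.
    """
    endpoint = f'SKYPILOT_DEBUG=0 sky status --endpoint {port} {name}'
    # Step 1: launch the example HTTP server task on the chosen cloud.
    launch = (f'sky launch -y -d -c {name} --cloud {cloud} '
              'examples/http_server_with_custom_ports/task.yaml')
    # Step 2: block until the endpoint for the port can be resolved.
    wait_for_endpoint = f'until {endpoint}; do sleep 10; done'
    # Step 3: retry the HTTP check a few times, since a port may take a
    # moment to become reachable after the endpoint is reported.
    check_page = (
        f'ip=$({endpoint}); success=false; '
        f'for i in $(seq 1 {retries}); do '
        f'if curl $ip | grep "{expected_html}"; then success=true; break; fi; '
        'sleep 10; done; '
        'if [ "$success" = false ]; then exit 1; fi')
    return [launch, wait_for_endpoint, check_page]


commands = build_open_port_smoke_commands(
    name='t-scp-http-server',  # hypothetical cluster name
    cloud='scp',
    port=33828,
    expected_html='<h1>This is a demo HTML page.</h1>')
for command in commands:
    print(command)
```

In a real smoke test these strings would be passed to the test harness together with a teardown command such as `sky down -y {name}`, as in the GCP example.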

@hyoxt121
Contributor Author

hyoxt121 commented Mar 28, 2025

Hi @cblmemo

I tried to run my test code, but the test case is skipped. I then tried other SCP test cases, and they were skipped as well. I have tried to find the reason, but I am not sure what it is. Can you let me know how to fix it?

(sky) ubuntu@ip-172-31-6-181:~/skypilot$ pytest tests/smoke_tests/test_cluster_job.py::test_scp_http_server_with_custom_ports
bringing up nodes...
s
1 skipped, 1662 warnings in 4.81s
(sky) ubuntu@ip-172-31-6-181:~/skypilot$ pytest tests/smoke_tests/test_basic.py::test_scp_logs
bringing up nodes...
s
1 skipped, 672 warnings in 4.76s
(sky) ubuntu@ip-172-31-6-181:~/skypilot$ pytest tests/smoke_tests/test_mount_and_storage.py::test_scp_file_mounts
bringing up nodes...
s
1 skipped, 576 warnings in 4.82s

My test code for open port is this:
https://github.com/hyoxt121/skypilot/blob/master/tests/smoke_tests/test_cluster_job.py#L756-L770
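
A pattern like "1 skipped" in a few seconds is typical of test suites that gate cloud-specific tests behind a command-line flag (the later run in this thread passes `--scp`). The sketch below shows one way such gating can be written with pytest's `pytest_addoption` and `pytest_collection_modifyitems` hooks. It is an illustration under assumptions, not SkyPilot's actual `conftest.py`: `CLOUD_FLAGS` and `should_skip` are hypothetical names.

```python
from typing import Iterable

# Hypothetical set of clouds that get a --<cloud> flag in this sketch.
CLOUD_FLAGS = ('aws', 'gcp', 'scp')


def should_skip(test_clouds: Iterable[str],
                enabled_clouds: Iterable[str]) -> bool:
    """Skip a test that is marked for clouds, none of which was enabled."""
    test_clouds = set(test_clouds)
    return bool(test_clouds) and not (test_clouds & set(enabled_clouds))


def pytest_addoption(parser):
    # Register one boolean flag per cloud, e.g. `pytest ... --scp`.
    for cloud in CLOUD_FLAGS:
        parser.addoption(f'--{cloud}',
                         action='store_true',
                         default=False,
                         help=f'Run tests marked for {cloud}.')


def pytest_collection_modifyitems(config, items):
    # Imported lazily so the pure helper above works without pytest installed.
    import pytest

    enabled = {c for c in CLOUD_FLAGS if config.getoption(f'--{c}')}
    for item in items:
        marked = {m.name for m in item.iter_markers() if m.name in CLOUD_FLAGS}
        if should_skip(marked, enabled):
            item.add_marker(
                pytest.mark.skip(reason='cloud flag not passed on the CLI'))


# Without --scp an scp-marked test is skipped; with it, the test runs.
print(should_skip({'scp'}, set()))
print(should_skip({'scp'}, {'scp'}))
```

With gating of this kind, a test marked for a cloud is reported as skipped unless the matching flag is supplied on the `pytest` command line.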

@hyoxt121
Contributor Author

hyoxt121 commented Apr 4, 2025

Hi @cblmemo

I added --scp to the original command

(sky) ubuntu@ip-172-31-6-181:~/skypilot$ pytest tests/smoke_tests/test_cluster_job.py::test_scp_http_server_with_custom_ports --scp
bringing up nodes...
[scp_http_server_with_custom_ports] Test started. Log: less -r /tmp/scp_http_server_with_custom_ports-9dd_a201.log
[scp_http_server_with_custom_ports] Failed (returned 1).
[scp_http_server_with_custom_ports] Reason: sky launch -y -d -c t-scp-http-server-8d-40 --cloud scp examples/http_server_with_custom_ports/task.yaml
[scp_http_server_with_custom_ports] Log: less -r /tmp/scp_http_server_with_custom_ports-9dd_a201.log
[scp_http_server_with_custom_ports]
F
======================================================================================== FAILURES ========================================================================================
_________________________________________________________________________ test_scp_http_server_with_custom_ports _________________________________________________________________________
[gw0] linux -- Python 3.10.16 /home/ubuntu/miniconda3/envs/sky/bin/python
tests/smoke_tests/test_cluster_job.py:770: in test_scp_http_server_with_custom_ports
smoke_tests_utils.run_one_test(test)
tests/smoke_tests/smoke_tests_utils.py:443: in run_one_test
raise Exception(f'test failed: less -r {log_file.name}')
E Exception: test failed: less -r /tmp/scp_http_server_with_custom_ports-9dd_a201.log
================================================================================ short test summary info =================================================================================
FAILED tests/smoke_tests/test_cluster_job.py::test_scp_http_server_with_custom_ports - Exception: test failed: less -r /tmp/scp_http_server_with_custom_ports-9dd_a201.log
1 failed, 1632 warnings in 426.96s (0:07:06)

I was unable to complete the port test because I got the following errors in /tmp/scp_http_server_with_custom_ports-9dd_a201.log

I 04-04 07:18:26 optimizer.py:955] -------------------------------------------------------------------------------------------------
I 04-04 07:18:26 optimizer.py:955] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 04-04 07:18:26 optimizer.py:955] -------------------------------------------------------------------------------------------------
I 04-04 07:18:26 optimizer.py:955] SCP s1v8m16 8 16 - KOREA-EAST-1-SCP-B001 0.41 ✔
I 04-04 07:18:26 optimizer.py:955] -------------------------------------------------------------------------------------------------
D 04-04 07:18:26 cloud_vm_ray_backend.py:4639] cluster_ever_up: False
D 04-04 07:18:26 cloud_vm_ray_backend.py:4640] record: None
D 04-04 07:18:26 backend_utils.py:683] Using ssh_proxy_command: None
I 04-04 07:18:26 cloud_vm_ray_backend.py:1794] ⚙︎ Launching on SCP KOREA-EAST-1-SCP-B001.
D 04-04 07:18:26 cloud_vm_ray_backend.py:222] ray up script: /tmp/skypilot_ray_up_51dg733q.py
I 04-04 07:23:16 log_utils.py:62] Head VM is up.
D 04-04 07:23:43 cloud_vm_ray_backend.py:1866] ray up takes 316.2 seconds with 1 retries.
D 04-04 07:23:43 cloud_vm_ray_backend.py:1892] Get head ips from ray up stdout: 192.168.0.2 None
D 04-04 07:23:45 cloud_vm_ray_backend.py:2368] Cached external IPs do not match with the newly fetched ones: cached (None), new (['123.41.128.51'])
D 04-04 07:23:45 cloud_vm_ray_backend.py:2381] Using provided internal IPs: ['192.168.0.2']
I 04-04 07:23:45 cloud_vm_ray_backend.py:1642] ✓ Cluster launched: 't-scp-http-server-8d-40'. View logs: sky api logs -l sky-2025-04-04-07-18-25-981792/provision.log
D 04-04 07:23:45 cloud_vm_ray_backend.py:3058] Checking if skylet is running on the head node.
D 04-04 07:23:47 sdk.py:1456] Got request with error: sky.launch
E 04-04 07:23:47 sdk.py:1468] === Traceback on SkyPilot API Server ===
E 04-04 07:23:47 sdk.py:1468] Traceback (most recent call last):
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/server/requests/executor.py", line 259, in _request_execution_wrapper
E 04-04 07:23:47 sdk.py:1468]     return_value = func(**request_body.to_kwargs())
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/utils/common_utils.py", line 465, in _record
E 04-04 07:23:47 sdk.py:1468]     return f(*args, **kwargs)
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/utils/common_utils.py", line 465, in _record
E 04-04 07:23:47 sdk.py:1468]     return f(*args, **kwargs)
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/execution.py", line 537, in launch
E 04-04 07:23:47 sdk.py:1468]     return _execute(
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/execution.py", line 311, in _execute
E 04-04 07:23:47 sdk.py:1468]     handle = backend.provision(
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/utils/common_utils.py", line 465, in _record
E 04-04 07:23:47 sdk.py:1468]     return f(*args, **kwargs)
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/utils/common_utils.py", line 445, in _record
E 04-04 07:23:47 sdk.py:1468]     return f(*args, **kwargs)
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/backends/backend.py", line 84, in provision
E 04-04 07:23:47 sdk.py:1468]     return self._provision(task, to_provision, dryrun, stream_logs,
E 04-04 07:23:47 sdk.py:1468]   File "/home/ubuntu/skypilot/sky/backends/cloud_vm_ray_backend.py", line 3069, in _provision
E 04-04 07:23:47 sdk.py:1468]     prev_cluster_status, lock_path, config_hash)
E 04-04 07:23:47 sdk.py:1468] UnboundLocalError: local variable 'config_hash' referenced before assignment
E 04-04 07:23:47 sdk.py:1468]
D 04-04 07:23:47 sdk.py:82] To stream request logs: sky api logs 1566aebe-7e5a-41d1-87c1-c6afd7cf7b24
UnboundLocalError: local variable 'config_hash' referenced before assignment
D 04-04 07:23:48 common_utils.py:549] Tried to remove /home/ubuntu/.sky/generated/ssh/t-scp-http-server-8d-40 but failed to find it. Skip.

The reason is that I am still using SkyPilot v1, and the error above occurs when I run SkyPilot (v0.8.0).

This will be resolved after migrating the SCP code to SkyPilot v2. Should I continue with this? I am developing SkyPilot v2 support for SCP, so once the migration is complete, this should be fine.

I think I need to close this PR, open a new PR for SkyPilot v2 support for SCP, and include the open port functionality later.

At this point, I think it is not easy to merge this code.

Can you let me know your opinion about this?
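
For readers unfamiliar with this error class: `UnboundLocalError` is raised when a function reads a local variable on a path where it was never assigned. The sketch below reproduces that mechanism in isolation. It is not SkyPilot's `_provision` code; the functions and the `reuse_existing` parameter are hypothetical, and only the variable name `config_hash` is borrowed from the traceback.

```python
from typing import Optional


def provision_buggy(reuse_existing: bool) -> Optional[str]:
    """Hypothetical function that assigns config_hash on one branch only."""
    if not reuse_existing:
        config_hash = 'abc123'  # placeholder value for illustration
    # When reuse_existing is True, config_hash was never bound in this scope,
    # so reading it here raises UnboundLocalError.
    return config_hash


def provision_fixed(reuse_existing: bool) -> Optional[str]:
    """Same logic, but the variable is bound on every path."""
    config_hash = None
    if not reuse_existing:
        config_hash = 'abc123'
    return config_hash


try:
    provision_buggy(reuse_existing=True)
except UnboundLocalError as error:
    print(f'buggy version failed: {error}')

print(f'fixed version returned: {provision_fixed(reuse_existing=True)}')
```

An error of this kind usually means a code path skipped the branch that was expected to set the variable, which is why it can appear only for some clouds or configurations.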

@cblmemo
Collaborator

cblmemo commented Apr 7, 2025

Hi @hyoxt121 , could you try sky api stop && sky api start and see if that resolves the problem? The code in the traceback seems outdated.

@hyoxt121
Contributor Author

hyoxt121 commented Apr 8, 2025

Hi @cblmemo Thank you for your advice.
However, I got the same error after running sky api stop && sky api start.
Can you let me know how to avoid this legacy code error?

@cblmemo
Collaborator

cblmemo commented Apr 11, 2025

Are you checking out this branch? Can you try a clean installation in a fresh conda environment?
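
One quick way to see which copy of a package the current Python environment imports is to print the module's origin path. The sketch below uses the standard library's `importlib.util.find_spec`; `module_origin` is a hypothetical helper name, and the assumption that the package is importable as `sky` is taken from the `/home/ubuntu/skypilot/sky/...` paths in the traceback in this thread.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, if any."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of the installed package, or None if it is not installed
# in this environment.
print(module_origin('sky'))
```

If the printed path points at a different checkout than the branch under test, the environment is importing the wrong code. A server process started earlier may also still be running an older version, which is what the suggested restart is meant to rule out.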

@hyoxt121
Contributor Author

Hi @cblmemo
I ran the SCP port test in a completely clean environment. I created a new AWS instance and a new conda environment, and downloaded the code from https://github.com/hyoxt121/skypilot.git (the master branch of my copy of https://github.com/skypilot-org/skypilot.git).

(sky) ubuntu@ip-172-31-3-167:~/skypilot$ pytest tests/smoke_tests/test_cluster_job.py::test_scp_http_server_with_custom_ports --scp
D 04-14 06:49:02 skypilot_config.py:274] using default user config file: ~/.sky/skyconfig.yaml
D 04-14 06:49:02 skypilot_config.py:293] using default project config file: skyconfig.yaml
D 04-14 06:49:02 skypilot_config.py:314] final config: {}
bringing up nodes...
[scp_http_server_with_custom_ports] Test started. Log: less -r /tmp/scp_http_server_with_custom_ports-yq2cclfb.log
[scp_http_server_with_custom_ports] Failed (returned 1).
[scp_http_server_with_custom_ports] Reason: sky launch -y -d -c t-scp-http-server-8d-8f --cloud scp examples/http_server_with_custom_ports/task.yaml
[scp_http_server_with_custom_ports] Log: less -r /tmp/scp_http_server_with_custom_ports-yq2cclfb.log
[scp_http_server_with_custom_ports]
F
======================================================================================== FAILURES ========================================================================================
_________________________________________________________________________ test_scp_http_server_with_custom_ports _________________________________________________________________________
[gw0] linux -- Python 3.10.16 /home/ubuntu/miniconda3/envs/sky/bin/python
tests/smoke_tests/test_cluster_job.py:770: in test_scp_http_server_with_custom_ports
smoke_tests_utils.run_one_test(test)
tests/smoke_tests/smoke_tests_utils.py:445: in run_one_test
raise Exception(f'test failed: less -r {log_file.name}')
E Exception: test failed: less -r /tmp/scp_http_server_with_custom_ports-yq2cclfb.log
================================================================================ short test summary info =================================================================================
FAILED tests/smoke_tests/test_cluster_job.py::test_scp_http_server_with_custom_ports - Exception: test failed: less -r /tmp/scp_http_server_with_custom_ports-yq2cclfb.log
1 failed, 1616 warnings in 419.13s (0:06:59)

The error raised during this run is the same legacy code error:
(sky) ubuntu@ip-172-31-3-167:~$ tail -f /tmp/scp_http_server_with_custom_ports-yq2cclfb.log
E 04-14 06:54:22 sdk.py:1496] File "/home/ubuntu/skypilot/sky/backends/cloud_vm_ray_backend.py", line 3096, in _provision
E 04-14 06:54:22 sdk.py:1496] prev_cluster_status, lock_path, config_hash)
E 04-14 06:54:22 sdk.py:1496] UnboundLocalError: local variable 'config_hash' referenced before assignment
E 04-14 06:54:22 sdk.py:1496]
D 04-14 06:54:22 sdk.py:82] To stream request logs: sky api logs 57cb7ce5-eb44-4e2a-8dd3-8a947be23115
UnboundLocalError: local variable 'config_hash' referenced before assignment
D 04-14 06:54:23 skypilot_config.py:274] using default user config file: ~/.sky/skyconfig.yaml
D 04-14 06:54:23 skypilot_config.py:293] using default project config file: skyconfig.yaml
D 04-14 06:54:23 skypilot_config.py:314] final config: {}
D 04-14 06:54:23 common_utils.py:549] Tried to remove /home/ubuntu/.sky/generated/ssh/t-scp-http-server-8d-8f but failed to find it. Skip.

I am developing SkyPilot v2 (https://docs.google.com/document/d/1oWox3qb3Kz3wXXSGg9ZJWwijoa99a3PIQUHBR8UgEGs/edit?pli=1&tab=t.0) support for SCP. Once it is complete, the legacy code error should no longer occur.

Would it be better to run the open port test in a separate PR, after I finish developing SkyPilot v2 for SCP?

@cblmemo
Collaborator

cblmemo commented Apr 14, 2025

Hi @hyoxt121 , iiuc you need to check out this branch (hyoxt121:master) to run the related experiments. Have you done that? It seems like you are using the master branch?

Also, I think this already includes parts of the implementation in the new provisioner API (i.e. the v2 API), since you are calling the provision lib to open ports. I think it would be clearer to split those into two separate PRs to reduce the possibility of correlated bugs :)) wdyt?

@hyoxt121
Contributor Author

hyoxt121 commented Apr 17, 2025

Hi @cblmemo
I used the master branch (hyoxt121:master, not skypilot:master). I am currently developing the new provisioner for SCP, so the open port code and the rest of the refactoring will be included in the new provisioner code. If I cannot run the open port test because of the legacy code, I think it is better to go directly to the new provisioner PR, despite the possibility of bugs.
I am writing the code now and it will not take too long. Thank you for your help :))

@cblmemo
Collaborator

cblmemo commented Apr 17, 2025

Sounds good. Then let's merge it in the new provisioner PR.

@hyoxt121
Contributor Author

Hi @cblmemo
I will link the new provisioner PR here once it is complete. I will contact you soon.
Thank you for the new PR #5288. Let us discuss it later.
