Commit c489694

azzhipa authored and facebook-github-bot committed
fix: ray module not found handling (#1055)
Summary:
TorchX has been handling `ModuleNotFoundError` gracefully for a while now. For example, for SageMaker, when running `torchx runopts` we get:

```
...
(remote jobs) the image repository to use when pushing patched images, must have push access. Ex: example.com/your/container
quiet=QUIET (bool, False)
    whether to suppress verbose output for image building. Defaults to ``False``.

aws_sagemaker:
    No module named 'sagemaker'

gcp_batch:
    usage:
        [project=PROJECT],[location=LOCATION]
...
```

But for `ray` we get an exception, after which the remaining runopts are not printed:

```
gcp_batch:
    usage:
        [project=PROJECT],[location=LOCATION]

    optional arguments:
        project=PROJECT (str, None)
            Name of the GCP project. Defaults to the configured GCP project in the environment
        location=LOCATION (str, us-central1)
            Name of the location to schedule the job in. Defaults to us-central1

Traceback (most recent call last):
  File "/usr/local/bin/torchx", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torchx/cli/main.py", line 118, in main
    run_main(get_sub_cmds(), argv)
  File "/usr/local/lib/python3.10/dist-packages/torchx/cli/main.py", line 114, in run_main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/torchx/cli/cmd_runopts.py", line 36, in run
    opts = runner.scheduler_run_opts(scheduler)
  File "/usr/local/lib/python3.10/dist-packages/torchx/runner/api.py", line 473, in scheduler_run_opts
    return self._scheduler(scheduler).run_opts()
  File "/usr/local/lib/python3.10/dist-packages/torchx/runner/api.py", line 718, in _scheduler
    sched = factory(self._name, **self._scheduler_params)
  File "/usr/local/lib/python3.10/dist-packages/torchx/schedulers/__init__.py", line 39, in run
    module = importlib.import_module(path)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/torchx/schedulers/ray_scheduler.py", line 448, in <module>
    session_name: str, ray_client: Optional[JobSubmissionClient] = None, **kwargs: Any
NameError: name 'JobSubmissionClient' is not defined
```

That's because `ray_scheduler` has its own custom `ModuleNotFoundError` handling, perhaps for historic reasons.

Test Plan:
[x] existing tests must pass

Reviewed By: tonykao8080

Differential Revision: D73751531

Pulled By: andywag
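The difference between the two behaviours described in the summary can be sketched in a few lines. This is an illustration only, not TorchX code: `describe_backends`, the two source strings, and the package name `hypothetical_missing_pkg_xyz` are all made up for the example, and a module-level assignment stands in for the type annotation that triggered the `NameError` in the traceback above.

```python
from typing import Dict

# Assumption: a package name that is not installed anywhere.
MISSING = "hypothetical_missing_pkg_xyz"

# A backend that lets the failed import propagate to the caller.
PROPAGATING_BACKEND = f"from {MISSING} import Client\n"

# A backend that swallows the failed import itself and then
# references the name that was never bound.
SWALLOWING_BACKEND = f"""
try:
    from {MISSING} import Client
except ModuleNotFoundError:
    pass
default_client_cls = Client
"""


def describe_backends(sources: Dict[str, str]) -> Dict[str, str]:
    """Load each backend; report a missing dependency instead of raising."""
    report: Dict[str, str] = {}
    for name, source in sources.items():
        try:
            exec(source, {})  # stand-in for importing the backend module
            report[name] = "ok"
        except ModuleNotFoundError as exc:
            # Only a missing module is handled here; any other
            # exception, such as NameError, still reaches the caller.
            report[name] = str(exc)
    return report
```

With `PROPAGATING_BACKEND`, the caller's handler turns the failure into a one-line message and moves on to the next backend. With `SWALLOWING_BACKEND`, the inner `except` hides the real cause, so the later `NameError` escapes the loop and no further backends are listed.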
1 parent 16cefac commit c489694

3 files changed: +990 -1018 lines changed

docs/source/schedulers/ray.rst

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ Ray
 :show-inheritance:

 .. autofunction:: create_scheduler
-.. autofunction:: has_ray
 .. autofunction:: serialize

 .. autoclass:: RayJob