Cloud: AWS
Auto scaler: Karpenter
Infra created via: Terraform
Document followed: Ray on EKS | Data on EKS
Could somebody please help me with this? My Ray job fails with ModuleNotFoundError: No module named 'xgboost'. Steps and logs are below.
- Check the Ray cluster head pod
kubectl get pod -n ray-cluster
NAME                             READY   STATUS    RESTARTS   AGE
ray-cluster-kuberay-head-b7tfn   1/1     Running   0          5h40m
- Check the Ray cluster service
kubectl get svc -n ray-cluster
NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                         AGE
ray-cluster-kuberay-head-svc   ClusterIP   172.20.11.18   <none>        8000/TCP,10001/TCP,6379/TCP,8265/TCP,8080/TCP   5h27m
- Job submission
export RAY_ADDRESS="http://localhost:8265"
ray job submit --working-dir ./ -- python xgboost_submit.py
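For reference, the same submission can also be written with the Ray Jobs Python SDK. This is only a sketch, assuming the dashboard port 8265 is reachable on localhost (for example through a port-forward) and that Ray is installed on the submitting machine. `build_runtime_env` and `submit` are hypothetical helper names, not part of Ray. The `pip` field of `runtime_env` is a Ray feature that installs packages per job, which may matter when the cluster image lacks a dependency.

```python
from typing import Any, Dict, List, Optional


def build_runtime_env(working_dir: str = "./",
                      pip_packages: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a Ray runtime_env dict (hypothetical helper, not part of Ray)."""
    env: Dict[str, Any] = {"working_dir": working_dir}
    if pip_packages:
        # Ray installs these packages for the job before running the entrypoint.
        env["pip"] = list(pip_packages)
    return env


def submit(address: str = "http://127.0.0.1:8265",
           entrypoint: str = "python xgboost_submit.py",
           pip_packages: Optional[List[str]] = None) -> str:
    """Submit the job through the Ray Jobs SDK and return the submission ID."""
    # Imported lazily so build_runtime_env works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env=build_runtime_env(pip_packages=pip_packages),
    )
```

Calling `submit(pip_packages=["xgboost"])` would be roughly equivalent to the CLI command above with an added per-job pip requirement. I have not confirmed that this suits the Data on EKS setup.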
- Job logs
ray job logs 'raysubmit_P1JXkHSwsv2Shft1' --follow --address http://127.0.0.1:8265
Job submission server address: http://localhost:8265
fatal: destination path 'ray' already exists and is not an empty directory.
Traceback (most recent call last):
  File "ray/release/air_tests/air_benchmarks/workloads/xgboost_benchmark.py", line 11, in <module>
    import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'
---------------------------------------
Job 'raysubmit_P1JXkHSwsv2Shft1' failed
---------------------------------------
Status message: Job failed due to an application error, last available logs (truncated to 20,000 chars):
fatal: destination path 'ray' already exists and is not an empty directory.
Traceback (most recent call last):
  File "ray/release/air_tests/air_benchmarks/workloads/xgboost_benchmark.py", line 11, in <module>
    import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'
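
To narrow down whether the module is missing from the Python environment inside the cluster, rather than only on my local machine, a small check script could be submitted the same way as the job above. This is a sketch only: the file name `check_env.py` and the helper `is_installed` are hypothetical, and the list of modules is my guess at what the benchmark needs.

```python
# check_env.py (hypothetical file name)
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Print the interpreter path so it is clear which environment ran the check.
    print("python:", sys.executable)
    for name in ("ray", "xgboost"):  # assumed list of modules to check
        print(f"{name}: {'found' if is_installed(name) else 'MISSING'}")
```

Submitting it with `ray job submit --working-dir ./ -- python check_env.py` should show which of these modules the job's interpreter can import.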