Roll out CQL policy

Hi @fksvensson ,

I guess that the Python version might be the problem. I would use pyenv (on macOS you can install it via brew) to pull Python 3.9 and then run:

pyenv install 3.9.4
mkdir ray-1.8.0 
cd ray-1.8.0
pyenv local 3.9.4
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install numpy tensorflow
python -m pip install "ray[default]" 

and you should be good to go. That's the setup I use: it lets me pick a Python version per project and install the packages against that version in a virtual env.
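To sanity-check the setup (nothing RLlib-specific), you can verify that the venv picked up the right interpreter and that Ray imports cleanly:

python --version
python -c "import ray; print(ray.__version__)"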

Regarding your problem:

  1. Have you registered your input reader via register_input("custom_input", input_creator) before referencing it in your config? (See the sketch after this list.)
  2. Have you tested your input reader? Does it actually read the samples and return a SampleBatch as needed?
  3. How did you write your outputs? Did you use an output writer you coded yourself, or the default one?
  4. Do you actually need the offline input at all, or can you evaluate your policy by running it against the environment directly?
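For points 1 and 2, here is a minimal sketch of registering and smoke-testing a custom input reader. MyInputReader and the dummy batch contents are placeholders for illustration; the SampleBatch fields have to match what CQL expects from your dataset:

import numpy as np

from ray.rllib.offline import InputReader, IOContext
from ray.rllib.policy.sample_batch import SampleBatch
from ray.tune.registry import register_input

class MyInputReader(InputReader):
    """Hypothetical reader that serves one dummy transition per call."""

    def __init__(self, ioctx):
        self.ioctx = ioctx

    def next(self):
        # next() must return a SampleBatch (or MultiAgentBatch).
        return SampleBatch({
            SampleBatch.OBS: np.zeros((1, 4), dtype=np.float32),
            SampleBatch.ACTIONS: np.array([0]),
            SampleBatch.REWARDS: np.array([1.0], dtype=np.float32),
            SampleBatch.DONES: np.array([True]),
            SampleBatch.NEXT_OBS: np.zeros((1, 4), dtype=np.float32),
        })

def input_creator(ioctx):
    return MyInputReader(ioctx)

# Register first, then reference the key in your config.
register_input("custom_input", input_creator)

config = {
    "input": "custom_input",
    # ... your other CQL settings
}

# Quick standalone test of the reader, independent of any trainer:
reader = input_creator(IOContext())
print(reader.next())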

I actually do not understand why you include all the training parameters when you want to do evaluation. Evaluation can be done quite easily by running

rllib rollout \
    ~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1 \
    --run DQN --env CartPole-v0 --steps 10000

So you pass your checkpoint, then the policy (DQN here), then your env, and the number of steps you want to evaluate for. The workers sample experiences from the environment using the trained policy, so there is no need for an input reader. You can also do custom evaluation, as shown in this example.
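If you would rather evaluate from within a script instead of the rllib rollout CLI, a sketch along these lines should also work (using the standard trainer config keys from Ray 1.8; swap in your own checkpoint path and algorithm):

import os

from ray.rllib.agents.dqn import DQNTrainer

config = {
    "env": "CartPole-v0",
    "evaluation_interval": 1,      # evaluate after every train() call
    "evaluation_num_episodes": 10,
    "evaluation_config": {
        "explore": False,          # act greedily during evaluation
        "input": "sampler",        # sample from the live env, not offline data
    },
}

trainer = DQNTrainer(config=config)
trainer.restore(os.path.expanduser(
    "~/ray_results/default/DQN_CartPole-v0_0upjmdgr0/checkpoint_1/checkpoint-1"))
# Note: train() also runs one training iteration; for pure rollouts the
# CLI command above is the simpler route.
results = trainer.train()
print(results["evaluation"]["episode_reward_mean"])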

If you still need something else, you can post your code here and we can take a look at it.

Best, Simon