MAML Discrete actions update

I noticed that there is a commit (47b499d899f8eccc00905992a6603ede97d1d44e) with the message "Cartpole MAML + Discrete". Does this mean that MAML now handles discrete actions?

Before noticing that the MAML documentation states that it doesn't handle discrete action spaces, I tried setting it up to run on an environment with a discrete action space, and it runs… but I'm guessing I shouldn't expect learning to occur?


It should work for PyTorch, yes!

I have added a test case for this as well:

Here is the PR (with the CartPole task env and a simple test added): [RLlib] MAML: Add cartpole mass test for PyTorch. by sven1977 · Pull Request #13679 · ray-project/ray · GitHub
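If you want to try it yourself, here is a rough sketch of what running it could look like. The module path ray.rllib.examples.env.cartpole_mass and all hyperparameter values below are assumptions for illustration, not the exact test from the PR:

```python
# Sketch only: assumes the CartPole mass task env from the PR is importable
# from ray.rllib.examples.env.cartpole_mass; config values are illustrative.
import ray
from ray import tune
from ray.rllib.examples.env.cartpole_mass import CartPoleMassEnv

ray.init()
tune.run(
    "MAML",
    config={
        "env": CartPoleMassEnv,       # discrete-action, task-settable env
        "framework": "torch",         # discrete actions work on PyTorch
        "num_workers": 2,             # one task sampled per worker
        "inner_adaptation_steps": 1,  # MAML inner-loop gradient steps
    },
    stop={"training_iteration": 10},
)
```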


It does seem to be working for me, so that is awesome! I can't run with more than 2 workers, though. I'll post the traceback at the end, but it looks to me like the default value set for the split variable on line 274 doesn't work with the loop in the MAMLLoss __init__ at line 167 when num_workers > 2. I'm going to see if I can make a change to fix it, but if you know a solution, that would be great! 🙂 Especially since my solution may just be a quick hack.

```
Traceback (most recent call last):
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/tune/", line 519, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/tune/", line 497, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/", line 1379, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray::MAML.train() (pid=73302, ip=
  File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 456, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/", line 106, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/", line 465, in __init__
    super().__init__(config, logger_creator)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/tune/", line 96, in __init__
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/", line 629, in setup
    self._init(self.config, self.env_creator)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/", line 133, in _init
    self.workers = self._make_workers(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/", line 700, in _make_workers
    return WorkerSet(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/evaluation/", line 87, in __init__
    self._local_worker = self._make_worker(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/evaluation/", line 315, in _make_worker
    worker = cls(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/evaluation/", line 462, in __init__
    self.policy_map, self.preprocessors = self._build_policy_map(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/evaluation/", line 1076, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/policy/", line 249, in __init__
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/policy/", line 654, in _initialize_loss_from_dummy_batch
    self._loss(self, self.model, self.dist_class, train_batch)
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/maml/", line 277, in maml_loss
    policy.loss_obj = MAMLLoss(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/maml/", line 173, in __init__
    ppo_loss, _, inner_kl_loss, _, _ = self.compute_losses(
  File "/Users/me/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/agents/maml/", line 216, in compute_losses
    obs = self.obs[inner_adapt_iter][task_iter]
IndexError: tuple index out of range
```
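In case it helps, here's a minimal, self-contained illustration of what I think is going on; the names and shapes are mine, not the actual RLlib internals. The dummy split used when the loss is first built is sized for two tasks, so the per-task loop indexes past the end of the tuple once there are more than two workers:

```python
# Illustrative only: mimics splitting a flat dummy batch into per-task
# chunks with a `split` spec sized for 2 tasks, then iterating 3 tasks.
import torch

num_tasks = 3                    # e.g. num_workers = 3
split = torch.tensor([[8, 8]])   # default layout sized for only 2 tasks

batch = torch.randn(16, 4)                   # flat dummy observation batch
obs = torch.split(batch, split[0].tolist())  # -> tuple of only 2 chunks

try:
    for task_iter in range(num_tasks):
        chunk = obs[task_iter]   # fails at task_iter == 2
except IndexError as e:
    print(f"IndexError at task {task_iter}: {e}")
```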

I have parallel environments working with modified RLlib code here:

The code is running, and I don’t think my “fix” broke anything else, but please let me know if it did!
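The gist of my change, as a rough sketch (the helper name and dummy batch size are hypothetical, not the actual diff): size the dummy split by the number of workers so the per-task loop stays in range during loss initialization:

```python
# Sketch of the idea behind the fix (hypothetical helper, not the actual
# RLlib code): size the dummy `split` by num_workers so MAMLLoss's
# per-task loop never runs out of range.
import torch

def make_dummy_split(config, dummy_size=8):
    inner_steps = config["inner_adaptation_steps"]
    num_tasks = config["num_workers"]  # MAML samples one task per worker
    # One row per adaptation step (plus one, assuming a post-adaptation
    # rollout), one column per task; entries are timestep counts per slice.
    return torch.full((inner_steps + 1, num_tasks), dummy_size,
                      dtype=torch.int32)
```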

Awesome, @Chace_Ashcraft! Did you create a PR so we can merge your fix into master?

@sven1977 I had not, but I created one just now: Pytorch MAML fix for more than two workers with discrete actions by ChaceAshcraft · Pull Request #13835 · ray-project/ray · GitHub. Hope it helps! Thanks for your awesome work on Ray and RLlib, by the way!