Correct usage of tune sampling in AlgorithmConfig dicts

I know a similar topic came up in this old thread, but no answer was ever posted there.

Basically, the same thing is happening to me now. I would like to use the new AlgorithmConfig API and define two parameters in the .training() section via tune sampling. Note that I am deliberately using framework "tf", because "tf2" always leads to memory crashes on my workstation (even with eager_tracing=False).

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# MyCustomEnv, MyCallback and MyTFModelV2 are my own classes (imports omitted).
trainer_config = (
    PPOConfig()
    .environment(MyCustomEnv)
    .checkpointing(
        export_native_model_files=True,
        checkpoint_trainable_policies_only=True,
    )
    .framework("tf")
    .callbacks(MyCallback)
    .rollouts(num_rollout_workers=1)
    .resources(num_cpus_per_worker=1)
    .debugging(log_level="INFO")
    .training(
        model={"custom_model": MyTFModelV2},
        train_batch_size=10,
        sgd_minibatch_size=5,
        num_sgd_iter=2,
        gamma=tune.grid_search([0.9, 0.99, 0.999]),
        lr=tune.loguniform(1e-4, 1e-1),
    )
)
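
For completeness, this is roughly how I hand the config over to Tune (the stopping criterion, metric and mode below are just placeholders, not my exact settings):

from ray import air, tune

tuner = tune.Tuner(
    "PPO",
    # to_dict() keeps the tune.grid_search / tune.loguniform objects as-is
    # (see the dump below); Tune should then resolve them per trial.
    param_space=trainer_config.to_dict(),
    tune_config=tune.TuneConfig(metric="episode_reward_mean", mode="max"),
    run_config=air.RunConfig(stop={"training_iteration": 10}),
)
results = tuner.fit()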

Sorry for the wall of text, but since @kai also asked for the trainer_config in the thread linked above, I post the full dump at the bottom of this post.
→ Could the issue be related to the fact that tune.loguniform() returns a Float object instead of a plain number? And if so, how do I work around it?
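
Just to double-check that assumption, here is a tiny snippet (not from my training script) showing what tune.loguniform() actually returns; it matches the ray.tune.search.sample.Float entry in the dump below:

from ray import tune

lr_space = tune.loguniform(1e-4, 1e-1)
print(type(lr_space))     # <class 'ray.tune.search.sample.Float'>
print(lr_space.sample())  # draws one value from the log-uniform range

As far as I understand, Tune is supposed to replace these placeholder objects with concrete values when it builds the individual trials — please correct me if that assumption is wrong.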

{'_disable_action_flattening': False,
 '_disable_execution_plan_api': True,
 '_disable_preprocessor_api': False,
 '_enable_rl_module_api': False,
 '_enable_rl_trainer_api': False,
 '_fake_gpus': False,
 '_rl_trainer_hps': RLTrainerHPs(),
 '_tf_policy_handles_more_than_one_loss': False,
 'action_space': None,
 'actions_in_input_normalized': False,
 'always_attach_evaluation_results': False,
 'auto_wrap_old_gym_envs': True,
 'batch_mode': 'truncate_episodes',
 'callbacks': <class 'rl_chem_pps.callbacks.MyCallback'>,
 'checkpoint_trainable_policies_only': True,
 'clip_actions': False,
 'clip_param': 0.3,
 'clip_rewards': None,
 'compress_observations': False,
 'create_env_on_driver': False,
 'custom_eval_function': None,
 'custom_resources_per_worker': {},
 'disable_env_checking': False,
 'eager_max_retraces': 20,
 'eager_tracing': False,
 'enable_async_evaluation': False,
 'enable_connectors': True,
 'enable_tf1_exec_eagerly': False,
 'entropy_coeff': 0.0,
 'entropy_coeff_schedule': None,
 'env': <class 'rl_chem_pps.MyCustomEnv'>,
 'env_task_fn': None,
 'evaluation_config': None,
 'evaluation_duration': 10,
 'evaluation_duration_unit': 'episodes',
 'evaluation_interval': None,
 'evaluation_num_workers': 0,
 'evaluation_parallel_to_training': False,
 'evaluation_sample_timeout_s': 180.0,
 'exploration_config': {'type': 'StochasticSampling'},
 'explore': True,
 'export_native_model_files': True,
 'extra_python_environs_for_driver': {},
 'extra_python_environs_for_worker': {},
 'fake_sampler': False,
 'framework': 'tf',
 'gamma': {'grid_search': [0.9, 0.99, 0.999]},
 'grad_clip': None,
 'horizon': -1,
 'ignore_worker_failures': False,
 'in_evaluation': False,
 'input': 'sampler',
 'input_config': {},
 'is_atari': None,
 'keep_per_episode_custom_metrics': False,
 'kl_coeff': 0.2,
 'kl_target': 0.01,
 'lambda': 1.0,
 'local_tf_session_args': {'inter_op_parallelism_threads': 8,
                           'intra_op_parallelism_threads': 8},
 'log_level': 'INFO',
 'log_sys_usage': True,
 'logger_config': None,
 'logger_creator': None,
 'lr': <ray.tune.search.sample.Float object at 0x000002454A7AC460>,
 'lr_schedule': None,
 'max_requests_in_flight_per_sampler_worker': 2,
 'metrics_episode_collection_timeout_s': 60.0,
 'metrics_num_episodes_for_smoothing': 100,
 'min_sample_timesteps_per_iteration': 0,
 'min_time_s_per_iteration': None,
 'min_train_timesteps_per_iteration': 0,
 'model': {'_disable_action_flattening': False,
           '_disable_preprocessor_api': False,
           '_time_major': False,
           '_use_default_native_models': -1,
           'attention_dim': 64,
           'attention_head_dim': 32,
           'attention_init_gru_gate_bias': 2.0,
           'attention_memory_inference': 50,
           'attention_memory_training': 50,
           'attention_num_heads': 1,
           'attention_num_transformer_units': 1,
           'attention_position_wise_mlp_dim': 32,
           'attention_use_n_prev_actions': 0,
           'attention_use_n_prev_rewards': 0,
           'conv_activation': 'relu',
           'conv_filters': None,
           'custom_action_dist': None,
           'custom_model': <class 'rl_chem_pps.models.WkActionMaskModel'>,
           'custom_model_config': {},
           'custom_preprocessor': None,
           'dim': 84,
           'fcnet_activation': 'tanh',
           'fcnet_hiddens': [256, 256],
           'framestack': True,
           'free_log_std': False,
           'grayscale': False,
           'lstm_cell_size': 256,
           'lstm_use_prev_action': False,
           'lstm_use_prev_action_reward': -1,
           'lstm_use_prev_reward': False,
           'max_seq_len': 20,
           'no_final_linear': False,
           'post_fcnet_activation': 'relu',
           'post_fcnet_hiddens': [],
           'use_attention': False,
           'use_lstm': False,
           'vf_share_layers': False,
           'zero_mean': True},
 'multiagent': {'count_steps_by': 'env_steps',
                'observation_fn': None,
                'policies': {'default_policy': (None, None, None, None)},
                'policies_to_train': None,
                'policy_map_cache': -1,
                'policy_map_capacity': 100,
                'policy_mapping_fn': <function AlgorithmConfig.__init__.<locals>.<lambda> at 0x000002454A792940>},
 'no_done_at_end': -1,
 'normalize_actions': True,
 'num_consecutive_worker_failures_tolerance': 100,
 'num_cpus_for_driver': 1,
 'num_cpus_per_trainer_worker': 1,
 'num_cpus_per_worker': 1,
 'num_envs_per_worker': 1,
 'num_gpus': 0,
 'num_gpus_per_trainer_worker': 0,
 'num_gpus_per_worker': 0,
 'num_sgd_iter': 2,
 'num_trainer_workers': 0,
 'num_workers': 1,
 'observation_filter': 'NoFilter',
 'observation_space': None,
 'off_policy_estimation_methods': {},
 'offline_sampling': False,
 'ope_split_batch_by_episode': True,
 'optimizer': {},
 'output': None,
 'output_compress_columns': ['obs', 'new_obs'],
 'output_config': {},
 'output_max_file_size': 67108864,
 'placement_strategy': 'PACK',
 'policies': {'default_policy': <ray.rllib.policy.policy.PolicySpec object at 0x000002454A7AC5E0>},
 'policy_states_are_swappable': False,
 'postprocess_inputs': False,
 'preprocessor_pref': 'deepmind',
 'recreate_failed_workers': False,
 'remote_env_batch_wait_ms': 0,
 'remote_worker_envs': False,
 'render_env': False,
 'replay_sequence_length': None,
 'restart_failed_sub_environments': False,
 'rl_module_class': None,
 'rl_trainer_class': None,
 'rollout_fragment_length': 'auto',
 'sample_async': False,
 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>,
 'sampler_perf_stats_ema_coef': None,
 'seed': None,
 'sgd_minibatch_size': 5,
 'shuffle_buffer_size': 0,
 'shuffle_sequences': True,
 'simple_optimizer': -1,
 'soft_horizon': -1,
 'sync_filters_on_rollout_workers_timeout_s': 60.0,
 'synchronize_filters': True,
 'tf_session_args': {'allow_soft_placement': True,
                     'device_count': {'CPU': 1},
                     'gpu_options': {'allow_growth': True},
                     'inter_op_parallelism_threads': 2,
                     'intra_op_parallelism_threads': 2,
                     'log_device_placement': False},
 'train_batch_size': 10,
 'use_critic': True,
 'use_gae': True,
 'validate_workers_after_construction': True,
 'vf_clip_param': 10.0,
 'vf_loss_coeff': 1.0,
 'vf_share_layers': -1,
 'worker_cls': None,
 'worker_health_probe_timeout_s': 60,
 'worker_restore_timeout_s': 1800}