Thank you for the notebook. But it is not clear to me: since we use RL to solve the problem, where is the action-value function or the state-action pair? Secondly, I don't see where we reinforce the agent so that it learns from interaction, that is, from the states it visits and the actions it takes in them.
You can choose not to use RL to solve this problem. Before using any fancy stuff, just try to run the notebook and propose a pdb for us to score on the Zindi platform. If you want to use RL, you are welcome to. Treat this notebook as part of the environment, where the score you get is the reward from that environment.
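To make the "score as reward" idea a bit more concrete, here is a minimal sketch of how it could look in Python. Everything in it is an assumption for illustration, not the notebook's code: `ScoreAsRewardEnv`, `toy_score`, and `q_update` are hypothetical names, the candidate is just a short string of symbols, and `toy_score` is a stand-in for the real score, which only comes from submitting on Zindi. The `q` dictionary keyed by `(state, action)` is where the state-action values asked about above would live.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[str, ...]
Action = Tuple[int, str]  # (position to edit, symbol to place there)


class ScoreAsRewardEnv:
    """Hypothetical wrapper: the candidate is the state, an edit is the action,
    and the score of the edited candidate is the reward."""

    def __init__(self, score_fn: Callable[[Sequence[str]], float], initial: str):
        self.score_fn = score_fn
        self.initial = initial
        self.state: List[str] = list(initial)

    def reset(self) -> State:
        self.state = list(self.initial)
        return tuple(self.state)

    def step(self, action: Action) -> Tuple[State, float]:
        position, symbol = action
        self.state[position] = symbol
        reward = self.score_fn(self.state)  # the score plays the role of the reward
        return tuple(self.state), reward


def toy_score(candidate: Sequence[str]) -> float:
    """Stand-in score (fraction of 'A' symbols); NOT the competition metric."""
    return sum(1 for s in candidate if s == "A") / len(candidate)


def q_update(q: Dict[Tuple[State, Action], float], state: State, action: Action,
             reward: float, next_state: State, actions: List[Action],
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One tabular Q-learning update on the state-action value table."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


# Usage: one interaction step followed by one learning update.
env = ScoreAsRewardEnv(toy_score, "CCCC")
actions = [(i, s) for i in range(4) for s in "AC"]
q: Dict[Tuple[State, Action], float] = {}

state = env.reset()
next_state, reward = env.step((0, "A"))
q_update(q, state, (0, "A"), reward, next_state, actions)
```

In a real setup the reward would arrive much more slowly, since each score needs a submission, so a local proxy score like the one above would probably be needed for training.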
Thanks Zindi!
It's cool but it's a bit difficult understanding the
Mathenge123 Let me know if I can help get you started