【RL】Q Learning

import numpy as np
import gym
import random
import time
from IPython.display import clear_output
# Create the environment
env = gym.make("FrozenLake-v0")
# Create the Q-table and initialize all Q-values to zero for each state-action pair

action_space_size = env.action_space.n
state_space_size = env.observation_space …
more ...
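
The excerpt above stops at the Q-table setup. As a rough illustration of where that setup leads, here is a minimal sketch of tabular Q-learning. It does not use gym or FrozenLake: the five-state chain environment, the `step` and `train` helpers, and all hyperparameter values are assumptions made up for this example, chosen so the block runs with the standard library alone.

```python
import random

N_STATES = 5   # hypothetical chain: states 0..4, state 4 is the goal
N_ACTIONS = 2  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic transition; returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, all zeros
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q_table[state])
                # Break ties randomly so the untrained agent does not get stuck
                action = rng.choice(
                    [a for a in range(N_ACTIONS) if q_table[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            # Q-learning target: r + gamma * max_a' Q(s', a'), no bootstrap at the goal
            target = reward + (0.0 if done else gamma * max(q_table[next_state]))
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
    return q_table


q_table = train()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q_table[:-1]]
```

The nested list plays the role that a zero-initialized NumPy array of shape (state_space_size, action_space_size) would play in the original post; swapping in a real environment would only change `step` and the reset logic.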

【RL】User Simulator

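User Simulator Background

Why do we need a user simulator?

Drawbacks of supervised learning approaches:

  1. They require collecting large amounts of annotated human-machine and human-human training data, which is expensive and time-consuming.

  2. In addition, …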

more ...

【RL】Policy Gradient

1. Reinforcement Learning

  • Actor(Policy)

    Neural network as actor (deep) vs. lookup table (Q-learning).

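What is the advantage of using a neural network as the actor instead of a lookup table?

A lookup table cannot enumerate every possible input, e.g. image frames or language input …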
more ...
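
To make the table-versus-function contrast concrete, here is a minimal sketch of a parameterized actor. It is not the network from the post: `LinearSoftmaxActor`, its single linear layer, and the feature sizes are hypothetical stand-ins for a real neural network, kept small enough to run with the standard library. The point it illustrates is that the actor produces action probabilities for any real-valued feature vector, including ones it has never seen, whereas a table would need one row per distinct input.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSoftmaxActor:
    """Hypothetical minimal actor: one linear layer followed by softmax."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        # One weight vector per action, small random initial values
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(n_actions)
        ]

    def action_probs(self, features):
        # Logit for each action is the dot product of its weights with the input
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return softmax(logits)

    def sample(self, features, rng=random):
        # Draw an action index according to the policy's probabilities
        probs = self.action_probs(features)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


actor = LinearSoftmaxActor(n_features=4, n_actions=3)
probs = actor.action_probs([0.5, -1.2, 3.0, 0.0])  # any real-valued input works
action = actor.sample([0.5, -1.2, 3.0, 0.0])
```

In a policy gradient method the weights would then be adjusted to raise the probability of actions that led to high reward; that training step is left out of this sketch.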