CS 188 notes III
Markov Decision Processes (MDPs)
“Markov” generally means that given the present state, the future and the past are independent
Trust Region Policy Optimization https://arxiv.org/abs/1502.05477
Nuclear fusion plasma control
Magnetic control of tokamak plasmas through deep reinforcement learning https://pubmed.ncbi.nlm.nih.gov/35173339/
Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155
Time-Limited Values similar to HTML sitemap.xml
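A minimal sketch of time-limited values, assuming V_k(s) means the best expected total reward from state s when only k steps remain. The toy MDP below (states cool, warm, overheated and their rewards) is a hypothetical example, not something taken from these notes, and the function name and gamma parameter are illustrative choices.

```python
# Hypothetical toy MDP for illustration only.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal state: no actions available
}


def time_limited_values(transitions, k, gamma=1.0):
    """Return V_k for every state by applying k Bellman backups starting from V_0 = 0."""
    values = {s: 0.0 for s in transitions}  # V_0: no steps left, no reward
    for _ in range(k):
        new_values = {}
        for s, actions in transitions.items():
            if not actions:  # terminal states keep value 0
                new_values[s] = 0.0
                continue
            # Best action = max over actions of expected (reward + discounted future value)
            new_values[s] = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        values = new_values
    return values


print(time_limited_values(TRANSITIONS, 2))
# {'cool': 3.5, 'warm': 2.5, 'overheated': 0.0}
```

Each pass builds V_{k+1} from V_k only, so the loop is one step of value iteration repeated k times.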
Dangerous Optimism: Assuming chance when the world is adversarial
Dangerous Pessimism: Assuming the worst case when it’s not likely
...
Human Machine Reward System: Heaven, Reincarnation, Nothingness
work harder if Nothingness?
...
Monty Hall Problem
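A small sketch that enumerates the Monty Hall problem exactly, assuming the standard rules: the car is placed uniformly at random, the host always opens a goat door that is not the contestant's pick, and the host chooses uniformly when two such doors exist. The function name is an illustrative choice.

```python
from fractions import Fraction


def monty_hall_win_probability(switch):
    """Exact probability of winning the car when always switching or always staying."""
    doors = [0, 1, 2]
    win = Fraction(0)
    for car in doors:
        for pick in doors:
            # Car location and first pick are independent and uniform: 1/9 per pair.
            host_options = [d for d in doors if d != pick and d != car]
            for opened in host_options:
                p = Fraction(1, 9) / len(host_options)
                if switch:
                    # Switch to the one door that is neither the pick nor the opened door.
                    final = next(d for d in doors if d != pick and d != opened)
                else:
                    final = pick
                if final == car:
                    win += p
    return win


print(monty_hall_win_probability(switch=False))  # 1/3
print(monty_hall_win_probability(switch=True))   # 2/3
```

Under these assumptions, switching wins exactly when the first pick was wrong, which happens 2/3 of the time.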
Reinforcement Learning Problems
maximize expected rewards
DayDreamer: World Models for Physical Robot Learning https://autolab.berkeley.edu/assets/publications/media/2022-12-DayDreamer-CoRL.pdf
Temporal difference learning
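A minimal sketch of the usual TD(0) value update, V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)), applied to one observed transition. The state names, the learning rate alpha, and the discount gamma below are illustrative assumptions, not values from these notes.

```python
def td_update(values, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Nudge the estimate for `state` toward the one-step sample; unseen states count as 0."""
    sample = reward + gamma * values.get(next_state, 0.0)  # one-step return estimate
    current = values.get(state, 0.0)
    values[state] = current + alpha * (sample - current)   # move a fraction alpha toward the sample
    return values[state]


# Hypothetical transition: from C, receive reward -2 and land in D (currently valued at 8).
values = {"C": 0.0, "D": 8.0}
print(td_update(values, "C", reward=-2.0, next_state="D"))  # 3.0
```

The update needs no transition model: it learns from each experienced (s, r, s') sample as it arrives.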
Active Reinforcement Learning
Regret is a measure of your total mistake cost: the difference between your (expected) rewards and the optimal (expected) rewards
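A rough sketch of tallying regret over an episode, assuming the per-step reward of an optimal policy is known and using the rewards actually received as a stand-in for expected rewards. The function name and the numbers are hypothetical.

```python
def total_regret(optimal_reward_per_step, rewards_received):
    """Sum of per-step shortfalls relative to the optimal per-step reward."""
    return sum(optimal_reward_per_step - r for r in rewards_received)


# Hypothetical learner that explores early and acts optimally later.
print(total_regret(1.0, [0.0, 0.5, 1.0, 1.0]))  # 1.5
```

Two learners can both end up optimal yet differ in regret, because regret also counts the cost of mistakes made while learning.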
CS 188 notes https://blogbarley.blogspot.com/2025/09/cs-188-notes.html
CS 188 notes II https://blogbarley.blogspot.com/2025/10/cs-188-notes-ii.html