TS//RELIDO/UNCLASSIFIED MIX

20251031

CS 188 notes III

crawler.py

Markov Decision Processes (MDP)

“Markov” generally means that given the present state, the future and the past are independent
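Written out for an MDP, the statement above says the next state depends only on the current state and action. The symbols S_t (state at time t) and A_t (action at time t) are notation assumed here; the notes do not define them.

```latex
P(S_{t+1} = s' \mid S_t = s_t, A_t = a_t, S_{t-1} = s_{t-1}, A_{t-1} = a_{t-1}, \ldots, S_0 = s_0)
  = P(S_{t+1} = s' \mid S_t = s_t, A_t = a_t)
```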

know world model
-interactions with the world

(Deep) Reinforcement Learning

Trust Region Policy Optimization https://arxiv.org/abs/1502.05477

Nuclear fusion plasma control

Magnetic control of tokamak plasmas through deep reinforcement learning https://pubmed.ncbi.nlm.nih.gov/35173339/

Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155


Time-Limited Values: similar to a sitemap.xml
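A minimal sketch of time-limited values, assuming the usual definition: V_k(s) is the best expected total reward from s if the game ends in k more steps, with V_0 = 0 everywhere. The toy racing-car MDP, its numbers, and the names `TRANSITIONS` and `time_limited_values` are illustrative assumptions, not something from these notes.

```python
# Hypothetical toy MDP; states, actions, and rewards are illustrative only.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal state: no actions available
}


def time_limited_values(k, gamma=1.0):
    """Return V_k for every state, starting from V_0 = 0."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(k):
        new_values = {}
        for state, actions in TRANSITIONS.items():
            if not actions:
                # Terminal states collect no further reward.
                new_values[state] = 0.0
                continue
            # One-step lookahead: best action given the previous V_k.
            new_values[state] = max(
                sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
        values = new_values
    return values


for k in range(3):
    print(k, time_limited_values(k))
```

Each pass only needs the values from the previous pass, so V_k can be built bottom-up instead of searching a depth-k tree.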


Dangerous Optimism: Assuming chance when the world is adversarial

Dangerous Pessimism: Assuming the worst case when it’s not likely
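A small sketch of the two failure modes above, assuming a one-level game tree where the agent picks an action and then either an adversary or a uniform chance node picks the leaf. The tree and the names `TREE`, `minimax_value`, `expectimax_value`, and `best_action` are made up for illustration.

```python
# Hypothetical tree: action -> possible leaf payoffs for the agent.
TREE = {
    "left": [8, 9],
    "right": [0, 100],
}


def minimax_value(leaves):
    """Adversarial assumption: the opponent picks the worst leaf for us."""
    return min(leaves)


def expectimax_value(leaves):
    """Chance assumption: every leaf is equally likely."""
    return sum(leaves) / len(leaves)


def best_action(tree, evaluate):
    """Pick the action whose subtree scores highest under `evaluate`."""
    return max(tree, key=lambda action: evaluate(tree[action]))


optimist = best_action(TREE, expectimax_value)  # assumes chance
pessimist = best_action(TREE, minimax_value)    # assumes an adversary

# Optimism against a real adversary: the chosen subtree's worst case.
print(optimist, "vs adversary ->", minimax_value(TREE[optimist]))
# Pessimism in a random world: the chosen subtree's average.
print(pessimist, "vs chance ->", expectimax_value(TREE[pessimist]))
```

Here the optimist walks into a payoff of 0 when the world is adversarial, while the pessimist settles for 8.5 on average when it could have had 50.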

...

Human Machine Reward System: Heaven, Reincarnation, Nothingness

work harder if Nothingness?

...

Monty Hall Problem
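A sketch that enumerates the Monty Hall game exactly rather than sampling it. It assumes the standard rules: the car is placed uniformly at random, the host always opens a goat door that is not the player's pick, and the host chooses uniformly when two such doors exist. The function name `win_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probability(switch):
    """Exact probability of winning the car when switching or staying."""
    wins = Fraction(0)
    # Car position and first pick are independent and uniform: 9 cases.
    for car, pick in product(DOORS, repeat=2):
        case_weight = Fraction(1, 9)
        # Host may open any door that is neither the pick nor the car.
        host_options = [d for d in DOORS if d != pick and d != car]
        for opened in host_options:
            weight = case_weight / len(host_options)
            if switch:
                # Switch to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d != pick and d != opened)
            else:
                final = pick
            if final == car:
                wins += weight
    return wins


print("stay:", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```

Under these assumptions staying wins with probability 1/3 and switching with 2/3; changing the host's behavior changes the answer.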


Reinforcement Learning Problems

maximize expected rewards

DayDreamer: World Models for Physical Robot Learning https://autolab.berkeley.edu/assets/publications/media/2022-12-DayDreamer-CoRL.pdf

Temporal difference learning
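A minimal sketch of the TD(0) value update, assuming the standard rule V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). The function name `td_update`, the states B, C, D, and the numbers are illustrative assumptions.

```python
def td_update(values, state, reward, next_state, alpha=0.5, gamma=1.0):
    """Move V(state) toward the one-step sample r + gamma * V(next_state)."""
    current = values.get(state, 0.0)
    sample = reward + gamma * values.get(next_state, 0.0)
    values[state] = current + alpha * (sample - current)
    return values


# Hypothetical starting estimates: only D has a nonzero value so far.
values = {"B": 0.0, "C": 0.0, "D": 8.0}

# Observed transition B -> C with reward -2.
td_update(values, "B", -2.0, "C")
# Observed transition C -> D with reward -2.
td_update(values, "C", -2.0, "D")

print(values)
```

Each update uses a single observed transition, so no transition model or reward function is needed, only experience.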

Active Reinforcement Learning


Regret is a measure of your total mistake cost: the difference between your (expected) rewards and the optimal (expected) rewards




CS 188 notes https://blogbarley.blogspot.com/2025/09/cs-188-notes.html

CS 188 notes II https://blogbarley.blogspot.com/2025/10/cs-188-notes-ii.html
