
2025-10-31

CS 188 notes III


crawler.py

Markov Decision Processes (MDPs)

“Markov” generally means that given the present state, the future and the past are independent
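In symbols, the Markov property says the next state depends only on the current state, not on the full history:

```latex
P(S_{t+1} = s' \mid S_t = s_t, S_{t-1} = s_{t-1}, \ldots, S_0 = s_0)
  = P(S_{t+1} = s' \mid S_t = s_t)
```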

known world model (MDPs: offline planning)
- vs. learning from interactions with the world (reinforcement learning)

(Deep) Reinforcement Learning

Trust Region Policy Optimization https://arxiv.org/abs/1502.05477

Nuclear fusion plasma control

Magnetic control of tokamak plasmas through deep reinforcement learning https://pubmed.ncbi.nlm.nih.gov/35173339/

Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155


Time-Limited Values similar to HTML sitemap.xml
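A minimal sketch of time-limited values: V_k(s) is the best expected sum of rewards achievable from s with k steps left, computed bottom-up by value iteration. The two-state "racing" MDP here (states, actions, rewards) is invented for illustration.

```python
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1)],
        "fast": [(0.5, "cool", 2), (0.5, "warm", 2)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1), (0.5, "warm", 1)],
        "fast": [(1.0, "overheated", -10)],
    },
    "overheated": {},  # terminal: no actions available
}

def value_iteration(k, gamma=1.0):
    """Return V_k for every state after k rounds of the Bellman update."""
    V = {s: 0.0 for s in transitions}  # V_0 = 0 everywhere
    for _ in range(k):
        # Each round looks one more step ahead, using the previous V.
        V = {
            s: max(
                (sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                 for outcomes in acts.values()),
                default=0.0,  # terminal states keep value 0
            )
            for s, acts in transitions.items()
        }
    return V

print(value_iteration(2))  # e.g. V_2("cool") = 3.5 via "fast"
```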


Dangerous Optimism: Assuming chance when the world is adversarial

Dangerous Pessimism: Assuming the worst case when it’s not likely
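The two failure modes can be seen on one toy decision (action names and payoffs invented): a "safe" action always pays 1, a "risky" one pays 10 or -6 depending on the other agent or on chance. Expectimax (assuming chance) and minimax (assuming an adversary) disagree about which to take.

```python
actions = {"safe": [1, 1], "risky": [10, -6]}

def expectimax_choice(actions):
    # Optimistic view: outcomes are uniform random chance.
    return max(actions, key=lambda a: sum(actions[a]) / len(actions[a]))

def minimax_choice(actions):
    # Pessimistic view: an adversary picks the worst outcome.
    return max(actions, key=lambda a: min(actions[a]))

print(expectimax_choice(actions))  # "risky": average 2 beats 1
print(minimax_choice(actions))     # "safe": worst case 1 beats -6
```

If the world really is adversarial, the optimist loses 6 every time; if outcomes really are chance, the pessimist forgoes expected value. Each assumption is only as good as its match to the world.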

...

Human Machine Reward System: Heaven, Reincarnation, Nothingness

work harder if Nothingness?

...

Monty Hall Problem
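A quick Monte Carlo check of the Monty Hall problem (trial count and seed are arbitrary): switching wins about 2/3 of the time, staying only about 1/3.

```python
import random

def play(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
trials = 100_000
stay = sum(play(False, rng) for _ in range(trials)) / trials
swap = sum(play(True, rng) for _ in range(trials)) / trials
print(f"stay: {stay:.3f}  switch: {swap:.3f}")  # roughly 0.333 vs 0.667
```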


Reinforcement Learning Problems

maximize the expected (discounted) sum of rewards

DayDreamer: World Models for Physical Robot Learning https://autolab.berkeley.edu/assets/publications/media/2022-12-DayDreamer-CoRL.pdf

Temporal difference learning
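A minimal sketch of TD(0) value learning: after each observed transition (s, r, s'), nudge V(s) toward the sample r + gamma * V(s') by a learning rate alpha. The states and the fixed-policy episode data here are made up.

```python
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    # Move V(s) a fraction alpha toward the one-step sample.
    sample = r + gamma * V[s_next]
    V[s] += alpha * (sample - V[s])

V = defaultdict(float)
# Transitions observed while following some fixed policy (illustrative).
episode = [("A", 1, "B"), ("B", 0, "C"), ("C", 10, "exit")]
for _ in range(1000):
    for s, r, s_next in episode:
        td0_update(V, s, r, s_next)

print(dict(V))  # converges toward V(C)=10, V(B)=10, V(A)=11
```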

Active Reinforcement Learning
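A tiny sketch of active RL: Q-learning with epsilon-greedy exploration, where the agent chooses its own actions, sometimes exploring at random. The two-state chain environment (states, rewards, hyperparameters) is invented for illustration.

```python
import random

def step(s, a):
    """Deterministic toy environment: 'right' moves toward the exit reward."""
    if s == 0:
        return (1, 0) if a == "right" else (0, 0)
    return ("exit", 10) if a == "right" else (0, 0)

actions = ["left", "right"]
Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
rng = random.Random(0)
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):
    s = 0
    while s != "exit":
        # Epsilon-greedy: explore with probability eps, else act greedily.
        if rng.random() < eps:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        best_next = 0.0 if s2 == "exit" else max(Q[(s2, a2)] for a2 in actions)
        # Q-learning update toward the sample r + gamma * max_a' Q(s', a').
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(Q[(0, "right")], Q[(1, "right")])  # converges toward 9.0 and 10.0
```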


Regret is a measure of your total mistake cost: the difference between the rewards you actually accumulated (including suboptimal actions taken while learning) and the expected rewards of acting optimally from the start




CS 188 notes https://blogbarley.blogspot.com/2025/09/cs-188-notes.html

CS 188 notes II https://blogbarley.blogspot.com/2025/10/cs-188-notes-ii.html

IBM Granite


https://www.ibm.com/granite/docs/use-cases/cookbooks

"IBM Granite is a family of large language models developed by IBM. These models are designed to be powerful, reliable, and customizable for enterprise use cases. Here are some key points about IBM Granite:

Model Variants: IBM Granite offers different model sizes and configurations to suit various needs, ranging from smaller models optimized for specific tasks to larger models capable of handling more complex and general-purpose language understanding and generation tasks.

Performance: The models are trained on a diverse and extensive dataset, which includes a mix of licensed data, data created by IBM research, and publicly available data. This training approach aims to ensure high performance across a wide range of natural language processing (NLP) tasks.

Customization: IBM Granite models can be fine-tuned and customized for specific industries or applications. This allows businesses to tailor the models to their unique requirements, improving accuracy and relevance for their specific use cases.

Enterprise Focus: These models are designed with enterprise needs in mind, including considerations for security, compliance, and scalability. IBM provides tools and services to help organizations deploy and manage these models in their environments.

Responsible AI: IBM emphasizes responsible AI practices in the development of Granite models. This includes efforts to mitigate biases, ensure transparency, and maintain ethical standards in AI deployment.

Integration and Deployment: IBM offers tools and platforms to facilitate the integration and deployment of Granite models into existing workflows and applications. This includes APIs, SDKs, and other developer resources to make it easier for businesses to leverage these models.

Applications: IBM Granite models can be used for a variety of applications, such as customer support, content generation, data analysis, and more. Their versatility makes them suitable for numerous business scenarios where natural language understanding and generation are required.

Overall, IBM Granite represents IBM's effort to provide high-quality, enterprise-grade language models that can be adapted and deployed to meet the specific needs of businesses across different industries." -IBM Granite

https://www.ibm.com/granite/docs/models/granite


Chat, search, and research with Granite 4. https://www.ibm.com/granite/playground

......

Inference Examples

Basic Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.0-h-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    {"role": "user", "content": "What is the largest ocean on Earth?"},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=150, temperature=0)
# decode output tokens into text
output = tokenizer.batch_decode(output)
print(output[0])

......

Tool calling

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.0-micro"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

chat = [
    {"role": "user", "content": "What's the current weather in New York?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"function": {"name": "get_current_weather", "arguments": {"city": "New York"}}}
    ]},
    {"role": "tool", "content": "New York is sunny with a temperature of 30°C."},
    {"role": "user", "content": "OK, now tell me what's the weather like in Bengaluru at this moment?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "description": "The city and state, e.g. San Francisco, CA",
                        "type": "string",
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Retrieves the current stock price for a given ticker symbol. The ticker symbol must be a valid symbol for a publicly traded company on a major US stock exchange like NYSE or NASDAQ. The tool will return the latest trade price in USD. It should be used when the user asks about the current or most recent price of a specific stock. It will not provide any other information about the stock or company.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "description": "The stock ticker symbol, e.g. AAPL for Apple Inc.",
                        "type": "string",
                    },
                },
            },
        },
    },
]

chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True, tools=tools)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100, temperature=0)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])