Debug decode cache #2

Open · wants to merge 5 commits into base: nlp/cache-start-pos

Conversation

@nlpfollower (Owner) commented Jan 23, 2025

Command: DEBUG_CACHE=True python3 torchchat.py chat --checkpoint-path ../torchtitan/models/Llama3.1-8B-Instruct/consolidated.0.pth --tokenizer-path ../torchtitan/models/Llama3.1-8B-Instruct/tokenizer.model --params-path torchchat/model_params/Meta-Llama-3.1-8B.json
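
The cache_sample values in the logs below come from debug hooks around the transformer forward pass. Roughly, the instrumentation looks like the following sketch; the helper name debug_cache_sample, the cache layout, and the exact element sampled are assumptions for illustration, not torchchat's actual code:

import os
import torch

DEBUG_CACHE = os.environ.get("DEBUG_CACHE", "") == "True"

def debug_cache_sample(k_cache: torch.Tensor, pos: int) -> float:
    # Read one scalar from the KV cache at position `pos` (batch 0, head 0,
    # first channel) so before/after values are cheap to compare in the log.
    return k_cache[0, 0, pos, 0].item()

def forward_with_cache_debug(model, tokens, input_pos, k_cache):
    pos = int(input_pos[0])
    if DEBUG_CACHE:
        before = debug_cache_sample(k_cache, pos)
        print(f"Transformer before forward input pos: {input_pos}; "
              f"(pos={pos} cache_sample={before})", end=" | ")
    logits = model(tokens, input_pos)
    if DEBUG_CACHE:
        after = debug_cache_sample(k_cache, pos)
        print(f"Transformer after forward cache (pos={pos}; cache_sample={after})")
    return logits

Reading convention: cache_sample=0.0 before a forward means that cache slot has not been written yet; a nonzero value means an existing entry is about to be overwritten.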

Without yield cur_token.clone()

Starting Interactive Chat
Entering Chat Mode. Will continue chatting back and forth with the language model until the models max context length of 8192 tokens is hit or until the user says /bye
Do you want to enter a system prompt? Enter y for yes and anything else for no. 
y
What is your system prompt? 
You are a helpful assistant.
User: Can you remember this word for me: "berryjuice"? Reply briefly.
Model: Transformer before forward input pos: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36], device='cuda:0', dtype=torch.int32); (pos=0 cache_sample=0.0) | Transformer after forward cache (pos=0; cache_sample=0.5546875)
Transformer before forward input pos: tensor([37], device='cuda:0', dtype=torch.int32); (pos=37 cache_sample=0.0) | Transformer after forward cache (pos=37; cache_sample=7.65625)
Transformer before forward input pos: tensor([38], device='cuda:0', dtype=torch.int32); (pos=38 cache_sample=0.0) | Transformer after forward cache (pos=38; cache_sample=4.46875)
Transformer before forward input pos: tensor([39], device='cuda:0', dtype=torch.int32); (pos=39 cache_sample=0.0) | Transformer after forward cache (pos=39; cache_sample=-0.103515625)
Transformer before forward input pos: tensor([40], device='cuda:0', dtype=torch.int32); (pos=40 cache_sample=0.0) | Transformer after forward cache (pos=40; cache_sample=-5.25)
Transformer before forward input pos: tensor([41], device='cuda:0', dtype=torch.int32); (pos=41 cache_sample=0.0) | Transformer after forward cache (pos=41; cache_sample=-3.953125)
Transformer before forward input pos: tensor([42], device='cuda:0', dtype=torch.int32); (pos=42 cache_sample=0.0) | Transformer after forward cache (pos=42; cache_sample=-1.0078125)
Transformer before forward input pos: tensor([43], device='cuda:0', dtype=torch.int32); (pos=43 cache_sample=0.0) | Transformer after forward cache (pos=43; cache_sample=7.5625)
Transformer before forward input pos: tensor([44], device='cuda:0', dtype=torch.int32); (pos=44 cache_sample=0.0) | Transformer after forward cache (pos=44; cache_sample=8.125)
Transformer before forward input pos: tensor([45], device='cuda:0', dtype=torch.int32); (pos=45 cache_sample=0.0) | Transformer after forward cache (pos=45; cache_sample=1.40625)
Transformer before forward input pos: tensor([46], device='cuda:0', dtype=torch.int32); (pos=46 cache_sample=0.0) | Transformer after forward cache (pos=46; cache_sample=-2.234375)
Transformer before forward input pos: tensor([47], device='cuda:0', dtype=torch.int32); (pos=47 cache_sample=0.0) | Transformer after forward cache (pos=47; cache_sample=-0.6171875)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([47], device='cuda:0', dtype=torch.int32)
local_token_tensor: [40, 3077, 2751, 433, 25, 330, 15717, 8783, 560, 3343]
I've got it: "berryjuice".

User: What's the derivative of ln(x)? Reply briefly.
Model: Transformer before forward input pos: tensor([47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
        65, 66, 67], device='cuda:0', dtype=torch.int32); (pos=47 cache_sample=-0.6171875) | Transformer after forward cache (pos=47; cache_sample=-0.5078125)
Transformer before forward input pos: tensor([68], device='cuda:0', dtype=torch.int32); (pos=68 cache_sample=0.0) | Transformer after forward cache (pos=68; cache_sample=4.71875)
Transformer before forward input pos: tensor([69], device='cuda:0', dtype=torch.int32); (pos=69 cache_sample=0.0) | Transformer after forward cache (pos=69; cache_sample=7.21875)
Transformer before forward input pos: tensor([70], device='cuda:0', dtype=torch.int32); (pos=70 cache_sample=0.0) | Transformer after forward cache (pos=70; cache_sample=1.9765625)
Transformer before forward input pos: tensor([71], device='cuda:0', dtype=torch.int32); (pos=71 cache_sample=0.0) | Transformer after forward cache (pos=71; cache_sample=-4.4375)
Transformer before forward input pos: tensor([72], device='cuda:0', dtype=torch.int32); (pos=72 cache_sample=0.0) | Transformer after forward cache (pos=72; cache_sample=-7.46875)
Transformer before forward input pos: tensor([73], device='cuda:0', dtype=torch.int32); (pos=73 cache_sample=0.0) | Transformer after forward cache (pos=73; cache_sample=-1.4296875)
Transformer before forward input pos: tensor([74], device='cuda:0', dtype=torch.int32); (pos=74 cache_sample=0.0) | Transformer after forward cache (pos=74; cache_sample=2.390625)
Transformer before forward input pos: tensor([75], device='cuda:0', dtype=torch.int32); (pos=75 cache_sample=0.0) | Transformer after forward cache (pos=75; cache_sample=3.625)
Transformer before forward input pos: tensor([76], device='cuda:0', dtype=torch.int32); (pos=76 cache_sample=0.0) | Transformer after forward cache (pos=76; cache_sample=3.15625)
Transformer before forward input pos: tensor([77], device='cuda:0', dtype=torch.int32); (pos=77 cache_sample=0.0) | Transformer after forward cache (pos=77; cache_sample=-3.484375)
Transformer before forward input pos: tensor([78], device='cuda:0', dtype=torch.int32); (pos=78 cache_sample=0.0) | Transformer after forward cache (pos=78; cache_sample=-2.328125)
Transformer before forward input pos: tensor([79], device='cuda:0', dtype=torch.int32); (pos=79 cache_sample=0.0) | Transformer after forward cache (pos=79; cache_sample=-0.3984375)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([79], device='cuda:0', dtype=torch.int32)
local_token_tensor: [791, 32905, 315, 30490, 2120, 8, 374, 220, 16, 11009, 13]
The derivative of ln(x) is 1/x.

User: What word did I ask you to remember? 
Model: Transformer before forward input pos: tensor([79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
        97, 98], device='cuda:0', dtype=torch.int32); (pos=79 cache_sample=-0.3984375) | Transformer after forward cache (pos=79; cache_sample=-0.64453125)
Transformer before forward input pos: tensor([99], device='cuda:0', dtype=torch.int32); (pos=99 cache_sample=0.0) | Transformer after forward cache (pos=99; cache_sample=2.140625)
Transformer before forward input pos: tensor([100], device='cuda:0', dtype=torch.int32); (pos=100 cache_sample=0.0) | Transformer after forward cache (pos=100; cache_sample=8.9375)
Transformer before forward input pos: tensor([101], device='cuda:0', dtype=torch.int32); (pos=101 cache_sample=0.0) | Transformer after forward cache (pos=101; cache_sample=5.71875)
Transformer before forward input pos: tensor([102], device='cuda:0', dtype=torch.int32); (pos=102 cache_sample=0.0) | Transformer after forward cache (pos=102; cache_sample=-2.421875)
Transformer before forward input pos: tensor([103], device='cuda:0', dtype=torch.int32); (pos=103 cache_sample=0.0) | Transformer after forward cache (pos=103; cache_sample=-4.34375)
Transformer before forward input pos: tensor([104], device='cuda:0', dtype=torch.int32); (pos=104 cache_sample=0.0) | Transformer after forward cache (pos=104; cache_sample=-0.462890625)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([104], device='cuda:0', dtype=torch.int32)
local_token_tensor: [1, 15717, 8783, 560, 1]
"berryjuice"

With yield cur_token.clone()

Starting Interactive Chat
Entering Chat Mode. Will continue chatting back and forth with the language model until the models max context length of 8192 tokens is hit or until the user says /bye
Do you want to enter a system prompt? Enter y for yes and anything else for no. 
y
What is your system prompt? 
You are a helpful assistant.
User: Can you remember this word for me: "berryjuice"? Reply briefly.
Model: Transformer before forward input pos: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36], device='cuda:0', dtype=torch.int32); (pos=0 cache_sample=0.0) | Transformer after forward cache (pos=0; cache_sample=0.5546875)
Transformer before forward input pos: tensor([37], device='cuda:0', dtype=torch.int32); (pos=37 cache_sample=0.0) | Transformer after forward cache (pos=37; cache_sample=4.3125)
Transformer before forward input pos: tensor([38], device='cuda:0', dtype=torch.int32); (pos=38 cache_sample=0.0) | Transformer after forward cache (pos=38; cache_sample=6.9375)
Transformer before forward input pos: tensor([39], device='cuda:0', dtype=torch.int32); (pos=39 cache_sample=0.0) | Transformer after forward cache (pos=39; cache_sample=-1.15625)
Transformer before forward input pos: tensor([40], device='cuda:0', dtype=torch.int32); (pos=40 cache_sample=0.0) | Transformer after forward cache (pos=40; cache_sample=-7.75)
Transformer before forward input pos: tensor([41], device='cuda:0', dtype=torch.int32); (pos=41 cache_sample=0.0) | Transformer after forward cache (pos=41; cache_sample=-3.625)
Transformer before forward input pos: tensor([42], device='cuda:0', dtype=torch.int32); (pos=42 cache_sample=0.0) | Transformer after forward cache (pos=42; cache_sample=0.025390625)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([42], device='cuda:0', dtype=torch.int32)
local_token_tensor: [1, 51302, 8783, 560, 1, 128009]
"Berryjuice"<|eot_id|>

User: What's the derivative of ln(x)? Reply briefly.
Model: Transformer before forward input pos: tensor([43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
        61, 62, 63], device='cuda:0', dtype=torch.int32); (pos=43 cache_sample=0.0) | Transformer after forward cache (pos=43; cache_sample=0.0264892578125)
Transformer before forward input pos: tensor([64], device='cuda:0', dtype=torch.int32); (pos=64 cache_sample=0.0) | Transformer after forward cache (pos=64; cache_sample=0.0634765625)
Transformer before forward input pos: tensor([65], device='cuda:0', dtype=torch.int32); (pos=65 cache_sample=0.0) | Transformer after forward cache (pos=65; cache_sample=-5.96875)
Transformer before forward input pos: tensor([66], device='cuda:0', dtype=torch.int32); (pos=66 cache_sample=0.0) | Transformer after forward cache (pos=66; cache_sample=-5.21875)
Transformer before forward input pos: tensor([67], device='cuda:0', dtype=torch.int32); (pos=67 cache_sample=0.0) | Transformer after forward cache (pos=67; cache_sample=-1.4296875)
Transformer before forward input pos: tensor([68], device='cuda:0', dtype=torch.int32); (pos=68 cache_sample=0.0) | Transformer after forward cache (pos=68; cache_sample=5.40625)
Transformer before forward input pos: tensor([69], device='cuda:0', dtype=torch.int32); (pos=69 cache_sample=0.0) | Transformer after forward cache (pos=69; cache_sample=2.921875)
Transformer before forward input pos: tensor([70], device='cuda:0', dtype=torch.int32); (pos=70 cache_sample=0.0) | Transformer after forward cache (pos=70; cache_sample=2.046875)
Transformer before forward input pos: tensor([71], device='cuda:0', dtype=torch.int32); (pos=71 cache_sample=0.0) | Transformer after forward cache (pos=71; cache_sample=-2.078125)
Transformer before forward input pos: tensor([72], device='cuda:0', dtype=torch.int32); (pos=72 cache_sample=0.0) | Transformer after forward cache (pos=72; cache_sample=-5.8125)
Transformer before forward input pos: tensor([73], device='cuda:0', dtype=torch.int32); (pos=73 cache_sample=0.0) | Transformer after forward cache (pos=73; cache_sample=-4.25)
Transformer before forward input pos: tensor([74], device='cuda:0', dtype=torch.int32); (pos=74 cache_sample=0.0) | Transformer after forward cache (pos=74; cache_sample=0.7734375)
Transformer before forward input pos: tensor([75], device='cuda:0', dtype=torch.int32); (pos=75 cache_sample=0.0) | Transformer after forward cache (pos=75; cache_sample=0.6484375)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([75], device='cuda:0', dtype=torch.int32)
local_token_tensor: [791, 32905, 315, 30490, 2120, 8, 374, 220, 16, 11009, 13, 128009]
The derivative of ln(x) is 1/x.<|eot_id|>

User: What word did I ask you to remember? 
Model: Transformer before forward input pos: tensor([76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
        94, 95], device='cuda:0', dtype=torch.int32); (pos=76 cache_sample=0.0) | Transformer after forward cache (pos=76; cache_sample=0.6484375)
Transformer before forward input pos: tensor([96], device='cuda:0', dtype=torch.int32); (pos=96 cache_sample=0.0) | Transformer after forward cache (pos=96; cache_sample=-4.4375)
Transformer before forward input pos: tensor([97], device='cuda:0', dtype=torch.int32); (pos=97 cache_sample=0.0) | Transformer after forward cache (pos=97; cache_sample=-6.4375)
Transformer before forward input pos: tensor([98], device='cuda:0', dtype=torch.int32); (pos=98 cache_sample=0.0) | Transformer after forward cache (pos=98; cache_sample=-4.375)
Transformer before forward input pos: tensor([99], device='cuda:0', dtype=torch.int32); (pos=99 cache_sample=0.0) | Transformer after forward cache (pos=99; cache_sample=1.7578125)
Transformer before forward input pos: tensor([100], device='cuda:0', dtype=torch.int32); (pos=100 cache_sample=0.0) | Transformer after forward cache (pos=100; cache_sample=7.03125)
Transformer before forward input pos: tensor([101], device='cuda:0', dtype=torch.int32); (pos=101 cache_sample=0.0) | Transformer after forward cache (pos=101; cache_sample=3.609375)
Transformer before forward input pos: tensor([102], device='cuda:0', dtype=torch.int32); (pos=102 cache_sample=0.0) | Transformer after forward cache (pos=102; cache_sample=-1.5234375)
Transformer before forward input pos: tensor([103], device='cuda:0', dtype=torch.int32); (pos=103 cache_sample=0.0) | Transformer after forward cache (pos=103; cache_sample=-4.4375)
Transformer before forward input pos: tensor([104], device='cuda:0', dtype=torch.int32); (pos=104 cache_sample=0.0) | Transformer after forward cache (pos=104; cache_sample=-6.5625)
Transformer before forward input pos: tensor([105], device='cuda:0', dtype=torch.int32); (pos=105 cache_sample=0.0) | Transformer after forward cache (pos=105; cache_sample=1.3828125)
Transformer before forward input pos: tensor([106], device='cuda:0', dtype=torch.int32); (pos=106 cache_sample=0.0) | Transformer after forward cache (pos=106; cache_sample=7.84375)
Transformer before forward input pos: tensor([107], device='cuda:0', dtype=torch.int32); (pos=107 cache_sample=0.0) | Transformer after forward cache (pos=107; cache_sample=3.28125)
Transformer before forward input pos: tensor([108], device='cuda:0', dtype=torch.int32); (pos=108 cache_sample=0.0) | Transformer after forward cache (pos=108; cache_sample=-0.04248046875)
last cur_token: tensor([128009], device='cuda:0', dtype=torch.int32)
final token tensor([128006], device='cuda:0', dtype=torch.int32) input_pos: tensor([108], device='cuda:0', dtype=torch.int32)
local_token_tensor: [2675, 4691, 757, 311, 6227, 279, 3492, 330, 15717, 8783, 560, 3343, 128009]
You asked me to remember the word "berryjuice".<|eot_id|>
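
With the clone in place, 128009 does land in local_token_tensor, and each new turn's prefill starts one past the previous end at a fresh slot (cache_sample=0.0 before the forward). The underlying pitfall is tensor aliasing in the decode generator: if the loop reuses cur_token's storage in place, every previously yielded reference sees whatever value was written last. Below is a minimal self-contained sketch of the failure mode and the fix; the buffer-reuse pattern is assumed for illustration and this is not torchchat's actual decode loop:

import torch

def decode_stream(first_token: torch.Tensor, steps: int, clone: bool):
    # Stand-in decode loop that writes each new token into the same
    # tensor, the way an in-place buffer reuse would.
    cur_token = first_token.clone()
    for _ in range(steps):
        yield cur_token.clone() if clone else cur_token
        cur_token.copy_(cur_token + 1)  # in-place update aliases prior yields

first = torch.tensor([100], dtype=torch.int32)

# Without .clone(): every yielded tensor aliases the reused buffer,
# so the collected "history" collapses to the final value.
print([t.item() for t in decode_stream(first, 3, clone=False)])  # [103, 103, 103]

# With .clone(): each yield is an independent snapshot.
print([t.item() for t in decode_stream(first, 3, clone=True)])   # [100, 101, 102]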

@Jack-Khuu

If you're up for it, spinning this up as a PR would be great to have.

We can even wire up some visualization to this down the line.
