AI Discussion Group
Fri, Jun 13, 2025
0:30 - D. B.
Hi, everyone. Happy Friday.
0:37 - E. G.
Ain't that the truth.
1:46 - D. B.
All right, well, let's see what we got for today.
2:13 - Unidentified Speaker
So if anyone's interested, on Wednesday, there will be a Zoom meeting.
2:18 - D. B.
It's about best practices for crafting prompts, and then practicing them. So yeah, it looks like a little workshop. It's not going to be, you know... these are not high-tech folks, they're just users. So it should be kind of a gentle introduction. I might go. Anyone planning on going?
2:42 - Unidentified Speaker
OK.
2:49 - D. B.
And again, if anyone has an idea for a master's student project in AI, just let me know and we can have the student tell us how it goes each week. And similarly, if you have any questions you'd like the group to discuss, let me know. And Y. talked about leading a discussion at some point about expanding to GroupMe or something like that.
3:28 - R. S.
Can you move that back up so I can see the URL for that meeting on Wednesday?
3:33 - E. G.
Hang on a second.
3:44 - Unidentified Speaker
Okay, what was your question earlier?
4:01 - R. S.
Can you just move that up, your window up there so I can write down the URL for the Zoom meeting on Wednesday?
4:18 - Unidentified Speaker
Okay, just hold on a minute. Got it.
4:24 - R. S.
8 7 2 0 0 1. 18, 90, 42. Okay. All right. One, one, not two ones. Okay.
4:35 - D. B.
I can barely hear you. I don't know if it's my speaker or something. I'm in front of my speaker.
4:46 - R. S.
Can you hear me better now? Yeah.
4:50 - D. B.#+#R. R.
You could actually just copy paste and put it in the chat if, to be accurate, Dr.
4:59 - D. B.
Burleigh? Yeah, I could.
5:01 - R. R.
And also anybody can get to these minutes just online.
5:07 - R. S.
That's correct. Yeah. You can put it in the chat. Okay, got it.
5:25 - Unidentified Speaker
Okay.
5:26 - D. B.
And also, if anyone reads an article, they can tell us about it.
5:33 - D. B.#+#E. G.
Anyone read an article recently? Yeah. I was reading about recursive AI, actually.
5:40 - E. G.
Recursive AI at that point is our singularity, where AI is actually rewriting itself faster than developers can, and it can get out of hand fast. So I was just following up on that. I think that would be a great discussion point, because when we talk about AI singularity or recursive AI at that point, we're talking about the genesis of AI outside of human control.
6:15 - D. B.
Yeah. I hadn't heard it called recursive AI before. The concept's been around for a while, sort of self-improving AI. You know, if we can make something smarter than ourselves, then why couldn't it make something smarter than itself? And then you get into a spiral, a sort of self-reinforcing spiral. A singularity at that point. That's the singularity, yeah. If you can find an article, a specific article or something like that, I'd be happy to put it in the list of items. If I can find it. Yeah, you can just send it to me.
6:50 - E. G.
It just came up on my news feed.
6:52 - R. R.
Is that from S. A. that he wrote on June 10th? I believe.
6:57 - E. G.
S. was involved in it. OK, yeah, but I don't know if it was from him.
7:02 - D. B.#+#E. G.
Maybe we have the same news feed or something, because I saw he came out with some kind of a statement about that. But you know, he's not.
7:11 - Unidentified Speaker
He's not.
7:12 - D. B.
You know, the guru on this topic. And it's been out there for decades. I mean, have you ever heard of the science fiction writer V. V.?
7:28 - D. B.#+#E. G.
Yeah.
7:28 - D. B.
He wrote an essay about it in the 90s saying this might happen.
7:33 - Unidentified Speaker
And he wasn't even the first. There was a statistician, I think, named Goode.
7:40 - D. B.
Forgot his first name. His last name was Good, and he wrote about it even earlier than that, the singularity. But now maybe it's going to happen, you know?
7:53 - Unidentified Speaker
OK, I got an email.
7:55 - D. B.
And the only thing I can suggest is just let's just read it. So I'll let you read it. And then we can talk about it.
9:04 - Unidentified Speaker
Any thoughts or comments?
9:50 - D. B.
Well, all right. Any comments? So anyway, I thought it sounded intriguing, you know, like give it a paper and it writes code to do something like that.
10:04 - Unidentified Speaker
I didn't quite understand it. But anyway, I got in touch with this guy.
10:10 - D. B.
He sent me this email. And I'm going to meet with him on Monday by Zoom. And I thought maybe he'd be willing to schedule a demo for this group.
10:22 - R. R.
That'd be awesome. Yeah. Be cool to see.
10:25 - Unidentified Speaker
Yeah.
10:26 - D. B.
The problem is, when I went to his calendar where you have to sign up to talk to him, I realized that his slots were from 11 p.m. to 11 a.m. So I think he's probably not in the United States. Either he's got really weird hours or he's not based in the United States at all. And of course, our meeting time is not between 11 p.m. and 11 a.m. So I don't know. I'm going to talk to him on Monday. We'll see. If he doesn't want to do a demo at 4 p.m. Central time, maybe he's in Australia or something, I was thinking of scheduling one in the morning and, you know, just letting anybody who wants to go to it, including us, but it wouldn't be during our regular meeting time. I could send it out to the faculty of the university. Maybe people out there want to attend. Meanwhile, if anyone wants to meet with him individually, or anything like that, you can. So anyway, I'd like to see what he's got to offer. I don't quite understand the email very well, but I'd like to see the demo.
11:42 - E. G.
Yeah, I'd like to see the context parameters for the papers for it to generate code.
11:48 - D. B.
Yeah, I mean, he's got some kind of a pipeline that's pretty interesting. I'd just like to see what it's all about.
11:56 - R. S.
Yeah, so I really don't understand it. Trying to put the papers that we wrote into some pipeline to reproduce other people's work, to try out new ideas? I don't understand.
12:11 - D. B.
What does that mean? That's what it looks like to me. He turns research papers into executable code. That's what he says.
12:21 - Unidentified Speaker
I don't get it. But you upload the PDF, and everything else is automatic.
12:29 - E. G.
And that's why I want to ask him what type of papers, what the context of the papers is, because not all papers can produce code. Well, that's true. He also says something about...
12:44 - D. B.
Especially in finance, he says.
12:46 - R. R.
Finance, yeah. Maybe he has trained his engine to do that. It'll be nice to have a demo. It would. He did specifically mention the paper that I was involved in on Moore's law.
13:06 - Unidentified Speaker
So that's not a finance paper for sure.
13:16 - R. R.
Well, you know, I don't, I don't understand what he's doing.
13:20 - R. S.
Is he making a repository of other people's work in, in similar areas? I don't know.
13:25 - D. B.
I kind of wish he could do the demo at four o'clock, but like I said, if it's in the middle of the night for him, it won't work that way. But I'll just meet with him on Monday, see what he's got, see what I think, and then get back to you like next week or something.
13:50 - R. S.
In other words, he's out of the country.
13:54 - D. B.
He's like in England or something.
13:56 - R. S.
Well, I don't know where he is or maybe he just has odd hours, you know.
14:02 - D. D.
I don't know. It says cheers. So to me, that's Down Under. What's that website there? The demo is right there. Look here at cal dot... That's a calendar for setting up a demo.
14:18 - E. G.
That's just a calendar? Yeah.
14:22 - D. B.
This is the website.
14:25 - R. S.
Can you look at that link?
14:42 - D. B.
And the calendar... let's look at the calendar. If you just pick a random day, the calendar starts at 12 a.m. to 11:30... oh, 12:30 a.m. But then it actually starts at 11 p.m., not 12 a.m. So his appointments go from 11 p.m. to 11 a.m. So that doesn't sound like the United States to me, unless he's just really a night owl. But, um, I just went to his atlas research.io website and I have my AVG web shield saying multiple web threats secured.
15:38 - V. W.
We've blocked a threat URL blacklist or HTTP labs from being downloaded. Got it. So, um, that's a little strange. I don't know.
15:47 - D. B.
I mean, I didn't get, I don't get that doing this here. I'm not going to the calendar.
15:55 - V. W.
I'm just going to what he said. It was his demo.
15:59 - D. B.
Oh, the go to dashboard here.
16:02 - V. W.
I'll send you the picture there.
16:11 - D. B.
All right. Well, it might be a false positive.
16:20 - Unidentified Speaker
You can share your screen. Okay. There's the URL.
16:28 - V. W.
And, uh, my usual practice is not to go any further.
16:38 - D. B.
Yeah.
16:40 - V. W.
So I'm going to go ahead. I'm going to stop there. And if someone else is bolder than I am, go to it.
16:50 - D. B.#+#V. W.
I already just went there. I hope I didn't.
16:54 - D. B.
It might be a false positive. Who knows, these things happen.
16:59 - E. G.
Maybe it's controlling where we go now.
17:05 - D. B.
Yeah, he could be not what he says. He could be a state actor, you know?
17:12 - V. W.
Oh my gosh, he's really pierced the veil now.
17:16 - D. B.
Intended to entrap scientists in America, because we're so important. We're a critical national resource.
17:22 - V. W.
When I first skimmed the idea, I thought, this is similar to... I'll often ask for an HTML5/CSS/JavaScript demo for some body of work that I've been doing. And I'll get one and it's illuminating. I did one on group theory and physics and Minkowski space-time. And I did a harmonic oscillator. And then I recently did a complex harmonic oscillator, which shows mass, length, and time, along with stiffness and damping, being complex numbers instead of just real numbers. And it was pretty interesting. So I get the idea that if you have a body of work and you'd like some code to demonstrate it, that's a pretty quick thing to do now, especially with Claude 4; it's making really nice code. Okay, anything else before we go on to the Chapter 6 video?
18:25 - D. B.
All right, well let's do that.
18:29 - R. S.
Wednesday at noon. Is that true, Dan?
18:33 - D. B.
I don't remember. Yeah. I got it. Okay, so I'm going to set up the screen here. You all can hear that, right? Yes. All right, here we go. In the last chapter, you and I started to step through the internal workings of a transformer. This is one of the key pieces of technology inside large language models and a lot of other tools in the modern wave of AI. It first hit the scene in a now famous 2017 paper called Attention is All You Need. And in this chapter, you and I will dig into what this attention mechanism is, visualizing how it processes data. As a quick recap, here's the important context I want you to have in mind. The goal of the model that you and I are studying is to take in a piece of text and predict what word comes next. The input text is broken up into little pieces that we call tokens. And these are very often words or pieces of words, but just to make the examples in this video easier for you and me to think about, let's simplify by pretending the tokens are always just words. The first step in a transformer is to associate each token with a high-dimensional vector, what we call its embedding. Any questions or comments so far? Embedding. Okay, I'll continue. The most important idea I want you to have in mind is how directions in this high-dimensional space of all possible embeddings can correspond with semantic meaning. In the last chapter, we saw an example for how direction can correspond to gender, in the sense that adding a certain step in this space can take you from the embedding of a masculine noun to the embedding of the corresponding feminine noun. Okay, any questions? So I have a question. He talks about direction, but what about distance? Does it matter? The vector has a length; does that matter?
20:49 - E. G.
I think it does, because as he's saying here, notice the vector. It's not identical, but it's similar in length between the endpoints. And when I was watching the video earlier, it looks like that's one of the continuity effects.
21:09 - D. B.
Maybe the vectors are all of length one.
21:13 - V. W.
I thought the vectors were normalized by the time this part of the process came around. I could be wrong though.
21:21 - D. B.
I mean, I'm not, I don't see how the yellow vector could, would necessarily.
21:27 - E. G.
The yellow vector is going, like you said, from masculine to feminine. So if we have a term like man, and we ask for the feminine of that, uh, what they're saying is the vector from masculine to feminine would be similar across relationships like aunt and uncle.
21:51 - V. W.
King, queen. King, queen.
21:54 - E. G.
Exactly. Okay. That's just one example.
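A minimal sketch of that direction idea, using made-up three-dimensional vectors rather than real model embeddings (real embeddings have thousands of dimensions and are learned from data). It also touches on the length question raised earlier: cosine similarity compares direction only, so vector length drops out.

```python
# Toy illustration of "directions carry meaning" in an embedding space.
# These 3-D vectors are made up for the example; real model embeddings
# have thousands of dimensions and are learned from data.
import numpy as np

emb = {
    "man":   np.array([1.0, 0.2, 0.1]),
    "woman": np.array([1.0, 1.2, 0.1]),
    "king":  np.array([2.0, 0.3, 1.0]),
    "queen": np.array([2.0, 1.3, 1.0]),
}

def cosine(a, b):
    """Cosine similarity: compares direction only, ignoring vector length."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

masc_to_fem_1 = emb["woman"] - emb["man"]
masc_to_fem_2 = emb["queen"] - emb["king"]

# The two offsets point in (nearly) the same direction, which is what the
# yellow "gender" arrow in the video depicts.
print(cosine(masc_to_fem_1, masc_to_fem_2))   # close to 1.0

# The classic analogy: king - man + woman lands near queen.
print(cosine(emb["king"] - emb["man"] + emb["woman"], emb["queen"]))
```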
21:57 - D. B.
You could imagine how many other directions in this high-dimensional space could correspond to numerous other aspects of a word's meaning. The aim of a transformer is to progressively adjust these embeddings so that they don't merely encode an individual word, but instead they bake in some much, much richer contextual meaning. I should say up front that a lot of people find the attention mechanism, this key piece in a transformer, very confusing, so don't worry if it takes some time for things to sink in. I think that before we dive into the computational details and all the matrix multiplications, it's worth thinking about a couple of examples for the kind of behavior that we want attention to enable. Consider the phrases American shrew mole, one mole of carbon dioxide, and take a biopsy of the mole. You and I know that the word mole has different meanings in each one of these, based on the context. But after the first step of a transformer, the one that breaks up the text and associates each token with a vector, the vector that's associated with mole would be the same in all three of these cases. Thoughts?
23:06 - E. G.
Yeah, that's when we talked, I think it was a couple of weeks ago, when we talked about contextualizing the verbs.
23:21 - Unidentified Speaker
OK. Because this initial token embedding is effectively a lookup with no reference to the context.
23:31 - D. B.
It's only in the next step of the transformer that the surrounding embeddings have the chance to pass information into this one. The picture you might have in mind is that there are multiple distinct directions in this embedding space, encoding the multiple distinct meanings of the word mole, and that a well-trained attention block calculates what you need to add to the generic embedding to move it to one of these more specific directions. Okay, so if the undisambiguated word mole has a space, a spot in the embedding, what is that spot?
24:18 - E. G.
It could be a placeholder to say, OK, this is what we're dealing with.
24:27 - D. D.
Go ahead. I love it. The embedding is like its definition. That's just what it means. It's just a mole. And then whenever the attention mechanism changes weights, it makes different moles.
24:46 - D. B.#+#D. D.
So now, depending on the words that are around the mole, it's going to be able to say, oh, well, this mole is closer to that vector.
24:56 - D. D.
So this is... they're talking about a mole that's on your skin, you know? Like, the vector going up to the mole on the lip will be closer to cancer or something like that. As opposed to the original mole, which no longer exists anymore except in the embedding lookup vectors. I think that's what he just said: the definition, just what that word means compared to all the other words. Well, what the word means is this sort of strange combination of three different meanings. So could you take... In the transformer, yes, but not in the embedding space.
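A schematic sketch of the point in this exchange, with made-up shapes and random numbers standing in for a real model: the initial embedding is a pure table lookup, so "mole" gets exactly the same vector in every sentence, and the attention block's job can be pictured as adding a context-dependent correction to that generic vector.

```python
# Schematic sketch (not a real model): the first step is a pure table lookup,
# so the token "mole" gets the same vector in every sentence.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"one": 0, "mole": 1, "of": 2, "carbon": 3, "dioxide": 4,
         "take": 5, "a": 6, "biopsy": 7, "the": 8}
d_model = 8                                  # tiny stand-in for 12,288
E = rng.normal(size=(len(vocab), d_model))   # embedding lookup table

sent_a = ["one", "mole", "of", "carbon", "dioxide"]
sent_b = ["take", "a", "biopsy", "of", "the", "mole"]

emb_a = E[vocab[sent_a[1]]]    # lookup for "mole" in sentence A
emb_b = E[vocab[sent_b[-1]]]   # lookup for "mole" in sentence B
print(np.allclose(emb_a, emb_b))   # True: no context has been used yet

# The picture from the video: a well-trained attention block computes a
# context-dependent correction and adds it to the generic embedding,
# nudging it toward "unit of measurement" vs. "skin blemish".
delta_from_context = rng.normal(size=d_model)   # stand-in for attention output
contextualized = emb_a + delta_from_context
```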
25:43 - D. B.
Yeah. Well, could you take a random assortment of meanings that don't really connect with each other? Like, you know, rat, up, and idea?
25:55 - Unidentified Speaker
Yeah.
25:55 - D. D.
If you put garbage into the transformer, you'll get garbage out.
26:00 - D. B.
I promise you.
26:02 - Unidentified Speaker
Yeah.
26:02 - D. D.
That's the way all this stuff works. Garbage in, garbage out. Yeah. You've got to put something coherent in, or you're going to get some incoherent mess on the other side.
26:15 - V. W.
But you can explore new ideas and concepts that wouldn't have previously been covered like the ratness of an idea. Now we have a kind of metaphor.
26:26 - D. B.
And it's the emergent properties that are coming from this metaphor process.
26:31 - V. W.
And that's why this notion of predicting the next word, which seems so simple, turns out to give us such rich answers: because we're contextualizing these things.
26:42 - D. B.
We can imagine what a phrase like a metaphor like the ratness of an idea might mean. It's not such an...
26:51 - V. W.
It actually could make some sense.
26:53 - D. B.
So therefore, it must have a point in the embedding space that sort of has a meaning.
27:04 - Unidentified Speaker
What if embeddings are good for...
27:08 - V. W.
But these embeddings represent combinations of ideas which may not have appeared before that can nonetheless be reasoned with and answers generated to.
27:21 - D. B.
Okay. As a function of the context. To take another example, consider the embedding of the word tower. This is presumably some very generic, non-specific direction in the space, associated with lots of other large, tall nouns. If this word was immediately preceded by Eiffel, you could imagine wanting the mechanism to update this vector so that it points in a direction that more specifically encodes the Eiffel Tower, maybe correlated with vectors associated with Paris and France and things made of steel. If it was also preceded by the word miniature, then the vector should be updated so that it no longer correlates with large, tall things. More generally than just refining the meaning of a word, the attention block allows the model to move information encoded in one embedding to that of another, potentially ones that are quite far away, and potentially with information that's much richer than just a single word. What we saw in the last chapter was how after all of the vectors flow through the network, including many different attention blocks, the computation that you perform to produce a prediction of the next token is entirely a function of the last vector in the sequence. So imagine, for example, that the text you input is most of an entire mystery novel, all the way up to a point near the end, which reads, therefore the murderer was. If the model's going to accurately predict the next word, that final vector in the sequence, which began its life simply embedding the word was, will have to have been updated by all of the attention blocks to represent much, much more than any individual word, somehow encoding all of the information from the full context window that's relevant to predicting the next word.
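A minimal sketch of the "everything funnels into the last vector" point, with random numbers standing in for a trained model: whatever the final position's vector has absorbed from the context is all the model uses to score the next token, here via an assumed unembedding matrix and a softmax.

```python
# Minimal sketch of "prediction is a function of the last vector only".
# Shapes are illustrative; a real model's unembedding matrix maps
# d_model (e.g. 12,288) to the full vocabulary size.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
seq_len, d_model, vocab_size = 6, 8, 50

# H stands for the sequence of vectors *after* all attention blocks;
# by that point the last vector has absorbed context from the whole window.
H = rng.normal(size=(seq_len, d_model))
W_unembed = rng.normal(size=(vocab_size, d_model))

last_vector = H[-1]                      # e.g. the vector that began as "was"
next_token_probs = softmax(W_unembed @ last_vector)
print(next_token_probs.shape)            # (vocab_size,) — one probability per token
```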
29:10 - Unidentified Speaker
Comments?
29:11 - V. W.
The incredible power of having the last word. In fact, isn't, uh... I'm not familiar with it, but, uh... could you say that one more time?
29:28 - E. G.
My daughters have taken four years of Latin in school, and I think Latin is contextual in that it's the final words that put together the context of the statement. That's why I think Plato was such a good orator, that he'd keep going on until the last piece to put everything into context.
29:58 - D. B.
So this vector shown here somehow represents the entire story up to that point. Can I just go over that little snip one more time?
30:11 - D. D.
Sure.
30:12 - D. B.
I don't know where to go back to. But I'll go back a bit. ...embedding of the word tower. This is presumably some very generic, non-specific direction in the space associated with lots of other large, tall nouns. If this word was immediately preceded by Eiffel, you could imagine wanting the mechanism to update this vector so that it points in a direction that more specifically encodes the Eiffel Tower, maybe correlated with vectors associated with Paris, and things made of steel. If it was also preceded by the word miniature, then the vector should be updated even further, so that it no longer correlates with large, tall things. More generally than just refining the meaning of a word, the attention block allows the model to move information encoded in one embedding to that of another, potentially ones that are quite far away, and potentially with information that's much richer than just a single word. What we saw in the last chapter was how after all of the vectors flow through the network, including many different attention blocks, the computation that you perform to produce a prediction of the next token is entirely a function of the last vector in the sequence. So imagine, for example, that the text you input is most of an entire mystery novel, all the way up to a point near the end, which reads, therefore the murderer was. If the model is going to accurately predict the next word, that final vector in the sequence, which began its life simply embedding the word was, will have to have been updated by all of the attention blocks to represent much, much more than any individual word, somehow encoding all of the information from the full context window that's relevant to predicting the next word.
31:59 - D. D.
What do you think, Daniel?
32:01 - Unidentified Speaker
Heavy.
32:02 - V. W.
I think there's a little glitch in there because I don't think was being the next word. If we had just had therefore the murderer and we didn't have was, we still might have gotten the name of the murderer. So it's that local context recursively applied backwards that's enriching the meaning of the most recent words.
32:29 - D. B.#+#D. D.
Well, I think that, I mean, with "therefore the murderer," the system probably would pick "was" as the next word, right? It might. That's what I was thinking. That's pretty good. Or it might pick "is," which is the same, you know. Yeah, that's interesting. But in that case it's just correcting your grammar and not telling you the most important thing you read the book for, you know. That's interesting, because it's got to do both, right?
33:07 - D. B.
It's got to pay attention to grammar in a sense. I mean, when you use ChatGPT to get output, it's usually fairly grammatical. It's always correcting mine. Yeah. Spelling. All right. Any other comments so far?
33:26 - V. W.
All right.
33:27 - D. B.
To step through the computation, though, let's take a much simpler example. Imagine that the input includes the phrase, a fluffy blue creature roamed the verdant forest. And for the moment, suppose that the only type of update that we care about is having the adjectives adjust the meanings of their corresponding nouns. What I'm about to describe is what we would call a single head of attention, and later we will see how the attention block consists of many different heads run in parallel. Again, the initial embedding for each word is some high-dimensional vector that only encodes the meaning of that particular word with no context. Actually, that's not quite true. They also encode the position of the word. There's a lot more to say about the specific way the positions are encoded, but right now, all you need to know is that the entries of this vector are enough to tell you both what the word is and where it exists in the context. Comments?
34:23 - V. W.
We've discussed this a lot in the past, this being a language-specific thing: French will put adjectives after, while English will put adjectives before. And that positional encoding is what takes care of that working in both languages.
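The video only says that position is folded into the embedding's entries, without naming a scheme. As one concrete possibility, here is a sketch of the sinusoidal positional encoding from the Attention is All You Need paper mentioned earlier; many newer models instead learn positional embeddings or use other schemes, so treat this as an illustration only.

```python
# One concrete way position can be folded into the embedding entries:
# the sinusoidal encoding from "Attention Is All You Need".
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even entries: sine
    pe[:, 1::2] = np.cos(angles)             # odd entries: cosine
    return pe

seq_len, d_model = 8, 16                     # toy sizes
token_embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
pe = sinusoidal_positions(seq_len, d_model)

# Adding the two means each vector now says both *what* the word is
# and *where* it sits in the context.
inputs_to_attention = token_embeddings + pe
```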
34:42 - D. B.
So these vectors only have 12,288 elements. And yet, see, the English language has a lot more than 12,288 words, not to mention a lot more than 12,288 stories, right? Contexts. But I guess each element in the vector can have any value. How many word vectors do I have in my head?
35:16 - D. D.
Can I come up with 12,288?
35:19 - D. D.#+#E. G.
Yeah, OK.
35:20 - D. B.
So there's a lot more than 12,288 vectors for sure.
35:26 - E. G.
Well, if you take a look at it just for context purposes: the neurons in your brain, trying to find something specific, follow the same train of thought. We store information in our brain in locations, and we access it through the synapses to find it. And what we do with those synapses is provide context to the terms, like father, mother, male, female. When we go to synthesize the information, we're actually going through and applying transformative information to the context of the word to provide meaning. That's what that's doing.
36:21 - V. W.
How many ways are there to arrange 12,288 things? Well, it would be that factorial, which is an extremely large number.
36:36 - D. B.
Well, the concept, or the thought... when you start bringing up real brains and real neurons, to me, the fact that something like a transformer works so well at simulating intelligence suggests that actual intelligence might be kind of like a transformer.
37:09 - V. W.
It might just be predicting the next word, as we've joked often.
37:16 - E. G.
Well, I just looked up. I mean, humans have, on average, 86 billion neurons.
37:23 - D. D.
We know what we're going to say, though, before we say it many times.
37:32 - E. G.
But if we're trying... that's because we know what we want to say in the sentence. But if we're trying to synthesize information that's given to us and predict what the next thing is, we do a similar approach. And in truth, that makes sense, because when we program something, we program it in a way that we understand.
38:02 - D. D.
So AI is really just one of those annoying things that wants to finish your sentence.
38:17 - E. G.
I would agree, yes, but I think that it's hamstrung based on the way that we synthesize information. And before you guys joined, we were talking about recursive AI and the AI singularity, where AI is going to be improving itself. I think it was Google Translate: when it does the translation... it was initially coded by engineers, but when they looked at the Google Translate code that it used to do the translations, the developers didn't understand it. I'll see if I can find the article. But it was in a language that the program understood, but we didn't. All right. Well, let's continue.
39:23 - D. B.
Let's go ahead and denote these embeddings with the letter E. The goal is to have a series of computations produce a new refined set of embeddings where, for example, those corresponding to the nouns have ingested the meaning from their corresponding adjectives. And playing the deep learning game, we want most of the computations involved to look like matrix-vector products, where the matrices are full of tunable weights, things that the model will learn based on data. To be clear, I'm making up this example of adjectives updating nouns just to illustrate the type of behavior that you could imagine an attention head doing. As with so much deep learning, the true behavior is much harder to parse, because it's based on tweaking and tuning a huge number of parameters to minimize some cost function. It's just that as we step through all of the different matrices filled with parameters that are involved in this process, it's really helpful to have an imagined example of something that it could be doing to help keep it all more concrete. All right, well he's got fluffy and blue, and he's got e2, e3, and e4 converging to define e'4. That's adjectives affecting a noun, and then the same with e7 and e8. And notice those two lines of vectors are completely connected.
40:54 - V. W.
So there's 64 arcs between them. No, they're not completely connected. Well, every one is connected to every other one.
41:08 - E. G.
You see every arc connected.
41:11 - V. W.
Even E1 to E1 prime.
41:15 - D. B.#+#V. W.
It's very faint, but on my screen it shows up as a faint gray line.
41:19 - D. B.#+#E. G.
Oh, I see. I'm looking at the... I didn't see some of those.
41:23 - D. B.
Do you see E1 connecting to E'2? Yes, I do. It's a very faint line.
41:28 - E. G.
Oh, it's not on my screen.
41:29 - D. B.
Yeah, I see it too. Okay, because there's a matrix involved and it's just... Well, I put a zero. You could have a very small number and it doesn't have to be zero. For the first step of this process, you might imagine each noun, like a creature, asking the question, hey, are there any adjectives sitting in front of me? And for the words fluffy and blue to each be able to answer, yeah, I'm an adjective.
42:07 - D. B.
That question is somehow encoded as yet another vector, another list of numbers, which we call the query for this word. This query vector, though, has a much smaller dimension than the embedding vector, say 128. What's going on here? Anyone?
42:23 - Unidentified Speaker
Shall we continue?
42:26 - V. W.
Well, just to say that the query vector is a projection into a 128-dimensional subspace of the original dimension. Presumably this is done for efficiency reasons.
42:48 - D. B.
Computing this query looks like taking a certain matrix, which I'll label WQ, and multiplying it by the embedding. Compressing things a bit, let's write that query vector as Q, and then anytime you see me put a matrix next to an arrow like this one, it's meant to represent that multiplying this matrix by the vector at the arrow's start gives you the vector at the arrow's end. OK. What is W sub Q? That's the matrix of the...
43:23 - E. G.
Yeah, it's right there, it's.
43:28 - D. D.
So W sub Q would be the matrix.
43:35 - E. G.
The Q is a vector for E4 and WQ.
43:43 - D. B.
So WQ has 128 rows or whatever it was. Or 11,000 rows.
43:50 - R. S.
Dan, I have to leave in a few minutes, so. All right. Is that all right?
44:01 - D. B.
Yep. OK, thank you.
44:03 - Unidentified Speaker
Go back.
44:05 - D. D.
I thought that's, I thought that. That's what they were talking about.
44:11 - D. B.
Yeah. Go back just a little bit.
44:14 - D. D.
For the first step of this process, you might imagine each noun, like a creature, asking the question, hey, are there any adjectives sitting in front of me?
44:24 - D. B.
And for the words fluffy and blue to each be able to answer, yeah, I'm an adjective and I'm in that position. That question is somehow encoded as yet another vector, another list of numbers, which we call the query for this word. This query vector, though, has a much smaller dimension than the embedding vector, say 128. Computing this query looks like taking a certain matrix, which I'll label WQ, and multiplying it by the embedding. Compressing things a bit, let's write that query vector as Q, and then anytime you see me put a matrix next to an arrow like this one, it's meant to represent that multiplying this matrix by the vector at the arrow's start gives you the vector at the arrow's end. In this case, you multiply this matrix by all of the embeddings in the context, producing one query vector for each token. The entries of this matrix are parameters of the model, which means the true behavior is learned from data, and in practice, what this matrix does in a particular attention head is challenging to parse. Okay, so every embedding leads to a query vector. What does the query vector say? It says, are there any adjectives in front of me? That's what he's saying.
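A small sketch of the query step just read, with tiny stand-in dimensions (the video's example uses a 12,288-dimensional embedding and a 128-dimensional query space): one learned matrix WQ, multiplied against every embedding in the context, yields one query vector per token.

```python
# Sketch of the query step: one learned matrix W_Q, applied to every
# embedding in the context, gives one query vector per token.
# Real shapes in the video's example: W_Q is 128 x 12,288; here they're tiny.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_query = 8, 12, 4         # stand-ins for 8 tokens, 12,288, 128

E = rng.normal(size=(seq_len, d_model))      # one embedding per token (rows)
W_Q = rng.normal(size=(d_query, d_model))    # learned parameters, tuned by training

Q = E @ W_Q.T                                # queries: one row per token
print(Q.shape)                               # (seq_len, d_query) -> (8, 4)
```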
45:50 - V. W.
Are there any words specializing the rendering of my meaning in front of me, so that for example you know which mole I am, or which fluffy creature.
46:07 - D. B.
If you have a whole murder mystery ending in "the murderer," then the next word is probably "was" or "is." The word... Hmm. No, all right, they're not talking about finding the next word here.
46:32 - Unidentified Speaker
Yeah, it's a little bit.
46:35 - D. D.
He's talking about whether the next word is an adjective, is what he was saying. So I'm not 100% clear.
46:46 - D. B.
Yeah, so he's generated a
46:50 - V. W.
a list of query vectors instead of a list of embedding vectors. And those query vectors in this case are answering very specific questions about the structure of the language. Like, where are the adjectives in front of me if we're talking in English?
47:09 - D. D.
Is it a yes or no question? Because that's what it says.
47:14 - V. W.
But it's continuously real valued in the resulting query vectors. There's flavors and priorities and levels of emphases in the resulting query vector. So you could choose the most likely of those.
47:31 - D. B.
But for our sake, imagining an example that we might hope that it would learn, we'll suppose that this query matrix maps the embeddings of nouns to certain directions in this smaller query space that somehow encodes the notion of looking for adjectives in preceding positions. As to what it does to other embeddings, who knows? Maybe it simultaneously tries to accomplish some other goal with those. Right now, we're laser focused on the nouns. At the same time, associated with this is a second matrix called the key matrix, which you also multiply by every one of the embeddings. This produces a second sequence of vectors that we call the keys. Conceptually, you want to think of the keys as potentially answering the queries. This key matrix is also full of tunable parameters, and just like the query matrix, it maps the embedding vectors to that same smaller dimensional space. You think of the keys as matching the queries whenever they closely align with each other. In our example, you would imagine that the key matrix maps the adjectives, like fluffy and blue, to vectors that are closely aligned with the query produced by the word creature. To measure how well each key matches each query, you compute a dot product between each possible key-query pair. I like to visualize a grid full of a bunch of dots, where the bigger dots correspond to the larger dot products, the places where the keys and queries align.
48:59 - V. W.
This is where the term cosine similarity comes from. So you can take two abstracts, and we recently did this: how similar is the abstract of this paper to the abstract of this other paper? And if they have a high cosine similarity, we know we probably don't have to read both papers. And the term cosine similarity simply comes from the fact that we're doing a dot product of two vectors to see if we get a hit.
49:34 - Unidentified Speaker
I think that's right.
49:38 - V. W.
So if one of the vectors is zero, so if one of the elements of the vector has a zero in it, it won't have any influence on that particular selection. It's really a selection mechanism, selecting one thing against another. And if both are unity, then they select to the maximum degree possible.
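A quick sketch of that abstract-comparison use of cosine similarity. The embed() function here is a hypothetical stand-in (a crude bag-of-words hash), not any particular model; in practice you would call whatever sentence-embedding model you use. Note that inside an attention head the score is the scaled dot product itself rather than the cosine, but the "alignment means relevance" idea is the same.

```python
# Cosine similarity as a "did we get a hit?" test between two vectors,
# e.g. embeddings of two paper abstracts. The embed() function below is a
# stand-in; in practice you'd call whatever sentence-embedding model you use.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(text):
    # Hypothetical placeholder: hash words into a fixed-size bag-of-words vector.
    v = np.zeros(64)
    for word in text.lower().split():
        v[hash(word) % 64] += 1.0
    return v

abstract_1 = "We model attention patterns in transformer language models."
abstract_2 = "A study of attention mechanisms inside transformer models."
print(cosine_similarity(embed(abstract_1), embed(abstract_2)))  # high -> similar
```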
49:56 - D. B.
So the key and the query are both the same number of dimensions because otherwise you couldn't take a dot product.
50:03 - V. W.
Exactly. And you can do that, but that's a mess.
50:08 - D. B.
Where if the keys produced by Fluffy and Blue really do align closely with the query produced by Creature, then the dot products in these two spots would be some large positive numbers. In the lingo, machine learning people would say that this means the embeddings of Fluffy and Blue attend to the embedding of Creature. Okay. That's what attention means then. Dot products are big.
50:38 - D. D.
Yeah, it means fluffy and blue are together.
50:42 - V. W.
That's a big statement right there for understanding a tension mechanism.
50:47 - D. B.
By contrast, the dot product between the key for some other word, like "the," and the query for "creature" would be some small or negative value that reflects that these are unrelated to each other. So we have this grid of values, and each can be any real number from negative infinity to infinity, giving us a score for how relevant each word is to updating the meaning of every other word. The way we're about to use these scores is to take a certain weighted sum along each column, weighted by the relevance. So instead of having values range from negative infinity to infinity, what we want is for the numbers in these columns to be between 0 and 1, and for each column to add up to 1, as if they were a probability distribution. If you're coming in from the last chapter, you know what we need to do then. We compute a softmax along each one of these columns to normalize the values. In our picture, after you apply softmax to all of the columns, we'll fill in the grid with these normalized values. At this point you're safe to think about each column as giving weights according to how relevant the word on the left is to the corresponding word at the top. We call this grid an attention pattern. Now if you look at the original transformer paper, there's a really compact way that they write this all down. Here the variables Q and K represent the full arrays of query and key vectors respectively, those little vectors you get by multiplying the embeddings by the query and the key matrices. This expression up in the numerator is a really compact way to represent the grid of all possible dot products between pairs of keys and queries. A small technical detail that I didn't mention is that for numerical stability, it happens to be helpful to divide all of these values by the square root of the dimension in that key-query space. Then this softmax that's wrapped around the full expression is meant to be understood to apply column by column. As to that V term, we'll talk about it in just a second. Before that, there's one other technical detail that so far I have skipped. Okay, that's a good place to stop. I think it's about 11:08. I see we're right at the shift in segments, right about 11:08, right about here. Any comments so far before we quit? Okay, well then I'm going to go ahead and note where we got up to, to make sure we don't miss anything.
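A compact sketch of the attention pattern as just described, i.e. the softmax(QKᵀ/√dₖ) grid from the original transformer paper: dot products between every key-query pair, scaled by the square root of the key-query dimension, then normalized so each query's scores sum to 1. The value (V) step and masking come later in the video, so they are left out here; all shapes and numbers are toy stand-ins.

```python
# Sketch of the attention pattern just described: dot products between every
# key/query pair, scaled by sqrt(d_k), then a softmax so each query's scores
# become weights that sum to 1. The value (V) step and masking are covered
# later in the video, so they're left out here.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 8, 12, 4              # tiny stand-ins

E   = rng.normal(size=(seq_len, d_model))     # embeddings, one row per token
W_Q = rng.normal(size=(d_k, d_model))         # learned query matrix
W_K = rng.normal(size=(d_k, d_model))         # learned key matrix

Q = E @ W_Q.T                                 # one query per token
K = E @ W_K.T                                 # one key per token

scores = Q @ K.T / np.sqrt(d_k)               # grid of all key-query dot products
attention_pattern = softmax(scores, axis=-1)  # normalize over keys for each query
                                              # (the "per column" softmax in the
                                              # video's grid orientation)

print(attention_pattern.shape)                # (seq_len, seq_len)
print(attention_pattern.sum(axis=-1))         # each query's weights sum to 1
```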
53:38 - Unidentified Speaker
OK, folks. Any last comments before we adjourn? Hope you guys have a good weekend. All right.
53:55 - D. B.
Same to you.
53:58 - D. D.
Take care.
54:00 - D. B.
Bye, guys.
54:01 - D. D.
Bye, everyone.