AIn't What It Used to Be: 2/7/25: Deepseek discussion + standard discussions

Artificial Intelligence Study Group

Welcome! We meet from 4:00-4:45 p.m. Central Time. Anyone can join. Feel free to attend any or all sessions, or ask to be removed from the invite list as we have no wish to send unneeded emails of which we all certainly get too many.

Contacts: jdberleant@ualr.edu and mgmilanova@ualr.edu

Agenda & Minutes (149^th meeting, Feb. 7, 2025)

Table of Contents

* Agenda and minutes

* Transcript (when available)

Agenda and minutes

Announcements, updates, questions, presentations, etc.

Feb. 14: HM informally presents proposed MS project on "Evaluation of Snowflake Data Cloud Data Pipelines and AI/ML Capabilities"
Feb. 21: BH informally presents proposed PhD project on "Unveiling Bias: Analyzing Federal Sentencing Guidelines with Topological Data Analysis, Explainable AI, and RAG Integration"
Soon: VK will report on the AI content of a healthcare data analytics conference attended in FL.
MM suggests we view: Neural Networks, Deep Learning: The basics of neural networks, and the math behind how they learn, https://www.3blue1brown.com/topics/neural-networks
MM suggests we view: LangChain free tutorial, https://www.youtube.com/@LangChain/videos
A review paper by MM and students is uploaded to our site.
Opportunities from MM:

New INVIDIA CODE FOR all courses, https://learn.nvidia.com/en-us/training/self-paced-courses, access code available from MM (we can't post it publicly here)
Want a certificate? https://www.nvidia.com/en-us/learn/certification/
https://www.accenture.com/us-en/insights/technology/technology-trends-
2025?c=acn_glb_accenturetechnogoogle_14215622&n=psgs_1224&&&&&gad_source=1&gclid=CjwKCA
iA-
ty8BhA_EiwAkyoa35H4kSP4ZCs6YInx4SEq4cJg_TZTu4Dwgd7522BqTbpxjZidBbumQRoCHaQQAvD_BwE&
gclsrc=aw.ds
Important papers: https://adasci.org/top-ai-research-papers-of-2024/

Deepseek is in the news! Any comments or questions about it?

Recall the masters project that some students are doing and need our suggestions about:

Suppose a generative AI like ChatGPT or Claude.ai was used to write a book or content-focused website about a simply stated task, like "how to scramble an egg," "how to plant and care for a persimmon tree," "how to check and change the oil in your car," or any other question like that. Interact with an AI to collaboratively write a book or an informationally near-equivalent website about it!

BI: Maybe something like "Public health policy." Not present today.
LG: Thinking of changing to "How to plan for retirement."

Looking at CrewAI multi-agent tool, http://crewai.com, but hard to customize, now looking at LangChain platform which federates different AIs. They call it an "orchestration" tool.
MM has students who are leveraging agents and LG could consult with them

ET: Gardening (veggies, herbs in particular). Specifically, growing vegetables from seeds.

ChatGPT started to get repetitive.
Trying to deal with possibility of hallucinations.
Makes images but not great ones.
Plan to make a website, integrating things together.

Anything else anyone would like to bring up?
We are up to 19:19 in the Chapter 6 video, https://www.youtube.com/watch?v=eMlx5fFNoYc and can start there.

Schedule back burner "when possible" items:

If anyone else has a project they would like to help supervise, let me know.
JK proposes complex prompts, etc. (https://drive.google.com/drive/u/0/folders/1uuG4P7puw8w2Cm_S5opis2t0_NF6gBCZ).
The campus has assigned a group to participate in the AAC&U AI Institute's activity "AI Pedagogy in the Curriculum." IU is on it and may be able to provide updates when available, every now and then but not every week.

1/31/25: There is also an on-campus discussion group about AI in teaching being formed by ebsherwin@ualr.edu.

Here is the latest on future readings and viewings

We can work through chapter 7: https://www.youtube.com/watch?v=9-Jl0dxWQs8
https://www.forbes.com/sites/robtoews/2024/12/22/10-ai-predictions-for-2025/
https://arxiv.org/pdf/2001.08361

Computer scientists win Nobel prize in physics! Https://www.nobelprize.org/uploads/2024/10/
popular-physicsprize2024-2.pdf got a evaluation of 5.0 for a detailed reading.
Neural Networks, Deep Learning: The basics of neural networks, and the math behind how they learn, https://www.3blue1brown.com/topics/neural-networks
LangChain free tutorial,https://www.youtube.com/@LangChain/videos
We can evaluate https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10718663 for reading & discussion.
Chapter 6 recommends material by Andrej Karpathy, https://www.youtube.com/@AndrejKarpathy/videos for learning more.
Chapter 6 recommends material by Chris Olah, https://www.youtube.com/results?search_query=chris+olah
Chapter 6 recommended https://www.youtube.com/c/VCubingX for relevant material, in particular https://www.youtube.com/watch?v=1il-s4mgNdI
Chapter 6 recommended Art of the Problem, in particular https://www.youtube.com/watch?v=OFS90-FX6pg
LLMs and the singularity: https://philpapers.org/go.pl?id=ISHLLM&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FISHLLM.pdf (summarized at: https://poe.com/s/WuYyhuciNwlFuSR0SVEt). 6/7/24: vote was 4 3/7. We read the abstract. We could start it any time. We could even spend some time on this and some time on something else in the same meeting.

Transcript:

ML discussion group
Fri, Feb 7, 2025

0:00 - M. M.
I was thinking that is a good thing, but I don't know. For NVIDIA, at least for stock NVIDIA, it's good or bad. Yeah. I sent to D. B. a lot of links, but not many people are coming today.

0:15 - D. B.
Yeah, I even had one of your students ask me to add him and three of his other students to the list, but they're not here. Give him another minute. Oh, sure, sure.

0:28 - M. M.
Let's see the link.

0:30 - E. G.
Dr.

0:30 - M. M.
M., I posted it. Yeah, I can see it. OK. Yeah. Restricted AI, yeah. Well, but there will be a market for this, definitely. It's not something that people can People will love to use it. Because the crypto will become more and more popular. So I think that there is a lot of market for this. Yeah. So particularly for large linguist models.

1:07 - E. G.
I think crypto is going to go the way of the dodo. Quantum computing comes in. I read that news. It appears more of, you know, the news is trying to make somebody happy than real news.

1:33 - Y. P.
And because if you read through the article, it's, I mean, it seems that they, somebody wants to stop China from getting it. And then I think NVIDIA is getting the heat off. How did they get access to the chip? If you see like why they are doing this means if you think from business standpoint, it absolutely doesn't make sense from technology standpoint. I'm trying to see what is the benefit for NVIDIA to do that strategy.

2:10 - Multiple Speakers
And there is no logic.

2:12 - Y. P.
means practical logic that comes up, that they would do it for this reason, then when you start reading the article, and then you start reading that, hey, China got access to their GPUs, and they're trying to block and now I know, I don't know how they are going to block if the stuff is already in China.

2:32 - E. G.
So yeah, there's a lot of things to read between the lines.

2:36 - Y. P.
And I don't know, is this a good source?

2:40 - Unidentified Speaker
Tech.

2:40 - Y. P.
I use it a lot.

2:42 - E. G.
It gives me an idea of actually what's going on. And I did corroborate it with several new sources.

2:50 - Multiple Speakers
I know it's not just one new source. And also it seems that they're talking about just one chip.

2:58 - Y. P.
That doesn't mean they will not have other chips that are focused on doing certain functions. So there are so many things if and buts. But if you look at it at the surface, it just seems to be a news for someone that, hey, yes, Chinese people got access to our chips. But, you know, we are going to block it somehow. And I don't know, you know, how these Chinese people will force those updates on the current chips that are doing it so as to not to do it. So there's a lot to read between the lines, I guess, for this article. China compliant, that is the main, if you see the first line is China compliant. So I laugh at it when they say that. So it has become a joke. And I mean, if you think geopolitically, there is also a big assumption that yes, they were able to maybe copy the model and build the software. But there's also an assumption that China doesn't have capability to build something like this, the chips at all. So we'll see. I mean, it seems that there will be other chips that would do other things. But this particular RTX 5090D, just one of the many products NVIDIA has will not do certain things.

4:33 - D. B.
Well, OK. So here we are. Welcome, everybody. A couple of things on the agenda. So one of the PhD students, B. H. is his name. He'd like to present his proposed PhD project informally, is entitled as follows. And he'll do that on February 21, which I guess will be in two weeks from today. And then two weeks after that, another student, master's student, will present informally his project a much more kind of commercially focused project. And there's the title there. And any other students who want to present something, just let me know and I'll schedule them.

5:38 - Unidentified Speaker
And then Dr. M. had a few suggestions.

5:42 - D. B.
Here's two of them. So F., if you'd like to tell us about these.

5:51 - M. M.
Go right ahead. You just click on this blue one, brown. We follow them, but they have more new videos. So click and share with everybody. I really recommend everybody to go through these videos. Okay. So we discuss many of them. Including transformer, but go down and see they have a new stuff. They have a large language models, inside large language models, and memory in large language models. So yeah, this is, I think it's pretty new. Yeah, from 2024, but yeah. So to continue with our studies, if you have time, When we have time, we can continue with these videos.

6:48 - D. B.
It looks like some of these are chapters and some are, like this one's not at one of the chapters, but it still looks good.

6:57 - M. M.
For the beginners, yes, and this is for the beginners.

7:01 - D. B.
Yeah, a lot of these are chapters that we're currently on.

7:05 - M. M.
I think we're on chapter seven right now.

7:07 - D. B.
Yeah.

7:08 - Unidentified Speaker
Okay.

7:08 - Multiple Speakers
Okay, this is one link. Very cool, very cool.

7:11 - M. M.
Probably they do something new Maybe we have to check another one is I promise to give you this long chain Tutorial it's free. This is free tutorial that my students are using the tutorial for Nvidia code we discussed with D. B. that We want to give the code only for our students. So please feel free to contact me or D. B. For our students, we have still code that you can use, and any course in NVIDIA that is self-placed course, online learning course, will become free for you. I can share again how we can do this, and I really like it.

8:01 - D. B.
You can get a certificate too.

8:04 - M. M.
Yeah, you get the certificate too. Actually, this certificate, we are thinking maybe with the future to do some kind of hackathons. And this is the certificate, it's kind of preparation for the hackathon. Support this kind of activities.

8:25 - D. B.
Okay, let me show you one more thing here that Dr. M. provided. I'm going to go to the The website. Oh, there's a mini review.

8:42 - Unidentified Speaker
I've just uploaded it right to here by Dr. M. and two of her students.

8:50 - D. B.
And here you go. Here's what it is. Yeah. Feel free to read it.

8:59 - M. M.
We can read it together, too.

9:02 - D. B.
No, no, no. Not necessary.

9:05 - Multiple Speakers
No, no. This is exactly what we present the storytelling actually.

9:11 - Unidentified Speaker
Right.

9:11 - M. M.
This is what we present. So don't worry about this, but it's a storytelling, any kind of educational contents you can generate with multi-agents. It can be included with vision, you know, or text or graphics or whatever you need it. But I. will present extent version. So this is the main author, I. But he will present the extent version. So don't worry about this right now.

9:45 - D. B.
It's kind of OK. By the way, this student was in my course like a year ago or something like that.

9:54 - Multiple Speakers
This is I.'s wife. One of the women. His who? Husband and wife.

10:00 - M. M.
Oh, OK.

10:01 - Unidentified Speaker
Wow.

10:01 - D. B.
Cool.

10:02 - Multiple Speakers
Cool, cool. OK. I have a quick question for E. G.

10:07 - M. M.
I was asking here to prepare the quantum computing educational program and you mentioned the quantum computing. Are you familiar with quantum computing?

10:22 - E. G.
Yes, actually I got my IBM certification in QPL, the quantum programming language.

10:31 - M. M.
Oh, really?

10:32 - Unidentified Speaker
They have a work also kind of integrating with UDA, with NVIDIA.

10:41 - M. M.
I'm not familiar with this stuff, but probably we need to talk. So you see the future of quantum computing?

10:54 - D. B.
Yes.

10:55 - E. G.
I mean, I think quantum computing, if we can't address the temperature required to do quantum computing and qubits to manage it, it's not going to reach anything other than... Do you remember the Cray systems of the 80s and 90s where they had to be in nitrogen chambers? Had to be in nitrogen chambers for it to operate. You'll only have access to those in these big, big centers.

11:35 - D. B.
Why nitrogen? What's so great about nitrogen?

11:39 - E. G.
Because the temperature, you had to keep these great temperatures, the processors, under a certain temperature. Do they use liquid nitrogen or something? As far as I was aware, yeah, it was in a liquid nitrogen column.

11:59 - D. B.
Wow. All right. To the link for educational pro program from Canada, maybe E. G. or somebody else will be interested in forwarded on to me.

12:11 - E. G.
It's something it's a hobby. It's one of the things I enjoy reading about.

12:18 - M. M.
Yeah, I like it. I will show you right now. Okay. This is the Canada University is doing quantum computing and give some classes. I'm learning from there. Okay. Like you mentioned this cube, cubic, cubic.

12:38 - D. D.
The quantum computing though, that they can, big companies, you know, could use them to, you know, potentially train AIs and other things, right?

12:52 - E. G.
Yeah, buy time in it.

12:54 - D. D.
I mean, break encryptions, just, you know, basically do any number of things. It's just that we won't be able to put any of them in our house.

13:10 - M. M.
Right.

13:10 - E. G.
I mean, if you remember back in the 60s, you'd buy time on computers. That's what I think this is doing, is you just buy time on a computer.

13:28 - D. B.
Okay. I sent the link.

13:32 - M. M.
I am.

13:33 - E. G.
It's in French. All right, so DeepSeek is in the news.

13:41 - Multiple Speakers
Some people were talking about it earlier. Like before the meeting and during the beginning of the meeting.

13:51 - E. G.
Anything else anyone wants to ask about it or say about it?

13:56 - D. B.
It's being banned at all government sites right now.

14:00 - E. G.
Wow, why?

14:01 - D. B.
Security, data security. Yeah, well, you have to assume that these models are sucking up everything you type in.

14:10 - D. D.
Well, they were saying that there can be a direct to the Chinese government is what they were saying. But that hasn't stopped a lot of people from using it. And businesses are using it because it's open source so they don't have to pay. There's a lot of people using it. And I think someone said it earlier that there's some substantial evidence that what they did was they queried GPT-4 mm-hmm and and use that as their training data so yeah wait but if their method holds true then that means if a company did have enough money and enough of those older chips they could there's nothing to stop them from training large language models as they had previously thought so the It won't slow them down. They've got access to all the technology that they need to train large language models. And I think they published their method in a paper, and so I can't imagine if it doesn't work, I can't imagine it sticking around too long before they're exposed. What do you mean exposed?

15:38 - D. B.
For having trained them that, trained it that way? For, you know, if they were exposed, they really didn't train it the way they said. Oh yeah. Then they would be exposed for that. Well, you know, all these other AIs are kind of in the hot seat for sucking up everybody's copyrighted pages and information and using that for training. So is it really so different? You know, so they, you know, all the training data that was pilfered by, by the open app and everything.

16:18 - Multiple Speakers
So yeah.

16:19 - D. B.
They would have had to pay chat GPT for every token.

16:25 - D. D.
Yeah.

16:25 - D. B.
But maybe chat GPT should have paid everybody on the web who has a webpage for using their stuff.

16:34 - D. D.
I can't argue that.

16:36 - Unidentified Speaker
Yeah.

16:36 - A. B.
I think it was somebody was suing them because of that specific issue where they found that their content was directly tied to like outputs that I'm prompts and whatnot.

16:50 - D. D.
Yeah. It's, it's all on shaky ground. But it hurts. It hurts the big, the big ones that spent, you know, billions and what terrible things the Chinese Communist Party.

17:04 - Multiple Speakers
Sorry, sorry. Sorry, guys, my I had a comment, but

17:10 - E. G.
No, the it's actually pretty cool, though, because it makes you wonder we're banning a deep seek because of the data it could gather from us. It's communist China. But what's to prevent these other ones from being. Nefarious in the data collect, because it knows who it's collecting it from Oh, yeah.

17:39 - D. D.
Yeah, it's like, What was what's the Chinese government going to do with our data that everybody else is doing with our data? You know, it's it's not.

17:53 - Multiple Speakers
It's still just big Data to them, right?

17:57 - D. D.
They're just using it.

17:59 - E. G.
Oh, no, I'm sure they're mining it for like patterns So that way they could do sentiment analysis on what's going on because they want to destabilize The population here to put themselves in a better position Because China has one thing we don't in the U.S.

18:20 - D. D.
Continuity of government Well, I don't know if China's motives are that nefarious, but they might be, I don't know. But I suspect that they probably just want to sell us stuff.

18:34 - D. B.
Russia is definitely, they've been involved in destabilizing operations as a tool of foreign policy for a couple of hundred years. China, they may decide that's the way to go, but they don't have that mission that Russia has. But, you know, another thing is, of course, you know, China's probably going to be developing dossiers on every American that they're interested in based on, in part on their use of programs like this.

19:03 - D. D.
Well, you know, you consider they might have dossiers on all their citizens. Oh, yeah, I'm sure they do.

19:11 - D. B.
Maybe they want dossiers on the world.

19:14 - D. D.
You know, I don't, I don't I don't know, but I think that the United States and China are so intertwined financially that nobody wants to destabilize the other one that much. Yeah, who knows? Who knows? Yeah, I don't know. I don't know, but I do know that you can go download that model right now for free. You don't have to pay anybody. You can put it on your computer and someday if you get powerful enough hardware, you can infer it right at your house.

19:51 - D. B.
Isn't it nice that this is an advance in algorithms that really makes the AI much more accessible to people without a billion dollars to train? It should really stimulate the development of AI by making it it much more accessible to more companies to develop.

20:19 - E. G.
I think if we can create models that will fit on standard GPUs within the 24 gig frameworks, then we'll start having the lightning straight up advancements. And utilization.

20:40 - D. D.
But in a way, that's kind of what they're saying they did. They took these smaller chip sizes with less memory. And so if you could stage it down the way they did, then there's no reason why you couldn't stage it down to a smaller chip size of a GPU that's standard. And in a way, that's That's what they've done. How much time that would take, you know, I don't know on a standard size, but I mean, it's still got to make this all this the computations. So, but it would certainly be faster than the CPU.

21:25 - E. G.
Well, if that's the case. I'm going to. Maybe have a working session with anybody, and I'll get it running on my system.

21:38 - D. B.
Yeah, if you want to load it and demo it for the meeting or something, we can do it sometime. Somebody else had a comment.

21:51 - Unidentified Speaker
Somebody?

21:51 - Y. P.
Yeah, I was going to say something. Um, uh, this is, uh, Y. P. We actually, I started using, uh, deep sea, uh, we have been using it. Uh, and obviously I don't share any confidential information or what I, what we feel for our business or our clients is confidential. We are not comfortable sharing it. And I know, uh, When I prompted earlier, actually, my son was talking to me. He was listening to the conversation. And there are some famous prompts on social media going on, where if you type, for example, in DeepSeek, certain things that, for example, what has Chinese government done wrong, it will actually say that this is not in scope, whereas the moment you ask what the U.S. Government has done, terrible things U.S.

22:59 - D. B.
government has done, it will start writing.

23:02 - Y. P.
So what is clear is that Deep Seek is controlled by the Chinese government. And for that matter, from geopolitical and business and commercial standpoint, I'm just avoiding putting anything confidential that I don't want the third party to know. So that is one thing that I've told my team, not to use DeepSync for that is private, confidential, or you don't want to share with anyone. Number two is, when it comes to free versions, I'm not kind of that impressed. Like, oh, wow, this is so great compared to the other models we are using. So for the free model, I don't see any particular benefit of using that for the usage that we have, like code generation or marketing or other use cases that we are doing internally or for our clients. I don't see any significant benefit. But yes, when it comes to the paid version, not just the $20 to $100, but the, the, the tokens and the, that is significantly cheaper. So without the data or with open source data, if you want to build something and just clicked on download option of DeepSync, but I think if you want to get into actually building models and all that, I think it is great. It is a great learning experience. It is a great research opportunity to actually learn what they have done and perhaps replicate so that it becomes more commoditized. So there are a lot of things that we can learn from positively that how could they do it, the operational efficiency, but I'm not comfortable sharing or using it, especially for confidential information. So that is the input that I wanted to share with.

25:17 - D. D.
So if I understood you correctly, you're saying that it just doesn't cost that much to use the for-profit AIs anyway. Is that what you're saying? The cost savings is not that great because the other AIs are inexpensive? No, not really.

25:40 - Y. P.
So there are three dimensions. One is you can still use chat GPT as is for various things free, right? There are some models you have to pay and there are free and the free version is free for both. And when it comes to actual outputs that we are seeing, DeepSeek is not that impressive. For example, cloud for Python code development, we like it a lot. Or v0 for front-end development, we like it a lot compared to DeepSync. It is not great. So why would I change if it's doing something better? Then when it comes to the paid version, for especially the tokens, if you're building RAG models and if you're building, If you are doing engineering, then obviously DeepSeek is extremely cheaper compared to JAD-GPT, almost one-tenth. So you can work on engineering using DeepSeek to learn how they're built and the backend and whether we can commoditize just the build engineering piece of it because it is open source. But even if it is open source, when it comes to training and that engineering piece, I will use open source data train, the moment you want to do anything confidential, private, or that needs to be more secure and protected, you would not use DeepSeek. Just because of the test examples that I gave, that the moment you ask questions about Chinese government, Communist Party, etc., it becomes extremely protective. Whereas if you ask about US government, what terrible things Communist Party has done, you know, not in scope, US government has done completely in scope. So that's why I will be very cautious about it. But there are some things where it is extremely cheaper to do. But on the positive note, what I'm saying is, there's a lot of research opportunities universities have to see what they have done, and perhaps replicate that here, so that These models can be more commoditized, more open for people at large. That is the thinking that I have, uh, that we can take deep sick positively that, okay. You know what? Yes, there is issue with confidentially data and geopolitical issues, but we have to learn from them how they did what they did. So yeah. Yeah. Yeah. It's semi could get a grant right now.

28:28 - D. D.
to test that, I'm sure.

28:31 - Y. P.
Exactly. Yeah.

28:32 - Unidentified Speaker
OK.

28:32 - D. B.
Anyone else have any thoughts on this? All right. So next time on the agenda, recall that we have this master's student, set of master's students doing projects where they're using ChatTPT or other AI, generative AI, to write a book. Or a equivalent website. And the idea was that they would meet with us weekly and give us a quick update on how they're doing and ask any questions, get some advice from all of you smart people, and see how it goes. And we could all learn from their experience. So I thought all these students would be here every week, but I know L., he he wrote in and said he had something, so he can't be here. Others are not, just not here, but E. T., you're here, so maybe you could give us an update on how you're doing and get some feedback as needed.

29:42 - E. T.
Hello. So, past weeks, what I've tried for my project, I, again, I expanded the steps, but I tried to analyze the redundancy. And it started actually repeating some steps, some fat parts underneath several steps. And the other thing I wanted to analyze is, okay, AI, ChatTPT has given a lot of information, but these are actually, are these actually truth? So I wanted to compare the results from ChatTPT with with the scientific results. So it gave me, I think it was 500,000. Let me check one more time. It updated up to 500,000 words. And it started comparing what ChatTPT, what itself AI wrote to scientific studies from web. Next, I tried diagrams. But again, as I said previously, I mean, it creates diagrams. It's beautiful. It's very useful. But they're not at the point where I would like it. The last thing I want to start off this week is starting to get these things together and try to build my website with these. OK. So are you going with a book or a website?

31:34 - D. B.
Oh, I changed my mind to a website. OK. Hmm.

31:42 - D. B.
So when you say 500,000 words, are you talking about the memory, the context memory of the thing when you're interacting with it? Or what is it?

31:57 - E. T.
Oh, it explained every step over, not over and over. It explained the steps for planting as starting with the seeds and expanded up to 500,000. 500,000 words as compared with the scientific researches.

32:15 - D. B.
Well, 500,000 words, you mean you have 500,000 words that it created for you? Yes. That's a lot. It is.

32:24 - E. T.
Yeah, it was pretty long. I didn't have much time to go through everything, but...

32:31 - D. B.
I mean, a whole full-size book is, you know, typically on the order of 100,000, maybe less. I think.

32:41 - E. T.
So yeah, 500,000 words is a lot of stuff. Interesting. Any suggestions, any more guidance? So what are you using to build actual website? Well, I tried crew AI, but honestly, I'm not sure if that's something for personal use. It asks for my company. I mean, I work at a public school, so is that what I'm supposed to put there as the company?

33:20 - D. B.
Well, typically, if they want the company, you'd say, well, your company is the Little Rock School District, or the city of Little Rock, or something like that.

33:31 - E. T.
OK. I mean, I'll try that. But again, I don't have much experience with crew AI, so I'll probably explore what it does.

33:41 - D. B.
Actually, you need to be a little careful because you're not using this for your work. So maybe you don't want to say what company you're from in case it thinks that you're doing it for work.

33:55 - E. T.
I don't know. I'm not sure. Should I use it or not?

34:02 - D. B.
Maybe it should say you're using it for your, for personal use or something.

34:07 - E. T.
I don't know. Okay. Well, I'll do that.

34:10 - Y. P.
There are, there are other utilities you might want to try. So if you're asked to change ChatTPT or other AIs, I'm trying to use crew AI and it's giving me a problem. What are the other competitors and you will get options and try the other options. So there are many, many solutions. If you give right prompts, it will build the website for you.

34:37 - D. B.
Well, do you have any questions for us specifically that we can address?

34:44 - E. T.
Well, I believe I will have more clear questions once I start building the website. But as I mentioned in my email, I think it would be be more beneficial for me to have weekly goals.

35:03 - Multiple Speakers
The way I envisioned structuring this is for the students to meet with us weekly and get report weekly and get advice weekly, as opposed to meeting with me in an appointment every week or something like that.

35:20 - D. B.
That was my hope. But I mean, I'm certainly happy to meet individually with students as needed. I just want to use this meeting as a substitute for me meeting individually with each student weekly.

35:35 - E. T.
I see.

35:35 - Multiple Speakers
Yeah, I mean, this will work as well. OK. We'll try it. And then, as needed, I can always meet with people individually outside the meetings.

35:46 - E. T.
Thank you. OK.

35:47 - D. B.
My hope was that other students would be here, and they would be able to learn from each other as they their progress, but it hasn't happened yet.

35:59 - Multiple Speakers
So D. B. actually, we will start, somehow her email continued to go to my spam, then she reached out to me. So B. will start working with us on a research project starting Monday.

36:15 - Y. P.
I'll send you the, I requested her to create a summary, but essentially we are building a trust model. Since we are already building, she'll do a component of that. Okay.

36:29 - D. B.
It's a combination. Yeah, so for those of you who are not, don't understand what we're talking about, Y. P. requested to work with one or more of our master's students on an intern, like a CPT, Curriculum Practical Training basis at an internship. And so we have, I guess, one student who now is working with with Y. P., I guess that's finally come together. So I look forward to hearing more about that in another time.

37:05 - Unidentified Speaker
All right.

37:07 - H.
Anyone else have anything before we go to our reading?

37:13 - D. B.
Okay, well, let's see where we are. Oh, we're up to 1305 in the video. Dr.

37:23 - H.
V., this is H. Go ahead. Hi.

37:25 - D. B.
So I was here to present my project proposal, if I can.

37:30 - H.
OK. This was, I don't see that on the list, but yeah, I think we discussed it on the email. Which one was your proposal?

37:40 - D. B.
This was on Snowflake.

37:42 - Unidentified Speaker
Yeah.

37:42 - H.
Yeah, I think I see it. Yeah, right.

37:45 - D. B.
I have you down for March 7th. Is that, is that okay?

37:50 - H.
Oh, okay. Yeah, I thought, I believe we had some formal, informal discussion about the project proposal, right? Is that today or is it on March 7th?

38:00 - D. B.
Well, I didn't, I normally don't want to schedule something for the same week, because then I haven't had a chance to put it on the agenda, tell people to expect it. Okay.

38:12 - H.
But you know, you want to go next week?

38:15 - D. B.
do it next week, February 14th.

38:18 - H.
Sure, that should work. OK. All right.

38:22 - D. B.
Well, at least now you know what we're like, so you know we're not going to be mean to you.

38:31 - Unidentified Speaker
Sure, yeah.

38:32 - H.
I thought I would give like a high level on what I'm doing, so. Well, we can do that next week.

38:42 - D. B.
Sounds good, thank you. I appreciate your flexibility on that.

38:47 - H.
No problem. I should have checked the date. I saw the invite and I was like, OK, this must be today.

38:57 - D. B.
Oh, we obviously miscommunicated. I'm sorry about that. OK, so we're going to go to, where were we?

39:11 - D. B.
Minute 1305. This is why context size can be a real issue. Hang on. Minute 1305. Sorry. Actually, I'm sorry. I do need to unshare and then share again, optimizing for the video. So I'm going to do that. Now I'm going to do share again. Screen. Optimize for video. All right, now it should come out OK.

40:04 - Unidentified Speaker
Another fact that's worth reflecting on about this attention pattern is how its size is equal to the square of the context size. So this is why context size can be a really huge bottleneck for large language models, and scaling it up is non-trivial. As you might imagine, motivated by a desire for bigger and bigger context windows, recent years have seen some variations to the attention mechanism aimed at making context more scalable. But right here, you and I are staying focused on the basics.

40:30 - D. B.
Any comments or thoughts on this n squared problem? Okay. I was just feeling sick.

40:41 - Unidentified Speaker
Advertisement. That's the model deduce which words are relevant to which other words. Now you need to actually update the embeddings. Allowing words to pass information to whichever other words they're relevant to. For example, you want the embedding of fluffy to somehow cause a change to creature that moves it to a different part of this 12,000 dimensional embedding space that more specifically encodes a fluffy creature. What I'm going to do here is first show you the most straightforward way that you could do this, though there's a slight way that this gets modified in the context of multi-headed attention. This most straightforward way would be to use a third matrix, what we call the value matrix, which you multiply by the embedding of that first word, for example, fluffy. The result of this is what you would call a value vector. And this is something that you add to the embedding of the second word. In this case, something you add to the embedding of creature. So this value vector lives in the same very high dimensional space as the embeddings. When you multiply this value matrix by the embedding of a word, you might think of it as saying, if this word is to adjusting the meaning of something else, what exactly should be added to the embedding of that something else in order to reflect this? Comments or questions?

42:11 - D. B.
So multiply two matrices and then add the result to the word. I'm still not clear on how this, how this determines the relevance of one word to another. If two words are not relevant to each other, for example, fluffy and the word concept, then when you multiply these matrices, will you get very low values, zero or something in here?

42:51 - E. G.
I think in programming, and this is how I understand a lot of this is in programming terms, how you have a decorator class or a decorator pattern, you're adding a decorator to the word. So in here, fluffy for concept could mean amorphic or general or something like that, so it would look at synonyms that may be related to it. Because what I'm seeing here is you've got a base object, and now you've added kind of like a decorator pattern to that object.

43:34 - D. B.
So is this vector in this one that I'm showing with the cursor? Is that the word fluffy? I can't remember.

43:45 - E. G.
Oh, in a previous segment, it said it would take each term and pass it to the next. So that way it would pull it out. So if it said short haired versus long haired, it would put in what it identifies as short and hair. So it'd take hair and say short, hair, long.

44:14 - D. B.
It may be way off.

44:17 - E. G.
And since we're missing V. to argue with me. All right.

44:23 - D. B.
Well, any other comments on this? Okay. Looking back in our diagram, let's set aside all of the keys and the queries.

44:35 - Unidentified Speaker
Since after you compute the attention pattern, done with those, then you're going to take this value matrix and multiply it by every one of those embeddings to produce a sequence of value vectors. You might think of these value vectors as being kind of associated with the corresponding keys. For each column in this diagram, you multiply each of the value vectors by the corresponding weight in that column. For example, here, under the embedding of creature, you would be adding large proportions of the value vectors for fluffy and blue while all of the other value vectors get zeroed out, or at least nearly zeroed out. And then finally, the way to actually update the embedding associated with this column, previously encoding some context-free meaning of creature, you add together all of these rescaled values in the column, producing a change that you want to add, that I'll label delta e, and then you add that to the original embedding. Hopefully, what results is a more refined vector encoding the more contextually rich meaning of a fluffy blue creature.

45:37 - D. B.
And of course you don't just do this to one embedding, you apply the same weighted sum across all of the columns in this picture, producing a sequence of changes. Adding all of those changes to the corresponding embeddings produces a full sequence of more refined embeddings popping out of the attention block.

46:03 - Unidentified Speaker
Any comments?

46:04 - D. B.
So is the idea that for some of these columns, like maybe E4 plus delta E4 is pretty much the same as E4 again, whereas E5 plus delta E5 is quite different because fluffiness is relevant here, but not, you know, relevant to E5, but not to E4?

46:34 - Unidentified Speaker
Okay, well, continue. Zooming out, this whole process is what you would describe as a single head of attention. As I've described things so far, this process is parametrized by three distinct matrices, all filled with tunable parameters, the key, the query, and the value. I want to take a moment to continue what we started in the last chapter with a scorekeeping where we count up the total number of model parameters using the numbers from GPT-3. These key and query matrices each have 12,288 columns, matching the embedding dimension, and 128 rows, matching the dimension of that smaller key query space. This gives us an additional 1.5 million or so parameters for each one. If you look at that value matrix by contrast, The way I've described things so far would suggest that it's a square matrix that has 12,288 columns and 12,288 rows, since both its inputs and its outputs live in this very large embedding space. If true, that would mean about 150 million added parameters. And to be clear, you could do that. You could devote orders of magnitude more parameters to the value map than to the key and query. But in practice, it is much more efficient if instead you make it so that the number of parameters devoted to this value map is the same as the number devoted to the key in the query. This is especially relevant in the setting of running multiple attention heads in parallel. The way this looks is that the value map is factored as a product of two smaller matrices. Conceptually, I would still encourage you to think about the overall linear map, one with inputs and outputs, both in this larger embedding space. For example, taking the embedding of blue to this blueness direction that you would add to nouns. It's just that it's broken up into two separate steps. The first matrix on the right here has a smaller number of rows, typically the same size as the key query space. What this means is you can think of it as mapping the large embedding vectors down to a much smaller space. This is not the conventional naming, but I'm going to call this the value down matrix. The second matrix maps from the smaller space back up to the embedding space, producing the vectors that you use to make the actual updates. I'm going to call this one up matrix, which again is not conventional. The way that you would see this written in most papers looks a little different. I'll talk about it in a minute. In my opinion, it tends to make things a little more conceptually confusing. To throw in linear algebra jargon here, what we're basically doing is constraining the overall value map to be a low-rank transformation. Turning back to the parameter count, all four of these matrices have the same size, and adding them all up, we get about 6.3 million parameters for one attention head. Thoughts or questions? All right, continue. As a quick side note, to be a little more accurate, everything described so far is what people would call a self-attention head, to distinguish it from a variation that comes up in other models that's called cross-attention. This isn't relevant to our GPT example, but if you're curious, cross-attention involves models that process two distinct types of data, like text in one language and text in another language that's part of an ongoing generation of a translation, or maybe audio input of speech and an ongoing transcription. A cross-attention head looks almost identical. The only difference is that the key and query maps act on different data sets. In a model doing translation, for example, the keys might come from one language while the queries come from another, and the attention pattern could describe which words from one language correspond to which words in another language.

50:41 - D. B.
Any thoughts or questions about this?

50:44 - E. G.
I do. Go for it. In the attention heads, it said it was running them in parallel. But if you're passing information from one to the other, how does the other one know to operate on something when it doesn't have that information yet? How could it be parallel?

51:10 - D. B.
Hmm. That's the essence of multi-headed, multi-headed.

51:15 - Unidentified Speaker
Yeah. I can't answer that question.

51:19 - D. D.
And it's a good question, E. G. Is it? Yeah. How can it be parallel except because it can't operate on that? Maybe it adjusts. It takes what it can get and then adjusts. I think that's where I end up doing some research on.

51:46 - Unidentified Speaker
Yeah.

51:46 - D. D.
All right.

51:47 - D. B.
Well, I guess that'll do for today, and we'll just start from there in a couple of weeks. Or if there's time next week, we can do that. But meanwhile, we are going to hear from H. next week. And B. H. the following week, and we'll see you next time.

52:15 - Unidentified Speaker
Thanks everyone.

52:17 - D. D.
Bye guys.

52:19 - Unidentified Speaker
Bye everyone.

Displaying jan10FinishedTranscript.txt.

AIn't What It Used to Be

Friday, February 7, 2025

2/7/25: Deepseek discussion + standard discussions

No comments:

Post a Comment