Machine Learning Study Group
Contacts: jdberleant@ualr.edu and mgmilanova@ualr.edu
Agenda & Minutes (138th meeting, Nov. 15, 2024)
Table of Contents
* Agenda and minutes
* Transcript (if available)
Agenda and minutes
- Announcements, updates, questions, etc.?
- The campus has assigned a group to participate in the AAC&U AI Institute's activity "AI Pedagogy in the Curriculum." IT is on it and may be able to provide an update when available.
- A demo of the real-time use of AI to create the Doppler effect interactive animation, and perhaps other demos, will be scheduled as soon as convenient for RM and VW. We will likely have a demo of this process next week.
- Here is a tool the library is providing. Anyone try it? Should we try it together in a meeting? (Yes, some attendees thought it would be a good idea.) Thoughts?
Library trial of AI-driven product: Primo Research Assistant
Hello,
The library is testing use of Primo Research Assistant, a generative AI-powered feature of Primo, the library's search tool. Primo Research Assistant takes natural-language queries and chooses academic resources from the library search to produce a brief answer summary and list of relevant resources. This video provides further detail about how the Assistant works.
You can access Primo Research Assistant directly here, or, if you click "Search" below the search box on the library home page, you will see blue buttons for Research Assistant on the top navigation bar and far right of the Primo page that opens. You will be prompted to log in using your UALR credentials in order to use the Research Assistant.
We value your feedback on this and other generative-AI resources. Please try it out, share it with anyone who might be interested, and let us know your thoughts here: Feedback form
Feel free to reach out with any questions or concerns.
--
Bonnie Bennet | Discovery and Systems Manager
University of Arkansas at Little Rock | Ottenheimer Library
501.916.6563 | bbennet@ualr.edu | https://ualr.edu/library
- Here is another event. Anyone go? No, but there was a lot of discussion about the abstract below.
Register now for the Ark-AHEAD Fall (Virtual) Workshop: Helping Students (and Faculty) Harness the Power of Generative AI for Good, not Evil
Presenter: Liz McCarron, EdD, MBA, ACC, CALC
Webinar: Nov 14, 2024, 9:30-Noon
Students quickly adopted Generative AI, but faculty have been slower to get on board. Worried about cheating, many schools banned the technology. But this can hurt neurodiverse students, who have adopted GenAI at a higher rate than neurotypical peers. This session will help beginners learn what GenAI is and what it is not, what it can do and what it can't. Attendees will gain a basic understanding of how ChatGPT works and its key features, capabilities, and limitations. Attendees will also experience creating and refining prompts. We will discuss the ethical implications of using GenAI and how to create assignments that help students use GenAI responsibly. Join us and get inspired to experiment with GenAI to help your students and yourself.
- Suppose a generative AI like ChatGPT or Claude.ai were used to write a book about a simply stated task, like "how to scramble an egg," "how to plant and care for a persimmon tree," "how to check and change the oil in your car," or any other question like that. Just ask the AI to provide a step-by-step guide, then ask it to expand on each step with substeps, then ask it to expand on each substep, continuing until you reach 100,000 words or whatever impressive target one might have.
Would this work? Would the result be all right, garbage, or something in between? Would it be reasonable to have a master's student do this as a master's project to see what happens? Should the master's student come to these meetings, provide weekly updates, and get suggestions from all of us? Yes. (A minimal sketch of the expansion loop appears after this list.)
- Can someone send me read.ai anonymized transcripts to include in the minutes? Yes, DD has kindly volunteered to do that. (A sketch of one local way to anonymize a transcript also appears after this list.)
- Anything else anyone would like to bring up?
- Here is the latest on readings and viewings:
- Next we will work through chapter 5: https://www.youtube.com/watch?v=wjZofJX0v4M. We previously got up to 15:50, but since it had been a while we started from the beginning and went to 15:50 again. Next time we do this video, we will go on from there. (When sharing the screen, we need to click the option to optimize for sharing a video.)
- We can work through chapter 6: https://www.youtube.com/watch?v=eMlx5fFNoYc
- We can work through chapter 7: https://www.youtube.com/watch?v=9-Jl0dxWQs8
- Computer scientists win Nobel prize in physics! https://www.nobelprize.org/uploads/2024/10/popular-physicsprize2024-2.pdf got an evaluation of 5.0 for a detailed reading.
- We can evaluate https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10718663 for reading & discussion.
- Chapter 6 recommends material by Andrej Karpathy, https://www.youtube.com/@AndrejKarpathy/videos for learning more.
- Chapter 6 recommends material by Chris Olah, https://www.youtube.com/results?search_query=chris+olah
- Chapter 6 recommended https://www.youtube.com/c/VCubingX for relevant material, in particular https://www.youtube.com/watch?v=1il-s4mgNdI
- Chapter 6 recommended Art of the Problem, in particular https://www.youtube.com/watch?v=OFS90-FX6pg
- LLMs and the singularity: https://philpapers.org/go.pl?id=ISHLLM&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FISHLLM.pdf (summarized at: https://poe.com/s/WuYyhuciNwlFuSR0SVEt).
6/7/24: vote was 4 3/7. We read the abstract. We could start it any time. We could even spend some time on this and some time on something else in the same meeting.
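As a starting point for the book-expansion experiment above, here is a minimal sketch of the loop, assuming the OpenAI Python client; the model name, prompts, word target, and round cap are illustrative placeholders rather than a prescribed setup, and a real project would add quality checks at each round.

```python
# Hedged sketch of the "expand a step-by-step guide until it reaches N words"
# experiment. Assumes the OpenAI Python client and an OPENAI_API_KEY in the
# environment; the model name, prompts, and limits are illustrative only.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt and return the model's reply text."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def expand_guide(topic: str, target_words: int = 100_000, max_rounds: int = 10) -> str:
    # Round 0: ask for the top-level step-by-step guide.
    text = ask(f"Provide a step-by-step guide: {topic}")
    # Each later round asks the model to expand every step with substeps.
    for _ in range(max_rounds):
        if len(text.split()) >= target_words:
            break
        text = ask(
            "Expand each step of the following guide with detailed substeps, "
            "preserving the original structure and meaning:\n\n" + text
        )
    return text

if __name__ == "__main__":
    book = expand_guide("how to plant and care for a persimmon tree")
    print(len(book.split()), "words generated")
```

Note that later rounds would eventually exceed the model's context window, so in practice each section would likely need to be expanded separately; whether the result is coherent or "AI slop" is exactly what the proposed master's project could examine.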
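For the transcript anonymization item above, pasting the transcript into a chatbot works, but a purely local find-and-replace is another option. Below is a minimal sketch; the attendee names are hypothetical examples and would come from the actual read.ai roster.

```python
# Hedged sketch of local transcript anonymization: replace attendee names with
# initials before posting minutes. The names below are hypothetical examples.
import re

ATTENDEES = ["Alice Example", "Bob Sample"]

def initials(full_name: str) -> str:
    """'Alice Example' -> 'A. E.'"""
    return ". ".join(part[0] for part in full_name.split()) + "."

def anonymize(transcript: str, names=ATTENDEES) -> str:
    for name in names:
        short = initials(name)
        # Replace the full name first, then any bare first-name mentions.
        transcript = re.sub(re.escape(name), short, transcript, flags=re.IGNORECASE)
        transcript = re.sub(r"\b" + re.escape(name.split()[0]) + r"\b",
                            short, transcript, flags=re.IGNORECASE)
    return transcript

print(anonymize("Alice asked Bob Sample to send the transcript."))
# -> "A. E. asked B. S. to send the transcript."
```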
Transcript:
Fri, Nov 15, 2024 1:35 - Unidentified Speaker Well, R. M. gave an excellent talk yesterday at the Baptist health college to the amateur radio emergency services group. 1:41 - V. W. And I credit the fact that she did so well with the dress rehearsal that you guys provided her here in the ML seminar. So thank you for that. And, uh, I wrote a recap of it that I'll send you. If you'd like to see it, it's a couple of pages. 1:57 - D. B. Well, congratulations to her. Well, out. 1:59 - V. W. There was enough technical discussion at the end from all the radio experts at the meeting that there was the just the proper amount of blood on the floor of, you know, defending dissertation proposals and defenses that will come down the road, but done in a very, say, home field advantage situation. 2:19 - Unidentified Speaker So, that's good. 2:39 - M. M. Are they in the distance? 3:46 - D. B. Well, I guess we should get started, even though some more people are signing in. 4:08 - D. B. Let me share my screen. 4:16 - Unidentified Speaker Wrong screen. 4:34 - Unidentified Speaker So. Let's see where we are. 4:38 - D. B. So last time I. T. told us about this committee she's on as a representative of our STEM college, and I thought, you know, anytime she's here, I'll ask you for an update, but she's not here, so we won't get any update today. We're hoping for a demo of real-time use of AI to do things like creating the Doppler effect interactive animation we saw last time. Our next week is the target for to have some kind of little short presentation put together. Okay. 5:23 - Unidentified Speaker So next. Dr. 5:29 - V. W. B., I sent you the revised transcript. 5:43 - D. D. Oh, great. 5:48 - D. B. I'm going to go ahead and let the read.ai join the meetings as long as somebody will send me the anonymized transcripts. And by anonymized, I mean replace names with initials or something like that. And the easy way to do that is to just paste the transcript into ChatGBT and say, please replace names with initials, something like that. 6:37 - D. D. It's semi-easy. I had to keep telling it to go past the token limit. Use the token limit twice. Huh. 6:46 - D. B. Try Claude.ai because Claude.ai at least used to have a long, you know, that was its sort of virtue was that it would. 6:57 - D. D. Let me see how I tried first though. 7:01 - D. B. Okay. You can also go to meta.ai. 7:04 - D. D. It allows you to use the Clima 3 405b which has 128 I tried Gemini and it just over and over again would just stop in the middle and say, and the text continues on. Even Gemini Advanced is trash compared to these other LLMs. 7:24 - Multiple Speakers And I may just not subscribe to it anymore because I'm not sure it's worth it. 7:30 - V. W. I get from a 20 bucks a month in Poe.com, I get access to all these bots, including the ones mentioned here today, and I don't And so I'm not sure Google Gemini advanced is worth the money. 7:45 - J. K. What are you using? The only. 7:47 - V. W. Yeah, podoc podoc com right now is the deal on wheels. 7:52 - E. G. Because I sign up for each one individually, like for chat GPT. 7:57 - V. W. Well, that'd be so expensive. 7:59 - J. K. I'd be out of money. The only real like innovative edge that I can see to Gemini is someone's approached me recently about uh, auto transcription, like they'll upload a video and that, that aspect of Gemini is really impressive. That's true. Yeah. If you have to do anything with video, uh, Gemini, Gemini is the one to go to. 
But other, other than that, I, I agree with you, Vince. 8:25 - V. W. You know, also I thought that the fact that Gemini has up to the date web, it's more continuous training than the others, which are lump sum training. So there's an advantage to, Gemini if you need current information? 8:39 - D. D. Well, the other aspect is the API. I don't think you can get access to the API through Poe. I think you just get the chat program. But if you really want to get in there where you can access it with Python, I think there's other ways to access it. And I think currently you have access to the system prompt now in chat, but the API is such a powerful tool. I mean, I want to- It also depends on what kind of working, like for you and E., that API access might be essential. 9:20 - V. W. Yes. But for people who are programming using the LLM like myself or Read, we can come into the LLM and just get work done all day long. Yeah, I've seen that. 9:32 - D. D. tokens and permissions and all that. 9:34 - V. W. Yeah. It's amazing what you're doing. Yeah. 9:37 - D. B. Well, um, D., if, if it's not, if it doesn't want to anonymize the entire transcript, you can anonymize it, you know, in two parts or three parts or something. That's what I'm doing. That's right. 9:49 - D. D. That's, that's, that's my next plan, but I got one, I got this, this week's done and I'll just keep, I'll just keep doing it. 9:58 - D. B. Okay. Happy to contribute. Well, I appreciate it. Read.AI is involved in this meeting, and I'll just keep it going as long as we get the transcripts that I can post. What else? Okay, well, there's a... E.? 10:16 - E. G. Yeah, I sent you a message. Is it po.com? 10:20 - Multiple Speakers Poe.com, like Edgar Allen. 10:22 - E. G. Ah. Okay. All right, so the library is providing a tool Primo research assistant, anyone? 10:30 - D. B. Well, I'll let you read all this and then we'll see if there's any thoughts about it. 10:41 - V. W. And the reason for that response is Read just finished a 12 section broad literature survey on 10 specific aspects of her dissertation. And it's just the results are unbelievable. I've even cataloged the statistics on the use of them. And so, yeah, I would say the library should save its money because everybody can access the stuff nearly for free. So Primo research assistant is not good, is what you're saying? I'm saying it may represent overhead for the library that doesn't benefit the institution, although perhaps for people who are in a loop of already going to the library first for all their literature searches, maybe that would be a good access for them. But it seems redundant to me, given the network, the internet, and the personal computer. 11:41 - D. D. I mean, Google is still a very powerful search engine, and it will return an adequate number of results for, especially Google, Yeah. I mean, I get a lot from Google Scholar, but I haven't tried it. It might be a really good tool. I think we should test it. Yeah. 12:01 - A. B. And I know too, like I've gone to the library for certain articles. Cause like, um, if it's, you know, something like a certain, certain, you know, you need a, um, what do you call it? Like a, a lot, a login or something, institutional login. Like sometimes those are hidden behind some of these, uh, like Abisko, like the, I can't remember like the databases and so forth. I don't know. If they use Preem Research, if it would connect through the library that way and give you access to more data. 12:30 - Multiple Speakers That's true, too. 12:31 - D. 
B. Well, should we try this thing in one of our meetings, like in real time? 12:37 - D. D. Yeah, that's a good idea. Yeah, I would vote for that. Any other thoughts on that? 12:44 - A. B. I give it a 3.9. 12:46 - Unidentified Speaker OK. Yes, I agree. 12:47 - M. M. It's a good idea. Library also you can ask for loan or copy. They have in the webpage actually the form how you can request the article that you cannot access. But let's try this. Yeah. All right. Well, I'll schedule it soon, maybe even next time or the time after. 13:13 - D. B. As soon as I can schedule it. We'll do that. OK. There was an event. I guess it was yesterday. Today is the 15th. This was the event. Anyone go or? Any thoughts on it? I did not go. Here's the blurb about it. 13:47 - M. M. Can you tell us about this event, please? 13:51 - D. B. Yeah, I don't know anything about it except that I was wondering if anybody went to it and could tell us. No. 14:02 - M. M. I'm reading the description right now. Who is talking? It was the speaker over there. 14:12 - M. M. So many experts in the area. 14:20 - Unidentified Speaker It's good. 14:22 - D. B. My impression of this is, this is sort of the stuff we've been talking for a couple of years now already, and I'm not sure. You have to look at the content, have to know what the content is to know whether it was anything, they presented anything that I, you know, that I didn't already know. 14:56 - V. W. An issue that comes up for us is, does the part of the university that makes sure that people with disabilities have the correct accommodations, are they enabling the use of AI as accommodations for the neurodiverse, but then the neurotypical could come along and say, well, they're getting an unfair advantage over me by the ability to freely use AI without censorship. And then it goes to court and goes way up and it's ruled, okay, everybody can use it, leave us alone. So, you know, you can run the whole thing out in your head and see where it's going to go. 15:32 - D. B. It's a pretty strong claim that neurodiverse students are using it more. If it's true, I wonder what's going on. 15:41 - J. K. I think it's an excellent learning tool. 15:45 - V. W. I mentor a neurodiverse student and I have firsthand witnessed the fact that it levels the playing field for rapid learning. 15:58 - J. K. Yeah. I attended one of the university disability resource center presentations on just technology accessibility tech and I asked in that meeting how they view chat GPT and generative AI and they were a little skeptical like I think I think they're still trying to kind of like V. mentioned like navigate out to recommend these tools I did I sent them a prompt that is meant for students with dyslexia where basically I just I trained the AI to be aware of the rules that make words difficult to read for people with dyslexia, and then the user can upload a body of text that's standard, and according to those rules, it will retain the meaning but completely change the wording to avoid those challenges. So they have that, but I know from being in that talk that they're still just kind of feeling out exactly how to endorse generative AI for students with learning disabilities and stuff like that. That is fantastic, J., that you wrote a tool that specifically does that. 17:14 - Multiple Speakers And offline, I'd be interested in learning the tricks that you use to create such a thing. 17:21 - V. W. 
That's a novel thing that should be just living on a server and made available to any Euler student or anywhere. 17:29 - J. K. Yeah. Well, it's an application of the tool that plays to all its strengths. I mean, there's no reason, again, it's context awareness, make sure that no matter what the verbiage or phrasing or sentence length is, that a student can understand those things. And I'd even included small rules to say, like, if the user indicates that certain words or certain phrases are difficult, that it would accommodate those, just because, I mean, there's a pretty broad spectrum of dyslexia, but I will work to send that to you guys later today. It was a really fun project, and I think it's one that has a lot of merit just for people who need it. 18:14 - V. W. And you know, not only for text output, but when the output can be spoken back to the student or read back aloud instead of requiring the student to necessarily read it themselves, that can also circumvent certain disabilities that people carry. And so the fact that you can give your questions and receive your questions as spoken words, as opposed to text is a advantage for the neurodiverse. 18:41 - J. K. Yeah. I also think, I mean, I had a discussion not too long ago about how large language models will affect language acquisition in general. Um, and I was, I was focused on kids and the, the AI, just based on the tools we were discussing, seemed to think that it could cut down the time it takes to acquire language by, I think it was 40% or something, something ridiculous like that. Which if you think about it, I mean, it's just capable of rewording or making minute adjustments based on the learner. And so we're just going to get to the point where for most of these basic skills, an AI is so much more capable of speeding up learning and really helping people close gaps. It's going to be astounding, I think. 19:41 - Unidentified Speaker Correct. 19:41 - Multiple Speakers I'm using Grammarly turned on. Let me just have one second, V. 19:48 - M. M. And J., also for people for second language learning. 19:53 - J. K. Yes, absolutely. 19:54 - M. M. And we have so many requests for kids learning foreign language or any. 20:01 - Multiple Speakers My daughter actually is working in this area and I have a lot of stuff in learning foreign languages. Yeah, absolutely. 20:11 - J. K. We talk a lot about AI as an equalizer or just as a remover of barriers. And I think there's so many brilliant minds that just because English is the de facto for a lot of academia and stuff like that, I mean, it's going to really change the landscape of how people can collaborate regardless of what language they speak or what level they speak at. 20:41 - V. W. It'll democratize academic participation to a larger cross-section of the world. 20:45 - J. K. Absolutely, yeah. 20:46 - V. W. You know, I have wondered to myself whether in 10 years, of AI, I will be a babbling caveman saying, me want milk, or I would be more erudite and better able to express myself more fluently. I have wondered what impact it will have on me. But I know with Grammarly, over the last couple of years of using it to just correct typos, rewrite little idioms that aren't very portable, it's really improved my writing to the point that I make those mistakes, unquote, of my own spoken voice less frequently while still retaining the spoken voice of me. And so I'm thinking that this tool is going to elevate us and not de-evolve us or primitivize us. 21:37 - J. K. 
The other thing I would mention about just interlanguage conversation or communication is that Google Translate has always been a thing. Or this hasn't always been a thing, but it's predated a lot of this tech. And I've found, again, by using chain prompting and multi-agent, if you need something translated into a target language, it helps to have a two-step process where you provide the piece of text and say, please, while retaining the meaning of this, please remove any language that would make it difficult to translate into Spanish, Chinese, like, like, it's, that's, that's one of the things that it excels at is just ironing out anything free translation, getting rid of any, anything that doesn't translate well. 22:31 - D. B. So that's, that's been very successful for me. Any, did you have something that you wanted to mention? Somebody did. 22:45 - D. B. I thought that was V. 22:47 - E. G. Well, she's not here. 22:48 - V. W. Her mic may be off. No, she's not on mute. 22:53 - M. M. No, I'm here. I'm here. Just the camera is not working here. I don't know why. Oh. Yeah, I like mentioned the foreign languages. 23:03 - V. W. Yes, it's exactly what we're talking about. 23:05 - D. B. That's fantastic. I've asked ChatGPT to talk to me, but in its response, replaced the most common with the translations in another language, Spanish or something. And if with a little pushing, it will do it. It doesn't do a great job, but it will do it. And you can, you know, you sort of like can kind of tune the amount of translation to, you know, to what you're capable of comprehending without turning it and turning your light reading into a heavy duty study session. So he reads and studies it. 23:39 - V. W. and perfect Spanglish. Yeah. 23:42 - E. G. Um, I'll speaking as a neurodivergent person who's enjoyed the benefits of it with apparently very little of the drawback. One of the best ways of analyzing chat GPT from a neurodivergent perspective is we look at patterns, patterns that work patterns that don't. We don't regurgitate information well. We have to understand what's going on. So yeah, we're a lagging adopter of chat GPT. The use of chat GPT or some other large language models, does that make us better? Truthfully, no. I think a lot of times coming into it, understanding the foundational pieces, Because as we look at things, because when we look at things, you're able to look at one space, at a two-dimensional model. We're looking at the underlying piece. I can't understand something unless I know the underlying concepts involved. So when I look at something like chat GPT, I didn't use it until maybe a year, year and a half ago. Because I had to spend time learning all of the pieces that went into it to really understand how to approach it. 25:10 - J.K. I would, I mean, I, I'm also neurodivergent and I use ChatGPT a lot. And I would just say, I mean, there's this universal idea that if you design for, um, the outliers, if you design, design anything, whether it's just physical access, accessibility or digital accessibility, it's the best design out there. I mean, I would say neurodivergent or neurotypical, we all have, we can only process so much information. And so I would just say, I believe this statement that neurodiverse students are using it at a higher rate just because we have to. We have to be tool learners. But I would just add that we're ahead of the curve in a lot of ways. 
I don't think there is an advantage for neurodivergent students that doesn't exist for neurotypical people at the end of the day. Now, don't get me wrong. 26:13 - E.G. Like V., Grammarly has been my savior because as I go to present information, I tend to be very succinct. 26:25 - Unidentified Speaker And I will assume a lot of different pieces are in place, but grammarly actually forces me into describing. 26:32 - E.G. So that way they follow the continuity of my thought because the way I present it is we go from A to B, B to C, C to D. A lot of times I'll go from A to D and skip the B and C because that's not part of the equation that I'm trying to I'm trying to get us here. But I need to sometimes draw a map kind of like the old days of MapQuest. Take this turn at this point, printed out all I had. 27:08 - V.W. That's what Grammarly does for me, right? 27:11 - D.B. Alright, well. Let's see where we are. 27:14 - Unidentified Speaker I see I.T. welcome. I thought if you had any updates on the campus effort. 27:21 - D.B. that you introduced us to last time. I think we'd be glad to hear about it. 27:35 - Unidentified Speaker He's still muted, you're still muted. 27:41 - D.B. I don't know, maybe she's not even really there. Anyway, what I thought, you know, with something like this, as long as somebody involved is here, they could keep us updated. I have a question for you all. Supposing a generative AI, let's say ChatGPT, was used to write a book in the following way. You ask it a simple question, how to plant and care for a persimmon tree, how to check the oil in your car, and please give step-by-step instructions. Well, it'll do it, but it'll give you 10 steps to scrambling an egg or checking the oil in your car or planting a persimmon tree. And then you can just ask it to expand on each step with sub-steps, and it'll do that. And you could then ask it for each of these sub-steps, can you expand it in much more detail, look at all those alternatives and all the possibilities and it'll do it. And you could, I mean, what would happen if you continued until you got to a hundred thousand words? 28:53 - Multiple Speakers There's actually, there's a term for that now that's emerging. And the term emergent term is AI slop. 28:58 - V.W. Now that said on the extreme side, if you naively go in and ask ChatGPT one zero, how to check and change the oil in your car, you're going to get 10 steps that if you follow them, your oil will be changed and you probably wouldn't have done any lab casting damage to your car, the threads on the filter or anything like that. So I would say that that would be how I would delimit it. 29:23 - D.B. So my question is, what would happen? Would it be a result that would be useful or would something go wrong? And if it went wrong, what would go wrong? Anyway, so my thought was, would it be reasonable to have a master's student do that as a master's project what would happen. And in the process... And the title could be on the creation of AI slop in mass. 29:48 - V.W. I mean, the question is, with proper prompting, would it come up with something good or not? 29:54 - D.B. I don't know. Well, then you have the thing that just how many steps do you need? 30:00 - V.W. Because someone who's a car mechanic type is not going to need as lucid or detailed an explanation as someone who's never changed the oil on anything before. So you have a point in that you need more sub steps as you need as your familiarity with the landscape of that is decreased. 
So there's a contextualization for any user doing any task from spinning CD wafers to changing the oil on their car. There's a set of steps that you can enumerate that a person who simply can follow the instructions could accomplish it. All we're negotiating is the number of things. And so if anybody tries to work out of their domain, we all have our domains at which we're facile. But if trying to work out of domain, we need instructions. Example, the printer on my, I have this Epson printer that has endless EcoTank ink. I never have to buy ink cartridges again. And it's been the best thing since sliced bread until after a year of faithful service, the printhead quit working. And so I chatted and said, oh, you don't have to buy a new printer. You just have to dive deep into in here and change this one thing. And then I did that and I got the answer. And with with the assistance also of a Jeep, of a YouTube video. So even I, although I'm comfortable with mechanical things, I didn't know if you disassemble the printer in the wrong order, not only will you not get it back together, but you'll probably break something expensive in the process. So so I see this sort of like, are we in band or out of band for the task being specified? 31:33 - A.B. But again, these things are more, I don't know, I think, I feel like often they're more optimized to give you an answer that sounds right versus is right. And like, you know, if you go back and forth with them, you can convince them that it could give you a right answer. And I feel like with enough prompting, you can kind of convince it that it's, you know, it's, it's, you can say, Oh, that's wrong. And then eventually you can kind of like, you know, arm wrestle it into giving, giving the answer that you want. 31:58 - V.W. That comes up a lot when you're trying to write a program with specific objectives and you ask it to do it and it simply stops complying because about 30 shots in, it loses context and can no longer remember what it was doing like a person with a senility or dementia. So you have to kind of coax it along. So sometimes you have to take the entire previous transcript, load it into another LLM and get it to pick up where you left off or figure out how to factor your task. But if your interfaces are complex in a programming that it was doing for you, you have to manually re-articulate all those interfaces and the leverage that you were getting from the AI disappears. So for simple tasks, scrambling an egg, persimmon tree and changing oil, it works, but asking it to write a complex program that we'll talk about next week, hopefully, there can be definite gotchas because we have these zero and three shot successes that R. benefited from And then we also have these 30 shot epic failures that never gave us the answer. 32:56 - J.K. we wanted so I've actually experimented with kind of composition on longer form longer form writings and most people I mean you you'll see a lot of AI slop in the Kindle market these days and you can read it and pretty quickly identify what is what is strictly AI generated my personal theory is is that so whenever I mean, whenever an author is writing a piece of fiction, like, we don't really talk about it very much. But you're kind of using different mental processes to write dialogue than you use to write internal dialogue or description. And so to, to get a quality human sounding piece of prose, I've found instead of focusing on chapter by chapter, you do scene by scene, and you generate one layer at a time. 
I mean, we come back to chain prompting and decomposing tasks and things like that. But I like to have it generate dialogue between two characters, then the internal dialogue as they're saying those things, and then finally the exposition or narration. And when you layer it like that, you're, again, replicating the fact that a human author is having to shift gears in order to write those different components. So I think what you're describing where it's like, again, you're, you're gradually expanding or adding complexity to this writing, it's entirely possible, it's just kind of counterintuitive, because you have to go in and approach it and say, how are each little sub? How is each sub category of this content? Different and how are the rules, how do you apply the rules so it comes together as something that reads naturally. 34:54 - V.W. And one problem with that is it knows those rules if you bother to specify them in the prompt, but then you run out of your token or GPU allocation that's going to allow it to retain that biggest state in memory from which it could write the correct solution to your problem with its existing training. A lot of times you're hitting up against a gas pump limit instead the limit of the training of the model. 35:18 - J.K. Well, and that's the beauty of going scene by scene, too, is just that you're never, if you optimize for the smallest token window or whatever is the smallest subcomponent of whatever you're trying to write, you're good. You're golden. It's going to be pristine, high quality stuff every time. It's just a little more complicated. I mean, we want it to be a shortcut where we put in a prompt and it spits out a book. But the fact is just that unless you optimize the process and say, I want the highest quality scene and I'm going to generate a dozen scenes for a chapter, you're going to get this very vanilla, odd-sounding fiction, unfortunately. 36:06 - V.W. It reminds me a little bit of digital image compositing or even audio compositing where with your notion of layers you have the foreground you know the two actors the over the shoulders camera shots and just the close-up of their conversation you also have the background wherever they are in location and then you have the middle ground of the extras and so forth that are circulating to make the scene appear real I think the layering is a really good idea that not only applies to creation of novel content as in not new novels but novel novels. And yeah, that's all. 36:43 - Multiple Speakers I mean, well, anybody else who hasn't had a chance to contribute to this question? 36:48 - D.B. Have anything they want to add? All right, well, I could look for I mean, I sure I could find a master's student who's willing to do it. If I did find a master's student like that, and they were willing to come to these meetings and provide us weekly updates and take our suggestions, Should I do it? 37:11 - A.B. Yeah, I think it's really interesting. 37:14 - V.W. Yeah, it is. It'd be good for them too because they could distinguish slop from craft. 37:22 - Unidentified Speaker Okay. 37:22 - D.B. Any other comments on that? I'll put a big yes if not. All right. Well, I'll see if I can find somebody and then they'll have to be in to make these meetings or they won't be a suitable candidate for the project. That's their main qualification is they must be free on Fridays at 4 p.m. 37:47 - V.W. Well, it could be seminar credit too, right? 37:50 - D.B. 
No, the seminar is a different time actually, which is good because then they'd be more likely to be free at 4. 37:58 - V.W. But I mean, this ML seminar has bordered on something that I have invited students of mine to come to and participate participate in, I think that this is becoming, and it has moved from the informal seminar to actually a thing that we look forward to each week of being able to dive deep into the current milieu of AI. 38:19 - Multiple Speakers I think attending these meetings could be done for some kind of credit. 38:23 - D.B. I think it will be possible. I haven't looked into it. It deserves that, I think. All right. Let's see. What else? 38:32 - Unidentified Speaker All right, well, we got about 10 minutes left. 38:40 - D.B. I could start, we could go back and review that video. Let me do that. 38:53 - Unidentified Speaker All right, I probably need to and re-share with optimizing for video. 39:05 - D.B. So let me see if I can do that. 39:14 - Unidentified Speaker OK. Share. Optimize for video. Sharing. 39:20 - D.B. Okay. Let me start this thing going. 39:25 - Unidentified Speaker The initials GPT stand for Generative Pre-trained Transformer. Is that coming through okay? Yes. 39:35 - D.B. Yes. All right. So that first word is straightforward enough. These are bots that generate new text. 39:46 - Unidentified Speaker Pre-trained refers to how the model went through a process of learning from a massive amount of data, and the prefix insinuates that there's more room to fine-tune it on specific tasks with additional training. But the last word, that's the real key piece. A transformer is a specific kind of neural network, a machine learning model, and it's the core invention underlying the current boom in AI. What I want to do with this video and the following chapters is go through a visually driven explanation for what actually happens inside a transformer, we're going to follow the data that flows through it and go step by step. There are many different kinds of models that you can build using transformers. Some models take in audio and produce a transcript. This sentence comes from a model going the other way around, producing synthetic speech just from text. All those tools that took the world by storm in 2022 like Dolly and Midjourney that take in a text description and produce an image are based on transformers. Even if I can't quite to understand what a pie creature is supposed to be, I'm still blown away that this kind of thing is even remotely possible. And the original transformer, introduced in 2017 by Google, was invented for the specific use case of translating text from one language into another. But the variant that you and I will focus on, which is the type that underlies tools like ChatGPT, will be a model that's trained to take in a piece of text, maybe even with some surrounding images or sound accompanying it, and produce a prediction for what comes next in the passage. That prediction takes the form of a probability distribution over many different chunks of text that might follow. At first glance, you might think that predicting the next word feels like a very different goal from generating new text. 
But once you have a prediction model like this, a simple thing you could try to make it generate a longer piece of text is to give it an initial snippet to work with, have it take a random sample from the distribution it just generated, append that sample to the text, and then run the whole process again to make a new prediction based on all the new text, including what it just added. I don't know about you, but it really doesn't feel like this should actually work. In this animation, for example, I'm running GPT-2 on my laptop and having it repeatedly predict and sample the next chunk of text to generate a story based on the seed text. And the story just doesn't actually really make that much sense. But if I swap it out for API calls to GPT-3 instead, which is the same basic model just much bigger, suddenly, almost magically, we do get a sensible story, one that even seems to infer that a pi creature would live in a land of math and computation. This process here of repeated prediction and sampling is essentially what's happening when you interact with ChatGPT or any of these other large language models, and you see them producing one word at a time. In fact, one feature that I would very much enjoy is the ability to see the underlying distribution for each word that it chooses. Let's kick things off with a very high-level preview of how data flows through a transformer. We will spend much more time motivating and interpreting and expanding on the details of each step, but in broad strokes. When one of these chatbots generates a given word, here's what's going on under the hood. First, the input is broken up into a bunch of little pieces. These pieces are called tokens, and in the case of text, these tend to be word or little pieces of words or other common character combinations. If images or sound are involved, then tokens could be little patches of that image or little chunks of that sound. Each one of these tokens is then associated with a vector, meaning some list of numbers, which is meant to somehow encode the meaning of that piece. If you think of these vectors as giving coordinates in some very high-dimensional space, words with similar meanings tend to land on vectors that are close to each other in that space. This sequence of vectors then passes through an operation that's known as an attention block, and this allows the vectors to talk to each other and pass information back and forth to update their values. For example, the meaning of the word model in the phrase a machine learning model is different from its meaning in the phrase a fashion model. The attention block is what's responsible for figuring out which words in the context are relevant to updating the meanings of which other words, and how exactly those meanings should be updated. And again, whenever I use the word meaning, this is somehow entirely encoded in the entries of those vectors. After that, these vectors pass through a different kind of operation and, depending on the source that you're reading, this will be referred to as a multi-layer perceptron or maybe a feed-forward layer, and here the vectors don't talk to each other, they all go through the same operation in parallel. And while this block is a little bit harder to interpret, later on we'll talk about how this step is a little bit like asking a long list of questions about each vector and then updating them based on the answers to those questions. 
All of the operations in both of these blocks look like a giant pile of matrix multiplications, and our primary job is going to be to understand how to read the underlying matrices. I'm glossing over some details about some normalization steps that happen in between, but this is after all a high-level preview. After that, the process essentially repeats. You go back and forth between attention blocks and multi-layered perceptron blocks, until at the very end, the hope is that all of the essential meaning of the passage has somehow been baked into the very last vector in the sequence. We then perform a certain operation on that last vector that produces a probability distribution over all possible tokens, all possible little chunks of text, that might come next. And like I said, once you have a tool that predicts what comes next given a snippet of text, you can feed it a little bit of seed text and have it repeatedly play this game of predicting what comes next. Sampling from the distribution, appending it, and then repeating over and over. Some of you in the know may remember how long before ChatGPT came into the scene, this is what early demos of GPT-3 looked like, you would have it autocomplete stories and essays based on an initial snippet. To make a tool like this into a chatbot, the easiest starting point is to have a little bit of text that establishes the setting of a user interacting with a helpful AI assistant, what you would call the system prompt, and then you would use the user's initial question or prompt as the first bit of dialogue, and then you have it start predicting what such a helpful AI assistant would say in response. There is more to say about an added step of training that's required to make this work well, but at a high level, this is the general idea. In this chapter, you and I are going to expand on the details of what happens at the very beginning of the network, at the very end of the network, and I also want to spend a lot of time reviewing some important bits of background knowledge, things that would have been second nature to any machine learning engineer by the time transformers came around. If you're comfortable with that background knowledge and a little impatient, you can probably feel free to skip to the next chapter, which is going to focus on the attention blocks, generally considered the heart of the transformer. After that, I want to talk more about these multi-layer perceptron blocks, how training works, and a number of other details that will have been skipped up to that point. For broader context, these videos are additions to a mini-series about deep learning. And it's okay if you haven't watched the previous ones, I think you can do it out of order. But before diving into transformers specifically, I do think it's worth making sure that we're on the same page about the basic premise and structure of deep learning. At the risk of stating the obvious, this is one approach to machine learning, which describes any model where you're using data to somehow determine how a model behaves. What I mean by that is, let's say you want a function that takes in an image and it produces a label describing or our example of predicting the next word given a passage of text, or any other task that seems to require some element of intuition and pattern recognition. 
We almost take this for granted these days, but the idea with machine learning is that rather than trying to explicitly define a procedure for how to do that task in code, which is what people would have done in the earliest days of AI, instead you set up a very flexible structure with tunable parameters, like a bunch of knobs and dials, and then somehow you use many examples of what the output should look like for a given input to tweak and tune the values of those parameters to mimic this behavior. For example, maybe the simplest form of machine learning is linear regression, where your inputs and your outputs are each single numbers, something like the square footage of a house and its price. And what you want is to find a line of best fit through this data, you know, to predict future house prices. That line is described by two continuous parameters, say the slope and the y-intercept, and the goal of linear regression is to determine those parameters to closely match the data. Needless to say, deep learning models get much more complicated. GPT-3, for example, has not two, but 175 billion parameters. But here's the thing, it's not a given that you can create some giant model with a huge number of parameters without it either grossly overfitting the training data, or being completely intractable to train. Deep learning describes a class of models that in the last couple decades have proven to scale remarkably well. What unifies them is that they all use the same training algorithm. It's called backpropagation. We talked about it in previous chapters. And the context that I want you to have as we go in is that in order for this training algorithm to work well at scale, these models have to follow a certain specific format. And if you know this format going in, it helps to explain many of the choices for how a transformer process is which otherwise run the risk of feeling kind of arbitrary. First, whatever kind of model you're making, the input has to be formatted as an array of real numbers. This could simply mean a list of numbers, it could be a two-dimensional array, or very often you deal with higher dimensional arrays, where the general term used is tensor. You often think of that input data as being progressively transformed into many distinct layers, where again, each layer is always structured as some kind of array of real numbers until you get to a final layer which you consider the output. For example the final layer in our text processing model is a list of numbers representing the probability distribution for all possible next tokens. In deep learning, these model parameters are almost always referred to as weights and this is because a key feature of these models is that the only way these parameters interact with the data being processed is through weighted sums. You also sprinkle some nonlinear functions throughout but they won't depend on parameters. Typically though, instead of seeing the weighted sums all naked and written out explicitly like this, you'll instead find them packaged together as various components in a matrix-vector product. It amounts to saying the same thing if you think back to how matrix-vector multiplication works, each component in the output looks like a weighted sum. It's just often conceptually cleaner for you and me to think about matrices that are filled with tunable parameters that transform the vectors that are drawn from the data being processed. 
For example, those 175 billion weights in GPT-3 are organized into just under 28,000 distinct matrices. Those matrices in turn fall into eight different categories, and what you and I are going to do is step through each one of those categories to understand what that type does. As we go through, I think it's kind of fun to reference the specific numbers from GPT-3 to count up exactly where those 175 billion come from Even if nowadays there are bigger and better models, this one has a certain charm as the first large-language model to really capture the world's attention outside of ML communities. Also, practically speaking, companies tend to keep much tighter lips around the specific numbers for more modern networks. I just want to set the scene going in that as you peek under the hood to see what happens inside a tool like ChatGPT, almost all of the actual computation looks like matrix-vector multiplication. There's a bit of a risk of getting lost in the sea of billions of numbers, but you should draw a very sharp distinction in your mind between the weights of the model, which I'll always color in blue or red, and the data being processed, which I'll always color in gray. The weights are the actual brains. They are the things learned during training, and they determine how it behaves. The data being processed simply encodes whatever specific input is fed into the model for a given run, like an example snippet of text. With all of that as foundation, let's dig into the first step of this text processing example, which is to break up the input into little chunks and turn those chunks into vectors. I mentioned how those chunks are called tokens, which might be pieces of words or punctuation, but every now and then in this chapter, and especially in the next one, I'd like to just pretend that it's broken more cleanly into words. Because we humans think in words, this'll just make it much easier to reference little examples and clear verify each step. The model has a predefined vocabulary, some list of all possible words, say 50,000 of them, and the first matrix that we'll encounter, known as the embedding matrix, has a single column for each one of these words. These columns are what determines what vector each word turns into in that first step. We label it WE, and like all the matrices we see, its values begin random, but they're going to be learned based on data. Turning words into vectors was common practice in machine learning long before transformers, but it's a little weird if you've never seen it before, and it sets the foundation for everything that follows, so let's take a moment to get familiar with it. We often call this embedding a word, which invites you to think of these vectors very geometrically, as points in some high-dimensional space. Visualizing a list of three numbers as coordinates for points in 3D space would be no problem, but word embeddings tend to be much, much higher dimensional. In GPT-3, they have 12,288 dimensions. And as you'll see, it matters to work in a space that has a lot of distinct directions. In the same way that you could take a two-dimensional slice through a 3D space and project all the points onto that slice, for the sake of animating word embeddings that a simple model is giving me, I'm going to do an analogous thing by choosing a three-dimensional slice through this very high-dimensional space and project the word vectors down onto that and displaying the results. 
The big idea here is that as a model tweaks and tunes its weights to determine how exactly words get embedded as vectors during training, it tends to settle on a set of embeddings where directions in the space have a kind of semantic meaning. For the simple word-to-vector model I'm running here, if I run a search for all the words whose embeddings are closest to that of tower, you'll notice how they all seem to give very similar tower-ish vibes. And if you want to pull up some Python and play along at home, this is the specific model that I'm using to make the animations. It's not a transformer, but it's enough to illustrate the idea that directions in the space can carry semantic meaning. A very classic example of this is how if you take the difference between the vectors for woman and man, something you would visualize as a little vector in the space connecting the tip of one to the tip of the other, it's very similar to the difference between king and queen. So let's say you didn't know the word for a female monarch, you could find it by taking king, adding this woman minus man direction, and searching for the embeddings closest to that point. At least, kind of. Despite this being a classic example for the model I'm playing with, the true embedding of queen is actually a little farther off than this would suggest, presumably because the way that queen is used in training data is not merely a feminine version of king. When I played around, family relations seemed to illustrate the idea much better. The point is, it looks like during training, the model found it advantageous to choose embeddings such that one direction in this space encodes gender information. All day creamy Fontina and smoky prosciutto at low prices. Alright. 55:40 - D.B. He's like a good so that was review any comments or questions about that. Alright. Well, we'll start from there next time we do this. That was a repeat, so I didn't stop it in the middle. Any last points before we adjourn? 56:07 - D.D. It never gets less cool. 56:10 - D.B. Okay. Good. I'm glad it was the truth. 56:14 - D.D. That is the truth. 56:16 - D.B. Okay. Have a good afternoon. Alright. Bye everyone. 56:20 - D.D. Bye guys. Thanks for coming.