Artificial Intelligence Study Group
- Announcements, updates, questions, etc. as time allows.
- AS: has MS students doing AI projects.
- What could be in a rubric for MS projects in AI? (JH)
- Does it have a problem statement?
- Needs background discussion about the problem (inc. related work cited)
- Is there experimentation?
- Is there evaluation of the experimentation?
- Claude.ai says:
- | Criterion | What it covers | Weight |
  |---|---|---|
  | Literature Review | Survey depth, critical analysis, research gaps | 15-20% |
  | Technical Implementation | AI methods, code quality, experimental design, innovation | 30-35% |
  | Results & Evaluation | Rigor, metrics, baselines, statistical analysis, limitations | 20-25% |
  | Presentation | Writing clarity, visualizations, thesis organization | 10-15% |
  | Contribution | Novelty, publication potential, real-world impact | 10-15% |
  | Ethics | Bias analysis, societal implications | 5-10% |
- If anyone has an idea for an MS project where the student reports to us for a few minutes each week for discussion and feedback - a student could likely be recruited! Let me know ....
- We discussed book projects but those aren't the only possibilities.
- VW had some specific AI-related topics that need books about them.
- Any questions you'd like to bring up for discussion, just let me know.
- GS: would like to compose some questions for discussion regarding agentic AI soon, presenting and/or guiding discussion
- Soon: YP would like to lead a discussion or presentation on expanding our discussions to group.me or a similar platform. Perhaps on 6/13/25 or TBD.
- Group website with records of meetings. It uses a blog platform.
- https://AIntwhatitusedtobe.blogspot.com
- Summer AI course:
- Interested in Machine Learning? Check Out IFSC 7399: Data Fundamentals (Summer 2025 - Second Half (07/07-08/08)).
- ST: Data Fundamentals
- CRN: 30354, Subject: IFSC, Course Number: 7399, Section: H01
- Class meets MWR 9:00am-12:00pm, EIT 220
- Instructor: Wu, Ningning
- Anyone read an article recently they can tell us about next time?
- Any other updates or announcements?
- We did the Chapter 6 video, https://www.youtube.com/watch?v=eMlx5fFNoYc, up to time 13:08. We can start there.
- Here is the latest on future readings and viewings
- Let me know of anything you'd like to have us evaluate for a fuller reading.
- https://transformer-circuits.pub/2025/attribution-graphs/biology.html
- https://arxiv.org/pdf/2001.08361. 5/30/25: eval was 4.
- We can evaluate https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10718663 for reading & discussion.
- popular-physicsprize2024-2.pdf got an evaluation of 5.0 for a detailed reading.
- https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-refusals
- https://venturebeat.com/ai/anthropic-flips-the-script-on-ai-in-education-claude-learning-mode-makes-students-do-the-thinking
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html (Biology of Large Language Models)
- We can work through chapter 7: https://www.youtube.com/watch?v=9-Jl0dxWQs8
- https://www.forbes.com/sites/robtoews/2024/12/22/10-ai-predictions-for-2025/
- Prompt engineering course:
https://apps.cognitiveclass.ai/learning/course/course-v1:IBMSkillsNetwork+AI0117EN+v1/home
- Neural Networks, Deep Learning: The basics of neural networks, and the math behind how they learn, https://www.3blue1brown.com/topics/neural-networks
- LangChain free tutorial, https://www.youtube.com/@LangChain/videos
- Chapter 6 recommends material by Andrej Karpathy, https://www.youtube.com/@AndrejKarpathy/videos for learning more.
- Chapter 6 recommends material by Chris Olah, https://www.youtube.com/results?search_query=chris+olah
- Chapter 6 recommended https://www.youtube.com/c/VCubingX for relevant material, in particular https://www.youtube.com/watch?v=1il-s4mgNdI
- Chapter 6 recommended Art of the Problem, in particular https://www.youtube.com/watch?v=OFS90-FX6pg
- LLMs and the singularity: https://philpapers.org/go.pl?id=ISHLLM&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FISHLLM.pdf (summarized at: https://poe.com/s/WuYyhuciNwlFuSR0SVEt). 6/7/24: vote was 4 3/7. We read the abstract. We could start it any time. We could even spend some time on this and some time on something else in the same meeting.
- Schedule back burner "when possible" items:
- TE is in the informal campus faculty AI discussion group. SL: "I've been asked to lead the DCSTEM College AI Ad Hoc Committee. ... We’ll discuss AI’s role in our curriculum, how to integrate AI literacy into courses, and strategies for guiding students on responsible AI use."
- Anyone read an article recently they can tell us about?
- If anyone else has a project they would like to help supervise, let me know.
- (2/14/25) An ad hoc group is forming on campus for people to discuss AI and teaching of diverse subjects by ES. It would be interesting to hear from someone in that group at some point to see what people are thinking and doing regarding AIs and their teaching activities.
- The campus has assigned a group to participate in the AAC&U AI Institute's activity "AI Pedagogy in the Curriculum." IU is on it and may be able to provide updates now and then.
AI Discussion Group
Fri, May 30, 2025
1:25 - D. B.
Well, we'll get started in another minute.
1:35 - Unidentified Speaker
Give people another moment. Hello, everybody. Hey, V., how are you?
1:46 - D. B.
Busy, I think.
1:48 - D. B.+M. M.
We're working. Maybe it's time to relax a little bit.
1:56 - D. B.
Oh, yeah. Yeah, maybe.
2:00 - M. M.
Maybe. Maybe.
2:02 - Unidentified Speaker
Maybe not.
2:04 - Y.
Maybe not.
2:06 - M. M.
So much.
2:08 - Y.
D., this is Y. Hi. A. joined. I met him last weekend, and he told me about an interesting project he's doing in his field with a master's student in AI. Welcome, A.
2:28 - D. B.+Y.
Do you want to give us a few words about it?
2:34 - D. B.
Yeah, sure. Yeah. So yeah, one of my master's students, he is working on developing an AI model to predict the performance of a bioenergy reactor.
2:47 - A. S.
So he took my course with Dr. V. this semester; it was a very useful course. And then he did a course project similar to what we are targeting right now. And yeah, he was very excited and happy, and I also want to explore that direction, because there's a huge amount of data available and very limited literature combining all this data to predict the performance. So that's one project we are working on right now, and we're looking forward to learning more and helping that student.
3:23 - D. B.
Yeah, well, you know, if a student has a presentation or even wants to rehearse a presentation, they're welcome to do it here before they do it. Oh, sure. Thank you so much.
3:34 - A. S.
Yeah. You know, anything like that.
3:36 - D. B.
you know, we could support that. All right, well, welcome, everybody, again. Let's see. So I don't really have any real announcements or anything. Anyone have any announcements or updates? So I met with one of my students earlier today. I don't know if she's going to be here. But anyway, she asked the question: what could be in a rubric for MS, master's-level, projects in AI? And I thought I'd bring that up potentially for a discussion at some point. I don't know if today's the right day. But we had that one student write a book using AI, and we had a lot of discussion about how a project like that should be arranged and designed. And that raises the general question of how you could have a rubric for evaluating AI projects for master's students. Any thoughts on that? So what does that mean? Yeah, I don't know. I mean, well, a rubric is usually like a grading rubric, where it gives you, like, a grid, right? A two-dimensional grid of brief text descriptions of how you rate a project.
5:17 - M. M.
Guiding. Guiding how to write the project, how to evaluate, how much significance, how much contribution.
5:28 - D. B.
Yeah. Might have lines for significance, some lines for technical difficulty, lines for good exposition, those different rows of the rubric, and then the columns would be novice level, expert level, and it gives criteria for each of those.
5:47 - Y.
Got it.
5:48 - M. M.
And particularly in the area of computer science or information science, it doesn't matter if the topic is different. I think computer or information science needs to be mentioned. Or do you want engineering to be included? No? Or just computer science? We can ask ChatGPT and all of this. Yeah, finally, that's exactly what came to my mind: before I start thinking, let me ask ChatGPT.
6:22 - Y.
So essentially the question is what are the dimensions, what are the kind of criteria that should be considered for a project? Is that right? Basically, how do you evaluate it?
6:37 - M. M.+Y.
Yeah, I mean, I'll share some of the things that are coming to the top of my mind, but I've really not thought about this.
6:51 - Y.
I'm just sharing whatever is coming to mind. I think, first, whether it's business or a project, it starts with a problem statement. The problem that you're solving has a lot more importance than the solution. Do you have the right problem, you know, and have you done enough analysis about the problem? The quality of the problem is something that I would definitely look at. Then the second piece is, once you have the problem, the thinking, so I could say strategy or planning, around solving that problem: that you are not just going with one dimension, but you are saying this problem can be solved in A, B, C, D, all these manners. Many people in a project just start solving it, but they don't give enough time and thought to how to solve it. So that's the second thing; label it strategy and planning.
8:12 - E. G.+Y.
And then the next thing, experimentation and evaluation, because exactly, exactly.
8:17 - V. W.
Oh, because I put a rubric in the chat. You see, rather than that, it's already done. GPT for the win.
8:28 - V. W.+Y.
Actually, it's C. for O. for the win, because I'm trying the new C. out and I really like it.
8:37 - Y.
Yeah, I like I'm looking at it from what I had at Georgia Tech.
8:43 - E. G.
I pulled up what they have: experimentation and evaluation, creativity and originality, report presentation. Yeah, I think that will come towards the end.
9:06 - Y.
But a lot of times, you know, do you have the right thing that you're solving? People miss that. But I guess I have not seen what V. has put in, though I'm pretty sure... Yes, D.'s got it on the screen already, so it's now a relic of the past, and he has presented it all. All right, I apologize that I was only able to provide my Markdown and not a beautifully formatted table.
9:40 - V. W.
And I find it a little frustrating, because I'm really used to using Markdown now, in my Python notebooks and so forth, and not everybody interprets it properly. But anyway, I think you can read from this table.
9:55 - D. B.+V. W.
I think you could probably ask Claude to give you a beautifully formatted table.
10:00 - V. W.
I did. But it didn't. This is what it gave me when I asked for a neatly formatted table.
10:07 - D. B.
It gave me the neatly formatted markdown.
10:10 - V. W.
Now I could press it and it would give me an HTML version of that, but then that would be an image and this is text. And so text is more flexible, blah, blah, blah.
10:21 - Y.
Do you have the paid version or the unpaid version of Claude?
10:25 - V. W.
Well, I use it through poe.com, so I get like a hundred bots for the same price. But it's interesting, because on this particular one it said how many points I would need to give it to get this answer. So I gave it 41,000 points to answer, out of my 4 million remaining points. And as a matter of practice, if I'm bothering to use an LLM, I always peg it at the highest number of points that it wants, because I don't want half-assed answers to my questions, or I wouldn't be doing this. So yeah.
11:03 - Y.
Got it. All right. Any other thoughts on this question?
11:07 - D. B.
All right. Well, since we've been talking about these MS projects and even doing them, I'm thinking probably I might look at recruiting another student to write a book using AI, and we'll try to run the project better next time. But more generally, if anyone here has an idea for a master's project for the student, where they'd like the students to report to this group for a few minutes each week, then we can track the project and give them feedback as a group and so on. And if you have an idea for a project like that, or you have a student doing a project like that, we'll do it. We'll do it in this group. I thought it worked out pretty well for the book project.
11:53 - V. W.
Books on the various divisions in AI and machine learning would be super useful. Image-based methods like generative adversarial networks and convolutional neural networks versus, say, the other branch of reinforcement learning, and value-based versus policy-based learning methods. So you could have a book on RL and a book on CNNs, and using AI to write those books is a really good idea, because you're probably going to get fairly up-to-date information and working examples that you can use as tutorials in your own pedagogy.
12:30 - M. M.
Agentic, I mean, yeah, agentic AI, agents, agents, agents. Agentic AI, right.
12:38 - V. W.
Yeah, I mean, yeah, agents is everywhere right now.
12:43 - M. M.+V. W.
So people love it now: how to integrate it, how to use multi-agents. And this is very, very popular now.
12:57 - R. S.
Yeah, that came out of Google IO pretty strongly with the GEMMA models.
13:06 - M. M.+V. W.
Yeah, GEMMA. Very, very, very interesting topics over there.
13:12 - D. B.
All right.
13:14 - M. M.
Yeah, I have several of my students who compare different large language models, doing comparison or sentiment analysis of audio, video, and text. A very interesting project. And this project that Dr. S. mentioned, yeah, I remember the student.
13:39 - Unidentified Speaker
We have to invite the student.
13:42 - A. S.
This is on biomaterials. Sure. Yeah. P., yes, P.
13:48 - M. M.
Very, very good student and you should invite him to present the work. It's interesting.
13:55 - A. S.
Yeah, I'll do that, yeah, sure.
13:58 - V. W.
I asked ChatGPT, I'm sorry, I asked Claude: if one was going to write a set of short primer books on AI and ML, what would be good divisions of the field that would represent all the members of the menagerie, for example, reinforcement learning, convolutional neural networks, support vector machines, autoencoders, et cetera. And here it comes. Oh, this is nice. I love it. Okay, so here we go, these are our books, coming into the chat now. Oh, it's more than a thousand characters. Oh gosh, it won't let me post it. I've got to trim it down. Hang on. Claude allows more than a thousand characters, but this Zoom application doesn't like a post that large. You know what I could do? I could just truncate it and give it in two pieces; that's probably a better way. Okay, so I'll do the first four books, and then the second four books. So it's eight books. Book one: foundations and classical methods, like linear and logistic regression, which should be covered in every course on AI, as Dr. M. so aptly does, plus decision trees, random forests, KNN, naive Bayes, supervised and unsupervised learning. Book two: deep learning fundamentals, perceptrons, CNNs, RNNs, LSTMs. Book three: generative models and unsupervised learning, clustering, K-means, GANs, variational autoencoders, diffusion models, self-supervised learning. Book four: sequential models, transformers, attention mechanisms, BERT, GPT, dot, dot, dot. Book five is a division of its own: reinforcement learning, Q-learning, policy gradients, actor-critic methods, exploration strategies, multi-armed bandits. I'm not really familiar with those. Then probabilistic models and inference.
16:04 - E. G.
Does anybody know what a multi-armed bandit is? Yes, it's a selection mechanism where you have, say, five different things, but you set a cutoff percentage for which ones you're going to keep using. In a multi-armed bandit, like an A/B test, you're doing the same thing, except instead of two you're testing five or six, but you drop them off as you run through, so some no longer get seen, and you run with the ones that meet a certain criterion.
16:41 - V. W.
Does it relate at all to large ensembles of weak learners? Does it have any metaphorical connection to that?
16:50 - E. G.
No, this is more like an A/B test where you're testing different approaches. Like if you have five different screens that you're testing to see which one appeals, or five different marketing campaigns, right? It's the same type of thing: which one has the most appeal. So you've got two that have no appeal.
17:13 - E. G.+V. W.
You drop those right away or there's a clear front runner.
17:18 - E. G.
So it's a runoff kind of mechanism.
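The runoff mechanism E.G. describes, run every variant, drop the ones that clearly lose, and repeat, can be sketched as a successive-elimination bandit. This is an illustrative sketch only; the conversion rates, phase length, threshold, and function name are all made up for the example:

```python
import random

def successive_elimination(true_rates, rounds_per_phase=2000, keep_threshold=0.5, seed=0):
    """Run all surviving arms each phase, then drop arms whose observed
    success rate falls below keep_threshold * the best observed rate,
    i.e. the 'runoff' described in the discussion."""
    rng = random.Random(seed)
    arms = list(range(len(true_rates)))
    while len(arms) > 1:
        successes = {a: 0 for a in arms}
        for a in arms:
            for _ in range(rounds_per_phase):
                if rng.random() < true_rates[a]:
                    successes[a] += 1
        rates = {a: successes[a] / rounds_per_phase for a in arms}
        best = max(rates.values())
        survivors = [a for a in arms if rates[a] >= keep_threshold * best]
        if len(survivors) == len(arms):
            # Nothing was dropped this phase; stop and pick the front runner.
            return max(arms, key=lambda a: rates[a])
        arms = survivors
    return arms[0]

# Five hypothetical campaign conversion rates; two have almost no appeal
# and get dropped in the first phase.
winner = successive_elimination([0.02, 0.03, 0.10, 0.12, 0.01])
```

Compared to a plain A/B test over all five variants, the elimination step is what stops spending trials on the arms with no appeal.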
17:21 - V. W.
Yes. Okay. Book six: probabilistic methods and inference, Bayesian networks, Gaussian processes, variational inference. Book seven: specialized architectures and apps, graph neural networks, neural ODEs, which I assume are ordinary differential equations. All right.
17:34 - D. B.
I get the idea. So if you want a student to specifically write a book about one of those topics, and you want to be the supervisor of the student's project, let me know and I'll try to recruit someone to do it. What I did last time was I had the student figure out what kind of book they wanted to write. But if you have a book you want written, we'll find a student to do it.
18:00 - M. M.
Do you have experience with this: you copy, maybe a paper, okay, maybe this works for books too, and say, write me similar content like this, but in a different style? No, they teach us something else. They teach us that from this paper or book you capture the main ideas and how it's created, but change the content. Did you try this? It's very cool, by the way, and it's working.
18:54 - V. W.
That sounds interesting. Instead of assigning a student one of these books, which could be a great deep dive, I would tend to assign them multiple books so they could compare and contrast the methods that could be used. So, for example, you gave an assignment one time on the voiceprint analysis of Parkinson's disease, and I used six different methods to solve that problem that had significantly different performance, and I sorted them from high performance to low performance. Those methods were drawn out of multiple instances of these books. So I would go ahead and just ask the LLM to write the book in the voice that pertains to my personal profile and historic information. And I might front-load the book: if I have a large corpus that I'm particularly fond of that summarizes the topic, I would attach that to my prompt so that I know I would get a high-quality book at the end. And then I would read the book and work the examples in it. And if the book didn't have examples, I would make sure that, for any ideas that weren't particularly clear, there was an example, either in Python or even HTML, to drive the point home with the user. And you could turn this around in a couple of days, just, you know, pizza under the door and go for it.
20:17 - Unidentified Speaker
D.D., can I share a screen real quick? Sure. One of the things that D.D. and I are doing is to actually start talking about, I won't go into the slide, but basically the different approaches to models and different approaches to testing. And one of the things you brought up is to compare models. Now, while that is a valid and required approach, I would also do interrogation of the models themselves, because a lot of times just the interrogation of the model will lead you to identifying whether or not it's a good fit. These include things like chi-squareds, doing transformations on the data, EDA. Now some of the things that you've mentioned in the book go into the models. Here I'm talking about standard and Gaussian models, and here I'm validating the residuals on the model. Now I'm going to talk about things like multicollinearity, where we start looking at the residuals to see if there are patterns within them, to ensure that we're not trying to go forward with skewed data itself. But then we start going after all of the different types of models. Now you did mention...
21:40 - V. W.
Like Lasso and Ridge regression, for example.
21:42 - E. G.
Well, those are optimization models, where you go in and you say, okay, here's my data, you figure out what independent variables or explanatory variables really fit. And we talk about LASSO, Ridge, Elastic Net, Principal Component Analysis, and the brute-force OLS, where it does forward, backward, and full analysis, and can actually entertain as many as 2^n candidate models for n predictors. Now all of these are valid and viable, and I'll be presenting this with D.D., but I'm actually going to put in code so that when you're done with it you actually have scaffolding to perform all of these. Now, these should be included, and it's easy just to write these. But the model itself is not important. It's the interrogation of the model when you're done with it that's important. Do we have things like, what's it called, a Simpson's paradox? With every model, there are multiple ways to interrogate it to see if it's a good model or not. So choosing models and throwing data at them, that's one approach. Choosing a model, then interrogating it, I think, is probably a much better approach.
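E.G.'s distinction, fitting a model versus interrogating it afterward, can be sketched in a few lines of numpy. Everything here is synthetic and assumed for illustration: the data, the ridge penalty, and the single residual check stand in for the fuller diagnostics (chi-squareds, EDA, multicollinearity) mentioned in the discussion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y really depends on only two of four candidate predictors.
n = 200
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Ridge regression in closed form: w = (X'X + lam*I)^-1 X'y.
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Interrogate the fitted model rather than stopping here: the residuals of a
# well-specified model should show no pattern against the fitted values.
fitted = X @ w
residuals = y - fitted
corr_fitted_resid = np.corrcoef(fitted, residuals)[0, 1]

# Scale of the "brute force OLS" alternative: 2^n candidate subset models
# for n predictors, so 16 models even for this tiny example.
n_subset_models = 2 ** X.shape[1]
```

A near-zero correlation between fitted values and residuals is only one of many interrogations; patterned residuals would be the signal that the model, however good its fit statistic, is misspecified.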
23:22 - R. S.
All right. Well, very good.
23:24 - D. B.
So we talked about some different questions today. For example, we talked about the rubrics MS Projects question. If anybody else has any questions, just let me know. I'll put them in the agenda, and we can have a group discussion about them.
23:42 - G. S
Yeah. Dr. B., this is G.S. here. I had a question. I'm sorry. I think I was unable to attend the last few sessions. So agentic AIs would be of interest, right? It is pretty big, even in Informatica, I think last couple of weeks back, we presented the entire framework of how we are going to use agentic AI for various artifacts and various data processing workflows within the product. So I think understanding agentic AI in more detail would be beneficial. And I can talk about that as well if required.
24:22 - D. B.
Do you want to make a presentation or ask some discussion questions? Guide the discussion? Yeah, sure.
24:29 - G. S.
I can put together some presentation and then we can take that up for discussion.
24:37 - D. B.
Okay. Maybe not next week, but you know, in the next couple of weeks here.
24:44 - Y.
Are you thinking the use of specific use case or just generally use of agentic AI?
24:52 - G. S.
So I will be coming in from the Informatica perspective, from the industry perspective, right, how we are approaching agentic AI solutions to help various customers. So I will be talking from that perspective.
25:14 - Unidentified Speaker
I think, sorry. Yeah, go ahead.
25:17 - G. S.
We had some software. Y. And I was thinking, maybe you give your presentation.
25:25 - Y.
But if you want to go ahead and think about a particular industry use case, as a step after that, then we could have a part of it. But I would like to see this. Yeah, I have a few ideas.
25:42 - G. S.
But if you want, I mean, I have a healthcare background and a banking background. So if there is a specific interest on either of those, then I can customize the presentation to talk about those use cases, if that makes sense. Or I can just start off with a generic overview, and then we can, in the next session, following on, we can talk about an industry-specific use case.
26:09 - Y.
Sounds good. I was thinking healthcare. And that reminds me, D., I wanted to remind everyone that there will be a Mental Health AI Hackathon the week after next. Both my kids are participating. So R., if you're around, you have some mentees on the ground. But if you are available, or at least M.J., I don't know whether she's attending the call or not, it would be worthwhile for you all to see what the outcome of that session or event would be. So, the week after next, the Mental Health AI Hackathon is happening.
27:00 - M. M.
Maybe we can share the topics later. You know, we have five challenges that are really very interesting, coming from the Veterans Hospital, from B.C., from the Children's Hospital. So, five challenges that we will share with you guys when we finish. That's right.
27:24 - Y.
That's a good idea. Maybe on Monday I will email everyone, because we will not meet the following Friday. Once they are announced, you could share them with the audience. Or if you know already, V., you can share. I was kicked out of the discussion because there would be a conflict of interest because of my children. Yeah, sorry.
27:55 - M. M.
But it's better for the kids, you know.
27:58 - Y.
Yeah, I'm absolutely okay with that. I'll still be a mentor to a kid. But if you have that information, Next Friday, feel free to do it.
28:10 - M. M.
If not, once we get it on Monday, then we'll share it with the audience. I will share Friday. I think it's okay.
28:19 - Y.
I mean, no problem. And I'll not tell my kids.
28:24 - M. M.
Even if I tell my kids, they'll not do anything over the weekend. No, no.
28:30 - Y.
M.J. will distribute the challenges to everybody.
28:33 - D. B.
Okay, boss. All right. So, yeah, the next thing is, I thought I'd review how the website for this group works, because I think people get confused. So I'm going to just show you the website. It's AIntwhatitusedtobe.blogspot.com, "AI ain't what it used to be," right, because AI is not what it used to be back when I was a grad student. So the way it works is, this is blogspot.com, which is a blogging platform, so it's a chronological sequence of blog postings. And this is the way it's set up, maybe not the best way, but it's set up as a sequential list of postings. Today's posting starts here and scrolls all the way down to the end. And then the next one, from two weeks ago, is the next one. So it's just like a blog: each one is listed chronologically, most recent at the top. This is the one from two weeks ago, and it's really long because it's got the transcript. This is from three weeks ago. So you can just scroll down; it does reach an end. Then you can always click on older posts to get earlier ones from the month. The other way you can navigate is the blog archive, which lists every month and the number of meetings in that month. So you can click on a month; in October of 2023, there were four meetings.
30:38 - R. S.
So October of 2023, there were four meetings.
30:42 - D. B.
So I click on it. And this is the most recent one at the end of the month. And if you scroll all the way down, meeting from the previous week, scroll down, meeting from the previous week to that, and so on. And then you can still click older posts or newer posts to scroll up and down this long, long list of chronologically arranged postings. Okay. And then I've got some pages here. So some of these are out of date, but for example, here's a review on just different pages, right? It's a blog. You can click on any of these links and find the page, how to do PhD-level research, a few hints. I show that to any PhD student who seems like they want to know some hints, I'll point them to that location. And so that's how it works. And you can just always go to ain'twhatitusedtobe.blogspot.com. And here's the chronological list. If you want the sort of one page with the full posting, you just click on the title here, which this one doesn't have a title yet. That's today. I didn't give it a title yet, but we can go down and find the next one. This one is called Book Project Redux. You click on that, and it'll just bring up that posting as the one page showing in the browser.
32:16 - V. W.
Is there any way you could add M.M. to this blog, so that if she ran a meeting, we would have a good record that matches the style of previous meetings?
32:27 - D. B.
I don't know how. I mean, this is from my own personal Gmail account, so I don't know if I can give her, or somebody else, you know, if you were interested, a way to gain access.
32:42 - V. W.
Yeah, just a mechanism where it's sort of, we preserve your sense of privacy with respect to anything else, but we kind of maybe denote that we have a Dr. M. moderating or whoever, and then that person just fills in the gap for that time. Or alternately, we could just create a transcript and then send that to you, and you could incorporate it into the blog with any improvements. Yeah.
33:06 - D. B.
You know, if someone else wants to run a meeting when I'm not here, or even if I am here, I'd be happy to offload some of that. We'll have to, you know, use your Zoom account.
33:20 - V. W.
Yeah, but the Zoom account, co-hosting is difficult.
33:23 - M. M.
You can make co-hosts during the meeting, not before the meeting. Yeah, well, we could send out an email with a new Zoom link to your own Zoom account.
33:36 - D. B.
Yeah, but it's complicated. Then people forget to... they don't read the email, and they go to the wrong one and everything. Yeah, that's true.
33:45 - V. W.
I don't know how to solve that one.
33:48 - D. B.+V. W.
Yeah, if you have some suggestions, let us know, because I don't know how.
33:52 - M. M.
I'll think about it. It can get thorny really fast.
33:55 - V. W.
Oh, I know what I could do.
33:58 - D. B.
If we're going to do that, you could send me the link and I'll put it in the minutes for that meeting. So people will go, well, that wouldn't work because they might not go to the web... Yeah, well, you know, we'll just have to send out a couple of emails saying, you know, for the upcoming meeting, go to this Zoom link instead, because Dr. M. is going to be running it from her Zoom.
34:24 - Y.
I have no idea why I'm not able to open the site.
34:28 - Unidentified Speaker
I'll try again.
34:29 - V. W.
Sorry? Check the spelling. It's a long URL.
34:32 - D. D.
Yeah. Yeah, I misspelled it. But I still got it now.
34:40 - V. W.
Ain't is not typically a word that gets spell-corrected properly.
34:46 - D. B.
Yeah, that's what I messed up too.
34:51 - Unidentified Speaker
Here it is.
34:53 - D. B.
It can be lowercase.
34:56 - V. W.
Quick footnote to E. talking about interrogating the models: this brings up the issue of figures of merit for evaluating how well a model did, like cross-entropy and different performance metrics and ROC and all that sort of stuff. And some people swear by one, and some people point out their weaknesses and say, no, you've got to consider this too. Anyway, I've rewired the definition of book one in the chat to account for these multiple metrics, and that's it.
35:32 - V. W.
And I want to make sure that notion matches with E.'s intention of how you would interrogate a model because that can mean, you know, that gets into explainability and that gets past simple statistical measures and into reflections of what the model's thinking.
35:48 - E. G.
That's exactly what I'm going for, because if you assume the model's wrong, then you use these mechanisms and they keep pointing you toward what's correct. When I go into something, I assume it's not going to be significant. Only once it can pass all of the rigor of the interrogation will I say, OK. And that's why you say you don't accept the alternative hypothesis; there's just not enough information to reject the null hypothesis. You don't say, well, the information says this is the answer. You say there's not enough information to reject that this is not the answer.
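E.G.'s phrasing point, that you fail to reject the null rather than accept the alternative, can be illustrated with a small permutation test. The samples below are made-up numbers chosen so the test has nothing to find:

```python
import random

def permutation_test(a, b, n_perm=5000, seed=1):
    """Two-sample permutation test on the absolute difference of means.
    Returns the p-value: the fraction of label shuffles that produce a
    difference at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Two similar made-up samples: we expect a large p-value, and the correct
# conclusion is worded as a failure to reject, not as acceptance.
p = permutation_test([5.1, 4.9, 5.0, 5.2, 4.8], [5.0, 5.1, 4.9, 5.3, 4.7])
verdict = ("not enough evidence to reject the null"
           if p > 0.05 else "reject the null")
```

The verdict string deliberately mirrors E.G.'s wording: a large p-value never demonstrates that the null is true, only that the data cannot rule it out.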
36:40 - Y.
Y., I have a question for you. Pardon the lack of my technical knowledge as well as you do, but when it comes to generative AI, where you may not have access to the models and you want to test the governance at risk and other aspects of it, how would you do that?
37:07 - E. G.
That's where generative AI is unique, because it's a black-box model. You hand stuff in and it spits stuff out; it doesn't give you the internals. What you have to do is apply an approach that probes around the area. When you look at cancer and you take out a solid tumor, you create a margin of tissue around it to extract it. You do the same thing with an LLM: you go in, you ask the question you want, then you go after the pieces around it to see if they support that answer.
38:03 - Y.
So, and we can take this discussion offline, but essentially, until you get satisfaction or some level of accuracy, indirectly you're saying you would need another parallel model to test against, until you say, okay, most of the time the LLM is giving me a correct answer, so now I can stop checking it. Until that point, you may want a parallel automated model on top of it, one you have tested and have more confidence in, so that you can start trusting it. Is that right? And again, I don't want to consume this call on this topic, but I have something going on and I'm trying to solve this problem.
38:54 - E. G.
Well, you can't use a parallel model, because then you're actually changing the underlying infrastructure. It's like using a linear model versus a Gaussian linear model: while they're similar, they're two separate models. So that's why you'd have to stay with one model and interrogate it to see if it's useful. And watch out for confirmation bias. One of the biggest issues I had when I was mentoring my kids at Georgia Tech was confirmation bias. They went in with the attitude that it was going to be this outcome, and as soon as the model hit it, boom, they were done.
39:37 - D. B.
Okay, so we do have just a few minutes left. What I'd like to do then, since we don't really have time to finish the chapter six video, maybe we can do that next week. But I do have a list of papers and videos that we can evaluate for further reading. And here is one of them. I forgot where it is.
40:04 - Y.
It's somewhere in here.
40:05 - D. B.
Oh, it's this one. This one. So I brought it up here. And I thought we could read the abstract together. Discuss it, and then evaluate whether we want to read it in more detail. This is an old paper, but I guess it's very influential, so we'll see. We don't have to read it, but we can.
40:26 - V. W.
It's a who's who of contributors. By the way, I've done a follow-up on interrogation through metrics, and it's in the appendix to the chat. I'd like to know if it passes the smell test from E.'s point of view. All right, so let's take a couple of minutes to read it, and then we'll discuss it.
41:21 - Unidentified Speaker
Any comments?
41:30 - R. S.
Can you fill? All right, this is a little dense. Does anyone have any comments or questions about it?
41:57 - D. B.
It looks really interesting.
41:59 - E. G.
I'm not familiar with that one, but that follows what we're talking about on how to interrogate.
42:10 - R. S.
Are scaling laws trying to make big, you know, large language models smaller? Is that what it is? Or more an easier way to use them or something?
42:27 - E. G.+J. H.
Is that what it's trying to do?
42:30 - J. H.
I think it's gaining efficiency, minimizing data. Yes.
42:34 - D. B.
And I also agree, this looks fascinating. It's relevant. Okay.
42:39 - D. D.
You know, are they talking about, like, once you have the large language model, retraining it, or are they talking about training from zero?
42:53 - D. B.
Well, they're talking about cross-entropy loss. I don't know what that is.
42:59 - M. M.
The loss function.
43:01 - Unidentified Speaker
It's an important loss function.
43:04 - E. G.
It's the most important. So as they're scaling it, I'm expecting them to reduce it, to train it faster. How much loss are you going to take?
43:18 - V. W.
Cross-entropy loss measures how far off your model's predicted probabilities are from the true labels, essentially quantifying how wrong the predictions are, with larger penalties for confident wrong predictions.
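[That definition can be sketched in a few lines of Python; the probabilities below are invented for illustration, not taken from any model discussed here:]

```python
import math

def cross_entropy(pred_probs, true_label):
    """Negative log of the probability the model assigned to the true label."""
    return -math.log(pred_probs[true_label])

# A confident correct prediction incurs a small loss...
low = cross_entropy([0.9, 0.05, 0.05], true_label=0)   # -ln(0.9), about 0.105
# ...while a confident wrong prediction is penalized heavily.
high = cross_entropy([0.05, 0.9, 0.05], true_label=0)  # -ln(0.05), about 3.0
print(low, high)
```

[This is the per-example form; in training, the loss is averaged over the dataset.]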
43:37 - D. D.
That's what I thought it was too, V. The most important loss function. It's how you measure what you're learning, what you've learned, where you're going.
43:55 - M. M.
This is a good paper, really very good.
43:59 - D. D.
I would like to read the introduction.
44:03 - D. B.
Can you show that? Yeah. Let's see, I can shrink it a little bit.
44:19 - Unidentified Speaker
Oops.
44:20 - D. B.
Not too much. Oh, gosh.
44:34 - D. B.
All right, why don't we start with the first paragraph here instead of reading the whole thing.
44:43 - Unidentified Speaker
Has anybody heard of those models?
45:05 - D. B.
This paper is a 2020 paper, so it precedes the transformers.
45:09 - D. D.
Yeah. I haven't heard of those models.
45:11 - D. B.+D. D.
I'd like to know, you know, what size they are, all kinds of stuff. Yeah.
45:17 - D. D.+E. G.
I'm, I'm interested in it.
45:20 - D. D.
How they get human-level performance on these models. Is that small enough that I can make that model? There's a lot of questions. Okay.
45:31 - D. B.
Any other comments on that paragraph? All right, next paragraph.
46:07 - Y.
This is Y. Pardon, this is not my area of expertise, and I have to take my daughter somewhere, so I'll drop off. But one thought I had in my mind, and you all can decide whether it's a right or wrong thought: I know we meet every Friday, but do you think having a group chat or anything where we can share on an ongoing basis would be a good idea among us? Yes, we could have it. It's not just me.
46:40 - D. B.
Why don't I put it in the minutes for discussion, like next week or something, and you can... Yeah. OK. Next week, I'm out for scout camping.
46:51 - Y.
But week after, I'll see you all. Thank you very much. Have a nice weekend.
46:57 - M. M.
Thank you. OK. Have a nice weekend. I think all of these are references, but they are all old, before Transformers. So I don't know how useful it will be, because right now everything is after Transformers.
47:17 - E. G.
They talk in the first paragraph about generative modeling. And in this paragraph, they talk about the transformer architecture. It sounds like they're talking about maybe the precursor, or the initial transformer. Correct.
47:36 - D. D.
Then the transformer came out in 2017.
47:40 - M. M.
And this is a 2020 paper.
47:42 - D. D.
This might include the transformer. OK, great.
47:46 - M. M.
Because the previous references, as you can see, are '18, '19, you know. Yeah.
47:52 - D. B.
Yeah, I mean, it does focus on the transformer architecture. Obviously, the transformers when this paper was written are not as well developed as the transformers they have now, but today's are based on them. Okay. All right, next paragraph, one very short paragraph, and then we'll evaluate. How are we going to evaluate?
48:22 - D. D.
Four out of five?
48:24 - D. B.
Yeah. Any comments or questions? I guess my comment is that this would be really great as long as the power laws that they derive are still valid, right? If it's just historical and it's changed, then it would be of less interest, but they might still be just as valid as they were then.
48:55 - V. W.
Scaling laws remain important. The choice of power laws as basis functions may or may not be as useful now as it was then.
49:04 - D. B.
Wait, what? Can you say that one more time?
49:08 - Unidentified Speaker
Sure.
49:08 - V. W.
It's like you're identifying these scaling laws with which you can predict what it's going to cost when you train, test, and do inference. So you can figure out which parts of the model are costing you the most and scale accordingly based on your objective. But then if you represent those scaling laws as power laws, you're specifically choosing exponential functions to model these properties, when exponentials may not be the best basis functions to reproduce the behavior underneath the hood.
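[The power-law fitting being discussed can be sketched in a few lines. All numbers below are invented for illustration, not taken from the paper: the fake (model size, loss) pairs are generated from L = a · N^(-b) with b = 0.2, and the fit recovers that exponent.]

```python
import math

# Hypothetical (model size N, loss L) pairs lying on L = a * N**(-b), b = 0.2.
data = [(1e6, 5.0), (1e7, 3.154), (1e8, 1.99), (1e9, 1.255)]

# A power law is a straight line in log-log space: log L = log a - b * log N,
# so an ordinary least-squares line fit on the logs recovers the exponent b.
xs = [math.log(n) for n, _ in data]
ys = [math.log(l) for _, l in data]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
b = -slope
print(f"fitted exponent b = {b:.2f}")
```

[The same log-log trick is how scaling-law exponents are typically estimated from empirical training runs; whether a power law is the right functional family is exactly the question being raised above.]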
49:44 - D. B.
Well, power laws sound more like Reed's law, not exponential. But your point is well taken; it's still a valid point. They're picking a particular type of model, a particular type of mathematical curve.
49:59 - Unidentified Speaker
Right.
50:00 - V. W.
And you want to make sure that making that selection doesn't commit you to missing something that'll turn out to be important.
50:10 - D. B.
Yeah. Any other comments? All right. Well, let's conclude with our evaluation. In the chat, just evaluate whether you want to read this paper in more depth together, you know, paragraph by paragraph, just like we did today, on a scale of one to five. Okay, so one is you definitely don't want to read it, five is you definitely do want to read it, three is either way, and then two and four are in between. Does that make sense?
50:43 - D. B.
So evaluate it from one to five, just like a course evaluation, like your students do. One, definitely don't want to read it. Five, definitely do want to read it.
50:56 - Unidentified Speaker
Three, either way, you don't care. Two and four, halfway.
51:00 - D. B.
And put your vote in the chat, and I'll get an average.
51:10 - R. S.
The title of the article again was something... Scaling laws? No, I said, what was the title of the article again? Something about scaling down large language models or something?
51:34 - Unidentified Speaker
Okay.
51:38 - V. W.
We're currently averaging 3.375, if my calculations are correct.
51:42 - D. B.
OK, I'll take your report. OK.
51:45 - Unidentified Speaker
Dr. Bowling, I have a quick suggestion.
51:48 - G. S.
I'm not sure what the criteria are for selecting these papers, but would it make sense to consider the latest papers for reading, rather than looking at something that is five years old?
52:02 - D. B.
Yeah, I mean, if anybody has a paper or a video that they want us to consider, just let me know, I'll add it to the list and we'll read it.
52:14 - R. S.
If it was a newer version of something like this, then that might be even more appropriate.
52:20 - D. D.
Yeah, we'll see what we can find. You know, not all the models are black box; there's probably still open-source stuff going on. Right now it's so proprietary, they've got them locked down like Fort Knox.
52:36 - R. S.
You should also bear in mind there's a timeline from when these people sent their research in to be reviewed, so this research is probably more than five years old.
52:50 - M. M.
That's correct. Yeah. V. suggests "On the Biology of a Large Language Model."
52:55 - Unidentified Speaker
Interesting links.
52:56 - V. W.
Yeah, that was an anthropic paper that I think is as important for us to look at from its formatting point of view in terms of communicating interactive knowledge as it is for the actual content, which is quite good and state of the art. I'll try to dig up the link for that real quick.
53:16 - M. M.+V. W.
Yeah, I like it. I like it. All right.
53:19 - D. B.
Well, the average is literally exactly 3.375. So I'm going to go ahead and figure out how to go back to here.
53:38 - Unidentified Speaker
All right. And again, if anyone has any papers or videos they want to put up for evaluation, just let me know.
53:53 - D. B.
I'll add them to the list. And then next time we're ready to read something, we'll pick the one with the highest evaluation. Okay, all right.
54:08 - R. S.
Well, I guess we're at the end, so thanks, everyone, for tuning in.
54:13 - D. B.
One second, I've got that link for that transformer paper, and it's now in the chat; you can look at it.
54:21 - V. W.
There we go: "On the Biology of a Large Language Model." Thank you, thank you.
54:27 - M. M.
You want to evaluate it?
54:29 - D. B.
We can do that next time.
54:32 - M. M.
Yeah, it's good.
54:35 - Unidentified Speaker
Yeah, OK. I'll add it to the list. That's good. Thank you. All right. Thank you, everyone. Bye now.
54:56 - D. B.
Bye.
54:57 - Unidentified Speaker
See you all.