Friday, September 5, 2025

9/5/25: Discuss D. Susskind lecture; etiquette/management/coordination/dynamics of this group

Artificial Intelligence Study Group

Welcome! We meet from 4:00-4:45 p.m. Central Time on Fridays. Anyone can join. Feel free to attend any or all sessions, or ask to be removed from the invite list; we have no wish to send unneeded emails, of which we all certainly get too many. 
Contacts: jdberleant@ualr.edu and mgmilanova@ualr.edu

Agenda, Minutes and Status (177th meeting, Sept. 5, 2025)

Table of Contents
* Agenda and minutes
* Appendix: Transcript (when available)

Agenda and Minutes
  • Announcements, updates, questions, etc.
  • Recap of:
    • "Join us for a thought-provoking lecture and book signing with renowned economist and King’s College London professor Daniel Susskind as part of the CBHHS Research Symposium." 
    Thursday, September 4, 2:00 p.m., UA Little Rock, University Theatre – Campus conversation    
    Friday, September 5, 2:00 p.m., UA Little Rock, University Theatre – Campus and community conversation 

Susskind, a leading voice on the future of work and technology, will explore how artificial intelligence is reshaping the workplace and how we can harness its potential to work smarter. Don’t miss this opportunity to engage with one of today’s most influential thinkers on AI, economics, and the future of our professions.


  • Next week: ES will tell us about The AI-Driven Leader: Harnessing AI to Make Faster, Smarter Decisions, by Geoff Woods. Please help discuss it, ask questions, etc.!
  • Sept. 19: DD will step us through the IBM free prompt course.
  • Thursday, Dec. 11, 4:30 p.m.: Students in YP's AI course will present their projects.
  • Here are projects that MS students can sign up for. If anyone has an idea for an MS project where the student reports to us for a few minutes each week for discussion and feedback - a student might potentially be recruited! Let me know.
    • Book writing project
      • 8/22/2025: LG has signed up for this. Next time, LG will try both and report back, and also start the project log.
        • Working on how to write the book. Have different agents doing different roles? 
        • Topic of book will be: personal investing
        • Committee: DB, MM, RS
    • VW had some specific AI-related topics that need books about them.  
    • JH suggests a project in which AI is used to help students adjust their resumes to match key terms in job descriptions, to help their resumes bubble to the top when the many resumes are screened early in the hiring process.
    • JC suggested: social media platforms use AI to decide what to present to users, the notorious "algorithms." Suggestion: a social media cockpit from which users can specify what sorts of things they want to see. Screen-scrape the user's feeds from the social media outputs to find the right content. Might overlap with COSMOS. The project could be adapted to either tech-savvy CS students or application-oriented IS or IQ students.
    • DD suggests having a student do something related to Mark Windsor's presentation. He might like to be involved, but this would not be absolutely necessary.
      • markwindsorr@atlas-research.io writes on 7/14/2025:
        Our research PDF processing and text-to-notebook workflows are now in beta and ready for you to try.
        You can now:
        - Upload research papers (PDF) or paste in an arXiv link and get executable notebooks
        - Generate notebook workflows from text prompts
        - Run everything directly in our shared Jupyter environment
        This is an early beta, so expect some rough edges - but we're excited to get your feedback on what's working and what needs improvement.
        Best, Mark
        P.S. Found a bug or have suggestions? Hit reply - we read every response during beta.
        Log In Here: https://atlas-research.io
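JH's resume-matching project above is concrete enough to sketch. Here is a minimal illustration in Python (all names hypothetical, not an existing tool): extract the most frequent terms from a job description and report which ones a resume is missing, the kind of screening check the proposed AI helper would automate.

```python
import re
from collections import Counter

STOPWORDS = {"and", "or", "the", "a", "an", "to", "of", "in", "with", "for", "on"}

def key_terms(text, top_n=15):
    """Crude key-term extraction: the most frequent non-stopword tokens."""
    words = re.findall(r"[a-z][a-z+#]*", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(top_n)]

def coverage(resume, job_description, top_n=15):
    """Return (fraction of the job description's key terms present in the
    resume, list of missing terms) so the student knows what to add."""
    terms = key_terms(job_description, top_n)
    resume_words = set(re.findall(r"[a-z][a-z+#]*", resume.lower()))
    missing = [t for t in terms if t not in resume_words]
    return 1 - len(missing) / max(len(terms), 1), missing
```

A real project would layer an LLM on top to suggest honest rewording rather than raw keyword stuffing; this sketch only shows why unmatched resumes fail early screens.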
  • AI course updates? About 15 students currently, to be organized into teams. There will be projects due at the end of the semester. 
    • EG suggests students might benefit from checking out rapids.ai.
  • Any questions you'd like to bring up for discussion, just let me know.
  • Anyone read an article recently they can tell us about next time?
  • Any other updates or announcements?
  • Here is the latest on future readings and viewings. Let me know of anything you'd like to have us evaluate for a fuller reading, viewing or discussion.
    • Evaluated
    • 7/25/25: eval was 4.5 (over 4 people). https://transformer-circuits.pub/2025/attribution-graphs/biology.html.
    • Evaluation was 4.4 (6 people) on 8/8/25: https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-refusals
    • 8/22/25: eval. was 4.0 (4 people): https://www.nobelprize.org/uploads/2024/10/popular-physicsprize2024-2.pdf. 
    • https://arxiv.org/pdf/2001.08361. 5/30/25: eval was 4.0. 7/25/25: vote was 2.5.
    • Evaluation was 3.87 on 8/8/25 (6 people voted): https://venturebeat.com/ai/anthropic-flips-the-script-on-ai-in-education-claude-learning-mode-makes-students-do-the-thinking
    • Evaluation was 3.75 by 6 people on 8/8/25 for: Use the same process as above but on another article.
    • (Eval 8/29/25 was 3.75 over 5 people.) https://docs.google.com/document/d/1NeNmKlAmJdf50ST7plw4mvgeeS7UJuYLyEQMz8slCA0/edit?tab=t.0#heading=h.hnzmulgvk3qx.  
      • Prompt engineering course. 
      • Syllabus and registration page: https://apps.cognitiveclass.ai/learning/course/course-v1:IBMSkillsNetwork+AI0117EN+v1/home
      • Requires registering. DD volunteered to register if it is free, so we can check it out briefly and decide whether to do the course in detail.
    • Evaluation was 3.5 by 6 people on 8/8/25: Put the following into an AI and interact - ask it to summarize, etc.
      • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning  (https://transformer-circuits.pub/2023/monosemantic-features/index.html); Bricken, T., et al., 2023. Transformer Circuits Thread.
    • We can evaluate https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10718663 for reading & discussion. 7/25/25: vote was 3.25 over 4 people.
    • Not yet evaluated
    • Neural Networks, Deep Learning: The basics of neural networks, and the math behind how they learn, https://www.3blue1brown.com/topics/neural-networks. (We would need to pick a specific one later.)
      • We checked the first one briefly. 8/22/25: eval was 3.625 (from 4 people) for a full viewing.
      • Let's evaluate a few more of them.
    • LangChain free tutorial, https://www.youtube.com/@LangChain/videos. (The evaluation question is, do we investigate this any further?)
    • Chapter 6 recommends material by Andrej Karpathy, https://www.youtube.com/@AndrejKarpathy/videos for learning more. What is the evaluation question? "Someone should check into these and suggest something more specific"?
    • Chapter 6 recommends material by Chris Olah, https://www.youtube.com/results?search_query=chris+olah
    • Chapter 6 recommended https://www.youtube.com/c/VCubingX for relevant material, in particular https://www.youtube.com/watch?v=1il-s4mgNdI
    • Chapter 6 recommended Art of the Problem, in particular https://www.youtube.com/watch?v=OFS90-FX6pg
    • LLMs and the singularity: https://philpapers.org/go.pl?id=ISHLLM&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FISHLLM.pdf (summarized at: https://poe.com/s/WuYyhuciNwlFuSR0SVEt). (Old eval from 6/7/24 was 4 3/7.)
    • Back burner "when possible" items:
        • TE is in the informal campus faculty AI discussion group. 
        • SL: "I've been asked to lead the DCSTEM College AI Ad Hoc Committee. ... We’ll discuss AI’s role in our curriculum, how to integrate AI literacy into courses, and strategies for guiding students on responsible AI use."
        • Anyone read an article recently they can tell us about?
        • The campus has assigned a group to participate in the AAC&U AI Institute's activity "AI Pedagogy in the Curriculum." IU is on it and may be able to provide updates now and then. 

      Appendix: Transcript
       

      Artificial Intelligence Study Group
      Fri, Sep 5, 2025

      0:29 - Unidentified Speaker
      Hello, hello.

      0:37 - M. M.
      So again, we have not so good attendance. We need to improve it. Definitely, I agree with all of you that we have to advertise more. But I'm curious to learn about this D. Susskind talk. Can you explain what D. talked about yesterday?

      1:06 - D. B.
      Okay, yeah.

      1:10 - M. M.
      Yeah.

      1:13 - D. B.
      Do we wanna give other people another minute before we start or?

      1:16 - M. M.
      Oh, sure, yes, yes, definitely.

      1:18 - Unidentified Speaker
      I don't know what time it is, I don't have my time.

      1:20 - M. M.
      Let's wait, at least A. promised to be here, some people promised to be here, I don't know why they are not. We're so busy these first weeks. I understand that we didn't have so much time to advertise, but we have to improve it.

      1:39 - E. G.
      Yes, sorry about last week. My wife decided that I was going down to Ogunquit, Maine for my birthday.

      1:48 - D. B.
      Sounds like fun. Yeah, it was a blast.

      1:56 - M. M.
      Well, good for you.

      2:00 - D.
      So I went ahead and took that course, the IBM course. I finished it.

      2:05 - D. B.
      Oh.

      2:08 - D.
      It had a couple of interesting prompts in there.

      2:15 - D. B.
      Well, if you've already done it, that's all the more reason for you to recommend whether we should do it together.

      2:22 - D.
      Oh, I thought that Dr. M. told me to do it. Didn't you tell me?

      2:26 - D. B.
      M. told me to do it. Oh, maybe we did decide that. I don't know. I don't remember.

      2:31 - D.
      I think she sent me an email saying that I needed to finish it or get it ready to present or something.

      2:38 - D. B.
      Oh, do you want to present it?

      2:39 - D.
      Well, I mean, I don't have like slides, but I could talk about the prompts, I have a document that I did, you know, basically I did a report of the whole course.

      2:54 - D. B.
      Yeah, I thought it might be, you know, interesting.

      2:58 - D.
      Plus there's a couple of prompts in there that... [D. B.: Oh yeah, I'd love to have you step us through it.]

      3:06 - D.
      I mean, there's some really good ideas out there.

      3:12 - D. B.
      Well, do you want to do it? Well, see, next week, L. S. in the psychology department is going to review a book that they wrote over the summer.

      3:23 - M. M.
      She is next week? We have to advertise.

      3:28 - D. B.
      Yeah, well, I did send a couple of reminders today and yesterday. If we can invite more people, that'd be great. If people lose interest and it's not viable, OK, we'll just not have the meetings anymore.

      3:47 - M. M.
      Ah, it's about writing a book. OK.

      3:53 - D. B.
      Yeah, so D., if you can go, say, the week after next, that would be perfect.

      4:00 - D. B.
      I really would prefer that you not feel obligated to make a formal presentation. Just step us through what you did.

      4:07 - D.
      Yeah. I would like to just talk about the prompts that were in the course. And if somebody really liked them and they wanted to test them or something or whatever, the course is free.

      4:21 - D. B.
      Yeah, OK. Sure.

      4:25 - D.
      I'll do it. That'd be great. All right.

      4:29 - D. B.
      Well, I'm on full screen mode, and it's not doing well. So I may have to get out of full screen mode.

      4:38 - M. M.
      Yeah. Prompting is evolving.

      4:40 - D.
      Prompting is evolving. Absolutely.

      4:43 - Unidentified Speaker
      Absolutely.

      4:43 - M. M.
      It's not what people said before, you know, because they always complain and they don't see the improvements.

      4:54 - D.
      And there's also the fact that the models react differently. While I was taking the course, the course itself said, select a model. I didn't want to pick the same model that they picked, because then I would just get what they got. So I picked a different model. And I could tell it it was an expert and all that, and my answers were virtually the same if it's a thinking model. So it depends on the model how you want to prompt it, too. It's not one-size-fits-all.

      5:33 - Unidentified Speaker
      Exactly.

      5:34 - M. M.
      This is what I say.

      5:38 - M. M.
      Everybody's talking about the user experience. This is what people, non-technical people, cannot understand, D. But a few people go deeper, like you say, to check the models and understand the differences in the models that give different performance. This is what people don't understand.

      6:04 - D.
      Yeah, I think so.

      6:06 - M. M.
      Yeah, like you say. They're crazy to talk about user experience, but not the deeper understanding, because they say, oh, driving the car, I don't need to know how the engine works. Yeah, but it's helping. You see, I'm driving a new car and I'm suffering right now; I don't really know everything. And I say I should know a little bit more about what I'm doing with the tool, or whatever it is, the software. So then, I don't know if I recommended this to you or V. did, but it's good to see different courses.

      6:57 - M. M.
      V., these courses are better or how is your feeling?

      7:02 - D.
      Oh, I mean, this was definitely a free course. It was not Nvidia quality.

      7:13 - M. M.
      Yeah. I mean, it was good.

      7:17 - D.
      I mean, if somebody just wanted to go and just kind of learn, it's a good place to start, I think. It was certainly not an advanced course. Okay.

      7:32 - M. M.
      But you will talk more about it; I will invite people. And if you are working, you cannot do it in class, but the class people can come and join your talk Friday at four o'clock. Yeah.

      7:49 - Unidentified Speaker
      OK.

      7:53 - D. B.
      So yesterday, this guy from London gave a talk on AI and education.

      7:59 - Unidentified Speaker
      Today, he gave another talk on AI and employment, I think.

      8:08 - D. B.
      Did anyone go to either one of them?

      8:11 - Unidentified Speaker
      I didn't.

      8:12 - M. M.
      I told you, I have grants to submit, and this is why. I'd say that it's interesting for people that are not technical.

      8:22 - D. B.
      He's not a computer science guy. He's, I guess you'd say, an economist.

      8:29 - Unidentified Speaker
      Economist.

      8:32 - D. B.
      His specialty is future of work or something. Anyway, I did go, not today, but I went yesterday. He's pretty good, pretty smart guy, good speaker.

      8:48 - D. B.
      What did I get out of it? Well, he said, people complain that AI doesn't have judgment like humans do, it doesn't have creativity, it doesn't have empathy. Now, maybe it doesn't, but his point is, it doesn't matter. Because what do we need judgment for? Well, he said, the reason we need judgment is to make decisions in the face of uncertainty.

      9:23 - D. B.
      He said, well, AIs can do that. And what was the other one? Creativity. What do we need creativity for? To come up with original ideas. Well, you go and ask an AI to come up with some new ideas and it can do it. It doesn't matter that it doesn't have human creativity; it can solve the problems that we use creativity to solve.

      9:45 - M. M.
      Yeah. What about empathy?

      9:47 - D. B.
      Well, I mean, does AI have empathy? Surely not. But AI-based talk therapy is definitely a thing, right?

      10:00 - D.
      Yeah, I want to say there's people that use AI for therapy. That's definitely true.

      10:08 - D. B.
      There's a whole model.

      10:11 - E. G.
      and websites devoted to that. And some of it's pretty destructive.

      10:18 - D.
      I've heard, I've heard that there's been some, there's been some bad things happening.

      10:23 - D. B.
      Yeah, it's in the news. It's kind of dragging people into sort of crazy land.

      10:30 - Unidentified Speaker
      Well, the thing is, the AI is trying to, you know, learn about you.

      10:38 - D.
      This is true, yeah.

      10:46 - Unidentified Speaker
      you can unlock and He loves this.

      10:53 - E. G.
      Another thing is, if you're going to an AI for therapy, there may be something other than

      11:02 - D.
      Something that you really need a therapist for that has a degree.

      11:07 - E. G.
      knows what they're doing. Yeah.

      11:09 - D. B.
      And, you know, another thing is that since the AIs are not licensed psychotherapists, and they're not claiming to be psychotherapists, they're not bound by the professional constraints. You know, like psychotherapists are not supposed to tell you what to do, right?

      11:28 - D.
      Right. And, and they really want to help you. And they want to, you know, they want to support you whatever decisions you're making. So if you're in trouble and you're making bad choices, the AI might just help you ride along.

      11:43 - D. B.
      Dr. B.?

      11:48 - L. G.
      I see another big issue with the argument. Like, for example, when you said creativity: it's like a mathematical argument. It can come up with new ideas, but I'm not sure it would come up with the range of ideas that humanity can come up with, the same level of creativity. Because if you base it on a model where you're learning from what's known, would you be able to actually conjecture the full range of possible outcomes of what is unknown, the way a human could?

      12:25 - D. B.
      I think that's an interesting question. Can it really come up with original ideas, or is it only going to like, hunt around on the web more than we can for obscure ideas or something.

      12:35 - D.
      But I don't know.

      12:39 - D. B.
      Yeah, maybe it can. You know, if it reads all about A and all about B and all about C, maybe it can pull words from A and B and get a new association that nobody thought of before.

      12:54 - L. G.
      I don't know. Or maybe it could be used to, you know, solve problems that are unsolvable today.

      12:59 - Unidentified Speaker
      Then we would know at least it's capable of gathering information and coming up with a model of creativity we didn't have before, or something like that.

      13:06 - L. G.
      I think the statement is unproven.

      13:10 - V. W.
      A really strong argument for it falling short is that it's only been trained on known human knowledge, not on things that people don't yet know. Otherwise, all the Clay problems would already be solved, and all the million dollars would be awarded to the AI for solving the remaining five or six problems. So I compare my personal creativity with the creativity of an AI, which has been trained up to the limit of human knowledge; that is where AI is sitting right now.

      13:54 - V. W.
      My next idea is going to come from some exposure, incidentally, that took place in my past, and there AI already has me completely outgunned. So that's the creativity part. The empathy part: I have this very collegial relationship with AI, and I'm a little worried about people who want to take the, what is it, the sycophancy out of AI, because I find that I work really well in an environment where it's saying, well, let's go, this is encouraging, this seems like a good idea, let's move forward. It's working really well for me. And I just live in it. I bathe in it. I'm doing a lot of it. And the thing that I'm noticing the most with the prompt world is that the more information I front-load my prompt with, the more possibilities that opens up for the AI to be in the breadth of my problem and totally confined there, avoiding hallucination and giving me exactly what I want.

      15:00 - M. M.
      Yeah.

      15:01 - V. W.
      And now the new thing for me, now that I've learned to front-load pretty heavily, is reducing the number of shots it takes to get my solution. And I wrote a letter to answer some excellent questions that D. had about this very issue. I actually quantified the number of shots it took and the actual number of minutes, not the number of minutes I wish it had taken. Because it's easy for us, in our enthusiasm, to pad the figures a little bit in casual conversation, when in fact, if we measure our productivity, it's got timestamps on everything: we know how long it took to create the solution, and how long it took to run it. So I think the emergent properties are still there. So I don't really agree with the original track of our conversation today, because of those things I just mentioned.

      15:59 - M. M.
      A. sent us the link.

      16:04 - E. G.
      Is AI actually coming up with new proofs?

      16:09 - E. G.
      I don't know if it can be inventive or if it's just logical.

      16:20 - V. W.
      It's coming up with new combinations of information that may be new in terms of which elements were combined. And if anything is new, it's some combination of the old that we've already been exposed to. That's what new is, because there's nothing new under the sun, if you read Proverbs. And nobody's been exposed to as much simultaneously as our LLMs.

      16:45 - E. G.
      And I think that's what we're getting at. They're able to make connections and abstractions.

      16:50 - Unidentified Speaker
      Right. Because of the depth and breadth that they're exposed to. Emergent. Like you said, they don't know what they don't know because we don't know.

      16:59 - V. W.
      And those abstractions are their emergent properties. And that's become, to me, the strongest abiding benefit of using LLMs: the daily experience of the emergent properties, including a sense of humor, including new directions that I hadn't thought of for solving problems. Oh, by the way, Grok debugs code better than Claude. That's what my last two days have been about.

      17:28 - V. W.
      Because, you know, I'm all about Claude, but Grok outdid Claude in terms of finding a really hard-to-find bug. I'll tell you what it is. I was improving a statistical thing, and I was using JavaScript. And it turns out that the word confidence is a reserved word in JavaScript. And nobody but Grok was able to figure out that the reason this one tab out of eight statistical demonstrations wasn't displaying was that confidence was a reserved word in JavaScript. I was getting no console errors, no trace of why the thing wasn't working. So I thought that was a really subtle catch.

      18:08 - D. B.
      I used to teach a course on innovation, and one of the themes of the course was helping people figure out original and new ideas. One principle: the more you know in terms of background knowledge and background information, the better you are at generating new ideas. Well, AIs know more than we do; they know more facts than we do.

      18:31 - V. W.
      Stunningly more, stunningly more.

      18:34 - D. B.
      The other part of it was there are all these algorithms. We spent the whole course teaching people these algorithms for coming up with new ideas, you know, mind maps, nine hats.

      18:46 - V. W.
      The TRIZ method was a great one for quantifying.

      18:48 - D. B.
      You were in the class, weren't you? Yeah. No, you were in another class where we taught TRIZ.

      18:52 - Unidentified Speaker
      That was Computing in the Future, but you covered the TRIZ method, which was the first attempt I'd seen to quantify originality. And I actually came up with a thing called abbreviated TRIZ, because TRIZ was a little figure-heavy in the number of keywords it looked at, and I wanted to see if I could group those categories.

      19:08 - V. W.
      I need to look at that again in the LLM context.

      19:11 - D. B.
      That could be cool. Yeah. So these algorithms, people might think that the AI couldn't come up with original ideas, but if you follow these algorithms, these methods, it probably could.

      19:25 - D. B.
      It's just an algorithm, right? It just says how to think about it, and then you come up with new ideas.

      19:31 - V. W.
      Right. Like right now, we're getting more information, more high-quality results, than we have time to review. It used to be there were so many papers in the world that nobody could read all the papers. And so we were in despair, because we could never read all the papers. But now, if we have a specific objective, we can go to arXiv, and we've written a tool to do this called Thor, arXiv Thor, where we can completely characterize any technical topic to an arbitrary degree of accuracy, where we can tell people what the emergent areas of investigation are going to be, and what the stale, well-trodden areas are where the low-hanging fruit has already been picked. The whole thing is, now we're having to become more efficient in how we use the time that AI is freeing up to do more advanced things. It's kind of a compound interest situation.

      20:32 - E. G.
      This is nothing more than a leap from basic linear models, basic regression models, 30, 40, 50 years ago. We've been making leaps. The thing is, as the advances in computing get to a point where we're able to throw more at it, we're able to get more salient information from the vast corpus that we have available to us, because we cannot synthesize it ourselves.

      21:13 - V. W.
      But we now have an organizing principle that has advanced at the same time. And it's that organizing principle that's given us the ability not only to access the information, but to resynthesize it. And that's where prompts come in, too.

      21:30 - E. G.
      Because I think at the beginning, most people had two, three, four sentences as a prompt. I have two, three, four pages as a prompt.

      21:48 - V. W.
      And my first question when I'm prompting the LLM is: how long is your context length? Because I'm getting ready to fill it. So do you think that a two-page prompt makes a difference?

      22:04 - E. G.
      In my experience, yes.

      22:06 - V. W.
      Completely.

      22:18 - V. W.
      I'll fill it with what I want the domain of the problem to be. Because you're not only getting hallucination reduction, you're getting the activation of domains of siloed knowledge that are specifically applicable to the thing that you're doing. So Claude used to have a 200K context window; I think they announced they're going to increase it to a million. So from 200,000 up to a million tokens of contextual prompting information. Like I'm routinely now saying, okay, I've got 200K I'm going to fill with this. So I'll go to a 150 or 160K prompt, and I'm going to leave a little headroom for further discussion.
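      The headroom arithmetic V. W. describes can be sketched as a back-of-the-envelope budget. This assumes the common rule of thumb of roughly 4 characters per token; real counts depend on the model's tokenizer, and the function names here are illustrative only.

```python
CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenization varies by model

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def budget(front_load: str, context_window: int = 200_000,
           headroom: int = 40_000) -> dict:
    """Check a front-loaded prompt against a context window, leaving
    headroom for the discussion that follows the initial prompt."""
    used = estimate_tokens(front_load)
    return {
        "estimated_tokens": used,
        "fits": used <= context_window - headroom,
        "remaining_for_discussion": context_window - used,
    }
```

      With a 200K window, a roughly 150-160K front load leaves 40-50K tokens of headroom, matching the practice described above.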

      23:05 - E. G.
      It allows us to be more surgical in our answers.

      23:11 - V. W.
      Right. And, you know, I noticed that a lot of my prompts now are aesthetic in nature. So I'm getting the information I want, but I want that information presented the way a good graphic designer, or a good illustrator, or 3Blue1Brown, or Veritasium, or the practitioners that have led the way in understandability of technical and scientific information display their content. So I'm talking about things like drop shadows, and specular highlights, and making sure it looks good. Because if it looks good, people will consume it.

      23:52 - V. W.
      And those are just appearance-based things, but they drive engagement. If it looks kind of simple or stale or antiquated, then people will be less engaged by it and less likely to use it.

      24:03 - Y.’s iPhone
      And do you use please and thank you?

      24:13 - E. G.
      normal.

      24:16 - Unidentified Speaker
      I guess I am more collegial than even that. I don't say just thank you; I say thank you very much. That has made a really big difference to me.

      24:26 - Y.’s iPhone
      Well, I don't really look forward to working with you in the future on that.

      24:30 - V. W.
      I don't know.

      24:31 - E. G.
      In other words, when when they take over.

      24:34 - Unidentified Speaker
      They'll leave you alive.

      24:36 - V. W.
      Yeah. Leave all your base to us. Or, we welcome our new overlords, and all that kind of stuff.

      24:49 - M. M.
      Instead of giving a long prompt, actually they suggest, and I think it is also a good idea, going sequentially: you start with a short prompt, but after that you extend with more information, more information. It's another approach, instead of one long prompt.

      25:09 - V. W.
      Well, I think what happens then is that you end up, and sometimes this is beneficial. You're taking more iterations to come to your end product. And there are times if you're carefully tailoring something that that might be the right way to proceed. But I typically want it all. And I want it all now because I want to do something else. And it's like, you know, Woody Allen used to say that he had, he lost interest in the movie he was making because while he was making the movie, he would be thinking of the next movie.

      25:36 - M. M.
      Yes, exactly. Yeah, but then Y. and I sent you a message that we really want to see your statistics. You're probably okay if you know exactly what you want, but sometimes AI...

      25:52 - V. W.
      It's an exploratory thing. Yes, I'm working on that rather feverishly, and actually, as of five minutes before the meeting, it's basically presentable. But I'm going to go one step further: I'm going to put my coverage of it into a reel format so that I can just play the reel. That way I can tell you it'll take 12 minutes, it'll cover these topics, and you can decide when a good time to slot it or schedule it is. Because when we do it more improvisationally, we can end up wasting time, because we haven't made all those decisions in our edit process.

      26:34 - M. M.
      Okay, great. But this is helping a lot of teachers that are teaching statistics and everybody that wants to learn statistics.

      26:43 - V. W.
      It is fantastically interesting.

      26:45 - M. M.
      Linear algebra, yes. Fantastic tools. Fantastic.

      26:51 - V. W.
      There's a book by M. L. called The Undoing Project, where he covered the work of two Israeli psychologists who explored the statistics of whether the sample sizes in psychology experiments were sufficient. Everybody had done these experiments with 40 subjects, and they basically showed, and won a Nobel Prize for it, that those were no better than flipping a coin, because nobody could reproduce the results; the sample sizes were too small. So with The Undoing Project, I got really interested in this whole notion of sampling. And this adventure that I went on with the Skittles, a probability mountain, really covers this, and it's incredibly entertaining, to me at least.

      27:38 - Y.’s iPhone
      Sorry, I was going to react to two points. I started my career in audit, and a lot of audit is based on sampling; a couple of audit firms, A. A. et cetera, gave a sample of 25 on a million transactions as the right sample, because that was what was recommended. So I would love to learn more about that if you have it in your presentation, and then how that carries over into traditional machine learning, or what kind of sampling is used when these generative AI models are built. Yes, and it gives you the ability to think critically, with a new weapon of criticality, in a positive way.

      28:32 - V. W.
      And the thing that really helped me grow was that I discovered, for the first time in my life, and I'm a little embarrassed to admit this, that in sampling, the reduction in error that you get as a result of increasing your sample size grows as the square root of the number of samples. Exactly. Well, if you know about log and if you know about square root, you know that they are fairly unproductive functions, because you have to make the argument much, much bigger to get incremental progress. Square root and log are both terrible functions when it comes to improving things. So to increase our confidence, we don't double the number of samples.

      29:13 - Unidentified Speaker
      We exponentiate the number of samples, in loose terms, just to get the linear improvement that we see in two areas: one is our confidence limit, and the other is our margin of error.
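
The square-root law described above can be sketched with a short simulation (a minimal illustration; the coin-flip setup, trial counts, and function name are my own, not from the discussion):

```python
import random
import statistics

def margin_of_error(n, trials=2000, seed=0):
    """Empirical spread of the sample mean of n fair coin flips."""
    rng = random.Random(seed)
    means = [sum(rng.random() < 0.5 for _ in range(n)) / n
             for _ in range(trials)]
    return statistics.stdev(means)

# Quadrupling the sample size only halves the error: 1/sqrt(n) scaling.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```

To cut the error by a factor of 10 you need roughly 100 times the samples, which is why doubling the sample count buys so little.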

      29:27 - Y.’s iPhone
      Correct, correct. Yeah, that would be great when you present.

      29:31 - V. W.
      And so not only will you have that at the end of this presentation, but you'll be able to go to the tool which is on the web and have your own experiments of what you think are important parameters to vary to see what you get. And so that's also been really fun is this isn't just a presentation that we'll all forget. It's something we we can go back to and check somebody's assumptions or assertions or calculations and see if we agree with them. And I'm kind of excited about that too.

      29:57 - Y.’s iPhone
      Man, what would be beautiful is if we can use it in live environment where that model, which sits on top of the usable models, will say that, hey, now that you have so much sampling done, instead of manual oversight, now you can put it in production. It becomes like a yardstick. This statistical model becomes like a yardstick from error standpoint, from sample size standpoint, so on and so forth.

      30:30 - V. W.
      It would be great. And the whole thing, I think that it's the knowledge that we've gotten from this whole area is that it's not just your margin of error, it's the confidence that you have in your results.

      30:42 - Y.’s iPhone
      Correct.

      30:43 - V. W.
      So if you take one sample with one sample size, you can guarantee what your margin of error will be.

      30:49 - Y.’s iPhone
      So the fact that this whole notion of margin of error and sampling, it's a two-parameter space.

      30:57 - V. W.
      Yeah, if you were to take another sample of the same sample size, you would get a slightly different result. But you still know that your confidence will fall within the limits defined by the number of standard deviations, and so on. It's not just margin of error, plus or minus three points in the Gallup poll, and you'd better sample a thousand households if you're going to get that level of quality. It's also the confidence. Let me finish, this is the conclusion. You also have the confidence interval: if you were to do that same action again, your margin of error would also be contained within that window. So it's not just margin of error; it's confidence that you can carry into knowing that if you do this again and again, you're going to get repeatability.
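
The two-parameter point above, margin of error plus the confidence that repeated samples stay inside it, can be checked directly. A hedged sketch: the Gallup-style numbers (n=1000, roughly 95% confidence, about plus or minus 3 points) come from the conversation; the simulation itself is mine:

```python
import math
import random

def poll_covers_truth(n=1000, p=0.5, z=1.96, seed=None):
    """Run one poll of n households; return True if the 95% confidence
    interval (phat +/- margin of error) contains the true proportion p."""
    rng = random.Random(seed)
    phat = sum(rng.random() < p for _ in range(n)) / n
    moe = z * math.sqrt(phat * (1 - phat) / n)  # about +/- 3 points at n=1000
    return abs(phat - p) <= moe

# Repeat the whole poll many times: roughly 95% of the intervals cover p.
coverage = sum(poll_covers_truth(seed=s) for s in range(1000)) / 1000
print(coverage)
```

The margin of error describes one poll; the ~95% coverage is what you can expect if you "do that same action again."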

      31:45 - Y.’s iPhone
      OK. I think someone else was trying to break in and D., can I ask one last question before you ask?

      31:55 - Unidentified Speaker
      OK.

      31:56 - Unidentified Speaker
      So I have one more question for V. I'll stop D. if it's OK. So V., one more question.

      32:03 - Y.’s iPhone
      When you're presenting that, are you also considering traditional statistical models or statistical calculations are embedded within the code? And also, are you also sharing whether tweaking that statistical formula or changing the base of that formula, how does it impact? And do you know any research on that topic and will that be part of what you're presenting? And I don't know whether I'm asking that question correctly or not.

      32:38 - V. W.
      I think it's a great question. And it came up because I was verifying what I was getting against what a standard outcome would be in the literature. And I had discrepancies in the least significant decimal places that concerned me enough that I wanted to get to the bottom of it. It turned out that there was an improved algorithm I could be using, from A., and that if I would just use it, I would pick up these additional digits, very much at the right-hand end of the decimal expansion. So there are algorithmic choices you can make to give you a fast approximation or a very accurate approximation. In my case it turned out that I had to correct the cumulative distribution function. Some of these integrals have to be tabulated; you can't just have an explicit function that returns the value. So for these integrals there are very good approximations, but you have to take the trouble to load the coefficients and make sure that you're doing the table lookup correctly, yada, yada, yada.
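
The coefficient-loading approximation described above can be illustrated with one classic published polynomial fit for the normal CDF; whether this is the specific improved algorithm referred to in the discussion is my assumption:

```python
import math

def norm_cdf_exact(x):
    """Reference value via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_cdf_poly(x):
    """Five-coefficient polynomial fit of the normal CDF (a classic
    tabulated approximation, good to roughly 1e-7): fast, but only
    accurate out to a fixed decimal place."""
    if x < 0:
        return 1.0 - norm_cdf_poly(-x)
    t = 1.0 / (1.0 + 0.2316419 * x)
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    pdf = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return 1.0 - pdf * poly

# The discrepancy shows up only in the least significant decimal places.
for x in (0.0, 1.0, 1.96, 3.0):
    print(x, norm_cdf_exact(x) - norm_cdf_poly(x))
```

Comparing against a higher-accuracy reference is exactly how the last-decimal-place discrepancies get caught.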

      33:43 - Unidentified Speaker
      Okay. Sorry.

      33:47 - Y.’s iPhone
      Sorry. D., I'll meet with...

      33:49 - Unidentified Speaker
      We don't want to have any meetings overdominated by one or two people. Which is okay, as long as other people have a chance to break in and say something.

      33:59 - D. B.
      So I thought... Okay, sir. I'm trying to say something.

      34:05 - D.
      Earlier, when we were talking about prompting, I was going to say something.

      34:08 - D. B.
      Yeah, what were you going to say?

      34:11 - D.
      This topic is super interesting. I was really impressed with that.

      34:23 - D.
      And I saw the thing that he did. And he responded to my question and gave me a very, very thorough answer. I'm just impressed. I'm so impressed by how much he did with, you know, prompting, and how he was able to put that web app out. I wish he would do tutorials and teach us what's really going on, because I feel like he's a lot of years ahead of me.

      34:54 - V. W.
      Well, the good news is we've got an 80/20 rule at work. We can get a good initial position on a topic that we care about, but really dusting the desk and making sure we've checked all the edge cases, that turns out to be the 20% that's the most labor intensive. Not just a deployment, but a deployment that will withstand some real scrutiny. So that's where most of my time has been eaten up, because within, what, five shots I had a decent system, but I'm now about 15 more shots into cleaning up all the edge cases. Yeah.

      35:36 - E. G.
      So, talking about approximations, one of the things that V. highlighted is that approximations are actually being used in some areas where they previously weren't normal. Now, in SQL, in something like Snowflake or Databricks, you can ask, say, what's the average of a column?

      36:10 - E. G.
      And it can go read every record and get you that number. But now you can ask databases to approximate the average.

      36:22 - E. G.
      It'll get it back in a fraction of the time, but give you something that's pretty close to the number.

      36:32 - Y.’s iPhone
      And so what do you mean by approximation of the average? If there's a column of definite numbers, the average will be a definite number. Why do we need an approximate average?

      36:44 - V. W.
      Well, there's a flaw. What if that column has 10 trillion rows?

      36:51 - Y.’s iPhone
      Oh, so you just want speed, with approximation. That's what you're saying.

      36:56 - E. G.
      So at that point, I don't want the exact number. I'd like an approximate number, because I don't have the time to read all of the rows. Got it.

      37:06 - V. W.
      Got it. So you should sample the rows. And the question is how many rows do you need to sample blah, blah, blah.

      37:13 - E. G.
      And that's, that's the rule. Okay.
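
The approximate-average idea in the exchange above can be sketched in a few lines. Warehouses like Snowflake and Databricks use their own internal algorithms; this only shows the sampling principle, and the function name and numbers are mine:

```python
import random

def approx_avg(values, sample_size, seed=0):
    """Estimate the mean from a random sample of rows instead of
    scanning every row: close to the true number, at a fraction of
    the read cost."""
    sample = random.Random(seed).sample(values, sample_size)
    return sum(sample) / sample_size

rng = random.Random(42)
rows = [rng.uniform(0, 100) for _ in range(100_000)]  # pretend it's 10 trillion

exact = sum(rows) / len(rows)
print(exact, approx_avg(rows, 5_000))  # reads 5% of the rows, lands close
```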

      37:20 - Y.’s iPhone
      We were taught that.

      37:21 - D.
      I mean, it's an algorithm that tells you what your sample size needs to be, based on your confidence. I mean, that was... that was undergraduate material.
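
The undergraduate result being recalled here, turning a desired confidence and margin of error into a required sample size, looks like this (a minimal sketch using the normal approximation for a proportion; the function name is mine):

```python
import math

def required_sample_size(margin, z=1.96, p=0.5):
    """n = (z/E)^2 * p(1-p): samples needed for margin of error E at the
    confidence implied by z (1.96 ~ 95%); p=0.5 is the worst case."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))

print(required_sample_size(0.03))  # about 1,070: the classic Gallup-poll size
print(required_sample_size(0.01))  # a 3x tighter margin needs 9x the samples
```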

      37:32 - V. W.
      Yeah.

      37:35 - E. G.
      Undergraduate, yeah. What's that... you've got the Student distribution.

      37:40 - V. W.
      Student t-test.

      37:42 - Unidentified Speaker
      Yeah.

      37:44 - Unidentified Speaker
      One. Go ahead.

      37:47 - Y.’s iPhone
      So one reaction, D., to what Dr. M. was mentioning earlier on prompting, and I think V. also reacted to it: I want to share my experience. Not every tool likes prompts to be consolidated, or distributed. So for example, especially for these coding tools, front-end and even middleware coding tools, getting a thoughtful, consolidated prompt saves a lot of time re-engineering and fixing things later. So for some, consolidating actually helps significantly. In other cases, for one of our customers, we built a prompt engineering model from a marketing standpoint, and breaking up the prompts helped get better answers. So from my work experience, there is no one right method, whether you consolidate or not. It all depends on your architecture, on the backend models you are using, and then you decide whether consolidating or distributing prompts is the right way to go. The second aspect is, for example, if you're using NVIDIA, for my course and for some of our clients we are using NVIDIA infrastructure, and especially if you're using Llama or something like that, it will give you metadata. So sometimes splitting the prompts can give you a better result, but it can cost you more; it's kind of rerunning the engine, and when you break prompts up, more tokens are consumed and the cost goes up. So there are several considerations, especially when you start commercializing, or you have an ROI target, or you want a specific output; that determination can change. I just wanted to share that, to react to V. and Dr. M. and D.

      40:17 - V. W.
      This gives rise to an issue: you're 10 shots in and your model starts forgetting what you were doing. How do you checkpoint the progress made so far, so that you can remind the model of where you were when it started forgetting? To me there's this iterative checkpointing process, which I've now had to adopt, because typically I'll want to do a 20-prompt task on a model that wants to forget after 10 iterations. So you have to manage that. I say: now that we're forgetting what we were doing, I want to prompt you anew with what we were currently doing, with this discovered prompt. So I've actually built a little tool that packages up all the prompts we've done so far, plus the best version of the code and the progress that we've checkpointed, as a reminder of where to proceed from. And that's working fairly well.

      41:12 - Y.’s iPhone
      That's beautiful. Did you use lang chain for it?

      41:14 - D.
      What did you use to build that tool?

      41:17 - V. W.
      Oh, I just... there's a Unix tool, find: find me all the files that fit the specification, the debris left over from our previous work, and gather those up and concatenate them together with a label, so that they're correctly labeled for the LLM to know what it's looking at. Because if you don't include the file name or the topic label, it'll get confused. So, it's not literally a tarball, but I basically tar up the environment that's been generated and reintroduce the LLM to my tarball, saying: okay, now that senility is setting into the LLM, I'm going to help it along and wake it up, so to speak.
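
The gather-and-label trick described above, concatenating the working files with filename headers so the LLM knows what it is looking at, can be sketched like this (the patterns, header format, and function name are my own invention, not the actual tool):

```python
from pathlib import Path

def package_context(root, patterns=("*.py", "*.md"), out="context_bundle.txt"):
    """Collect every file matching the patterns under root and concatenate
    them, each prefixed with a filename header, so a forgetful model can be
    re-prompted with the full state of the project."""
    chunks = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            chunks.append(f"===== FILE: {path.name} =====\n{path.read_text()}\n")
    Path(out).write_text("".join(chunks))
    return out
```

Pasting the resulting bundle into a fresh prompt is the "wake it up" step described here.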

      42:11 - Unidentified Speaker
      Has anybody had any experience with, uh, GPT-5?

      42:15 - V. W.
      Yeah, I did today, and it was a total disappointment.

      42:18 - D.
      It was killing me. That thing is just... I just want to send them a letter saying, put it back. It's totally not it. Right now, for me, it's Grok and Claude Sonnet with reasoning; those are my two go-tos.

      42:36 - V. W.
      Although I will tell you, I'm also using this dual-prompt strategy where I'm trying to articulate my prompt, and I don't want to put my dumb two sentences in before I do my standard preload file. So I'll be sitting at the terminal and I'll get prompt writer's block. And here's what I'm thinking about.

      42:56 - V. W.
      And so I'll go to Gemini and say, here's what's important to me. And before I know it, I've written a half-decent prompt. And then Gemini will come back and say, oh, I'm going to make this prompt like something you've never seen before, and it gives me a really decent prompt. Then I go back to Claude Sonnet with reasoning, with the singing and dancing prompt that Gemini 2.5 Pro just gave me, looking pretty good with the dapper shine on my shoes. And then it pretty much does exactly what I want. That's one scenario. The other scenario is that in the process of writing the prompt, I'll actually answer my own question. I've had that happen a couple of times in recent memory; in fact, I woke up and was telling my wife something, and I wrote the prompt to express to her the question that I had, and then I answered it, and it was like, oh man, this is like code dreams.

      43:46 - Y.’s iPhone
      I don't know if you guys have had code dreams, but are you doing it manually, or are you using any automation to go back and forth between these models?

      43:58 - V. W.
      I start manually.

      44:03 - V. W.
      Recently, I've been starting in Gemini because I don't want to start in the same LLM that I'm going to use to solve the problem because I don't want that cross-contamination of the training set. I want a fresh set of eyes. When I finally come up with my decent prompt using Gemini, or it could be any other LLM, then when I go to my preferred LLM, like I've got my act together to the point where we're ready to go.

      44:27 - D. B.
      Why don't you just use a different... like in ChatGPT or Claude or whatever, you can start a new chat. So you could copy something from the previous chat.

      44:37 - V. W.
      Oh, I do that. I do that. I do that. Yeah.

      44:40 - D. B.
      My browser is littered with half-finished chats.

      44:44 - V. W.
      Is that just as good as going to another company? Well, there's a workflow thing here. You know, we're optimizing our workflow so that it's repeatable, to the degree you can get a repeatable answer from a non-deterministic machine. But you know, half the problem of teaching people to program isn't teaching them to program; it's teaching them where all the junk is that they're going to need at any given time in their process to do the next step. Just building the motor memory for what the steps are is really the bulk of the burden, because once they can do the workflow, they can adapt that workflow to any specific problem they're trying to solve.

      45:26 - E. G.
      And that's why all good coders have a tool chest. They build routines that they basically carry from place to place.

      45:34 - V. W.
      Right. Right.

      45:37 - Y.’s iPhone
      And improve them a little bit, but that's their tool chest.

      45:44 - V. W.
      A significant portion of our household budget is going into more points for Claude. But I use five bots, principally.

      45:58 - V. W.
      And I'm using poe.com, which gives me all of them; I have a points system. They have a default budget of 10K. Well, if I know going in that I'm working an interesting problem and I don't have time to fart around with, you know, 30 or 40 shots, I'll just go in and up my points at the outset. Typically it doesn't use all the points that I budget, but I give it the option, if it really chugs down, to go ahead and spend the money, and that has proven to be worth it. And I've learned that on the first prompt, you don't tell it to go ahead and generate any code. You can't do it on the very first prompt. You say: we're going to get into this meat-and-potatoes thing in just a prompt, after I've had a chance to increase your budget and make these arrangements to launch. So there's some stuff there.

      46:46 - Y.’s iPhone
      Okay, so Claude you're paying for, but Grok is free.

      46:48 - V. W.
      Is that what I heard? No; poe.com is the clearinghouse, and they all have a default budget of 10,000 points. I'm doing 5 million points a month, which is just about right, and that's a hundred bucks a month. And then I can use any of those models, including text-to-speech, image generation, you know, everything except Midjourney is under the clearinghouse of poe.com.

      47:14 - D.
      Okay.

      47:15 - V. W.
      So, you're not in the API, you're... I did a major project in the API a few weeks ago, which was very rewarding.

      47:26 - D.
      Uh... which API?

      47:31 - D.
      But you're not getting that through Poe, right?

      47:34 - V. W.
      Um, I learned how to do it through Poe, and then I just wrote the code that used the API. So yeah, and I think most of my API use I just do in Google Colab, in Python. Got it.

      47:59 - Y.’s iPhone
      And with that I'm going to keep quiet. I had a lot of questions, but D. would not like me if I speak more, I guess.

      48:08 - D. B.
      Well, no, my point is that in any kind of discussion group, priority should go to the people who are not speaking much. It's so easy for the people who do speak a lot to kind of roll over the people who want to tentatively say something in the middle. And I'm really not going to allow that.

      48:27 - Y.’s iPhone
      So do I get token credits if I don't attend two or three meetings?

      48:33 - Unidentified Speaker
      Is that probable?

      48:36 - D. B.
      No, what I want is that, you know, conversations to be open to everybody and not...

      48:42 - Y.’s iPhone
      Yes, sir, I get it. I'm sorry.

      48:44 - Unidentified Speaker
      Okay.

      48:45 - D. B.
      I've seen the same thing happen with book clubs, you know: the people who speak the most end up preventing other people from speaking, when they should really back off and make a special point of allowing the people who don't speak as much to be heard.

      49:03 - Y.’s iPhone
      Yeah, I agree.

      49:06 - D. B.
      Makes for a better discussion group.

      49:08 - Unidentified Speaker
      Yes, sir. Anyway, I think we're pretty much at the end.

      49:14 - Y.’s iPhone
      Next week we've got E. S., who will tell us about that book. Somewhere I've got it here.

      49:24 - D. B.
      And L. already left. He didn't get a chance to tell us about his book project. But next week, hopefully. And we'll go ahead from there.

      49:37 - D.
      We'll see you all next time.

      49:40 - Y.’s iPhone
      December 11th, 4:30, my students will present some of the products they have built using generative AI, RAG and prompt engineering, and other things that they're learning. So put it on the calendar if anybody's interested in hearing about some of the architectures they have built and the APIs they have used. Where? You are welcome.

      50:11 - Y.’s iPhone
      December TBD right now, EIT building 218. If it changes, I'll let you know.

      50:19 - D. B.
      What day of the week is it?

      50:22 - Y.’s iPhone
      It's Thursday. 4:30?

      50:29 - Unidentified Speaker
      Yes, sir.

      50:32 - D.
      I'm glad everybody showed up.

      50:35 - M. M.
      We want to see this, yes, definitely.

      50:38 - D.
      It's a really good meeting this time.

      50:42 - M. M.
      How many presentations?

      50:45 - Y.’s iPhone
      Three presentations.

      50:47 - M. M.
      Three, perfect, perfect.

      50:49 - Y.’s iPhone
      And there will be some industry people also joining, in person and remotely.

      50:55 - D. B.
      Is there a project that they're going to present? Yes, sir.

      50:58 - M. M.
      The final project. Yes. Perfect.

      51:03 - D. B.
      All right. OK.

      51:05 - M. M.
      So for next time, everybody, please advertise the talk of L., because we really want to have more people coming. So we'll try our best.

      51:15 - Y.’s iPhone
      Advertise what, Dr. M.?

      51:18 - M. M.
      L. from the psychology department is coming. You've probably met her already. So.

      51:27 - Y.’s iPhone
      Okay.

      51:30 - D.
      I'm up not this next Friday, but the Friday after next.

      51:34 - D. B.
      Yeah. Let me make a note of that.

      51:35 - D.
      So, and then he's going to teach us all how to prompt more productively.

      51:45 - V. W.
      I like your guest's suggestion of maybe attending every other meeting to make sure that the factorial channel contention doesn't come up.

      51:57 - D. B.
      No, it doesn't carry from one to the next.

      52:02 - D. B.
      You can't skip meetings and then take over the fifth meeting.

      52:11 - M. M.
      I'd really appreciate it if you advertise the next talk and the event. We want to see more people coming.

      52:22 - Unidentified Speaker
      Yeah.

      52:24 - V. W.
      I think that's another vote sort of for the video format that if we compress all our commentary into a video, we have a reusable artifact that isn't going to go out of date too quickly.

      52:39 - D.
      Sure. What kind of videos do you want?

      52:42 - Unidentified Speaker
      I'll just choose a topic. I'm like, for example, I'm working on the statistics thing.

      52:45 - V. W.
      I'm going to make a video of it. I'll play the video.

      52:48 - Unidentified Speaker
      That's that. We don't have to worry about it taking up too much time.

      52:54 - M. M.
      This is correct. Okay.

      52:56 - Unidentified Speaker
      We can use this. I don't mean to sort of tell people they shouldn't speak.

      53:01 - D. B.
      I'm just saying, when somebody else who has not been contributing wants to say something, step back and let them.

      53:10 - D.
      Yeah. It sounds like you're taking up for the little guy.

      53:15 - D. B.
      What sounds like what?

      53:17 - D.
      You're taking up for the little guy.

      53:19 - D. B.
      That's right. Otherwise you don't really have much of a discussion group; it starts to fall apart. It's not as fun for the people who are in charge, and it's not fun for everybody else.

      53:32 - E. G.
      My mother never finished high school, but she had these little colloquialisms: you learn a lot more when your mouth is shut.

      53:44 - D.
      Yeah, I know I'm just soaking it up, trying to get all these ideas.

      53:47 - M. M.
      Yes, it's true. OK, thank you so much.

      53:54 - D.
      All right, guys. Thanks. We'll see you next time. Thanks for coming, guys. Thanks for sharing.

      54:00 - Unidentified Speaker
      Thank you. It was good to see you.

