Machine Learning Study Group
- Announcements, updates, questions, presentations, etc.
- Grad student applies, is rejected due to AI-generated statement of purpose (AA).
- TE gave an informal walkthrough of his research prospectus.
- Hackathon: Join us to innovate in LLM Agents technology across 5 dynamic tracks, welcoming both innovative agent applications and research projects on new technologies:
- Applications: Build cutting-edge LLM agents
- Benchmarks: Create innovative AI agent evaluation benchmarks
- Fundamentals: Strengthen core agent capabilities
- Safety: Address critical safety challenges in AI
- Decentralized & Multi-Agents: Push the boundaries of multi-agent systems
Incentives and Prizes:
- Prize Pool: Over $200,000 in prizes & resources, including $100,000 in credits for top AI platforms and an additional $100,000 in credits, cash, and merchandise.
- Special Raffle: Participate in the hackathon for a chance to win travel grants to the LLM Agents Summit at Berkeley in Summer 2025, including up to $1,000 in travel stipends and priority summit registration.
- Exposure Opportunity: Winners will have the chance to showcase their projects at the LLM Agents Summit.
Distinguished Panel of Judges
This hackathon represents a remarkable opportunity to push the limits of what is possible with LLM Agents technology, connect with a vibrant community, and potentially win big.
How to Join
To join the hackathon, please visit our hackathon registration page. You will find the submission requirements on the website as well. Don’t miss your chance to make an impact in the AI community!
We eagerly await your innovative contributions and are excited to see how you will drive the future of LLM Agents technology.
Best,
LLM Agents MOOC Hackathon

Join the LLM Agents MOOC Hackathon: $200K+ Prizes/Resources Sponsored by OpenAI, Google AI, AMD, Intel and More
Dear developers and researchers,
We are thrilled to extend an invitation to the LLM Agents MOOC Hackathon, hosted by Berkeley RDI in collaboration with our LLM Agents MOOC. This exciting event, sponsored by leading organizations including OpenAI, Google AI, AMD, Intel, and others, is designed to challenge, inspire, and showcase the capabilities of developers and researchers in the exciting field of LLM Agents. We are excited to share that over 2,200 developers have signed up for the Hackathon. The hackathon is currently underway and will accept final project submissions until December 17, in conjunction with our LLM Agents MOOC. The MOOC has ~14K registered learners & ~7K Discord members, with 35K+ views for the 1st lecture to date!
- NM has a problem: how to convert a master's thesis, written with liberal help from AI, into a shorter publishable paper, using AI to help but without being flagged by AI detection tools like gptzero.com.
- We're looking for master's student(s) to work on AI with Hexanika. Stay tuned.
- JK submitted a video to AAAI 2024: video (https://drive.google.com/file/d/1KJNQQU7IfSywljADxkGHMZuDkchbIc38/view?usp=sharing); call for videos (https://aaai.org/about-aaai/aaai-awards/aaai-educational-ai-videos). See also complex prompts, etc. (https://drive.google.com/drive/u/0/folders/1uuG4P7puw8w2Cm_S5opis2t0_NF6gBCZ).
- Here is a tool the library is providing. Some people here thought it would be a good idea to try it live during a meeting, so we can do that.
Library trial of AI-driven product Primo Research Assistant
The library is testing Primo Research Assistant, a generative AI-powered feature of Primo, the library's search tool. Primo Research Assistant takes natural-language queries and chooses academic resources from the library search to produce a brief answer summary and a list of relevant resources. This video provides further detail about how the Assistant works.
You can access Primo Research Assistant directly here, or, if you click "Search" below the search box on the library home page, you will see blue buttons for Research Assistant on the top navigation bar and far right of the Primo page that opens. You will be prompted to log in using your UALR credentials in order to use the Research Assistant.
- DB plans to try to find a masters student to do the project below starting next semester. An important qualification for the student is to be able to attend these meetings weekly to update us on progress and get suggestions from all of us!
- Project description: Suppose a generative AI like ChatGPT or Claude.ai were used to write a book about a simply stated task, like "how to scramble an egg," "how to plant and care for a persimmon tree," "how to check and change the oil in your car," or any other question like that. Just ask the AI to provide a step-by-step guide, then ask it to expand on each step with substeps, then ask it to expand on each substep, continuing until you reach 100,000 words or whatever impressive target you have in mind.
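The expand-until-N-words loop in the project description can be sketched as a simple driver around any chat-model API. This is a minimal sketch, not a definitive implementation: `fake_llm` is a hypothetical stand-in so the control flow can be demonstrated; a real version would call ChatGPT or Claude instead.

```python
# Sketch of the "expand until the word target is reached" book-writing idea.
# `fake_llm` is a hypothetical stand-in for a real chat-model API call;
# swap in an actual client (ChatGPT, Claude, etc.) to use it for real.

def fake_llm(prompt: str) -> str:
    # Pretend the model returns three substeps for whatever it is asked.
    return " ".join(f"Substep {i}: do the next part of the task." for i in range(1, 4))

def expand_to_target(task: str, llm, target_words: int = 100_000) -> str:
    # Start with a step-by-step guide, then repeatedly ask for expansions
    # of the most recent material until the word-count target is reached.
    text = llm(f"Give a step-by-step guide to: {task}")
    while len(text.split()) < target_words:
        text += "\n" + llm(f"Expand each step above with substeps:\n{text[-500:]}")
    return text
```

With `fake_llm`, each call adds a fixed chunk of text, so the loop terminates quickly; with a real model, each pass would add genuine substeps, and one would also want to carry the full conversation history rather than just the tail of the text.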
- Anyone else have a project they would like to help supervise, let me know!
- The campus has assigned a group to participate in the AAC&U AI Institute's activity "AI Pedagogy in the Curriculum." IU is on it and may be able to provide occasional updates as they become available, though not every week.
- Anything else anyone would like to bring up?
- Here is the latest on readings and viewings:
- Next we will continue to work through chapter 5: https://www.youtube.com/watch?v=wjZofJX0v4M. We had previously gotten up to 15:50, but that was a while ago, so we started from the beginning and went to 15:50 again. Next time we do this video, we will continue from there. (When sharing the screen, we need to click the option to optimize for sharing a video.)
- We can work through chapter 6: https://www.youtube.com/watch?v=eMlx5fFNoYc.
- We can work through chapter 7: https://www.youtube.com/watch?v=9-Jl0dxWQs8
- Computer scientists win Nobel prize in physics! https://www.nobelprize.org/uploads/2024/10/popular-physicsprize2024-2.pdf received an evaluation of 5.0 for a detailed reading.
- We can evaluate https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10718663 for reading & discussion.
- Chapter 6 recommends material by Andrej Karpathy, https://www.youtube.com/@AndrejKarpathy/videos for learning more.
- Chapter 6 recommends material by Chris Olah, https://www.youtube.com/results?search_query=chris+olah
- Chapter 6 recommended https://www.youtube.com/c/VCubingX for relevant material, in particular https://www.youtube.com/watch?v=1il-s4mgNdI
- Chapter 6 recommended Art of the Problem, in particular https://www.youtube.com/watch?v=OFS90-FX6pg
- LLMs and the singularity: https://philpapers.org/go.pl?id=ISHLLM&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FISHLLM.pdf (summarized at: https://poe.com/s/WuYyhuciNwlFuSR0SVEt).
6/7/24: vote was 4 3/7. We read the abstract. We could start it any time. We could even spend some time on this and some time on something else in the same meeting.
Transcript:
ML discussion groupFri, Dec 6, 2024
1:36 - Unidentified Speaker
All right.
1:37 - D. B.
I'm going to share my screen.
2:19 - D. B.
Okay, so although classes are over and we're at the end of the semester, there's actually a lot of stuff that ended up on the list of things to potentially get to today. We're not going to get to them all, so if we don't get to something today, we'll get to it next time. Here's one. Someone sent me this note from a grad school. A grad student applied to grad school, and AI generated their statement of purpose. The grad school ran a generative AI check on the statement of purpose and rejected the applicant. Let's see if I can... let's go to this. Here's the letter.
3:24 - Unidentified Speaker
So the fourth paragraph there.
3:36 - A. B.
Yeah. Your personal statement was AI generated.
3:39 - D. B.
Therefore, we aren't going to accept you.
3:42 - A. B.
Well, someone a few meetings ago, I think, can't remember what department they were with, but we asked that question about AI checkers and whatnot. And I think the consensus was that they're just not accurate enough to use to the degree you would want to catch plagiarism or do this sort of stuff. So it's interesting that people in admissions are taking a more aggressive stance, I guess.
4:12 - D. B.
Yeah, you know, I don't know that it's that these systems don't work. I think it's that there's a proof problem. You know, they may work, but you can't use them as proof, you know?
4:25 - A. B.
Right. Well, it seems like they are here at least.
4:29 - D. B.
That's interesting. Yeah. I use Grammarly.
4:32 - E. G.
all the time because I'm trying to get a large amount of information out. And a lot of times I'm used to presenting it very succinctly, but I'd like other people to absorb it. So I ask it to go in, correct the sentence structure and things like that. Does that, would that preclude me? You know, I don't know.
4:58 - D. B.
I mean, you're writing it yourself and then, or maybe you're rough-drafting it, and then you're asking it to kind of polish it. You know, you could try one of these checkers. One of them is called GPTZero. If you just do a web search on GPTZero, it's free, right? You can paste stuff in and see if it flags what Grammarly produced.
5:22 - A. B.
I'd be curious to know, I mean, if you try it.
5:25 - D. B.
Because, you know, I think it's just perfectly natural to write something yourself and then, you know, even a rough draft, even some rough, you know, some notes or something, and then ask an AI to kind of put it into a nicer expository, you know, format, which you can then edit. Is that bad? You know?
5:47 - T. E.
So that's an interesting point. And the course that we just finished up with Dr. P. for I don't remember the name of it, but it's basically we went through the process of writing a research proposal for our dissertation. And her stance on it was she didn't mind us using these tools just to make note of it in the references that we did.
6:16 - D. B.
Yeah. So what course was this?
6:19 - T. E.
It's the course that I actually used to do the proposal. That you asked me about. It was information quality. Let me see if I can find it real quick. Information quality theory? Possibly. Okay. Go ahead.
6:39 - E. G.
I just took an email that I wrote and ran it through it, and it says 97% AI generated. And I wrote 90% of this email.
6:56 - A. B.
But you wrote all of it, right? I wrote 90% of the email.
7:02 - E. G.
He wrote 3% of it.
7:05 - Multiple Speakers
Yeah.
7:05 - T. E.
According to this, I only wrote 3% of it.
7:10 - E. G.
I'm just teasing.
7:11 - T. E.
Yeah, I don't know.
7:13 - D. B.
I kind of noticed in using ChatGPT and GPTZero that I didn't trust the numbers it came up with. If I had a chunk of text where I wrote most of it, but a couple of sentences were in fact fully AI generated, it would flag the whole thing as AI generated.
7:36 - A. B.
So if you took an output from ChatGPT, and then let's say you took that output and asked Claude or another one to randomly change some of the words here or there for synonyms, would that then fly under the radar of these tools?
7:55 - E. G.
Actually, V. showed me a tool that is able to go in and rewrite stuff so it's not picked up by AI detectors. V., do you remember that tool?
8:11 - T. E.
No, I don't.
8:13 - V. W.
And I'm curious about that because, last week, I did a presentation where I pitted the AIs against each other. I thought the results were dubious. I presented them, and now I'm going back over them with a fine-tooth comb. It's been a real difficult process to establish what the truth is. I asked four different AIs how much they cost to develop, how many training tokens were used, and what kind of user input context length is allowed. And they were all over the place. So I actually did a mathematical investigation of just how all over the place they were. And when I got really rigorous with it, it's like the AIs picked up on that and they snapped to. But yeah, the one that you mentioned was StealthGPT.
9:09 - Multiple Speakers
But I had found another one, undetectable.ai.
9:12 - V. W.
I don't have enough experience with those to really say, except for the fact that I find it interesting that you'd write 90% of an email and it would say 97%. It makes me not trust it at all. Because the figures that I was asking it to generate last week were not accurate, and now I'm doing a fine-tooth-comb analysis of just how inaccurate. For example, Claude Artifacts was claiming that it could program in all these languages when in fact it's specialized for HTML, CSS, and JavaScript. So that's not OK. And I've got to be able to quantify just how true or untrue a statement is. So I'm off on the thing of, hey, we're using AI so frequently in our work. And also, Grammarly will try to rewrite content that I have written, and it will use things that are out of voice for me. And so I'll have to reject them, saying, you know, I appreciate you changing that word, but, for example, I frequently use the phrase "with respect to" as a comparative statement for comparing and contrasting two ideas. And Grammarly tries to remove the "with respect to" and use some more flowery language that's not what I meant. And so I have to reject the suggestion. So it's like I've gone from a period of wholehearted embrace of anything it could do for me, to now starting to get pissed off that it's trying to take my true intention and voice away, and I'm trying to claw back my identity.
10:57 - E. G.
I'm thinking of it from a different perspective. We work with AI day in, day out. What if its generated language has now become infused in ours? Is it changing our vernacular?
11:13 - V. W.
Is it changing the way?
11:15 - E. G.
Is it changing us?
11:17 - V. W.
Well, if it changes the way we communicate, it is changing us.
11:23 - Unidentified Speaker
Therefore.
11:24 - E. G.
It is taking over. Oh, yeah. Being assimilated.
11:28 - V. W.
Talk about assimilated. There was an announcement two days ago that OpenAI has agreed to let its products be used as war tools for the Pentagon. And I was really disheartened by that, because we have all these 439 scary moments in this seminar, of which war has come up as one. And they were talking about drone swarms for the Pentagon and things like that. And I'm thinking, we've already got mass death in the Middle East right now. We don't need any more of it that's AI facilitated. So, yeah, it's terrible. I was really concerned about that. And that made me double down on my factual checks. For example, if I'm going to hire an AI employee that happens to be a programmer, I'm going to want it to give me some references. And so one of the things I'm going to do is ask its peers, and its peers are other LLMs that might possibly be employed instead of the one that's the candidate. And so I'm applying the same would-I-trust-this line of reasoning I'd apply to somebody who just came in off the street, who may have been programmed by any nation state with any set of motives: would I trust this thing to do that job for me? So it's making an AI answer questions about who built you, how much did you cost, how smart are you, and how smart are you going to let me be, because of your input token length. You can be the smartest AI ever, but if you limit my input to you, then I cannot take full advantage of you. So I'm treating AIs now as potential employees. And now they have to give me their references, show me their background, show me how they were trained, show me what they were trained on. And this all came out, and I'll wrap it up after this: six out of ten of the references that I had ChatGPT o1-mini generate for me were specious. And it wasn't till I went and visited every one of them, counted them up, and reviewed their content that I realized: you're just spinning a big yarn here.
13:41 - E. G.
Well, I got a question, and this can be a topic. V. brings up a very valid point: you're going to be using AI as an employee. But are you paying them?
13:56 - A. B.
Oh yeah. Oh yeah. I'm paying them.
13:58 - V. W.
And in ChatGPT, OpenAI wants me to pay them $200 a month, which is a little above my ceiling for monthly fees. You know, I've got all these $20 subscriptions that are killing me. And so to go to ten times that, I don't know.
14:14 - E. G.
'Cause I'm only paying ChatGPT 20 bucks a month.
14:18 - V. W.
Yeah, but OpenAI just released a $200-a-month version for professionals and researchers and academics. Well, that's pretty much everybody sitting here.
14:26 - Multiple Speakers
Well, no academic is gonna pay $200 a month unless they're getting a million dollar grant or something like that. Or maybe for the whole lab.
14:35 - V. W.
It seems more than the cable TV bill used to be, so.
14:39 - D. B.
All right, well anyway, that's pretty interesting. Here's the second item on our agenda.
14:44 - Multiple Speakers
So, T. E. is a student in our department and he'd like to give us an informal walkthrough of a possible research project. I told him it was very informal and that we were friendly, so please be informal and friendly. Welcome to the group, T.
15:03 - D. B.
And I'll turn it over to you.
15:06 - T. E.
Well, the opening topic is a great segue into this research. A few weeks ago, there was some discussion very similar to this, and Dr. B. kicked around an idea of using ChatGPT and the Socratic method in teaching and education. So that's kind of what, you know, I talked to him after that. He's actually my research advisor as well for my dissertation. And so I talked to him after that meeting, and he was kind enough to say, I told him I was interested in that topic, you know, if I wasn't stepping on anybody that was already doing the work. And he said, no, take it and develop it and see what you can come up with. So, like I said, for this information theory class, we had to write a proposal for a dissertation as our final assignment, and so I went with that. I don't know how you guys want to do this, if you want to just walk through it, or I can give you the high points real quick. I don't want to take up too much time. But basically, I think Dr. B. sent it to everyone.
16:34 - D. B.
I don't think I did. Okay, but you can share your screen if you like and kind of use it, or just go through a list of...
16:48 - T. E.
Yeah, so basically, this research, I'll just read you the introduction. This research proposes an innovative approach to education by combining the Socratic method with artificial intelligence to create a dynamic learning tool. AI-driven Socratic questioning aims to enhance comprehension and engagement with students across various educational settings, you know, whether it be... Do you all know what Socratic questioning is?
17:23 - D. B.
Yes, questions that lead into critical thinking, where the teacher at that point really doesn't present or lecture.
17:32 - E. G.
What they do is ask questions and let the organic nature of the questions generate more questions and conclusions by the students.
17:43 - T. E.
Yeah, that's correct. And apparently it's very popular in law schools as a teaching method, and it's based on Socrates, the philosopher, which is kind of where it all started. And so what we're looking at doing is use the... and I think Dr. B. kind of showed us what he was doing, he was kind of playing around with the idea and he showed us the transcript of that day on the call. But some of the things we're going to look at, the questions I have right now, are: To what extent can an AI model replicate the Socratic method and emulate human-led questioning in educational contexts? How effective is AI-driven Socratic questioning in enhancing critical thinking and comprehension skills compared to traditional teaching methods? And thirdly, what are the subjective experiences of students interacting with an AI tutor that uses a Socratic method? Now, having not gone through this process before, but from what I understand, those questions may change a little bit depending on where the research leads us down the road.
19:07 - A. B.
So, in that though, and I'm not an expert here, but it seems like you'd have to do a true IRB process because you'd have to have human subjects, I think. Yep, I'm sorry, I didn't mean to cut you off.
19:22 - T. E.
No, you're good.
19:23 - A. B.
Yeah, you're right about that and I was talking to Dr.
19:27 - T. E.
P. about that, and she sent me a link. I haven't read it yet. But because it's an education setting, if I'm not mistaken (somebody will have to help me out here), maybe it's not quite as rigorous. I'll have to go and check out the link that she sent me. But that conversation... because when I started my dissertation, I did take the ethics course and all that.
19:59 - A. B.
Yeah, the CBT stuff.
20:00 - T. E.
And they talked about the IRB, and I was kind of hesitant, you know. My initial research ideas were trying to steer away from having to go through all of that. And Dr. B. kind of said, you know, it is a process, but it's not as bad; it's something that we felt like we could get through.
20:23 - D. B.
So I could use, like, if you came up with a method for doing this, you know, using an AI to be the tutor, using Socratic questioning, I could use it in my classes. And I'm willing to do that without IRB approval. Now, if we take data from the classes and try to analyze how well it worked, then I don't know.
20:47 - T. E.
We might need IRB approval. Okay. And that's probably what the link she sent me is about. But based on the research I've already done, I feel like the more complicated part is going to be coming up with a way to analyze it. You know, we'll have to come up with some kind of assessment for the students that go through the Socratic method and the ones that don't, and compare the results, because we want to know, did it work? And this may be cross-disciplinary, where we may have to tap someone from education or psychology to help come up with the assessments.
21:44 - Multiple Speakers
If you've taken anything from Professor S., this seems right in his wheelhouse. I don't know if he's even still teaching, but he's one of the best intellects I think we have in our education department.
21:57 - V.W.
And that's one item. The other item is, J. K. is an excellent prompt writer, and I would really be interested in the multi-agent aspect of the Socratic method, because in that method, you're kind of putting the AI in the driver's seat of composing the questions that it thinks are going to lead to the most enriching exchange between the student and the machine. And since you're doing that, it reeks of multi-agent: as soon as you start saying the word assessment, you're asking the AI to change hats from "here's what I did with the student" to "here's how I want to analyze what I did." Well, that's at least a two-hat multi-agent process. And J. K. has a conspicuous talent for prompt writing from the multi-agent point of view because of his background in character development for writing science fiction and so forth. So he's really good at it. And, you know, we all want to completely own our work, but having the right consultants can really take us up to the next level of achievement. And so I would really like to see you talk with J. K. and Professor S. How do you spell S.?
23:12 - D. B.
S-U-T-T-E-R.
23:13 - Multiple Speakers
And J. K was the last name?
23:16 - V. W.
Yeah, J. K. He's usually here, and I'm really sad he's not here today, because he would have been champing at the bit to have an exchange with you.
23:28 - T. E.
Okay.
23:29 - D. B.
You know, another possibility is Dr. P. She's actually typically pretty interested in these kinds of, you know, teaching oriented survey type projects. So she actually supervises a number of PhD students who do surveys, not of students, but of people in industry and things like that.
23:49 - T. E.
So she'd be another good person. That's good, because it was her class, and I turned this in for my project, my final project. So if I get a good grade on it, maybe that'll be a promising thing.
24:09 - Y. iPhone
I'm not an expert in this subject matter, but if you need any help from an engineering standpoint, I can ask some of my people to help you out. So if you need any help on the engineering side, maybe around 100 to 200 hours of time, I'm ready to offer that help to you.
24:32 - T. E.
Oh, thank you so much. Okay.
24:38 - D. B.
All right.
24:40 - T. E.
So, that's kind of the high-level overview. For the literature review, I ended up breaking it down by... where'd it go?
25:05 - T. E.
We ended up having 20 reviews, and for some reason I can't find it on here. There it is. Yeah, 20 different sources that we looked at. There's some research that's been done on effectiveness in educational settings, on personalized learning and adaptability, and on AI-driven critical thinking support, which I reviewed. Also development of AI systems for Socratic dialogue. And that brought up an interesting point: apparently, there's some software out there, and one of them is called AutoTutor, which I didn't really dig too much into. But there was a study done by G. back in 1999 that talked about this AutoTutor. And I don't really know that there would be any need to develop something outside of ChatGPT. I'm thinking ChatGPT, in its form today, would be able to do what we're needing to do for the research.
26:24 - V. W.
I would say you would want more than one LLM, so that you could compare their responses and get some contrast. And for those, Claude Sonnet would be good, because it's really a good LLM. And also, E. introduced us to the Facebook LLM, which has now been supercharged: Llama is now up to Llama 3, a 405-billion-parameter model. So, you know, if you go through something like Poe.com, you can write a prompt and then shovel your prompt from LLM to LLM and get really fast turnaround on a compare and contrast of what they're saying, to find out if there is, among the LLMs, an expert consensus or, more importantly, a contradiction.
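The compare-and-contrast workflow described here, sending one prompt to several LLMs and looking for consensus or contradiction, can be sketched as below. This is an illustrative sketch only: the model callables are hypothetical stand-ins (real wrappers would call the respective APIs), and the agreement measure is a simple word-overlap (Jaccard) score chosen purely for demonstration.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    # Word-level overlap between two answers, in [0, 1].
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def compare_models(prompt: str, models: dict) -> dict:
    # `models` maps a name ("claude-sonnet", "llama-3-405b", ...) to any
    # callable that takes a prompt and returns text. Here these are assumed
    # stand-ins; real ones would hit each provider's API.
    answers = {name: fn(prompt) for name, fn in models.items()}
    scores = {
        (m1, m2): jaccard(answers[m1], answers[m2])
        for m1, m2 in combinations(answers, 2)
    }
    return {"answers": answers, "pairwise_agreement": scores}
```

Low pairwise scores flag the contradictions worth a closer, human look; a real pipeline would use a semantic similarity measure rather than raw word overlap.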
27:18 - T. E.
Yep. So you mentioned ChatGPT, and what was the other one?
27:23 - V. W.
Uh, Claude Sonnet. Well, the most important thing I probably mentioned was the fact that Poe.com lets you have access to multiple bots of very high quality, including the Gemini advanced engine. And that's Poe.com. It's 20 bucks a month.
27:41 - T. E.
It's the same price as a ChatGPT subscription before last week.
27:46 - V. W.
And so you can just go from LLM to LLM very quickly, and then there's the Llama 3 405-billion-parameter model. And then of course, ChatGPT has now gone from o1-preview to o1. And to get at o1, you have to use the ChatGPT site, because Poe doesn't give you direct access to that model, but it does give you access to the GPT-4o level of models. But who wants to use yesterday's model? So, yeah. Okay. Good, good information.
28:17 - T. E.
I made note of that so I can look into those. So that's, again, the high level. Some of the other things that I think are important would be getting feedback from the students to see how they felt about the process of using artificial intelligence: whether they felt like it was too slow in responding, or whether it felt like a natural conversation, or, you know, I don't know. Again, I would probably have to speak with someone from the psychology or education department on how to come up with those assessments and evaluations.
29:10 - V. W.
You could ask an AI.
29:12 - T. E.
And we're back to E.'s segue.
29:15 - E. G.
It's called think time, because your think time at that point is your gap in digesting the information and formulating a response. Now, just like AI, it's going to be based on the back-end computing power. Somebody with a much more powerful brain will be able to come up with a more detailed, deeper answer and understanding than one who is, say, running on a Visa card. Not Visa, V-E-S-A, which is an old, old video card.
30:08 - T. E.
Oh, okay, okay. V-E-S-A, okay, yeah, yeah.
30:11 - E. G.
But what I had done previously is I had actually taken the output from one model, run it through the same model again, and asked it to identify whether or not it was hallucinating. This is actually some of the stuff D. and I had postulated. But it was able to identify and say: I know this information, I surmise this information, I extrapolated this information, and I asked for this information.
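The self-review pass E. G. describes, feeding a model's answer back to the same model and asking it to label each claim, could be sketched like this. The KNOWN/SURMISED/EXTRAPOLATED labels follow the categories mentioned above; the `llm` callable and the exact prompt wording are hypothetical stand-ins, not a real API.

```python
LABELS = ("KNOWN", "SURMISED", "EXTRAPOLATED")

def self_check(llm, answer: str) -> dict:
    # Ask the (same) model to audit its own answer, one claim per line,
    # each prefixed with one of the labels above, then parse the result.
    prompt = (
        "For each claim in the text below, output one line of the form "
        "'LABEL: claim', where LABEL is KNOWN, SURMISED, or EXTRAPOLATED.\n\n"
        + answer
    )
    labelled = {}
    for line in llm(prompt).splitlines():
        label, _, claim = line.partition(":")
        if label.strip().upper() in LABELS and claim.strip():
            labelled.setdefault(label.strip().upper(), []).append(claim.strip())
    return labelled
```

As V. W. notes next, the audit comes from the same possibly-unreliable model, so the labels are a skepticism aid, not ground truth.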
30:44 - V. W.
It's sort of like asking a crazy person if they were making something up when they talked to you in the last conversation. And I think one has to have a lot of skepticism. I really liked that idea of going to psychology. Your topic is so timely, it's on trend. It's got so much potential for harnessing the educational connection to the use of LLMs by students who want to improve their own understanding of things. So it sounds super interesting to me. I would really be curious to follow this work.
31:22 - T. E.
And I think hopefully it could lead into answering some of the questions that we had at the beginning of this call, like the administration that rejected this student's application because they detected AI usage.
31:40 - V. W.
It's like asking the crazy person, you know, did they forge this? Yeah, they forge everything.
31:48 - T. E.
Get out of here. So maybe, you know, rather than just rejecting anything, maybe we should, and I think this was an idea that we kicked around last time I was on this call, come up with ways to embrace AI and use it in teaching and learning, as opposed to just saying, hey, you used AI, so we're going to kick you out and fail you. And that's not where things are going.
32:18 - Multiple Speakers
Yeah. Yeah.
32:18 - T. E.
And I believe there was a conference that someone on the last call had attended, and this was the topic of that conference as well. I think it may have been someone from UCA or possibly... I don't remember what her name was. But that's kind of where this research is going, and we'll see what happens. This is Y., and I agree. I concur with that.
32:47 - Y. iPhone
And one of the ways I like to give back to society is in the form of education. That's why I would like to help or contribute in any ways if you need help. And what I can help primarily is on the engineering side, on the back end side, because the front end topic is not my expertise. So happy to help you out in any ways possible.
33:18 - T. E.
Yeah, I appreciate that.
33:19 - D. B.
That's great. Well, with that, I'm going to come back to this hackathon announcement, but let's segue into... where are we? Oh, okay, yeah. So I had a chat with Y. earlier today, and we're looking for master's students to work on AI with his company, Hexanika, so I just mentioned that. Y., I don't know if you want to just say a few words about it; at any rate, there'll be something to say certainly next semester. Sure, so I can give brief background.
33:58 - Y. iPhone
The main topics that we mentioned, and which is actually a follow-up of what we spoke on the previous couple of calls, is around use of generative AI in code development. So if you all remember, Dr. W. had presented a chart of various GPT models versus coding standards. And I had also mentioned that we'll present something, but it appears that most likely it will be in the first or second week of January when we meet. I'm sorry, I'll try to do before the year end, but my team feels more comfortable presenting it later. But anyways, There are two areas that we identified. One is how to use generative AI right from conceptualization, which Dr. W. had as text to code, then code to code, meaning unit testing, integration testing, right up to user testing and production. So in the development world, we call CICD pipeline, my team has already tested few things which we will present we have created standards documentation which we are signing off literally later this week or over the weekend and starting on implementing that next week and and the students that we are talking about with essentially be trained on those guidelines documentation and etc and they will start using generative and guide that is the guide in version one, but there will be two sets of team, D. and I mentioned. One is people who can contribute to engineering and a team of students who may not be core engineers, but they can follow the standards and see whether whatever we have built is working properly or not. So that is one set of things. And the second set of work would be using generative AI for website or content, D. had a plan of look, but around that topic, like how we can use generative AI to automate that function. So I've sent maybe a one pager to D. earlier, but if you all want to know how it is going to work, when we present what we have done, we can present also what we'll do with students then is
36:43 - D. B.
that enough, or do you want me to say anything more, or do you want to add or remove anything from what I said? Oh, that's good, unless anyone has any questions. Does anyone have any questions about this? So yeah, I'm going to be looking for master's students who have projects to do and who will want to work on this. Okay. I had a very productive exchange on how we do double-blind testing. We're really wrestling with this problem of how
37:13 - V. W.
do we evaluate a piece of code that an AI has generated when the person who was prompting the AI had their own prejudices or biases about what the code should do or look like, or what it would be good at. So we talked about the possibility of creating two experiments: we go off and don't inform each other, except for some ground rules about what we might like to see, and then we come back and ask how well the various LLMs did at meeting our program specification. And this goes right into the work of D. D., which we want to embrace however it fits in. So this idea of coding to a specification, and then judging how accurately the specification was met, came up when I did the side-by-side comparison of the performance of all the LLMs on all the different languages, which in fact turned out to be not complete fiction, but enough of a fiction that it needs to be studied in a little more rigorous detail.
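The spec-based, blind evaluation V. W. describes could be sketched as follows. Everything here is hypothetical (the toy spec, the candidate solutions, the anonymous labels); the idea is only that candidates are scored against a shared specification under opaque labels, so the grader's biases about which model wrote what can't leak in.

```python
# Minimal sketch of a double-blind, specification-based evaluation of
# LLM-generated code. All names and candidates are illustrative stand-ins.
import random

# A "specification" here is just input/expected-output test cases.
SPEC = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

# Hypothetical candidate implementations, e.g. one per LLM.
def candidate_a(x, y):
    return x + y          # meets the spec

def candidate_b(x, y):
    return x + y + 1      # subtly wrong

def score(fn, spec):
    """Fraction of spec cases the candidate satisfies."""
    passed = sum(1 for args, want in spec if fn(*args) == want)
    return passed / len(spec)

def blind_evaluation(candidates, spec, seed=0):
    """Shuffle candidates under anonymous labels, then score each."""
    rng = random.Random(seed)
    items = list(candidates.items())
    rng.shuffle(items)
    return {f"model-{i}": (name, score(fn, spec))
            for i, (name, fn) in enumerate(items)}

results = blind_evaluation({"llm_a": candidate_a, "llm_b": candidate_b}, SPEC)
```

In a real study, the spec would be the agreed-upon ground rules, and the anonymous labels would only be unblinded after scoring.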
38:28 - D. B.
Okay. Yeah.
38:28 - Y. iPhone
And to that point, the presentation would be around our assessment of a few of the models that Dr. W. had on that chart, and only one vertical for now, which is front-end. I think there was a section on HTML, CSS, and JavaScript. Having said that, when we present, we'll also mention why we chose Next.js versus plain JavaScript; there could be some amendments. But we will present our assessment on one vertical and three models, and later on we are also going to do Python. But that's what we'll present.
39:17 - Unidentified Speaker
I'll try in maybe two weeks. But if we're not going to meet over Christmas break, then maybe it will be in January.
39:27 - D. B.
I haven't really thought about when not to meet, or when to meet. We'll certainly meet next week. And, you know, if Christmas is on a Friday or something, I probably wouldn't meet. But other than that, as long as there's participation and people want to meet, we can keep meeting.
39:47 - Multiple Speakers
This is what we do. Yeah. I mean, this is not really a UALR-sanctioned activity. It's just us, right? Okay.
39:55 - Y. iPhone
So I just wanted to mention, I got notice of this hackathon invitation.
40:00 - D. B.
Yeah. So if you're interested in hackathons, you might want to check out today's minutes and read about this hackathon invitation: build cutting-edge LLM agents, $200,000 in prizes, a distinguished panel of judges from UC Berkeley, Google DeepMind, OpenAI, and so on. I sent this to J.
40:28 - V. W.
as well as the CS mailing list. And when I read through it, it just gave me a sense that this is something he could walk in there and just show some of the work he's already done in learning tools and, you know, maybe grab a prize or two.
40:44 - D. B.
Well, I hope he does. I hope he does too. Let us know what happens. Anyway, he can join by clicking this link, and so can you. Anybody. So yeah, any other comments on this? Let's see, submissions close December 17th, another 11 days. Thank you. Okay, what else? J. sent around, I guess, I don't know if you all got the email he sent, his submittal to AAAI's track on videos or something. If he were here, I'd show the video. It's like a two-minute video he did; it's pretty good, but I want to wait for him to actually be in a meeting to do that. So we'll probably do that next time. I have a student who just completed his master's thesis. He used AI liberally to help write the thesis, and we'd like to turn that thesis into a shorter publishable paper, using AI to help shrink it, which AIs can do pretty nicely. But the hard part is that we want to come up with a good paper that can be published but won't be flagged by AI detection tools.
42:15 - E. G.
So... That's what we were talking about at the beginning of this meeting.
42:20 - Unidentified Speaker
Yeah.
42:20 - Multiple Speakers
Lather, rinse, repeat.
42:21 - D. B.
Just put it through the AI detector till it says you wrote it.
42:27 - Multiple Speakers
Yeah. So yeah. Or one of these AI tools that purport to take AI-generated text and make it non-detectable.
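The "lather, rinse, repeat" loop joked about above (run the text through a detector until it passes) amounts to a simple iterate-until-clean procedure. In this sketch the detector and the rewriting pass are stand-in stubs; a real setup would call an actual detection service and involve human or LLM edits.

```python
# Sketch of the revise-until-the-detector-passes loop discussed above.
# Both functions below are hypothetical stubs, not a real detector or tool.

def looks_ai_generated(text: str) -> bool:
    # Stub heuristic: flag text containing a telltale stock phrase.
    return "delve into" in text.lower()

def rewrite(text: str) -> str:
    # Stub "humanizing" pass; in practice, a manual or LLM-assisted edit.
    return text.replace("delve into", "dig into")

def revise_until_clean(draft: str, max_rounds: int = 5) -> str:
    """Rewrite the draft until the detector stops flagging it, with a cap."""
    for _ in range(max_rounds):
        if not looks_ai_generated(draft):
            return draft
        draft = rewrite(draft)
    raise RuntimeError("still flagged after max_rounds revisions")

clean = revise_until_clean("Let us delve into the results.")
```

The `max_rounds` cap matters: if the rewriter can't actually change what the detector keys on, the loop would otherwise never terminate.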
42:37 - Unidentified Speaker
Um, yeah. Anyway, I wish he were here; we could dig into that. So anyway, I told him to just start working on it, and we'll see how to come up with the workflow that'll make that happen. Okay, well, let's see. Anything else anyone would like to bring up? There are a couple of other things which take more than a minute.
43:16 - D. B.
We'll talk about them next week, probably. Hearing nothing, I guess we can go ahead and adjourn and meet again next week.
43:27 - V. W.
Thanks for the meeting. Yeah, awesome. Thanks. Take care.
43:32 - D. B.
Bye, everyone. Bye now.
12-10-24-complete.txt