Machine Learning Study Group
Contacts: jdberleant@ualr.edu and mgmilanova@ualr.edu
140th meeting, Nov. 29, 2024
Table of Contents
* Agenda and minutes
* Transcript (if available)
Agenda and minutes
- Open discussion today!
Transcript:
Fri, Nov 29, 2024
3:18 - D.B.
Hey, everyone. Hello. Hey, so I'm leaving for the airport. So I'm just letting you guys run it. Bye.
3:30 - D.D.
Have a good trip.
3:32 - J.K.
D., did you have a good Thanksgiving, man?
3:38 - Unidentified Speaker
No, not really. I was sick.
3:43 - J.K.
Oh, dang. I'm sorry, man.
3:47 - D.D.
I didn't get any turkey.
3:51 - J.K.
Nothing.
3:52 - Unidentified Speaker
Dang.
3:53 - D.D.
That's bad. Yeah, I'm sorry. Yeah, my wife didn't even make me a plate.
4:02 - J.K.
She went. Oh man, yeah.
4:04 - D.D.
They said, "I didn't see anything there that you'd like."
4:08 - J.K.
I'm sorry, man. Yeah, it happened.
4:11 - D.D.
I'm feeling a lot better today.
4:14 - J.K.
Do you know whether you had just like a stomach bug, or no?
4:19 - D.D.
I mean, it could have been a cold. I don't know. I still have kind of a lump in my chest, but my throat isn't all itchy and scratchy anymore. I didn't have a fever, so I wasn't too worried about it. Yeah.
4:41 - J.K.
I just isolated myself. Yeah. That's still not fun, man. I hope you can make up for it with like a big Christmas dinner or something like that.
4:51 - D.D.
We'll see. So did anybody prepare anything? I did not. I was unwell, and I have two papers out there right now, so I'm really busy trying to get camera-ready stuff together and prepare presentations and all that.
5:13 - J.K.
I wasn't exactly able to follow the guideline of using multiple agents to create something. But in the chat, I did share a prompt that I made this past week. I call it Super Chat OS. I specifically designed it for ChatGPT, but it'll work with any of them. It's designed to give the user advanced functionality. You can paste it in at the beginning of a conversation, and I also included a version that you can put in the custom instructions in ChatGPT so that it's persistent across every conversation. I'm pretty proud of it. You can generate a knowledge base that works for the rest of the conversation, and it puts different skills and different knowledge into the assistant. I think the coolest functionality of the system is what's called expert chat, which simulates a super user, someone who's really good at prompting, talking with the assistant about your topic. A lot of the time I don't know what questions to ask, or what prompts to write to learn more about something, so it's able to simulate a conversation between the assistant and someone who's better than me at prompting. The other really cool simulation in that prompt is called groupthink. If you type something like "groupthink chain prompting," it'll create a panel of experts on whatever topic you're discussing, and then they'll have a conversation about whatever you're trying to learn. To me, I just wanted to create a prompt that makes anybody into a ChatGPT super user. So definitely give me feedback. Yeah.
7:49 - D.D.
Did you say that you put this prompt in the chat? Yeah, I'll reshare. Yeah.
7:55 - J.K.
So if we weren't in the meeting, then yeah, we don't.
7:59 - Multiple Speakers
Oh, I gotcha. Okay. So there.
8:02 - J.K.
Yeah, I put it in again. It starts with the user guide on how to use it, then there's the main prompt, and then it ends with the really token-conscious version, I think it's only about 1,500 characters, that you can put in for custom instructions. But yeah, we have all these discussions about things we're exploring within ChatGPT, and I think this hopefully makes it possible for anyone to have really productive conversations with ChatGPT. That is super cool, J. I'm really excited to take a deeper dive into that and try to hybridize my monolithic prompt approach with your multi-agent approach and see if we can come up with something there.
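For readers who want to experiment with the same general idea, here is a minimal sketch (not J.K.'s actual Super Chat OS prompt, which was shared in the meeting chat) of seeding a conversation with a persistent system prompt that asks the model to simulate a panel of experts, roughly the "groupthink" behavior described above. It assumes the OpenAI Python SDK and an API key; the model name is an assumption.

# Minimal sketch only, not the Super Chat OS prompt itself.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set;
# "gpt-4o" is an assumed model name.
from openai import OpenAI

client = OpenAI()

GROUPTHINK_PROMPT = (
    "You are a panel of three named experts on the user's topic. "
    "For each user message, have the experts discuss the question among "
    "themselves, challenge each other's assumptions, and then give a short "
    "joint recommendation."
)

def groupthink(question: str) -> str:
    # The system message persists for the whole conversation, like pasting a
    # prompt at the start of a chat or into custom instructions.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[
            {"role": "system", "content": GROUPTHINK_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(groupthink("What should I learn first about prompt engineering?"))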
9:02 - V.W.
Thanks, man. I appreciate it.
9:04 - J.K.
I have a couple of slides I made for the meeting.
9:08 - V.W.
as I promised to share that took me a different direction than I intended to go. And when everybody's had a chance to say anything they need to say, I'd like to share it.
9:26 - J.K.
I'm ready. Yeah, I'm ready. I'm looking forward to seeing them.
9:32 - V.W.
Can you see my screen? Okay.
9:36 - J.K.
Yes. Okay.
9:37 - V.W.
Of all the opuses, this is the most minor of opuses, so it shouldn't really be called an opus at all. But last January, over a year ago, almost two years ago, I wanted to make an index of what tools were being launched on the scene in the following areas: text to text, text to code, text to image, text to audio, text to video, yada, yada, yada.
10:02 - Unidentified Speaker
And, you know, you can take the Cartesian product of those media and cross them with each other and generate this.
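As a small aside, the "Cartesian product of media" idea can be written down directly. A short illustrative sketch in Python; the exact media list beyond those named above is an assumption:

# Cross input and output media to enumerate tool categories
# (text-to-text, text-to-code, text-to-image, ...).
from itertools import product

media = ["text", "code", "image", "audio", "video"]  # illustrative list
categories = [f"{src}-to-{dst}" for src, dst in product(media, repeat=2)]
print(len(categories))   # 25 combinations for 5 media types
print(categories[:5])    # ['text-to-text', 'text-to-code', ...]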
10:09 - V.W.
Here are things we can do with AI. So earlier in the week, when I went to try to extend this list to support all the growth that I expected in the number of these tools, I thought it would have exploded. For example, for text to text, I had 24 back in that January a year or two ago. But it turned out not to be the case. It turned out that there had instead been a consolidation of the industry around these four, and if you have a fifth, I'm happy to include it. The thing that surprised me was that users have settled on using these things and they've settled into patterns. This is similar to what happened in the auto industry, when there were something like 2,000 carmakers in the year 1920, and then those consolidated to the Big Three in the 70s and 80s and so forth. So I just find it fascinating that this occurred, and that instead of me having to review 48 text-to-text transformers, how they should be used and all their nuance, it has instead simplified itself. On to something that's closer to my heart, and that is text to code, because I really wanted to know which tools I should be choosing, given my limited time and limited attention. Where should I spend my time? I initially developed this slide without Gemini Advanced being in the lead line, and it showed Claude Artifacts as being really a high performer. But then I thought, well, I don't want to leave Gemini Advanced out. And then I ran into some trouble, because if you ask each chatbot how much the other chatbots cost to develop, what their context lengths are, and what their training set sizes were, there is a disparity of answers, especially when it comes to development cost. I found estimates of 50 billion, 5 billion, and 0.1 billion dollars for the cost of training ChatGPT-4.0. But then I was able to scale the calculations by the number of tokens that were being processed, the size of the training set, and come up with closer figures that are maybe within 20% if we're lucky. As we go across this graph, we find that JavaScript and Python, fortunately, are the programming languages producing the best results in terms of accuracy with regard to whatever prompt the user has. Things like Visual Basic, and even poor lonesome C, aren't faring quite as well, especially when we look to the off-brand code generators. So I thought, well, if I have to advise someone, what should they choose if they have limited time and limited resources to try to write code more efficiently? What should I do if they have a specific language preference? So these are all the major languages that are in play right now. TypeScript, which is T-script in the rightmost column, shows a 94% with GitHub Copilot, so that indicates tight integration, and TypeScript was used to produce one of the most exciting demonstrations I ever saw for an introduction to artificial intelligence, and that is the TensorFlow Playground. So I think TypeScript is still in the running, because it's just a superset of JavaScript maintained by Microsoft. So our family-friendly heroes, JavaScript, Python, and even Java, along with our other favorites C++ and C#, are staying in the mix. The trouble is that I have a lot of experience using ChatGPT 4.0 and Claude Artifacts for programming, and some experience with Codium, but I haven't really had that rewarding of a journey with Gemini Advanced. 
And Gemini Advanced claims, if you ask it how many tokens it accepts in its context length, that it takes a million tokens, but that's only for a specially privileged subset of users. For the rank and file, like the rest of us, they claim a 128K-token context length. I don't really know if that's true either, because I've found that Gemini Advanced tends to poop out on me after a relatively short while; it'll promise all these things that it could do, but then it won't do them when you say, I'd like you to do all of those. So there's something still suspicious in the mix. I also wanted to do the image generators, which we see in this first line, and this, by the way, is also old and there has been some consolidation here, but I didn't have time; I actually took a Thanksgiving holiday. So that is the long and short of what I wanted to do. Are there any questions? Are you going to share these slides with us?
15:23 - Unidentified Speaker
Say what?
15:23 - D.D.
Are you going to share these slides with us?
15:26 - V.W.
Sure. Well, since the meeting was recorded, I think I just did. And if you need something more specific, I'd be happy to. I'd like that page right there.
15:36 - D.D.
Why don't you just screen grab that puppy or I'll do it.
15:40 - V.W.
I can probably do that.
15:42 - R.S.
It's been a long time. If you could summarize this like in one or two sentences, what would those sentences be?
15:52 - V.W.
The first sentence is that any of us has the opportunity to drive a $12.5 billion Porsche around the programming block, and we should probably be doing so with all resolve. There's been so much money spent on making code generation more accurate, more fulfilling of what we actually want, that I think we should all be in the habit of using it. So that's the first thing. The second sentence I would say is that the cost of developing these text-to-code generators has been the equivalent of paying 124,600 software engineers $100,000 each for one year. For people who are into financials, this gives us the cost amortization: how much newfound value we have to extract in order to justify the development. And I think it portends an important criterion for AI, and that is that only the big organizations have the resources to actually train these LLMs at scale. For any of us to presume that we can do that is fallacious. On the other hand, we know that with fine-tuning we can do transfer learning and get things going. But I think there's a line, and I've certainly encountered it, where the complexity of our problem would prefer that it be in the original training set and not as an add-on, say, in retrieval-augmented generation. So those would be my summary sentences.
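A quick back-of-the-envelope check of the figures quoted above; the numbers are simply the ones stated in the discussion, restated as arithmetic:

# Sanity check of the cost-amortization figure quoted above.
engineers = 124_600
salary = 100_000                 # dollars per engineer-year
total = engineers * salary
print(f"${total / 1e9:.2f} billion")   # -> $12.46 billion, roughly the $12.5B figure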
17:28 - Y.i.P.
Dr. W., this is Y.
17:31 - V.W.
Hi, Y.
17:32 - Y.i.P.
Thank you for sharing that. I did take a screenshot. I have a few questions. So what this is showing is text to development. I'm assuming you're using some research data to create this, rather than data from people who have actually tried to use it in reality. Or have you, or any of your students, actually tested any of these components to prove that, for example, JavaScript is really at 95%? And if not, do you have any plans to do that?
18:18 - V.W.
Here's what I did, and my methodology was somewhat stilted, and I appreciate the question. I asked each of these LLMs what the statistics were for their performance in these language areas. I'm a little bit suspicious that Gemini Advanced may be inflating its figures, because when I asked Gemini Advanced alone what its development cost was, it said $33 billion, but when I fell back to the other four sources that I was using, they had much lower estimates of what it cost to deploy. Also, with OpenAI, I asked the same question: give me the statistics, and when you give me the development cost, give it to me for the whole lineage of ChatGPT 3, 3.5, 4, and the derivative ones we're using now, because they really have to include that, since those generational tools were not restarted from scratch. In terms of my personal experience, I have pretty strong experience with ChatGPT 4.0, Claude Artifacts, and Codium, and I tend to believe the figures for them. One thing I'd like to see done is for someone to take this table and produce a coding example that's more complex than Hello World, but less complex than a full-on deployable app, with a prompt that's been properly vetted through the J. quality-control pipeline, however long it is, as long as it still fits in the context, and then grade how close a result they got in terms of the usability of the code. When I presented last week, I talked about how many shots it took to get usable code. I think this graph is partially a reflection of that: you can look not only at zero-shot accuracy, but at how long you're going to have to fool around with the LLM to get the
20:18 - Y.i.P.
actual code for what you're trying to do? Got it. So I'll tell you why I asked this question. As we speak, in my company we are in fact using, if I can say, not text-to-code necessarily, but even code-to-code. Yes. Meaning that...
20:38 - V.W.
Oh, rewrite this code for me and make it better.
20:43 - Y.i.P.
Exactly. Or we have peer reviews where we are actually doing integration testing, or all kinds of testing, using the code.
20:54 - V.W.
And even for that, I'm not getting that high.
20:58 - Y.i.P.
I see 97% somewhere. You see it on Gemini Advanced, which so far is the most suspect of the lot. Yeah, and I have not used Gemini yet, so I'm actually intrigued and I'm definitely going to ask my team to use it. My son used Gemini today for maths, but that's a different topic. But I am actually volunteering; I do have a couple of people who are doing this internally. And if you want, I'll choose a couple of pieces of software as well. I do see HTML/CSS/JavaScript as one category, if I'm not wrong, that is one category.
21:47 - Multiple Speakers
That is correct. Yeah.
21:49 - Y.i.P.
So we are using React.js on top, and then there is HTML, CSS, and JavaScript. My team is actually working on this. And if you have any students, a couple of students (I know last week we also spoke about collaborating on another topic), but if you want, I am very much interested in choosing Python and JavaScript, those two areas, and testing across at least code-to-code first and then text-to-code. Once we go through those two, because I'm actually building scripts, then we can go to the others, because we are also building an algorithm in Java, but that will start in March or April. These three I'm happy to actually go through and test with the real application that we are building. And I have challenged my team to actually use it, at least for testing or improving the code, at this point in time. So I just wanted to bring this to your attention; this data is extremely helpful, and thank you for presenting. I think we can take step two, if you're interested in this.
23:05 - V.W.
I'd really like to see the results of that, and I would invite you and your people to present it, because it'd be enormously helpful for those of us who have limited time to spend but have things we have to get done. For example, some of us occasionally have to write an SQL interface for work that we're doing. It's a one-off job, and we'd like to know which tool we should use to generate our SQL. So this kind of gives us a guideline that Claude Artifacts might be a good starting point for that. Fewer of us are doing Rust, Swift, and Ruby, but those are still important to consider for people who are trying to deploy apps on various platforms, whether they be servers or little computers or Macintoshes. I saw a demonstration this week of a two-arm pendulum, and a two-arm pendulum is famous for starting to become chaotic in its motion after just a few periods of oscillation. The person had taken all the cases for the initial values of the arm positions in the two-link pendulum and made basically a spreadsheet with theta one and theta two, the starting angles, and then let the simulation run. They noticed that values nearer to zero tend to stay regular for longer, but those that flip through a full two-pi-radian oscillation became chaotic quickly on the fringes, and eventually the chaos crept toward the center. Well, I kind of visualized someone taking this chart, zooming into it, and showing, for a simple example, whether it's true or not, and the degree to which it's true, in several areas. One is how many shots it took. The second is how good the graphics were. And the third was maybe how pretty the code was, subjectively speaking, where pretty is a measure of expandability, maintainability, and so forth.
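For reference, a rough sketch of the kind of sweep described above: integrate a double pendulum over a grid of starting angles and record how long until the second arm first swings far from its start. The masses, lengths, grid resolution, flip threshold, and time horizon are all assumptions, not details from the demonstration.

# Sweep starting angles (theta1, theta2) of a double pendulum and record the
# time until the second arm deviates more than pi from its starting angle.
# Near-zero starting angles stay regular longer; extreme angles flip quickly.
import numpy as np
from scipy.integrate import solve_ivp

g, m1, m2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0  # assumed parameters

def deriv(t, y):
    # Standard equal-length double-pendulum equations of motion.
    th1, w1, th2, w2 = y
    d = th1 - th2
    den = 2 * m1 + m2 - m2 * np.cos(2 * d)
    dw1 = (-g * (2 * m1 + m2) * np.sin(th1)
           - m2 * g * np.sin(th1 - 2 * th2)
           - 2 * np.sin(d) * m2 * (w2**2 * L2 + w1**2 * L1 * np.cos(d))) / (L1 * den)
    dw2 = (2 * np.sin(d) * (w1**2 * L1 * (m1 + m2)
           + g * (m1 + m2) * np.cos(th1)
           + w2**2 * L2 * m2 * np.cos(d))) / (L2 * den)
    return [w1, dw1, w2, dw2]

def time_to_flip(th1_0, th2_0, t_max=20.0):
    # Event fires when the second arm has moved more than pi from its start.
    flip = lambda t, y: abs(y[2] - th2_0) - np.pi
    flip.terminal, flip.direction = True, 1
    sol = solve_ivp(deriv, (0, t_max), [th1_0, 0.0, th2_0, 0.0],
                    events=flip, max_step=0.02)
    return sol.t_events[0][0] if sol.t_events[0].size else t_max

# Coarse grid over starting angles, like the spreadsheet of theta1 x theta2.
angles = np.linspace(-np.pi, np.pi, 9)
grid = np.array([[time_to_flip(a, b) for b in angles] for a in angles])
print(np.round(grid, 1))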
25:06 - Y.i.P.
So, yeah, I would really invite that.
25:08 - V.W.
I think all of us are pinging off each other and getting ideas, and a great way to keep this synergy going is to just take turns. It's like, you know, you never want to be the best musician in the band. You always want to see somebody else do a great solo that really improves and provokes the band to greater achievement. And I'd like to see that for us too.
25:32 - Y.i.P.
I'll tell you in a couple of minutes some background on what my team is doing with these two, and we use SQL too. I forgot to say that we use Python and SQL alternately, depending on whether we are going to use AI libraries or not. If we are not using AI libraries, we build things in SQL. But I'll tell you a couple of things we are doing. One is that before we even use it, we are actually firming up our coding standards, coding methodologies, and coding structure. What I mean is, for me to say that my Python code is 97%, 97% of what, right? Exactly. So that is the first thing we are firming up, especially when it comes to HTML and JavaScript. And by the way, we are using GPTs to build that. I missed the first presentation, so I have some questions on the multi-agent thing, but I'll ask later. So that is the first thing we are doing, and perhaps when we meet the following weekend, or whenever we are meeting, I'll present that to you, and then that will be our yardstick, and our measurement will be against that yardstick. There will be a version one of those standard methodologies, and then there will be a version two, because we want to improve the yardstick as well, which will essentially become the Bible of, hey, this is the best thing and the most effective and efficient. So that will also become version one, or what you call the first test, I mean, second, or whatever.
27:24 - V.W.
So let me interrupt you, because you're so on track. Oak Ridge National Laboratory has a set of benchmarks, and every time a new machine, a new CPU, a new GPU comes out, they run the benchmark on the new device, put it in the log, and tell you its performance. It's become a very famous resource that's lasted years and years and goes back almost to the dawn of numerical analysis. We need a similar thing for evaluating the efficacy of these text-to-code and code-to-code generators. So if you were able to put something like that together, it could be one of those things that becomes additive over time. It becomes a product that can be rerun and rechecked when necessary, when new standards emerge, and can form a baseline for what our expectations should be. And I've noticed that people who choose to go into the benchmarking activity tend to have long lifetimes in terms of their value to the computing society.
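A minimal sketch of what such an additive benchmark log might look like, assuming a simple JSON-lines file; the field names and file path are invented for illustration, not an existing standard:

# Append-only benchmark log for text-to-code / code-to-code generators:
# every new model (or new version) is run on the same fixed task set and the
# result is appended to a persistent record that can be rerun and rechecked.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class BenchmarkEntry:
    run_date: str          # ISO date the benchmark was run
    model: str             # e.g. "ChatGPT 4.0", "Claude Artifacts", "Gemini Advanced"
    language: str          # e.g. "Python", "JavaScript"
    task_set_version: str  # which fixed task set was used
    shots_to_usable: float # average attempts before the code passed review
    pass_rate: float       # fraction of tasks that produced usable code

def append_entry(entry: BenchmarkEntry, path: str = "codegen_benchmark.jsonl") -> None:
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

# Illustrative numbers only, not measured results.
append_entry(BenchmarkEntry(date.today().isoformat(), "ExampleModel",
                            "Python", "v1", 2.0, 0.75))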
28:27 - Y.i.P.
It's something to keep in mind. Got it. The second question I actually have is, now that I told you the languages, when it comes to the model, in the interest of time, I'm assuming you would say... although I know Gemini is constantly advertising on my YouTube channel, on all Google channels; somehow I'm seeing Gemini advertisements nowadays.
28:55 - V.W.
But I was forced to download the Gemini app.
28:58 - Y.i.P.
I don't know whether you all know there's a new Gemini app on iStore.
29:04 - Multiple Speakers
Yeah. And I'll tell you something right on that question about apps: I see no reason to have a user PC-resident app when all that functionality is available in the browser.
29:16 - V.W.
Because when you're in the browser, you're already connected to a large number of potential mashup members for whatever you're doing, and it seems that coming out of that into the app is unnecessary overhead. But there's a proviso on that: if you're doing proprietary work, being in the app may confer on you a greater degree of privacy in the research that you're doing. So if proprietary work or patent work is important to your intellectual property, there may be an advantage to using the app. But for me, you know, a pure academic, a token academic as it were, I like to stay in the browser for as much of the work as possible.
29:59 - Y.i.P.
Got it. Thank you for that. So now I'm trying to relate the previous topic to this topic. I had missed the previous gentleman's discussion on the topic. When you say multi-agent, are we talking about these different models, and agents within those different models? Can I run code across all these models, which have different agents, at the same time? Is that what the previous point was? Three thoughts. I don't know if I'll recover them all.
30:34 - V.W.
One, I want J. to speak to this. Two, multi-agent is usually within the context of a single LLM. Three, A. N. just published yesterday that there is a new tool available from Stanford that can jump across multiple agents within the same context, provided you have the authorization tokens for that. So I hope that answers your question.
30:56 - J.K.
Yeah, I'm excited looking at this slide from a multi-agent perspective, because, correct me if I'm wrong, the accuracy ratings that it's giving you are kind of for zero-shot, right? Like where I say I need this piece of code and then it produces it with that level of accuracy.
31:17 - V.W.
Yeah, that's the presumption.
31:20 - J.K.
Yeah, so I personally don't write code, but I've had situations where I've needed it, specifically Python, and what I do is I will put one agent into a ChatGPT conversation and say, "You are the world-leading Python developer." It's really funny to see how much better responses get when you simply say "you're the world's leading" or "you're world-renowned." It is shocking how that changes the output. But in that same chat, I'll also put a QA agent or a project manager agent and have them work together. The thing I like about looking at these numbers is that once you have multiple agents, one checking the other, one can even be a junior developer and the other a senior developer.
32:24 - Multiple Speakers
All these numbers go to 100% whenever you consider the ability to have multiple agents checking one another and contributing.
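A minimal sketch of the developer-plus-reviewer pattern being described, assuming the OpenAI Python SDK and an API key; the model name and round count are assumptions, and in the discussion this is done inside a single ChatGPT chat rather than through the API:

# Two-persona loop: a "developer" drafts code and a "reviewer" critiques it,
# alternating for a couple of rounds. Sketch only; model name is assumed.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumed

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def dev_and_qa(task: str, rounds: int = 2) -> str:
    draft = ask("You are the world's leading Python developer.", task)
    for _ in range(rounds):
        review = ask("You are a meticulous QA engineer reviewing Python code.",
                     f"Task: {task}\n\nCode:\n{draft}\n\nList concrete problems.")
        draft = ask("You are the world's leading Python developer.",
                    f"Task: {task}\n\nYour previous code:\n{draft}\n\n"
                    f"Reviewer feedback:\n{review}\n\nReturn improved code only.")
    return draft

if __name__ == "__main__":
    print(dev_and_qa("Write a function that parses dates like '29 Nov 2024' into ISO format."))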
32:33 - V.W.
In fact, this slide was produced using that exact technique, because I got these figures and reintroduced them to multiple LLMs before I even approached the confidence that I had something presentable. And there's still a lot of wiggle room in these numbers, as we pointed out. I'm so tickled, J., every time you say you don't program, because the thing that LLMs have enabled us to do is to program in English, and you're one of the best programmers in English that I happen to know. So every time you say you're not a programmer, it tickles me, because you're programming at the highest possible level of human achievement.
33:13 - Multiple Speakers
And I appreciate your humility.
33:15 - J.K.
It's fantastic. Well, the big takeaway from looking at this graph is what I think we're going to see within the next couple of years. We all have a limited amount of time to learn things and apply them in our lives, and I think what we're going to see is the return on investment of learning how to manage a multi-agent team that has access to all of these languages. I, as a non-technical non-coder, benefit more from learning how to manage a team of coders, and basically being able to write in any of these languages, than from trying to learn Python for the first time, which I've done in the past. It really gets to the point where we ask ourselves, is it worth learning a programming language from scratch? And I'm especially thinking about kids in coding boot camps and things like that. We are a generation on a precipice with respect to the question we just articulated.
34:39 - V.W.
Yeah, we have an avalanche of people who are literally having to make this, well, it is kind of a make-versus-buy decision: do I utilize the expertise in the LLM, or do I try to go it on my own?
34:54 - J.K.
And it's ridiculous to go it on your own. Yeah, it's really going to get to the point where, and this is uncomfortable for me to say, because I've accepted it and I've been humbled by ChatGPT: I've done marketing for over a decade, and I can sit down and have a 10-minute conversation with ChatGPT, and it's capable of doing more, and producing higher-quality marketing plans and assets and things like that, in 10 minutes than I could produce having 10 years of experience.
35:29 - V.W.
So I'm having the same experience in articulating ordinary correspondence that I want to clean up.
35:34 - J.K.
The ability to do sentiment analysis and then reflect the better sentiment in the work.
35:39 - V.W.
There's even an Apple commercial about somebody who writes a letter, and they're all mad and stuff, and then they have Apple clean it up, and it's all nice and it's much more effective, because you catch more ants with honey than with vinegar.
35:55 - J.K.
So yeah, I'm totally on board with that.
35:58 - V.W.
Here's where I think we're going to have a loss, though. This whole avalanche of people are going to just sit on top of the LLMs. But there was a generation that grew up being able, especially in rocket science, to do back-of-the-envelope feasibility calculations. If somebody came to them with some wild harebrained scheme or theory, they could quickly, with their back-of-the-envelope mental calculations, do a feasibility analysis and say, I think that sounds specious, or I think that sounds possible. We lost the back-of-the-envelope people a generation ago. And now we have the back-of-the-envelope coders who, because they've been through so much integration and errors and lost weekends over a semicolon, et cetera, have a real bare-metal view of what's possible and not possible. So when we're doing complex things, sometimes when we're just living up at the mashup level, we can be invoking enormous amounts of unnecessary complexity to do relatively simple things. But then you can always say, well, if I can just throw another couple of GPUs at it, who cares? And that's where we are now. So it's going to be interesting to see, in a generation or two, programming-wise, if there's even still a notion of programmers; they could become blue-collar people, like buggy-whip manufacturers were in the automotive revolution. I mean, that's a terrible thing to say for those of us who have spent our lives trying to become decent programmers, but we have to acknowledge that the possibility is there.
37:30 - D.D.
Think about the possibilities of the jobs out there, people that can come in and make things more efficient. Right.
37:38 - V.W.
Because we now know that, at least for 124,000 software engineers, we don't need their services any longer. And what is that going to mean? Google made a huge tactical error: just as things were taking off with their large language model work with Bard, they let a lot of their AI staff go, thinking they were not going to need as many programmers. And given that they were trying to deploy to Google Home and Google devices and various sorts of, you know, outreaches of this technology, they kind of made the worst possible management decision, because they should have hired more people rather
38:16 - Multiple Speakers
than letting go, as a cost-cutting measure, of those really good people they had who were already trained, already had desks, already had computers.
38:23 - V.W.
So this premature loss of labor forces is, you know, we've been through this before. We went through it in the early 2000s with the telco shakeup. There were all these telecommunications companies trying to be first to give us that last quarter mile of fiber optic to the curb and all that, and then it shook down to the three or four major players that we all know and whose ads we see today, with an occasional intrusion by a movie star buying their own company, but you know what I'm saying.
38:54 - J.K.
I think what we're going to see is, I mean, we're talking about people's skills being deprecated on some level, but I think it's going to have a flattening effect where, if I'm a Python developer, why would I try to work for a company when I could simply spin up a multi-agent team that complements every other aspect of my skill set and basically be my own dev shop? And I've got an answer for that.
39:26 - V.W.
I want to give it to you real quick, and I don't want you to lose your train of thought, because I'll lose mine if I don't say this, and I apologize for that. It's all about the distribution channel. If you don't have a distribution channel, you can be off in the corner playing your violin in the best symphony ever heard, but if nobody hears it, it won't matter. And that's the problem we're starting to see now: we've got individual developers who are doing superlative levels of development but don't have a distribution channel. They don't have a contract with a Universal or an MCA or somebody to distribute their work, and because of that, it dies on the vine. And so do they, because they can't make their rent. So I think the distribution channel is going to become very important, and that's going to create a third problem: if you're the programmer who has a 100K-token or a million-token input for Gemini Advanced, and your competitors only get 128K context lengths for their token inputs, the million-token guy is going to win out every time. And it seems to me that, the way things go, to him who has much, more will be given, and from him who has little, what little he has will be taken away, to quote the proverb. We're going to see this kind of aggregation where we'll see the super programmers and everybody else, and it'll become kind of a kingdom of fiefdoms; there'll be the serf class and the lord class. And, you know, fortunately these LLMs have been put out there, but if you look at the LLMs that died on the vine, a lot of them were individuals or people who got, say, less than $10 million in capitalization, and they ran for a while, but they just couldn't compete with a hundred million dollars or a billion dollars of capitalization. They just couldn't, because it was a stroke of a pen.
41:12 - J.K.
Well, to your point about distribution channels, I'm kind of reminded of the really awesome presentation, I think it was some of F.'s students, who had worked on a tool that kind of auto-generates stories and videos and things like that. And someone said, because again, they knew that I'd done marketing before, "Well, now we need help marketing this." I was pretty point-blank. I was just like, you can build a multi-agent team of marketers. That's the thing. I'm working on a curriculum called Cyborg Thinking, and it's just this idea of AI as a cognitive extension of the user. The train of thought in the past has been: I've made this cool thing, now I need someone to help me get the word out.
42:15 - V.W.
And that's what all the guys at Sundance are saying. Everybody shows up at the Denver Sundance Festival, and everybody's movie gets all these awards, and then maybe they show up on Netflix for a limited-time showing and are quickly swept into the dustbin of history. So it can even matter not just that you have a distribution channel, but that your distribution channel is one that is positioned to write the big check, to give the thrust underneath your rocket to even get it to lift off the ground, even though it's a mighty fine rocket. It's a really E. M., T. situation. What I'm trying to say is, I'm agreeing with you, and I think the kind of hope that you give people with multi-agent personal marketing is that they can break through this, but at some point it will become about capitalization. Programmers attempted to get capitalization by going the open-source route, saying, I'm going to give my work away for free to everybody in the world, and they're going to become so dependent on the quality of code that I write that they can't help but come back to me. But instead the world has said, over the past decade, "thank you very much, next," to quote A. G. So I really want the little guy to win here; I don't see it happening, and I want you to change my mind.
43:36 - J.K.
Well, to your point about the film industry, and to go back to what I said about flattening: it's going to get to the point where one person who has built a network of complementary agents is going to be more agile than a big company, even if the big company has similar access to tools. I mean, okay. You changed my mind and you're done.
44:12 - V.W.
You changed it.
44:13 - Multiple Speakers
And here's why: we have a precedent, and the precedent is YouTube influencers, where single individuals can attain a massive following and then monetize further work and make it sustainable.
44:26 - V.W.
And we've got many examples of successful scientific, technical, and entertainment YouTube influencers, so my hope is restored.
44:37 - Multiple Speakers
Thank you very much.
44:40 - J.K.
I'm very pessimistic about the way Disney is just consuming all this IP and then, what I would consider, under-delivering on these things. But I truly believe, I'm very optimistic and bullish about a lot of this stuff.
But you yourself cited the example of, you know, Disney took Grimm's fairy tales, which were in the public domain, and then made proprietary intellectual property from them, which they've made money on since the 1930s.
45:14 - V.W.
So I think you could do the same. I've also noticed in your work the tendency to look for the open source, to look for the Grimm's fairy tales that are in the public domain, and then build on top of them proprietary works that could be for-profit if you chose to go that direction.
45:39 - J.K.
It's funny you say that. One of my side projects is, again, my background's in writing and I view everything I do as an extension of writing, I'm actually placing bets on the most popular superheroes entering the public domain. I'm building stories in parallel, creating unique superhero characters, so that when Batman and Superman and Spider-Man go into the public domain within our lifetimes.
46:13 - Multiple Speakers
You can eat them alive? Superman versus the mummy? Yeah, they'll show up. When these characters enter the public domain, right, I'll have characters and plots waiting.
46:28 - J.K.
But I really think, again, coming back to this image, we as educators, we as the people who are teaching other people, really need to be mindful about how we are equipping people, because it's going to get to that point. And, without going into too much detail, it's been a little frustrating trying to get buy-in on these ideas, that we need to be integrating large language model education in as many places as we can. Yeah.
47:07 - V.W.
But you just made a point better than I ever could have, in that you presented a video a week or so ago which began to actualize your ideas as consumable media content. And it was very motivational in that sense, because you'd also used AI to help you generate it, which was smart because it saved you time compared to stop-frame animation, Ray Harryhausen-type stuff. So I think we're poised to do what you're saying, but here's the deal. The YouTube influencer has to have the following diverse skill set. They have to have a core interest that they really care about. They have to have the research chops to find out the facts about it to the point that they know more than the ordinary person, so that there's leverage, intellectual property leverage. But thirdly, and most importantly, and this is true for you and several others here, you have to have the communication skills to put your ideas down into formats that can stand alone by themselves, without you having to mind them being consumed by others, say YouTube videos or other kinds of content. And when I ruminate over that, you have to be a Final Cut Pro editor, you have to be a Logic Pro sound person to put a good soundtrack on. You have to have these skills that are outside the main thrust of your expertise, skills that enable your expertise to have a vehicle for play. You can accomplish that by having multiple helpers; most influencers eventually hire assistants or video editors or people to help them deploy. But initially, we have to be training students on a central core interest, and then on a set of communication skills that are absolutely essential to take square one in the game.
48:54 - J.K.
Yeah. There's a book called Range, and the subtitle is Why Generalists Triumph in a Specialized World. And again, it really drives me crazy how, as I've delved into academia, we're constantly encouraged to specialize, to niche down, and to become subject matter experts in one thing. It really ends up handicapping the entire academic. It's worse than that.
49:31 - V.W.
You actually ask incoming grad students to handicap themselves for the rest of their lives by becoming a domain expert in a single topic that is so specialized that if technology moves on, and it always does, their skill set is rendered almost immediately useless. And I know of no better example than the NIH charter to have a bunch of PhDs study the activity of different enzymes. A given PhD will be an expert in the way a given molecular machine, a given enzyme, works, and once that's understood and elucidated for the public to consume, that person is basically used up, a
50:16 - J.K.
spent cartridge in the ammunition of technology progress. Yeah. I would go as far as to say it's almost predatory, in that, especially in 2024 and moving forward, we really need to focus on the kind of skills you're talking about: how do you package your idea? I've been meeting with some former professors of mine from UCA as they're trying to update their writing curriculum to incorporate prompt engineering and things like that. And I've just said, does it matter? Does it really matter if the book that changes your life started out in a person's head and then was brought into the world almost with a surrogate, with AI acting as a surrogate?
51:12 - V.W.
Yeah, we kind of don't care about the origin story until way later, when we're trying to write the history of how we did it. In the right now, we don't really need the origin story, because we're standing on the shoulders of so many giants that we hardly have time to do the giant genealogy, although those things are done, should be done, and
51:38 - J.K.
are also in the dustbin of history and are occasionally useful to unearth and appreciate. I keep coming back to this: I haven't encountered a task or a deliverable that I can't do with multi-agent. I've genuinely tried to find things that are not possible, or where the output is worse than what an expert in the domain could do.
52:09 - Y.i.P.
J., I'm going to send you an example of a math problem, ACT problem, that my son was trying to solve.
52:20 - J.K.
I'll give you some homework.
52:22 - Multiple Speakers
I would love to try and do that, absolutely.
52:27 - Y.i.P.
Okay, so I have a couple of questions and follow-up items. Last week I asked you a question, Dr. W., in the chat. Somebody said that there are some master's students, and that building a website or a user interface was a trial that somebody wanted to do. And I said, let's meet offline. Was it D.?
52:58 - Multiple Speakers
Yeah, it was D.
53:02 - V.W.
But yeah, D. is a great resource for that because he has the coterie of graduate students that would be eligible to do that for their capstone project or whatever.
53:16 - Y.i.P.
Here is my idea. I can use this chart and maybe one model. And when you say website or UI, essentially it is HTML, CSS, and JavaScript embedded into it. We are using React, and anything we build with Claude Artifacts will also use React on the front end.
53:37 - V.W.
And so you can just say, please use React to do my user interface, because I find it more extensible than yada, yada, yada.
53:48 - Y.i.P.
And I have a lot of documents that can actually help these students to build something and test something. So that's one; I'll reach out to D. I think I have his email as part of the invite. And then the second idea that I have, J., and for that, if you're interested, by the way, when we are doing that we can pick one model and try multi-agent as part of the test. But I would like to have an offline conversation. I saw the link to the document that you sent, and pardon my lack of knowledge and technicality in this, I would like to have a decoding session with you on the document. But if you're interested, we can choose a model and create a multi-agent setup. And before I say multi-agent, I want to understand your definition of agent and multi-agent
54:48 - J.K.
and all that and perhaps test that as part of the same project.
54:54 - Y.i.P.
So we can really combine three concepts as part of this and create a result from all three. That is what I wanted to share before we hang up today. We spoke about something that can be made real, and we can present something: while we keep the confidential parts hidden, we can still present the accuracy we got compared to this chart, using multi-agent within one model.
55:23 - V.W.
Y., if some of your students could just take the upper nine entries in this table, for JavaScript, Python, and Java, for Gemini, ChatGPT, and Claude Artifacts, and do a simple nine-way example where they give the same prompt. Well, I guess they would give three different prompts to three different LLMs. Then you could demonstrate for us in the next week or so what you found, because then we could calibrate our expectations accordingly. What's nice about that is it doesn't violate anybody's IP or require disclosing things they're not totally comfortable disclosing, but it does get us to that contributory place where we make progress. I remember there was a thing called the Silicon, the Santa Clara or Silicon Graphics kind of computer club, somebody can help me with the exact name, that was going on at the time that B.G. was beginning to have the opportunity to build the first version of DOS and print money from his operating system. And there came a time when they disbanded, because things became so financially lucrative that they couldn't keep sharing openly. So I think if we're smart and unionize ourselves up front, rather than after the fact, we can figure out how to compartmentalize our sharing so that we produce the maximum benefit for each other with the minimum long-term detriment. So I think we need to be disciplined about that.
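A minimal sketch of that nine-way comparison, with a placeholder scoring step standing in for the human grading discussed earlier; the task and the exact model names listed are illustrative:

# Nine-way comparison: one task, three languages, three models, one usability
# score per cell. score_code() is a hypothetical placeholder for submitting
# the prompt and grading the returned code; it is not a real library call.
models = ["Gemini Advanced", "ChatGPT 4.0", "Claude Artifacts"]
languages = ["JavaScript", "Python", "Java"]
task = "Implement a function that returns the n-th Fibonacci number."

def score_code(model: str, language: str, task: str) -> float:
    # Placeholder: send the task to `model`, ask for `language` code, then
    # grade correctness and usability on a 0.0 - 1.0 scale (human review).
    return 0.0

results = {(m, lang): score_code(m, lang, task) for m in models for lang in languages}
for (m, lang), score in results.items():
    print(f"{m:18s} {lang:10s} {score:.2f}")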
56:48 - Multiple Speakers
Absolutely. And my firm has a formal NDA with you.
56:52 - Y.i.P.
So, similar to last time when you presented, you said that that lady, right, she has to get approval from some people before sharing, and we decided, okay, we don't disclose that before she gets the approval. When I present, if I'm presenting something that is confidential to any of our clients, I'll say don't record, but because we have that formal agreement, I'll present to you all under the non-disclosure agreement. I think this would be a very interesting test, which will go from a real requirement to a final output and leverage what Dr. W. or J. or D. is trying to do; we can combine them. We could also write a joint paper on it.
57:45 - Multiple Speakers
We could put all our names on it, and then everybody gets the intellectual property value of having published it on arXiv or wherever wants to carry it.
57:53 - V.W.
And it becomes part of the corpus of progress that we have, you know, in watching how, in just a year and a half, we went from 24 companies to four or five. That's sort of scary, because I was looking at the five and ten million dollar investments that people made in the 20 companies that are no longer with us. And it made me kind of sad, because those people were certainly just as skilled and just as able to do great and creative things, but for one reason or another they didn't survive the cull of natural selection. I would like to think that we can structure things, as we have so far, to survive the cull of natural selection and just become an ongoing contributor to the art, so that, just as the benchmarkers always survive, if we benchmark these nine cases enumerated earlier, then we survive, because benchmarks survive. And the governance of the history of the art always survives in the face of all other natural-selection forces. So yeah. Correct.
58:59 - Y.i.P.
So I will reach out to D., and maybe I'll create a one-pager or two-pager on what we are trying to do, and we can agree and decide how to progress on that on the next call. I may not be able to.
59:15 - V.W.
I'm also excited about D.D.'s work, because he's going to the very core of language specifications: how we specify tasks for machines to accomplish and how we measure whether or not the machine accomplished them, the metric by which we treat accuracy as an objective rather than subjective quantity. So I'm kind of excited about the research that he's doing along those lines, because to me it seems very central to these core issues we're trying to formalize, in a way that gives us robust performance metrics, so that if we give them to someone else, like Consumer Reports saying buy the Maytag washer and not the Kenmore washer, we can know we did the right thing by giving them that advice. Correct.
59:59 - Y.i.P.
So let me take a stab at creating that document, and I'll also reach out to D. to confirm his intentions and what he would like to do. If everybody agrees, we'll kickstart that the following week. It may take two or three weeks, but maybe around Christmas or in the New Year we'll present you all something on the dimensions mentioned. Sorry, Dr. W., it may not be in two weeks that we present. I'll at least present what we will do and how we'll do it, and if we agree, in two or three weeks we'll come back to you with the comparison against this chart and what actually happened with what we did.
1:00:47 - V.W.
So it was the Silicon Valley Computer Club that found themselves in the very predicament that we find ourselves in. And so now we've got a model. Let's recap what we've gotten, because we've gotten some really good stuff today. J. has proved to us, at least informally, that the YouTube influencer is our great hope against a future where large corporations eat us all alive, like it was depicted in, what was the series? Was it iRobot? It wasn't iRobot. It was Mr. Robot. Yeah, that was the show about corporate greed.
1:01:25 - Multiple Speakers
Evil Corp. Evil Corp.
1:01:28 - Unidentified Speaker
Yeah.
1:01:28 - V.W.
Yeah.
1:01:30 - J.K.
I think, I mean, I know you've mentioned that these companies have been consolidated down, but my gut tells me that these tools are more like utilities. Having worked on prompt engineering for a long time, I don't think, even if you condense this down to the top three or the top one or two, obviously there are costs, and they obviously have some command of greater cost control once they have captured this field. But my gut tells me that once we have these tools in place, the expansion happens again on an individual level. We've gone from zero to one in that anyone can build anything. And the question just becomes: what are you going to build? What are you going to build?
1:02:37 - Multiple Speakers
Yeah. I mean, B.G. said, speaking of the internet and the personal computer combined, we now have a 747 for the cost of a pizza.
1:02:49 - V.W.
So what are you going to do with this capability? And who are the people we're going to see emerge that have the potential skills? You know, the skill sets that got us here, the J.v.N. who could do differential equations in his head, that was a different skill set from the one that took us to the next step of building PCs and giant pieces of software, but they were related. Now we're getting into different families of skills that will succeed, because the distribution of skills needs to be a little bit different. So it'll be interesting to see who survives the next culling, the next meteor impact, if you were a dinosaur, as it were.
1:03:33 - J.K.
My thing is just, again, studying curriculum design and education. I'm excited that regardless of your language, your location, your background, anyone can build anything. I mean, it used to be heavily siloed. You had to know people who knew people. And at the end of the day, that's not the case anymore. It's less the case.
1:03:56 - V.W.
And that's the case I was trying to make.
1:04:00 - J.K.
It's less the case.
1:04:01 - V.W.
And the question is, what heroes can we identify, like you and Y. and R. and D. and so forth, who will emerge to show us what that next possibility is? Who will be the B.G., S.J., or even E.M. of this next use case, to give the rest of us the inspiration? Because, you know, you used to tell kids, hey, someday you can be an astronaut. And then I was at JPL when the space shuttle exploded, and not as many people wanted to be astronauts or teachers anymore, because they saw, that's right.
1:04:36 - D.D.
They saw C.M. get blown to bits.
1:04:39 - V.W.
And it was the deepest heartbreak, especially for those of us who dedicated themselves to rocket science; it was the most horrible scenario we could possibly contrive, and there it was in front of our eyes. So then we had to say, well, you could be a doctor or a lawyer. So we have to be able to point people in a direction: we have to take care of the Earth. Earth is our home. We have to do a good job here before we start exporting ourselves to other planets, and make sure we're putting the best version of ourselves forward. So I'm looking for these use cases. And right now we have these wonderful YouTube influencers in science and technology who are beginning to show us what's possible. Veritasium is a great example, or 3Blue1Brown.
These guys are one-person roadshows that are doing incredible things to engage us all in an ongoing way. And we have the forefront of quantum computing; what vistas will that open? Well, breaking all the credit card numbers isn't enough motivation for me to move forward, but the possibility of doing things in drug discovery with quantum mechanical simulations, that does hold great hope for a large number of people suffering from currently incurable diseases. So I'm looking to collect those. It used to be robots, radio, and rockets; when you were growing up, robots, radio, and rockets could engage young adolescents on those three ideas. Well, it's not that anymore. It's something different, and we need to identify what that is, with a clever slogan. We need a three-R slogan, like robots, radio, and rockets, or reading, writing, and arithmetic, that engages people to continue their education so that they can enjoy what we enjoy, and that is the opportunity to surf the leading edge for a while.
1:06:35 - D.D.
Well, guys, I've got to go get ready for dinner. All right. Go get ready for dinner, D.
1:06:41 - V.W.
Yeah. Yeah. It wasn't too scary today, was it?
1:06:44 - D.D.
No, it was great. Hey, we didn't have the fear thing.
1:06:49 - V.W.
We got closer. Look at four thirty-seven. We had a potential fear moment when we were talking about being wiped out by the big corporations, but J. has extricated us from that.
1:07:01 - Multiple Speakers
So I feel pretty good.
1:07:03 - J.K.
The corporations are the ones you have to worry about now.
1:07:08 - Multiple Speakers
We're all... We are the corporations you have to worry about. We are all capable of competing with them now.
1:07:17 - V.W.
We've met the enemy and they is us.
1:07:21 - J.K.
Again, we are more agile. That sounds a bit Pollyanna-ish, but we're going to get to the point where a single person with a team of AI can become a nation-state actor.
1:07:35 - V.W.
Yeah, man. I don't know.
1:07:37 - J.K.
Oh, gosh, it's OK. I just got scared.
1:07:41 - Multiple Speakers
Yeah. Have a good one. You guys have a great rest of your holidays, and it's been great talking to you.
1:07:49 - J.K.
Have a good one.
1:07:51 - Unidentified Speaker
Thanks.
1:07:51 - D.D.
Thanks, man.
1:07:52 - Unidentified Speaker
Thanks, J.
1:07:53 - J.K.
Thank you.