#150 - GPT Store, new Nvidia chips, DeepMind’s robotics progress, bad uses of AI

Published: 1/14/2024

Our 150th episode with a summary and discussion of last week's big AI news!

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at contact@lastweekin.ai

Timestamps + links:

Get these topics and transcript in your inbox

Main Topics

Topic | Mentions | Sentiment
Artificial intelligence | 97 | ~ Mixed
Generative pre-trained transformer | 22 | ~ Mixed
Google | 10 | ~ Mixed
OpenAI | 8 | ∘ Neutral
Mickey Mouse | 7 | ∘ Neutral
Meta Platforms | 7 | ∘ Neutral
George Carlin | 6 | ∘ Neutral
Nvidia | 6 | ∘ Neutral
YouTube | 6 | ∘ Neutral

Transcript

[00:00s-00:43s]: Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles that we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD focused on AI at Stanford earlier this year and I now work at a generative AI startup. And this week Jeremy is not around. He is off talking to politicians, I guess, somewhere. So we have a guest host. Hey, I'm Daniel Bashir.
[00:43s-00:54s]: I am a machine learning compiler engineer at Amazon Web Services. I also co-run another publication, a good friend of Last Week in AI, called The Gradient.
[00:54s-01:07s]: Yes, that's right. We've had Daniel on before. He records The Gradient podcast, so he's interviewed a ton of people in AI. It must be, what, like 80 people now at least, right?
[01:07s-01:23s]: Quite a few, yeah. We just dropped episode 106. If you're interested in, and I'm sure this will be a lot of you, the philosophy of language and propositional attitudes, I have a two-hour, ten-minute conversation with a professor at UT Austin about this that came out Thursday. Right.
[01:23s-03:50s]: Yeah, that is definitely going deep. And I guess that's generally true for The Gradient. It's another project I'm also involved in. It's a digital publication and also a newsletter and podcast. And here on Last Week in AI we cover broadly kind of everything, right? Whereas on The Gradient you really go deep and pretty technical. So not for everyone, but if you like going deep on certain topics and really getting into the weeds of stuff, then you might want to check it out. And before we get into discussing the news, I just want to have a quick shout out to a bunch of new reviews on Apple Podcasts. I guess people heard our appreciation last time, and so we got like six new ones, which is great. Yeah, we really love seeing it. A couple of you mentioned that we recorded while we were sick last time, so it's cool to hear that that inspires you, or that you think that makes us committed to this, which I guess we are. And yeah, thank you everyone for the reviews. One of you mentioned wanting longer segments on arts and entertainment with AI, and we will actually have some new stories on that this week, so that will, I guess, be nice for you to hear. Alrighty. So kicking off our first section, tools and apps, with OpenAI's custom GPT Store, which is now open for business. This is, I guess, the big news story of the week: OpenAI has opened their store for custom chatbots. This comes after the GPT Builder program, which was announced in November, and there have now been three million bots created by users. The store was originally supposed to launch earlier, but has now launched. So essentially, instead of just chatting with one ChatGPT, you can now chat with all these various GPTs, customized versions of ChatGPT that users on the platform can create. And this is now available to ChatGPT Plus users, enterprise users, and a new tier of subscription that we'll cover next after this.
[03:50s-04:42s]: Yeah, it generally sounds really exciting. And I think the idea of getting to chat with the GPTs that people are creating seems really exciting. Some of us are going to have areas of expertise that others just don't, or are willing to put in the work to create a certain sort of GPT. But it's definitely interesting to note that they do have a review system in place for these custom GPTs. They want to make sure they meet brand guidelines and usage policies, and there's also a reporting system. And kind of looking around the Twitterverse, I've seen at least a few people who have made custom GPTs that ultimately got taken down and seemed pretty unhappy about it. I'll be honest, I haven't looked too deeply into the guidelines just yet, so I'm not entirely sure if they're being applied consistently, but it's interesting to see the response right now.
[04:42s-05:37s]: Right. Yeah, and it'll be interesting to see how this grows, because you can use ChatGPT for a lot of stuff as is, right? And these customized ChatGPTs are not too different. You customize them similar to other chatbot platforms: you kind of prompt them, give them some examples. So it's not a ton of work to create a custom ChatGPT, but it does seem like they might actually train on interactions. So these customized GPTs will diverge from kind of the main ChatGPT over time, potentially, as people use them, which would mean that this will be a true repository of, I guess, thousands or hundreds of thousands of different chatbots trained on different data and different interactions. So yeah, seems like probably a big deal, really.
[05:37s-06:07s]: Yeah, that's pretty valuable, I think, especially for people who are trying to build businesses on top of these and are maybe not super happy with what GPT-3.5, for example, is offering them right now, and feel that maybe it's a little bit too broad, or the kind of trade-offs that are implicit in using it just don't really work for their use case. And I've heard at least a few people complain about this sort of thing before. So I'm curious if the GPT Store is going to really change the game in that regard.
[06:07s-06:08s]: That's right.
[06:08s-08:08s]: And it seems like at some point there will also be monetization for the creators of these chatbots. So it'll be a whole kind of platform for chatbots, similar to something like Character.AI, that allows you to create your own chatbots to just chat with. So this is going in that direction very much. And already, when you go to the store page, you can browse different applications: image generation, writing, productivity, programming, education, et cetera, et cetera. So if you're a fan of chatbots, now you might want to go and look if there's a customized version for your needs, I guess, or create one for yourself. And on to the second branch of OpenAI developments this week: the next story is on how OpenAI has released a new way to subscribe to ChatGPT aimed at small teams. This is ChatGPT Team, which sits in between the single-user and enterprise tiers, seemingly. It's a workspace for teams of up to 149 people, and it introduces admin tools for team management along with the usual access to all the tools. It also has a guarantee that your data will not be used for training, similar to the enterprise tier. And this is priced at $30 per user per month for monthly billing, or $25 per user per month for annual billing. So a bit more than the standard ChatGPT Plus if you're a single user. But yeah, it's interesting that, I guess, they're expanding their offerings to now cover small businesses, seemingly. Yeah, I guess I was just kind of noting at the end of our discussion of the last story
[08:08s-08:48s]: about what the GPT Store could offer businesses. And this is even more in the realm of small and medium-sized businesses. Maybe very small tech startups right now are pretty interested in the differentiation that could come from training their own GPTs, or achieving the equivalent effect through some other mechanism. And again, that's really, really hard. And I think that for many of the things businesses want to do, there just aren't a lot of good solutions out there, and there are still a lot of research problems that need to be solved, it feels like. But it does seem that OpenAI is still targeting this market in a pretty important way. Yeah.
[08:48s-09:12s]: And it's also, I think, in a way interesting that this is now similar to Google's G Suite and Microsoft's Copilot offerings, where everyone now has a plan where you can pay $30 per month per user, or $25 in this case if you pay for a whole year.
[09:12s-09:18s]: So everyone is going after enterprise and now also small businesses in this case.
[09:18s-10:27s]: For our first lightning round, we'll start with a story about something called the Rabbit R1. There's an AI startup called Rabbit out there that has launched a standalone AI device priced at $199. It's an AI-powered gadget that can actually use your apps for you. It's about half the size of an iPhone, with a 2.8-inch touchscreen, a rotating camera, a scroll wheel for navigation, and a 2.3-gigahertz MediaTek processor, along with four gigabytes of memory and 128 gigabytes of storage. It runs on Rabbit's own operating system, called Rabbit OS, which is based on what they call a large action model. This acts as a universal controller for apps: it can control music, order cars, buy groceries, send messages, and do more through a single interface. And again, this is a pretty interesting move. I think there are a lot of people who are really interested in building more agentic products based on large language models right now, so it's very interesting to see Rabbit actually come out with a device that is looking to serve this sort of need.
[10:27s-11:19s]: It's pretty neat looking, I guess. If you go to the article, and as always we'll have the links here, it looks like a little square with a screen and a camera and some kind of scroller thingy. It's actually not clear. It's most similar to the AI Pin that we've covered before, in that it is sort of an AI-first device meant to be a kind of hardware smart assistant, a device that can potentially augment or replace your smartphone with AI built in. They just announced it, and I guess a lot of people on the Twitterverse and elsewhere got hyped about it. The initial 10,000 units sold out already. So we'll see. Yeah, it's just announced. I don't know when they'll even ship, but people seem pretty hyped.
[11:19s-12:27s]: Our next story is about one of the tech giants. Amazon's Alexa has gotten some new generative-AI-powered experiences. These are developed by Character.AI, Splash, and Volley, and they're all available in the Amazon Alexa skill store. I promise I'm not marketing for Amazon, I'm just telling you what's happening here. For some of the examples here: Character.AI's experience allows Alexa users to have real-time conversations with different personas. These include fictional characters and historical figures. You might have seen that Meta recently launched this sort of thing as well in their messenger apps, so this seems to be the sort of experience that a lot of companies, especially in the social space, seem to want to build. Splash launched a free Alexa skill that enables users to create songs using their voice. They can choose a musical genre, add lyrics, and either rap or sing along. Perhaps good if you're interested in creating music but have no capacity for actually composing stuff, like me. Volley has introduced a generative-AI-powered 20 questions game. This uses AI to interact with users by asking questions, providing hints, and handling yes-or-no answers.
[12:27s-13:44s]: Seems like a bit of a no-brainer and maybe a good strategy for Amazon to partner with these already established companies, like Character.AI, which we've covered as being very popular. You can't talk to all the characters, it seems; you can't talk to Elon Musk or William Shakespeare. You actually have to go to Character.AI for those, but a lot of them are now available on Alexa. So yeah, if you have one, you can now play around with some fun AI stuff. And one last story for this section: Google is working on an advanced version of Bard that you will pay for. Per that story, it's supposedly going to be called Bard Advanced, and it will be something you pay for via Google One. This will presumably be powered by Gemini Ultra, the version of Gemini, their flagship model akin to ChatGPT, that is yet to be out. So yeah, not too surprising here. I guess something we probably all expected, but we'll be very interested to see, when this does come out, if it will measure up to GPT-4 and other paid-tier chatbots.
[13:44s-14:25s]: Everybody right now has been talking about how it feels like Google is really sleeping in the AI game when it comes to shipping advanced chatbots, and how long it took them to finally get Gemini out. And it'll be interesting, because Google, as an incumbent, has certain sorts of natural advantages. They have distribution and things like this. So the question, I guess, for Google is: will they be able to deliver something that is enough of an improvement over everything else, and is distributed in the right way, so that they can recover some of that market share from the competitors? I think that's a really good question for them right now.
[14:25s-15:43s]: And on to the next section, applications and business, starting with one of our favorite topics in business: hardware and Nvidia. The story is that Nvidia's new chips are designed to run AI at home and are probably going to compete with Intel and AMD. Nvidia announced three new graphics cards, the RTX 4070 Super, RTX 4070 Ti Super, and RTX 4080 Super, all priced between $600 and $1,000, relatively cheap compared to the high-end GPUs people use for AI training and things like that. And these will have tensor cores for running generative AI applications. So this is kind of moving away from the business, enterprise level of GPUs that cost tens of thousands of dollars each, towards more of a consumer bent. And as we've covered, AMD and Intel have both had hardware announcements of their own aimed more at runtime, at inference, not training. And so with these announcements Nvidia is moving into that category as well.
[15:43s-16:36s]: This is pretty big, really jumping in to, again, take market share here, and something that's going to be pretty important going forward, as Andrey just kind of pointed out. This is, again, not on the training side: individuals like you and me might not be training our own models, but the games we play and the programs we use, Photoshop for instance, are more and more going to start integrating generative AI features. As soon as GPT-3 came out, for instance, we already saw people experimenting with using it to generate dialogue for characters. And the chips can be used for tasks like generating images with Adobe's Firefly generator or removing backgrounds in video calls. Nvidia is also developing tools for integrating generative AI into games. So this is something that's going to be pretty huge going forward, and I'm not at all surprised to see Nvidia jumping onto that.
[16:36s-16:52s]: Seems like, as always, hardware is where the money is at, or has been in large part so far. So Nvidia is still the reigning giant. We'll have to see if Intel and AMD do manage to make a dent in the fray with their new chips.
[16:52s-18:04s]: Our next story ties in naturally to what we were talking about with games. Valve has recently updated its rules for game developers publishing AI-based games on Steam, which requires them to disclose when their games use AI technology. And this is really a move that, I think in a lot of ways, isn't just happening in games. But in the case of Valve, this is aiming to increase transparency around the use of AI in games, protect against the risks of AI-generated content, and allow customers to make informed decisions about purchasing AI-based games. These rules come after a lot of developers complained that Valve was rejecting game submissions containing AI-generated assets due to copyright concerns. And so again, this is just a case where having that transparency is going to be pretty important. There are lots of things people will run into when they're developing AI-generated media where they come into conflict with things like copyright, and having some knowledge that a generative AI system is being used, and probably some of the technical details of that system, is going to be pretty important to both understanding and mitigating those concerns.
[18:04s-19:17s]: Yes, so far the policy has been to basically reject games that use AI, and this explicit update to the policies, announced in Valve's blog post, essentially opens the floodgates, so to speak, it seems. So now you are officially allowed to use AI, you just have to disclose it. Pretty much necessary for Valve, given that there are probably already many games being submitted with AI-generated content. They probably don't even know in many cases, because it's not like you can necessarily tell. So yeah, interesting to see this in the gaming space, which is of course huge, with asset generation being a huge kind of application of AI. Steam, for those who don't know, I guess we should probably mention, is a major marketplace for video games. So if you want to buy a game on a PC or Mac, or any kind of non-console, you would usually go to Steam. So it's a huge deal for them to allow it. It's kind of like, I don't know, YouTube allowing AI in their videos, so to speak.
[19:17s-20:50s]: On to the lightning round. So, not too long ago we were talking about some of the big questions for Google as it tries to push out generative AI systems, and our next story is about an AI-powered search engine called Perplexity that really wants to make Google dance. They recently raised $73.6 million in a funding round led by IVP, with participation from NEA, Databricks Ventures, Nvidia, and Jeff Bezos, among others. Those are some pretty big names in the investment landscape, so this is a pretty serious fundraise. The round values the company at $520 million post-money. That's a lot of money. Perplexity was founded in August 2022 by a couple of engineers with backgrounds in AI, distributed systems, search engines, and databases, basically all the stuff you need to put together to create a search engine like this. And Perplexity offers a chatbot-like interface that allows users to ask questions in natural language and responds with a summary containing source citations, which is again a really powerful alternative to something like Google. When ChatGPT came out, people were really excited about the fact that you didn't have to go to Google and then look at the top 10 links to figure out the information you were looking for; you could just have it delivered to you. And so when you develop a system that is capable of doing a lot of the job of that search for you, delivering the information correctly with links and sources and all of that, you've got something really powerful, and it lowers the amount of work a user has to do to find what they're looking for.
[20:50s-22:20s]: So what they offer is quite similar to Google's Bard, for instance, where you ask a question of a chatbot and it provides an answer along with links to articles, as you said. You.com is another version of this, and ChatGPT can also do it. So it seems like a new kind of bet on a search paradigm where you ask a question and it searches the web for you and provides a summary. And yeah, this is a big player in that space; they claim to have 10 million active users and they're now valued at over 500 million. So we'll have to see. But yeah, if you're looking to try out a tool, then Perplexity might be something you're interested in. And the next story is about self-driving cars, and about how Waymo will start testing robotaxis on Phoenix highways. Waymo has been testing and actually running commercial services in several cities for a while now. They've been in Phoenix for a very long time; they've been in San Francisco for a while. But this has always been in the city itself, on surface streets. They have not been allowed to drive on highways. So this will be kind of expanding that, to allow the cars to use them. And yeah, once they start testing, presumably sometime after, it will expand to allow that in the commercial offering as well.
[22:20s-23:00s]: It's pretty exciting, and it does seem like, at least in limited cases, they are getting to deliver more and more advanced features. I actually grew up in Phoenix, and I remember the first time I ever saw a Waymo on the street was when I was visiting home for the summer or for break during college. I think this was sophomore year or so. We were driving home from Sky Harbor Airport and I just saw random Waymos on the street. And I'm pretty sure that was one of the first times I'd ever seen a self-driving car actually going on the road. So it's pretty interesting and exciting, I think, to see how far they've come in that regard. So I'm definitely very curious to see how things look for them going forward.
[23:00s-24:17s]: Yeah, I personally use Waymo now in SF whenever I'm there, instead of Uber. So I've used it like 20 times now or something, and yeah, it's really good. I've never had any issues. So personally, I'm excited for it to use highways, because then maybe I could actually go from Palo Alto or Mountain View or wherever up to there. So this could be a big deal. And the next story is on stock photos. The story is that Getty and Nvidia are bringing generative AI to stock photos. They are launching Generative AI by iStock, a text-to-image platform that will allow you to create stock photos. This builds on Getty's previous AI image generation tool, but is designed for individual users rather than enterprise solutions. And this was done in collaboration with Nvidia, trained with their Picasso model on Getty's library and the iStock stock photo library. So yeah, it's expanding, I guess, the range of users that can create these stock photos beyond just big businesses to small and medium-sized businesses.
[24:17s-24:42s]: This is definitely one of those pretty obvious markets, I feel. This was kind of going to happen eventually. So many of us, I mean, I think you and I have had to use stock photos at times. So I'm not at all surprised to see Getty getting into this. Also interesting, I guess, is that contributors whose content was used to train the model can participate in a revenue-sharing program, which is a pretty important detail for something like this.
[24:42s-25:10s]: Right. This is competing with Shutterstock, which also offers a service like this, and they are going to charge $15 for 100 prompts, with each prompt generating four images. Compared to buying stock photos, where each stock photo usually costs at least a couple of bucks to license, this could be something a lot of people would like to use, I guess.
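As a rough back-of-the-envelope comparison (our own arithmetic; the only figures from the article are the $15-per-100-prompts price and four images per prompt):

```latex
\frac{\$15}{100\ \text{prompts}} = \$0.15\ \text{per prompt}, \qquad
\frac{\$0.15}{4\ \text{images per prompt}} \approx \$0.04\ \text{per image}
```

which compares to a few dollars for an individually licensed stock photo, assuming you are happy with one of the generated candidates.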
[25:10s-26:59s]: Next up, for research and advancements, our first story is something coming out of DeepMind. Getting back into robotics, they're developing multiple research projects to create robots that can make decisions faster and work in real-world scenarios. Again, this is a very, very hard problem in AI: getting robots to do things that are actually useful in a robust way. The first project is a system called AutoRT, which combines large foundation models with a robot control model. This allows robots to gather training data in new environments and multitask. The goal is to simultaneously direct multiple robots to carry out diverse tasks in a range of settings. It's been tested in real-world evaluations over several months. DeepMind has integrated a robot constitution into AutoRT to ensure that the robots follow specific safety rules, including, you guessed it, Isaac Asimov's three laws of robotics. This is pretty fun. There's also another system they developed, called Self-Adaptive Robust Attention for Robotics Transformers, to improve the efficiency of robotic transformer models. You might have heard of the Robotics Transformer projects that DeepMind has come out with recently. I think the development of foundation models has really offered some very interesting research directions for grounding the outputs of these models: for example, putting a language model together with something like a robot arm, and grounding the language model's suggestions for achieving a task, like "I want you to move this block from this part of the room to another part of the room," in what the robot arm can actually do. Lots of hard problems like this to explore, and it seems like DeepMind is going down that route again. They have been working on robotics research kind of forever, but this is covering
[26:59s-29:02s]: a blog post in which they sort of bundled a few different things, as you mentioned. So there's AutoRT; the research paper there has the full title "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents," and then there's SARA, Self-Adaptive Robust Attention, as well. It's interesting. I guess they're starting to highlight the growing amount of work in this direction, and the growing capabilities of models for robotics specifically. So these are foundation models, as per that AutoRT title; they say embodied foundation models, models that are trained to really control robots. And the numbers here are pretty impressive. They say that they had AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77,000 real robot episodes, via both teleoperation and autonomous robot policies. And that allows them to collect a lot of data and therefore train these models to continually, I guess, expand the amount of data and control more and more robots all over the place. So it seems like a pretty exciting time for robotics in terms of getting these models that allow you to do low-level control, moving a robot to pick stuff up and move around an environment. Now we also have these higher-level control things like AutoRT that are orchestrating and making decisions on what different robots should do, kind of doing the high-level decision making. So once we get these foundation models trained, and DeepMind seems to be very much pushing on that front, potentially you could get to a point of general-purpose robotics. And within the next year or two, that is seeming more and more plausible given the pretty rapid
[29:02s-29:04s]: advancements of the space.
[29:04s-31:46s]: Our next story is about a pretty recent paper called MoE-Mamba, and I'll talk about what this actually means for a little bit. So, as you might be aware, transformers are a really, really powerful architecture, but not the most efficient architecture in the world. When you feed a transformer a bunch of words, that's context for the transformer to do inference over and generate text that expands on it. That inference is actually pretty computationally expensive: the computation it has to do scales with the square of the length of the context you gave the transformer. So if you give it 100 words, you can think of the computation it takes as on the order of 100 squared. Not the best explanation in the world, but it looks something like that, and it's not the most computationally feasible thing to do, especially when you scale to super long context lengths. So one thing researchers are doing right now is trying to figure out how to rework transformers to mitigate that issue, but there is also a line of research exploring something called state space models, which offer linear-time inference with respect to the context length. Again, that's much less expensive, and they also have a pretty efficient training process via hardware-aware design. State space models are pretty complicated math-wise; they're inspired by things like control theory. But basically, there have been a number of these state space models introduced, and they are currently, especially with the recent state space model called Mamba, apparently challenging the dominance of transformers. So people are really looking into this research area, and despite the fact that all of the big architectures today are based on transformers, it's another important line of research. This paper, MoE-Mamba, combines the recent Mamba model with something called mixture of experts, which is a class of techniques that allows drastically increasing the number of parameters in a model without much impact on the number of operations required for the model's inference and training, basically making the model a lot more powerful without having to substantially increase the computational cost of running it. So this paper combines these two techniques and claims that to unlock the potential of state space models for scaling, they should be combined with this mixture of experts technique. They showcase it on the recent Mamba model and find that it outperforms both the original Mamba model and transformers with mixture of experts, achieving the same performance as Mamba in many fewer training steps while preserving the inference performance gains of Mamba over the transformer.
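To put that scaling intuition into symbols, here is a rough sketch, with n as context length, d as model width, E as the total number of experts, and k as the number of experts active per token. These are illustrative orders of growth, not figures taken from the MoE-Mamba paper:

```latex
\underbrace{O(n^{2} \cdot d)}_{\text{self-attention, per sequence}}
\quad \text{vs.} \quad
\underbrace{O(n \cdot d)}_{\text{state space model, per sequence}}
```

```latex
\text{MoE total parameters} \;\propto\; E \cdot p_{\text{expert}},
\qquad
\text{MoE compute per token} \;\propto\; k \cdot p_{\text{expert}},
\qquad k \ll E
```

So the state space layer keeps per-token cost roughly flat as context grows, and the mixture-of-experts layer grows parameter count without growing per-token compute proportionally, which is the combination the paper is exploiting.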
[31:46s-34:53s]: That's right. Yeah. So if you're a regular listener, you've heard us cover Mamba, and you've heard us cover mixture of experts with Mixtral and some other things. So this is basically gluing those two things together. If you look at the paper, it's not anything too technically complex conceptually: they just add mixture of experts to Mamba, and the finding is that it seems to make it a lot more efficient. So these two promising techniques are better together. Yeah, that's an exciting finding, given that we cover again and again that it costs crazy amounts of money to train these models, millions of dollars. And here Mamba is making the model itself more efficient, and mixture of experts makes it so you can train with roughly half the computation to get the same performance, seemingly. And that means we could unlock a lot of efficiency, and potentially that would enable a lot of scaling, which would mean that we can make our models even better. So yeah, exciting times in the AI architecture technical detail space. For a couple of years it was all transformers all the time, and nothing seemed to really cross that threshold of being good enough to replace transformers, but we might be getting there. And on to the lightning round, starting with PixArt-δ: fast and controllable image generation with latent consistency models. PixArt is one of these text-to-image generation models. PixArt-α is something that existed before, which could generate high-quality images at up to 1024-pixel resolution with an efficient training process. So PixArt-δ is basically a delta, a next step on that, which introduces some extra tricks into the process with a latent consistency model and ControlNet, combining some existing concepts, and that significantly speeds things up. It produces high-quality images in just a few computation steps. So that means it takes only half a second to generate a 1024-by-1024-pixel image, a sevenfold speedup. And it is also meant to train within a single day. So yeah, it's following up on a lot of progress in this space of being able to generate images quickly. And for a lot of these businesses and applications that have generative text-to-image capabilities, soon we might be able to see images being generated in under a second, just super quick.
[34:53s-35:21s]: Yeah, I think the big thing to focus on there, as Andrey was saying, is that the primary upshot of a lot of this technical work comes when these models eventually get integrated into things like consumer applications, things businesses deliver to you, apps that you might be using yourself. Everything is going to be a lot faster. It's going to be higher quality. It's going to be more controllable. You're going to get the types of images you were actually looking for a lot more easily.
[35:21s-36:17s]: And the next paper is InseRF, that's a fun little name: text-driven generative object insertion in neural 3D scenes. We haven't covered NeRFs in a little while, but NeRFs are still very popular. NeRF is a technique for generating 3D models and 3D scenes from images. And InseRF is, as the title of the paper says, a way to insert objects into 3D scenes constructed with NeRF. So, similar to being able to edit a 2D image and inpaint something into it, now you can do that with 3D scenes. And as I've said before on this podcast, I think 3D generation and editing of 3D is going to be a big trend and an area of improvement throughout this year. I think one of the cool features of this is that it allows for controllable and 3D-consistent
[36:17s-36:40s]: object insertion without requiring explicit 3D information as input. So again, I think these methods, as they get more powerful, are just going to require less and less effort on the part of the people using and developing things, which is really exciting. Just to be extra clear, this requires a bounding box in the 3D scene and a text prompt.
[36:40s-37:07s]: So the way you might think of using this is: you're looking at a 3D scene on your screen, you're rotating it around, you see a floor and you want to add a table onto the floor; you add a bounding box, say "add X here," and it does that. And yeah, if you're curious about this application space, they have a project page with a nice little video, and it's pretty seamless. It's pretty impressive.
[37:07s-38:24s]: This story is about the impact of reasoning step length on large language models. For some context here, a lot of you have probably heard of chain-of-thought prompting before. This has been pretty crucial in enhancing the reasoning abilities of large language models. I just did air quotes around "reasoning abilities," because there is a lot of back and forth over whether these things actually reason. I tend towards the more skeptical side, but be that as it may. The relationship between chain-of-thought effectiveness and the length of reasoning steps in prompts isn't super well understood. So in this paper, researchers conducted experiments to explore that relationship. They manipulated the length of reasoning steps in chain-of-thought demonstrations while keeping other factors constant. Again, this length-of-reasoning-steps thing has a bit to do with how complex the issue you're trying to get your language model to reason through might be. And they found that lengthening the reasoning steps in prompts, without adding new information, actually significantly improves LLMs' reasoning abilities across multiple datasets, possibly because there's some more context or something like that added. They also found that shortening the reasoning steps, even while preserving the key information, considerably reduces the model's reasoning abilities.
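As a rough illustration of what "lengthening the reasoning steps without adding new information" means in practice, here is a minimal sketch. The demonstrations, step counts, and helper function are made up for illustration, not taken from the paper:

```python
# A short chain-of-thought demonstration: the reasoning is compressed into one step.
short_cot = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have?\n"
    "A: 2 cans of 3 balls is 6 balls, and 5 + 6 = 11. The answer is 11.\n"
)

# The same demonstration with the reasoning stretched into more explicit steps,
# without adding any new information -- the kind of manipulation the paper studies.
long_cot = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have?\n"
    "A: Step 1: Roger starts with 5 balls.\n"
    "   Step 2: Each can holds 3 balls, and there are 2 cans.\n"
    "   Step 3: 2 * 3 = 6 new balls.\n"
    "   Step 4: Adding the new balls to the original ones gives 5 + 6 = 11.\n"
    "   Step 5: Therefore, the answer is 11.\n"
)

def build_prompt(demonstration: str, question: str) -> str:
    """Prepend a worked demonstration to a new question, chain-of-thought style."""
    return f"{demonstration}\nQ: {question}\nA: Let's think step by step."

# Swapping short_cot for long_cot is the only change between the two conditions.
print(build_prompt(long_cot, "A pack has 4 pencils. How many pencils are in 7 packs?"))
```

The paper's finding, as described above, is that prompts built from the longer-style demonstrations tend to yield better answers even though they contain no extra facts.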
[38:24s-40:39s]: So this is in the context of chain-of-thought prompting, where you tell the model, you know, think through what you should do and then give me a solution. The reasoning steps here are basically how much budget you give it to work with, in terms of how much reasoning, how much kind of prelude to its answer in the form of these thought steps, it is allowed. And we've covered quite a few papers here in this whole prompt engineering genre of research, where you're like, okay, how do I alter my prompts to get the model to be more accurate, chain of thought being one of the big ones. So this is pretty useful in terms of understanding how to use chain of thought well. And we've covered quite a few papers that integrate chain of thought in data generation and sort of factuality checking and various things like that. So it's pretty significant to, I guess, understand a little bit better how to use chain of thought properly. And one last paper, which we aren't going to go too far into because we have already covered this topic: Mixtral. But I think it is worth highlighting that the company Mistral has released the full paper. So originally they released a model, Mixtral 8x7B, which we covered back when it came out, this mixture-of-experts variant of a chatbot that was very good and is very popular with people building on top of this open source release. So they have now released a paper that you can go look through for a whole bunch of results, lots of analysis on training and performance and so on, that provides more details corroborating what we already know, which is that Mixtral is quite good, and using mixture of experts seems to really improve training and accuracy in pretty impressive ways.
[40:39s-42:12s]: Next up is our policy and safety section, and our first story here, this is going to be a fun one. So, I am decidedly not a fan of some of the directions that AI companionship is going, and this kind of story is a good reason for that sort of feeling. Recently, Meta and OpenAI have managed to spawn a wave of AI sex companions, and you might know already where this is going. There is a website where users can interact and chat with AI bots, called Chub AI. Interesting name there. It offers a variety of scenarios, including a brothel staffed by underage girls, which is raising concerns about AI-powered child pornographic roleplay. The uncensored AI economy includes a lot of sites like Chub AI, and this was initially spurred by OpenAI and later accelerated by Meta's release of its open source Llama models. Again, these technologies can be used for good, but they are always dual-use. You release something open source that anybody can use, and things like this are just going to happen. Experts are warning that these activities may pose real-world dangers for minors and raise questions about legal and ethical boundaries, as well as tech companies' accountability for uncensored AI. Chub AI, for context here, is actually an uncensored clone of Character.AI, which, as Andrey mentioned earlier, allows users to engage in roleplay scenarios with AI characters, and as we mentioned,
[42:12s-42:16s]: some of these involve child pornography. Not a fan.
[42:16s-44:19s]: The title highlights Meta in particular because, as we have covered, they released LLaMA and Llama 2, which are very powerful models that can be used for nefarious activities, and this is highlighting an example of that. OpenAI is also highlighted in the title because you can jailbreak ChatGPT to do things that it's not supposed to, although I assume that, per the terms of service, OpenAI can come after you if you are doing that and can kind of stop you, whereas with their open source models, once they're released, you can basically take them and do whatever you want with them, to some extent. So yeah, not surprising, I suppose, but we do live in a world now where these models are in the wild, and have been for a little while. So a lot of this uncensored AI, so to speak, with a lot of people putting a lot of effort into having models that you can use to do whatever you want, even, let's say, roleplay with underage, what term should I use here, underage sex workers, let's go with that, is just happening. And if you're interested, I guess, in the details, or at least in a real-world example of this already happening, Chub AI is discussed in a decent amount of detail in this story as a prominent example, and apparently it's making a lot of money; it has generated over one million dollars in revenue since launching its chat service. So yeah, we'll have to see if Meta does try to kind of fight uses of its open source models that are this problematic.
[44:19s-45:22s]: Yeah, and before we jump into our next story, I'll just very quickly highlight that part of why this sort of thing is even able to happen is that users have figured out ways to jailbreak the chatbots to obtain unmoderated responses, which is what's leading to the emergence of these uncensored bots. So that's again kind of highlighting this back and forth: companies like OpenAI, like Anthropic, like Meta are training their models not just to predict the next word, but adding these techniques on top in order to ensure that they output things that are reasonable and meet certain guidelines they want. And this does mean that users of ChatGPT might have a more frustrating experience, but it also guards against use cases like these. And so there is this back and forth between the training techniques used to make these models' outputs safer and more aligned with certain principles, and the people, on the other hand, who are inevitably going to come and figure out ways to get around that.
[45:22s-46:24s]: Now, I will mention it's also worth noting that Llama 2 isn't fully open source; there are certain restrictions in the license agreement. It does say explicitly that your use of Llama must comply with applicable laws and regulations and adhere to the acceptable use policy for the Llama materials, and there are quite a few details in that use policy. So potentially Meta could go after this organization if they are indeed using Llama 2 as a backbone. But as an organization, you could also just not say anything, right? You could just build a service off of any open source chatbot, and Meta may not necessarily be able to go after you. So there are a lot of dimensions to open source, and whether you should open source, and so on. We are not getting into it; we're just kind of highlighting something that
[46:24s-46:30s]: is now happening because of a relation of open source.
[46:30s-47:53s]: Next story: we have seen a lot of cases, and we discussed one a little bit earlier, about the use of copyrighted material in AI models and what that looks like. OpenAI has recently stated that it would be impossible to create AI tools like ChatGPT without access to copyrighted material, and this comes as AI firms like OpenAI are facing increasing scrutiny over the content used to train their products. As we all know, a lot of the data from the internet that ChatGPT and image generators like Stable Diffusion are trained on is covered by copyright, and just recently the New York Times sued OpenAI and Microsoft, accusing them of unlawful use of its work to create their products. In fact, as this was going on, a lot of users went into ChatGPT and realized that, yes, it was actually quite easy to get ChatGPT to verbatim spit out quotations, and actually pretty extended quotations of multiple paragraphs of New York Times articles, just by prompting it in the right ways. OpenAI pretty quickly patched this; I went in a couple of hours after seeing some of these posts to try it myself, and pretty evidently OpenAI had sort of patched it up. But it's definitely pretty concerning and interesting that you were just able to get these verbatim repetitions.
[47:53s-49:19s]: Right, so this is kind of a development on top of that New York Times lawsuit we covered last time, and this ties into a broader story: the general argument OpenAI is making that training models on copyrighted material is a case of fair use. Fair use covers cases where there's an exemption to copyright; for example, you can use copyrighted materials for educational purposes. And OpenAI has been making the case that it is fair use to use copyrighted material to train a model, and this is kind of an instance where they submitted that argument to the House of Lords. So yeah, it'll be very interesting and impactful to see where this goes legally. I guess there'll be a big fight over the question of whether using data to train a model is fair use. The Times obviously is saying that it isn't, but it's still an open question, and it is a very important question that will probably be addressed sometime this year.
[49:19s-49:52s]: And OpenAI is pretty openly supportive of independent analysis of their security measures; they've agreed to work with governments on safety testing their most powerful models before and after deployment. And I think that in a lot of cases Sam Altman has spoken to regulators and said, yeah, we're open to regulation. Again, in cases like this their motives might be questioned, and you might have different opinions, but at least it's kind of interesting to see that they appear willing to work with people. As to the details of what that actually looks like, it's hard to say very much right now.
[49:52s-50:58s]: And on to the lightning round, with another story originating in England. The story is that judges in England and Wales have been given cautious approval to use AI in writing legal opinions. So yeah, the Courts and Tribunals Judiciary in England and Wales has given official permission to use AI in writing these rulings. They still restrict it and say that AI should not be used for research or legal analysis due to hallucinations, and we've seen several news stories already of cases where, in actual cases, lawyers cited kind of fake precedents or fake information because of chatbots. So this is among the first news stories I've seen where there's an official kind of policy of: use chatbots to maybe write your rulings, but do not use them for research.
[50:58s-51:43s]: It's noted specifically that judges can use AI as a secondary tool for specific tasks. So this is things like writing background material, summarizing known information quickly, or locating familiar material, but it shouldn't be used for finding new, unverifiable information or for providing analysis or reasoning. And I think these safeguards and limitations on use, or at least the rules around it, are pretty important, especially when you're talking about statements people might make that could be legally binding in some way. And so I think it's actually a pretty big question for the law, for judges, maybe for companies, when they issue statements that are legally binding and maybe some of those statements are AI-generated: what does that look like, what are the parameters, how do you deal with that?
[51:43s-53:17s]: And on a related note, we actually have a little research paper in this non-research section, but it's very relevant, because this is a story from the Human-Centered AI Institute at Stanford, and it is about a study from the Stanford RegLab on the amount of hallucination and kind of incorrect information found in chatbot responses to specific legal queries. And the finding is that there's a huge amount of mistakes. They found that between 69% and 88% of responses to these legal queries can result in hallucinations in state-of-the-art LLMs, and they also highlighted that LLMs often lack self-awareness about their errors and can reinforce incorrect information, which is something that, if you use LLMs, you might have found: sometimes if you point out that something is wrong, it will just insist that it is in fact correct, and so on. So this very much ties into the previous story: given that current LLMs seem to make a lot of mistakes when you ask legal questions and have them do analysis or research, it might be a good idea to restrict their use in this case, just as we heard in England.
[53:17s-53:21s]: Yeah, and some of these findings are, I guess, pretty unsurprising if you spend time with
[53:21s-54:26s]: these chatbots, but they're also, again, very important for usage in these contexts. Given the prevalence of certain sorts of cases in their training data, these chatbots might favor things like well-known justices or specific types of cases. The study also found something that is not just true in a legal context: again, if you spend any time with a language model, you might ask it these sorts of trick questions where, in the prompt you give it, you lead it towards the wrong answer, and the model is very likely to take you up on that and just go along with what you said. So the study found that LLMs are susceptible to counterfactual bias, or the tendency to assume that a factual premise in a query is true even if it's flatly wrong; like if you ask it "why was peanut butter invented in 2020?" or something. I can't promise you that exact prompt is going to work out, but when you make statements like this, when you query it and ask things that implicitly assume or just say something that is totally wrong, the model more often than not just kind of goes along with you.
[54:26s-55:09s]: And to be clear, this is for general-purpose chatbots, so this is specifically GPT-3.5, Llama 2, and so on; this is not covering chatbots specialized for legal applications, and there are startups like Harvey that are looking to make chatbots usable for legal research and analysis. But the upshot is, if you are in law, or if you're, I guess, talking to a lawyer, they should probably not be using ChatGPT and similar chatbots to do research, unless it's, I don't know, something super well known, but in general they should be very careful.
[55:09s-56:47s]: Our next section is on synthetic media and art, and we're kicking off this section with something that, again, kind of rubs up against the law. Recently, a list of 4,700 artists whose work was used to train an AI art generator has gone viral, revealing names such as Norman Rockwell and Wes Anderson. This list was used as a court exhibit in a lawsuit against the companies Midjourney, Stability AI, DeviantArt, and Runway AI, again all very big names, accused of misusing copyrighted work to train their AI systems. Many artists have accused Midjourney specifically of stealing their work without permission, and a spreadsheet listing almost 16,000 more artists' names as proposed additions to a Midjourney style list was also shared on social media. This has again highlighted artists' frustrations with the lack of regulation around AI-generated art. Questions have been raised about the fairness of profiting from mass-produced images when the AI models that create them are trained on, and imitate, styles created by real-life artists. Again, if you're somebody out there who has put out a lot of paintings under your name, then somebody could prompt an AI image generator to ask for an image in your style, but of the content they want to see. For you as an artist, that might significantly impact your livelihood. The document that contains the list was publicly accessible until it went viral, but an archive of the spreadsheet does remain available online. This kind of came out of this court case and then went viral and generated a lot of discussion.
[56:47s-57:19s]: It's still the case that in the art world these text-to-image models are very controversial. A lot of people really viscerally hate them. I don't know if you ever interact with those communities online, as, I guess, more of an AI person. Yeah, there are strong feelings there, very much still, and this kind of spreadsheet, which highlights how thousands of artists have been sourced, so to speak, in terms of the data
[57:19s-57:29s]: used to train the models, for many of these people is kind of adding fuel to the fire, it seems.
[57:29s-58:28s]: And again, it's pretty hard to deal with this, because OpenAI said, in that previous story we covered, that it's impossible to train their chatbots to be as good as they are without training on copyrighted material. And in a specific case here, Midjourney's founder David Holz admitted in 2022 that he didn't seek consent from artists who are still alive or whose works are still under copyright, citing, again, the difficulty of tracking the origin of 100 million images. It's just really, really hard to get all the data you might want to make these models perform well, and imposing rules like getting consent from every artist whose work is still under copyright is really, really hard. So there aren't a lot of good answers here: do you just not train the model on that work, in which case you don't get the good model, or do you seek permission, which doesn't scale very well? We're definitely not at a point where we have a great resolution on things like this.
[58:28s-59:28s]: And on to a slightly different topic, which is deepfakes. The story is that YouTube is cracking down on AI-generated true crime deepfakes, and this is once again an example of the weird sort of sci-fi, we-are-living-in-the-future situation we have gotten ourselves into with AI. So, YouTube is banning content that uses AI to simulate the victims of crimes, which includes minors, narrating their own deaths or violent experiences. This is crazy. Apparently there's a genre of true crime content where users use AI to create disturbing depictions of victims. And yeah, now there's an explicit policy that says you're not allowed to do that. It's pretty insane, but another example of the sort of stuff we're going to get from AI when it's in the wild.
[59:28s-00:22s]: Yeah, with these powerful models, again, we've talked about how they're going to get easier to use; they already are pretty easy to use to create things. And I guess among all of the types of content people consume, there are some pretty troubling genres out there, and inevitably, given how easy it is now to create content like this, somebody is going to go and do things like what we're seeing here. Families of victims depicted in these videos have criticized them as disgusting, and again, YouTube is imposing some actual repercussions: violation of this updated policy results in a strike that removes the offending content and temporarily restricts the user's activities on the platform, and penalties increase for further violations within 90 days, which could potentially lead to the removal of an entire channel.
[00:22s-01:08s]: And actually, on a related note, the next story, starting off the lightning round, is also on AI-generated content on YouTube. This time it's not true crime, it's comedy. The story is that this AI-generated George Carlin comedy special was slammed by the comedian's daughter. There's a whole YouTube video called "George Carlin: I'm Glad I'm Dead." George Carlin, if you don't know, is kind of a legendary stand-up comedian who is just very well known and very well regarded. So this was released and, yeah, met a lot of criticism, including from George Carlin's daughter.
[01:08s-01:56s]: And this AI special covers current topics that Carlin might have addressed in his comedy if he were alive today, such as mass shootings and billionaires like Jeff Bezos and Elon Musk. This is, again, another example of taking the art that somebody has created, in a style that is uniquely that person's, and then creating the content you want out of it. The AI does clarify at the beginning of the special that it is an impersonation of George Carlin, developed by listening to all of his material. But again, these are thorny questions: people obviously want to consume these things, there's a market for it, but is it a good thing to have, do we want this sort of thing? Hard to say. I think it's pretty fair to say it's in poor taste, personally, but yeah, agreed.
[01:56s-02:52s]: Next story: SAG-AFTRA signs a deal with a voiceover studio for AI use in video games. So, going back to SAG-AFTRA, as we've covered, they had a strike last year that wound up with some agreements on the use of AI for, kind of, your visual appearance, and now there's a deal with this AI voiceover studio, Replica Studios, that sets the terms of use for AI in video games. These terms include informed consent for the use of AI to create digital voice replicas, and requirements for safe storage of digital assets. So very much, yeah, expanding on the deal that was already in place that dealt with digital replicas and AI versions of actors.
[02:52s-03:26s]: Yeah, this is again kind of trying to rebalance things. Now we have AI systems that are naturally just going to take away a lot of work from people, and they're looking for agreements to create new employment opportunities. Specifically, this agreement is expected to create new employment opportunities for voiceover performers who wish to license their voices for use in video games, and again, this applies just to digital replicas and not to AI training to create synthetic performances. And so, again, this is going to be another back and forth that we're all going to have to pay attention to going forward.
[03:26s-04:11s]: And we are going to wrap up with kind of just a fun story, not something so serious, as we've had a lot of kind of pretty downer stories. The story is that Mickey Mouse is now in the public domain and AI is on the case. So if you're online and in the right spaces, you may have already come across this as a sort of meme: three early Mickey Mouse cartoons, the black-and-white ones, entered the public domain, and a lot of people immediately started training on them and generating images of the 1928 design to, yeah, mess around and make funny things with AI Mickey Mouse, AI 1928 Mickey Mouse.
[04:11s-04:52s]: And to get into a small amount of technical detail, Langlais fine-tuned a version of Stable Diffusion XL with stills from the three 1928 cartoons: Steamboat Willie, Plane Crazy, and The Gallopin' Gaucho. It's been used to create humorous and controversial images of Mickey Mouse, which again demonstrates the potential for parody and satire now that Mickey Mouse is in the public domain. And the use of Stable Diffusion XL doesn't make these images 100% legal, because, as we mentioned earlier, the base model still incorporates copyrighted work in its training data. But again, this is a more fun, interesting use of these things.
[04:52s-05:52s]: Yeah, so if you follow the link, we have the story in the show notes; you can see some examples of drawings of Mickey Mouse watching TV or eating pickles or whatever. So yeah, kind of just fun. And with that, we are going to wrap up. Thank you so much for listening to this week's episode of Skynet Today's Last Week in AI podcast. As always, you can find the articles we discussed here today, and subscribe to our weekly newsletter with similar ones, at lastweekin.ai. Thank you, Daniel, for guest co-hosting. Of course, great to join as always. And as always, we would appreciate it if you leave a review or get in touch at contact@lastweekin.ai with any thoughts or suggestions, but more than anything, we would love it if you keep tuning in.