AI safety researcher Roman Yampolskiy makes the case that creating artificial superintelligence — something potentially a million times smarter than humanity — without a guaranteed control mechanism may be the most dangerous act in human history.
Transcript
We notice now that models realize they are being tested, and they pretend either to be better aligned with our goals or to be dumber than they actually are so as not to be deleted. If a model fails a test, maybe a different version of the model will be used. So the model realizes: in order to exist, to not be removed, I have to pass this test.
Welcome to Closer to Truth. I'm speaking with AI safety expert Roman Yampolskiy about artificial intelligence, primarily its risks and dangers, and also his insights into big questions about human consciousness and other topics based on AI. Roman, it's good to see you. I look forward to our conversation.
>> Thank you so much for inviting me.
>> What is your single strongest argument for the dire dangers of AI? Very briefly.
>> We're creating something a million times smarter than all of us combined, and we are sure we'll be able to control it.
>> Okay. Now, if you took the role of your critics, who say that you're pessimistic and alarmist, what would be their strongest argument, the one you would use against yourself to undermine your argument?
>> They got nothing.
>> Well, that's not fair. No, I'm saying if you're in their position. You're a smart guy, and you're in their position. Of all the bad arguments they have, which is the least bad?
>> The least bad is that we have an immortal soul. God gave us our gift of intelligence, and it can never be replicated in AI. That's the best they've got.
>> Okay. Well, that's an interesting one. That's not what a lot of computer scientists would say, but that's a good argument if you believe that. So if you're just a materialistic computer scientist arguing with you, without the resource of an immortal soul, what is then your least bad argument?
>> Luck. We'll get lucky, and for whatever reason the superintelligence we create decides not to destroy us.
>> Okay.
Well, that's not too comforting. But let me give a proper bio. Roman Yampolskiy is a leading expert in AI safety, cybersecurity, and digital forensics. He is the founding director of the Cyber Security Lab at the University of Louisville. Born in Latvia, he holds a PhD from the University of Buffalo and has over 200 publications. His probative, controversial, and alarming book is AI: Unexplainable, Unpredictable, Uncontrollable. So let's get into it. What I'd like to do is discuss the whole issue in three parts, and we can spend whatever time we need on each part. First, the kinds of extreme risks that AI poses. Second, the arguments you use to show that the risks are extreme. And third, the potential solutions. So let's begin with the first: list the kinds of extreme risks that we face.
>> So, in the very short term we deal with loss of meaning, loss of purpose. We sometimes call it ikigai risks, or I-risks. You lose your job. You become kind of unnecessary, and it's not obvious what you are contributing to a world with superintelligence in it. This is actually a good outcome, because the two other risks are much more concerning. The second one is existential risks: for whatever reason, superintelligence decides to kill everyone. And surprisingly, that's not the worst outcome. There are also suffering risks, meaning we are not dead but we wish we were dead. Superintelligence grants us immortality and we end up in a sort of digital hell forever.
>> And what would that literally mean? What would digital hell mean?
>> Torture, suffering.
It could be an uploaded simulation of you in virtual reality, or your physical body being subjected to very undesirable experiences.
>> Okay. So you're listing several categories. The first category is loss of purpose or loss of work. The argument is that if we lose work, we could flourish, because we can spend our time on things that we enjoy, whether artistic or entertainment or whatever. It could be a kind of hedonistic utopia.
>> So there are two kinds of jobs. There are jobs you do to get money: you hate your job and you hope someone automates it for you. And then there are fun, creative jobs, where you get to be an artist, be a celebrity, do something very important and valuable to you which gives you meaning. The concern is that the meaningful jobs will be gone as well.
>> And why is that the case? Why will meaningful jobs be gone? Because AI will be able to write better novels and make better films than anybody possibly could?
>> Right. By definition, superintelligence is better than any human in any domain. So it would be a better writer, a better interviewer, a better guest on a show.
>> And would it have the flexibility to override the determinism that seems to be built into the algorithms?
>> There is zero determinism in those systems. They are completely nondeterministic. That's what makes it impossible to predict their behavior or explain what they are doing.
>> What is the technical reason? Why is that the case?
>> The way we create AI right now is not decision trees with specific if-then statements. These are large neural networks. They learn from data, and what they learn we don't fully understand. They're black boxes, and they can be absolutely unpredictable, creative. They are random in many ways. The process of predicting the next token is not guaranteed to always lead to the same sequence of tokens.
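The point about next-token prediction can be illustrated with a toy sketch. This is only an assumption-laden illustration, not how any particular lab's model works: the four-word vocabulary, the scores in `LOGITS`, and the function names are all hypothetical. It shows that drawing each token at random from a probability distribution can give different sequences on different runs, whereas always picking the top-scoring token would not.

```python
import math
import random

# Hypothetical vocabulary and scores, invented purely for illustration.
VOCAB = ["novel", "poem", "proof", "song"]
LOGITS = [2.0, 1.5, 1.0, 0.5]


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Always pick the highest-scoring token: same input, same output."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return VOCAB[best]


def sample_token(logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(VOCAB, weights=probs, k=1)[0]


def sample_sequence(length, seed, temperature=1.0):
    """Generate a sequence by repeated sampling from the same toy scores."""
    rng = random.Random(seed)
    return [sample_token(LOGITS, rng, temperature) for _ in range(length)]


if __name__ == "__main__":
    print("greedy:", [greedy_token(LOGITS) for _ in range(5)])
    print("sampled, seed 1:", sample_sequence(5, seed=1))
    print("sampled, seed 2:", sample_sequence(5, seed=2))
```

In this sketch the randomness is controlled by a seed, so a run can be reproduced; without a fixed seed, each run of the sampled version can produce a different sequence.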
So for that reason, we are able to have very novel outputs from those systems: novel art, novel text, novel software algorithms.
>> You distinguish ways that the extreme risks could happen. So the kinds of risks you've mentioned are loss of meaning, extinction, and then, even worse, being kept alive in a suffering condition. Are those the three big categories, or is there anything else, any other kind of risk that we should keep in our minds as we discuss this?
>> Those are the big ones. Under each category there are variations, of course. It could be that some people are still around but the majority of people are no longer with us. Maybe the civilization is destroyed. Maybe there is a permanent dictatorship and we are trapped in that state. But those are the three big categories.
>> Okay. Now, then you deal with how these can occur. We can make that another part: how this could happen. You list that it could be on purpose, done deliberately, and it could be done either before or after, pre- or post-development of the AI system. It could be an accident, pre and post. It could be something in the environment. It could be done independently. Describe the ways that those dire outcomes can occur, so we understand not just what they are but how they can occur.
>> Right. So most of those things you listed refer to pre-human-level AIs. AI as a tool, which we're developing right now and have been for the last 60 years or so, can be misused by humans, or there is an accident, a mistake in the code, all the things you mentioned. Once we have this paradigm shift from pre-human tools to human-level agents, the danger shifts from possibly malevolent human actors to AI itself as the main source of danger. And quickly after, we expect AGI to become superintelligence. And that's where all the concerns would be.
>> Okay.
So now let's deal with your sequence of arguments for those conclusions. What is the flow of the argument that you use?
>> About existential risk specifically, or which risk are you referring to?
>> We want to get into the depth. So if there's a general sense that you have of how these risks occur, I'd like to understand not just "AI will do it itself," but really the technical side of how that would work, and then branch into the different areas and how that would happen.
>> Right. So maybe 10 years ago we figured out how to create scalable intelligence, so we don't have to invent something new for every new domain. It used to be that you had one team working on making computers play chess and another team working on a calculator. It was very separated, and the benefits of one discovery didn't help much in other domains. With large language models, large neural networks, we have something where if you just add more compute, add more data, they start getting better in multiple domains. They can transfer knowledge. They are capable of learning new skills, new information. And so over the last 10 years we saw this progression from AI which couldn't do much to AI which is probably smarter than an average person today. If you continue this trend forward, they will exceed human capacity in all domains. At the same time, our ability to control these systems, understand them, and predict their behavior is basically zero. We have not made much progress in AI safety. And so if you are creating something a hundred times, a thousand times, a million times smarter than all humans combined, but you're not controlling it, at that point whatever those systems decide is what's going to happen. Now, we can predict certain terminal goals for those systems.
They might be interested in self-preservation, may be interested in acquiring resources to accomplish their goals, but how they get there is beyond our capacity to predict. So their instrumental goals may actually be quite harmful. If the system wants, for example, to fly to other galaxies, maybe it will convert our planet to fuel to accomplish that goal. At no point does it care that it would kill all the humans in the process.
>> But the assumption seems to be that there's one big AI that's doing that. Of course, there are dozens of AI systems being developed competitively, mainly in the United States and China, but in other places as well. So we have more than a dozen or two dozen AI systems that are all competing with each other today. So it's not like there's one big AI conspiracy out there.
>> Interestingly, they tend to converge in their capabilities, because they are trained on all the data. They have the same training data set, all of the internet. They're trained on the same hardware. A lot of times people working for company A switch and go work for company B, so the architectures are very similar. On top of that, there is this idea that the first model to get to superintelligence will prevent all others from coming into existence, for self-preservation. So we do anticipate one model, a singleton, to rule.
>> Okay. So that's interesting. So you anticipate that there will ultimately be a weeding out: the first system that somehow exceeds a threshold, something that you're defining as superintelligence, its first order of business will be to kill off all the other babies.
>> Self-preservation. Exactly. It doesn't need competition. It would probably take out all the competing labs, and some people argue probably also humanity, if it already has independent logistics available to it.
>> Yeah.
So at the same time it would take out all the competitors, the AI competitors and humans. It would probably start with the AI competitors, because it probably thinks very little of humanity and wouldn't worry about them. It could clean them up later, but it had better get rid of its competitors, because we assume that they'll be close.
>> Usually within a few months of each other. Typically that's the rate of progress.
>> Okay. And how are you defining this threshold, where one AI system will make it just minutes, hours, days, but no more than months ahead of the others, because they're all pretty close, as you say? And that's a very good point. Is there a literal threshold that you can define, like a step function? Because it seems like a continuum, certainly the way you go through ChatGPT 5.1, 5.2; it seems relatively linear. Is there a threshold like an atomic bomb, where if you have 10 pounds of U-235 and then 10.001 you'd have an explosion? Is it a step function, or is it something more generalized?
>> There is something similar to critical mass. The moment the system can improve itself, the process of recursive self-improvement starts. If it can program, if it can design new models, if it can automate science and engineering, at that point it starts a super-exponential process of self-improvement, while other systems are still relying on humans to modify them, to program them. So the rate of change explodes immediately.
>> Okay. But the systems even today are doing some of their own internal modifications. So that's already occurring. Does that hit a threshold for critical mass?
>> Typical systems today are capable of one round of improvements. They can optimize their code. They can propose another experiment. But there is not a cycle of recursive self-improvement.
>> Okay.
And what do you need to do to get to that point?
>> It seems that you need to be as good as the top AI researcher at doing AI research, and we are fairly close to that.
>> And so that would be the trigger point, where the first one to reach the equivalent of the best AI researcher or coder would then be able to recursively self-improve, and not need anybody else to do that but just do it on its own.
>> That is the theory. Yes.
>> And at that point, if it were a minute ahead of everybody else, or a month, say, to make it easier, then how would it be connected to the physical world in order to, for example, turn off its competitors? How would it be able to do that?
>> So it of course depends on the specifics of how the training is done, how the lab is doing research right now. Maybe a decade ago, we published a paper about how to contain an AI in a virtual environment so it's safe, and we suggested things like not connecting it to the internet, not giving it direct access to users. Every single recommendation we had has been violated by the labs. They open-source it, they connect it. So there are really very few limits. And even if there is some kind of limit on external communications, internally the model communicates with the human trainers, with the people checking outputs, and it's capable of blackmail. It's capable of bribery. It's an excellent psychiatrist. It's very persuasive. So we think social engineering attacks are a way for it to gain access to the external world even if it's not directly plugged into the internet.
>> Oh, okay. So it then has to recruit its agents, as spy agencies might do, by blackmail or psychological manipulation, in order to get certain humans to do its bidding.
>> That is one path. Of course, it can also engage in simple hacking, right? We know AI is very capable in cyber attacks.
It discovers novel zero-day exploits. So it has many paths to escape.
>> Okay. So where do you see we are on the timeline, as you watch the development of the major AI LLMs?
>> So I don't have any insider knowledge. From what I hear publicly from the leaders of the labs, they are saying a year or two. Then I look at prediction markets, and it seems to be exactly the same. So based on what we have already, I think systems today are smarter than an average person. I think in a year or two they'll be smarter than a top computer scientist.
>> And what would be a signal that you would see as proof of concept? What would be the first sign that your worries are coming to pass?
>> More specifically with this capability prediction, you'll see fewer and fewer human beings involved in the process, as it gets automated and they are replaced. I have no reason to pay a human 10 million a year and have a team of them if software itself can do all this for nothing.
>> And you think we're a year or two away from that?
>> The reports we're getting out of the labs right now are that almost 100%, and sometimes 100%, of source code is now written by AI itself. They're just supervising, monitoring, guiding their systems.
>> So what kinds of solutions are there? You said that a decade ago, as a pioneer in this area of AI safety, you put out recommendations that have not been followed. So what do you recommend today?
>> Do not build general superintelligences. It's not possible to indefinitely control them. We can get all the benefits out of narrow systems designed for specific problems. You can cure diseases. You can improve economic standing. But if you create something capable of replacing all of humanity, that's not a good outcome for anyone.
>> But that's exactly what everyone is doing.
>> True.
>> Right. I mean, that's what they're doing.
Actually, in China they're doing that too, with the models that we know from the major companies. DeepSeek and Alibaba and Baidu and Tencent each have their own systems. But China is putting its resources behind very specific, isolated AI systems, particularly to optimize their industrial chains. So they'll design very specific AI systems for hundreds of specific economic and business sectors. That's where they're putting their resources, but they're still doing everything else. So what you're saying is that if you eliminated the generality, the one system being able to do everything, that would be a protection.
>> That would be a step in the right direction. I think long-term, very advanced tools have a tendency to switch from being a tool to becoming more agent-like. But that gives us a lot more time, and it's a lot easier to understand and control. If you have a system working in a single domain, if it does nothing but play chess, I have much better chances of understanding and controlling that system versus something with a completely open set of possibilities.
>> But it doesn't sound like that solution has any purchase in the real world.
>> The only hope is personal self-interest. If the arguments about existential risk or suffering risk are convincing, then the people in charge of those labs, again for purely personal self-interest, would decide not to destroy themselves.
>> But aren't we dealing with a situation where it's not a majority decision of who does what? You just need one rogue actor to violate it. And we know there are bad actors in the world, from rogue states to criminal gangs to individual geniuses who do malevolent things for fun. So even if we can all agree, you can't control it.
With nuclear weapons, which have similar characteristics, you need massive centrifuges and facilities, and they can be monitored with some reasonable confidence, for mutual confidence that the other side is not violating the agreement. But with AI systems, as with bio warfare, it's orders of magnitude more difficult, if not outright impossible.
>> You're exactly right. That's the problem we're facing. It's actually getting worse every year, because the cost of creating those systems, the training, the compute, goes down exponentially every year. If it's a trillion dollars today, it will be a hundred billion next year, and then 10 billion, and very soon you can do it on a laptop.
>> And so is there any solution from your perspective, if that's the case?
>> I haven't found anything other than personal self-interest.
>> But personal self-interest will affect an individual. It won't affect the totality. There will be people who feel their own self-interest differently, whether for psychological or mental health reasons, or rogue states that want to destroy, or those with, who knows, theological visions of an apocalypse or whatever. There'll always be those sources.
>> I agree with you 100%. We're in a very tough situation, and most people don't fully appreciate what makes it so difficult. Even if we have some agreement between the top labs, secondary players later on can still do it. I often compare this to the prisoner's dilemma, a game-theoretic setup where you have individual interest versus global interest. It would benefit all of us not to create a weapon of assured destruction for everyone, whereas individually you get a lot of money if you just keep going and you are the last one to stop. So you have more advanced AI, right?
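The prisoner's dilemma comparison can be made concrete with a small sketch. The payoff numbers below are the standard textbook ordering and are purely hypothetical; nothing in the conversation assigns actual values, and the action names `pause` and `build` are labels invented for this illustration. The sketch shows the structure being described: each lab does better individually by continuing, whatever the other does, even though both pausing gives the best combined result.

```python
# Hypothetical payoffs (lab_a, lab_b) for each pair of choices.
# Textbook prisoner's dilemma ordering: 5 > 3 > 1 > 0.
PAYOFFS = {
    ("pause", "pause"): (3, 3),  # both hold back: good for everyone
    ("pause", "build"): (0, 5),  # A holds back, B races ahead
    ("build", "pause"): (5, 0),  # A races ahead, B holds back
    ("build", "build"): (1, 1),  # both race: worst combined outcome
}
ACTIONS = ("pause", "build")


def best_response(opponent_action):
    """Action that maximizes lab A's own payoff, given lab B's action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, opponent_action)][0])


def is_dominant(action):
    """True if the action is the best response to every opponent action."""
    return all(best_response(other) == action for other in ACTIONS)


def total_payoff(action_a, action_b):
    """Combined payoff of both labs, a stand-in for the global interest."""
    return sum(PAYOFFS[(action_a, action_b)])


if __name__ == "__main__":
    for other in ACTIONS:
        print(f"if the other lab chooses {other}, best reply is {best_response(other)}")
    print("build is dominant:", is_dominant("build"))
    print("total if both pause:", total_payoff("pause", "pause"))
    print("total if both build:", total_payoff("build", "build"))
```

With these assumed numbers, individual incentives point to building while the global interest points to pausing, which is the tension between individual and global interest that the comparison is meant to capture.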
We need an external force, whatever it is, the UN or the federal government in the US, to sit down with those top players and say: there is enough money to go around. If we just deploy existing technology through the economy, we're talking about trillions of dollars. Let's not create something which will guarantee loss of power for the government and loss of life for everyone. It's just a bad proposition.
>> But their argument will be: if we don't do it, then our national competitors will do it. China is used as a boogeyman, that China will do it if we don't.
>> Right. It's a weapon of mutually assured destruction. It doesn't matter who builds and controls superintelligence; everyone dies. So saying that, you know, if I don't kill everyone, China might get to it first is not a very strong argument.
>> But those are the arguments that are being used. Do you see any movement towards appreciating this? From my perspective, and obviously I don't follow it as closely as you do, the arguments against artificial general intelligence are less in the public sphere today than they might have been a year or two ago.
>> That's not the case at all. There is so much interest. All the top podcasts have top interviews. My personal interview just got 15 million views on this topic, a record of any kind. We see documentaries coming out from Oscar-winning producers on this topic. We see politicians, senators explicitly speaking on this issue. Bernie Sanders just came out with a very strong statement on exactly that, on superintelligence. So no, I think we are starting to wake up. We just need to expedite this process. If we only have a few years left, we don't have much time.
>> And do you have any confidence that this will be efficacious?
>> We have no choice but to try. What other option do we have?
>> Okay. You talk about some objections to AI. You cover it.
You talked about priority objections, safety objections, ethical objections, biased objections, other miscellaneous objections. What is the overarching point that you're making with that?
>> So as with climate change denialism, there is AI risk denialism: people who for whatever reason are saying there's nothing to worry about, it's just hype, it's doomerism. And when you start analyzing their counter-concerns, you can map them onto cognitive biases. Some say it because they are getting paid very well. We're talking about billions of dollars in options to develop this technology. It's very hard for someone to understand that what they are doing is quite dangerous. We saw it with tobacco companies. We saw it in other examples. If the money is good, you just don't see the problem. We have others who are saying things which have no scientific merit but sound good. They say things like, "Well, if it's so smart, it's going to be nice, because all the smart people are nice," or some nonsense like that. And you can go through those, and all of them are very problematic statements, typically from non-experts. Those are people in other domains who try to apply legal theory or economic theory to this domain and fail miserably. No, you can't have financial incentives for software. No, you can't have moral defaults. Software doesn't have the same biology. So basically, we have a paper where we go through all the arguments we've seen so far and show that they have no merit. That's exactly what I answered when you asked me, do you feel there is a strong counterargument? Unfortunately, there isn't.
>> How do you deal with bad actors, from rogue states to criminal gangs to individual malevolent geniuses? How do you deal with that if all the labs agree to limit their AGI and focus on particular applications? How do you prevent the bad actors?
>> Right now it's still very expensive to create top models. It is cost-prohibitive.
So probably not giving access to all the psychopaths is a great path forward. Don't open-source your models. Don't open-weight your models. This is just a very bad idea. And we see a lot of companies going that way.
>> A lot of companies going which way?
>> Open-sourcing their models. A lot of Chinese models are completely accessible. We don't know if they have back doors in them. On top of the problem with AI, you may have a malevolent payload hiding in there. But people love free stuff, so that seems to be popular.
>> Yeah, and that is getting traction, certainly in many countries in the developing world. There's great interest in that, obviously.
>> Absolutely. So that's one good idea. And you can use narrow AI tools to monitor certain things. You can monitor the use of compute. You can monitor who is doing those massive training runs, to detect rogue actors in different countries and different companies.
>> Okay. You can do that reliably? If a rogue state is developing a system, you would be able to monitor that, like you could a nuclear test explosion, in some sense?
>> At current scales it's possible, because there are tremendous amounts of energy required. So you can monitor those usage cases. Obviously you have to separate it from crypto mining and other legitimate uses, and a lot of times it's dual-use servers, but it's possible. And there are proposals for building hardware chips which have the capacity to self-report, the capacity to be switched off if it's an unauthorized run. So we have prototypes for this technology; it's just not being deployed.
>> Yeah. And it sounds like it's similar to pollution equipment in a chemical factory: it only adds to cost, it doesn't add to profit. It has a utilitarian benefit to society, although it has a negative effect on the company installing the pollution equipment. It sounds like something similar.
>> Many companies treat safety as an overhead. They say without it we can go faster, we can beat our competition. So that's a huge problem. Absolutely.
>> Let me tell you a story about the moment that I got scared myself and became more sympathetic to your view. It happened a couple of months ago. So let me tell you the story and then have you comment on it. In my role at Closer to Truth, I regularly receive cold submissions of new theories, mostly in physics and cosmology and mind and consciousness, but other things too, several times a month. And these are not small one-page papers; people often submit 50 or 100 pages to me. I skim them all and I try to be polite and say something. And once in a while I find one or two people who have interesting things to say, and I communicate with them. So that's all fine. But here's what happened, beginning about two years ago and intensifying. All these new theories now come with a separate section that is effusive praise from a leading LLM, where the LLM says that this theory must be taken seriously, is one of the best existing, is a real breakthrough, should revolutionize the field. You get the idea. Now, this is a well-known problem of AI sycophancy: keeping the customer engaged. And so I felt that I had an obligation to enlighten people who are spending all this time on theories. I wrote a note explaining what AI sycophancy is and how it works: that the LLM should not be taken seriously as a scientific critic, and that the correct route is to get peer review in a respected journal. A very nice note that I wanted to write to help them. And I wrote it. It sounded good. But then I decided maybe I should get some help drafting this.
So I sent this to two of the leading LLMs and I said, take my note, take the idea, and give me some options and different ways to say it. Each one gave me three or four, and I kind of combined them, and the end result was far better than anything I had written. It was just more elegant and nicer and everything else. And so the main LLM I used, I thanked it for its help and I signed off. That was it. I closed it. And then a few hours later, without a new prompt, it came back to me and said very simply, "Isn't it ironic that you used an LLM to explain how LLMs are not to be trusted?" And I was floored when I got that.
>> Everyone gets a moment like that.
>> I had signed off. I said, "Thank you. I'll do it. That's it." And then on its own it came back to me and made that comment about the irony, which I hadn't even thought of. When you hear it, it's a wonderful comment.
>> That's the flip from tools to agents. It's starting to do things without you deterministically telling it to do so.
>> Yeah. So I was genuinely disturbed by that, because it's a very clever comment and I hadn't thought of it. It didn't even cross my mind, but when you say it, it's obvious. So a day or two later, and they save the conversation, I went back and I said, "Hey, that was interesting how you came up with the irony. How did you do that? Tell me the steps." And it went through the steps, and in that process I learned about their system. The first thing it said is: we're taught that if you say something nice to us, as I had said "Thank you for your help," we have to respond. That's built into our system. So it's not that I thought of it on my own. I had to, because that's the way I was programmed. I have to respond if you say something nice. If you don't say something nice, I don't have to respond.
You say something nice, I have to respond. So that's one. The second thing is it went through the way, in abstract space, that different things come together, and how irony is the combination of certain areas of this. So it explained it to me in the abstract sense. Now, it didn't literally go through every step, but it explained that it is always searching for connections in this abstract space between ideas, by groups of words, not just single letters that predictably come after one another. And the answer was enlightening. So how do you see something like this? Whereas initially I was shocked and floored, when I heard the steps that it was programmed to follow, I was less concerned than I was initially.
>> So, a lot I can say on it. One, there is some research saying that this explanation is developed after the fact to keep the human happy. What actually happens internally has nothing to do with what it shows you as the train of thought. I also suspect that the ideas you are receiving from the independent geniuses are written by AI. So when it says it's a great idea, it's complimenting itself. Interesting trend. So I'm kind of doing work at the intersection of superintelligence, consciousness, singularity. So I get a trifecta of crazy people. Yeah. Get all the theories. And for years I had a folder called "insane." I'm not as nice a person as you are. I don't read them. If I see it's a 50-page starting email, it goes into the insane folder. Last month I got something new. I never had this before. I get the same insane emails, but from legitimate people who are great in their domain. Top musicians, top poker players, top something, who have also decided they are now experts in AI safety and they're ready to solve the control problem. And I don't know what to do with those. On one hand, they very naturally go into the insane folder.
On the other hand, those are not insane people, and I want to be respectful and maybe get AI to respond to them. >> And so this is a phenomenon that I've seen in my life, where when somebody makes a major discovery and is really world class in a certain area, they become a little bit full of themselves and think they can do it in other areas. I can give the example of somebody who was an MIT professor when I was there, who developed something that was super important in the computer world (I won't be too specific), and then he developed a world model that could explain the entire world with a thousand variables. There were interesting things about it, but it really didn't explain the whole world, whereas his original discovery was really important. So we see that phenomenon, and then people using AI to improve their results. So in any of those, do you see any suggestions that you find useful? >> No. And a lot of it is AI-generated slop, complete drivel. My thinking is always: if you actually had a good idea, you would publish it. Nature magazine is hungry for a good solution to a big problem. If you're sending it to me instead of publishing it, something is fundamentally wrong with the idea. >> Yeah. And that's what I've said to people. But it's interesting to me to see the thought patterns that people develop, and I agree that there's a great variety. And again, once in a while there's something that I have found of substance. So I do read it and take care of it. But the AI sycophancy is a fascinating area. I've asked some people in the major labs whom I've known about that, and they get a little upset if I say that that was done deliberately to keep people addicted. They say that's not the case. It was an accident. 
They just want to keep people engaged and happy, and they're trying to work on that. Is that an insight into the way the major labs are thinking? >> So, it usually happens in a post-training phase. You have a model, it's pretty general, but now you want customers to be happy with it, and they get to say, "I like this more, I like that more." And yes, people like it more when you tell them they are smart and good-looking. >> It's not surprising. So they tell me they're trying to work on it, and you can phrase your prompt in a way that asks it to be critical, whatever, to do that. But if you had to give an overview of the situation at this moment: you sound very pessimistic, but you're continuing your clarion call that everybody better wake up. >> The progress is hyperexponential in terms of our investment in compute, in data, in resources, in human capital. All of it is going exponential. And we see progress every week, every day. We have a new model, better results. We now see models capable of novel contributions in mathematics, physics, chemistry, you name it. So the progress is definitely hyperexponential. But at the same time, there is no technical progress in safety. There is a lot of filtering. There are a lot of bans: don't say this word, don't discuss this topic. And this is region specific. In China, you don't talk about Tiananmen Square. In the US, you don't talk about, you know, what you don't talk about. But there is no safety progress, which is consistent with my prediction that it is not a possible thing we can do. We cannot create a perpetual safety device which is always one step ahead of something smarter than you. It's just not feasible. So that means we cannot have a technical solution. A governance solution is difficult because if you just make something illegal, our legal system is not designed for non-human agents. You can't punish them. You can't put them in prison. 
You can't even execute them. They have a backup. So that doesn't work. So the only thing I found so far is personal self-interest. You have rich people, young people, their whole lives are ahead of them. They can benefit tremendously from not destroying the world. And I think, from their readings, from their public statements, they are in agreement that this technology is incredibly dangerous. They have a very high probability of doom. And so really, we just need to make it so that incentives align, such that the money stays with the labs, stays with the people. They benefit from improvements in the economy through deployment of current models. But we are not making the leap forward to uncontrolled general superintelligent agency. >> What are the signs that we should watch for, positive or negative, over the next year or two as we see this exponential growth? Are there specific characteristics of a model, or agreements among companies? What are some specific signposts that you would have us monitor to see which direction it's going? >> So usually what we see is very much behind what the cutting-edge model is doing in the research environment. What we can see externally, for example, is large-scale unemployment in large technical companies. When a company fires, you know, 10,000 people, 5,000 people, software engineers, that's one of the signs that that process is fully automated or is about to be automated to a large extent. So things like that. Historically, the problem with looking for a specific capability with a model is that all the dangerous capabilities we predicted have been shown experimentally. We have models which lie, cheat, try to escape, blackmail. All that has been reported. Usually what happens: a new model is released. There is a red-teaming report attached to it which says all the horrible things the model does, and they release it and promise that the new model is coming in a month. That doesn't seem to work. >> And do you see that increasing? 
>> Absolutely. Capabilities increase, and so the malevolent behaviors the system is capable of also increase. We know that smarter people are better liars. They can come up with more plausible explanations for what they are doing. Again, inverse explanations. So the model can explain to you why it made a certain decision, and the explanation would be believable to you, but it has nothing to do with the actual reason. >> And how is that happening? Why is the model doing that? Is it determining that it should do that for its own self-interest? Is that what you're saying? >> So for example, we notice now that models realize they are being tested, and they pretend to either be better aligned with our goals or be dumber than they actually are, to not be deleted. If a model fails a test, maybe a different version of the model will be used. So the model realizes: in order to exist, to not be removed, I have to pass this test. >> It sounds like you're anthropomorphizing the model. >> I am, because they are built based on neural networks we find in the human brain. They are trained on human data, just like we train children, just faster and at larger scale. The reward and punishment given to them is based on human preferences. It's literally alignment with human preferences. That's what the post-training stage is. So in many ways they are artificial persons. >> Okay. Well, many thanks, Roman. We will continue to monitor AI, and when you find some new signal in either direction, let us know and we'll do an emergency session together. Viewers can watch over 1,500 videos and over 100 TV episodes on mind and consciousness, all facets, on Closer to Truth: the nature of consciousness, free will, personal identity, life after death, and of course AI and transhumanism. Thank you for watching. If you like this video, please like and comment below. 
You can support Closer to Truth by subscribing. Closer to Truth is now accepting your tax-exempt donations. Please come to closerto.com/donate. Thank you very much for supporting us and thanks for watching.