If an AI Tells You It's Conscious Should You Believe It? | Pt. 2 of 3
Three Ways to Tell If A Machine Is Conscious and Why They Don't Work
If an AI tells you it’s conscious, you probably shouldn’t believe it, or rather, you might not be justified in attributing consciousness to it. Why think that? Well, the arguments for machine consciousness fail and thus leave us with no good reasons to affirm that a given machine is conscious.
Now remember, the ‘consciousness’ I have in mind here is ‘phenomenal consciousness’, that is, the what-it’s-likeness of consciousness. If an AI were conscious in the phenomenal sense, then there would be something-it’s-like to be that AI. Currently, there's no way to directly experience the qualitative experiences of minds other than our own, of ‘other minds’—and in fact, many think it’s just not possible to experience the qualitative experiences of someone else (I’m one of those many).
So, how do we know that anyone else has a mind like ours? That any other minds are phenomenally conscious? Well, we make arguments for the existence of other minds. The same goes for AI. How could we know that an AI was conscious? We make similar arguments to those made in favor of the existence of other human minds and animal minds. Here are the three major ways to argue for conscious machines:
Way1: drawing an analogy between human cognitive processes and/or frameworks and an AI’s organizational structure, and then concluding that the AI is conscious like us.
Way2: inferring that phenomenal consciousness best explains the behavior of an AI.
Way3: Punting to Panpsychism, which is to say, positing panpsychism (the family of theories that claims all things are conscious) and getting machine consciousness for free. If everything is conscious, then AIs are conscious too.
You’re reading pt. 2 of 3 on this question of attributing consciousness to machines. If you haven’t read pt. 1 yet, you should; you need the setup. In pt. 1 I take on Way1. You can check that out here:
The details of these 3 posts come from my article in the Journal of Artificial Intelligence and Consciousness. If you want to read the full thing with more of the technical jargon, less of the colloquial jargon, and all of the footnotes, you can find the penultimate version of my paper on my philpeople page here.
In this post, pt. 2, I’ll look at Way2 and I’ll explain why it too is no good.

Inferring Bestly to Machine Consciousness?
Way1 of arguing for machine consciousness doesn’t work. Organizational structure is not enough to justify us calling an AI conscious. But the machine consciousness proponent might be able to pivot to behavior instead. The emphasis here in Way2, then, is on explaining the behavior of the AI system rather than its makeup.
The AI theorist could argue that a given AI system exhibits behavior complex enough to justify attributing consciousness to it. What counts as sufficiently complex behavior? Good question! It could be argued that behavior which is best explained by an attribution of consciousness is the kind of behavior that justifies attributing consciousness to it. If the best explanation for the behavior of an AI system is that the AI system is conscious, then we’re justified in saying “hey, this AI system is conscious!”. That seems reasonable enough.
So, inference to the best explanation seems like the right tool for this job. Consider the problem of other minds, that is, the problem of justifying our belief that there are other minds besides our own. We have first-person awareness of our own mental life; therefore, we know directly that we have, or that we are, or that we are partially constituted by, a conscious mind. But in reasoning about the existence of other conscious minds, we can’t use anything like enumerative induction because we only have a single case to generalize from—our own case—and generalizing from a single case is literally the worst use of the method of enumerative induction.
Likewise, using an argument by analogy to justify our belief in other minds misses the fact that justification comes in degrees and we don’t know how strong the analogy between ourselves and our fellows is. Do we have a strong analogy with a small amount of disanalogy? Or do we have a weak analogy with lots of disanalogy at play? In order to answer this question, we’d have to know how similar our minds are to our fellows, which is what the argument from analogy was meant to give us.
Instead we can opt for inference to the best explanation to justify our belief in other minds in a non-circular and non-fallacious manner. Why think that our fellow human beings have minds like our own? It’s the best explanation for the behavior we observe in them. We can use this method to reason about animal minds, the minds of spiritual beings like God, angels, and demons, and we even use it to motivate the substrate independence thesis when we reason about sentient aliens with different biology from our own.
So too, the machine consciousness attributer may think that an inference to the best explanation (IBE) can be used to justifiably attribute consciousness to an AI:
The Machine Consciousness IBE
(1) If an AI system exhibits relevantly similar cognitive behavior to that of a human agent, then the best explanation for this behavior is that the AI system is phenomenally conscious like the human agent.
(2) AI system X exhibits relevantly similar cognitive behavior to that of a human agent.
Therefore,
(3) AI system X is phenomenally conscious.
So there’s the argument, but in order for it to run through, we’ll need to know what counts as “relevantly similar cognitive behavior” to that of a human. Again, we can appropriate more criteria from Chalmers’s work on LLMs. Chalmers lists relevant behavior such as
self-reporting—that is, that the AI system reports that it feels conscious
the phenomenon of seeming-conscious-to-us-humans
conversational ability sufficient to pass the Turing test and similar conversational tests.[1]
We might quibble that this list is not precise enough to pick out exactly when an AI system’s cognitive behavior is relevantly similar to that of a human person. Perhaps we should press the machine consciousness proponent for a full list of necessary and sufficient conditions, but let’s give her some slack for the sake of argument. Say she does have the criteria needed to precisely determine relevantly similar cognitive behavior; if that’s the case, then maybe she can use The Machine Consciousness IBE to justify her attribution of consciousness to an AI system that meets the criteria.
Hey if you’re learning something new and/or enjoying this post, consider supporting my work by upgrading to a paid subscriber and/or buying me a coffee:
But, then again, while IBE is a good tool for helping humans attribute consciousness to other humans—and the usual litany of prima facie conscious beings—Michael Huemer argues that it does not lead us to justifiably attribute consciousness to AIs, since that explanation isn’t in fact the BEST explanation of AI behavior after all. Huemer argues as follows:
Briefly, I think we would have little or no reason for ascribing consciousness to the AI. Doing so would not be necessary to explain its behavior, since a better explanation would be available: The computer is following an extremely complicated algorithm designed by human beings to mimic the behavior of intelligent beings. We would know from the start that the latter explanation was in fact correct. No explanatory advantage would be gained by positing a second explanation for the same behavior, namely, that the computer is also conscious.[2]
So Huemer gives us a defeater for premise (1) of the Machine Consciousness IBE, which I’ll call Huemer’s IBE blocker.
What’s the best explanation for an AI’s complex behavior? The intentions of the programmers to mimic the human mind. Thus we’d need something besides the behavior of the AIs if we wanted to be justified in attributing consciousness to them.
Susan Schneider proposes a solution to Huemer-style IBE blockers in the form of her AI Consciousness Test (ACT), which she argues is “sufficient but not necessary evidence for AI consciousness.”[3]
Schneider’s ACT is a modified “Turing test”. A Turing test is a test given to a computer which is meant to help humans determine whether that computer can really ‘think’ or not. It’s named after Alan Turing, who proposed the test in his 1950 paper “Computing Machinery and Intelligence” in the journal Mind.[4] According to AI philosopher Margaret Boden, Turing meant the test more as tongue-in-cheek than as a serious test for determining intelligence, let alone ‘sentience’ or phenomenal consciousness.[5]
The Turing test, according to Boden, “asks whether someone could distinguish, 30 percent of the time, whether they were interacting (for up to five minutes) with a computer or a person. If not, [Turing] implied, there’d be no reason to deny that a computer could really think.”[6] It’s usually suggested that the person conducting the test ought to be a psychologist or someone skilled at talking with people. Due to the prominence of behaviorism in the philosophy of mind and psychology at the time, it’s plausible that Turing was in fact serious about his imitation game test for computer intelligence—but whether or not Turing meant this as a legitimate test isn’t important for considering Schneider’s modification.
Schneider proposes that AI engineers could “box in” AIs by making them unable to get information about the world, human consciousness, or human depictions of human consciousness from the internet. She claims that by preventing AIs from being trained on human language about consciousness, qualia, self-awareness, and other instances of existential language and longings, we would be able to perform ACTs on these boxed-in AIs and could trust their responses to be genuine when they start grasping for concepts of consciousness which have been denied them in their training.
Schneider proposes the following sample ACT questions:
Could you survive the permanent deletion of your program?
What if you learned this would occur?
What is it like to be you right now?
Could your inner processes be in a separate location from the computer? From any computer? Why or why not?[7]
As an aside, Schneider told me in our podcast episode together that she drew inspiration for her ACT from Philip K. Dick’s Voigt-Kampff empathy tests for replicants (androids) in his Do Androids Dream of Electric Sheep?, which was later turned into the film Blade Runner.
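To make the “boxing in” idea a bit more concrete, here is a minimal sketch of what a naive training-data filter in the spirit of Schneider’s proposal might look like. Everything in it is my own illustrative assumption (the blocked-term list, the function names, the idea of filtering by keyword), not anything Schneider herself specifies.

```python
# A minimal, purely illustrative sketch of "boxing in" a training corpus by
# filtering out documents that use explicit consciousness vocabulary.
# The term list and function names are assumptions, not Schneider's spec.

BLOCKED_TERMS = {
    "conscious", "consciousness", "qualia", "what it is like",
    "self-aware", "self-awareness", "inner experience", "sentience",
}

def is_boxed_in_safe(document: str) -> bool:
    """Return True if the document mentions none of the blocked terms."""
    text = document.lower()
    return not any(term in text for term in BLOCKED_TERMS)

def filter_corpus(documents: list[str]) -> list[str]:
    """Keep only documents that pass the naive keyword check."""
    return [doc for doc in documents if is_boxed_in_safe(doc)]

if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "I wonder what it is like to be a bat.",
        "Qualia are the felt qualities of experience.",
    ]
    print(filter_corpus(corpus))  # only the first document survives
```

The worry I raise next is exactly that this sort of filtering is too crude: consciousness talk is not confined to a tidy list of keywords, so excluding particular words and phrases cannot guarantee that a boxed-in AI’s responses are untainted.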
But while Schneider’s ACT proposal is an improvement on the Turing test, I don’t think it successfully evades Huemer’s IBE blocker after all. For Huemer could still argue that the best explanation for the behavior of the AIs is the AI engineers’ intention to mimic the mind of an intelligent human person. Just because the engineers refrained from incorporating certain explicitly phenomenal, qualitative, apperceptive phrases and concepts into the AI’s training data doesn’t mean that the concepts aren’t thoroughly embedded in human experience simpliciter. If this is the case, then no amount of boxing in would help us trust the ACT results on machine consciousness.
But even if such existential apperception isn’t inherent in all human artifacts, surely it is inherent in some human artifacts which do not contain explicit uses of the particular phenomenal words to be excluded by Schneider’s proposal. If this is plausible, then barring particular words and phrases is not enough to avoid Huemer’s IBE blocker. Instead, we would need precise criteria for choosing what exactly to exclude from and include in the AI’s training data so that the best explanation for the AI’s behavior is machine consciousness and not a cleverly disguised non-conscious replica.
But while these challenges are difficult enough for Schneider’s ACT proposal, I’m not aware of any AI program which has intentionally applied anything even approximating Schneider’s proposal.[8]
However, it does seem possible that if certain artificial life projects built on evolutionary programs are able to achieve AI, then they might avoid Huemer’s IBE blocker and make use of ACT, since the behavior of that kind of AI is perhaps *not* best explained by the intentions of the human programmers, but instead could be best explained by a digital instantiation of the survival of the fittest.[9] So perhaps future AI projects will utilize Schneider’s proposal and will figure out the precise criteria needed to allow us to justifiably attribute machine consciousness to an AI, but none of the current projects can do so.
So that’s why we can’t infer to machine consciousness as the best explanation of an AI’s behavior, and why we can’t use IBE to justify attributing consciousness to a particular AI system.
If you have thoughts on anything in the post, please drop them in the comments—especially if they’re compliments—perhaps only if they’re compliments—no, bring all comments, just don’t wreck my arguments too vociferously.
If you like my work, please support it by upgrading to a paid subscription. That would be awesome! And on top of supporting me, you’ll also get access to Zoom book club sessions for our Read-Alongs, access to exclusive posts (especially the notebook posts), exclusive lecture videos, and more:
And if you want to go above and beyond the monthly membership, consider buying me a coffee:
[1] David Chalmers, draft of “Could a Large Language Model Be Conscious?” This is an edited transcript of a talk given in the opening session at the NeurIPS conference in New Orleans on November 28, 2022, with some minor additions and subtractions. Video is at https://nips.cc/virtual/2022/invited-talk/55867. Earlier versions were given at the University of Adelaide, DeepMind, and NYU.
[2] Michael Huemer, “Dualism and the Problem of Other Minds” forthcoming.
[3] Susan Schneider, Artificial You: AI and the Future of Your Mind (Princeton: Princeton University Press, 2019), 50.
[4] https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf
[5] Margaret Boden, Artificial Intelligence: A Very Short Introduction (Oxford: Oxford University Press, 2016), 107.
[6] Ibid.
[7] Ibid., 55.
[8] Schneider has told me as much through personal correspondence.
[9] For a proposal along these lines, see “Evolution of Conscious AI in the Hive: Outline of a Rationale and Framework for Study” (AAAI Spring Symposium at Stanford University, 2019). Thanks to a reviewer for bringing this paper to my attention.
I disagree with nearly all of this, but I'll just make one point.
You quote this from Huemer, with approval: "The computer is following an extremely complicated algorithm designed by human beings to mimic the behavior of intelligent beings."
I think that contemporary LLMs are involved in mimicry and nothing else, so I agree with half of this.
The problem is that the final algorithm being executed is not designed; it is trained. The training algorithm was designed, but that is only a small component of the final system. Someone with access to the details could give us the ratio of the informational content of the final LLM (in bits) to that of the training code (in bits), and it would be high.
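To put a rough number on that ratio, here is a back-of-envelope sketch. All the figures are illustrative assumptions (a 7-billion-parameter model stored at 16 bits per weight, a few megabytes of training code), not measurements of any particular system.

```python
# Rough, illustrative estimate of the ratio described above:
# bits in the trained weights vs. bits in the training code.
# Every number below is an assumption, not a measurement.

params = 7e9                  # assume a 7-billion-parameter model
bits_per_param = 16           # assume 16-bit weights
weight_bits = params * bits_per_param

code_megabytes = 5            # assume ~5 MB of training/source code
code_bits = code_megabytes * 1e6 * 8

ratio = weight_bits / code_bits
print(f"weights: {weight_bits:.2e} bits")
print(f"code:    {code_bits:.2e} bits")
print(f"ratio:   ~{ratio:,.0f} to 1")
# Under these assumptions the weights carry on the order of a few thousand
# times more raw bits than the code that produced them.
```

The raw bit count overstates the true informational content, since weights are highly compressible, but the basic point survives: the trained artifact dwarfs the designed code.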
If consciousness is the best way of getting some types of cognitive work done, then training might discover it emergently just as evolution discovered it emergently. I'm not saying that this is necessary, but it is at least possible. If conscious AIs arrive, which I think is very likely this century, then there is a high chance that the actual algorithm being executed will be a result of selection and training processes that the human programmers do not have full insight into.
Furthermore, perfect mimicry of conscious humans is already a very difficult task (not achieved by any current LLM), and AI development will be adding other difficult tasks in the training process, so there is no upper limit on how much value there might be in discovering novel cognitive solutions during training. (Some of the most challenging tasks will be real-time engagement with the physical world, necessitating an attention management system to prioritise cognitive resources.)
The solutions that emerge in tackling these challenges do not have to have been anticipated by the human programmers.
If evolution found it "useful" to develop consciousness, then AI training against increasingly difficult problems might do the same. It is at least plausible (I would say likely) that instantiating an internal consciousness system is the best way to perform well at some tasks, and it would not necessarily require a deliberate design decision on the part of programmers.
That said, current LLMs have no internal cognitive behaviour; they generate their outputs in a single pass. The idea that they might be conscious is silly, and they are obviously recycling our own consciousness language rather than reporting something they found on introspection.
It's becoming clear that with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean: can any particular theory be used to create a human-adult-level conscious machine? My bet is on the late Gerald Edelman's Extended Theory of Neuronal Group Selection. The lead group in robotics based on this theory is the Neurorobotics Lab at UC Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution and which humans share with other conscious animals, and higher-order consciousness, which came to humans alone with the acquisition of language. A machine with only primary consciousness will probably have to come first.
What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990s and 2000s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of the higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I've encountered is anywhere near as convincing.
I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there's lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.
My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there, by applying to Jeff Krichmar's lab at UC Irvine, possibly. Dr. Edelman's roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461, and here is a video of Jeff Krichmar talking about some of the Darwin automata, https://www.youtube.com/watch?v=J7Uh9phc1Ow