Perceptions are representations. What representations are can be cashed out in several ways. First, they are about something. Second, they represent the world as being a certain way. Moreover, they have satisfaction notions associated with them (and corresponding satisfaction conditions, i.e., conditions that must obtain for them to be satisfied). The satisfaction notion of desires is 'being fulfilled' or not; of beliefs, 'being true' or not; of volitions, 'being done' or not; of commands, 'being obeyed' or not; and of perceptions, 'being accurate' or not.
Representations include words, sentences, beliefs, pictures, questions, paintings, videos, signs, diagrams, maps, hand gestures, commands, recordings, desires, and, in the case at hand, perceptions.
If a given representation has a satisfaction notion associated with it, it will also follow that it has satisfaction conditions. Giving the satisfaction conditions for a type of representation is (at least part of) giving a semantics for that type of representation.
Much work has been done on giving semantics for various sorts of linguistic representations. Whole classes of sentences have been given rigorous semantics. For instance, famously, modal language has been given a whole semantic theory, rigorously formulated in terms of mathematical models involving possible worlds.
Arguably, something similar can be done for pictures.
If a rigorous semantics can be given for pictures, this may hold some promise for giving a rigorous semantics for visual perceptions. Naively, we think of vision as giving us an "image". Think for instance of Ernst Mach's drawing of his own visual field:
[Figure: Mach's picture of the visual field]
With appropriate modifications for the perspective and particularities of our visual field, the naive application of pictorial semantics to visual-perceptual semantics would be to give our visual perceptions the same semantics we would give a drawing of our visual field.
This is obviously simplifying hugely and leaves a host of questions unanswered. But it allows us to get a preliminary grip on things.
What I'm interested in lately is whether, suitably modified, the semantics of auditory perception will be similar to the semantics of visual perception.
You can think of visual perception as giving you a picture, where the picture represents such properties as {OBJECT, SHAPE, LOCATION, MOTION, CHANGE, COLOR, DISTANCE, SPACE, TEXTURE, CAUSE, AGENCY}. All of these properties are represented in visual perception (and most of them are generally represented in pictures too).
(Note: One simplification we make in assuming that the semantics of visual perception is just like the semantics of pictures is that visual perceptions, like pictures, have no temporal-duration aspect. That may be a good assumption for pictures, which can be thought of as "instantaneous" in some way. It is a bad assumption for visual perception, which, as perceptual psychology demonstrates, represents motion, and an even worse assumption for auditory perception.)
So we can think of vision as giving us a picture. Can we think of hearing as giving us a picture too? If we can, and if the move from pictorial semantics to visual-perceptual semantics is relatively straightforward, then the move from pictorial semantics to auditory-perceptual semantics should be relatively straightforward too (and maybe we can even find a better representational correlate than pictures for sound; hint: recordings, maybe?).
Well, clearly, hearing doesn't immediately represent all of the same properties as vision. And neither does vision represent all of the same properties as hearing.
However, that doesn't need to stop us from thinking of hearing as presenting an auditory "picture," at least if we understand "picture" as picking out a general representational structure that is in some way common to normal, physical pictures and our mental, visual images. After all, as we said before, we can think of the visual image as a picture representing such properties as {OBJECT, SHAPE, LOCATION, MOTION, CHANGE, COLOR, DISTANCE, SPACE, TEXTURE, CAUSE, AGENCY}. But then maybe we can also think of hearing as giving us an auditory representation, structurally similar to the visual image, but instead representing such features as {OBJECT, LOCATION, MOTION, CHANGE, DISTANCE, SPACE, CAUSE, VOLUME, PITCH, TIMBRE}. We might even be able to include SHAPE and AGENCY in there in some cases (almost certainly in the case of bats), though I'd have to double-check the scientific literature.
In other words, the perceptual "structure" could remain the same between visual perception and auditory perception, and indeed, some of the qualities that are represented in vision may be represented in hearing, and vice versa. For instance, just from what I've listed, the intersection of properties represented in hearing and vision will include such features as {OBJECT, LOCATION, MOTION, CHANGE, DISTANCE, SPACE, CAUSE}. On the other hand, maybe it is unique to vision to include {COLOR, TEXTURE}, and maybe it is unique to hearing to include {VOLUME, PITCH, TIMBRE}.
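The overlap just described can be checked mechanically. Here is a minimal sketch in Python that treats the two feature lists as plain sets, using the longer vision list and the hearing list given above. The names VISION, HEARING, shared, vision_only, and hearing_only are my own labels for illustration, and nothing here is meant as a theory of perceptual content; it only does the bookkeeping.

```python
# Feature lists copied from the text; the variable names are illustrative only.
VISION = {"OBJECT", "SHAPE", "LOCATION", "MOTION", "CHANGE", "COLOR",
          "DISTANCE", "SPACE", "TEXTURE", "CAUSE", "AGENCY"}
HEARING = {"OBJECT", "LOCATION", "MOTION", "CHANGE", "DISTANCE", "SPACE",
           "CAUSE", "VOLUME", "PITCH", "TIMBRE"}

shared = VISION & HEARING        # features listed for both modalities
vision_only = VISION - HEARING   # listed for vision but not for hearing
hearing_only = HEARING - VISION  # listed for hearing but not for vision

print(sorted(shared))
# ['CAUSE', 'CHANGE', 'DISTANCE', 'LOCATION', 'MOTION', 'OBJECT', 'SPACE']
print(sorted(vision_only))
# ['AGENCY', 'COLOR', 'SHAPE', 'TEXTURE']
print(sorted(hearing_only))
# ['PITCH', 'TIMBRE', 'VOLUME']
```

Note that on these lists SHAPE and AGENCY come out on the vision-only side alongside COLOR and TEXTURE. That fits the earlier hedge: whether SHAPE and AGENCY also belong on the hearing list is exactly the open question, and adding them to HEARING would move them into the shared set.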
But again, despite the differences in features that are represented in hearing and vision, the overall representational structure might be, generically, the same at a certain level of abstraction.
It would be interesting to work out a precise semantics for visual perceptions and a precise semantics for auditory perceptions and see where, in the details, the two actually differ. Obviously hearing and vision will have their accuracy conditions assigned by different systems of depiction. But it's exciting to think that the two might not be so far apart, that they may, structurally speaking, have a lot in common, and that light might be shed on both by looking at conventional, physical representations (such as pictures, recordings, etc.).