This paper is concerned with people's ability to
transfer visual images using language. An exploratory study was
carried out in which video and audio recordings were made of two
subjects reconstructing visual images. While the subjects could not
see each other, one person described the image and the other person
made a drawing. The data was analyzed from three perspectives.
A concept analysis revealed that high-level concepts were used
to set the scene and guide the drawing, and had a confirmatory
function, while low-level concepts were used to get the details
of the image right. An analysis of focus showed that the describer
used the strategy of first giving an overview followed by details,
and that focus can be divided into hierarchical modules. A syntactic
analysis demonstrated the discourse functions of nominal,
pronominal and prepositional phrases, and of verbs.
People can describe a visual image in words quite easily. Likewise, people can draw an image from a verbal description. Thus, people are able both to transform pictures into words and to transform words into pictures. How do people manage this? How does one person use language to transfer the spatial information contained in a visual image, so that another person can re-create it?
In this study, two persons re-created several visual images. While they sat with their backs towards each other, one person described the images and the other attempted to draw them. The number of participants was limited to two and the data to two pictures. The study was exploratory: its aim was to generate hypotheses for further research.
Using audio and video recordings of the conversations, we set out to investigate three issues concerning the language used to transfer visual images. First, we look at what types of concepts occur and how they are used in the description of the image. For instance, are there high-level concepts, like familiar everyday items, or low-level concepts, like points and lines? Second, we analyze what the subjects focus on as the image is re-created, for instance the size and location of the objects depicted. Third, we study some syntactic units and their function in the transfer of the image. Some aspects of the structure and meaning of noun phrases, pronominal phrases and prepositional phrases are described. The very small number of verbs is subcategorized with regard to their meaning and communicative function. Finally, an attempt is made to show the participants' meta-communicational awareness.
The material consisted of four transcribed discourses between the subjects. Two of the discourses were divided into intonation units, based on Chafe's theory of the flow of consciousness [Ref. 3]. The other two were transcribed only roughly, to show the flow of the conversation.
This paper is organized as follows. First, the experimental
situation is described. Then follow three sections in which the
data is analyzed from different aspects: concept usage, focus
and syntactic units. Finally, a section on further research
formulates the hypotheses that result from this examination. The
data from the experiment can be found in the appendices (A, B
and C) at the end of the paper.
The main objective of the experiment was to let one
person, called A in this paper, describe two pictures to another
person, called B. The purpose was to let B create a new picture
on the basis of spoken information from A. The whole dialogue
between the participants, the describer (A) and the drawer (B),
was recorded on video and audio tape. This material (recordings
and drawings) formed the basis of the analysis in this paper.
Both participants, two males aged 22 and 23 respectively, were
strangers to each other. Both were undergraduate students at Lund
University. One had studied medicine for about four years; the
other was a third-year undergraduate engineering student. The
medical student proved to be the more communicative of the two
and was therefore chosen as the describer. The engineering student
became the drawer. During the experiment each participant sat at
his own table. They sat back to back to avoid any eye contact.
One picture was taken from a famous Swedish children's book
[Ref. 1]. It showed a typical tray setting [Fig. 1]: a tray with
a pitcher, a cookie plate and four glasses with straws. The other
picture was taken from a collection of Escher drawings [Ref. 2].
It represented a kaleidocycle figure [Fig. 5].
The video camera was aimed at the paper of the person doing the drawing. Only the creation of the new pictures was recorded, so other aspects of the participants' behavior, such as the describer's eye movements and non-verbal communication, were not captured. Time was also recorded on the video tape during the experiment.
Two audio recorders were used, one placed in front of each
participant. All stages of the experiment were recorded except
one, ESCHER SQUARE. The double recording gave two partly
complementary tapes: the drawer's utterances were clearer on the
tape placed on his table, and the describer's utterances on the
tape placed on his. This proved very advantageous during
transcription, as missing or unclear parts of one recording could
be reconstructed from the other.
Four of the six experiments were transcribed. The transcriptions of ALFONS 2 and ESCHER 1, presented in Appendix B (Tables 5 and 6), are only rough transcriptions and were not divided into intonation units. They just show the flow of conversation between the subjects, which was what mattered for two of the analyses in this paper, concept formation and lexical analysis. ALFONS 3 and ESCHER 2 were transcribed into intonation units; however, pause durations were only estimated subjectively, not measured, and accents (loudness or pitch variations) were not analyzed. These transcriptions are presented in Appendix B (Tables 3 and 4).
The following transcriptional notations were applied:
| Notation | Meaning |
|---|---|
| ?? | unclear text, no transcription given |
| [ ] | a stretch of speech that overlaps with another stretch of speech |
| . | a short pause |
| ... | a longer pause |
| (long pause) | a pause caused by the subject doing the drawing |
In ALFONS 1, the instruction given was: "Describe the given picture, the original Alfons [Fig. 1], as precisely as possible in order to create a drawing which should be as close to the original picture as possible." No time limit was given. The experiment was stopped after 25 minutes, by which time only two of the 18 objects in the picture had been described and drawn. It had become clear that the picture would not be completed within a reasonable time. Because of its length, the dialogue has not been transcribed.
The picture drawn in ALFONS 1 is presented in [Fig. 2]. After this first stage, five additional stages were conducted.
In ALFONS 2, participant A was asked to describe only one object from the original Alfons picture, namely the pitcher. No time limit was given. This stage was completed after 9 minutes, when the pitcher had been drawn [Fig. 3]. A transcription of the dialogue is presented in [Table 5, Appendix B].
In ALFONS 3, the participants were asked to describe and draw all objects on the tray in the original Alfons picture, concentrating mainly on the relations between the objects. No time limit was given. This stage lasted about 7 minutes, and the whole picture was completed [Fig. 4]. The transcription is presented in [Table 3, Appendix B].
In ESCHER 1, the describer received the original Escher picture [Fig. 5] and was asked to describe it to the drawer. No time limit was set. This stage was interrupted after 18 minutes, by which time the whole figure except the middle part had been drawn [Fig. 6]. A transcription is presented in [Table 6, Appendix B].
In ESCHER SQUARE, the describer received another Escher picture and was asked to describe it to the drawer. No time limit was given. The figure was completed after 8 minutes. The results of this stage were not analyzed, as the conversation was not recorded.
In ESCHER 2, the describer received the same Escher picture as
used in ESCHER 1. The participants were given 10 minutes to describe
and draw the picture. The task was completed after 11 minutes, with
the whole figure drawn [Fig. 7]. The transcription is presented in
[Table 4, Appendix B].
When studying the data from the experiments, one
notices that a number of concepts are deployed in the language
used to describe the visual images. What kinds of concepts are
used? Are there high-level concepts, like names of familiar objects
(such as a tray or a glass)? Or is the image described using low-level
concepts like simple geometrical shapes (such as lines, circles,
and triangles)?
If high-level concepts are used, this would mean that the parts of the image were constructed using a top-down approach. That is, if a part of the image is described as "a tray", the person making the drawing would have to fill in his own details of what a tray typically looks like. This makes the transfer quick and efficient, but has the drawback that details might not be drawn correctly, since they are not stated explicitly by the describing person. A top-down approach relies heavily on shared background knowledge of the objects described. When two persons' knowledge of these objects differs, the transfer is likely to be poor.
Conversely, if low-level concepts are used,
this results in a bottom-up approach to reconstructing the image.
The image is deconstructed by the describing person into atomic
visual concepts, which are then described to the person making
the drawing. The drawing person does not know what he is currently
working on since all he gets is a number of details to draw. Eventually,
the details add up to complete an object on a higher level as
part of the image. The bottom-up approach results in good transfer
of details. But without top-down guidance, the details could easily
add up to something not resembling its corresponding object in
the original image. Low-level transfer is also slower and less
efficient, because many low-level objects are needed in order
to transfer one high-level object.
The concepts used by the subjects in the experiments were categorized in the following way. All concepts naming everyday, concrete things were put under the label "high-level concepts". All concepts that were not names of concrete objects were labeled "low-level concepts". In the first class, it could be noted that objects sometimes had hierarchical relations, so that pupil (pupill) is part of eye (öga), which is in turn part of mouse (mus).
The categorization of concepts from the data into
low-level and high-level was not always straightforward. Some
concepts, like base (bas), edge (kant) and tip
(spets) could be considered low-level concepts, although they
are not purely geometrical concepts. Rather than being low-level
concepts that are used in transfer, they are used to move the
drawing person's focus to a certain part of a high-level object,
like pitcher (tillbringare). If the describing person says
"now consider the base of the pitcher", the concept
base (bas) is not used as a geometrical, low-level concept,
but rather serves the function of moving the focus to a part of
the pitcher object. This observation led to splitting the
low-level class into two classes: one for low-level geometrical
concepts, and one for low-level concepts that are neither
geometrical nor concrete objects.
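The resulting three-way categorization can be sketched as a small lookup procedure. This is only an illustration under stated assumptions: the word lists below are a few examples taken from this paper, not the lexicon actually used in the analysis, and the names `categorize` and `tally` are hypothetical. Concepts missing from the lists are flagged as uncategorized, reflecting that the categorization was not always straightforward and needed a manual decision.

```python
from collections import Counter

# Hypothetical example lexicon: a few concepts mentioned in this paper,
# not the complete word list used in the analysis.
HIGH_LEVEL = {"tray", "pitcher", "glass", "plate", "mouse", "eye", "pupil"}
LOW_LEVEL_GEOMETRICAL = {"line", "circle", "triangle"}
# Low-level but neither geometrical nor concrete: these move focus
# to a part of a high-level object.
LOW_LEVEL_OTHER = {"base", "edge", "tip"}


def categorize(concept: str) -> str:
    """Assign a concept to one of the three classes used in the analysis."""
    if concept in HIGH_LEVEL:
        return "high-level"
    if concept in LOW_LEVEL_GEOMETRICAL:
        return "low-level geometrical"
    if concept in LOW_LEVEL_OTHER:
        return "low-level non-geometrical"
    # Anything else needs a manual decision by the analyst.
    return "uncategorized"


def tally(concepts):
    """Count concept occurrences per class."""
    return Counter(categorize(c) for c in concepts)


# Example: "now consider the base of the pitcher" plus two geometrical terms.
print(tally(["base", "pitcher", "line", "circle"]))
```

Keeping the two low-level classes separate lets a count distinguish concepts that carry geometrical detail from those that only redirect focus within an object.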
As described above, using either approach alone, top-down or bottom-up, can lead to problems in transferring the visual image. Let us look at what happened in the experiments (data from ALFONS 2, ALFONS 3, ESCHER 1 and ESCHER 2 was analyzed).
An analysis of the data from the experiment ALFONS 2 shows that of a total of 60 occurrences of concepts, 22 were high-level and 38 were low-level, as shown in Table 1. Both kinds of concepts were thus used. That low-level concepts occurred almost twice as often as high-level ones (38 versus 22, a ratio of about 1.7) is not surprising, since it generally takes several low-level objects to 'build' a high-level object.
The data reveals that high-level and low-level concepts are used at different times and for different purposes in reconstructing the image.
First, high-level concepts are used to "set the scene": the overall scene in the image is named using high-level concepts, laying the foundations for the drawing person. Later, low-level concepts are used to get the details right. This order of attention, first to the "scene" and then to its details, is discussed further in the analysis of focus later in this paper.
In the experiment it was noted that the use of high-level concepts, like pitcher (tillbringare), helped ensure that the low-level details did not add up to something absent from the original image.
High-level concepts are also used by the drawing person to confirm what has been drawn. When the drawing person is uncertain of exactly where a detail should be, he asks for confirmation using high-level concepts. An example of this is found in the experiment ESCHER 2 (see Table 4 in Appendix B for the transcription). The drawing person has been instructed to draw some lines and then asks for confirmation of what he has drawn by asking whether it looks like some sort of strange shark's fin (ett slags konstig hajfena).
Describer: so this whole area is black
Drawer: .. so
Drawer: .. okay it
Drawer: .. whole area
Drawer: .. it looks like some sort of strange shark's fin
Describer: exactly
Sometimes, the use of high-level concepts leads the drawing person to include features not in the original image. Figure 9 shows how the drawing person modifies, or amplifies, the characteristics of the high-level concept used by the describing person to illustrate the shape of the pitcher. The describing person used the concept woman who is sitting naked (kvinna som sitter naken) to describe the shape of the pitcher. Apparently, the drawing person interpreted the concept differently from the describing person, making the shape more exaggerated than in the original.
Figure 9. The contour of the pitcher described as that of a woman
who is sitting naked. To the left is the original image and
to the right the drawn image.
In the experiment ALFONS 3, the subjects were instructed to concentrate on the spatial relations between the objects rather than on the exact appearance of each object. Here one finds a drastic reduction in the number of low-level concepts: of the 79 occurrences of concepts in this experiment, only 10 were low-level geometrical concepts. This is natural, since no great detail was needed in the drawing. High-level concepts were used here to refer to the objects in the image so that they could be placed in their correct spatial configuration. The reduction of low-level concepts can also be attributed to the fact that the subjects had already described some of the objects in earlier experiments (e.g. the pitcher in ALFONS 2). Their appearance would then already be known to the drawing person, making a repeated detailed description unnecessary.
In the experiment ESCHER 1, the abstract nature of the image posed a problem for the subjects. At first, the describing person did not recognize any familiar, everyday objects in the image. The subjects started to use some low-level geometrical concepts, but soon established a shared analogy with mice (they thought some parts of the image looked like mice). This analogy was then used for the rest of the experiment. Before the experiment, we had assumed that the image contained no familiar high-level objects, and we anticipated that this would lead to a low frequency of high-level concepts. But as the subjects used the mouse analogy, high-level concepts turned out to be as frequent as low-level geometrical concepts.
The experiment ESCHER 2 was very similar to ESCHER 1, with the exception that as the mouse-analogy was already established, it was used by both subjects right from the start.
As a final remark, it can be noted that the subjects used concepts
for objects occurring in the original image as well as for objects
that did not occur in it. For instance, the naming of pitcher
(tillbringare) and glass (glas) is hardly surprising, since these
objects are clearly recognizable in the images. But necklace
(halsband), tree trunk (trädstam) and woman
who is sitting naked (kvinna som sitter naken) are examples
of concepts not depicted in the original image. The describing
person uses these concepts to produce internal images in the head
of the drawing person, drawing analogies between the appearance
of these objects and the things the drawing person is supposed
to draw. Concepts not occurring in the original image were used
most obviously in ESCHER 1 and 2 (using the abstract Escher image).
As noted before, here the subjects developed an analogy with mice,
which they elaborated on in both experiments.
Various types of concepts were found in the data.
The concepts were categorized into high-level and low-level. Both
classes were frequent in the data. It was discovered that high-level
and low-level concepts are used for different purposes in the
discourse. High-level concepts are used to "set the scene",
guide the drawing process, and have a confirmatory function. Low-level
concepts are used to get the details right. High-level concepts
can lead to poor transfer because they rely heavily on the shared
background knowledge of both participants. If the participants'
knowledge differs, high-level descriptions can lead to the wrong
things being drawn. Even when an image did not contain any obvious
high-level objects, the subjects imposed a high-level interpretation
on it, making high-level concepts as frequent as low-level concepts.
Human beings can only concentrate on a few things at any time (Chafe [3]). In order to perform complex tasks that are impossible to concentrate on as a whole, a strategy must be developed for shifting concentration between different parts of the task. The strategy manifests itself in changes of focus over time. The task of transferring pictures through spoken language is an example of such a complex task.
By focus is here meant the active information present in consciousness according to Chafe's [3] theory of active, semiactive and inactive information.
Data for this analysis
was taken from the ALFONS 3 experiment. The transcription of this
experiment can be found in Table 3, and a focus analysis of it in
Table 2. The discussion below builds on the results presented in
Table 2.
The participants followed a certain strategy when constructing the picture [Fig. 4] in the ALFONS 3 experiment.
The describer started out by creating the "scene" for the picture (Table 2: intonation units 1-15). The drawer was told, in a short and high-level way, the contents of the picture (tray, pitcher, plate, glasses).
The describer then went through the scene objects one by one (Table 2: foci column). The typical focus pattern was to focus first on one object in the picture by mentioning it by name, and then to establish its appearance, location, size, relations etc. through several different subfoci.
A third thing that was noticed was that once the
participants had focused on an object in the picture they did
not leave it until it was drawn.
Table 2 attempts to divide the spoken language transcript from Table 3 into foci modules. By a foci module is meant a passage of spoken language that has the same focus. The division into foci modules came quite naturally. It was also found that the modules could be classified into a few categories, almost resembling a checklist for the picture transfer problem.
The foci modules were organized into a hierarchical structure, with foci present at different levels. When moving down a level in the hierarchy, it is assumed that a subfocus expands or emphasizes part of the focus above it. Thus the lower the level, the more specialized the focus, and the higher the level, the more abstract. When moving between different foci at the same level, nothing of the previous focus is retained. In Table 2 this is manifested as three columns describing focus at different levels (foci, subfoci and subsubfoci).
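The hierarchy of foci modules can be represented as a tree in which each module holds its more specialized subfoci as children. The sketch below assumes the three levels of Table 2 (focus, subfocus, subsubfocus); the class name `FocusModule` and the example labels are illustrative, loosely based on the description of the pitcher in ALFONS 3, and do not reproduce Table 2. A depth-first walk lists the modules in conversation order only under the assumption that each module is finished before its next sibling begins, which fits the observation that participants did not leave an object until it was drawn.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

LEVEL_NAMES = ("focus", "subfocus", "subsubfocus")


@dataclass
class FocusModule:
    """A passage of talk sharing one focus; children expand or emphasize
    part of this focus (hypothetical representation of Table 2)."""
    label: str
    level: int = 0  # 0 = focus, 1 = subfocus, 2 = subsubfocus
    children: List["FocusModule"] = field(default_factory=list)

    def add(self, label: str) -> "FocusModule":
        """Attach a more specialized module one level below this one."""
        if self.level + 1 >= len(LEVEL_NAMES):
            raise ValueError("only three levels are used in Table 2")
        child = FocusModule(label, self.level + 1)
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Tuple[int, str]]:
        """Yield (level, label) pairs depth-first."""
        yield self.level, self.label
        for child in self.children:
            yield from child.walk()


# Illustrative labels only, loosely following the pitcher in ALFONS 3.
pitcher = FocusModule("pitcher")
pitcher.add("location: leftmost object on the tray")
appearance = pitcher.add("appearance")
appearance.add("size relative to the tray")
pitcher.add("relation: highest object on the tray")

for level, label in pitcher.walk():
    print("  " * level + f"{LEVEL_NAMES[level]}: {label}")
```

Siblings share no state in this structure, which mirrors the assumption that nothing of a previous focus is retained when moving between foci at the same level.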
In the experimental situation there was a clear division
between the focus level and the foci sublevels. The focus level
only dealt with objects in the picture while the subfoci levels
dealt with various attributes of the objects.
As mentioned above, the various kinds of subfoci could be classified into categories. The categories found are described below.
These foci categories were used to "set the scene" before a more detailed description took place.
The attribute foci categories describe physical aspects of the objects. They can roughly be divided into basic attribute categories and variant attribute categories, the latter consisting of variants of the basic attribute categories.
The task focus category deals with the task (what the drawer is supposed to do). In ALFONS 3 it manifests itself as the use of the word "draw".
Sometimes the test persons started to describe relations between objects already drawn. Apparently they did this in order to check for errors between the original picture and the drawn picture. The control focus category deals with the checking of the drawn picture [Fig. 4] against the original picture [Fig. 1]. Most of the comparisons did not reveal any error thus indicating few misunderstandings between the test persons.
The question of what subfoci categories were used for each object is interesting because it tells something about the kind of information the drawer needed in order to be able to draw the object.
For the tray the describer only had to tell the drawer that he should draw the tray, i.e. he only had to focus on the task at hand. This is understandable because the test person had just drawn the tray in a previous experiment and knew everything about it already. There was also no location problem because the tray was the first object drawn. The rest of the subfoci to the tray focus were used to "set the scene" for the objects standing on the tray.
The next object to be focused on was the pitcher, the leftmost object on the tray (the describer adopted the strategy of going through the tray from left to right). Here too the drawer had drawn the object before, but on its own, without any reference to the other objects. The information missing was the location of the pitcher on the tray and its size in relation to the tray. This information was presented as location and appearance subfoci in the conversation. An additional piece of information was given as a relation subfocus: the pitcher was the highest object on the tray.
Then came the plate, the middle object on the tray. It had also been drawn before, but not together with the pitcher, so its size relative to the pitcher had to be checked. The foci present were the task subfocus and the size and check subfoci, both of which dealt with size in relation to the pitcher.
The glasses were the first objects that had not actually been drawn before. First, the general location and number of the glasses were focused on to "set the scene" for them. Then the describer had to solve the problem of several objects being referred to as one object. He recognized a spatial relationship between the glasses that was easy to describe: a forward row of three glasses in front of one glass at the back.
The describer started by focusing on the forward row of three glasses. The drawer now needed to know the location of the row and of each individual glass within it, and he also needed to know the appearance of the glasses before he could start drawing. The subfoci that dealt with these problems were the location, relation and appearance subfoci; the relation subfocus described how the glasses were located within the row. Task and check subfoci were also present, the check subfocus verifying the location of the three glasses on the tray.
Next the describer focused on the fourth glass. The information transferred in the two subfoci present was the location and appearance of the fourth glass. Describing its appearance was necessary because the fourth glass was hidden behind the forward row of glasses and was therefore only partially visible.
The last objects focused on the tray were the straws. They were really parts of the glasses but had not been mentioned in the conversation up to this point, and were now introduced as objects in their own right. The subfoci needed seemed to be location, appearance (dotted line), slope and size for the individual straws. For the row of glasses, a check focus was made to see that the straws were of the same height.
This completed the tray. The describer then focused outside the tray and started to define a box before he was interrupted by the experimenters.
The participants seemed to follow a mental checklist of subfoci when describing the objects. The checklist consists of all the subfoci categories mentioned above, with subfoci that are variants of each other counted as the same. The hypothesis is that, for an object to be drawn, almost all items on the checklist must have been covered, either explicitly through a subfocus or implicitly as already known information (or assumptions). The checklist can thus be said to represent the information one needs in order to draw an object.
The checklist contains parts that deal with the
communication problem between the participants (scene,
control, task) and parts that deal with the physical
attributes of the objects (location, size, appearance,
relation).
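The checklist hypothesis can be made concrete as a set comparison: an object is ready to be drawn when (almost) every checklist item is covered either by an explicit subfocus or by information the drawer already has. The sketch below is one possible reading, not a procedure used in the study; the function names `normalize` and `missing_items` are hypothetical, and treating a check subfocus as a variant of the control category is our assumption. It only reports the uncovered items and leaves the judgment of what counts as "almost all" to the analyst.

```python
# Checklist categories named in the text.
COMMUNICATION = {"scene", "control", "task"}
PHYSICAL = {"location", "size", "appearance", "relation"}
CHECKLIST = COMMUNICATION | PHYSICAL

# Assumption: a "check" subfocus is a variant of the "control" category.
VARIANTS = {"check": "control"}


def normalize(items):
    """Map variant subfoci onto their basic checklist category."""
    return {VARIANTS.get(item, item) for item in items}


def missing_items(explicit_subfoci, already_known=()):
    """Return checklist items covered neither explicitly nor by prior knowledge."""
    covered = normalize(explicit_subfoci) | normalize(already_known)
    return CHECKLIST - covered


# The plate in ALFONS 3: task, size and check subfoci were present, and its
# appearance was known from an earlier drawing (our reading of the text).
print(sorted(missing_items(["task", "size", "check"], already_known=["appearance"])))
# → ['location', 'relation', 'scene']
```

On this reading the plate leaves location, relation and scene uncovered; its location may have been implicit in the describer's left-to-right strategy, which is the kind of implicit coverage the hypothesis allows for.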
Several observations regarding focus were deduced from the ALFONS 3 experiment.
Lexical analysis
The purpose of this analysis is threefold. First, to show whether the samples correspond to Chafe's distinction between two functions of intonation units: substantive and regulatory. Second, to find out which syntactic constituents are the most frequent, and why. Third, to show the participants' meta-communicational awareness and the existence of mental images as a bridge between the original picture and the drawn one.
The analysis is based on some constituents of intonation
units: noun phrases, pronouns, prepositional phrases, verbs, and
linguistic markers. The last group is named tentatively, as it
is difficult to find an agreed term for these items in linguistic
textbooks. They are explained in more detail below.
In describing the objects' forms and sizes and their referents in the real world, the most significant and frequent means of information transfer was noun phrases occurring together with definite or indefinite articles and different types of pronouns. Whenever a new object or concept was introduced, it was described by an NP.
Alfons pictures were introduced very quickly and fluently. The first information A gave about Alfons 1 was the exact placement of the tray in relation to the whole picture.
Alfons 2 was introduced as:
In the introductory stage of the description of Alfons 3, A used only noun phrases preceded by a definite article, as the picture had been discussed previously, viz. in Alfons 2. Exceptions were expressions like four pieces of glass, where definiteness was expressed by the quantifier four or by other quantifiers.
The introduction of Escher 1, with several fragmentary intonation units and pauses, may show that this picture was more difficult to verbalize. But even here A describes the figure using a noun preceded by an indefinite article: a big triangle [see Appendix B, Table 6].
It is noteworthy that the common nouns used in the introductory stage of the descriptions usually refer to high-level concepts shared by both interlocutors, e.g. triangle, vertical stroke, skateboard ramp, opening. This aspect was already discussed in the section on concepts.
A new scene, a new object and all new details were always introduced by a noun phrase preceded by an indefinite article. Such use of noun phrases corresponds to Chafe's notion of an inactive idea [Chafe, Ref. 3, pp. 71-81], unknown to the listener and introduced by the speaker for some conversational purpose. After the introduction, the idea becomes active.
Samples from the experiments confirm that intonation units formed by noun phrases mostly serve a substantive function. They conveyed information about what the discussed objects looked like. Their shapes were compared to other objects, geometrical figures or objects from the real world. Nominal expressions referring to spatial relations between objects and to object size also formed substantive units.
Noun phrases are the most frequent constituents in
the samples. Approximately 30 % of all lexical units in the samples
are formed solely by noun phrases. This richness of nouns can
be explained by the subject of the dialogues, the description of
static pictures: the descriptions did not involve any actions.
No demands were placed on the form of the descriptions, which
resulted in free conversation disregarding strict grammatical
constraints. About 10 % of the noun phrases were repeated.
Pronominal phrases were used either to modify nominal phrases or to substitute for them.
Once A had introduced an object to B, the object was referred to later in the conversation by adding a demonstrative pronoun, this or the/they/that, to the noun phrase. This change in lexical units also corresponds to Chafe's observations on the expression of a given idea [ibid.], which can be expressed by pronominal expressions:
When geometrical figures were used in the descriptions, they were not replaced by a pronoun when referred to later in the discourse, but were modified by a demonstrative pronoun instead:
On the other hand, concepts that referred to concrete objects were sometimes replaced by personal pronouns substituting for a noun phrase, e.g. mice - them (möss - dem).
The personal pronouns used in the dialogue were I, you, they and we. In a few cases the indefinite pronoun one was used, e.g. so one has two pieces of lines.
I and you were used by both interlocutors throughout the whole dialogue without any misunderstanding about whom they referred to, which is not at all surprising. In some parts of the descriptions, the interlocutors used the pronoun we when referring to themselves. This topic is discussed in more detail in the section on meta-communicational awareness.
Another pronoun that must be mentioned here is it, functioning as a formal subject in an extraposed construction, which is very common in Swedish as well as in English, e.g.: it is totally strange, it looks like, understand it. This pronoun was not very frequent in the recorded dialogues, as the participants were referring to very specific objects.
The use of pronouns is connected with the active idea in the discourse; thus intonation units that consist of pronouns, or of noun phrases modified by pronouns, serve a substantive function.
Demonstrative pronouns such as that and this, which were very frequently used in the samples, require special attention. Their function is to point out the focus of awareness and to move from one focus to another, e.g. an then you turn slightly to this ear an then .. so you turn to the edge of one of the ears.
One very important type of pronoun used frequently throughout the discourses is the deictic pronoun. Its referent could have been mentioned previously or be known to the speakers, e.g. back of the right white mouse. Deictic pronouns and other deictic terms express the movement from one focus of attention to another. This function was also rendered by verbs expressing movement, e.g.: turn, follow and go (see the analysis of verbs below). Prepositional phrases and personal pronouns are other examples of deictic terms.
Pronouns are the next most frequent constituent in the sample, constituting approximately 15% of all lexical units.
The verbs occurring in the samples were subcategorized into five classes according to their function in the discourse: the copula, verbs of command, movement verbs, verbs referring to the ongoing communication process, and the verbs have and get.
The copula be forms a class of its own. It was used to describe states and to introduce, or name, the different objects in focus, e.g.:
Another class includes verbs that express the describer's commands to the drawer, e.g. draw, do, move, set (set the pen). All these verbs can be regarded as synonyms, reflecting the drawer's main action: drawing a set of objects, e.g.:
All of these are active, transitive verbs. Of particular interest is the context-dependent meaning that the verb do acquired in the dialogue, i.e. 'to draw'.
The next class consists of movement verbs. With these, the describer instructs the drawer in which direction to move, in other words, in which direction the new line should go. The verbs conveying direction are start, follow, turn, go, change (the direction of the bend), and lies, e.g.:
The subjects of these verbs vary, so that the object of one sentence may become the subject of another. Sometimes the drawer (agent) moves the stroke; sometimes the stroke itself (patient) moves, e.g.:
Together with various pronominal phrases, movement verbs can be regarded as deictic terms: they point out the focus and the movement from one focus to another.
The interlocutors constantly commented on the ongoing communication process. This function was expressed by the fourth type of verb identified in the samples: mean, get (do you get it), so to say, and wait, e.g.:
Such exchanges are necessary for the cooperation: both participants seek assurance that the other understands the conveyed message correctly.
The last group of verbs consists of two verbs, have and get, e.g.:
These two verbs are in one respect similar to the preceding group: they are expressions of meta-communication between the speakers. What distinguishes them from that group is that they point to an object, namely a drawn object or part of one, rather than to a spoken message.
The verbs from Escher 2 were analyzed closely. The transcription of this session showed only 15 different verbs in the discourse. A rough check of the other transcribed texts gave similar results: verbs occurred relatively infrequently in the dialogues and showed little variety of meaning.
Verbs make up less than 9% of the lexical units.
The function of verbs has been discussed above in connection with their meaning. Intonation units consisting of verbs or verb phrases have a regulatory function, which can be either interactional, expressed by verb phrases such as you know, or validational, as with think or imagine.
Spatial relations between the objects in the samples were predominantly expressed by prepositional phrases. These specified an object's location as a direction from a certain point of reference. In a few cases the reference was temporal, e.g. the tray that was drawn before. Two constellations of prepositions occurred in the sample: a horizontal one (in front of/in back of, before/after) and a vertical one (over/under, up/down).
The describer used his own perspective when describing, e.g. on the left side of the picture, and the drawer used his own perspective when drawing.
Other points of reference were other objects in the picture, e.g. above the eye, up the right corner. These were important for locating all the constituent parts of the pictures. Points of reference are closely connected with the focus of attention, which was discussed in detail in the previous part of the paper. The movement from one point to another is mostly expressed by a prepositional phrase, e.g.:
Prepositional phrases are the most precise devices for describing the movement of attention.
Prepositional phrases can be regarded as deictic expressions. As such, they express the spatial relations between objects and signal the movement of focus; they form substantive intonation units.
In the samples, prepositional phrases turned out to be more frequent than verbs: approximately 10% of the lexical units were prepositions. This frequency can be explained by the particular subject of the dialogue, which concerned the spatial layout of pictures.
Active participation in a discourse is a prerequisite for successful communication. A listener has many ways of showing that he or she is following the speaker; one type of linguistic or discourse marker consists of short segments such as mm, aha, yea.
Such markers were quite frequent in the dialogue samples, accounting for 13% of all linguistic units. Also counted as discourse markers were you know, ja (Swedish 'yes'), ee, yea, ye, etc.
Another type of communication awareness observed in the samples is referred to in this paper as meta-communicational awareness. Evidence of such awareness includes the change of pronouns from singular to plural and the use of phrases such as to be more precise, perhaps I should say, so to say, and wait.
Both participants comment on the ongoing dialogue. Their self-corrections show that they are able to comment on their own statements. They also strive to be more precise and correct in order to be understood, e.g.:
In another situation, A commented irritably:
The change of pronoun occurred in those parts of the discourse where the interlocutors were certain which part of the picture was to be drawn, e.g.:
Intonation units in which the pronouns changed number had an interactional function. These utterances can also serve as evidence that both participants created mental images of the transferred pictures, as discussed below.
We assume here that participant A continuously built a mental image of the picture he was describing in order to verbalize it, and constantly corrected this image to make the transfer possible. B, in turn, built a mental image on the basis of the information received from A, and also tried to correct his own image so that it would map onto the image A had in mind.
When A explains the contours of a mouse in Escher 2, B says:
This indicates that he already had an idea of what he was supposed to draw: the image had already been created in his mind, even though it was not yet drawn on paper.
When comparing an object to a real-world object, the participants also show that these images were first created in their minds. One example was mentioned earlier in the paper in connection with concept formation: the analogy between the form of the pitcher and a naked woman sitting. Another example is the comparison with the stylized mouse pictures in biology schoolbooks (see Appendix B, Table 6).
Lexical analysis of the samples opens up several avenues and levels of analysis. This part was limited to describing the different functions of some constituents of intonation units, their frequency in the samples, and their role in the verbalized transfer of images.
Some aspects of the participants' awareness of communication were also discussed, and examples were given of the existence of mental images of the described and re-created pictures.
The study has resulted in a number of hypotheses concerning how language is used to transfer visual images. These hypotheses remain to be tested in further research.
When confronted with the task of describing an abstract visual image that someone else is to draw, the describing person has no obvious, familiar objects to refer to. Our hypothesis is that in these cases the describing person imposes a concrete interpretation on the image. The alternative would be to use only simple (geometrical) concepts to refer to the different parts of the image. The advantage of imposing a concrete interpretation is that the resulting high-level concepts can be conveniently used for later reference in the reconstruction process.
Our hypotheses concerning the use of high-level and low-level concepts are as follows. High-level concepts guide the drawing of low-level details; in this way, they constrain and disambiguate the drawing of those details. If no high-level concepts are used, many possibilities are often left open when drawing each low-level detail.
When describing a visual image to be drawn, we believe that the describing person always begins with a brief overview of the image. This overview serves to "set the scene" and to provide the broad context for the items that the drawing person is to draw.
This hypothesis states that when multiple objects are to be drawn, they will be focused on and drawn one at a time (with the possible exception of a quick scene-setting tour).
This hypothesis states that all conversations can be divided into regions of different foci, and that these foci can be classified into categories.
For an object in a visual image to be reconstructed properly, we believe that the describing person has to include certain features of the object, as if mentally following a checklist. The order is probably not important, but all entries in the list must be transferred. In our experiment the checklist contained the following entries: scene (tray, glasses, straws), attribute (location, size, appearance, relation, occupied space, slope, and shape), and the task and control categories. If some parts of the checklist are already known to the drawing person, they are not transferred.
It would be interesting to study cases of poor transfer of an object and relate them to the extent to which a checklist was used.
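The checklist hypothesis can be made concrete as a small model, which also suggests one way of measuring "the extent to which a checklist was used". The sketch below is a hypothetical illustration, not a procedure used in the study: it assumes that only the attribute entries listed above are checked per object (the scene, task, and control categories are left out), that entries already known to the drawer need not be transferred, that order is irrelevant (hence sets), and that a coverage ratio of mentioned to required entries is a reasonable score. The names ObjectTransfer, ATTRIBUTES, and coverage are invented for this example.

```python
from dataclasses import dataclass, field

# Attribute entries from the checklist hypothesis. Scene, task and control
# categories are omitted here (an assumption of this sketch).
ATTRIBUTES = frozenset({
    "location", "size", "appearance", "relation",
    "occupied space", "slope", "shape",
})


@dataclass
class ObjectTransfer:
    """Hypothetical record of how one object was described to the drawer."""
    name: str
    known_to_drawer: set[str] = field(default_factory=set)  # need not be transferred
    mentioned: set[str] = field(default_factory=set)        # conveyed by the describer

    def required(self) -> set[str]:
        # Everything the drawer does not already know must be transferred;
        # sets ignore order, matching "the order is probably not important".
        return set(ATTRIBUTES) - self.known_to_drawer

    def missing(self) -> set[str]:
        return self.required() - self.mentioned

    def coverage(self) -> float:
        # Share of required entries that were mentioned (1.0 if none required).
        required = self.required()
        if not required:
            return 1.0
        return len(required & self.mentioned) / len(required)

    def complete(self) -> bool:
        return not self.missing()


# Invented example values, not data from the recordings.
tray = ObjectTransfer(
    name="tray",
    known_to_drawer={"appearance"},
    mentioned={"location", "size", "shape", "slope", "relation"},
)
print(sorted(tray.missing()))    # ['occupied space']
print(round(tray.coverage(), 2)) # 0.83
print(tray.complete())           # False
```

Under this model, a poorly transferred object would be expected to show a low coverage score; whether that relationship holds is exactly the open question raised above.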
In a setting where two persons are to re-create a visual image without being able to see each other, there are two visual images present (person A describes and person B draws):
References
[1] Bergström, Gunilla, Listigt, Alfons Åberg, Rabén & Sjögren, 1987.
[2] Schattschneider, Doris and Walker, Wallace, M.C. Escher Kalejdocykler, Taschen, pp. 11 and 24, 1991.
[3] Chafe, Wallace, Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing, The University of Chicago Press, 1994.
[4] Holmqvist, Kenneth and Holsánová, Jana, "Focus Movement and the Internal Images of Spoken Discourse", in: Discourse Construction, Libert, W. (ed.), 1996.
[5] Tversky, Barbara et al., "Spatial Mental Models from Descriptions", Journal of the American Society for Information Science, 45(9), pp. 656-668, 1994.