Abstract


This study examines the effects of an auditory stimulus on the recall of pictorial and worded stimuli, and considers how the interactions between these variables could contribute to a better learning experience for students. Previous studies have reported that an auditory stimulus enhances an individual's ability to recall, and that the medium of delivery, such as pictures and audio, affects learning. Building on these findings, the present study investigates the impact of the three stimuli (words, pictures and sounds) on recall. Using a quantitative approach, data were obtained through an experiment and analyzed statistically. The results revealed that pictures were recalled significantly better than words, whereas having the stimuli read aloud had no significant effect on recall. Several limitations that may have affected the results are discussed.


Keywords: visual image, memory recognition, auditory stimulus, graphical, learning












Words, Pictures and Sounds and Their Implications on Learning

Learning can be described as a process that depends largely on an individual's experiences and on how he or she integrates these experiences into memory. For a memory or an experience to be retrieved, it must first have been encoded (Howard & Vesta, 1996). Many factors have been shown to affect how a stimulus is encoded; for instance, whether a stimulus is presented as a picture or presented audibly has a non-negligible effect on one's ability to encode and retrieve it. Past research has demonstrated that objects and pictures lead to better free verbal recall than their labels do (Ducharme & Fraisse, 1965). Such findings have led researchers to conclude that a more complex process than mere verbal coding is involved: since objects lead to better recall than their labels, non-verbal processes are likely to contribute to retrieval. There are four ways in which these non-verbal processes may operate: objects may be easier to code because of their unique physical vividness; seeing an object may activate its meaning; objects can be viewed as independent units; and objects provide a platform for organizing them into higher-order units (Tulving, 1968).


Literature Review

While none of these possibilities has been demonstrated empirically to explain why objects are easier to recall than their labels, Bousfield, Esterson & Whitmarsh (1957) obtained promising results that help explain some of the mechanisms at work. They found that participants recalled nouns best when the nouns were presented with a coloured picture, less well when they were presented with an uncoloured picture, and least well when they were presented alone. Their study therefore suggests that the physical vividness of the pictures made the nouns easier to recall. The tendency for pictures to be recalled more easily than words has come to be known as the 'picture superiority effect'. One account attributes this effect to the greater familiarity or frequency that individuals associate with pictured objects compared with words (Mintzer & Snodgrass, 1999). Interestingly, the advantage of pictures over words is apparent in both younger and older adults, as demonstrated by Park, Puglisi & Sovacool (1983): when presented with words and pictures, both younger and elderly participants recalled significantly more pictured than worded stimuli. This suggests that the picture superiority effect is not confined to a particular age group. Similarly, Paivio, Philipchalk & Rowe (1975) found that when participants were allowed to recall the stimuli freely, pictured stimuli yielded a higher percentage of recall than worded stimuli. However, when participants were required to recall the stimuli in sequence, they fared better when the stimuli were worded rather than pictured.

Beyond the visual attributes of a stimulus, auditory presentation of words has also been found to improve recall compared with presenting the words on their own. Forrin & MacLeod (2017), for example, found that students recalled words better when the words were read aloud to them than when the words were only presented on paper. Poulton & Brown (1967) likewise examined the effect of reading aloud on memory: in a study of 24 housewives, they found that the end of a passage was remembered significantly better when it was read aloud than when it was read silently, and concluded that vocalization added emphasis to the text. The effects of vocalization on recall have also been studied extensively by Kucan and Beck (1997), who found that when students 'think aloud' and verbalize their thought processes, they understand the text better, which in turn results in better recall. It would therefore be worthwhile to examine how presenting words in an audio format affects recall, and to compare pictured and audio presentation in order to determine which format best supports word recall. Research on the visual and audio presentation of information has found that individuals were more likely to remember the visual information in a news broadcast than the content presented audibly (Newhagen & Reeves, 1992). A comparison between visual and audio presentation is therefore likely to shed light on the most effective way to increase recall.

Building on these findings, the current study compares recall of pictures and their noun labels to determine which leads to better recall. In line with Bousfield, Esterson & Whitmarsh's (1957) findings and the picture superiority effect, it is hypothesized that participants will recall more items in the 'picture' condition than in the 'noun label' condition. If physical vividness or compounding of the stimulus is indeed effective, recall should be higher for pictures than for words; if it is not, the concrete meaning of the words would be the determining factor in recall. With regard to sound, it is further hypothesized that stimuli presented audibly will be recalled better than stimuli presented without audio. Hence, participants in the 'audio' condition are expected to recall more items in both the 'picture' and 'noun' conditions than participants in the 'non-audio' condition.





Method

Participants

A total of 20 psychology undergraduates from the Management Development Institute of Singapore (MDIS), aged between 17 and 30, were recruited for this experiment.


Materials

A total of 20 stimuli were presented to the participants: 10 pictures and 10 words, all of which were concrete in nature. The stimuli were presented on a computer, and a speaker was used for the 'audio' trial of the experiment.


Design

This experiment used a 2 (presentation: non-audio vs. audio) × 2 (stimulus type: picture vs. word) design, analyzed with a between-subjects ANOVA, with participants' recall rates as the dependent variable.


Procedure

Before the procedure, participants were asked to sign a consent form stating that they were willing participants and that they could withdraw from the experiment at any point. The 20 participants were then randomly assigned to two groups of 10 and ushered into the experimental room. The first group completed the first trial, and the second group completed the second trial.
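The random split of 20 participants into two groups of 10 could be carried out as in the minimal Python sketch below. The study does not state how randomization was performed, so the use of the standard library's `random.Random.sample`, the optional seed, the participant codes `P01` to `P20` and the helper name `assign_two_groups` are all hypothetical and serve only as an illustration.

```python
import random
from typing import List, Optional, Tuple


def assign_two_groups(
    participant_ids: List[str], seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: randomly split participants into two equal groups.

    The first group is taken to be the non-audio trial and the second the
    audio trial. Passing a seed makes the allocation reproducible.
    """
    if len(participant_ids) % 2 != 0:
        raise ValueError("an even number of participants is required for equal groups")
    rng = random.Random(seed)
    # sample() with k equal to the population size returns a random permutation.
    shuffled = rng.sample(participant_ids, k=len(participant_ids))
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical participant codes
    non_audio, audio = assign_two_groups(ids, seed=2024)
    print("Non-audio group:", non_audio)
    print("Audio group:", audio)
```

Shuffling the full list and cutting it in half guarantees equal group sizes, which a per-participant coin flip would not.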

First Trial (Non-audio condition)

In the first trial, participants were shown 10 pictures followed by 10 words on the computer screen, with an interval of 2.5 seconds between stimuli. Each stimulus was shown for about 10 seconds. After all 20 stimuli had been shown, participants had a 2-minute break before being asked to recall what they had seen and write it down on the paper provided. Participants could recall the stimuli in any order.

Second Trial (Audio condition)

In the second trial, participants were also shown 10 pictures followed by 10 words, with 2.5 seconds between stimuli. In addition, the word and a description of each stimulus were read aloud to the participants. As in the first trial, participants then took a 2-minute break before writing down what they could recall in sequence.

The experiment ended once participants had written down their responses. The responses were collected, and all results were recorded in a spreadsheet. Participants then underwent a short debrief in which the true nature of the study was revealed to them, and they were informed of the data and results collected by the researchers. The spreadsheet containing all participants' scores was then analyzed using ANOVA.









Results

Non-audio versus Audio

Participants in the non-audio condition scored a mean of 8.1 (variance = 8.54) for concrete words and a mean of 11.1 (variance = 9.43) for pictured stimuli, whereas participants in the audio condition scored a mean of 8.0 (variance = 3.55) for concrete words and a mean of 9.4 (variance = 6.93) for pictured stimuli. The main effect of audio condition was non-significant, F(1, 36) = 1.14, p = .293. Hence, the null hypothesis that there is no difference between the audio and non-audio conditions cannot be rejected, since p > .05 and the F value of 1.14 is less than the critical F value of 4.113.
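The F ratios reported in the Results can be reproduced from the four cell means and dispersion values alone. The Python sketch below is an illustration, not the author's analysis: it assumes a balanced 2 × 2 between-subjects layout with n = 10 scores per cell (consistent with the error df of 36) and treats the bracketed dispersion value for each cell as a variance. Under those assumptions it gives F of about 1.14 (audio), 6.80 (stimulus type) and 0.90 (interaction); the last differs slightly from the reported 0.89, presumably because the published means and variances are rounded. Reading the same values as standard deviations would give an F below 1 for stimulus type, which is inconsistent with the reported significant effect. The function name `two_way_anova_from_summary` is hypothetical.

```python
from typing import Dict, Tuple

# Cell key: (audio_level, stimulus_type); value: (mean, variance).
# Means and dispersion values are those reported in the Results; reading the
# dispersion values as variances is an assumption.
CELLS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("non-audio", "word"): (8.1, 8.54),
    ("non-audio", "picture"): (11.1, 9.43),
    ("audio", "word"): (8.0, 3.55),
    ("audio", "picture"): (9.4, 6.93),
}
N_PER_CELL = 10  # assumed: 4 cells x 10 scores leaves 40 - 4 = 36 error df


def two_way_anova_from_summary(
    cells: Dict[Tuple[str, str], Tuple[float, float]], n: int
) -> Tuple[Dict[str, float], int]:
    """Hypothetical helper: balanced two-way ANOVA from cell means and variances.

    Returns F ratios for both main effects and the interaction, plus error df.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    grand = sum(mean for mean, _ in cells.values()) / len(cells)

    # Marginal means; simple averages of cell means are valid for equal n.
    a_means = {a: sum(cells[(a, b)][0] for b in b_levels) / len(b_levels) for a in a_levels}
    b_means = {b: sum(cells[(a, b)][0] for a in a_levels) / len(a_levels) for b in b_levels}

    # Between-group sums of squares for each main effect and the interaction.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means.values())
    ss_ab = n * sum(
        (cells[(a, b)][0] - a_means[a] - b_means[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )

    # Within-cell (error) sum of squares: each cell contributes (n - 1) * variance.
    ss_within = sum((n - 1) * var for _, var in cells.values())
    df_within = len(cells) * (n - 1)
    ms_within = ss_within / df_within

    # Mean square = SS / df; every effect has 1 df in a 2 x 2 design.
    df_a = len(a_levels) - 1
    df_b = len(b_levels) - 1
    df_ab = df_a * df_b
    f_ratios = {
        "audio": (ss_a / df_a) / ms_within,
        "stimulus": (ss_b / df_b) / ms_within,
        "interaction": (ss_ab / df_ab) / ms_within,
    }
    return f_ratios, df_within


if __name__ == "__main__":
    f_ratios, df_within = two_way_anova_from_summary(CELLS, N_PER_CELL)
    for effect, f in f_ratios.items():
        print(f"{effect}: F(1, {df_within}) = {f:.2f}")
```

Running the script prints one line per effect, for example `stimulus: F(1, 36) = 6.80`. Working from summary statistics only works for equal cell sizes; with raw scores, a standard statistics package would be the more robust route.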


Pictured versus Worded

For pictured stimuli, participants scored means of 11.1 and 9.4 in the non-audio and audio conditions respectively, whereas for worded stimuli they scored means of 8.1 and 8.0. The main effect of stimulus type was significant, F(1, 36) = 6.8, p = .013. Therefore, the null hypothesis that there is no difference between the 'pictured' and 'worded' conditions can be rejected at the p < .05 level.

Variables Interaction

There was no statistically significant interaction between the auditory stimulus and the type of stimulus presentation, F(1, 36) = 0.89, p = .34. That is, the effect of the auditory stimulus did not depend on whether the stimuli were words or pictures, and vice versa. Hence, the null hypothesis of no interaction between the two variables cannot be rejected, since p > .05 and the F value of 0.89 is less than the critical F value of 4.113.

Discussion

Non-audio versus Audio

The results of this study indicate that having the stimulus read aloud had only a minimal, non-significant effect on an individual's ability to recall. This does not reflect the findings of Forrin & MacLeod (2017), Poulton & Brown (1967) and Kucan and Beck (1997), which reported that audio stimuli, or having the stimulus read aloud, lead to higher recall; the present study failed to replicate their findings. Several explanations could account for this non-significant result. Firstly, a sample of 20 participants is too small to reliably detect a modest effect.
Moreover, it is well documented that larger samples yield more reliable results, with greater precision and statistical power. Because the sample in this study was small, the analysis had limited power, and a genuine difference between the audio and non-audio conditions may therefore have gone undetected. Secondly, Sruthi (2017) noted that connections between the audio stimulus and the word or picture must be formed for the audio stimulus to improve recall. Another potential explanation for why participants did not fare better in the audio condition is therefore that there was insufficient time for such connections to form. The interval between stimuli was only 2.5 seconds, which may have been too short for meaningful connections to be established between the audio stimulus and the word or picture, leading to recall rates similar to those in the non-audio condition. Simply put, the processes of categorizing, organizing and making connections may have been largely absent, leaving little difference between the two conditions.

Pictured versus Worded

The results for the pictured and worded conditions were statistically significant and supported previous findings that pictures are easier to remember than words, leading to higher recall for pictures than for words. The picture superiority effect could be explained by Paivio's Dual Coding Theory, which holds that pictured stimuli are easier to encode because their features are encoded dually (Paivio & Csapo, 1973).
In other words, whereas concrete words are encoded only verbally, pictures are likely to generate both a verbal and an image code, and this dual encoding leads to better recall. The phenomenon could also be explained by Nelson's Sensory-Semantic Theory, which states that because pictures are perceptually more distinct from one another than words are, they are encoded more deeply, which increases their chance of recall and retrieval (Nelson, 1979).

Words, Pictures & Sounds and their Implications on Learning

This study demonstrated the effect that pictorial cues can have on an individual's recall. Therefore, to enhance retrieval and recall in students, textbooks and other learning materials could illustrate important concepts with pictorial cues wherever possible. Moreover, given that much of the sensory cortex is devoted to vision (Zeki, 1993), it is plausible that the brain processes images more readily than words, which supports the use of visual learning. Because words can be more abstract and therefore harder to retain, concrete visuals may be remembered more easily. Visual cues such as photos, illustrations, icons, symbols, sketches and concept maps could therefore be incorporated into the classroom to help students encode and retrieve relevant concepts.

Even though this study failed to establish a significant relationship between audio and recall, that relationship should not be dismissed as negligible. As mentioned, many researchers have previously demonstrated the impact that adding audio can have on an individual's ability to recall. Indeed, the use of audio stimuli is regarded as important in the classroom setting.
Lewis and Mack (1982) emphasize the importance of 'thinking aloud', arguing that it helps students monitor their thinking process and make sense of what they read.

Limitations of Current Study

The current study has several limitations: a small sample size, an inability to establish reliability and validity, and the absence of random sampling from the general population. As mentioned, the small sample size may have prevented the researcher from obtaining statistically significant results, and it also makes generalizing the results to the general population difficult. Likewise, the small sample size limits the reliability and external validity of the results. Lastly, all participants were obtained through convenience sampling. As they were all university undergraduates, they may differ cognitively from the general population, and the results obtained from this group may therefore not generalize to the general population.

Future Recommendations

Firstly, future researchers interested in the effects of words, pictures and sounds on an individual's ability to recall could consider a larger sample size, for example 1000 participants, in order to increase the statistical power and reliability of their study. Secondly, researchers might include participants from all walks of life instead of focusing only on undergraduates, in order to increase the external validity of the study. Specifically, researchers might recruit school-going participants ranging from elementary to high school students, in order to examine more accurately the effects of words, pictures and sounds on their ability to recall.
Lastly, to examine the effects of audio on recall, researchers might use an entire comprehension text rather than single words, and examine how reading the whole passage aloud affects participants' recall compared with reading it silently, making the experiment more relevant to a typical classroom setting.

Conclusion

In conclusion, the picture superiority effect has been demonstrated in this study, as in numerous previous studies, providing evidence that pictures are easier to recall and retrieve from memory than words. Such findings have many implications for teachers and students and shed light on how material should be presented in the classroom to aid recall. Although this study did not find a significant effect of audio on recall, the effects of audio should not be dismissed. Instead, future researchers should aim to replicate this experiment with a much larger sample to test more conclusively whether audio improves recall, as previously demonstrated by other researchers.