Perceiving 3D in the absence of measurable stereo-acuity

Aim: Increasingly, those who are considered ‘stereoblind’ by clinical testing report that a 3D effect is perceived when watching stereoscopic films at the cinema. We report here the findings of a pilot study investigating the perception of 3D in stereoscopic video clips and on games consoles, in observers who have no measurable stereo-acuity. Methods: Seven subjects were assessed for stereo-acuity using standard clinical tests. They were then asked to perform an object depth ordering task on an autostereoscopic screen (Nintendo 3DS) and a 3D video rating task, to determine recognition of depth in entertainment media. Results: No subject had measurable stereo-acuity or simultaneous perception. Two subjects achieved 41% and 55% correct depth identification on the 3DS task; the other 5 subjects performed poorly. When viewing stereoscopic 3D video clips, even subjects who demonstrated zero ability to identify depth on the 3DS task rated the ‘pop-out’ 3D effect very highly, giving a median (interquartile range) score of 8 (5) out of 10. By comparison, 2D control videos were given a rating of 3 (8) out of 10. Conclusion: Subjects with no clinically measurable stereo-acuity report compelling ‘pop-out’ depth effects when viewing 3D stereoscopic video. There are many mechanisms for determining depth from a scene, with the presence of motion potentially allowing the appreciation of stereoscopic depth. The nature of the technological method of stereoscopic 3D delivery may also aid recognition of, or give other significant cues to, depth through artefacts or presentation method.


Introduction
The assessment of binocular vision is an integral part of the orthoptic assessment, with the results having significant implications in terms of management and diagnostic decisions. In addition, binocular vision has been shown to be important to many areas of life, such as motor skills, employment and education. [1][2][3][4][5] The measurements we record, however, do not always match up with what our patients tell us. There are many anecdotal reports of 3D depth being revealed to those who are not expected to perceive it. [6][7][8][9] There has also been recent press interest in the concept of monocular stereopsis, [10] in which qualitative evidence is presented that, when using only one eye, compelling depth is easily appreciable. [11] The assertion that 'vivid 3D vision can be experienced with just one eye' is arguably a matter of personal opinion, perception and state of binocularity. [6][10][11] For this reason, attempts by researchers to evaluate this are limited by the respondent's interpretation of the instructions.
One of the reasons for the discrepancy between clinical tests and the subject/patient response could be that current clinical testing methods only assess one aspect of depth perception. Although there are considerable differences between the clinical stereo-acuity test types, [12] they all present static, central, monochrome stimuli, with all other cues to depth reduced to a minimum. 3D entertainment media is very different to this, not only in what is shown, but in how it is delivered. The technology behind delivering 3D content in the cinema or home differs from that used in clinical tests. Whilst the glasses (Real-D system) used most commonly at the cinema are passive polarising (similar to the Randot, but circular rather than linear polarising), there is still an active element. The polarisation difference is created by an LCD filter placed in front of the projection lens, which determines which frame is shown to each eye by alternating polarity. These changes in viewing eye per frame may be imperceptible; however, each eye is not being presented an image at the same point in time, as is true of most clinical tests. The BVAT test of stereo-acuity [13] uses active shutter glasses, which is similar to the technology used for 'active' home 3D TVs; however, the BVAT differs in that the glasses have a very low refresh rate per eye of 30 Hz, whereas modern active 3D TVs have a minimum refresh rate of 60 Hz per eye. The test stimuli used by the BVAT are random dot based and static. Autostereoscopic screens, such as that of the Nintendo 3DS for which glasses are not required, are similar to the Lang stereotest and passive 3D TV screens, in which a filter on the screen determines the polarisation of each vertical line on the screen, allowing an image to be presented to both eyes at the same time.
Given the significant differences in presentation method and content, it is not surprising that variability between clinical measures and 3D entertainment media occurs.
A series of experiments are being undertaken at the University of Liverpool, exploring various aspects of 3D vision. We attempt to isolate what information is being used to provide the perception of stereoscopic depth experienced by those with unmeasurable stereo-acuity. One current study aims to compare subject performance on clinical testing with the perceived levels of 3D effect of entertainment media. This is usually performed under four different states of monocular blur, thus reducing the effect of intrasubject variability. This paper will present data obtained from subjects who volunteered for the study but demonstrated no binocularity on clinical measures, so only one state of binocularity was testable.

Methods
Ethics approval was gained from the University of Liverpool Ethics Sub-committee and the study was performed in accordance with the ethical standards laid down in the Declaration of Helsinki. Participants were recruited from the staff and student population of the University of Liverpool via advertisement for volunteers to participate in a 3D vision study. Prior to participation, informed consent was gained from the subject. Inclusion criteria for the larger study were that volunteers were aged 16 years and over, with vision of 'driving standard' (0.22 logMAR) in at least one eye. Screening consisted of monocular visual acuity assessment (ETDRS), cover testing, Bagolini glasses and stereo-acuity tests (detailed in Table 1). Those subjects who had no clinically demonstrable binocular vision were included; however, they were not tested under different states of monocular blur.
The subjects had to complete one near and two distance tasks following orthoptic screening. The near task was performed using a Nintendo 3DS handheld computer game. Distance tasks were performed using a 3D TV (720p, circular polarising, 1366 × 768 pixels) at 1.2 m, an industry-typical distance for a 32 inch TV. Prior to any stereoscopic presentation, a brief questionnaire was completed to determine levels of fatigue, headache and any extraordinary visual symptoms, on a rating scale of 1 to 10. The three questions were derived from key words on a questionnaire developed to assess users of a head-mounted virtual reality device. [14] The symptom questionnaire was repeated upon completion of testing to determine whether any stereoscopic viewing had any negative effect on these subjects. Testing order and video/picture presentation order were randomised, and all testing took place in the same location under standardised lighting conditions.

Static 3D task
A Nintendo 3DS game device was pre-loaded with six pictures of a static scene for autostereoscopic viewing. A colour paper copy of the six pictures was given to each subject with highlighting of four or five points of interest in the scene. The points chosen were clearly distinct from each other, with five pictures containing five points and one containing four (limited by the number of objects in the scene). The subject was asked to identify/estimate the order in depth of the highlighted objects. A correct score was achieved by correctly identifying the difference in depth of the highlighted objects. For example, identifying the orders as 1,2,3,5,4 would give a score of 3 out of 5. An order of 1,4,2,3,5 would give a score of 2 out of 5, as the middle depths were identified out of place: even though the order of 2,3 was identified correctly, they were not correctly differentiated in depth from the fourth-furthest object. The fixation distance for this task was not set, as the device is designed to be used as 'handheld'. However, for the purpose of calculation of disparity range, the fixation distance was considered to be 0.4 m (Table 1).
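The scoring rule can be illustrated with a short sketch. It assumes, as the two worked examples suggest, that a response earns one point for each highlighted object placed at its correct rank in depth. The function name and the representation of an ordering as a list are hypothetical and are not taken from the study materials.

```python
def score_depth_order(response, correct=None):
    """Count the objects placed at their correct depth rank.

    Assumption: one point per position where the response matches the
    true order, which reproduces the two worked examples in the text.
    """
    if correct is None:
        # By default the true order runs from nearest (1) to furthest (n).
        correct = list(range(1, len(response) + 1))
    if len(response) != len(correct):
        raise ValueError("response and correct order must have equal length")
    return sum(1 for given, true in zip(response, correct) if given == true)


print(score_depth_order([1, 2, 3, 5, 4]))  # 3 out of 5
print(score_depth_order([1, 4, 2, 3, 5]))  # 2 out of 5
```

Under this rule, a pair of objects reported in the right relative order but at the wrong ranks (such as 2,3 in the second example) scores nothing, which matches the explanation given for that example.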

3D TV video task
Each subject viewed five stereoscopic 3D videos covering the stereo-acuity ranges described in Table 1.
Subjects were asked to rate on a scale of 1 to 10 the amount of 3D depth they perceived in each video. The ranges on the rating scale were described as follows: 1-3, '3D effect not seen, appears mostly 2D'; 4-7, '3D effect fairly evident'; 8-10, '3D effect very obvious, you feel you would need to move out of the way or catch objects from the screen'. As a development during this pilot experiment, subjects who entered the study at a later stage (n = 4) had only the right image of the stereoscopic video presented to both eyes, to determine whether subjects rated the 3D stereoscopic version differently to the 2D version.
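For readers who wish to tabulate ratings against these descriptors, the three bands can be written as a simple lookup. This is only a sketch: the function name is hypothetical, the descriptor for the top band is abbreviated, and whole-number ratings are assumed because the text does not say how values between bands (for example 3.5) would be classified.

```python
def describe_rating(rating):
    """Map a whole-number rating (1 to 10) to its descriptor band."""
    if not isinstance(rating, int) or not 1 <= rating <= 10:
        raise ValueError("rating must be a whole number from 1 to 10")
    if rating <= 3:
        return "3D effect not seen, appears mostly 2D"
    if rating <= 7:
        return "3D effect fairly evident"
    # Abbreviated form of the full 8-10 descriptor given in the text.
    return "3D effect very obvious"


for example_rating in (3, 7, 8):
    print(example_rating, describe_rating(example_rating))
```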

New 3D TV test
The subjects were presented with a polarised version of a TNO-type stimulus on the 3D TV and asked to identify the location of the 'mouth' of the PacMan. This test was performed to extend the range of stereo-acuity levels measurable by clinical tests (Table 1). Five levels of stereo-acuity were screened including one negative control and one positive control (simultaneous perception test). The range of levels tested represented the minimum and maximum levels of stereo-acuity achievable, due to screen resolution and avoiding overlap of the TV frame. Descriptive data only are presented, due to the small number of participants in this sub-group.
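To show how screen resolution bounds the smallest disparity a display can present, the sketch below estimates the visual angle subtended by one pixel. It is an illustration under stated assumptions and not the calculation used for Table 1: it assumes a 16:9 panel, takes the 32 inch diagonal, 1366-pixel width and 1.2 m viewing distance from the Methods, and treats a one-pixel horizontal offset as the smallest presentable disparity. All function and parameter names are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180 * 3600 / math.pi


def pixel_disparity_arcsec(diagonal_inches, horizontal_pixels,
                           viewing_distance_m, aspect=(16, 9), pixels=1):
    """Visual angle (arcsec) of a horizontal offset of `pixels` pixels."""
    w, h = aspect
    # Screen width from the diagonal and the assumed aspect ratio.
    width_m = diagonal_inches * 0.0254 * w / math.hypot(w, h)
    pixel_pitch_m = width_m / horizontal_pixels
    offset_m = pixels * pixel_pitch_m
    angle_rad = 2 * math.atan(offset_m / (2 * viewing_distance_m))
    return angle_rad * ARCSEC_PER_RADIAN


# One-pixel offset on the assumed 32 inch, 1366-pixel-wide panel at 1.2 m.
print(pixel_disparity_arcsec(32, 1366, 1.2))
```

Under these assumptions a one-pixel offset corresponds to a disparity on the order of 90 seconds of arc. Finer steps would need a higher-resolution panel, a longer viewing distance or sub-pixel rendering.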

Results
A total of 7 subjects, aged 45-66 years, were recruited for this pilot study with a mean (SD) age of 53 (10) years. All subjects reported a suppression response to Bagolini glasses and the new 3D TV test's positive control (simultaneous perception test). A brief description of subject characteristics is presented in Table 2.
The summative values provided present the median and interquartile range (IQR), as the number of subjects is small, non-normally distributed and uses non-continuous measures.
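The summary statistics can be reproduced with the Python standard library, as sketched below. The quartile convention used in the study is not stated, so the 'inclusive' method here is an assumption, and other conventions can give slightly different IQR values for small samples. The input values are placeholders for illustration only and are not study data.

```python
import statistics


def median_iqr(values, method="inclusive"):
    """Return (median, IQR, (Q1, Q3)) for a list of ratings or scores.

    Assumption: quartiles follow statistics.quantiles with the chosen
    method; the source does not specify which convention was used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return statistics.median(values), q3 - q1, (q1, q3)


placeholder_ratings = [1, 2, 3, 4, 5, 6, 7]  # placeholder values, not study data
median, iqr, (q1, q3) = median_iqr(placeholder_ratings)
print(f"median (IQR): {median} ({iqr}); quartiles {q1} to {q3}")
```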

Symptom questionnaire
The symptom questionnaire demonstrated no change between the pre-and post-test scores, with the exception of subject 7, who had a 1 point increase in tiredness between pre-and post-viewing.

Static 3D task
The individual percentage scores ranged from 0 to 55%, with a median (IQR) score of 0 (40). The individual subject scores achieved are shown in Table 3 (static 3DS task).

3D TV video task
Subjects rated the 3D effect of the videos as 8 (5) (median (IQR)), which falls within the descriptor of 'a very obvious 3D effect that compels interaction' (lower quartile 4, upper quartile 9). The individual ratings given by each subject are shown in Table 3 (3D TV video task (3D videos)). Given the surprisingly high values given by the non-binocular subjects early in the study, the methodology was modified to introduce a 2D video control. Subjects (n = 4) were played the 2D versions of the video clips (without being informed what was being done) by presenting the right eye image to both the left and right eyes. The subjects in this sub-group gave the 2D videos a median (IQR) rating of 3 (8), whilst they rated the videos in 3D as 7 (7).

Discussion
In this pilot study on 7 non-binocular subjects, no subject provided a clinically measurable level of disparity; however, responses to 3D entertainment media tasks ranged from nil to 'appears very 3D', and depth order was correctly identified up to 55% of the time. This discrepancy may be due to clinical stereotests only assessing disparity detection. This is only one element of the depth information presented by 3D entertainment media; other information provided does not necessarily require both eyes to be used.
At present there is no conclusive explanation of 3D perception in the absence of measurable stereo-acuity. However, we observe three key differences between 3D entertainment media and clinical stereotests. First, the artistic/monocular/pictorial cues in the image can cause monocular stereopsis, the illusion of depth from a flat image. Second, moving images provide additional monocular and stereoscopic motion cues. Third, the 3D display technology itself, the methods of 3D presentation, generate both supraliminal and subliminal artefacts.
Depth can be identified through a number of pictorial cues such as linear perspective, relative size, texture gradient, height in the visual field, aerial perspective (blur), occlusion, lighting and shadow. [16] While pictorial cues can facilitate the ordering of scene elements in depth, they do not provide a 'pop-out' 3D effect, where the effect is such that an object appears to the observer to be floating in front of their eyes. More subtle 3D effects given by monocular cues allude to the presence of relief, where the scene appears to have depth that could be 'reached into'.
Stereoblind observers do not have depth context/volume information as they cannot detect disparity, equating to them viewing the scene with one eye. Whilst this most likely restricts the recognition of the strong 'pop-out' effect, relief is easily recognisable when viewing a scene monocularly. [11] Based on comparison between 3D and 2D videos, subjects 6 and 7 appeared to primarily use monocular stereopsis, the median scores being very similar in each case. The literature regarding monocular stereopsis is weighted towards historical/anecdotal accounts, with only a few recent studies attempting to quantify the effect. [11][17][18] However, the mechanism behind monocular stereopsis is still poorly understood. Perspective vergence is an oculomotor response to pictorial cues. [19] In a flat image, a cloud will elicit a different vergence response compared with a near object. [20] A cloud is known to be distant, so a compelling presentation will induce a 'vergence memory' response, to diverge gaze to the distance. Therefore, the eye may process a known image differently to an abstract pattern. When the surface of an image is not discernible by resolution [18] or context, [11] the visual system appears to process the information as more real than a flat 2D image. More recently, an increase in apparent realness at high image resolutions has been described, with images becoming indistinguishable from reality at very high resolutions. [18] This could explain anecdotal reports of the newest 2D display technologies with very high resolutions appearing three-dimensional. This current study used low-resolution 3D videos from YouTube and the 3DS, so monocular stereopsis would only contribute a limited amount to the depth effect. Currently, cinema and television screens offer higher resolutions than the screen we used. Subjects 4 and 6 demonstrated good use of the pictorial cues available during the near static task (3DS), as their performance in identifying relative depth was above chance.
In contrast, all other subjects had very poor performance using these cues alone. As the pictorial cues used in the 3DS task pictures were limited to relative size and occlusion, the two subjects appear to give a greater weighting to these types of cues, hence the better performance. The poorer performing subjects may still use pictorial cues but may be more reliant on those that were not available in these pictures. Another explanation is that subjects 6 and 7 may have used disparity information, as described anecdotally, [9] potentially as a result of the construction of the parallax barrier of the autostereoscopic screen, which does not require the eyes to be perfectly aligned to the image, but rather to the barrier location.
Videos are significantly different from pictures in that motion is present. Motion in 3D viewing is not limited to x and y motion, but also includes z motion, out of the screen. An explanation for the perception of depth in the 3D TV video task experienced by subject 3 [8] and others could be the presence of motion in depth, which comprises two different elements. [21][22][23][24][25][26] The first is Changing Disparity Over Time (CDOT), in which the visual system computes binocular disparities (as for static stimuli) and evaluates the change in disparity over time. The second is Inter-Ocular Velocity Difference (IOVD), in which the visual system calculates the lateral velocity of the image in each eye and then takes the difference between the signals from the two eyes.
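The difference in the order of operations between CDOT and IOVD can be made concrete with a small numerical sketch. It is illustrative only and makes several assumptions not found in the source: horizontal image position is sampled at regular intervals for each eye, disparity is defined as left minus right position, and velocities are estimated by finite differences. All names are hypothetical.

```python
def differences(samples, dt):
    """Finite-difference rate of change between consecutive samples."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


def cdot(x_left, x_right, dt):
    """CDOT: compute disparity at each instant, then its change over time."""
    disparity = [left - right for left, right in zip(x_left, x_right)]
    return differences(disparity, dt)


def iovd(x_left, x_right, dt):
    """IOVD: compute each eye's velocity first, then take the difference."""
    v_left = differences(x_left, dt)
    v_right = differences(x_right, dt)
    return [vl - vr for vl, vr in zip(v_left, v_right)]


# Hypothetical target approaching the observer: the two retinal images
# move in opposite horizontal directions (arbitrary units, dt in seconds).
x_left = [0.0, 1.0, 2.0, 3.0]
x_right = [0.0, -1.0, -2.0, -3.0]
print(cdot(x_left, x_right, dt=0.5))
print(iovd(x_left, x_right, dt=0.5))
```

For an idealised, perfectly matched target the two computations give the same numbers. The distinction lies in what each requires as input: CDOT needs a disparity at every instant, whereas IOVD needs only a velocity signal from each eye.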
The recognition of disparity through the use of CDOT and IOVD has been described in cases where the recognition of static disparity is absent, including in people with strabismus of up to 20 D ET. [27][28][29][30] The IOVD mechanism is proposed to work with uncorrelated points in space, meaning the appreciation of a different direction or speed to each eye of what is computed to be the same object, and may result in the appreciation of motion in depth. In other words, a stereoblind observer could use the IOVD information in the absence of spatial correlation of the image, or of the eyes' alignment. For the 3D video task, 5 of 7 subjects rated the 3D effect as 'popping out'. As no recognition of static stereo-acuity was demonstrable, and with mostly poor performance using pictorial cues alone, it is possible that subjects could have used the time-correlated motion information to make use of the IOVD method of recognising motion in depth.
The third potential source of depth information in the absence of static disparity detection could be a result of technological factors. The most accurate 3D displays to study stereomotion present targets at identical points in space simultaneously. Most modern 3D TV technology does not fulfil these criteria for this application, since passive screens introduce a spatial vertical disparity and active screens introduce a temporal interocular disparity. Only cinemas with dual projection systems (such as IMAX) introduce zero spatial and temporal disparity.
Binocular amblyopia therapy promotes the use of the amblyopic eye by diminishing the signal to the fixing eye by reducing either contrast or illuminance. [31][32] The shuttering of active 3D displays is comparable, halving the time the signal is presented to the fixing eye. Half of the entire presentation time is presented to the amblyopic/non-fixing eye only. Even in the case of strabismic viewers, who do not have time to take up fixation, binocular information may still be extracted through recognition of motion in depth.
The presence of screen flicker draws attention to the scene subconsciously, [33] which, combined with occlusion of the fixing eye, could further encourage use of the weaker eye. Any viewer head motion causes the 3D elements to move in relation to one another, which does not occur during 2D viewing. This, combined with flicker, can produce 'Wobble Stereo', where temporally separated views perceived in rapid succession give depth information. [34][35] These mechanisms may cause non-binocular persons to report the perception of 3D at the cinema.
As commercial methods of delivering 3D improve, the artefacts in technology will reduce, potentially removing cues that clinically stereoblind persons appear to use to appreciate 'stereoscopic' 3D depth. This study, while employing a small number of subjects, compares the results of clinical stereotests with 3D entertainment media with known disparity values. We also presented a modernised clinical stereotest in the same context as the 3D entertainment media, further enabling comparisons between technology and practice. Staircase methodology will be used in future experiments to quantify responses to stimuli which vary in only one dimension, rather than relying on subjective measures. The current optimal system for artefact-free 3D viewing, and therefore ideal for future research, is a dual passive display system. This is where the left and right eye see superimposed stereo images, either through twin projection or two monitors configured with a semi-silvered mirror. These systems are not immune to problems of misalignment and crosstalk but they do fulfil the base criteria of temporal and spatial synchronicity between the left and right image.
Encouragingly for future research and participants, we did not find any marked changes in reported symptoms following stereoscopic viewing, including extended assessment.
There are weaknesses to this study as unknown variations in subject characteristics, such as whether binocularity was present at any time during the subject's lifetime, could influence findings. The data presented here are from only 7 subjects, with no analysis appropriate due to the limited number. The findings do not aim to provide a definitive explanation, rather, suggestions based on observations that could be applied in specific cases, to offer explanation as to why a patient may insist they observe 3D 'at the cinema' when stereoacuity is not measurable in the clinic. The methodology of the primary study was designed for intra-subject comparison across different simulated levels of reduced stereo-acuity and so awareness of the purpose of the study was not critical. In this subset, bias could be considered to be introduced, as the advertisement and subject information described a '3D vision study'.
These initial results do highlight the need for further research into the assessment of binocular vision, specifically to directly compare the difference in threshold of static and dynamic stereo-acuity. The effect of display technology and the implication for stereoblind persons should also be considered, as improvements in technology may remove their ability to experience the effect.
When asked by a parent, 'Is it worth paying extra for my child to see a 3D film?' orthoptists should consider that disparity is not the only cue to depth present in a scene and that there are many aspects of a film that can provide cues to depth.
The authors declare no competing interests.