The redevelopment of the Kay picture test of visual acuity

Aims: To validate the design of updated optotypes to be used in the Kay picture acuity tests to improve the resolution acuity, recognition, repeatability and comparisons with gold standard logMAR acuity assessments.
Methods: The study was completed in four phases. In all phases the pictures were presented on a monitor as a single crowded optotype, with five optotypes at each visual acuity (VA) level. Phase one assessed the resolution acuity for 25 pictures, eight Landolt Cs and five ETDRS letters. The recognition phase (phase two) assessed children (18 months to 5 years) to determine which pictures were most commonly identified. During phase three, the resolution acuity of a reduced number of pictures and the Landolt C were reassessed to ensure that fatigue had not influenced the initial results. Phase four compared the new Kay pictures with LEA symbols and the ETDRS (part a), and repeatability of the Kay pictures and the ETDRS chart (part b).
Results: Phase one (resolution acuity): 50 adult subjects were tested. The mean (±SD) acuity scores achieved using each of the 25 pictures ranged from -0.123 ± 0.124 to -0.308 ± 0.105. The mean (±SD) acuity for the eight Landolt C orientations was -0.059 ± 0.120, and -0.128 ± 0.101 for the ETDRS letters. Following this analysis, three pictures were removed. Phase two (recognition): 420 children were assessed (54% male) using the remaining 22 pictures. Analysis resulted in the removal of 10 pictures based on the recognition levels. Phase three (resolution acuity): 43 adult subjects were tested with the remaining 12 pictures. The picture selection was reduced to six based on a combination of similar mean bias levels, recognition levels in all children and recognition levels in the youngest children. Phase four (a) (comparability): 113 adult subjects were tested. The mean bias indicated similar results between the tests. The limits of agreement for ETDRS versus Lea symbols were slightly wider than for ETDRS versus Kay pictures. Phase four (b) (repeatability): 100 adult subjects were tested. Paired t-test analysis demonstrated no significant difference between tests in either ETDRS (p = 0.1) or Kay pictures (p = 0.1). Mean bias for both tests was 0.01 logMAR with similar limits of agreement.
Conclusion: The newly designed Kay picture optotypes have been shown to be reliably recognised by a paediatric population. In a six-picture test format the newly designed optotypes with single picture presentation and crowding bars have been shown to be a reliable and repeatable VA test in an adult population with good agreement with current gold standard VA assessment methods. Normative data in a paediatric population are now required.


Introduction
Accurate assessment of visual acuity (VA) is vital to diagnosis and management decisions. There are a number of paediatric VA assessments currently available; however, in the UK the Kay picture VA test is one of the leading VA tests for pre-literate children in clinical practice, with many management decisions based upon the result of this test.
In 1981 the Kay picture children's vision test was created because other methods of paediatric vision assessment were poorly graded, not easily repeatable and not easily related to the accepted VA standard of that time, the Snellen chart. 1,2 The original design and the test pictures were aimed at children 2-3 years of age or additional patient groups in which clinicians may not be able to obtain accurate acuities on alternative assessments. 2 This was supported by previous reports that children as young as 2 years can perform recognition tasks through matching or naming appropriate optotypes, with this becoming considerably more achievable by the age of 3 years. 1,3
There has been a considerable amount of research into the design and development of VA charts. Over the years Kay Pictures Ltd has developed a number of paediatric VA assessments, developing them further from a Snellen format to a logMAR format. The crowded logMAR Kay picture test, in particular, incorporated the Bailey-Lovie chart construction principles, 4 meaning that the number of optotypes on each line, the spacing between the optotypes and the size progression between each logMAR line were standardised. 2,4
A recent review article has evaluated current paediatric vision assessment with the use of an adapted version of the previous guidelines from the International Council of Ophthalmology. 5,6 The international and national guidelines indicate that the following points should be addressed to ensure uniformity between clinical measures of VA:
1. Optotypes should be black on a white background.
2. Crowding elements should be incorporated into the test.
3. Optotypes used should be of approximately equal legibility.
4. Optotypes should be evenly spaced and centrally disposed. The gap between letters should be equal to the width of the letters.
5. At least five optotypes should be displayed on each line.
6. Optotype sizes should have a geometrical progression (constant ratio) of step sizes of 0.1 log units per line. 5-7
From the literature it is apparent that few VA assessments consistently meet all of the above criteria. The Kay picture crowded test incorporated the Bailey-Lovie chart construction principles, 4 and therefore meets most of the criteria. However, the spacing between the optotypes is 2.5× the stroke width, based on the research used to construct the Glasgow crowded acuity cards. 8 Previous literature has highlighted that the crowded Kay picture test is a repeatable and comparable method of paediatric acuity testing. 4,5,9 However, no further comprehensive validation has been published since its development with regard to the picture recognition and the equality of resolution acuities between optotypes. Anecdotally, in recent years clinicians have reported that some of the current optotypes are not as easily recognised by today's 2- to 3-year-olds as they were previously, due to societal changes. Therefore, Kay Pictures Ltd decided to reassess the resolution acuity and recognition of a new set of picture optotypes which are visually interesting and familiar to an age group of 2-3 years. The selection of pictures was created by Kay Pictures, but the following evaluation was a collaborative process between the research team and Kay Pictures.
When reassessing resolution acuity and recognition it is important to make comparisons with letter acuity charts (ETDRS and Landolt C) and optotype charts (LEA symbols) which are already considered to have equal resolution acuity across optotypes. 10,11 In addition, the current crowded linear test can cause confusion in young children about which picture they are being asked to identify; presenting a single optotype in a crowded format may therefore enable younger participants to perform a crowded test. As the current single-picture test can overestimate acuity due to the absence of crowding, this would enable the clinician to achieve a more accurate result while maintaining equal crowding of all optotypes.
The aim of this study was to validate the design of updated optotypes to be used in the Kay picture acuity tests to improve the resolution acuity, recognition of the pictures, repeatability and comparisons with gold standard logMAR acuity assessments. The design process of the Kay pictures acuity assessment focused on the following phases:
Phase one: To compare the adult resolution acuity of 25 picture optotypes.
Phase two: To assess the recognition of these pictures in children.
Phase three: To assess the resolution acuity of the reduced number of picture optotypes.
Phase four: To compare the final picture selection with current tests (part a) and assess the test-retest reliability of the updated Kay picture acuity test (part b).

Methods
In all phases informed consent was obtained prior to assessment from the subjects (phases one, three and four) or their parents (phase two). This research protocol observed the tenets of the Declaration of Helsinki and was approved by the University of Liverpool ethics committee (phases one, three and four) and the NHS MREC (phase two). A graphic designer created 25 pictures (apple, ball, banana, bird, boat, boot, cake, cat, clock, dog, duck, fish, flower, hand, house, man, mug, aeroplane, scissors, sock, star, tree, train, umbrella and van). Some were based on pictures in the current test while others were new. All pictures met the criteria set in the original design, drawn within the 10 × 10 grid. 1,2 The distance between the crowding bars and the optotypes was 2.5× the line width. It was not anticipated how many pictures were going to be included in the final test.
The key aims of the development were to ensure that the pictures were recognised by as many children as possible, while maintaining validity of the images. Validity was determined by identifying pictures in which there was little variability in resolution acuity and consistent findings when compared with current tests and minimal test-retest variability.
In phases one, three and four the pictures were presented on a computer monitor as a single optotype within a box (Fig. 1), to maintain a fixed degree of crowding. There were five optotypes at each VA level, decreasing in increments of 0.1 logMAR. Stimuli were displayed using a Windows 7 PC driving monitors with resolution in phases one and two of 1280 × 1024 and in phases three and four of 1920 × 1080. All testing was performed under standard clinical lighting binocularly with the subjects wearing their habitual refractive correction.
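The 0.1 logMAR increment between VA levels corresponds to a constant size ratio between adjacent levels. The following is a minimal sketch of that relationship. It assumes a conventional letter optotype that subtends 5 minutes of arc at 0.0 logMAR; the physical dimensions of the Kay pictures are not stated here, so `arcmin_at_zero` and both function names are illustrative assumptions rather than the authors' software.

```python
import math


def size_ratio_per_step(step_log_units: float = 0.1) -> float:
    """Size ratio between adjacent levels for a given logMAR step."""
    return 10 ** step_log_units


def optotype_height_mm(logmar: float, distance_m: float,
                       arcmin_at_zero: float = 5.0) -> float:
    """Physical height of an optotype at a given logMAR level and distance.

    arcmin_at_zero is the angular height at 0.0 logMAR. The default of
    5 arcmin is the letter-chart convention and is an assumption here.
    """
    angle_arcmin = arcmin_at_zero * 10 ** logmar
    angle_rad = math.radians(angle_arcmin / 60.0)
    return 2.0 * distance_m * 1000.0 * math.tan(angle_rad / 2.0)


# Each 0.1 logMAR step changes the size by a factor of about 1.26,
# so ten steps (1.0 log unit) change it tenfold.
ratio = size_ratio_per_step()
height_at_4m = optotype_height_mm(0.0, 4.0)  # about 5.8 mm for a letter
```

Under these assumptions, a display routine would convert the height in millimetres to pixels using the measured pixel pitch of the monitor in use.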

Phase one: Resolution acuity of 25 pictures
Adults were tested to ensure that cognitive ability did not affect the results. Inclusion criteria were binocular VA of at least 0.2 logMAR (determined using the ETDRS chart when wearing their habitual correction) and no known ophthalmological deficits (other than refractive correction). Bespoke software was used to generate and control a series of two-up, one-down staircase procedures, to obtain a threshold for each of the 25 pictures, all eight Landolt C orientations and five ETDRS letters (N, D, H, R and Z); all staircases were interleaved. During testing the examiner pressed a key to indicate whether the response was correct or incorrect, with each staircase ending after eight reversals. The threshold for each optotype was calculated by averaging the last six reversals. Testing time varied between 15 and 30 minutes.
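The staircase logic described above can be sketched as follows. This is not the bespoke software used in the study. It is a single, non-interleaved staircase that assumes a fixed 0.1 logMAR step, a starting level of 0.4 logMAR and a rule in which two consecutive correct responses make the target smaller and one incorrect response makes it larger. These values, the convention of recording a reversal at the level where the direction changed, and the names `run_staircase` and `respond` are all assumptions made for illustration.

```python
from statistics import mean


def run_staircase(respond, start=0.4, step=0.1, n_correct=2,
                  n_reversals=8, n_averaged=6, max_trials=500):
    """Run one adaptive staircase and return the threshold in logMAR.

    respond(level) must return True for a correct response at that level.
    The run ends after n_reversals reversals; the threshold is the mean
    of the last n_averaged reversal levels.
    """
    level, streak, last_direction = start, 0, 0
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            streak += 1
            if streak < n_correct:
                continue          # same level is presented again
            direction = -1        # smaller target (lower logMAR)
        else:
            direction = +1        # larger target (higher logMAR)
        streak = 0
        if last_direction and direction != last_direction:
            reversals.append(level)
            if len(reversals) == n_reversals:
                return mean(reversals[-n_averaged:])
        last_direction = direction
        level = round(level + direction * step, 2)
    raise RuntimeError("staircase did not reach the required reversals")


# Idealised observer who is always correct above -0.15 logMAR.
threshold = run_staircase(lambda level: level > -0.15)
```

With this idealised observer the staircase oscillates between -0.1 and -0.2 logMAR, so the averaged threshold falls midway between them. Interleaving, as used in the study, would amount to keeping one such state per optotype and choosing at random which staircase supplies the next trial.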

Phase two: Recognition of pictures in children
In this phase children were recruited from seven UK hospital sites. Inclusion criteria were age 18 months to 5 years, the ability to name or sign a response, and VA of at least 1.0 logMAR. Testing was performed binocularly with the subjects wearing their habitual correction. Each of the 25 pictures, measuring 4.5 cm square, was printed on an A7-size card and shown to the child at a close distance, so that all pictures were easily visible to all children. The pictures were shown in a randomised order. The child was asked each time 'What is this?' with no clues or prompts provided. Alternative acceptable answers were identified prior to testing and listed on the back of the card for the examiner. Any answers given that were not on the list were recorded by the examiner. Subsequently all data were reviewed to decide whether responses were considered appropriate; this was completed by one examiner to ensure standardisation in scoring methods.

Phase three: Resolution acuity of a reduced selection of pictures
Following phase two, the data were analysed to remove pictures from the selection that had either low recognition or provided statistically significantly different VA thresholds. In phase one, the large number of optotypes presented resulted in testing times of up to 30 minutes. Although the optotypes were randomised in this phase to minimise the impact of fatigue, some subjects did report difficulty towards the end of testing. Therefore, it was subsequently decided to repeat phase one testing with a reduced selection of pictures to ensure validity. In addition, for comparison, the Landolt C was used with VAs measured for four orientations (top, bottom, left and right). As with phase one, only adults were tested using computer presentation. A three-down, one-up staircase procedure was used (written in PsychoPy) so that the staircases converged to a performance of 79.4% correct. 12,13 Testing time was between 5 and 10 minutes. The data were analysed and the pictures with the most consistent resolution acuities were used for the following phase.
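The 79.4% figure follows from the equilibrium of a transformed up-down rule. The staircase settles where the probability of moving to a harder level (n consecutive correct responses, p^n) equals the probability of moving to an easier level, so p^n = 0.5. A short check of that arithmetic is shown below; the function name is illustrative only.

```python
def convergence_level(n_down: int) -> float:
    """Proportion correct at which an n-down, one-up staircase settles.

    At equilibrium the chance of n consecutive correct responses equals
    0.5, so p = 0.5 ** (1 / n_down).
    """
    return 0.5 ** (1.0 / n_down)


print(f"{convergence_level(3):.1%}")  # 79.4%
```

By the same reasoning, a rule requiring two consecutive correct responses would settle at about 70.7% correct.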

Phase four (a): Comparison with current tests
Following phase three, the selection of the six final pictures was made. The chosen six were presented singly in a standard clinical format (no staircase) and compared with the Lea Symbols and ETDRS tests, with each test presented in turn in a randomised order. For the Kay pictures (at 3 m) testing commenced at 0.4 logMAR and if subjects correctly identified a picture, a picture on the line below was presented. This continued until the subject was unable to correctly identify the picture. The size of picture was then increased by two lines and five pictures per line were displayed individually until four were incorrectly identified at a particular size. The threshold was scored per picture correctly identified (0.02 logMAR per picture). For the Lea Symbols and ETDRS tests (tested at 4 m, with the computer calibrated for the exact room size), the same scoring protocol was applied, but the charts were displayed using the Thomson Test Chart Xpert.
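The per-picture scoring can be illustrated with a short sketch. The exact arithmetic is not spelled out above, so this assumes the conventional per-optotype rule: all pictures larger than the largest fully tested line are credited, and each correct picture lowers the score by 0.02 logMAR. The function name and the input format are hypothetical.

```python
def score_logmar(correct_per_line, per_picture=0.02, line_step=0.1):
    """Score a test from the number of correct pictures on each line.

    correct_per_line maps the logMAR value of each tested line to the
    number of pictures correctly identified on it (0 to 5). The largest
    tested line is assumed to be the point from which credit is counted.
    """
    top_line = max(correct_per_line)
    total_correct = sum(correct_per_line.values())
    return round(top_line + line_step - per_picture * total_correct, 2)


# Five correct at 0.2 and 0.1, three at 0.0, one at -0.1 (testing stops
# there because four pictures were incorrectly identified).
score = score_logmar({0.2: 5, 0.1: 5, 0.0: 3, -0.1: 1})
```

In this example the subject reads the full 0.1 logMAR line and four further pictures, giving 0.1 - 4 × 0.02 = 0.02 logMAR.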

Phase four (b): Test-retest variability
The new Kay pictures and the ETDRS were tested twice with the order randomised. ETDRS was used for comparison as it is the 'gold standard' method used in many studies. For the ETDRS test, the letters were randomised between tests one and two. For the Kay pictures, the computer randomised the picture choice (presented using Psychopy) but the examiner controlled the size selection. If the computer chose the same picture consecutively, this was changed by the examiner. This was to ensure that additional cues given by the change in target were consistent. Other methods of assessing VA adopt this approach by ensuring the same letter/optotypes are not displayed directly next to one another.
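In the study the examiner manually replaced any picture that the computer repeated consecutively. The sketch below shows one way the same constraint could be built into the randomisation itself. It is an illustration only, not the PsychoPy script that was used, and the single-letter picture labels are placeholders.

```python
import random


def sequence_without_repeats(pictures, length, rng=None):
    """Random sequence in which no picture appears twice in a row."""
    rng = rng or random.Random()
    sequence = []
    for _ in range(length):
        # Exclude the picture shown on the previous presentation.
        candidates = [p for p in pictures
                      if not sequence or p != sequence[-1]]
        sequence.append(rng.choice(candidates))
    return sequence


# Six placeholder labels standing in for the six final pictures.
presentation_order = sequence_without_repeats(list("ABCDEF"), 20,
                                              random.Random(1))
```

Drawing each picture from the set that excludes the previous one keeps every presentation a visible change of target, which is the cue the examiner was controlling for.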

Results

Phase one
Fifty subjects were tested. The mean (±SD) acuity for each of the 25 pictures ranged from -0.123 ± 0.124 for the flower to -0.308 ± 0.105 for the man. The mean (±SD) acuity for the eight Landolt C orientations was -0.059 ± 0.120, and -0.128 ± 0.101 for the ETDRS letters. The criterion for the picture selection was that the pictures must result in acuities within a range determined by the variation (difference between minimum and maximum values across optotypes) for Landolt C (0.134 ± 0.071) and ETDRS (0.102 ± 0.045). Following this analysis, three pictures (flower, man and umbrella) were removed due to the large differences in VA compared with other pictures (Fig. 2).

Phase two
In total 420 children were assessed (54% male) in different age ranges: under 24 months (n = 40), 24-35 months (n = 145), 36-47 months (n = 136) and 48-60 months (n = 99). The sample included subjects with mild to moderate learning difficulties (n = 53) and some who were not native English speakers (n = 33). Analysis of the whole group resulted in the removal of four further pictures (indicated by ‡ in Table 1) as they had lower than 80% recognition. As expected, recognition improved with age, with high levels of recognition present by 3 years of age. As the recognition levels plateaued at 2.5 years of age, the analysis was repeated for the youngest children. In the children under 2 years of age, the plane and boat had the lowest levels of recognition. Following the exclusion of these pictures, analysis of the children aged 2-2.5 years showed the lowest recognition levels were for the banana, sock, train and van. Analysis of the group of children with learning difficulties demonstrated the same pattern of recognition, resulting in no further pictures being removed. Recognition of pictures that were similar (e.g. sock and boot) was reviewed (defined as analysis of pairs, Table 1). Of the paired pictures, sock and boot were shown to be confused, with 129 children confusing sock with boot and 19 boot with sock. Bird and duck were frequently confused, and as duck had higher recognition in all age groups the bird was removed. Train and van were also confused. Initially the van was eliminated following the analysis of the 2-2.5 years age group, but in the youngest group with the greatest variability the van had a much higher recognition rate, therefore in this pair the van was kept.

Phase three
In this phase 43 subjects were assessed. Bland-Altman plots were created comparing each of the 12 pictures with the Landolt C mean VA (-0.07 ± 0.16); mean bias and limits of agreement are shown in Table 2. At the end of this phase, the picture selection was reduced to six (Fig. 3) based on a combination of similar mean bias levels (see Table 2; cat, dog and fish removed due to higher mean bias), recognition levels in all children (tree removed) and recognition levels in children under 30 months (ball and hand removed).
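Mean bias and limits of agreement of the kind reported here are conventionally derived from the paired differences between two measurements. The sketch below assumes the standard Bland-Altman definition, with limits at the mean difference ± 1.96 standard deviations of the differences. The multiplier is not stated in the text and is therefore an assumption, as are the function name and the example values.

```python
from statistics import mean, stdev


def bland_altman(test_a, test_b, multiplier=1.96):
    """Return (mean bias, lower limit, upper limit) for paired scores.

    test_a and test_b are logMAR scores from the same subjects, in the
    same order. The limits are bias +/- multiplier * SD of differences.
    """
    if len(test_a) != len(test_b):
        raise ValueError("both tests need one score per subject")
    differences = [a - b for a, b in zip(test_a, test_b)]
    bias = mean(differences)
    spread = multiplier * stdev(differences)
    return bias, bias - spread, bias + spread


# Invented scores for three subjects, for illustration only.
bias, lower, upper = bland_altman([0.0, 0.1, 0.2], [0.0, 0.0, 0.0])
```

A mean bias close to zero with narrow limits indicates that the two measures can be used interchangeably for most subjects.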

Phase four (a)
One hundred and thirteen subjects were assessed in this phase, with ages ranging from 17 to 71 years (mean ± SD: 26 ± 11 years

Discussion
The study data clearly show the development process and the overall validity of newly designed Kay picture VA symbols. It is evident when comparing the resolution acuities and recognition of the initial 25 optotypes that there was a need to exclude a number of pictures. The exclusion of these pictures ensured the optotypes are of equal resolution acuity and recognition for the targeted age group. Although the number of optotypes has been reduced from eight to six, the newly designed Kay picture test will still have more optotypes than other paediatric vision assessments which have been shown to be reliable methods of assessing visual acuities. 3 Previous studies have demonstrated that the Landolt C and LEA symbols have optotypes where the recognisability is considered good and no particular optotypes are more difficult to see than others. 10 This demonstrates the value of making comparisons of the Kay picture optotypes with the Landolt C, to determine which pictures to exclude. Despite the similarities in the recognisability of the optotypes with the Landolt C and the Kay pictures, it is important to note that the tasks are fundamentally different. Kay pictures assess the recognition of the optotypes, whilst Landolt C, although a resolution acuity test, is assessing the detection of the gap in the optotype. 10 Due to the differences in the methods of measurement it was crucial to make comparisons of the Kay picture test with other methods of recognition acuity in addition to the Landolt C, including LEA symbols and the ETDRS. During the design of the new Kay pictures it was ensured that once again the Kay pictures followed the construction principles of the Bailey-Lovie chart. 4 The value of employing these design principles is that it allows uniformity across other VA assessment methods that have also adopted these principles, ensuring optotypes and crowding are standardised. 
As the LEA symbols, ETDRS and the new Kay picture test have adopted the construction principles of the Bailey-Lovie chart, it was important to make comparisons between the tests. 4 The findings demonstrated that the ETDRS has good agreement with both the LEA symbols and the Kay pictures. However, it was encouraging that the upper and lower limits of agreement were slightly narrower when looking at the results of the Kay picture test versus ETDRS (-0.055 to 0.22) compared with the LEA symbols versus ETDRS (-0.117 to 0.183). This
Regardless of the positive result from the study, the methodology did have some limitations. Firstly, the initial testing in phase one used an extended staircase method which took a considerable amount of time for participants to complete. However, this was revised for the later phases of testing ensuring a staircase method was still evident but over a shorter period of time. These changes were important particularly for the latter phases of testing involving the test-retest phase, to ensure participant concentration was maintained to enable accurate assessment of VA thresholds.
To ensure that participant concentration or cognitive ability did not have an effect on phases one and three, adult participants were used. Using adult participants during these phases eliminated any issues that may exist when testing children, i.e. concentration. Additionally, the Kay picture test for the purpose of this study was a computerised version of the test which cannot be directly compared with the hard copy version due to the differences that exist between computerised vision assessments. Previous studies have reported that a computerised version of Kay pictures appears to be a valid alternative to the hard copy. 15 However, for the purpose of establishing the recognition and comparability this was irrelevant as all other acuity tests were computerised versions, ensuring consistency over the study.
The newest versions of the Kay picture optotypes developed in the course of this study meet all but one of the requirements of the International Council of Ophthalmology, and the British Standards of VA assessment. 6,7 The spacing between the optotypes does not apply to the newly validated Kay picture test as all optotypes are individually equally crowded in a box, even if multiple are presented on a page (an example of an individual optotype is shown in Fig. 1). Meeting these requirements further indicates the uniformity between the Kay pictures and other gold standard clinical methods of VA assessment (Lea symbols, ETDRS and Patti pictures). 5,6 The further comprehensive validation of the optotypes has allowed Kay pictures to meet this criterion. Additionally, the new format of singly crowded pictures allows the possibility of introducing a crowded VA assessment in a younger age group than the original Kay pictures test (2-3 years).
Following the validation of the newly designed Kay picture optotypes, further data collection is under way to compare the newly designed Kay picture test with the current test used in clinical practice. These further data will facilitate the clinician's interpretation of the VA scores when transitioning from the current to the new test. Additionally, the final stage of the development process will be the collection of age-related normative data.

Conclusion
The new Kay picture optotypes presented in a singly crowded format have been shown to be a repeatable method of paediatric VA assessment, highly comparable with the gold standard ETDRS VA assessment. The single crowded optotype design not only produces a reliable VA assessment, but allows the introduction of the crowding phenomenon to a younger age group. This design feature was employed with a view to avoiding the potential confusion which can be apparent with linear paediatric VA assessments in younger age groups.