CHAPTER XIII. THE THEORY OF THE MEASUREMENT OF MENTAL DEVELOPMENT

In defining the borderline of feeble-mindedness it will be found that certain assumptions are usually tacitly made as to the form of the curves of normal and retarded development. These assumptions, which are often based on vague conceptions of mental measurements, should be brought clearly to mind if we are to compare the relative merits of different scales of mental tests or different ways of stating the borderlines of deficiency. With this in view it is proposed to take up in this second part of the monograph a brief technical discussion of the units of mental measurement, the equivalent individual differences at different ages, and the curves of mental development. The bearing of these conceptions on the various quantitative definitions of tested deficiency, including the percentage definition, will then be discussed in the following chapter. Practical advice as to individual diagnosis or group comparisons has been confined to Part One, so that those who are not concerned with the theoretical assumptions on which the conception of mental development and the interpretations of tested deficiency are based should omit Part Two.

Fig. 3. Hypothetical Development Curves (Normal Distribution)

A. Comparison of Units and Scales for Measuring Individual Differences.

(a) Equivalent Units Of Ability When The Distributions Are Normal.

In considering the curves of development it is desirable first to notice the differences between measurement in equal physical units and measurement in equivalent units of ability or of development.
The difference in the point of view of the two forms of measurement is so pronounced that I can hardly hope to make myself clear to those who are not somewhat familiar with such terms as “distribution curves,” “frequency surfaces,” “standard deviation,” and other phrases connected with the theory of probability, which are treated at length in such books as Thorndike's “Mental and Social Measurements” and Yule's “Introduction to the Theory of Statistics.”

We often, by mistake, regard the growth of an inch in height, for example, as always representing an equivalent unit of growth. This will lead us into rather serious misconceptions unless we are careful, for it is perfectly evident that the growth of an inch in height has a very different significance for the three-year-old boy than for the eight-year-old. Half of the three-year-old boys grow about 3 inches during a year, while at eight years of age not more than about one in seven grows that much. Moreover it

In the measurement of mental ability, moreover, it is exceedingly difficult to utilize equal physical units. Most of the objective units which are commonly called alike are clearly not equal even in the physical sense. “Spelling one word,” for example, is not equal to spelling another “one word,” but only equal to spelling the same word. Out of such units of amount accomplished, it is, of course, not possible to build a satisfactory scale without referring to some other concepts of measurement.

Some tests, however, are scored in equal units. When the measurements, for example, are in units of the time it takes to perform the same task under the same outward conditions, we have the possibility of a scale of equal objective units. Such a scale is approached by the results with the form board test, which give the number of seconds it takes children to place blocks of different shapes in their proper openings.

Even the unit of time may be deceptive in name, as it is with the Binet scale.
A year of time is, of course, the same physical unit and the task proposed with the In order to determine equivalent units of activity we find that a number of different concepts have been utilized. With some of the scales for measuring educational products, such as Thorndike's Scale for Handwriting, equal units of merit in handwriting mean differences judged equal by relatively the same proportion of competent judges. This form of unit has not been used, however, in any scale of mental development thus far proposed. In the measurement of mental ability the most commonly accepted idea of equivalent units is that they are provided by the units of standard deviation for a series of measurements which distribute in the normal form. The meaning of these units may be understood by referring to Fig. 3 which shows Gaussian or normal distributions of abilities of individuals at various periods of life in curves A, B, C, D and E. The straight lines of the measurement scales form the bases of these distribution curves. These graphs represent the normal form of distribution usually expected when any fundamental ability is measured in a random group. If the number of cases at each unit of measurement are plotted by a point placed relatively as far above the scale, used as a base line, as the number of cases found at that unit of the scale, it will be discovered that these points arrange themselves in the form of a symmetrical curve high at the middle and flaring out The studies of biological traits suggest that a unit of the standard deviation is the most important measure we have for equivalent degrees of any trait which distributes normally. It measures the same portion of the total distance from the lowest to the highest ability on any objective scale so long as the distribution of measurements is in the normal form. 
It thus affords the best interchangeable unit from measurements at one life-age to those at another, provided that the distributions keep close to the form of the normal probability curve. This is the assumption on which practically all the developmental scales have been based. The difference in ability between an individual at the average and one at -1 S. D. (one standard deviation below the average) is equivalent to that between the latter individual and one at -2 S. D. The same distances along the base line of different distribution surfaces, measured in terms of their respective deviations, set off equivalent portions at each age so long as the distributions are normal. For example individuals

We may now compare the relations of the units in the physical scale, shown at the left of the figure, to units of the scales for adults or for the immature of any age, expressed in units of the standard deviation from the averages of these groups. Relative ability measured on the physical scale or any one of the distribution scales in Fig. 3 will be found identical, since they all start from the same zero point and the distributions are all normal. But the ability of an individual in one distribution can hardly be compared with that of an individual in another distribution in a biologically significant way by their actual positions on the physical scale. A physical unit does not measure the same sort of fact of development in a scale for the immature that it measures in the scale for adults or that it measures in another dynamic scale for the immature. This can be seen when a physical unit is compared with the amount of standard deviation which it measures in the different scales. Moreover, the correspondence of relative distances on the physical scale and any one of these other scales will not hold the moment the distributions do not start from the same point or are unsymmetrical.
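The contrast between a physical unit and a unit of standard deviation may be made concrete with a short sketch in Python. The means and deviations below are hypothetical figures chosen only for illustration (they are not taken from Fig. 3 or from any data in this monograph), and the function name `sd_units` is likewise an assumption of the sketch. Both groups are assumed to be normally distributed.

```python
from statistics import NormalDist

# Hypothetical distributions on one physical scale (illustrative values only).
# Both are given the same ratio of mean to deviation, in the spirit of scales
# that start from a common zero point.
nine_year_olds = NormalDist(mu=40.0, sigma=10.0)
adults = NormalDist(mu=80.0, sigma=20.0)


def sd_units(score, group):
    """Express a physical score as a deviation from the group average, in S. D. units."""
    return (score - group.mean) / group.stdev


# The same physical gain is a different amount of deviation in each group.
physical_gain = 10.0
gain_for_children = physical_gain / nine_year_olds.stdev
gain_for_adults = physical_gain / adults.stdev

# A position of -1 S. D. cuts off the same lowest proportion in either group,
# so long as both distributions are normal.
below_child = nine_year_olds.cdf(nine_year_olds.mean - nine_year_olds.stdev)
below_adult = adults.cdf(adults.mean - adults.stdev)

print(gain_for_children, gain_for_adults)
print(round(below_child, 4), round(below_adult, 4))
print(sd_units(30.0, nine_year_olds), sd_units(60.0, adults))
```

Under these assumed figures a gain of ten physical units is a full unit of deviation for the younger group but only half a unit for the adults, while the position of -1 S. D. marks off the same lowest proportion of each group.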
It does not seem seriously wrong to suppose that there are some individuals at any age who have no more mental ability than the baby of the poorest mental ability at birth. At any rate our intelligence scales are hardly fine enough to measure the difference in intellectual capacity between the dullest adult idiots and the dullest idiot In applying the concept of the probability curve we should distinguish between individuals who have attained their mature mental capacity and those who are still maturing. The former would be represented by a random group of adults (Distribution E, Fig. 3) the latter by a group of nine-year-olds (Distribution C). If we say, for example, that a child has reached a certain stage of development we might have in mind the final distribution of mature capacity or the distribution of capacity among those of his particular age or of all ages. When we compare stages of development we must, therefore, be careful to indicate the distribution surface to which we are referring. An increase in development may refer to at least five different things depending upon the scale of measurement to which reference is made. Besides an increase measured by the physical scale, the scales for adults, for the immature or for all ages, to which we have already referred, it may mean an increase judged by the distribution of increases which individuals of the same life-age and capacity make in the same period of time. This last meaning may be the most significant, although it has never been used. It has reference to a distribution surface of increases such as is represented in Distribution F, Fig. 3. 
This is intended to show the increases in one year of all two-year-old children who had average ability at 2 years, on the assumption

(b) The Year Unit Of The Binet Scale.

A sharp disagreement of opinion as to whether the Binet year units can be regarded as equivalent has arisen between Karl Pearson, Director of the Galton Laboratory in London, and certain psychologists who have used the Binet scale. Cyril Burt, for example, says, as quoted by Pearson: “Except for rough and popular purposes, any measurement of mental capacity in terms of age is unsatisfactory.... The unit fluctuates in its significance all along the scale. When the child is just beginning to walk and talk, when he is 7 or 8, when he is 10 to 11, when he is on the verge of puberty: at these different periods a retardation of a single year means very different things” (164, p. 36). A number of good psychologists, including Yerkes, Terman, and Kuhlmann, agree with Burt in maintaining that a year of retardation at different ages has very different significance.

With this statement of Burt, Pearson takes issue, saying: “Can the psychologist to the London County Council ever have seen the growth curves of children, or would he write thus?... There is no valid reason to suppose that a year's growth in mental power may not

Like many other apparently opposite statements, both contain truth. The conflict apparently arises, first, from a disagreement between the data obtained with the Jaederholm form of the scale, on which Pearson bases his statement, and data obtained with other forms of the scale; second, from a discrepancy in the points of view. Pearson stresses the fact that the mental year-marks equal average growth increment with the Jaederholm scale (167). He shows that the regression of years of mental excess (or deficiency) on increase of life-age is a straight line, just as he found it with physical measurements.
Moreover, the standard deviation of the mental measurements for the entire group of normal school children, 6-14 years of age, was found to be about one year of mental age (.96 year for the corrected data) (167). To which Pearson's opponents might reply, these facts are of comparatively little significance unless the deviations for the separate ages are alike in terms of these year units on the scale. Neither linear regression nor the balancing of years of excess by years of deficiency at each age indicates that the deviations of the separate ages are alike in terms of the year units. The new Stanford scale, for example, shows both of these conditions and yet the range of months of life-ages which sets off the middle 50% of the children of the different tested ages increased decidedly from 6 to 14 years of age. The middle half of the tested ages, for example, at age VI on the scale include a randomly selected group of six-year-old children whose range of life-age is ten months, at age VIII on the scale this range is 13.4 months, at X it is 16 months, at XII, 20 months, and at XIV, 26 months. “The number of 6-year-old children To this argument Pearson might reply that he had not overlooked the question of variation in the deviations from one age to the next for he has a footnote in which he states regarding the Jaederholm data: “There are, however, relatively little differences in these mental age standard-deviations of the normal children beyond what we may attribute to the effect of random sampling” (164, p. 46). In this respect, then, the Jaederholm data differ notably from Terman's data obtained with random groups with the Stanford scale and, as I shall show, from data obtained by Goddard with the 1908 Binet scale, the two largest groups of Binet test data which have been collected. 
Even with the Jaederholm data on efficient school children, although the largest difference between the standard deviations of different age groups is only about twice its probable error, it is notable that 24 of his 39 7-year-olds are included within an interval of the middle year of tested age, while only 9 of his 35 11-year-olds are included within the same middle year interval. Taking Goddard's data for the 1908 scale for the separate ages from 5-11 at which probably the factor of selection for his groups may be neglected, I have calculated the standard deviations from his Table I and find them as follows:
The differences between the deviations for ages 7 and 11 or between ages 8 and 10, are more than three times their standard errors, so that we would not be justified The comparison of the year units on the Binet scale with the diagrams in Fig. 3 shows that if the scale at each life-age shut out the same lowest proportion, say half, of the children of that age, then the year units might be regarded as equal in the sense of equal average growth increments, as Pearson suggests. A child 7 years of age testing VII would be at least one annual average-growth unit higher in mental development than one of 6 years testing VI, and so with each age until the limit of development had been reached. This is the condition approximated closely for children by the new Stanford scale and the corrected Jaederholm data. Since there is little prospect, however, even with a scale perfected so far as its age norms are concerned, that the total distributions for each of the different years would be the same multiple of the year-units, the main significance of the age units is in permitting the statement that a child had reached the tested development normal for the children of a certain age. It is also legitimate to use years of retardation as a short way of expressing rough borderlines when they happen thus to afford an easy method of empirically describing equivalent borderlines for a particular scale. This is what I have done for convenience in Part One of this book. I certainly do not mean to contend that four-years retardation has theoretically the same significance at different ages, in terms of the deviation of the separate ages. To me the Binet years are no more than names for certain positions on the scale. To most psychologists who have been dealing with the With the scales in use in this country the Binet year units are not equivalent in the sense in which they are usually spoken of as equivalent. We should recognize this and emphasize it. 
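The inequality of the year units can be illustrated from the Stanford figures quoted earlier in this section, where the middle half of the children at each tested age covers a range of 10 to 26 months. The following Python sketch is only a rough illustration resting on two assumptions which the text does not make: that each of these ranges may be treated as the middle half of a normal distribution, and that the spread so obtained may stand for the deviation at that age. The function name `year_in_sd_units` is hypothetical.

```python
from statistics import NormalDist

# Range in months covered by the middle 50% at each tested age,
# as quoted in this chapter for the Stanford scale.
MIDDLE_HALF_RANGE_MONTHS = {
    "VI": 10.0,
    "VIII": 13.4,
    "X": 16.0,
    "XII": 20.0,
    "XIV": 26.0,
}

# In a normal curve the quartile lies about 0.6745 S. D. from the median.
QUARTILE_IN_SD = NormalDist().inv_cdf(0.75)


def year_in_sd_units(middle_half_range_months, months=12.0):
    """Size of a 12-month interval in S. D. units, assuming a normal curve."""
    quartile_deviation = middle_half_range_months / 2.0
    sd_months = quartile_deviation / QUARTILE_IN_SD
    return months / sd_months


year_units = {
    age: year_in_sd_units(rng) for age, rng in MIDDLE_HALF_RANGE_MONTHS.items()
}

for age, value in year_units.items():
    print(age, round(value, 2))
```

On these assumptions a year counts for about 1.6 units of deviation at VI and steadily less at each later age, the unit at VI being 2.6 times that at XIV. The exact values depend wholly on the assumption of normality, which the following section calls into question.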
Even if the norms at each age marked off the same proportion of the individuals, as shown in A and B of Fig. 4, unless we knew that the forms of distribution were always alike, we should not know that the distance between successive age norms was the same on any sort of objective scale other than average age increments. Moreover, we would not have an objective scale of equal units applicable to measuring the deviation of children of any one age. The average annual increments would not necessarily represent the same proportion of the total distance from the lowest to the highest ability at different ages even if the distributions were all normal. With normal distributions it would also be necessary to demonstrate empirically that the annual average growth increment between successive ages always bore a constant relation to the deviations at these adjacent ages as shown in B of Fig. 4 where the increment is equal to 1 S. D. at each age. This could not possibly hold when the increment lessened near maturity.

Fig. 4. The Question of Equivalence of Year Units.

With the empirical evidence against the equivalence of the year units and the impossibility of determining their equivalence unless we first know that ability is distributed normally at each age, it is certainly hazardous to assume that individual deviations measured in terms of year units are equivalent at different ages. It may be noted that it is quite as hazardous to suppose that the units of the Point scale are equivalent in any theoretical or practical sense. This question will be discussed later in Chap. XIII, B, (b).

(c) Is Tested Capacity Distributed Normally?

Before leaving the question of the significance of units on a scale described in terms of the standard deviation we should ask whether tested mental abilities have been found to distribute normally, i.e., in the form of the symmetrical Gaussian curve with each extreme the same distance from the middle measurement.
Contrary to the usual supposition in this matter, the evidence seems to be somewhat against this assumption, although neither position can be asserted at all dogmatically on the basis of our present data. A résumé of this evidence, which I have given below, makes it appear that the assumption of a normal distribution will not conflict with a practical use of normal probability tables for medium degrees of ability, but may quite seriously interfere with such use for the borderline of deficiency. There is little doubt, as Pearson believes, that the bulk of the children now in special classes for the retarded in the public schools would fall within the lower range of a normal distribution fitted to the general population. On the other hand, there is likely to be a respectable minority of the deficients which will be beyond such a normal curve. These facts are sufficiently evident, I believe, to make it impossible to base quantitative descriptions of the borderline of deficiency on a hypothesis of normal distribution.

The best evidence on this point is probably the data of Norsworthy with eleven tests on groups of 100 to 150 feeble-minded children in institutions and special classes and 250 to 900 normal children. She expressed the position of each child in terms of the deviation of the group of normal children of his age for each test. Pearson has presented her data graphically on the assumption that her defective group represented 0.3% of a general population

The other data which I have found that indicate that tested ability, when measured in equal physical units for the same task, is skewed toward deficiency have to do with tests that are pre-eminently for psychomotor activities rather than intellectual. They consist of Sylvester's and Young's results with the form board test on Philadelphia school children, Stenquist's results with his construction test, and Smedley's results with the ergograph test on Chicago school children.
Here we may apply the better criterion of the distance of the quartiles above and below the median of the group. These positions would be less likely, through extreme records, to be affected by chance conditions during the testing. It is to be remembered that if the records of school pupils appear to be normally distributed this would not settle our problem, since it is apparent that idiots and many Sylvester (191) tested with the form board a group of 1537 children in the Philadelphia public schools, from 80 to 221 at each age from 5 to 14 inclusive. “Except that no especially backward or peculiar children were included there was no selection.” This study gives, with the complete distribution tables, the number of seconds required for the same task by the children at each age. If we find that the limit of the lower 25 percentile was farther from the median than the limit of the upper 25 percentile we can be reasonably sure that the difference would be still greater if the excluded deficient and backward children were also included. By calculating the quartiles and their differences from the medians at each age, I find that for only two of the eight ages is the upper quartile farther from the median than the lower quartile. The average excess of the distances of the lower quartile is .64 of a second. At only age 7 is the difference three times its probable error, 2.1 seconds, P. E. .67. The form board distributions thus tend to be slightly skewed toward deficiency. The errors of the quartiles were found by the method given in Yule's Introduction to the Theory of Statistics, Chap. XVII, which assumes normal distribution, so that they are too small. The skewness is more manifest when the extreme measurements are compared with medians at each age. It is not possible, unfortunately, Since it was not important to compare the amounts of skewness in different data, I have not attempted the more elaborate calculations of coefficients of skewness. 
These would give the results a more elegant statistical expression. The simpler method I have here used affords more convincing evidence of asymmetry for the non-mathematical reader. Young has published the results with Witmer's form board test on approximately two hundred Philadelphia children for each age, giving the results for the sexes separately for each half year of life-age (227). This affords 36 different groups in which he gives the median and upper and lower quintiles for the shortest time records. The lowest quintile is farther from the median in 25 cases, equal in 6 and less than the upper quintile in only 6 of the 36 comparisons. This skewness would have been even greater if children of the special classes had not been excluded from his groups. Stenquist's results (54) with his construction test are scored in arbitrary units in which allowance is made for the quality of the score, but we should expect no constant effect on the form of the distribution from the character of these units of measurement. At ages 6 to 13 he tested from 27 to 74 pupils randomly selected from the public schools, a total of over 400. For six of these eight ages the lower quartile is farther from the median than the upper quartile, when calculated from his distribution table. The number of cases at each age, however, is so small that the largest difference, 15 units, is not three times its probable error, 6. A casual observation of his standard percentile curves for the ergograph test at the different ages gives the impression that the distributions are decidedly skewed toward deficiency, but this impression is not justified by a careful analysis of his results (51). In the table which accompanies his standard percentile curves, giving his total results for the two years, we find that there is a sharp disagreement between the distributions of the boys and the girls. 
The distributions for the boys at each age between 6 and 13 years show a greater distance, measured in kilogram-centimeters, from the median to the 80-percentile than from the median to the 20-percentile, in 5 ages out of 8. The total difference is also slightly greater between the median and the upper 80-percentile. On the other hand, the table for the girls at these ages shows the 20-percentile farther from the median in 5 out of 8 ages, with a total difference considerably greater than that shown for the boys. Usually the differences were small compared with their errors. With the boys only at age 13 was the difference in favor of the 80-percentile three times its probable error, while with the girls the four oldest ages show the distance of the 20-percentile greater by three times its probable error. A comparison with the reports of Smedley on this test for the previous year (Report No. 2), leaves his results still more uncertain. While he does not give the medians at each age, we may make less satisfactory comparisons between the distance of the 10-percentile from the 25-percentile and the distance of the 90-percentile from the 75-percentile. If we do this, we find the distance is uniformly greater at the upper end of the distributions for each age both for the boys and girls. The Smedley results are, therefore, decidedly contradictory. The first Broadly considered, the Binet records with school children point to a skewed distribution toward deficiency when large allowance is made for the difference in value of the year units. It is extremely rare to find a child testing 4 years in advance of his life-age, while 15-year-old idiots are presumed to test 12-year-units or more under a mature standard. Pearson believes that “the Gaussian curve will be found to describe effectively the distribution of mental excess and defect” for intermediate ages as measured by Jaederholm's form of the Binet scale. 
The data on which Pearson places reliance are Jaederholm's results in testing 261 normal children 6-14 years of age in the Stockholm schools and 301 backward children in the special help classes of the same city. The best fit of a normal curve to the data was obtained with a group of 100 8-year-old children, in which case the chances were even that samples from a normal distribution would fit. With his larger normal and backward groups combined in proper proportions in one population the chances were 20 to 1 that such a distribution as was actually found would not fit into the Gaussian distribution. He admits that “this is not a very good result,” although it is better than when the Gaussian curve is fitted to either the normal or the backward group alone. In a subsequent paper he gives each child a score relative to the standard deviation of the normal child of his own age, a method comparable to his treatment of Norsworthy's data. He then finds that “10% to 20% or those from 4 to 4.5 years and beyond of mental defect could not be matched at all from 27,000 children” (164, p. 46). In each case the distributions actually found were skewed somewhat toward deficiency. Furthermore, Pearson thinks that the skewed distributions of his data may possibly be explained by the drawing off of older children of better ability to the “Vorgymnasium,” or to the higher-grade schools, by the incompleteness of the higher age testing, or by the “possibility of the existence of a really anomalous group of mental defectives, who, while continuously graded inter se, and continuously graded with the normal population as far as intelligence tests indicate, are really heterogeneous in origin, and differentiated from the remainder of the mentally defective population” (164, p. 34). The last hypothesis, of course, supposes that mental ability is skewed and suggests the cause. 
He supplements this explanation by stating that the heterogeneous cause of the “social inefficiency” of the deficients may not be connected directly with the intellect but affect rather the conative side of the mind. A skewed distribution under biological principles of interpretation supposes a single cause or group of causes especially affecting a portion of the population. It is also to be noted that the apparent form of distribution may be the result of the nature of the test and the units in which it is scored. Some tests might not discriminate equally well a difference in ability at the lower and at the upper ranges of ability. If the test were too easy the group might bunch at the upper portion of the scale and the distribution appear to be skewed toward the lower extreme where there were only a few cases. Turning to the analogy of measurements of physical growth, a strong argument may be made for the hypothesis of shifting forms of distribution. As Boas points out regarding measurements of the body at adolescence, owing to the rapid increase of the rate of growth the distribution of the amounts of growth is asymmetrical, “the asymmetry of annual growth makes also all series of measurements of statures, weights, etc., asymmetrical.” Moreover, “acceleration and retardation of growth affects all the parts of the body at the same time, although not all to the same extent.... Rapid physical and rapid mental growth go hand in hand” (80). There is no reason to suppose that the brain is free from this phenomenon of asymmetrical distribution of annual increments of growth among children of the same age when the rate of growth is changing as at adolescence. It is therefore to be expected that the separate age distributions would be skewed at early ages and at adolescence even if the distribution should be normal with a static population. The presumption from physical measurements is that the form of distribution shifts with age. 
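The simple test of asymmetry applied earlier in this section to the form board and construction test data, namely the comparison of the distances of the two quartiles from the median, can be sketched in Python as follows. The times given are invented for illustration and are not Sylvester's or Young's records; the function name `quartile_distances` is an assumption of the sketch. Since the scores are times in seconds, the larger values represent the poorer performances.

```python
from statistics import quantiles


def quartile_distances(measurements):
    """Return (median - first quartile, third quartile - median)."""
    q1, median, q3 = quantiles(measurements, n=4)
    return median - q1, q3 - median


# Hypothetical form board times in seconds for one age group (illustrative only).
times_in_seconds = [14, 15, 16, 17, 18, 19, 20, 22, 25, 30, 38]

fast_side, slow_side = quartile_distances(times_in_seconds)
print(fast_side, slow_side)

# With a time score, a greater distance on the slow side than on the fast side
# is what the text describes as a distribution skewed toward deficiency.
skewed_toward_deficiency = slow_side > fast_side
print(skewed_toward_deficiency)
```

A symmetrical distribution would give two equal distances. Whether an observed difference exceeds what chance would produce must still be judged against its probable error, as is done with the published data discussed above.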
In spite of these arguments and the evidence of asymmetry of measurements, at least at some periods of life, it is to be noted that current opinion is probably contrary to this hypothesis, although, as I believe, because it has been concerned mainly with those who are not of extreme ability. For all large medium ranges of ability, slight skewness might well be negligible. It is interesting to note that Galton says that “eminently gifted men are raised as much above mediocrity as idiots are depressed below it” (159, p. 19). Measuring by intelligence quotients with the Stanford scale, Terman finds among school children that deviations below normal are not more common than those above (197, p. 555). Burt, however, following a suggestion of Cattell as to college men, seems to incline to the opinion that the general distribution of ability, like wages, is skewed toward the upper end. He adds, “In crude language, dullards outnumber geniuses, just as paupers outnumber millionaires” (85).

(d) Equivalent Units Of Development When The Form Of Distribution Is Uncertain.

For our problem of units and scales of measurement, an asymmetrical distribution sets a very difficult problem. It may be that this very difficulty has been one of the main reasons for slowness in recognizing the drift of the evidence. In order to set forth the difference in the conception of measurement when distributions become asymmetrical I have presented this hypothesis in connection

Under conditions of variable symmetry there is a sense in which the same relative physical score in units running from zero ability to the best ability would always have an equivalent objective meaning, but this might not express equivalent development conditions at different ages.
For example, with shifting forms of distribution, to say that a child of six years had reached three-fifths of the best development for his age on an objective scale might give no significant indication of how nearly he was keeping pace with those three-fifths of the best ability of another age. Neither would his position in units of the deviation of ability at his age give this information without knowledge of the form of the distribution of ability at his age. With varying forms of distribution at different stages of development this would afford an insurmountable difficulty.

Fig. 5. Hypothetical Development Curves (Changing Forms of Distribution)

In using percentiles it is to be remembered that equal differences between percentiles are not comparable in the same distribution except in the sense of the same extra proportions of the group to be met in competition. A change in the degree of ability from the lowest percentile to the second lowest percentile would be very different from the change in the degree represented by the 50 percentile to the next percentile above. Differences in the ability of individuals ranking near each other in the middle of the same percentile series would be distinguished with difficulty, while it would be easy to make such discriminations at the extremes. The special value of the percentile units in measurement of ability lies in the comparison of individuals of corresponding position in corresponding groups in which the ability may not be assumed to distribute alike. The concept that 995 out of every 1000 randomly selected individuals at his age are ahead of a particular individual in the struggle for existence has a very definite and significant meaning which is quite comparable from one period of life to another regardless of the form of the distribution.

B. The Curves of Mental Development.

When we endeavor to make our ideas of mental development more definite, we are assisted by thinking of the various stages in graphic form.
This is especially true when trying to think of the position of the deficient individuals, relative to the average individuals and to genius. In diagrammatically presenting these concepts in Fig. 3 and Fig. 5 we do not wish to assume that all the principles on which the developmental curves have been plotted have been decided. If they make clearer the points still under discussion and direct the discussion to specific features so that more data may be brought to bear upon the empirical determination of their characteristics, they will serve a useful purpose. For our present ends, we shall consider only certain features which have a bearing upon the interpretation of developmental scales and the quantitative definition of the borderline. In the graphic presentation of the curves of development in Figures 3 and 5 the relative position at various ages has been suggested hypothetically for those of the best ability and median, or middle ability, as well as the borderline of the deficients.

Otis has given a very able logical analysis of certain concepts underlying the testing of mental development (163). His discussion differs from the present in its aim to determine the proper mental age for particular tests, a question which I have not considered. It also supplements the present discussion by showing the changing value of the same intelligence quotient with normal distributions of ability under certain assumptions as to range of ability and decrease in the annual increments of ability with age.

(a) The Significance Of Average Curves Of Development.

Some investigators are apparently inclined to question the significance of any curve of mental development on account of the very different forms of development which they have found in particular cases.
A quotation from Goddard will state this problem:

“It seems to me that there is considerable evidence that there are a good many children that develop at a normal rate up to a certain age and then slow down;

“Morons are not usually discovered until twelve or fourteen years of age. The picture to me of the development of the feeble-minded is rather that these different types develop each in his own way very much as the physical side develops. Different families have different determiners of development. Just as it was determined before I was born that I should be five feet, ten inches tall, I developed that height and no further. In the same way, probably, that determiner carries with it the determination of the rate of development and the time. This carries with it the fact that I should have been an average boy from birth. As a matter of fact I was very much under-size until I was fifteen or sixteen years of age. Then I shot up. Other cases are over-size. It may be a false analogy, but it seems to me to illustrate the rate at which these cases develop” (111).

This view raises clearly the question how far the curve of average development represents a common tendency of different individuals in development. Are the individual curves of development so varied in form that an average curve does nothing but obscure their significance? The study of individual curves of growth in height and weight by Baldwin indicates that the bigger children tend to develop earlier, the smaller later (73). The individual curves of mental development may be analogous. If so, the average curves may not adequately represent the common tendencies of development. Nevertheless, it is to be remembered that with height and weight the average

An analogous problem arises when we consider the question of variations in the maturity of different mental processes.
Besides the question whether the average curve is useful in view of the variation among individuals in their rates of maturity for the same process, the psychologists have a still more difficult problem about curves of general ability. These curves are built by combining the results of numerous psycho-physical tests which are very different in type. We need to raise the question whether the type of process measured by memory for digits, for example, matures at the same rate as those processes measured by other memory tests: in general, how much a single test or combination of tests represents a common process. Furthermore, we need to inquire whether processes measured by memory tests mature like those measured by tests emphasizing reasoning, imagination, motor ability and other groups of activities. We thus have the problems of the different rates of maturity of the different tested processes in the same individual and of common tendencies among these specific processes. In order more clearly to present this problem of the significance of developmental curves for different processes, I have brought together the age norms from 8 to 14 years for 40 tests as given by different investigators. No norms were included which were not based on tests of at least 25 individuals. After 14 years the data which have been collected are open to the objection that the norms for the older ages would be seriously affected by the fact that they were obtained upon children remaining in school, usually in the elementary school, i. e., upon groups, among which a large portion of those of better or of poorer ability had been eliminated. The relative position of the norms for The variation in age norms with different tests is shown graphically in Figures 6, 7 and 8. 
In order that the various tests may be plotted on the same scale, so as to compare changes in development for the different tested processes, I have used the average increase in ability from 8 to 9 years of age for each test as a common measure and arbitrarily plotted the slant of the curve between these ages at 45 degrees. The increase from 8 to 9 is represented by 10 units on the objective scale to the left of the graphs. On this basis it is possible roughly to compare changes in the absolute annual increase at different ages for the same test and for different tests. It assumes that the units in which each test is scored are equivalent for that test. An average difference between the basal ages or between any two ages cannot be assumed to be accompanied by the same distribution of increases. Moreover, the 8-year norm is at different distances from zero for the different tests so that the relative increase from 8 to 9 cannot be regarded alike for the different tests. The method, however, is sufficiently accurate for illustrating the very different forms of the developmental curves which might be expected if they were measured by absolute increases from year to year. Even the variation in the slant of the lines at the different ages gives a graphic picture The tests on which Figures 6, 7, and 8 were based included practically all which were reported in the researches used. They were as follows: Norsworthy (159), perception of 100-gram weight, cancelling A's (boys), ideas remembered from four simple sentences, memory of related and of unrelated words, part-wholes, genus-species, opposites and reverse of opposites given the next day, “a-t” test. J. Allen Gilbert (108), taps in 5 seconds, fatigue in tapping, visual reaction time, color-discrimination reaction time, reproduction of 2-second interval. Smedley (51, No. 
3), strength of right-hand grip (boys), taps in 30 seconds (boys), ergograph; visual, auditory, audio-visual, and audio-visual-articulatory memory for digits. W. H. Pyle, Standards of Mental Efficiency (J. of Educ. Psychol., 1913, IV., 61-70), uncontrolled association, opposites, part-wholes, genus-species, digit-symbol and symbol-digit substitution, memory for concrete and for abstract words, memory of Marble Statue selection, (only boys' norms used for each). Pyle and Anderson combined by Whipple (220) two word-building tests (boys). Anderson as given by Whipple memory for letter squares. D. F. Carpenter, Mental Age Tests (J. of Educ. Psychol., 1913, IV., 538-544), substitution of colors in forms and of numbers in forms, perception time in marking A's, concentration, i. e., difference in time of last test under distraction, memory of pictures of objects, all tests devised by Carrie R. Squire. Stenquist (54), construction test. Sylvester (191), form-board test. Fig. 6. Tests of the Development of Memory Processes. Medians at Each Age of the Central Tendencies of the Tests. Fig. 7. Different Types of Development. Medians at Each Age of the Central Tendencies of the Tests. Fig. 8. Forty Curves of Development. Distribution at Each Age of the Central Tendencies of the Tests. In Fig. 7 curve H includes Gilbert's visual reaction time, Norsworthy's A and a-t tests, Carpenter's two A tests; curve I includes Gilbert's and Smedley's tapping tests; curve J is the median of the central tendencies of all 40 tests; curve K includes Norsworthy's two opposites and her part-whole and genus-species tests, the Pyle opposites, genus-species and part-whole tests; curve L is the same as D, curve M includes Smedley's strength of grip and ergograph tests and Gilbert's fatigue of tapping; curve N includes Pyle and Anderson's word building tests and Pyle's uncontrolled word association test. In Fig. 
8 curve P is Gilbert's visual reaction time test, curve S is Norsworthy's test for memory of unrelated words, the other curves are the median and quartiles for the central tendencies of all 40 tests after each was expressed at each age in terms of the gain from 8 to 9 years taken as a unit. Several points are to be noted about the nature of the curves for different tests. In Fig. 6 showing the curves for different forms of memory tests, that for the memory of digits is very different in character from that for memory of related material. The most extreme differences in the time of maturity are shown by the test for memory for digits presented orally and the substitution of color in forms, the former continues to increase so rapidly relative to the absolute increase from 8 to 9 years that it cannot be represented in the graph reaching 539 units of the scale by 14 years of age, while improvement in ability in the latter is not measured after 9 years. We cannot take time to discuss how much of the differences between the various curves may be due to the nature of the tests themselves, the form of scoring the results, or the condition under which they were given, selection of subjects, etc. The conclusion is safe, however, that when groups of three or four tests of similar type show such marked differences From Fig. 7 we may learn that tests emphasizing functions such as speed of motor or perceptual motor reaction, curves H and I, are notably different in their form from curves for tests of imaginative processes, curve N. As we group tests together covering larger ranges of activity we approach the median curve for general ability. Note the median curve for 17 memory tests (curve L) compared with the median for the 40 tests (curve J). By empirical studies we might pick out types of tests which would most closely represent the maturity of average ability. 
For example, the median for the substitution tests, curve E, resembles the median for the memory tests, curve D, more closely than does that of the 4 digit tests, curve B. Curve K, for 7 association tests, resembles the median for the 40 tests, curve J, much more closely than the curve for the perceptual-motor speed tests, curve H. This difference can not be explained by the use of 7 instead of 5 tests in calculating the central tendency of the group. It probably means that the sort of psycho-physical processes usually tested more closely represent on the average the abilities shown in association tests than they do the abilities shown by speed of motor reaction. The significance of this sort of analysis for those constructing a scale for measuring intellectual ability is obvious. Fig. 8 shows the median and quartile range for the central tendencies of the 40 tests and gives examples of two extremely different tests, visual reaction time and memory for unrelated words. How closely these particular tests represent fundamental differences in the maturity of different processes, we cannot, of course, be sure without prolonged research; but nobody would question that Another feature of all developmental curves which is apparent as soon as the causes of development are considered, is that growth in an individual is the result of several factors. These include the native capacity, the rate at which that capacity manifests itself instinctively, and the external stimuli which encourage or retard that manifestation. To some extent these factors vary independently. Our curves of development will never completely express all the facts until they analyse out all these factors for each of the processes. In the meantime we shall be able to think of general trends of development by considering average curves. The fact that they represent combinations of unanalyzed factors must, however, make us very cautious in interpreting our norms. 
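The rescaling used for Figures 6 to 8, in which each test's gain from 8 to 9 years is counted as 10 units, can be sketched in a few lines. This is only an illustration of the arithmetic: the norms below are invented and are not taken from any of the investigators cited, and the function name `rescale_norms` is hypothetical.

```python
def rescale_norms(norms, base_age=8, next_age=9, units_per_gain=10.0):
    """Express one test's age norms in units of its own gain from 8 to 9 years.

    norms: dict mapping age in years to the raw norm for that test.
    The base-age norm becomes 0 and the next-age norm becomes units_per_gain.
    """
    gain = norms[next_age] - norms[base_age]
    if gain == 0:
        raise ValueError("no gain between the basal ages; the test cannot be rescaled")
    return {age: units_per_gain * (score - norms[base_age]) / gain
            for age, score in sorted(norms.items())}


# Hypothetical raw norms for two tests scored in different physical units.
digits_recalled = {8: 4.0, 9: 4.5, 10: 5.5, 11: 6.0}
seconds_saved = {8: 10.0, 9: 14.0, 10: 16.0, 11: 17.0}

for name, norms in [("digits", digits_recalled), ("seconds", seconds_saved)]:
    print(name, rescale_norms(norms))
```

On these invented figures the first test keeps gaining after 9 years (30 and 40 units at 10 and 11), while the second flattens (15 and 17.5 units), which is the kind of contrast the plotted curves are meant to bring out. The sketch shares the limitation noted in the text: it treats the raw units of each test as equivalent throughout that test.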
(b) Changes In The Rate Of Development.

There has been considerable discussion of the form of the curves of mental development. The logical aspects of the curves on the assumption of normal distribution of ability at each age and uniform age of maturity have been treated by Otis (163) and the bearing of these assumptions upon the Binet scale pointed out. Thorndike has plotted the developmental curves for a dozen tests on the basis of the variability at 12 years of age used as a unit, and gives a chapter in his Educational Psychology to the changes with maturity (198, Chap. XI). Bobertag suggests that the rates of development of normal and deficient children are analogous to the upward progress of two projectiles fired from such different heights that the force of gravity

While recognizing that the complete curve of mental development is logarithmic in form, Pearson contends that, when measured by Jaederholm's adaptation of the Binet scale, development is adequately represented by a straight line from 6 to 15 years of age (164). As this conclusion is based upon the use, as equivalent units, of years of excess and deficiency at all these ages, the data lack the cogency of a scale of equal physical units.

With the Point Scale it is not known whether the units in different parts of the scale are equivalent. Without assuming that they are equal it is impossible to discover the form of curves of development from the records of children at a series of ages. Yerkes and Wood publish a curve of the increase of intellectual ability based upon point-scale measurements, which resembles in form the hypothetical curves.
They say:

Waiving the question whether annual increases or the range of measurements relative to the age norms would be satisfactory indications of the change in the rate of growth, it seems to be fairly clear that neither of these criteria would be adequate unless we first knew that the units in which they were measured were equivalent at different portions of the scale. To show that the point-scale units are even theoretically equivalent it would seem to be necessary to assume, on the basis of normal distribution of ability, that each unit of the deviation for each age distribution equaled either the same number of scale units or the same proportion of the total distance from lowest to highest ability at each age measured in the point-scale units. The originators of the scale do not seem to have planned it with this in view. Moreover, the difficulty of empirically demonstrating such equivalence of units on a point scale or any form of the Binet scale prevents its use for indicating curves of mental development, however serviceable it may be for other purposes.

The simplest demonstration of the form of the development curves is obtained by applying the same test, scored in equal physical units, to children of different ages. In Figs. 6,

The best developmental curves empirically determined are probably those for the form board presented by Sylvester (191), Wallin (212) and Young (227), since in each of these cases the same test was presented at all ages and the scores were in equal physical units of seconds. It can hardly be supposed, however, that the form board curves alone would be typical of average mental development. To know something about the general curve of mental development we need a combination of a number of mental tests scored on scales of equal units. These may be either equal physical units or units on scales for mental development similar to those of Thorndike and others for measuring educational products, handwriting, arithmetic, spelling, etc.
That either a straight line or a simple curve would represent the development of ability from birth to maturity is very doubtful. When we consider the entire developmental curve from birth nobody doubts that there is a change in the rate of development at the time of the arrest of instinctive changes at adolescence. There are probably fluctuations in the rate before this final arrest. Pintner and Paterson also assume a complex curve of development (44). Whether the fluctuations should be allowed for in the description of the borderline of deficiency is the important question in our study. With measurements of bodily growth we noted that changes in the rate of maturity are accompanied by a skewness of distribution of

(c) The Question Of Earlier Arrest Of Deficient Children.

It has been assumed by Bobertag (81), Stern (88), Goddard (117) and others that deficient children reach their maturity earlier than normal children. If this were true the curves of mental development for the average and for the deficient children should not be expected to retain their same relative positions after the idiots had begun to show arrested development. Moreover, unless this arrest were compensated by some peculiar form of accelerated growth among those above normal ability, we might expect that the distributions of ability would change in form at the various ages after arrest had begun. A relative increase in the distance of older deficients from the average as compared with younger deficients may be interpreted as meaning either the earlier cessation of growth of the deficients or a change in the relative rates of growth of individuals of different mental capacity. When fully considered the present evidence from the Binet tests fails, I believe, to demonstrate the earlier arrest of

Goddard has reported tests upon the same group of 346 inmates in an institution for the feeble-minded who were tested three years in succession (117).
The paper suggests that the idiots, as a group increased less in absolute ability than those of higher mental age. The average gain for 55 idiots who tested I or II mentally was about half a test in the two years. In order to reach our present problem, however, we must know that the idiots, for example, developed relatively less mentally than did those of the higher grades of ability in the imbecile and moron groups of the same life-ages. This question cannot be answered from the paper. It probably cannot be adequately answered from mental age results on account of the irregularity in the value of the year units at different points on the Binet scales. Bobertag summarizes Chotzen's data obtained by the examination of the children in the Breslau Hilfsschulen with the Binet scale. He believes that the position on an objective scale attained by the average of these retarded children is progressively lower with advancing age relative to the average position attained by normal children, assuming that the quotient for normal children remained constant at each age. The average intelligence quotients of all the children in the special schools (exclusive of those testing III or less) was 0.79 for those 8 years of age, 0.72 for those 9 years, 0.70 at 10, and 0.67 at 11-12 (81, p. 534). Stern also compiled a table from Chotzen's results which shows this decrease in intelligence quotients with life-age separately for each group of those whom Chotzen TABLE XX. Average Intelligence Quotients of Children of Different Ability. (From Chotzen's Tables X & XI.)
The Jaederholm data with his form of the Binet scale, as treated by Pearson, shows a straight regression line for the backward children which falls below the normal development line on the average four months of mental age for each additional year of life from 7-14 (167). Accepting Pearson's interpretation that a year of excess or deficiency and a year of growth is a constant unit, we find that the deficient group from special classes was falling continually behind the normals with increase of age a relatively greater distance from any rational reference point. Pearson accounts for this change in the distance between the The best evidence as to the relative positions of the curves for deficients and those for average ability would be provided by using psychological tests that could be adequately scored in terms of equal physical units for the same task. The position of various lower percentiles relative to the average or to an assumed reference point could then be compared on the same objective scale. I have reviewed studies of this type in discussing skewed distributions in Chap. XIII, A, c. I there reached the conclusion that the weight of the evidence was that the distributions were slightly skewed in the direction of deficiency, although the evidence was not conclusive. We are now raising the further question whether this skewness increases with age. On account of the difficulty of determining the points for zero ability in terms of the physical scales used, let us see what conclusion might be reached if we calculated the relative distance of median and low ability of equivalent degree from the scores of the same higher degree of ability assumed as a reference point at the various ages. There seems to be no reason in the theory of measurement why the highest score instead of the lowest score in random samples might not be used for a reference point for comparing the distances between normal and deficient children at different ages. 
Instead of using the highest single score, it would be better to use the upper quartile or quintile since it would be less affected by a chance error in giving the test. Fig. 9. Relative Positions at Each Age of the Median and of Corresponding Bright and Retarded Children with the Form Board Test. So far as physical growth is concerned Baldwin (74, 75) has shown with repeated annual measurements on the same group of children that the period of adolescent acceleration shifts from 12½ years for the tallest boy to 16 years for the shortest boy. For the tallest girl the maximum height was attained at 14½, for the shortest at 17 years, 3 months. Maturity may be reached at 11 years by a tall well nourished girl, while with a short girl light Doll presents evidence from the physical measurements of a large feeble-minded group in institutions which he suggests shows that the shorter among them cease growing earlier. When the height of these feeble-minded is measured in relation to the Smedley percentiles of the height of normal children of their corresponding ages, he finds a correlation of -.20 between age and percentiles of height, the taller relative to normals being younger. He says: “This confirms Goddard's similar conclusion, but negatives for the feeble-minded at least, the theory affirmed by some writers, that children who grow at a retarded rate continue their growth to a later age” (98 p. 51). On the contrary this minus correlation is more likely to mean only that the Smedley norms on school children are too high for the older ages because of the excess of taller children who remain for the high school work. This would give the minus correlation without supposing that the taller individuals continue their growth to a later age, as he thinks. Moreover, a total longer period of physical growth for smaller, less normal, children has been demonstrated. 
Boas (80) says: “Among the poor the period of diminishing growth which precedes adolescence is lengthened and the acceleration of adolescence sets in later; therefore, the A complicated situation is presented when we come to represent graphically the effect on the distributions of these differences in growth among those of different intellectual capacity. In the hypothetical diagrams, Fig. 5, it is shown how arrest of development might be presented graphically in relation to the distribution curves, ability being measured on the same physical scale. The earlier acceleration and earlier maturity of those of better ability are indicated. The distributions are shown as skewed at all ages after birth. Equivalent units of mental development at different ages can be found only in corresponding percentages of the groups, not in the units of the deviation or in development quotients relative to the averages at different ages. In other words the lowest 0.5% continues to be an equivalent unit while -3 S. D. measures different portions of the group and different portions of the distance from lowest to highest ability. Corresponding percentages retain one common significance, namely, that the same proportion of the group is ahead in the struggle for survival, regardless of the form of the distribution. It is hoped that the discussion of the statistical problems connected with the quantitative study of mental development has given more meaning to the different attempts to devise scales for measuring mental ability. It should be Until we have a scale of equal objective units for mental ability, it is not possible to obtain a measure of relative development which shall take into account the amount of relative change. We must be content to measure the change in percentile rank (changes in serial position) of an individual relative to those of his own age. 
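The contrast between a fixed multiple of the deviation and a fixed percentage can be made concrete with a small calculation. The sketch below compares a normal distribution with a deliberately exaggerated distribution skewed toward deficiency. The skewed case is purely an assumption for illustration (the mirror image of a unit exponential); it is not fitted to any of the data discussed here.

```python
import math
from statistics import NormalDist

# Case 1: ability normally distributed (mean 0, S.D. 1).
normal = NormalDist(mu=0.0, sigma=1.0)
normal_below_3sd = normal.cdf(-3.0)            # share of the group below -3 S.D.
normal_z_at_half_pct = normal.inv_cdf(0.005)   # S.D. position of the lowest 0.5%

# Case 2 (hypothetical): ability skewed toward deficiency, modeled as
# score = -X with X following a unit exponential distribution.
# This has mean -1 and S.D. 1, with a long tail toward low scores.
skew_mean, skew_sd = -1.0, 1.0
cut = skew_mean - 3.0 * skew_sd                # the -3 S.D. point, i.e. -4
skew_below_3sd = math.exp(cut)                 # P(-X < cut) = P(X > -cut) = exp(cut)
skew_score_at_half_pct = math.log(0.005)       # solves exp(score) = 0.005
skew_z_at_half_pct = (skew_score_at_half_pct - skew_mean) / skew_sd

print(f"below -3 S.D.: normal {normal_below_3sd:.4%}, skewed {skew_below_3sd:.4%}")
print(f"lowest 0.5% lies at: normal {normal_z_at_half_pct:.2f} S.D., "
      f"skewed {skew_z_at_half_pct:.2f} S.D.")
```

Under these assumptions the same limit of -3 S.D. cuts off a little over one in a thousand of the normal group but nearly two in a hundred of the skewed group, while the lowest 0.5% is by construction the same share of both groups and simply falls at a different multiple of the deviation in each.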
Having clarified our conceptions of mental development and brought them into harmony with certain suppositions regarding the distribution of ability and its change from year to year, we are in a better position to evaluate in the following chapter the different objective methods of defining the borderline of feeble-mindedness.

CHAPTER XIV. QUANTITATIVE DEFINITIONS OF THE BORDERLINE

On the basis of the detailed conception of the developmental curves and distributions of ability at different ages, which we have been considering, we can now compare the percentage method with other quantitative methods of describing the borderline on developmental test scales.

A. Different Forms of Quantitative Definitions

The earliest form of the quantitative description of the borderline on a scale of tests was in terms of a fixed unit of years of retardation. This was taken over apparently from the rough method of selecting school children to be examined for segregation in special classes by choosing those who were two or three grades behind the common position for children of their ages. As this amount of school retardation was greater for older children, an additional year of retardation was required after the child had reached 9 years of age. I believe that nobody would seriously defend a practice of making an abrupt turning point of this kind, except on grounds of practical convenience. The theory of stating the borderline in terms of a fixed absolute unit of retardation is so crude that it has now been generally superseded by methods which make the amount of retardation a function of the age.

In order to relate the definition to the age of the child, at least during the period of growth, Stern suggested the “intelligence quotient,” consisting of the tested age divided by the life-age (188). This has been adopted by Kuhlmann with his revision of the Binet scale (139) and by Terman with the new Stanford scale (197).
With the

The suggestion of defining the borderline of tested deficiency in terms of a multiple of the standard deviation of ability of children who are efficient in school was made by Pearson in 1914. Tested inefficients did not with him include all inefficients, as he recognized other sources of deficiency. He had previously suggested a scale of mental ability in units called “mentaces,” 100 of which were equivalent to a unit of the standard deviation of all ability assumed to be normally distributed. On this scale of mentaces the imbeciles were 300 mentaces or more below average ability and would be expected to occur once among 1000 individuals chosen at random. Very dull, including some mentally defective individuals, were also to be found from 208 to 300 mentaces below the average (166, p. 109). Defining the borderline in terms of the deviation

The following quotation from Pearson will make the method of stating the borderline in terms of a multiple of the deviation clearer:

“Now the question is, what we mean by a 'special or differentiated race': I should define it to mean that we could not obtain it by any selection from the large mass of the normal material. Now in the case of the mentally defective, we could easily obtain children of their height, weight, and temperature among the normals. We could, out of 50,000 normal children, obtain children practically with the same powers of perception and memory as the feeble-minded, as judged by Norsworthy's data. But not out of 50,000, nor out of 100,000 normal children, could we obtain children with the same defect of intelligence as some 50% of the feeble-minded children. In other words, when the deviation of a so-called feeble-minded child from the average intelligence of a normal-minded child is six times the quartile or probable deviation of the group of normal children of the same age, it falls practically outside the risk of being an extreme variation of the normal population.
Now six times the quartile variation is almost exactly four times the standard deviation or the variability in intelligence of the normal child, and in the next material I am going to discuss [Jaederholm's], we have shown that the standard deviation in intelligence of the normal child is just about one year of mental growth” (164, p. 35). With the Jaederholm data obtained in testing children in the regular and in the special classes in Stockholm by a modified form of the Binet scale, Pearson found that a year of excess or defect in intelligence was practically a uniform unit from 7 to 12 years of age and was about equivalent to the standard deviation of normal children measured in these year units. He, therefore, uses a year unit and the standard deviation as interchangeable for these data. He does not, however, always make it clear The quotation from Pearson, which we have given above, indicates that he would determine the borderline on the scale by the standard deviation of 'normal' children. In his case he actually used children who were efficient in school, as contrasted with those in special classes. On the other hand, he argues at length that all mental ability, including that of the social inefficients, is distributed in the form of the normal curve (167). Under this assumption it is, therefore, little theoretical change in his position to suppose that the borderline might be described in terms of the standard deviation of a random sample of the population. Defining the borderline in terms of a multiple of the deviation of a random sample at each age thus becomes directly comparable with the other forms of the quantitative definition, supposing that all refer to conditions to be found in a completely random sample. It is in this sense that I shall refer to the method of defining the borderline in terms of a multiple of the deviation. 
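The conversions between quartile deviations, standard deviations, and mentaces in the passages above can be checked directly if ability is taken to be exactly normally distributed, which is Pearson's own assumption. The following is a minimal sketch using Python's standard library; the variable names are ours and purely illustrative.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

# Quartile (probable) deviation of a normal distribution, in S.D. units.
quartile_dev = unit_normal.inv_cdf(0.75)             # about 0.6745

# Pearson's "six times the quartile" expressed in S.D. units.
six_quartiles_in_sd = 6 * quartile_dev               # about 4.05

# Expected share of a normal population lying below that limit.
share_below_six_quartiles = unit_normal.cdf(-six_quartiles_in_sd)

# Mentaces: 100 mentaces = 1 S.D., so 300 mentaces below average = -3 S.D.
share_below_300_mentaces = unit_normal.cdf(-300 / 100)
share_208_to_300 = unit_normal.cdf(-208 / 100) - share_below_300_mentaces

print(f"quartile deviation = {quartile_dev:.4f} S.D.")
print(f"six quartile deviations = {six_quartiles_in_sd:.3f} S.D.")
print(f"share below six quartile deviations = {share_below_six_quartiles:.2e}")
print(f"share 300 or more mentaces below average = {share_below_300_mentaces:.5f}")
print(f"share from 208 to 300 mentaces below average = {share_208_to_300:.4f}")
```

On the normal assumption, six quartile deviations come to about 4.05 standard deviations, which agrees with Pearson's "almost exactly four times," and fewer than 3 in 100,000 of a normal population would fall below that limit. A limit of 300 mentaces corresponds to about 1.35 per 1000, close to the "once among 1000" quoted above.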
The percentage method of defining the borderline seems to have been the spontaneous natural working out of the problem in the minds of several investigators. At the same time that I suggested this method in a paper before the American Psychological Association (151) Pintner and Paterson had prepared a paper suggesting a percentage definition of feeble-mindedness (44) and Terman had worked

As a point of detail it is to be remembered that in translating percentages into terms of the deviation, the size of the group for which the percentages are determined is important if the groups are small, since the same percentage lies above slightly different multiples of the standard deviation with different sized groups. On this point the reader may see a paper by Cajori and the references cited there (86).

B. Common Characteristics of Quantitative Definitions

In distinction from qualitative methods of describing the mentally deficient, all quantitative definitions assume that those of deficient mentality do not represent a different species of mind, but that they are only the extreme representatives of a condition of mental ability which grades up gradually to medium ability. The deficient are not an anomalous group such as we find with some mental diseases. Except for the comparatively rare cases of traumatic or febrile origin, the deficient individual is a healthy individual so far as his nervous system is concerned, even though his capacity for brain activity is below that of those who socially survive. They are not as a group abnormal in the sense of diseased, but only unusual in the sense of being extreme variations from medium ability in a distribution which is uninterrupted in continuity. This distinction has been fully discussed by Goring in his work on The English Convict, which those who are interested in a full mathematical discussion of the significance of mental deficiency are urged to read.
None of those who advocate quantitative definitions would contend, I believe, as some of their opponents seem to think, that such definitions afford a final diagnosis for particular cases. In attempting to place the borderlines on a scale of tests, this is always done with the clear recognition that such borders are only symptomatic of deficiency. The diagnosis of “social inefficiency,” to use Pearson's term, rests upon many facts among which the test result is only one, albeit the most important. Other characteristics which each of the above quantitative definitions, except that of a constant absolute amount of deficiency, have in common, or might easily have if they were stated in their best forms, include the possibility of adaptation to any developmental scale, the suggestion of borderlines for both the mature and immature, the distinction of a group which might be regarded as presumably deficient from one that was of better but doubtful ability and of this from a still better group which was presumably socially efficient. Perhaps the most curious and important thing about these definitions is that they are all substantially identical, except in their terminology, so long as general mental capacity is found to distribute in the form of the normal probability curve and to extend to absolute zero ability
If the distributions do not extend to the same zero points of lowest ability on an objective scale (see Fig. 5), the ratio is clearly at a disadvantage compared with either of the other methods, since it assumes that the same percentage of average ability is an equivalent measure. This does not hold when the lowest ability at different ages is not at the same point on the scale of objective units. For example, .7 of an average 100 units above 0 is not equivalent to .7 of an average 150 points above a zero ability of 30 points on the objective scale. The idea of regarding percentages of averages as equivalent is therefore generally avoided in mental measurement.
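The arithmetic behind the ratio example can be sketched in a few lines. This is an illustration only. It assumes that "an average 150 points above a zero ability of 30 points" means an average score of 150 on a scale whose lowest ability sits at 30; the function name and its parameters are hypothetical and not taken from the text.

```python
# Hypothetical illustration of the ratio example in the text.
# Assumption: the second age group has an average score of 150 on a scale
# whose lowest ability (zero point) sits at 30 objective units.

def share_of_true_range(ratio, average, zero_point):
    """Place a score of ratio * average on the stretch from zero point to average.

    Returns the distance of that score above the zero point, expressed as a
    fraction of the distance from the zero point to the average.
    """
    score = ratio * average
    return (score - zero_point) / (average - zero_point)


# Same nominal ratio of .7 at both ages, different zero points.
age_a = share_of_true_range(0.7, average=100, zero_point=0)
age_b = share_of_true_range(0.7, average=150, zero_point=30)

print(f"zero point at 0:  {age_a:.3f} of the way from zero to the average")
print(f"zero point at 30: {age_b:.3f} of the way from zero to the average")
```

Under the stated assumption the same nominal ratio of .7 marks a relatively lower position in the second group, which is the disadvantage of the ratio that the paragraph describes.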
In case the position of the absolute zero points of ability may be different, the distance from the average should be stated in terms of the deviation. In this respect the method of the deviation and that of the lowest percentage are equally good so long as the form of distribution does not change.
1. With the percentages fixed at the lowest 0.5% as presumably deficient and the next 1.0% doubtful, these borderlines for tested deficiency have the advantage of being more conservative than those at present advocated. On the basis of our empirical knowledge this is an important reason for urging borderlines on the scales at least as low as those suggested herein. Disregarding the extremely high borderlines which have fallen into disuse, we still find that social deficiency is often presumed for those testing above the lowest 1%. With the new Stanford scale, Terman presumes “definite feeble-mindedness” below an Intelligence Quotient of .70, below which he finds that 1% of 1000 unselected children fell. I Q's from .70 to .80 would include his uncertain group, which he describes as “border-line deficiency, sometimes classified as dullness, often as feeble-mindedness” (57, p. 79). His tables show 5% below an I Q of .78. We have no results with a random group of adults by which to judge how many would be below these borders. When the I Q has been applied to scores with other scales a larger percentage has often been found to be excluded. Fernald has shown that Haines' suggestion of a coefficient of .75 with the Point scale would exclude 16% of 100 Cincinnati girls selected at random from among those who left school at 14 years to go to work (16).
Unless the examiner wishes to assume that social inefficiency is more frequent than it has been demonstrated by the practical tests of life, the success of those who have low quotients should make him exceedingly cautious about accepting the various borderlines which have been suggested by those who have not tested their criteria by the percentage method. It is not merely that the borderlines should be lowered, but that they should be lowered
With the Point scale Yerkes and Wood say regarding “the coefficient of intelligence .70, which we accept as the upper limit of intellectual inadequacy or inferiority”: “Our data indicate that grades of intellectual ability measured by the coefficient .70 or less are socially burdensome, ineffective, and usually a menace to racial welfare” (226). With the most reliable part of their data, that for children from 8-13, this coefficient excludes the lowest 8.39%. Moreover, the lowest group for which they suggest a borderline, the dependents, falls at .50 or below and includes 1.05%.
2. A second practical advantage of the percentage borderlines on the scale is that they make no assumption as to the uniformity of the norms for the different ages. Except for the Stanford and the Jaederholm scales, there is little evidence that the age norms exclude equivalent portions of the children at the different life ages. Goddard's Table I gives the data from which the following percentages of those who pass the norm are calculated, not counting those above 11 years, since the older groups are clearly affected by selection: 5 yrs., 88%; 6 yrs., 79%; 7 yrs., 81%; 8 yrs., 51%; 9 yrs., 60%; 10 yrs., 73%; 11 yrs., 44%. Kuhlmann's figures when using his own revised scale with public school children including the seventh grade, are: 6 yrs., 100%; 7 yrs., 95%; 8 yrs., 90%; 9 yrs., 87%; 10 yrs., 81%; 11 yrs., 80%; 12 yrs., 57%.
It is clear that any change in the test norm from age to age must disturb the quotient which is based on these norms, although it would not affect the intelligence coefficient with the Point scale. With widely varying norms of the other scales, the I Q borderlines show much greater variation. In a recent review of the evidence, including Descoudres' report (96) on retesting the same children for several years, Stern recognizes that an I Q index is not constant after 12 years (187). Doll records decided changes in quotients for the same individual at different ages (99). So far as the 1908 scale is concerned, using Goddard's data, our Table V shows that at five years of age the lowest 1.8% would fall at or below a quotient of .40, at eight years the lowest 1.9% would show a quotient of .62 or less, and at 15 years the lowest 2.8% fall below a quotient of .75. The rough tentative approximation of scale limits which I have suggested for the lowest 1.5% shows that a series of quotients for children from 5 to 15 years of age would be below .75 at every age and below .65 for half of these ages. For the presumably deficient group the quotients would be still lower in order to be as conservative as the borderlines that I have suggested with the Binet scale as at present standardized. With the coefficient of intelligence and the Point scale, the Yerkes and Wood data show that their borderline of .70 excluded 13% of 196 children 8 and 9 years of age,
The data at present available thus indicate that we should not expect to find the same ratio at different ages excluding similar percentages. If the ratios have a value for comparing individuals of different ages, they seem to fluctuate so decidedly from age to age that they can hardly be trusted for stating the borderlines of deficiency without empirical confirmation for each age.
Pearson found that the children of the older ages in the special classes were more and more deficient, measured in terms of the standard deviation of the normal group. This shift on the average was four months of mental age downward for each year of life during the period 7-14 which he studied. It makes uncertain the definition of the borderline in terms of a constant multiple of the deviation or of a constant quotient, unless this shift is shown to be due to imperfections of the tests which can be corrected, or to changes in the selection of the tested groups at advanced ages. Pearson's suggestion of -4 S. D. as a borderline with the Jaederholm data gives some very curious results with the group of children in the special schools at Stockholm. Under his interpretation at life-ages 8-11 from 0 to 5.2% of the pupils in these classes would be regarded as deficient, while for life-ages 12-14, 15.2% to 44.4% are beyond -4 S. D. In passing it is to be noted that if one accepted Pearson's suggestion that the borderline should be fixed at -4 S. D., in case the distribution of mental capacity were strictly normal, only four children in 100,000 would be found deficient, according to the probability tables. With the method of the standard deviation it would be necessary either to show that the deviation was constant
4. All the quotient methods of defining the borderline encounter a serious practical difficulty in fixing the borderline for the mature, so that it will be equivalent to that for the immature. With the Stanford scale in calculating the quotient for adults, no divisor is used over 16 years. Yerkes and Bridges also think that this is about the time that the development of capacity ceases. Kuhlmann and others use 15 as the highest divisor. Wallin objects to either of these ages being used as the age of arrest of mental development (15, p. 67).
Both the methods of the standard deviation and percentage have a similar difficulty, in that the borderline for the mature has to be empirically determined on a test scale. In this dilemma, however, the data collected with the random group of 15-year-olds in Minneapolis and published in the present study, places the borderline for the mature on either the 1908 or 1911 Binet scale in a much safer position, so far as empirical data is concerned, than the borderline for the mature for any other scale. This is true whether that borderline be then stated in terms of either the quotient or percentage methods. Translated into terms of the quotient, our percentage
Unfortunately, the borderlines of the mature for the Stanford and other scales depend upon empirical results obtained not with random groups, but upon a composite of selected groups of adults built up by the investigator on an estimate that this combined group represents a random selection among those with a typical advance in development, an almost superhuman task. Fortunately the empirical determination of this borderline for the mature might be improved later by obtaining data on less selected groups. The clearer significance of the empirical data for the borderline for the mature which I have presented for the Binet 1908 and 1911 scales from a random group of 15-year-olds seems to be an important practical advantage. It provides an empirical basis for judging the implication of test results with adults. It gives adults the benefit of the doubt if they improve after 15 years of age.
5. Compared as to their popular significance, there is no doubt that the lowest 0.5% of the individuals of a particular age has very much more significance to those not familiar with detailed statistical practise than a coefficient or a multiple of the standard deviation. A statement that an adult has only the tested ability of a child of 7 years is certainly much more impressive than his score in other
D. Theoretical Advantage of the Percentage Method with Changes in the Form of the Distributions
With our present series of tests, the percentage method will best provide a concept of the equivalence of the borderlines at different ages provided the form of the distribution does not remain uniform. I discussed this question briefly in connection with units of measurement. In considering curves of development, I assembled some of the evidence which makes the assumption of normal distribution or even of a constant skewness at least uncertain. In my opinion the weight of the evidence is against the hypothesis that the distributions retain a constant form during the period of development. If this were clearly demonstrated, both the ratio methods and deviation would fail to express equivalent borderlines for the different ages with the Binet scales. A fixed multiple of the standard deviation or a fixed quotient would exclude different percentages of the population at each age when the skewness varied. By reference to Figures 3 and 5, it can be seen that, if our physical units in which we expressed the measurement were uniform and ability always extended to the same absolute zero point, it is true that .01 of the physical units reached by the best at each age would be the same relative amount of ability of the best at each age, stated in physical units, regardless of the form of the distributions. Such a concept, however, has an unknown biological or social significance so far as I can see, except for a constant form of distribution.
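The statement that a fixed multiple of the standard deviation excludes different percentages when the skewness varies can be illustrated with a small calculation. The sketch below is not drawn from any test data. It assumes three purely hypothetical score distributions (a normal curve, a lognormal curve with its long tail toward high scores, and the mirror image of that curve with its long tail toward low scores) and computes the share of each lying below the mean minus two standard deviations. The shape labels, parameter values, and function names are all illustrative assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, S.D. 1


def lognormal_mean_sd(mu, sigma):
    """Mean and S.D. of a lognormal variable with log-scale parameters mu, sigma."""
    mean = exp(mu + sigma ** 2 / 2)
    sd = sqrt((exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2))
    return mean, sd


def lognormal_cdf(x, mu, sigma):
    """Share of a lognormal population lying below x."""
    if x <= 0:
        return 0.0
    return STD_NORMAL.cdf((log(x) - mu) / sigma)


def share_below_cut(k, shape, mu=0.0, sigma=0.5):
    """Share of a hypothetical population below (mean - k * S.D.)."""
    if shape == "normal":
        return STD_NORMAL.cdf(-k)
    mean, sd = lognormal_mean_sd(mu, sigma)
    if shape == "tail_toward_high_scores":
        # Scores X are lognormal: the long tail is at the upper end.
        return lognormal_cdf(mean - k * sd, mu, sigma)
    if shape == "tail_toward_low_scores":
        # Scores are -X, the mirror image: the long tail is at the lower end.
        # P(-X < -mean - k*sd) equals P(X > mean + k*sd).
        return 1.0 - lognormal_cdf(mean + k * sd, mu, sigma)
    raise ValueError(f"unknown shape: {shape}")


shapes = ("normal", "tail_toward_high_scores", "tail_toward_low_scores")
shares = {shape: share_below_cut(2.0, shape) for shape in shapes}
for shape, share in shares.items():
    print(f"{shape:>24}: {100 * share:.2f}% below mean - 2 S.D.")
```

With these assumed shapes the same cut at two standard deviations below the mean excludes a different share of each population, whereas a cut defined as the lowest fixed percentage excludes that percentage by construction.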
The same relative physical score compared with the highest at each age, theoretically
The recent rapid perfection of objective scales to measure educational products, like ability in handwriting, etc., in equal units running to an absolute zero of ability, suggests that it might be possible ultimately to state the borderline of deficiency in terms of the same relative objective distance between the best and zero ability at each age on a scale of general ability. This ideal could be approached, for example, with the Sylvester form-board test in which the units are seconds required to complete the same task, if we could agree upon a maximum number of seconds without success which should mean no ability, and if this zero should remain the same at each age. It would only be necessary to take, for example, the best position or the median or the upper quartile at each age as the other point of reference. We could then say that a borderline in physical units was always, for example, .01 of the median record at each age above zero. Such a method would provide relatively equal objective borderlines at each age
To demonstrate its worth, however, this method of defining the borderline in terms of the same proportion of the physical difference between zero and the median at each age, would also have to provide a better prediction of ultimate social failure. It would have to be shown that individuals below the relative objective borderline at maturity were below the same relative objective borderline during immaturity. Moreover, it would have to be shown that this relationship was closer than it would be with percentile records. It is a form of this relative objective measurement which Otis advocates in his “absolute intelligence quotient,” which he proposes as logically the best measure of ability. It consists of the ratio of the score of the individual measured in equal absolute units of intelligence, divided by his age (163).
While a relative objective borderline might under certain circumstances afford a better criterion than the same lowest percentage of individuals, there are two very serious practical difficulties which at present make it impossible. In the first place, with the exception of a few motor tests, there are no test results with children of different ages measured in terms of equal objective units for the same task. Even if the Binet year units are equal, as applied to the same task, there is no accurate means of dividing the year units into smaller physical units on the basis of scores with the tests. This makes the use of the Binet scale impossible and we should be forced back upon
The second practical difficulty which at present makes a relative objective borderline impossible is that we know nothing as to the prediction of social failure and success from relative positions on the objective scale used even with the few isolated tests that might be made available. Until we have data on this question, as well as scales of tests for native ability that are measurable to zero ability in objective terms, the percentage method affords the only available way of stating equivalent borderlines when the form of distribution changes. If the age of arrest of development shifts either earlier or later with different degrees of capacity, then there seems to be no logical escape from a change in the form of distribution. Stern recognized this when he concluded that idiots reach an arrest of development earlier than those better endowed, so he stated that his quotient would not hold for them. He said: “The feeble-minded child, it must be remembered, not only has a slower rate of development than the normal child, but also reaches a stage of arrest at an age when the normal child's intelligence is still pushing forward in its development. At this time, then, the cleft between the two will be markedly widened.
“From this consideration it follows that the mental quotient can hold good as an index of feeble-mindedness only during that period when the development of the feeble-minded individual is still in progress. It is for this reason
Perhaps the most interesting characteristic of the percentage method is that it automatically adjusts itself to any form of distribution. In case the distributions of ability turn out to be normal for each age and the arrests of development for different degrees of ability distribute alike, then the borderline fixed by the percentage method becomes identical with the corresponding borderlines by the quotient, deviation, or relative objective distance. It can be directly translated into a quotient or a multiple of the standard deviation. This fact affords a good check upon the empirical borderlines fixed by the percentage method for different ages. If the distribution is normal, the lowest 1.5% and 0.5% would be identical with -2.17 S. D. and -2.575 S. D. in samples of 10,000 cases. We may check these percentage borderlines by Goddard's results for ages 5-11 tested with the 1908 Binet scale. I have given the standard deviation for the ages 5-11 with this data in Chap. XIII a, 2. Applying the criterion of 2.575 S. D. to these deviations, we find that to be in the lowest 0.5%, if the distribution were normal, would be about a year less of deficiency than we have suggested, while Pearson's borderline of -4 S. D. would be close to that we suggest. The empirical data thus suggest that the assumption of a normal distribution is faulty at the borderline or else Goddard's data is incorrect for fixing the limits on the scales. I have already given the evidence for supposing that the distribution is skewed during the years of growth.
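The translation between a lowest percentage and a multiple of the standard deviation, on the assumption of a strictly normal distribution, can be sketched as follows. The sketch uses the idealized normal curve only and ignores the adjustment for the size of the sample; the function names are illustrative and not taken from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # idealized normal curve: mean 0, S.D. 1


def sd_multiple_for_lowest(percent):
    """Multiple of the S.D. below which the given lowest percentage falls."""
    return std_normal.inv_cdf(percent / 100.0)


def percent_below(sd_multiple):
    """Percentage of a normal population lying below the given S.D. multiple."""
    return 100.0 * std_normal.cdf(sd_multiple)


doubtful_limit = sd_multiple_for_lowest(1.5)   # upper edge of the lowest 1.5%
deficient_limit = sd_multiple_for_lowest(0.5)  # upper edge of the lowest 0.5%
per_100k_below_minus_4 = percent_below(-4.0) * 1000  # percent -> per 100,000

print(f"lowest 1.5% lies below {doubtful_limit:.3f} S.D.")
print(f"lowest 0.5% lies below {deficient_limit:.3f} S.D.")
print(f"below -4 S.D.: {per_100k_below_minus_4:.1f} per 100,000")
```

The two limits agree closely with the -2.17 S. D. and -2.575 S. D. given in the text. Under an exact normal curve the tail below -4 S. D. comes to roughly three cases in 100,000, a figure of the same order as the one quoted earlier from the probability tables.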
When approximately random samples are not available, a multiple of the deviation of an efficient group such as
Recalling the practical advantages of the percentage method which we enumerated in the preceding section, we can now better understand the value of a method that is not disturbed by the form of distribution of mental capacity which may ultimately be found to prevail at different ages. It is safer at present to assume that the distributions do change enough in form at the lower end seriously to affect the borderlines of deficiency as defined by other methods. If, however, the form of distribution remains uniform, it would first be necessary for those advocating the use of any of the other quantitative definitions to show that the units of their scales are equal under some reasonable hypothesis. A ratio or a deviation statable only in scale units which are not demonstrably equal is a hazard, with the chances badly weighted against its reliability. So far as both the Binet and the Point scales are concerned we have found that the units are not equal. A quotient or coefficient arrived at by assuming their equality is sure to mean seriously erroneous fluctuations in the borderlines. Referring to the percentage method, Yerkes and Wood say: “Frequency of occurrence is unquestionably a useful datum, which should be presented, if not instead of, then in addition to, certain other statistical indices which possess greater scientific value” (226). These other indices require both equal scale units and uniform distributions from age to age. The ratio and deviation methods
This leaves us in the unfortunate situation that the borderline positions on the scale will have to be stated separately for each age and will have to be found empirically. Moreover, we shall need to determine more accurately in what lowest percentage an individual must test in order reasonably to predict that he will require social care for the good of himself and society.
As soon as anybody can discover a means of defining the borderline, which is equally accurate and significant, and which, in addition to counting the proportion of better individuals to be met in the competition of life, will also evaluate the distance they are above the borderline, we all shall be eager to accept this better criterion of deficiency. A form which it might take is that of relative objective distance between zero and median ability. If measurable in equal objective units, this would be independent of the form of distribution and would improve the quantitative description of equivalent deficiency, provided that it also forecasted future social failure as well as the percentage method does. Whatever form of stating the borderline of tested deficiency may ultimately meet with approval, a verbal definition of feeble-mindedness will never be an ideal scientific statement until it finds expression in quantitative terms.