PART TWO. THEORETICAL CONSIDERATIONS

CHAPTER XIII. THE THEORY OF THE MEASUREMENT OF MENTAL DEVELOPMENT

In defining the borderline of feeble-mindedness it will be found that certain assumptions are usually tacitly made as to the form of the curves of normal and retarded development. These assumptions, which are often based on vague conceptions of mental measurement, should be brought clearly to mind if we are to compare the relative merits of different scales of mental tests or different ways of stating the borderlines of deficiency. With this in view it is proposed to take up in this second part of the monograph a brief technical discussion of the units of mental measurement, the equivalent individual differences at different ages, and the curves of mental development. The bearing of these conceptions on the various quantitative definitions of tested deficiency, including the percentage definition, will then be discussed in the following chapter. Practical advice as to individual diagnosis or group comparisons has been confined to Part One, so that those who are not concerned with the theoretical assumptions on which the conception of mental development and the interpretations of tested deficiency are based should omit Part Two.

Fig. 3. Hypothetical Development Curves (Normal Distribution)

When we try to picture to ourselves the significance of individual differences and mental development we are at once forced to think in terms of graphs showing the distribution of abilities at particular periods of life and the changes from one life-age to another. To simplify the discussion I have presented in Fig. 3 the graphic picture of the conditions on the simplest hypothesis, namely, that mental capacity at each age is distributed in the form of the normal probability curve extending to zero ability and that individuals retain their same relative capacity on the scale of objective units.

A. Comparison of Units and Scales for Measuring Individual Differences.

(a) Equivalent Units Of Ability When The Distributions Are Normal.

In considering the curves of development it is desirable first to notice the differences between measurement in equal physical units and measurement in equivalent units of ability or of development. The difference in the point of view of the two forms of measurement is so pronounced that I can hardly hope to make myself clear to those who are not somewhat familiar with such terms as “distribution curves,” “frequency surfaces,” “standard deviation,” and other phrases connected with the theory of probability, which are treated at length in such books as Thorndike's “Mental and Social Measurements” and Yule's “Introduction to the Theory of Statistics.” We often, by mistake, regard the growth of an inch in height, for example, as always representing an equivalent unit of growth. This will lead us into rather serious misconceptions unless we are careful, for it is perfectly evident that the growth of an inch in height has a very different significance for the three-year-old boy than for the eight-year-old. Half of the three-year-old boys grow about 3 inches during a year, while at eight years of age not more than about one in seven grows that much. Moreover, it is not always satisfactory to regard the same relative increase in physical size as an equivalent unit of development. To say that a boy 20 inches tall who grows one-tenth in height shows an increase in development equivalent to that of a boy of 50 inches who grows one-tenth may be quite misleading. Nearly every 20-inch child grows one-tenth in height in a year, while not one in fourteen of the boys who are 50 inches in height may grow at that physical rate. In considering human traits, and especially developmental traits, it would seem to conduce to more significant thought if we gave up at times our habit of thinking in terms of equal or relative physical units and thought instead in terms of more equivalent biological units.

In the measurement of mental ability, moreover, it is exceedingly difficult to utilize equal physical units. Most of the objective units which are commonly called alike are clearly not equal even in the physical sense. “Spelling one word,” for example, is not equal to spelling another “one word,” but only equal to spelling the same word. Out of such units of amount accomplished it is, of course, not possible to build a satisfactory scale without referring to some other concepts of measurement. Some tests, however, are scored in equal units. When the measurements, for example, are in units of the time it takes to perform the same task under the same outward conditions, we have the possibility of a scale of equal objective units. Such a scale is approached by the results with the form board test, which give the number of seconds it takes children to place blocks of different shapes in their proper openings.

Even the unit of time may be deceptive in name, as it is with the Binet scale. A year of time is, of course, the same physical unit and the task proposed with the Binet scale is always the same, but the other essential with this scale, the children of each age who pass the tests at each age norm, varies decidedly. “Test-age five,” for example, means 44% of the children pass and “test-age eleven” means 88% pass, even with approximately random samples of children of these life-ages. This question of the equality of the Binet age units will have to be considered further, therefore, in connection with the other concept of equivalence used in psychology.
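The inequality of the two norms can be pictured by converting each pass rate into a position on the distribution of the corresponding life-age. The following is only a rough sketch: it assumes, for illustration, that ability at each age is normally distributed, and the function name `norm_position_in_sd` is hypothetical.

```python
from statistics import NormalDist

def norm_position_in_sd(pass_rate: float) -> float:
    """Position of an age norm, in S.D. units from the age-group average,
    when a fraction `pass_rate` of the group reaches the norm.
    Assumes a normal distribution of ability (illustrative assumption)."""
    # The norm cuts off the lowest (1 - pass_rate) of the group.
    return NormalDist().inv_cdf(1.0 - pass_rate)

# Pass rates quoted in the text for the two test-ages.
for label, rate in [("test-age five", 0.44), ("test-age eleven", 0.88)]:
    print(f"{label}: norm lies at {norm_position_in_sd(rate):+.2f} S.D.")
```

On this assumption the norm for test-age five would lie slightly above the average of five-year-olds, while the norm for test-age eleven would lie more than one S.D. below the average of eleven-year-olds, so that the two "years" mark quite different positions.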

In order to determine equivalent units of activity we find that a number of different concepts have been utilized. With some of the scales for measuring educational products, such as Thorndike's Scale for Handwriting, equal units of merit in handwriting mean differences judged equal by relatively the same proportion of competent judges. This form of unit has not been used, however, in any scale of mental development thus far proposed.

In the measurement of mental ability the most commonly accepted idea of equivalent units is that they are provided by the units of standard deviation for a series of measurements which distribute in the normal form. The meaning of these units may be understood by referring to Fig. 3, which shows Gaussian or normal distributions of abilities of individuals at various periods of life in curves A, B, C, D and E. The straight lines of the measurement scales form the bases of these distribution curves. These graphs represent the normal form of distribution usually expected when any fundamental ability is measured in a random group. If the number of cases at each unit of measurement is plotted by a point placed relatively as far above the scale, used as a base line, as the number of cases found at that unit of the scale, it will be discovered that these points arrange themselves in the form of a symmetrical curve high at the middle and flaring out along the base-line scale. This bell-shaped curve, known as a normal probability curve, shows that the largest number of cases occurs at the middle or average measurement. From this middle point on the scale the number of cases falls off gradually and symmetrically in both directions. Distances along the base line of this distribution surface may then be measured in terms of the standard deviation regarded as unity. This S. D. is the best measure of the scatter of the deviations. It is the square root of the average of the squares of the deviations of the separate measurements from the average of all the measurements. There are approximately four units of the standard deviation between the average and either extreme when the distribution is normal, as in Fig. 3. Only six cases in one hundred thousand fall outside these limits.
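The definition of the S. D. and the statement about the limits at four S. D. may be checked with a short computation. This is a minimal sketch; the sample of scores is made up solely to exercise the definition, and the function names are hypothetical.

```python
import math

def standard_deviation(measurements):
    """Square root of the average squared deviation from the average,
    dividing by the number of cases, as the S.D. is defined in the text."""
    n = len(measurements)
    mean = sum(measurements) / n
    return math.sqrt(sum((x - mean) ** 2 for x in measurements) / n)

def fraction_outside(k: float) -> float:
    """Fraction of a normal distribution lying more than k S.D. from the
    average, counting both extremes together."""
    return math.erfc(k / math.sqrt(2.0))

# Hypothetical scores, not taken from any test.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(scores))        # 2.0 for this made-up sample
print(fraction_outside(4.0) * 100_000)   # cases per 100,000 beyond 4 S.D. either way
```

The second figure comes to a little over six cases per hundred thousand, in agreement with the statement above.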

The studies of biological traits suggest that a unit of the standard deviation is the most important measure we have for equivalent degrees of any trait which distributes normally. It measures the same portion of the total distance from the lowest to the highest ability on any objective scale so long as the distribution of measurements is in the normal form. It thus affords the best interchangeable unit from measurements at one life-age to those at another, provided that the distributions keep close to the form of the normal probability curve. This is the assumption on which practically all the developmental scales have been based. The difference in ability between an individual at the average and at -1 S. D. (standard deviation) below the average is equivalent to that between the last individual and one at -2 S. D. The same distances along the base line of different distribution surfaces measured in terms of their respective deviations set off equivalent portions at each age so long as the distributions are normal. For example individuals measuring between -2 and -3 S. D. in any distribution in Fig. 3 are equivalent in ability to those lying between -2 and -3 S. D. in any other of these normal distribution surfaces. Later we shall consider equivalent units when the form of the distribution of ability is not normal or is unknown.
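The notion of equivalent positions in different distributions may be sketched as follows. The averages and deviations used here are hypothetical values, chosen only so that each distribution extends to zero at four S. D. below its average, as assumed in Fig. 3; the function name `sd_position` is likewise an assumption.

```python
def sd_position(score: float, group_mean: float, group_sd: float) -> float:
    """Deviation of a score from its own group average, in S.D. units."""
    return (score - group_mean) / group_sd

# Hypothetical groups on one objective scale (not data from the text).
nine_year_olds = {"group_mean": 20.0, "group_sd": 5.0}
adults = {"group_mean": 40.0, "group_sd": 10.0}

child = sd_position(10.0, **nine_year_olds)
adult = sd_position(20.0, **adults)
print(child, adult)  # both -2.0: equivalent positions, unequal objective scores
```

The two individuals differ by ten objective units, yet each stands two S. D. below the average of his own group, and it is in this sense that their abilities are called equivalent.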

We may now compare the relations of the units in the physical scale, shown at the left of the figure, to units of the scales for adults or for the immature of any age, expressed in units of the standard deviation from the averages of these groups. Relative ability measured on the physical scale or any one of the distribution scales in Fig. 3 will be found identical since they all start from the same zero point and the distributions are all normal. But the ability of an individual in one distribution can hardly be compared with that of an individual in another distribution in a biologically significant way by their actual positions on the physical scale. A physical unit does not measure the same sort of fact of development in a scale for the immature that it measures in the scale for adults or that it measures in another dynamic scale for the immature. This can be seen when a physical unit is compared with the amount of standard deviation which it measures in the different scales. Moreover, the correspondence of relative distances on the physical scale and any one of these other scales will not hold the moment the distributions do not start from the same point or are unsymmetrical.

It does not seem seriously wrong to suppose that there are some individuals at any age who have no more mental ability than the baby of the poorest mental ability at birth. At any rate our intelligence scales are hardly fine enough to measure the difference in intellectual capacity between the dullest adult idiots and the dullest idiot babies. We shall, therefore, here assume that mental capacity extends to zero at each age. The importance of this will be evident when we consider the question whether the distributions of ability are symmetrical around the average point at each age. Postponing for the present the discussion of unsymmetrical or skewed distributions, we may consider the several meanings of stages of development.

In applying the concept of the probability curve we should distinguish between individuals who have attained their mature mental capacity and those who are still maturing. The former would be represented by a random group of adults (Distribution E, Fig. 3), the latter by a group of nine-year-olds (Distribution C). If we say, for example, that a child has reached a certain stage of development, we might have in mind the final distribution of mature capacity or the distribution of capacity among those of his particular age or of all ages. When we compare stages of development we must, therefore, be careful to indicate the distribution surface to which we are referring.

An increase in development may refer to at least five different things depending upon the scale of measurement to which reference is made. Besides an increase measured by the physical scale, the scales for adults, for the immature or for all ages, to which we have already referred, it may mean an increase judged by the distribution of increases which individuals of the same life-age and capacity make in the same period of time. This last meaning may be the most significant, although it has never been used. It has reference to a distribution surface of increases such as is represented in Distribution F, Fig. 3. This is intended to show the increases in one year of all two-year-old children who had average ability at 2 years, on the assumption that at 3 years these children would on the average equal the average of all three-year-olds. It is clear that when these increases are measured in objective units the latter have a still different significance from that assigned to them in connection with other scales. An increase of one objective unit here might represent twice the standard deviation, while it only represents 0.2 of the standard deviation in another distribution.

(b) The Year Unit Of The Binet Scale.

A sharp disagreement of opinion as to whether the Binet year units can be regarded equivalent has arisen between Karl Pearson, Director of the Galton Laboratory in London, and certain psychologists who have used the Binet scale. Cyril Burt, for example, says, as quoted by Pearson:

“Except for rough and popular purposes, any measurement of mental capacity in terms of age is unsatisfactory.... The unit fluctuates in its significance all along the scale. When the child is just beginning to walk and talk, when he is 7 or 8, when he is 10 to 11, when he is on the verge of puberty—at these different periods a retardation of a single year means very different things” (164, p. 36).

A number of good psychologists, including Yerkes, Terman, and Kuhlmann, agree with Burt in maintaining that a year of retardation at different ages has very different significance.

With this statement of Burt, Pearson takes issue, saying:

“Can the psychologist to the London County Council ever have seen the growth curves of children, or would he write thus?... There is no valid reason to suppose that a year's growth in mental power may not be taken for all practical purposes to mean the same unit for ages of 6 to 15, the period for which Binet and Jaederholm have used the tests” (164, p. 44).

Like many other apparently opposite statements both contain truth. The conflict arises apparently, first from a disagreement between the data obtained with the Jaederholm form of the scale, on which Pearson bases his statement, and data obtained with other forms of the scale; second, from a discrepancy in the points of view. Pearson stresses the fact that the mental year-marks equal average growth increment with the Jaederholm scale (167). He shows that the regression of years of mental excess (or deficiency) on increase of life-age is a straight line, just as he found it with physical measurements. Moreover, the standard deviation of the mental measurements for the entire group of normal school children, 6-14 years of age, was found to be about one year of mental age (.96 year for the corrected data) (167). To which Pearson's opponents might reply, these facts are of comparatively little significance unless the deviations for the separate ages are alike in terms of these year units on the scale. Neither linear regression nor the balancing of years of excess by years of deficiency at each age indicates that the deviations of the separate ages are alike in terms of the year units. The new Stanford scale, for example, shows both of these conditions and yet the range of months of life-ages which sets off the middle 50% of the children of the different tested ages increased decidedly from 6 to 14 years of age. The middle half of the tested ages, for example, at age VI on the scale include a randomly selected group of six-year-old children whose range of life-age is ten months, at age VIII on the scale this range is 13.4 months, at X it is 16 months, at XII, 20 months, and at XIV, 26 months. “The number of 6-year-old children testing 'at age' is approximately twice as great as the number of 12-year-olds testing at age, and 50% greater than in the case of the 9-year-olds” (196, p. 557).
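The widening of these middle ranges may be restated in units of the standard deviation. The sketch below is only illustrative: it assumes that the life-ages of the children testing at each age are normally distributed, in which case the middle 50% spans about 1.349 S. D. The function name `implied_sd_months` is hypothetical.

```python
from statistics import NormalDist

# Range of life-age in months covering the middle 50% at each tested age,
# as quoted in the text for the Stanford scale.
middle_half_range_months = {
    "VI": 10.0, "VIII": 13.4, "X": 16.0, "XII": 20.0, "XIV": 26.0,
}

# Under normality the middle half spans twice 0.6745 S.D.
MIDDLE_HALF_IN_SD = 2.0 * NormalDist().inv_cdf(0.75)

def implied_sd_months(middle_half_range: float) -> float:
    """S.D. in months implied by a middle-50% range, assuming normality."""
    return middle_half_range / MIDDLE_HALF_IN_SD

for age, rng in middle_half_range_months.items():
    print(f"tested age {age}: implied S.D. about {implied_sd_months(rng):.1f} months")
```

Whatever the exact form of the distributions, the implied deviation at XIV is 2.6 times that at VI, since the ratio of the ranges is unaffected by the assumed constant.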

To this argument Pearson might reply that he had not overlooked the question of variation in the deviations from one age to the next for he has a footnote in which he states regarding the Jaederholm data: “There are, however, relatively little differences in these mental age standard-deviations of the normal children beyond what we may attribute to the effect of random sampling” (164, p. 46). In this respect, then, the Jaederholm data differ notably from Terman's data obtained with random groups with the Stanford scale and, as I shall show, from data obtained by Goddard with the 1908 Binet scale, the two largest groups of Binet test data which have been collected. Even with the Jaederholm data on efficient school children, although the largest difference between the standard deviations of different age groups is only about twice its probable error, it is notable that 24 of his 39 7-year-olds are included within an interval of the middle year of tested age, while only 9 of his 35 11-year-olds are included within the same middle year interval.
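The contrast between the 7-year-olds and the 11-year-olds can be put roughly in terms of the deviation each proportion would imply. This sketch assumes, purely for illustration, that tested age is normally distributed at each life-age and that the middle-year interval is centred on the average; with groups of 39 and 35 the results are necessarily crude. The function name is hypothetical.

```python
from statistics import NormalDist

def implied_sd_years(n_within: int, n_total: int, half_width: float = 0.5) -> float:
    """S.D. in years implied when n_within of n_total cases fall within
    half_width years on either side of the average (normality assumed)."""
    p = n_within / n_total
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return half_width / z

# Counts quoted in the text from the Jaederholm data.
print(f"7-year-olds:  {implied_sd_years(24, 39):.2f} years")
print(f"11-year-olds: {implied_sd_years(9, 35):.2f} years")
```

On these assumptions the deviation implied for the older group is well over twice that implied for the younger group.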

Taking Goddard's data for the 1908 scale for the separate ages from 5-11 at which probably the factor of selection for his groups may be neglected, I have calculated the standard deviations from his Table I and find them as follows:

Life-age (years)                                      5      6      7      8      9      10     11
S. D. in mental excess or deficiency (year units)     1.10   .98    .93    .99    1.04   1.23   1.19

The differences between the deviations for ages 7 and 11, or between ages 8 and 10, are more than three times their standard errors, so that we would not be justified in assuming that the standard deviations of the separate ages measured in terms of years of excess are equivalent. There seems to be a tendency for the deviations to increase, at least from age 7 to ages 10 and 11.
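The form of such a comparison may be sketched as follows. The sketch uses the usual large-sample standard error of a standard deviation, which assumes normally distributed measurements and may not be the exact method used for the figures above. The S. D. values are those tabulated above, but the group size of 200 is a placeholder assumption, since the number of children at each age is not given in this chapter.

```python
import math

def sd_standard_error(sd: float, n: int) -> float:
    """Large-sample standard error of a standard deviation: sd / sqrt(2n).
    Assumes normally distributed measurements."""
    return sd / math.sqrt(2.0 * n)

def sd_difference_ratio(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Difference between two independent S.D.s divided by the standard
    error of that difference."""
    se_diff = math.hypot(sd_standard_error(sd_a, n_a),
                         sd_standard_error(sd_b, n_b))
    return abs(sd_a - sd_b) / se_diff

HYPOTHETICAL_N = 200  # placeholder group size, NOT taken from Goddard's table

# Ages 7 and 11, using the tabulated standard deviations.
print(sd_difference_ratio(0.93, HYPOTHETICAL_N, 1.19, HYPOTHETICAL_N))
```

The ratio depends directly on the group sizes, so the actual counts at each age would have to be substituted before any conclusion was drawn from it.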

The comparison of the year units on the Binet scale with the diagrams in Fig. 3 shows that if the scale at each life-age shut out the same lowest proportion, say half, of the children of that age, then the year units might be regarded as equal in the sense of equal average growth increments, as Pearson suggests. A child 7 years of age testing VII would be at least one annual average-growth unit higher in mental development than one of 6 years testing VI, and so with each age until the limit of development had been reached. This is the condition approximated closely for children by the new Stanford scale and the corrected Jaederholm data. Since there is little prospect, however, even with a scale perfected so far as its age norms are concerned, that the total distributions for each of the different years would be the same multiple of the year-units, the main significance of the age units is in permitting the statement that a child had reached the tested development normal for the children of a certain age.

It is also legitimate to use years of retardation as a short way of expressing rough borderlines when they happen thus to afford an easy method of empirically describing equivalent borderlines for a particular scale. This is what I have done for convenience in Part One of this book. I certainly do not mean to contend that four-years retardation has theoretically the same significance at different ages, in terms of the deviation of the separate ages. To me the Binet years are no more than names for certain positions on the scale.

To most psychologists who have been dealing with the measurement of mental development, I believe that the most significant concept of equivalent units would be in terms of the deviations for each age provided that the form of the distributions remained normal. But the deviations vary so much in the terms of the year units that it is not likely that they will be willing to accept a year of excess or deficiency as an equivalent unit for different ages with the common forms of the scale in use in English-speaking countries. Moreover, below the age of 6 and above 15, the limits which Pearson discusses, there is good reason to expect the year unit to vary still further. This Pearson recognizes for the complete developmental curve. It is only at the intermediate years, in which the average increases are most constant in relation to the deviations of the separate ages, that the year unit may be at all serviceable in measuring the deviation of a child from the norm of his age.

With the scales in use in this country the Binet year units are not equivalent in the sense in which they are usually spoken of as equivalent. We should recognize this and emphasize it. Even if the norms at each age marked off the same proportion of the individuals, as shown in A and B of Fig. 4, unless we knew that the forms of distribution were always alike, we should not know that the distance between successive age norms was the same on any sort of objective scale other than average age increments. Moreover, we would not have an objective scale of equal units applicable to measuring the deviation of children of any one age. The average annual increments would not necessarily represent the same proportion of the total distance from the lowest to the highest ability at different ages even if the distributions were all normal. With normal distributions it would also be necessary to demonstrate empirically that the annual average growth increment between successive ages always bore a constant relation to the deviations at these adjacent ages as shown in B of Fig. 4 where the increment is equal to 1 S. D. at each age. This could not possibly hold when the increment lessened near maturity.

Fig. 4. The Question of Equivalence of Year Units.

If the distributions of ability were variously skewed, the year units of excess or deficiency would not be shown to be equivalent at the different ages even if the proportion of individuals one year accelerated was equal to the number one year retarded, two years accelerated equal to those two years retarded, etc., at each age and the norm at each age shut out the same proportions of the age group. This is shown in C of Fig. 4 in which the year units are clearly not equal steps from lowest to highest ability even for the same age and yet the usual criteria which have been suggested for discovering the equivalence of the units are fulfilled. Whether the actual distribution of ability is skewed or normal cannot be determined by the Binet scale, of course, on account of the uncertain and probably varying size of its year units in measuring deviations at any age.

With the empirical evidence against the equivalence of the year units and the impossibility of determining their equivalence unless we first know that ability is distributed normally at each age, it is certainly hazardous to assume that individual deviations measured in terms of year units are equivalent at different ages.

It may be noted that it is quite as hazardous to suppose that the units of the Point scale are equivalent in any theoretical or practical sense. This question will be discussed later in Chap. XIII, B, (b).

(c) Is Tested Capacity Distributed Normally?

Before leaving the question of the significance of units on a scale described in terms of the standard deviation we should ask whether tested mental abilities have been found to distribute normally, i. e., in the form of the symmetrical Gaussian curve with each extreme the same distance from the middle measurement. Contrary to the usual supposition in this matter, it seems as if the evidence was somewhat against this assumption, although neither position can be asserted at all dogmatically on the basis of our present data. A résumé of this evidence which I have given below makes it appear that the assumption of a normal distribution will not conflict with a practical use of normal probability tables for medium degrees of ability, but may quite seriously interfere with such use for the borderline of deficiency. There is little doubt, as Pearson believes, that the bulk of the children now in special classes for the retarded in the public schools would fall within the lower range of a normal distribution fitted to the general population. On the other hand, there is likely to be a respectable minority of the deficients which will be beyond such a normal curve. These facts are sufficiently evident, I believe, to make it impossible to base quantitative descriptions of borderline of deficiency on a hypothesis of normal distribution.

The best evidence on this point is probably the data of Norsworthy with eleven tests on groups of 100 to 150 feeble-minded children in institutions and special classes and 250 to 900 normal children. She expressed the position of each child in terms of the deviation of the group of normal children of his age for each test. Pearson has presented her data graphically on the assumption that her defective group represented 0.3% of a general population of 50,000 children, and then fitted a normal distribution curve to her data with her normal group. The result makes it evident, especially for the intelligence tests, that the defective group would better be described as part of a skewed distribution. To a lesser extent this is also true for the maturity and memory tests (15, p. 30). Norsworthy's own table of data shows that 43 of the 74 feeble-minded taking the intelligence tests were more than 5 times the probable error of their ages below the averages of the normal children, a criterion which she proposes as indicating ability outside of that included in the normal species. Moreover, 9 children score between -22 P. E. and -32 P. E., which is far beyond any conceivable extension of the normal curve. Her figure for the composite results of all her mental tests is also manifestly skewed toward deficiency, although she hesitates to adopt this conclusion and was content with showing that they grade off into the distribution of normal children.
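How far these scores lie beyond a normal curve may be seen by converting probable errors into units of the standard deviation (one P. E. is about 0.6745 S. D.) and asking how many cases a normal distribution would place below each point. This is a sketch only; the population of 50,000 is the figure Pearson assumed, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# One probable error is the distance from the average to either quartile.
PE_PER_SD = NormalDist().inv_cdf(0.75)   # about 0.6745

def pe_to_sd(deviation_in_pe: float) -> float:
    """Convert a deviation stated in probable errors to S.D. units."""
    return deviation_in_pe * PE_PER_SD

def fraction_below(deviation_in_pe: float) -> float:
    """Fraction of a normal distribution lying below the given deviation.
    erfc is used so that very small tails are not rounded to zero."""
    z = pe_to_sd(deviation_in_pe)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

POPULATION = 50_000  # general population assumed by Pearson in the text

for pe in (-5, -22):
    print(f"{pe} P.E. = {pe_to_sd(pe):.2f} S.D.; "
          f"expected cases below: {fraction_below(pe) * POPULATION:.3g}")
```

A normal distribution would place only a score or so of the 50,000 children below -5 P. E., and for practical purposes none at all below -22 P. E.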

The other data, which I have found, that indicate that tested ability, when measured in equal physical units for the same task, is skewed toward deficiency, have to do with tests that are pre-eminently for psychomotor activities rather than intellectual. They consist of Sylvester's and Young's results with the form board test on Philadelphia school children, Stenquist's results with his construction test, and Smedley's results with the ergograph test on Chicago school children. Here we may apply the better criterion of the distance of the quartiles above and below the median of the group. These positions would be less likely, through extreme records, to be affected by chance conditions during the testing.

It is to be remembered that if the records of school pupils appear to be normally distributed this would not settle our problem, since it is apparent that idiots and many imbeciles are not sent to the public schools at all. The lowest children at any age would not be represented in the regular school groups. On the other hand, the brightest children are not generally drawn away from the public schools at least before 14 years of age in this country. We shall confine ourselves, therefore, to school-children 6-13 years of age. If we find that they show ability skewed toward deficiency the results will underestimate rather than over-estimate the skewness.

Sylvester (191) tested with the form board a group of 1537 children in the Philadelphia public schools, from 80 to 221 at each age from 5 to 14 inclusive. “Except that no especially backward or peculiar children were included there was no selection.” This study gives, with the complete distribution tables, the number of seconds required for the same task by the children at each age. If we find that the limit of the lower 25 percentile was farther from the median than the limit of the upper 25 percentile we can be reasonably sure that the difference would be still greater if the excluded deficient and backward children were also included. By calculating the quartiles and their differences from the medians at each age, I find that for only two of the eight ages is the upper quartile farther from the median than the lower quartile. The average excess of the distances of the lower quartile is .64 of a second. At only age 7 is the difference three times its probable error, 2.1 seconds, P. E. .67. The form board distributions thus tend to be slightly skewed toward deficiency. The errors of the quartiles were found by the method given in Yule's Introduction to the Theory of Statistics, Chap. XVII, which assumes normal distribution, so that they are too small. The skewness is more manifest when the extreme measurements are compared with medians at each age. It is not possible, unfortunately, to compare his group of normal children with those in the special classes since he did not use the same method of giving the test.
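The quartile comparison used here can be sketched in a few lines. The times below are made up for illustration and are not Sylvester's records; the function name is hypothetical. Since the scores are seconds required, the quartile of poorer ability is the one with the longer times.

```python
from statistics import quantiles

def quartile_distances(times_in_seconds):
    """Return (slow_side, fast_side): the distances of the slower and the
    faster quartile from the median time. A larger slow_side indicates a
    distribution skewed toward deficiency."""
    q1, median, q3 = quantiles(times_in_seconds, n=4, method="inclusive")
    return q3 - median, median - q1

# Hypothetical form board times in seconds, for illustration only.
times = [14, 15, 16, 17, 18, 20, 23, 27, 34]
slow_side, fast_side = quartile_distances(times)
print(slow_side, fast_side)   # 5.0 and 2.0 for this made-up sample

# The criterion applied in the text at age 7: a difference of 2.1 seconds
# against a probable error of .67.
print(2.1 / 0.67 >= 3.0)
```

A difference is treated as reliable in this discussion only when it reaches three times its probable error, which the age-7 figures just satisfy.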

Since it was not important to compare the amounts of skewness in different data, I have not attempted the more elaborate calculations of coefficients of skewness. These would give the results a more elegant statistical expression. The simpler method I have here used affords more convincing evidence of asymmetry for the non-mathematical reader.

Young has published the results with Witmer's form board test on approximately two hundred Philadelphia children for each age, giving the results for the sexes separately for each half year of life-age (227). This affords 36 different groups in which he gives the median and upper and lower quintiles for the shortest time records. The lowest quintile is farther from the median in 25 cases, equal in 6 and less than the upper quintile in only 6 of the 36 comparisons. This skewness would have been even greater if children of the special classes had not been excluded from his groups.

Stenquist's results (54) with his construction test are scored in arbitrary units in which allowance is made for the quality of the score, but we should expect no constant effect on the form of the distribution from the character of these units of measurement. At ages 6 to 13 he tested from 27 to 74 pupils randomly selected from the public schools, a total of over 400. For six of these eight ages the lower quartile is farther from the median than the upper quartile, when calculated from his distribution table. The number of cases at each age, however, is so small that the largest difference, 15 units, is not three times its probable error, 6.

Smedley gave his ergograph test to about 700 school children of each of the ages we are considering. Since he tested so many more subjects than any other investigator this should provide the most valuable data on the question of distribution with a test recorded in the same physical units for the same task. Unfortunately, his results for two succeeding years are so directly contradictory to each other that they seem to have no significance for our problem. The simplest explanation of this contradiction is that the groups tested may have been selected on a different basis each year.

A casual observation of his standard percentile curves for the ergograph test at the different ages gives the impression that the distributions are decidedly skewed toward deficiency, but this impression is not justified by a careful analysis of his results (51). In the table which accompanies his standard percentile curves, giving his total results for the two years, we find that there is a sharp disagreement between the distributions of the boys and the girls. The distributions for the boys at each age between 6 and 13 years show a greater distance, measured in kilogram-centimeters, from the median to the 80-percentile than from the median to the 20-percentile, in 5 ages out of 8. The total difference is also slightly greater between the median and the upper 80-percentile. On the other hand, the table for the girls at these ages shows the 20-percentile farther from the median in 5 out of 8 ages, with a total difference considerably greater than that shown for the boys. Usually the differences were small compared with their errors. With the boys only at age 13 was the difference in favor of the 80-percentile three times its probable error, while with the girls the four oldest ages show the distance of the 20-percentile greater by three times its probable error.

A comparison with the reports of Smedley on this test for the previous year (Report No. 2) leaves his results still more uncertain. While he does not give the medians at each age, we may make less satisfactory comparisons between the distance of the 10-percentile from the 25-percentile and the distance of the 90-percentile from the 75-percentile. If we do this, we find the distance is uniformly greater at the upper end of the distributions for each age both for the boys and girls. The Smedley results are, therefore, decidedly contradictory. The first year shows distributions skewed toward excellence, and the total results for two years show distributions skewed mainly toward deficiency.

Broadly considered, the Binet records with school children point to a skewed distribution toward deficiency when large allowance is made for the difference in value of the year units. It is extremely rare to find a child testing 4 years in advance of his life-age, while 15-year-old idiots are presumed to test 12-year-units or more under a mature standard.

Pearson believes that “the Gaussian curve will be found to describe effectively the distribution of mental excess and defect” for intermediate ages as measured by Jaederholm's form of the Binet scale. The data on which Pearson places reliance are Jaederholm's results in testing 261 normal children 6-14 years of age in the Stockholm schools and 301 backward children in the special help classes of the same city. The best fit of a normal curve to the data was obtained with a group of 100 8-year-old children, in which case the chances were even that samples from a normal distribution would fit. With his larger normal and backward groups combined in proper proportions in one population the chances were 20 to 1 that such a distribution as was actually found would not fit into the Gaussian distribution. He admits that “this is not a very good result,” although it is better than when the Gaussian curve is fitted to either the normal or the backward group alone. In a subsequent paper he gives each child a score relative to the standard deviation of the normal child of his own age, a method comparable to his treatment of Norsworthy's data. He then finds that “10% to 20% or those from 4 to 4.5 years and beyond of mental defect could not be matched at all from 27,000 children” (164, p. 46). In each case the distributions actually found were skewed somewhat toward deficiency. Furthermore, when he suggests that -4 S. D. may be used as a borderline for tested deficiency, he recognizes that the mental ability of children is skewed so far as the empirical data are concerned. With a normal distribution only about three children in 100,000 would fall below this borderline. Nevertheless, the normal curve serves for most practical purposes to describe the middle ranges of ability.

Pearson thinks that the skewed distributions of his data may possibly be explained by the drawing off of older children of better ability to the “Vorgymnasium,” or to the higher-grade schools, by the incompleteness of the higher age testing, or by the “possibility of the existence of a really anomalous group of mental defectives, who, while continuously graded inter se, and continuously graded with the normal population as far as intelligence tests indicate, are really heterogeneous in origin, and differentiated from the remainder of the mentally defective population” (164, p. 34). The last hypothesis, of course, supposes that mental ability is skewed and suggests the cause. He supplements this explanation by stating that the heterogeneous cause of the “social inefficiency” of the deficients may not be connected directly with the intellect but affect rather the conative side of the mind. A skewed distribution under biological principles of interpretation supposes a single cause or group of causes especially affecting a portion of the population.

It is also to be noted that the apparent form of distribution may be the result of the nature of the test and the units in which it is scored. Some tests might not discriminate equally well a difference in ability at the lower and at the upper ranges of ability. If the test were too easy the group might bunch at the upper portion of the scale and the distribution would appear to be skewed toward the lower extreme where there were only a few cases. If too difficult a test were used the form of distribution might shift in the opposite direction, most of the group ranking low. It is extremely difficult to formulate mental tests so that they will equally well measure differences at each degree of ability. This objection should not hold, however, if the scoring were in units of time for the same task, as with the form board test. The essential requirement of a test, in order that it may indicate the form of a distribution, is that the units of scoring shall be objectively equal under some reasonable interpretation and that they shall be fine enough to discriminate ability at each position on the scale. Under such conditions the variations in the difficulty of tests should not obscure the form of the distribution of the ability tested.
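The effect of a test that is too easy or too hard can be illustrated with a small sketch. The abilities below are hypothetical and exactly symmetrical; the ceiling of 105 and the floor of 95 are arbitrary assumptions standing in for a test that cannot register differences beyond those points.

```python
from statistics import NormalDist, quantiles

def quartile_asymmetry(scores):
    """(median - Q1) - (Q3 - median); positive = skew toward deficiency."""
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    return (med - q1) - (q3 - med)

# Hypothetical symmetrical abilities: 199 evenly spaced quantiles of a
# normal curve with mean 100 and S.D. 15.
true_ability = [NormalDist(100, 15).inv_cdf(k / 200) for k in range(1, 200)]

# Too easy a test: no score above 105 can be registered.
too_easy = [min(a, 105.0) for a in true_ability]

# Too hard a test: no score below 95 can be registered.
too_hard = [max(a, 95.0) for a in true_ability]

print(round(quartile_asymmetry(true_ability), 6))  # essentially zero
print(round(quartile_asymmetry(too_easy), 2))      # positive: apparent skew downward
print(round(quartile_asymmetry(too_hard), 2))      # negative: apparent skew upward
```

The underlying ability is the same in all three cases; only the range over which the test discriminates has changed.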

Turning to the analogy of measurements of physical growth, a strong argument may be made for the hypothesis of shifting forms of distribution. As Boas points out regarding measurements of the body at adolescence, owing to the rapid increase of the rate of growth the distribution of the amounts of growth is asymmetrical, “the asymmetry of annual growth makes also all series of measurements of statures, weights, etc., asymmetrical.” Moreover, “acceleration and retardation of growth affects all the parts of the body at the same time, although not all to the same extent.... Rapid physical and rapid mental growth go hand in hand” (80). There is no reason to suppose that the brain is free from this phenomenon of asymmetrical distribution of annual increments of growth among children of the same age when the rate of growth is changing as at adolescence. It is therefore to be expected that the separate age distributions would be skewed at early ages and at adolescence even if the distribution should be normal with a static population. The presumption from physical measurements is that the form of distribution shifts with age.

Again we may note that if some of the idiots reach an arrest of development before any of the normal individuals, as several investigators contend, this would imply that the distributions must be skewed unless there is a curious corresponding acceleration of growth on the part of geniuses to balance this lagging by idiots.

In spite of these arguments and the evidence of asymmetry of measurements at least at some periods of life it is to be noted that current opinion is probably contrary to this hypothesis, although, as I believe, because it has been concerned mainly with those who are not of extreme ability. For the large middle ranges of ability slight skewness might well be negligible. It is interesting to note that Galton says that “eminently gifted men are raised as much above mediocrity as idiots are depressed below it” (159, p. 19). Measured by intelligence quotients with the Stanford scale, Terman finds among school children that deviations below normal are not more common than those above (197, p. 555). Burt, following a suggestion of Cattell as to college men, however, seems to incline to the opinion that the general distribution of ability, like wages, is skewed toward the upper end. He adds, “In crude language, dullards outnumber geniuses, just as paupers outnumber millionaires” (85).

(d) Equivalent Units Of Development When The Form Of Distribution Is Uncertain.

For our problem of units and scales of measurement, an asymmetrical distribution sets a very difficult problem. It may be that this very difficulty has been one of the main reasons for slowness in recognizing the drift of the evidence. In order to set forth the difference in the conception of measurement when distributions become asymmetrical I have presented this hypothesis in connection with the curves of development in Fig. 5. It will be noted that if the distributions of mental capacity vary in symmetry, the units of standard deviation change in significance from one form of distribution to another. Minus 2 S. D. may exclude very different portions of groups differently distributed, while it would always exclude the same proportion if the distributions had the same symmetry, or skewness.
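A short worked example may make the point about minus 2 S. D. concrete. The three distributions are hypothetical choices of my own, each with a standard deviation of 1: a normal curve, and an exponential curve taken once with its long tail toward deficiency and once with its long tail toward excellence.

```python
import math
from statistics import NormalDist

# Proportion of each group falling below (mean - 2 S.D.).

# Symmetrical (normal) distribution.
normal_below = NormalDist().cdf(-2.0)

# Skewed toward deficiency: ability = -X, with X exponential (rate 1).
# Mean is -1 and S.D. is 1, so the cut-off is -3, and
# P(-X < -3) = P(X > 3) = exp(-3).
deficiency_skew_below = math.exp(-3.0)

# Skewed toward excellence: ability = X. Mean is 1 and S.D. is 1, so the
# cut-off is -1. No exponential value is negative, so nobody is excluded.
excellence_skew_below = 0.0

for label, p in [
    ("normal", normal_below),
    ("skewed toward deficiency", deficiency_skew_below),
    ("skewed toward excellence", excellence_skew_below),
]:
    print(f"{label}: {100 * p:.1f} per cent below -2 S.D.")
```

Under these assumptions the same borderline excludes roughly 2.3 per cent, 5.0 per cent, and none of the group respectively.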

Under conditions of variable symmetry there is a sense in which the same relative physical score in units running from zero ability to the best ability would always have an equivalent objective meaning, but this might not express equivalent development conditions at different ages. For example, with shifting forms of distribution, to say that a child of six years had reached three-fifths of the best development for his age on an objective scale might give no significant indication of how nearly he was keeping pace with those three-fifths of the best ability of another age. Neither would his position in units of the deviation of ability at his age give this information without knowledge of the form of the distribution of ability at his age. With varying forms of distribution at different stages of development this would afford an insurmountable difficulty.

Fig. 5. Hypothetical Development Curves (Changing Forms of Distribution)

With unknown or varying types of distribution it is desirable to utilize percentiles as equivalent units for comparing individuals at different stages of development. They differ somewhat from ranks in an order of noticeable differences. With an indefinitely large group, such ranks would mark off only those cases which were indistinguishable in merit. These units would be numbered in order from the highest to the lowest in ranks of just distinguishable merit, a different number of individuals conceivably occurring at the single steps. Psychologically the percentiles are somewhat less significant because they are not conceivable in steps of just noticeable differences. Percentiles have less value in comparing abilities in the same distribution, but have decided advantages when comparing corresponding abilities in different distributions. Except at points where merit is indistinguishable, they signify that a certain proportion of a group is ahead in the struggle for existence. They are thus units of relative rank. Moreover, they are directly translatable into units of the deviation in case the form of the distribution of ability has been determined. This is a special advantage if the forms of distribution turn out to be normal or even uniform.
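The translation of a percentile into units of the deviation can be sketched as follows, on the assumption (which the text leaves open) that the distribution has been determined to be normal. The function name is my own.

```python
from statistics import NormalDist

def percentile_to_sd_units(percent_below):
    """Deviation from the mean, in S.D. units, of the score that has
    `percent_below` per cent of a NORMAL population beneath it."""
    return NormalDist().inv_cdf(percent_below / 100.0)

for pct in (0.5, 2.0, 25.0, 50.0):
    print(f"lowest {pct:>4} per cent -> {percentile_to_sd_units(pct):+.2f} S.D.")
```

For any other form of distribution the same lookup would have to use that distribution's own curve, which is why the translation depends on the form being known.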

In using percentiles it is to be remembered that equal differences between percentiles are not comparable in the same distribution except in the sense of the same extra proportions of the group to be met in competition. A change in the degree of ability from the lowest percentile to the lowest 2 percentile would be very different from the change in the degree represented by the 50 percentile to the next percentile above. Differences in the ability of individuals ranking near each other in the middle of the same percentile series would be distinguished with difficulty while it would be easy to make such discriminations at the extremes.
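The inequality of percentile steps within one distribution can be shown numerically. The sketch assumes a normal distribution purely for illustration; the contrast would differ in size, though not in kind, for other bell-shaped forms.

```python
from statistics import NormalDist

def gap_in_sd_units(lower_pct, upper_pct):
    """Distance in S.D. units between two percentile points of a normal curve."""
    z = NormalDist().inv_cdf
    return z(upper_pct / 100.0) - z(lower_pct / 100.0)

extreme_gap = gap_in_sd_units(1, 2)    # lowest percentile to the next
middle_gap = gap_in_sd_units(50, 51)   # median to the next percentile

print(f"1st to 2nd percentile:   {extreme_gap:.3f} S.D.")
print(f"50th to 51st percentile: {middle_gap:.3f} S.D.")
print(f"ratio: {extreme_gap / middle_gap:.1f}")
```

On this assumption one percentile step at the lower extreme covers more than ten times the distance of one step at the middle.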

The special value of the percentile units in measurement of ability lies in the comparison of individuals of corresponding position in corresponding groups in which the ability may not be assumed to distribute alike. The concept that 995 out of every 1000 randomly selected individuals at his age are ahead of a particular individual in the struggle for existence has very definite and significant meaning which is quite comparable from one period of life to another regardless of the form of the distribution. We shall return to this question of equivalent units in distributions of unlike symmetry when we compare the definitions of the borderlines of deficiency in terms of intelligence quotient, coefficient of intelligence, standard deviation and percentage. Corresponding percentages of corresponding groups have a more useful definite significance of equivalence than any other units of measurement of mental ability available when the forms of distribution vary at different stages of development or are uncertain, as seems to be true with tested abilities.

B. The Curves of Mental Development.

When we endeavor to make our ideas of mental development more definite, we are assisted by thinking of the various stages in graphic form. This is especially true when trying to think of the position of the deficient individuals, relative to the average individuals and to genius.

In diagrammatically presenting these concepts in Fig. 3 and Fig. 5 we do not wish to assume that all the principles on which the developmental curves have been plotted have been decided. If they make clearer the points still under discussion and direct the discussion to specific features so that more data may be brought to bear upon the empirical determination of their characteristics, they will serve a useful purpose. For our present ends, we shall consider only certain features which have a bearing upon the interpretation of developmental scales and the quantitative definition of the borderline.

In the graphic presentation of the curves of development in Figures 3 and 5 the relative position at various ages has been suggested hypothetically for those of the best ability and median, or middle ability, as well as the borderline of the deficients.

It is evident that these graphs should represent equivalent ability at each stage of development measured by as objective a scale of measurement as possible. In the graphs this scale is assumed to be composed of physical units with its zero at zero ability. The deficient group is distinguished by the portion with a grated shading. The distribution curves of individual ability we have already mentioned in connection with scales of measurement. Fig. 3 is constructed on the assumption of a normal distribution of ability at each age extending to the same zero ability. Fig. 5 is constructed on the assumption of distributions of varying form.

Otis has given a very able logical analysis of certain concepts underlying the testing of mental development (163). His discussion differs from the present in its aim to determine the proper mental age for particular tests, a question which I have not considered. It also supplements the present discussion by showing the changing value of the same intelligence quotient with normal distributions of ability under certain assumptions as to range of ability and decrease in the annual increments of ability with age.

(a) The Significance Of Average Curves Of Development.

Some investigators are apparently inclined to question the significance of any curve of mental development on account of the very different forms of development which they have found in particular cases. A quotation from Goddard will state this problem:

“It seems to me that there is considerable evidence that there are a good many children that develop at a normal rate up to a certain age and then slow down; some slowing down gradually and others rapidly. This is possibly accounted for by accidental conditions. Dr. Healy's case of traumatic feeble-mindedness is a good illustration of this. We have quite a good many cases, not a large percentage as yet, where it is pretty clear that they have developed very nearly normally up to the age of seven, eight or nine, so that I am very skeptical as to the possibility of formulating a rule for determining the rate of development. Many cases are uniform in slowness while others vary a great deal; some slow up more rapidly than others as has already been stated....

“Morons are not usually discovered until twelve or fourteen years of age. The picture to me of the development of the feeble-minded is rather that these different types develop each in his own way very much as the physical side develops. Different families have different determiners of development. Just as it was determined before I was born that I should be five feet, ten inches tall, I developed that height and no further. In the same way, probably, that determiner carries with it the determination of the rate of development and the time. This carries with it the fact that I should have been an average boy from birth. As a matter of fact I was very much under-size until I was fifteen or sixteen years of age. Then I shot up. Other cases are over-size. It may be a false analogy, but it seems to me to illustrate the rate at which these cases develop” (111).

This view raises clearly the question how far the curve of average development represents a common tendency of different individuals in development. Are the individual curves of development so varied in form that an average curve does nothing but obscure their significance? The study of individual curves of growth in height and weight by Baldwin indicates that the bigger children tend to develop earlier, the smaller later (73). The individual curves of mental development may be analogous. If so, the average curves may not adequately represent the common tendencies of development. Nevertheless, it is to be remembered that with height and weight the average curves do retain a decided usefulness, which nobody, I suppose, would seriously question.

An analogous problem arises when we consider the question of variations in the maturity of different mental processes. Besides the question whether the average curve is useful in view of the variation among individuals in their rates of maturity for the same process, the psychologists have a still more difficult problem about curves of general ability. These curves are built by combining the results of numerous psycho-physical tests which are very different in type. We need to raise the question whether the type of process measured by memory for digits, for example, matures at the same rate as those processes measured by other memory tests: in general, how much a single test or combination of tests represents a common process. Furthermore, we need to inquire whether processes measured by memory tests mature like those measured by tests emphasizing reasoning, imagination, motor ability and other groups of activities. We thus have the problems of the different rates of maturity of the different tested processes in the same individual and of common tendencies among these specific processes.

In order more clearly to present this problem of the significance of developmental curves for different processes, I have brought together the age norms from 8 to 14 years for 40 tests as given by different investigators. No norms were included which were not based on tests of at least 25 individuals. After 14 years the data which have been collected are open to the objection that the norms for the older ages would be seriously affected by the fact that they were obtained upon children remaining in school, usually in the elementary school, i. e., upon groups, among which a large portion of those of better or of poorer ability had been eliminated. The relative positions of the norms for older ages are, therefore, not comparable with those of children who are of the ages of compulsory attendance. The results published are inadequate below 8 years for most of the tests, so I have not extended the curves to earlier ages. In 14 instances the data for boys and girls were given only separately. In these I have used the norms for the boys. A prepubertal break in a combined curve may, therefore, indicate a sex difference. In most cases the norms were given for the sexes combined, and the difference is unimportant for the points considered.

The variation in age norms with different tests is shown graphically in Figures 6, 7 and 8. In order that the various tests may be plotted on the same scale, so as to compare changes in development for the different tested processes, I have used the average increase in ability from 8 to 9 years of age for each test as a common measure and arbitrarily plotted the slant of the curve between these ages at 45 degrees. The increase from 8 to 9 is represented by 10 units on the objective scale to the left of the graphs. On this basis it is possible roughly to compare changes in the absolute annual increase at different ages for the same test and for different tests. It assumes that the units in which each test is scored are equivalent for that test. An average difference between the basal ages or between any two ages cannot be assumed to be accompanied by the same distribution of increases. Moreover, the 8-year norm is at different distances from zero for the different tests so that the relative increase from 8 to 9 cannot be regarded alike for the different tests. The method, however, is sufficiently accurate for illustrating the very different forms of the developmental curves which might be expected if they were measured by absolute increases from year to year. Even the variation in the slant of the lines at the different ages gives a graphic picture which will assist in interpreting the significance of average curves of general ability. As the curves stand, they show the norms for each age for any test, as if placed on its own objective scale, and the various objective scales have been harmonized on the assumption that the norms at 8 and 9 years are accurate. We thus have a simple representation of the absolute changes in the abilities tested from age to age by the same tests relative to a single objective scale. 
It will not give a seriously erroneous picture for any tested ability so long as the units in which the particular test is scored may be presumed to be objectively equal.
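The rescaling just described can be sketched as follows. The norms are hypothetical, not taken from any of the investigators cited, and the function name is my own; the only rule carried over from the text is that the change from 8 to 9 years is set equal to 10 units.

```python
def rescale_to_basal_gain(norms, units_per_basal_gain=10.0):
    """Express age norms as distances from the 8-year norm, measured so
    that the change from 8 to 9 years equals `units_per_basal_gain`.

    `norms` maps age in years to the raw norm in the test's own units.
    """
    basal_gain = norms[9] - norms[8]
    if basal_gain == 0:
        raise ValueError("no change from 8 to 9 years; cannot rescale")
    return {
        age: units_per_basal_gain * (score - norms[8]) / basal_gain
        for age, score in sorted(norms.items())
    }

# Hypothetical norms: digits recalled (rising with age) and a reaction
# time in milliseconds (falling with age).
digits_recalled = {8: 4.0, 9: 4.5, 10: 5.0, 11: 5.25, 12: 5.5}
reaction_time_ms = {8: 300, 9: 280, 10: 265, 11: 255, 12: 250}

print(rescale_to_basal_gain(digits_recalled))
print(rescale_to_basal_gain(reaction_time_ms))
```

A falling time score comes out as a positive gain because the 8-to-9 change by which it is divided is negative as well, so both kinds of test can be laid on the same scale.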

The tests on which Figures 6, 7, and 8 were based included practically all which were reported in the researches used. They were as follows: Norsworthy (159), perception of 100-gram weight, cancelling A's (boys), ideas remembered from four simple sentences, memory of related and of unrelated words, part-wholes, genus-species, opposites and reverse of opposites given the next day, “a-t” test. J. Allen Gilbert (108), taps in 5 seconds, fatigue in tapping, visual reaction time, color-discrimination reaction time, reproduction of 2-second interval. Smedley (51, No. 3), strength of right-hand grip (boys), taps in 30 seconds (boys), ergograph; visual, auditory, audio-visual, and audio-visual-articulatory memory for digits. W. H. Pyle, Standards of Mental Efficiency (J. of Educ. Psychol., 1913, IV., 61-70), uncontrolled association, opposites, part-wholes, genus-species, digit-symbol and symbol-digit substitution, memory for concrete and for abstract words, memory of Marble Statue selection (only boys' norms used for each). Pyle and Anderson, combined by Whipple (220), two word-building tests (boys). Anderson, as given by Whipple, memory for letter squares. D. F. Carpenter, Mental Age Tests (J. of Educ. Psychol., 1913, IV., 538-544), substitution of colors in forms and of numbers in forms, perception time in marking A's, concentration, i. e., difference in time of last test under distraction, memory of pictures of objects, all tests devised by Carrie R. Squire. Stenquist (54), construction test. Sylvester (191), form-board test.

Fig. 6. Tests of the Development of Memory Processes. Medians at Each Age of the Central Tendencies of the Tests.

Fig. 7. Different Types of Development. Medians at Each Age of the Central Tendencies of the Tests.

Fig. 8. Forty Curves of Development. Distribution at Each Age of the Central Tendencies of the Tests.

In Fig. 6 curves A and B are Smedley's tests; curve C includes in addition Norsworthy's unrelated words, Pyle's memory for concrete and abstract terms, Anderson's letter-squares, Carpenter's memory for pictures, and Gilbert's for the time interval; curve E includes Pyle's two and Carpenter's two substitution tests; curve F includes Pyle's Marble Statue and Norsworthy's memory for related words and for sentences; curve S is Norsworthy's; curve D is the combination of these 17 tests.

In Fig. 7 curve H includes Gilbert's visual reaction time, Norsworthy's A and a-t tests, Carpenter's two A tests; curve I includes Gilbert's and Smedley's tapping tests; curve J is the median of the central tendencies of all 40 tests; curve K includes Norsworthy's two opposites and her part-whole and genus-species tests, the Pyle opposites, genus-species and part-whole tests; curve L is the same as D, curve M includes Smedley's strength of grip and ergograph tests and Gilbert's fatigue of tapping; curve N includes Pyle and Anderson's word building tests and Pyle's uncontrolled word association test.

In Fig. 8 curve P is Gilbert's visual reaction time test, curve S is Norsworthy's test for memory of unrelated words, the other curves are the median and quartiles for the central tendencies of all 40 tests after each was expressed at each age in terms of the gain from 8 to 9 years taken as a unit.
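The median and quartile curves of Fig. 8 can be formed in the manner sketched below once each test has been expressed in terms of its own gain from 8 to 9 years. The five curves are hypothetical, and the function name and the "inclusive" quartile convention are assumptions of mine; the source does not state which quartile rule was used.

```python
from statistics import quantiles

def central_band(curves):
    """For each age shared by all tests, return the
    (lower quartile, median, upper quartile) of the rescaled norms.

    `curves` is a list of dicts mapping age to the norm expressed in
    units of that test's gain from 8 to 9 years.
    """
    ages = sorted(set.intersection(*(set(c) for c in curves)))
    band = {}
    for age in ages:
        values = [c[age] for c in curves]
        band[age] = tuple(quantiles(values, n=4, method="inclusive"))
    return band

# Hypothetical rescaled curves for five tests (gain from 8 to 9 = 1 unit).
curves = [
    {8: 0.0, 9: 1.0, 10: 1.5},
    {8: 0.0, 9: 1.0, 10: 2.0},
    {8: 0.0, 9: 1.0, 10: 2.5},
    {8: 0.0, 9: 1.0, 10: 3.0},
    {8: 0.0, 9: 1.0, 10: 5.0},
]

band = central_band(curves)
for age, (q1, med, q3) in band.items():
    print(age, q1, med, q3)
```

All curves coincide at 8 and 9 years by construction, so the band has no width there and opens out only at the later ages.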

Several points are to be noted about the nature of the curves for different tests. In Fig. 6 showing the curves for different forms of memory tests, that for the memory of digits is very different in character from that for memory of related material. The most extreme differences in the time of maturity are shown by the test for memory for digits presented orally and the substitution of color in forms. The former continues to increase so rapidly relative to the absolute increase from 8 to 9 years that it cannot be represented in the graph, reaching 539 units of the scale by 14 years of age, while improvement in ability in the latter is not measured after 9 years. We cannot take time to discuss how much of the differences between the various curves may be due to the nature of the tests themselves, the form of scoring the results, or the condition under which they were given, selection of subjects, etc. The conclusion is safe, however, that when groups of three or four tests of similar type show such marked differences as those for memory of digits and memory for related material we may expect similar differences in the rates of maturity of the corresponding processes.

From Fig. 7 we may learn that tests emphasizing functions such as speed of motor or perceptual motor reaction, curves H and I, are notably different in their form from curves for tests of imaginative processes, curve N. As we group tests together covering larger ranges of activity we approach the median curve for general ability. Note the median curve for 17 memory tests (curve L) compared with the median for the 40 tests (curve J). By empirical studies we might pick out types of tests which would most closely represent the maturity of average ability. For example, the median for the substitution tests, curve E, resembles the median for the memory tests, curve D, more closely than does that of the 4 digit tests, curve B. Curve K, for 7 association tests, resembles the median for the 40 tests, curve J, much more closely than the curve for the perceptual-motor speed tests, curve H. This difference can not be explained by the use of 7 instead of 5 tests in calculating the central tendency of the group. It probably means that the sort of psycho-physical processes usually tested more closely represent on the average the abilities shown in association tests than they do the abilities shown by speed of motor reaction. The significance of this sort of analysis for those constructing a scale for measuring intellectual ability is obvious.

Fig. 8 shows the median and quartile range for the central tendencies of the 40 tests and gives examples of two extremely different tests, visual reaction time and memory for unrelated words. How closely these particular tests represent fundamental differences in the maturity of different processes, we cannot, of course, be sure without prolonged research; but nobody would question that analogous differences would be found in different processes. When we think of curves of general ability we must, therefore, keep in mind the light which might be thrown on them by an analysis of the various processes tested in the particular scale used.

Another feature of all developmental curves which is apparent as soon as the causes of development are considered, is that growth in an individual is the result of several factors. These include the native capacity, the rate at which that capacity manifests itself instinctively, and the external stimuli which encourage or retard that manifestation. To some extent these factors vary independently. Our curves of development will never completely express all the facts until they analyse out all these factors for each of the processes. In the meantime we shall be able to think of general trends of development by considering average curves. The fact that they represent combinations of unanalyzed factors must, however, make us very cautious in interpreting our norms.

(b) Changes In The Rate Of Development.

There has been considerable discussion of the form of the curves of mental development. The logical aspects of the curves on the assumption of normal distribution of ability at each age and uniform age of maturity have been treated by Otis (163) and the bearing of these assumptions upon the Binet scale pointed out. Thorndike has plotted the developmental curves for a dozen tests on the basis of the variability at 12 years of age used as unit and gives a chapter in his Educational Psychology to the changes with maturity (198, Chap. XI). Bobertag suggests that the rates of development of normal and deficient children are analogous to the upward progress of two projectiles fired from such different heights that the force of gravity would retard the lower projectile more than the upper (81). This analogy supposes that the rate of maturity would continually decrease and that those who were feebler mentally would be arrested in their development earlier. Bobertag, Kuhlmann (137, 138) and Otis give evidence from the results of Binet testing that the rate of development decreases with age. The percentages of older children passing certain positions on the Binet scale or certain tests taken from it were found to change less at year intervals for the older ages. This evidence is not conclusive unless we know that the positions compared are at the same point in the distributions of ability at the beginning of the periods of growth. The same percentage change at a point farther away from the central tendency would mean a larger growth than at the middle of the distribution, when judged either in reference to a physical scale or to units of deviation.
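The caution about percentage changes can be put in figures. The sketch below converts the percentages passing one fixed position on a scale at successive ages into the growth they imply in units of deviation. It assumes, purely for illustration, a normal distribution of equal spread at each age; the percentages and the function name are hypothetical.

```python
from statistics import NormalDist

def implied_growth(percent_passing_by_age):
    """Year-to-year growth, in S.D. units, implied by the percentages of
    children passing one fixed position on a scale at successive ages.

    Assumes a normal distribution of the same spread at every age.
    """
    z = NormalDist().inv_cdf
    ages = sorted(percent_passing_by_age)
    return {
        (a, b): z(percent_passing_by_age[b] / 100.0)
        - z(percent_passing_by_age[a] / 100.0)
        for a, b in zip(ages, ages[1:])
    }

# The same rise of five per cent, once far from the central tendency
# and once at the middle of the distribution.
hard_position = {10: 2.0, 11: 7.0}
middle_position = {10: 50.0, 11: 55.0}

tail_growth = implied_growth(hard_position)[(10, 11)]
centre_growth = implied_growth(middle_position)[(10, 11)]
print(f"from 2 to 7 per cent passing:   {tail_growth:.3f} S.D.")
print(f"from 50 to 55 per cent passing: {centre_growth:.3f} S.D.")
```

On these assumptions the five per cent rise at the extreme stands for more than four times the growth of the same rise at the middle, so equal percentage changes cannot be read as equal growth unless the positions compared are alike.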

While recognizing that the complete curve of mental development is logarithmic in form, Pearson contends that, when measured by Jaederholm's adaptation of the Binet scale, development is adequately represented by a straight line from 6 to 15 years of age (164). As this conclusion is based upon the use, as equivalent units, of years of excess and deficiency at all these ages, the data lack the cogency of a scale of equal physical units.

With the Point Scale it is not known whether the units in different parts of the scale are equivalent. Without assuming that they are equal it is impossible to discover the form of curves of development from the records of children at a series of ages. Yerkes and Wood publish a curve of the increase of intellectual ability based upon point-scale measurements, which resembles in form the hypothetical curves. They say:

“The point-scale method has the merit of indicating directly the rate, or annual increments of intellectual growth. We do not claim for our measurements a high degree of accuracy, especially in the case of the early years of childhood. But even the roughly determined curve of intellectual growth from four to eighteen years, which we present below, has considerable interest for the genetic psychologist and for the psychological examiner. We have ascertained that whether measured by the ratio of the increment of increase, year by year, to the norm for the appropriate year or by the ratio of the extreme range of scores to appropriate year norms, intellectual development rapidly diminishes in rate, at least from the fifth year onward” (169, p. 603).

Waiving the question whether annual increases or the range of measurements relative to the age norms would be satisfactory indications of the change in the rate of growth, it seems to be fairly clear that neither of these criteria would be adequate unless we first knew that the units in which they were measured were equivalent at different portions of the scale. To show that the point scale units are even theoretically equivalent it would seem to be necessary to assume, on the basis of normal distribution of ability, that each unit of the deviation for each age distribution either equaled the same number of scale units or the same proportion of the total distance from lowest to highest ability at each age measured in the point-scale units. The originators of the scale do not seem to have planned it with this in view. Moreover, the difficulty of empirically demonstrating such equivalence of units on a point scale or any form of the Binet scale prevents its use for indicating curves of mental development, however serviceable it may be for other purposes.

The simplest demonstration of the form of the development curves is obtained by applying the same test, scored in equal physical units, to children of different ages. In Figs. 6, 7, and 8 the evidence from tests was assembled for ages 8 to 14 inclusive. It is probable, however, that the form of these development curves, when the unit of measurement was anything but time taken for the same task, has been affected by the difference in the real value of units called by the same name, e.g., giving the opposite of one word is not always equal to giving the opposite of another.

The best developmental curves empirically determined are probably those for the form board presented by Sylvester (191), Wallin (212) and Young (227) since in each of these cases the same test was presented at all ages and the scores were in equal physical units of seconds. It can hardly be supposed, however, that the form board curves alone would be typical of average mental development. To know something about the general curve of mental development we need a combination of a number of mental tests scored on scales of equal units. These may be either equal physical units or units on scales for mental development similar to those of Thorndike and others for measuring educational products, handwriting, arithmetic, spelling, etc.

That either a straight line or a simple curve would represent the development of ability from birth to maturity is very doubtful. When we consider the entire developmental curve from birth nobody doubts that there is a change in the rate of development at the time of the arrest of instinctive changes at adolescence. There are probably fluctuations in the rate before this final arrest. Pintner and Paterson also assume a complex curve of development (44). Whether the fluctuations should be allowed for in the description of the borderline of deficiency is the important question in our study. With measurements of bodily growth we noted that changes in the rate of maturity are accompanied by a skewness of distribution of ability at the ages affected. The same effect may be expected with mental measurements. The percentage method of defining the borderline of deficiency has an advantage when the form of distribution at any age is uncertain (See Chap. XIV, d.). Since the changes in the rate of development are most likely to be important at the prepubertal and adolescent ages the description of the borderline in terms of deviation or quotient may be expected to be most uncertain at this period. Moreover, none of the quantitative definitions of the borderline, except the percentage method, remain equivalent if rates of development of normal and deficient children change relative to each other, a question we shall now consider.

(c) The Question Of Earlier Arrest Of Deficient Children.

It has been assumed by Bobertag (81), Stern (188), Goddard (117) and others that deficient children reach their maturity earlier than normal children. If this were true the curves of mental development for the average and for the deficient children should not be expected to retain their same relative positions after the idiots had begun to show arrested development. Moreover, unless this arrest were compensated by some peculiar form of accelerated growth among those above normal ability, we might expect that the distributions of ability would change in form at the various ages after arrest had begun. A relative increase in the distance of older deficients from the average as compared with younger deficients may be interpreted as meaning either the earlier cessation of growth of the deficients or a change in the relative rates of growth of individuals of different mental capacity. When fully considered the present evidence from the Binet tests fails, I believe, to demonstrate the earlier arrest of the deficients, although it is undoubtedly true that the Binet scale may not be fine enough to measure the improvement of idiots. We shall take up certain investigations that bear upon this point.

Goddard has reported tests upon the same group of 346 inmates in an institution for the feeble-minded who were tested three years in succession (117). The paper suggests that the idiots, as a group, increased less in absolute ability than those of higher mental age. The average gain for 55 idiots who tested I or II mentally was about half a test in the two years. In order to reach our present problem, however, we must know that the idiots, for example, developed relatively less mentally than did those of the higher grades of ability in the imbecile and moron groups of the same life-ages. This question cannot be answered from the paper. It probably cannot be adequately answered from mental age results on account of the irregularity in the value of the year units at different points on the Binet scales.

Bobertag summarizes Chotzen's data obtained by the examination of the children in the Breslau Hilfsschulen with the Binet scale. He believes that the position on an objective scale attained by the average of these retarded children is progressively lower with advancing age relative to the average position attained by normal children, assuming that the quotient for normal children remained constant at each age. The average intelligence quotients of all the children in the special schools (exclusive of those testing III or less) were 0.79 for those 8 years of age, 0.72 for those 9 years, 0.70 at 10, and 0.67 at 11-12 (81, p. 534).

Stern also compiled a table from Chotzen's results which shows this decrease in intelligence quotients with life-age separately for each group of those whom Chotzen by his expert diagnosis regarded as imbeciles, morons, doubtful, and not feeble-minded although attending the special schools (188, p. 80). This table is reproduced here as Table XX. On the surface it suggests that the quotients of the extreme groups are nearer together at the older ages, instead of being farther apart. The objection to this evidence from the Binet scale is that the norms are not equivalent for different ages on the scale used. Since the objective norms on the Binet scale are more difficult to attain at the older ages this variation would tend to make older children show lower quotients than the same children would show at younger ages, so that such tables are quite uncertain in significance.

TABLE XX.
Average Intelligence Quotients of Children of Different Ability. (From Chotzen's Tables X & XI.)
Life-Age Not Feeble-Minded Doubtful Defect Morons Imbeciles
8 0.92 0.84 0.76 0.71
9 0.85 0.81 0.77 0.67
10 (0.80) (0.80) 0.74 0.62
11 (0.73) (0.68) 0.71 (0.64)
12 (0.75) (0.75) (0.73) (0.61)
13 (0.73)

The Jaederholm data with his form of the Binet scale, as treated by Pearson, shows a straight regression line for the backward children which falls below the normal development line on the average four months of mental age for each additional year of life from 7-14 (167). Accepting Pearson's interpretation that a year of excess or deficiency and a year of growth is a constant unit, we find that the deficient group from special classes was falling continually behind the normals with increase of age a relatively greater distance from any rational reference point. Pearson accounts for this change in the distance between the two groups of normal and backward children, as I understand his paper, by supposing that with increase in age more and more normal children become deficient. It would seem that this data would be more easily explained by supposing that the distributions became skewed toward deficiency for the older ages, rather than that the distributions remained normal and became flatter.

The best evidence as to the relative positions of the curves for deficients and those for average ability would be provided by using psychological tests that could be adequately scored in terms of equal physical units for the same task. The position of various lower percentiles relative to the average or to an assumed reference point could then be compared on the same objective scale. I have reviewed studies of this type in discussing skewed distributions in Chap. XIII, A, c. I there reached the conclusion that the weight of the evidence was that the distributions were slightly skewed in the direction of deficiency, although the evidence was not conclusive. We are now raising the further question whether this skewness increases with age.

On account of the difficulty of determining the points for zero ability in terms of the physical scales used, let us see what conclusion might be reached if we calculated the relative distance of median and low ability of equivalent degree from the scores of the same higher degree of ability assumed as a reference point at the various ages. There seems to be no reason in the theory of measurement why the highest score instead of the lowest score in random samples might not be used for a reference point for comparing the distances between normal and deficient children at different ages. Instead of using the highest single score, it would be better to use the upper quartile or quintile since it would be less affected by a chance error in giving the test.

Applying this method to determining the relative position of median and retarded ability I have calculated the data for the form board test cited previously from Sylvester (191) and from Young (227). This affords the only adequate evidence of which I know, derived from tests scored in equal physical units given to sufficiently large groups to indicate whether or not the retarded group changes its relative position from the normal group at different ages. The comparison is shown in Fig. 9. With Sylvester's data the distance of the lower quartile in ability from the median is compared with the distance of the upper quartile from the median, the latter distance being taken as a unit. With Young's data for Witmer's form board the quintile is used instead of the quartile and each sex is given separately. Since Young's table shows the scores for half ages, it was necessary to take the average of the two scores, thus giving the approximate score for the middle of the complete age group. The graph discloses no pronounced tendency for the retarded group to fall relatively farther behind the median with increase in age. There are, however, notable fluctuations in the relative positions of the groups so that at 7 years with Young's data for boys and at 13 years for Sylvester's curve the retarded group is twice as far from the median relative to the distance between the median and the corresponding better group as it is at some other times. It is possible that the curves for the older groups of those of poorer ability are too high since it is likely that more of the actually deficient children tend to be dropped from the public school classes with increase in age. Nevertheless, so far as the evidence at present goes it is not sufficient to determine whether the backward and the corresponding better group show a general change in their relative distances from the median with approach to maturity.

Fig. 9. Relative Positions at Each Age of the Median and of Corresponding Bright and Retarded Children with the Form Board Test.

On the other hand the curves indicate the tendency for the distributions to be skewed toward deficiency and for the relative distances to fluctuate as we should expect if the accelerations in growth occurred at different ages for those of different ability. The data of Young suggest that there may be sex differences in the age of acceleration, the backward girls showing accelerations, relative to the upper group at ages 7 and 12, a year or more before the boys. For Sylvester's data the ratio of the distance between the median and the lower quartile divided by the distance between the median and the upper quartile for each of the age groups is as follows: 5 yrs. 1.8, 6 yrs. 2.4, 7 yrs. 3.0, 8 yrs. 2.0, 9 yrs. 2.2, 10 yrs. 2.4, 11 yrs. 2.0, 12 yrs. 1.8, 13 yrs. 3.0, 14 yrs. 2.1. For Young's data the corresponding ratios are—Boys: 6 yrs. 1.5, 7 yrs. 1.9, 8 yrs. 1.5, 9 yrs. 0.8, 10 yrs. 1.6, 11 yrs. 1.2, 12 yrs. 1.4, 13 yrs. 1.0, 14 yrs. 1.3. Girls: 6 yrs. 1.7, 7 yrs. 1.0, 8 yrs. 1.5, 9 yrs. 0.9, 10 yrs. 1.0, 11 yrs. 1.3, 12 yrs. 0.9, 13 yrs. 1.5, 14 yrs. 1.4. Changes in the rate of growth causing asymmetrical distributions are to be expected throughout the periods of growth. A fundamental skewness toward deficient mental capacity, therefore, would be indicated only if it were found at maturity or at ages when the average rate is decreasing, when the more capable individuals would theoretically approach relatively nearer the deficients if the latter accelerated later.
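The ratios just listed can be examined for any general drift with age by a short calculation. The sketch below is only an illustration under stated assumptions: the three tables are transcribed from the figures given in the preceding paragraph, while the function names and the use of a least-squares slope as a summary are my own choices and are not part of the original analysis.

```python
# Ratios transcribed from the text: distance of the retarded group from the
# median divided by the distance of the corresponding better group.
SYLVESTER = {5: 1.8, 6: 2.4, 7: 3.0, 8: 2.0, 9: 2.2,
             10: 2.4, 11: 2.0, 12: 1.8, 13: 3.0, 14: 2.1}
YOUNG_BOYS = {6: 1.5, 7: 1.9, 8: 1.5, 9: 0.8, 10: 1.6,
              11: 1.2, 12: 1.4, 13: 1.0, 14: 1.3}
YOUNG_GIRLS = {6: 1.7, 7: 1.0, 8: 1.5, 9: 0.9, 10: 1.0,
               11: 1.3, 12: 0.9, 13: 1.5, 14: 1.4}


def skew_ratio(retarded: float, median: float, better: float) -> float:
    """Distance of the retarded group from the median, taking the distance
    of the better group from the median as the unit. Absolute values are
    used so that scores in seconds (larger is poorer) are handled as well."""
    return abs(median - retarded) / abs(better - median)


def slope_per_year(ratios: dict) -> float:
    """Least-squares change in the ratio for each additional year of age."""
    ages = list(ratios)
    mean_age = sum(ages) / len(ages)
    mean_ratio = sum(ratios.values()) / len(ratios)
    covariation = sum((a - mean_age) * (ratios[a] - mean_ratio) for a in ages)
    age_variation = sum((a - mean_age) ** 2 for a in ages)
    return covariation / age_variation


for label, table in [("Sylvester", SYLVESTER),
                     ("Young, boys", YOUNG_BOYS),
                     ("Young, girls", YOUNG_GIRLS)]:
    print(f"{label}: slope per year {slope_per_year(table):+.3f}, "
          f"range {min(table.values())} to {max(table.values())}")
```

A slope near zero would agree with the statement that the graph discloses no pronounced tendency for the retarded group to fall relatively farther behind with age, while the range shows the size of the fluctuations from year to year.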

So far as physical growth is concerned Baldwin (74, 75) has shown with repeated annual measurements on the same group of children that the period of adolescent acceleration shifts from 12½ years for the tallest boy to 16 years for the shortest boy. For the tallest girl the maximum height was attained at 14½, for the shortest at 17 years, 3 months. Maturity may be reached at 11 years by a tall well nourished girl, while with a short girl light in weight it may be delayed until 16. “Children above medium height between the chronological ages of 6-18 grow in stature and in physiological maturity in advance of those below the medium height, and they may be physiologically from one to four or five years older than those below the medium height. Those above the medium height have their characteristic pubescent changes and accelerations earlier than those below; there is a relative shifting of the accelerated period according to the individuals' relative heights” (74).

Doll presents evidence from the physical measurements of a large feeble-minded group in institutions which he suggests shows that the shorter among them cease growing earlier. When the height of these feeble-minded is measured in relation to the Smedley percentiles of the height of normal children of their corresponding ages, he finds a correlation of -.20 between age and percentiles of height, the taller relative to normals being younger. He says: “This confirms Goddard's similar conclusion, but negatives for the feeble-minded at least, the theory affirmed by some writers, that children who grow at a retarded rate continue their growth to a later age” (98, p. 51). On the contrary this minus correlation is more likely to mean only that the Smedley norms on school children are too high for the older ages because of the excess of taller children who remain for the high school work. This would give the minus correlation without supposing that the taller individuals continue their growth to a later age, as he thinks.

Moreover, a total longer period of physical growth for smaller, less normal, children has been demonstrated. Boas says: “Among the poor the period of diminishing growth which precedes adolescence is lengthened and the acceleration of adolescence sets in later; therefore, the whole period of growth is lengthened but the total amount of growth during the larger period is less than during the shorter period of the well-to-do” (80). A reversal in growth tendency between brain capacity and size of body, which is supposed when the mentally deficient are said to arrest earlier, would be one of the most puzzling paradoxes in the study of development. We should, therefore, be exceedingly cautious before accepting the hypothesis of the earlier maturity of deficient children.

A complicated situation is presented when we come to represent graphically the effect on the distributions of these differences in growth among those of different intellectual capacity. In the hypothetical diagrams, Fig. 5, it is shown how arrest of development might be presented graphically in relation to the distribution curves, ability being measured on the same physical scale. The earlier acceleration and earlier maturity of those of better ability are indicated. The distributions are shown as skewed at all ages after birth. Equivalent units of mental development at different ages can be found only in corresponding percentages of the groups, not in the units of the deviation or in development quotients relative to the averages at different ages. In other words the lowest 0.5% continues to be an equivalent unit while -3 S. D. measures different portions of the group and different portions of the distance from lowest to highest ability. Corresponding percentages retain one common significance, namely, that the same proportion of the group is ahead in the struggle for survival, regardless of the form of the distribution.
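The contrast between a fixed percentage and a fixed multiple of the deviation can be made concrete with a minimal sketch. The symmetrical case is the standard normal curve. The skewed case is purely an assumption made for illustration: ability is taken as the negative of an exponential variable with rate 1, which has mean -1, standard deviation 1, and a long tail toward deficiency. Neither distribution is claimed to describe any actual test.

```python
import math
from statistics import NormalDist

SD_MULTIPLE = 3.0  # the borderline placed at -3 S.D. from the mean

# Symmetrical case: share of a normal group lying below -3 S.D.
normal_share = NormalDist().cdf(-SD_MULTIPLE)

# Skewed case (assumed): ability = -X, with X exponential, rate 1.
# Mean is -1 and S.D. is 1, so the borderline falls at -1 - 3 = -4,
# and P(ability < -4) = P(X > 4) = exp(-4).
skewed_share = math.exp(-(1.0 + SD_MULTIPLE))

print(f"below -3 S.D., normal form: {100 * normal_share:.2f}%")
print(f"below -3 S.D., skewed form: {100 * skewed_share:.2f}%")
print("lowest 0.5%, either form:   0.50% by definition")
```

On these assumptions the same multiple of the deviation cuts off more than ten times as large a share of the skewed group as of the normal group, whereas the lowest 0.5% denotes the same proportion in both.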

It is hoped that the discussion of the statistical problems connected with the quantitative study of mental development has given more meaning to the different attempts to devise scales for measuring mental ability. It should be noted that the same relative development at different ages, expressed relative to the distance from lowest to highest ability measured in equal objective units, does not correspond to the same relative development measured in percentages of the groups, as soon as the forms of the distributions change. The theoretical considerations show that we have available at once a perfectly definite and clear method of stating relative development in terms of corresponding percentages of corresponding groups. If the groups distribute normally these units are translatable into units of the standard deviation of the group. If the distributions change in symmetry the only equivalent units of deficiency available are in terms of corresponding percentages reading from either end of the group. On the other hand percentile units are not equivalent in amount of change for the same distribution, so they are of most importance for comparing different age distributions of uncertain forms.
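Where the groups do distribute normally, the translation of a percentage into units of the standard deviation is a matter of reading the normal probability table. A minimal sketch follows, using the Python standard library in place of the printed table; the function name is hypothetical, and the percentages are the lowest 0.5% and the lowest 1.5% (the 0.5% presumably deficient together with the next 1.0% doubtful) used in this monograph.

```python
from statistics import NormalDist


def percent_to_sd(lowest_percent: float) -> float:
    """Multiple of the S.D. below which the given lowest percentage of a
    normally distributed group falls (negative means below the average)."""
    return NormalDist().inv_cdf(lowest_percent / 100.0)


presumably_deficient = percent_to_sd(0.5)  # about -2.58 S.D.
doubtful_upper = percent_to_sd(1.5)        # about -2.17 S.D.

print(f"lowest 0.5% lies below {presumably_deficient:.2f} S.D.")
print(f"lowest 1.5% lies below {doubtful_upper:.2f} S.D.")
```

The translation holds only so long as the distribution is normal; with a change in symmetry the percentages keep their meaning while these multiples do not.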

Until we have a scale of equal objective units for mental ability, it is not possible to obtain a measure of relative development which shall take into account the amount of relative change. We must be content to measure the change in percentile rank (changes in serial position) of an individual relative to those of his own age.

Having clarified our conceptions of mental development and brought them into harmony with certain suppositions regarding the distribution of ability and its change from year to year, we are in a better position to evaluate in the following chapter the different objective methods of defining the borderline of feeble-mindedness.

CHAPTER XIV. QUANTITATIVE DEFINITIONS OF THE BORDERLINE

On the basis of the detailed conception of the developmental curves and distributions of ability at different ages, which we have been considering, we can now compare the percentage method with other quantitative methods of describing the borderline on developmental test scales.

A. Different Forms of Quantitative Definitions

The earliest form of the quantitative description of the borderline on a scale of tests was in terms of a fixed unit of years of retardation. This was taken over apparently from the rough method of selecting school children to be examined for segregation in special classes by choosing those who were two or three grades behind the common position for children of their ages. As this amount of school retardation was greater for older children, an additional year of retardation was required after the child had reached 9 years of age. I believe that nobody would seriously defend a practice of making an abrupt turning point of this kind, except on grounds of practical convenience. The theory of stating the borderline in terms of a fixed absolute unit of retardation is so crude that it has now been generally superseded by methods which make the amount of retardation a function of the age.

In order to relate the definition to the age of the child, at least during the period of growth, Stern suggested the “intelligence quotient,” consisting of the tested age divided by the life-age (188). This has been adopted by Kuhlmann with his revision of the Binet scale (139) and by Terman with the new Stanford scale (197). With the Point scale Yerkes utilized a similar ratio method for stating borderlines by what he calls a “coefficient of intelligence.” He defines it as “the ratio of an individual's point-scale score to the expected score, or norm” (226, p. 595). Haines also uses these coefficients, dividing the individual's score on the Point scale by the average number of points scored by those of his age (26). The difference between the “quotient” and the “coefficient” seems to be mainly empirical since they are theoretically alike in principle provided the scales by which they are determined are composed of equal units. Empirically, however, the units of the point scale would have to be compared with the 0.1 year units of the Binet scale to determine which showed the greater uniformity within its own scale. The coefficient has an advantage over the quotient in that the scale norms for the different ages would automatically become readjusted with additional data, and that physiological age norms could be more readily stated if they were ever available.
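The two ratios may be set side by side in a minimal sketch. The function names and the example figures are hypothetical and serve only to show the arithmetic of the definitions quoted above; they are not norms from the Binet or the Point scale.

```python
def intelligence_quotient(tested_age: float, life_age: float) -> float:
    """Stern's quotient: the tested age divided by the life-age."""
    return tested_age / life_age


def coefficient_of_intelligence(score: float, age_norm: float) -> float:
    """Yerkes' coefficient: the point-scale score divided by the expected
    score, or norm, for those of the same age."""
    return score / age_norm


# Hypothetical child of 10 years testing at 7 years on an age scale.
quotient = intelligence_quotient(tested_age=7.0, life_age=10.0)

# Hypothetical child scoring 45 points where the norm for the age is 60.
coefficient = coefficient_of_intelligence(score=45.0, age_norm=60.0)

print(f"quotient {quotient:.2f}, coefficient {coefficient:.2f}")
```

The two functions have the same form, which reflects the remark that the methods are alike in principle; they differ only in whether the units divided are years of tested age or points of score.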

The suggestion of defining the borderline of tested deficiency in terms of a multiple of the standard deviation of ability of children who are efficient in school was made by Pearson in 1914. Tested inefficients did not with him include all inefficients, as he recognized other sources of deficiency. He had previously suggested a scale of mental ability in units called “mentaces”, 100 of which were equivalent to a unit of the standard deviation of all ability assumed to be normally distributed. On this scale of mentaces the imbeciles were 300 mentaces or more below average ability and would be expected to occur once among 1000 individuals chosen at random. Very dull, including some mentally defective individuals, were also to be found from 208 to 300 mentaces below the average (166, p. 109). Defining the borderline in terms of the deviation of a normal population was definitely forecasted by Norsworthy, although she did not specifically discuss the problem of the borderline. She indicated that if children tested below -5 P.E., they might be regarded as outside the normal group.
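The frequencies implied by the scale of mentaces can be checked against the normal curve. The sketch below assumes, as stated above, that 100 mentaces equal one unit of the standard deviation and that ability is normally distributed; the function name is hypothetical.

```python
from statistics import NormalDist

MENTACES_PER_SD = 100  # as stated in the text


def share_below(mentaces_below_average: float) -> float:
    """Share of a normally distributed population falling more than the
    given number of mentaces below average ability."""
    return NormalDist().cdf(-mentaces_below_average / MENTACES_PER_SD)


imbecile_share = share_below(300)
very_dull_share = share_below(208) - share_below(300)

print(f"300 mentaces or more below: {1000 * imbecile_share:.1f} per 1000")
print(f"208 to 300 mentaces below:  {1000 * very_dull_share:.1f} per 1000")
```

The first figure comes out between one and two per 1000, which is in rough agreement with the statement that imbeciles would be expected about once among 1000 individuals chosen at random.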

The following quotation from Pearson will make the method of stating the borderline in terms of a multiple of the deviation clearer:

“Now the question is, what we mean by a 'special or differentiated race': I should define it to mean that we could not obtain it by any selection from the large mass of the normal material. Now in the case of the mentally defective, we could easily obtain children of their height, weight, and temperature among the normals. We could, out of 50,000 normal children, obtain children practically with the same powers of perception and memory as the feeble-minded, as judged by Norsworthy's data. But not out of 50,000, nor out of 100,000 normal children, could we obtain children with the same defect of intelligence as some 50% of the feeble-minded children. In other words, when the deviation of a so-called feeble-minded child from the average intelligence of a normal-minded child is six times the quartile or probable deviation of the group of normal children of the same age, it falls practically outside the risk of being an extreme variation of the normal population. Now six times the quartile variation is almost exactly four times the standard deviation or the variability in intelligence of the normal child, and in the next material I am going to discuss [Jaederholm's], we have shown that the standard deviation in intelligence of the normal child is just about one year of mental growth” (164, p. 35).
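Pearson's statement that six times the quartile or probable deviation is almost exactly four times the standard deviation can be verified from the normal curve. The following is a minimal sketch under the assumption of normal distribution; the variable names are my own.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, S.D. 1

# The probable (quartile) deviation is the distance from the median to the
# upper quartile, expressed here in units of the S.D.
probable_deviation = unit_normal.inv_cdf(0.75)

six_probable_deviations = 6 * probable_deviation

# Share of a normal population falling farther than this below the average.
share_beyond = unit_normal.cdf(-six_probable_deviations)

print(f"probable deviation = {probable_deviation:.4f} S.D.")
print(f"six times this     = {six_probable_deviations:.3f} S.D.")
print(f"expected about once among {1 / share_beyond:,.0f} individuals")
```

The computed multiple is about 4.05, so that the two expressions of the borderline differ only in the second decimal place.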

With the Jaederholm data obtained in testing children in the regular and in the special classes in Stockholm by a modified form of the Binet scale, Pearson found that a year of excess or defect in intelligence was practically a uniform unit from 7 to 12 years of age and was about equivalent to the standard deviation of normal children measured in these year units. He, therefore, uses a year unit and the standard deviation as interchangeable for these data. He does not, however, always make it clear whether he means that the equivalence of the year units is determined by the standard deviation of the children of all these ages grouped together in one distribution, as it is in determining the regression lines, or by the equivalence of the standard deviations of the separate ages, especially when these two deviations are not equal in terms of the year units on the scale. I shall assume, however, that he would use the deviations of the separate years in case of such an inequality of the two concepts.

The quotation from Pearson, which we have given above, indicates that he would determine the borderline on the scale by the standard deviation of 'normal' children. In his case he actually used children who were efficient in school, as contrasted with those in special classes. On the other hand, he argues at length that all mental ability, including that of the social inefficients, is distributed in the form of the normal curve (167). Under this assumption it is, therefore, little theoretical change in his position to suppose that the borderline might be described in terms of the standard deviation of a random sample of the population. Defining the borderline in terms of a multiple of the deviation of a random sample at each age thus becomes directly comparable with the other forms of the quantitative definition, supposing that all refer to conditions to be found in a completely random sample. It is in this sense that I shall refer to the method of defining the borderline in terms of a multiple of the deviation.

The percentage method of defining the borderline seems to have been the spontaneous natural working out of the problem in the minds of several investigators. At the same time that I suggested this method in a paper before the American Psychological Association (151) Pintner and Paterson had prepared a paper suggesting a percentage definition of feeble-mindedness (44) and Terman had worked out his use of the quotient so that the borderline in terms of the quotient was given equivalent form in terms of percentage. Nobody, however, seems to have attempted to work out the details of the method as in the present monograph.

As a point of detail it is to be remembered that in translating percentages into terms of the deviation, the size of the group for which the percentages are determined is important if the groups are small, since the same percentage lies above slightly different multiples of the standard deviation with different sized groups. On this point the reader may see a paper by Cajori and the references cited there (86).

B. Common Characteristics of Quantitative Definitions

In distinction from qualitative methods of describing the mentally deficient, all quantitative definitions assume that those of deficient mentality do not represent a different species of mind; but that they are only the extreme representatives of a condition of mental ability which grades up gradually to medium ability. The deficient are not an anomalous group such as we find with some mental diseases. Except for the comparatively rare cases of traumatic or febrile origin, the deficient individual is a healthy individual so far as his nervous system is concerned, even though his capacity for brain activity is below that of those who socially survive. They are not as a group abnormal in the sense of diseased, but only unusual in the sense of being extreme variations from medium ability in a distribution which is uninterrupted in continuity. This distinction has been fully discussed by Goring in his work on The English Convict, which those who are interested in a full mathematical discussion of the significance of mental deficiency are urged to read.

Schmidt urges that the deficients are qualitatively different in being “unable to plan”, and then suggests tests which most markedly bring out this distinction between deficient and normal children (178). As I have said before, however, this seems rather to be a failure to recognize that such an attempt to find tests which “qualitatively” distinguish the two groups is only an effort to pick those tests which best make measurable the differences between individuals at the extreme of mental ability. As such it is a valuable contribution to this problem. If it is intended as an attempt to set up a qualitative distinction in a mathematical or biological sense, between deficient and passable ability, it seems to me wholly to fail. As I take it, a “qualitative” distinction with Schmidt is only a bigger quantitative distinction and is intended only to mean this.

None of those who advocate quantitative definitions would contend, I believe, as some of their opponents seem to think, that such definitions afford a final diagnosis for particular cases. In attempting to place the borderlines on a scale of tests, this is always done with the clear recognition that such borders are only symptomatic of deficiency. The diagnosis of “social inefficiency,” to use Pearson's term, rests upon many facts among which the test result is only one, albeit the most important.

Other characteristics which each of the above quantitative definitions, except that of a constant absolute amount of deficiency, have in common, or might easily have if they were stated in their best forms, include the possibility of adaptation to any developmental scale, the suggestion of borderlines for both the mature and immature, the distinction of a group which might be regarded as presumably deficient from one that was of better but doubtful ability and of this from a still better group which was presumably socially efficient.

Perhaps the most curious and important thing about these definitions is that they are all substantially identical, except in their terminology so long as general mental capacity is found to distribute in the form of the normal probability curve and to extend to absolute zero ability at each age. This can easily be seen by comparing the distribution curves in Fig. 3. The position of the percentage borderline would always represent the same distance from the average in terms of the standard deviation of each age and the same ratio when the life-age of arrest of development had been determined as the largest divisor. Under these conditions, therefore, these main statements of the quantitative definition agree in supposing that the same proportion of the individuals of each life-age would test deficient. Those who advocate any of these quantitative definitions logically commit themselves to assuming that the percentage of deficients at each age is practically constant, unless they suppose the symmetry of distribution varies or does not extend to the same zero point.
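The substantial identity of the definitions under the simplest hypothesis can be shown numerically. The sketch below rests on assumptions of my own for illustration: the age averages are hypothetical, and zero ability is taken to lie the same number of standard deviations below the average at every age, which is one way of reading the condition that the normal curve extends to zero ability at each age.

```python
from statistics import NormalDist

ZERO_AT_SD = 8.0     # assumed: zero ability lies 8 S.D. below each average
LOWEST_SHARE = 0.005  # the lowest 0.5% of each age group
MEAN_BY_AGE = {6: 40.0, 10: 70.0, 14: 100.0}  # hypothetical age averages


def borderline_descriptions(mean: float) -> dict:
    """State the percentage borderline for one age in three ways."""
    sd = mean / ZERO_AT_SD
    score = NormalDist(mean, sd).inv_cdf(LOWEST_SHARE)
    return {
        "score": score,                      # position on the objective scale
        "sd_multiple": (score - mean) / sd,  # deviation definition
        "ratio": score / mean,               # quotient or coefficient
    }


descriptions = {age: borderline_descriptions(mean)
                for age, mean in MEAN_BY_AGE.items()}

for age, d in descriptions.items():
    print(f"age {age}: score {d['score']:.1f}, "
          f"{d['sd_multiple']:.2f} S.D., ratio {d['ratio']:.3f}")
```

Although the borderline score differs at each age, the multiple of the deviation and the ratio to the average come out the same at all three ages, so that under these assumptions the three statements of the borderline exclude the same individuals.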

If the distributions do not extend to the same zero points of lowest ability on an objective scale (see Fig. 5), the ratio is clearly at a disadvantage compared with either of the other methods, since it assumes that the same percentage of average ability is an equivalent measure. This does not hold when the lowest ability at different ages is not at the same point on the scale of objective units. For example, .7 of an average of 100 units above 0 is not equivalent to .7 of an average of 150 points above a zero ability of 30 points on the objective scale. The idea of regarding percentages of averages as equivalent is therefore generally avoided in mental measurement. In case the position of the absolute zero points of ability may be different, the distance from the average should be stated in terms of the deviation. In this respect the methods of the deviation and of the lowest percentage are equally good so long as the form of distribution does not change.
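The arithmetic of this example may be worked out as follows. The function name and the idea of expressing each borderline as a share of the distance from zero ability to the average are illustrative choices, not part of the original argument.

```python
def share_of_range(ratio, average, zero):
    """Share of the zero-to-average distance at which `ratio * average` falls."""
    borderline = ratio * average
    return (borderline - zero) / (average - zero)

first = share_of_range(0.7, average=100, zero=0)    # zero ability at 0 on the scale
second = share_of_range(0.7, average=150, zero=30)  # zero ability at 30 on the scale
print(round(first, 3), round(second, 3))  # 0.7 0.625
```

The same ratio of .7 thus marks 70 per cent of the way from zero to the average in the first case but only 62.5 per cent in the second, so the two borderlines are not equivalent.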

1. With the percentages fixed at the lowest 0.5% as presumably deficient and the next 1.0% doubtful, these borderlines for tested deficiency have the advantage of being more conservative than those at present advocated. On the basis of our empirical knowledge this is an important reason for urging borderlines on the scales at least as low as those suggested herein. Disregarding the extremely high borderlines which have fallen into disuse, we still find that social deficiency is often presumed for those testing above the lowest 1%. With the new Stanford scale, Terman presumes “definite feeble-mindedness” below an Intelligence Quotient of .70, below which he finds that 1% of 1000 unselected children fell. I Q's from .70 to .80 would include his uncertain group, which he describes as “border-line deficiency, sometimes classified as dullness, often as feeble-mindedness” (57, p. 79). His tables show 5% below an I Q of .78. We have no results with a random group of adults by which to judge how many would be below these borders. When the I Q has been applied to scores with other scales a larger percentage has often been found to be excluded. Fernald has shown that Haines' suggestion of a coefficient of .75 with the Point scale would exclude 16% of 100 Cincinnati girls selected at random from among those who left school at 14 years to go to work (16).

Unless the examiner wishes to assume that social inefficiency is more frequent than it has been demonstrated by the practical tests of life, the success of those who have low quotients should make him exceedingly cautious about accepting the various borderlines which have been suggested by those who have not tested their criteria by the percentage method. It is not merely that the borderlines should be lowered, but that they should be lowered under some consistent plan so that we should know as much as is possible about their significance in the prediction of ultimate social inefficiency, and that we should be able to readjust them on the basis of new data or to new scales.

With the Point scale Yerkes and Wood say regarding “the coefficient of intelligence .70, which we accept as the upper limit of intellectual inadequacy or inferiority”: “Our data indicate that grades of intellectual ability measured by the coefficient .70 or less are socially burdensome, ineffective, and usually a menace to racial welfare” (226). With the most reliable part of their data, that for children from 8-13, this coefficient excludes the lowest 8.39%. Moreover, the lowest group for which they suggest a borderline, the dependents, falls at .50 or below and includes 1.05%.

2. A second practical advantage of the percentage borderlines on the scale is that they make no assumption as to the uniformity of the norms for the different ages. Except for the Stanford and the Jaederholm scales, there is little evidence that the age norms exclude equivalent portions of the children at the different life ages.

Goddard's Table I gives the data from which the following percentages of those who pass the norm are calculated, not counting those above 11 years, since the older groups are clearly affected by selection:—5 yrs., 88%; 6 yrs., 79%; 7 yrs., 81%; 8 yrs., 51%; 9 yrs., 60%; 10 yrs., 73%; 11 yrs., 44%. Kuhlmann's figures when using his own revised scale with public school children including the seventh grade, are:—6 yrs., 100%; 7 yrs., 95%; 8 yrs., 90%; 9 yrs., 87%; 10 yrs., 81%; 11 yrs., 80%; 12 yrs., 57%. It is clear that any change in the test norm from age to age must disturb the quotient which is based on these norms, although it would not affect the intelligence coefficient with the Point scale.

3. A third advantage of the percentage method arises from the fact that we cannot presume that the same ratio in terms of the scale units will exclude the same degrees of ability at different ages even when the norms for these ages are properly adjusted. The earlier results with the Stanford revision show a large variation as to the percentage excluded by the same I Q at different ages. For example, an I Q of .76 would have shut out 1% of 117 non-selected 6-year-olds, 2% of 113 9-year-olds and 7% of 98 13-year-olds. The lowest 1% of the last group was below a borderline of .66 (197).

With the widely varying norms of the other scales, the I Q borderlines show much greater variation. In a recent review of the evidence, including Descoudres' report (96) on retesting the same children for several years, Stern recognizes that an I Q index is not constant after 12 years (187). Doll records decided changes in quotients for the same individual at different ages (99). So far as the 1908 scale is concerned, using Goddard's data, our Table V shows that at five years of age the lowest 1.8% would fall at or below a quotient of .40, at eight years the lowest 1.9% would show a quotient of .62 or less, and at 15 years the lowest 2.8% fall below a quotient of .75. The rough tentative approximation of scale limits which I have suggested for the lowest 1.5% shows that a series of quotients for children from 5 to 15 years of age would be below .75 at every age and below .65 for half of these ages. For the presumably deficient group the quotients would be still lower in order to be as conservative as the borderlines that I have suggested with the Binet scale as at present standardized.

With the coefficient of intelligence and the Point scale, the Yerkes and Wood data show that their borderline of .70 excluded 13% of 196 children 8 and 9 years of age, while it excluded only 5% of each of the next two groups of double ages. With the group of 237 18-year-old Cincinnati working girls it excluded only 3% (226).

The data at present available thus indicate that we should not expect to find the same ratio at different ages excluding similar percentages. If the ratios have a value for comparing individuals of different ages, they seem to fluctuate so decidedly from age to age that they can hardly be trusted for stating the borderlines of deficiency without empirical confirmation for each age.

Pearson found that the children of the older ages in the special classes were more and more deficient, measured in terms of the standard deviation of the normal group. This shift on the average was four months of mental age downward for each year of life during the period 7-14 which he studied. It makes uncertain the definition of the borderline in terms of a constant multiple of the deviation or of a constant quotient, unless this shift is shown to be due to imperfections of the tests which can be corrected, or to changes in the selection of the tested groups at advanced ages.

Pearson's suggestion of -4 S. D. as a borderline with the Jaederholm data gives some very curious results with the group of children in the special schools at Stockholm. Under his interpretation at life-ages 8-11 from 0 to 5.2% of the pupils in these classes would be regarded as deficient, while for life-ages 12-14, 15.2% to 44.4% are beyond -4 S. D. In passing it is to be noted that if one accepted Pearson's suggestion that the borderline should be fixed at -4 S. D., in case the distribution of mental capacity were strictly normal, only about three children in 100,000 would be found deficient, according to the probability tables.
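The order of magnitude implied by a borderline at -4 S. D. on a strictly normal curve can be checked directly. This is a small sketch using the Python standard library in place of printed probability tables.

```python
from statistics import NormalDist

tail = NormalDist().cdf(-4.0)   # proportion of a normal distribution below -4 S.D.
per_100k = tail * 100_000
print(f"{per_100k:.1f} per 100,000")  # 3.2 per 100,000
```

A borderline so placed would therefore shut out only a few individuals in 100,000 of a normally distributed population, far fewer than the proportions found beyond it in the special classes.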

With the method of the standard deviation it would be necessary either to show that the deviation was constant in terms of the year units or else to restate the borderline for different ages in terms of the scale units. The irregularity of the norms with the Binet scale could also be allowed for, of course, by stating different quotients for the different ages, but when this readjustment is required for either the ratio or the deviation in terms of the scale units, these methods lose all their advantage of simplicity. Instead of one ratio or one multiple of the years of deviation, we might have a different statement for each life-age. With the percentage method there would be only one statement of the borderline for all ages in terms of percentage, although the scale positions change which shut out the same lowest percentage.
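How a single percentage statement translates into different scale positions and different quotients may be sketched as follows. The samples are not empirical: they are evenly spaced quantiles of normal curves whose averages and deviations are invented, and the helper names are hypothetical.

```python
from statistics import NormalDist

def quantile_sample(mean, sd, n=1000):
    """Evenly spaced quantiles of a normal curve, standing in for a random sample."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def percentage_borderline(scores, lowest=0.015):
    """Score below which the lowest `lowest` fraction of the sample falls."""
    ordered = sorted(scores)
    cut = round(len(ordered) * lowest)   # number of cases shut out
    return ordered[cut]                  # first score that is NOT shut out

# Hypothetical mental-age samples: (average, S.D.) in years for each life-age.
AGES = {6: (6.0, 0.8), 10: (10.0, 1.3), 14: (14.0, 2.2)}

table = {}
for age, (mean, sd) in AGES.items():
    border = percentage_borderline(quantile_sample(mean, sd))
    table[age] = (border, border / age)
    print(f"life-age {age}: borderline {border:.2f} yrs, quotient {border / age:.2f}")
```

The one statement, the lowest 1.5%, serves for every age, while the scale position and the corresponding quotient have to be read off separately for each age group.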

4. All the quotient methods of defining the borderline encounter a serious practical difficulty in fixing the borderline for the mature, so that it will be equivalent to that for the immature. In calculating the quotient for adults with the Stanford scale, no divisor over 16 years is used. Yerkes and Bridges also think that this is about the time that the development of capacity ceases. Kuhlmann and others use 15 as the highest divisor. Wallin objects to either of these ages being used as the age of arrest of mental development (15, p. 67). Both the methods of the standard deviation and percentage have a similar difficulty, in that the borderline for the mature has to be empirically determined on a test scale. In this dilemma, however, the data collected with the random group of 15-year-olds in Minneapolis and published in the present study place the borderline for the mature on either the 1908 or 1911 Binet scale in a much safer position, so far as empirical data are concerned, than the borderline for the mature for any other scale. This is true whether that borderline be then stated in terms of either the quotient or percentage methods. Translated into terms of the quotient, our percentage borderlines for the mature with these scales, below X for presumably deficient and below XI for the uncertain, would amount to quotients .60 and .66 on the basis of our findings with this random group of children who have presumably about reached adult development. Pearson does not attempt to define any borderline for the adults on the basis of the deviation, since Jaederholm tested only children. Moreover, this is not possible empirically with our group of 15-year-olds, since we tested only the lower extreme of this group.
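One reading that reproduces these two quotients is sketched below. It is offered as an assumption rather than as the computation actually used: test ages are taken in whole years, so that testing below X means IX at most and testing below XI means X at most, and the divisor is the life-age 15 of the random group.

```python
HIGHEST_DIVISOR = 15  # life-age of the random group; assumed here as the divisor

def quotient(test_age, divisor=HIGHEST_DIVISOR):
    """Ratio of test age to the highest life-age used as divisor."""
    return test_age / divisor

presumably_deficient = quotient(9)   # highest whole test age below X (assumed reading)
uncertain = quotient(10)             # highest whole test age below XI (assumed reading)
print(f"{presumably_deficient:.2f} {uncertain:.3f}")  # 0.60 0.667
```

On this reading the second figure is two-thirds, which agrees with the stated .66 if the last decimal is dropped rather than rounded. A divisor of 16 in place of 15 would lower both quotients, which illustrates how much the translation depends on the age chosen.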

Unfortunately, the borderlines of the mature for the Stanford and other scales depend upon empirical results obtained not with random groups, but upon a composite of selected groups of adults built up by the investigator on an estimate that this combined group represents a random selection among those with a typical advance in development, an almost superhuman task. Fortunately the empirical determination of this borderline for the mature might be improved later by obtaining data on less selected groups. The clearer significance of the empirical data for the borderline for the mature which I have presented for the Binet 1908 and 1911 scales from a random group of 15-year-olds seems to be an important practical advantage. It provides an empirical basis for judging the implication of test results with adults. It gives adults the benefit of the doubt if they improve after 15 years of age.

5. Compared as to their popular significance, there is no doubt that the lowest 0.5% of the individuals of a particular age has very much more significance to those not familiar with detailed statistical practise than a coefficient or a multiple of the standard deviation. A statement that an adult has only the tested ability of a child of 7 years is certainly much more impressive than his score in other quantitative terms. It will probably always be desirable, therefore, to supplement any other method of scoring by a statement of the individual's test age.

D. Theoretical Advantage of the Percentage Method with Changes in the Form of the Distributions

With our present series of tests, the percentage method will best provide a concept of the equivalence of the borderlines at different ages if the form of the distribution does not remain uniform. I discussed this question briefly in connection with units of measurement. In considering curves of development, I assembled some of the evidence which makes the assumption of normal distribution, or even of a constant skewness, at least uncertain. In my opinion the weight of the evidence is against the hypothesis that the distributions retain a constant form during the period of development. If a change of form were clearly demonstrated, both the ratio and the deviation methods would fail to express equivalent borderlines for the different ages with the Binet scales. A fixed multiple of the standard deviation or a fixed quotient would exclude different percentages of the population at each age when the skewness varied.

By reference to Figures 3 and 5, it can be seen that, if the physical units in which we expressed the measurement were uniform and ability always extended to the same absolute zero point, .01 of the physical units reached by the best at each age would be the same relative amount of ability of the best at each age, stated in physical units, regardless of the form of the distributions. Such a concept, however, has an unknown biological or social significance so far as I can see, except for a constant form of distribution. The same relative physical score compared with the highest at each age theoretically might exclude the lowest 40% of one age group, for example, and only 10% of another group, provided the distribution varied enough in form. The concept of the same relative amount of ability measured in physical units, so soon as the form of distribution varies from age to age, thus loses significance in terms of the struggle for existence. In that struggle, a vital question is: do the individuals at different ages have to struggle to overcome the same relative number of opponents of better ability at their age? If they do, the individuals might properly be regarded as in equivalent positions in the struggle for social survival, disregarding how far the next better individual is above them on the objective scale. This is the concept accepted by the percentage definition of the borderline as the best available under uncertain forms of distribution.
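The theoretical case of 40% against 10% can be made concrete with two invented groups. The scores below are hypothetical, chosen only so that both groups run from zero to the same best score while differing in form, and the function name is illustrative.

```python
def share_below_relative_score(scores, fraction_of_best):
    """Fraction of the group scoring below `fraction_of_best` times the best score."""
    cutoff = fraction_of_best * max(scores)
    return sum(score < cutoff for score in scores) / len(scores)

# Two invented age groups on one objective scale, both with a best score of 100.
group_a = [10, 20, 30, 40, 60, 70, 80, 90, 95, 100]   # many low scores
group_b = [45, 55, 60, 65, 70, 75, 80, 85, 90, 100]   # few low scores

print(share_below_relative_score(group_a, 0.5))  # 0.4
print(share_below_relative_score(group_b, 0.5))  # 0.1
```

The same relative physical score, half of the best, leaves an individual at the borderline with 60 per cent of his own age group above him in the one case and 90 per cent in the other.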

The recent rapid perfection of objective scales to measure educational products, like ability in handwriting, etc., in equal units running to an absolute zero of ability, suggests that it might be possible ultimately to state the borderline of deficiency in terms of the same relative objective distance between the best and zero ability at each age on a scale of general ability. This ideal could be approached, for example, with the Sylvester form-board test in which the units are seconds required to complete the same task, if we could agree upon a maximum number of seconds without success which should mean no ability, and if this zero should remain the same at each age. It would only be necessary to take, for example, the best position or the median or the upper quartile at each age as the other point of reference. We could then say that a borderline in physical units was always, for example, .01 of the median record at each age above zero. Such a method would provide relatively equal objective borderlines at each age and it would afford a measure which would take into account the ability of the individuals to be competed against instead of merely counting them as the percentage method must. It would be better than a description in units of the standard deviation in that its significance would be more easily understood if the form of distribution varied with age.

To demonstrate its worth, however, this method of defining the borderline in terms of the same proportion of the physical difference between zero and the median at each age, would also have to provide a better prediction of ultimate social failure. It would have to be shown that individuals below the relative objective borderline at maturity were below the same relative objective borderline during immaturity. Moreover, it would have to be shown that this relationship was closer than it would be with percentile records. It is a form of this relative objective measurement which Otis advocates in his “absolute intelligence quotient,” which he proposes as logically the best measure of ability. It consists of the ratio of the score of the individual measured in equal absolute units of intelligence, divided by his age (163).

While a relative objective borderline might under certain circumstances afford a better criterion than the same lowest percentage of individuals, there are two very serious practical difficulties which at present make it impossible. In the first place, with the exception of a few motor tests, there are no test results with children of different ages measured in terms of equal objective units for the same task. Even if the Binet year units are equal, as applied to the same task, there is no accurate means of dividing the year units into smaller physical units on the basis of scores with the tests. This makes the use of the Binet scale impossible and we should be forced back upon such tests as the form-board, the ergograph, etc., for which we should have to agree upon an absolute zero of ability. Moreover, mental tests do not lend themselves to measurement in terms merely of rapidity in doing the same task or in terms of other equal physical units since the quality of the work also has to be evaluated and this is usually done in units assumed arbitrarily to measure equivalent degrees of perfection.

The second practical difficulty which at present makes a relative objective borderline impossible is that we know nothing as to the prediction of social failure and success from relative positions on the objective scale used even with the few isolated tests that might be made available. Until we have data on this question, as well as scales of tests for native ability that are measurable to zero ability in objective terms, the percentage method affords the only available way of stating equivalent borderlines when the form of distribution changes.

If the age of arrest of development shifts either earlier or later with different degrees of capacity, then there seems to be no logical escape from a change in the form of distribution. Stern recognized this when he concluded that idiots reach an arrest of development earlier than those better endowed, so he stated that his quotient would not hold for them. He said:

“The feeble-minded child, it must be remembered, not only has a slower rate of development than the normal child, but also reaches a stage of arrest at an age when the normal child's intelligence is still pushing forward in its development. At this time, then, the cleft between the two will be markedly widened.

“From this consideration it follows that the mental quotient can hold good as an index of feeble-mindedness only during that period when the development of the feeble-minded individual is still in progress. It is for this reason that there is no use in calculating the quotient for idiots, because, in their case the stage of arrested development has been entered upon long before the ages at which they are being subjected to examination” (188).

Perhaps the most interesting characteristic of the percentage method is that it automatically adjusts itself to any form of distribution. In case the distributions of ability turn out to be normal for each age and the arrests of development for different degrees of ability distribute alike, then the borderline fixed by the percentage method becomes identical with the corresponding borderlines by the quotient, deviation, or relative objective distance. It can be directly translated into a quotient or a multiple of the standard deviation. This fact affords a good check upon the empirical borderlines fixed by the percentage method for different ages. If the distribution is normal, the lowest 1.5% and 0.5% would be identical with -2.17 S. D. and -2.575 S. D. in samples of 10,000 cases. We may check these percentage borderlines by Goddard's results for ages 5-11 tested with the 1908 Binet scale. I have given the standard deviation for the ages 5-11 with this data in Chap. XIII a, 2. Applying the criterion of 2.575 S. D. to these deviations, we find that to be in the lowest 0.5%, if the distribution were normal, would be about a year less of deficiency than we have suggested, while Pearson's borderline of -4 S. D. would be close to that we suggest. The empirical data thus suggest that the assumption of a normal distribution is faulty at the borderline or else Goddard's data is incorrect for fixing the limits on the scales. I have already given the evidence for supposing that the distribution is skewed during the years of growth.
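The translation of the two percentages into multiples of the standard deviation under a normal distribution can be verified with a short sketch, again using the Python standard library in place of printed tables. The dictionary name is an illustrative choice.

```python
from statistics import NormalDist

standard = NormalDist()  # normal curve with average 0 and S.D. 1
cuts = {lowest: standard.inv_cdf(lowest) for lowest in (0.015, 0.005)}
for lowest, z in cuts.items():
    print(f"lowest {lowest:.1%}: {z:.3f} S.D.")
```

The values agree with -2.17 S. D. and -2.575 S. D. to within rounding. Multiplying either by the standard deviation found for a given age would express the same borderline in scale units, on the assumption that the distribution at that age is in fact normal.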

When approximately random samples are not available, a multiple of the deviation of an efficient group, such as -4 S. D. at the particular age, seems to afford a practical way of discovering a tentative borderline until a random sample can be measured. The serious theoretical objections to such a procedure as a regular method are that the efficient group would be selected by the subjective standard of somebody's opinion and that the form of distribution of ability may vary from age to age.

Recalling the practical advantages of the percentage method which we enumerated in the preceding section, we can now better understand the value of a method that is not disturbed by the form of distribution of mental capacity which may ultimately be found to prevail at different ages. It is safer at present to assume that the distributions do change enough in form at the lower end seriously to affect the borderlines of deficiency as defined by other methods. If, however, the form of distribution remains uniform, it would first be necessary for those advocating the use of any of the other quantitative definitions to show that the units of their scales are equal under some reasonable hypothesis. A ratio or a deviation statable only in scale units which are not demonstrably equal is a hazard, with the chances badly weighted against its reliability. So far as both the Binet and the Point scales are concerned we have found that the units are not equal. A quotient or coefficient arrived at by assuming their equality is sure to mean seriously erroneous fluctuations in the borderlines.

Referring to the percentage method, Yerkes and Wood say: “Frequency of occurrence is unquestionably a useful datum, which should be presented, if not instead of, then in addition to, certain other statistical indices which possess greater scientific value” (226). These other indices require both equal scale units and uniform distributions from age to age. The ratio and deviation methods fail at present in both of these particulars, so that it seems necessary to depend upon the percentage definition of tested deficiency, incomplete as that may be.

This leaves us in the unfortunate situation that the borderline positions on the scale will have to be stated separately for each age and will have to be found empirically. Moreover, we shall need to determine more accurately in what lowest percentage an individual must test in order reasonably to predict that he will require social care for the good of himself and society.

As soon as anybody can discover a means of defining the borderline, which is equally accurate and significant, and which, in addition to counting the proportion of better individuals to be met in the competition of life, will also evaluate the distance they are above the borderline, we all shall be eager to accept this better criterion of deficiency. A form which it might take is that of relative objective distance between zero and median ability. If measurable in equal objective units, this would be independent of the form of distribution and would improve the quantitative description of equivalent deficiency, provided that it also forecasted future social failure as well as the percentage method.

Whatever form of stating the borderline of tested deficiency may ultimately meet with approval, a verbal definition of feeble-mindedness will never become an ideal scientific statement until it finds expression in quantitative terms.
