Three Systems for Grading

Grading on the Highest Score

This system uses the highest score as the standard for everyone. It has bizarre properties and should not be used. If any instructor tells you they plan to use this method, please direct them to read this page.

To use this system, divide each score by the maximum and then multiply by 100, as follows:

T(i) = 100*X(i)/X_MAX

where X(i) is the score on the test for student i and T(i) is the transformed score for the same person.

Next, use the following standards:

If T(i) >= 90, assign A
If T(i) >= 80, assign B
If T(i) >= 70, assign C
If T(i) >= 60, assign D
If T(i) < 60, assign F
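The transformation and cutoffs above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample scores are hypothetical, and the rules are assumed to be applied from the top down, so the first rule that matches decides the grade.

```python
def transformed_score(score, max_score):
    """Return T = 100 * X / X_MAX."""
    return 100.0 * score / max_score


def letter_grade(t):
    """Apply the cutoffs in order; the first matching rule wins (an assumption)."""
    if t >= 90:
        return "A"
    if t >= 80:
        return "B"
    if t >= 70:
        return "C"
    if t >= 60:
        return "D"
    return "F"


def grade_on_highest_score(scores):
    """Map each raw score to a (T, letter) pair using the class maximum."""
    x_max = max(scores)
    results = []
    for x in scores:
        t = transformed_score(x, x_max)
        results.append((t, letter_grade(t)))
    return results


# Hypothetical raw scores, chosen only for illustration.
scores = [90, 72, 50, 10]
for x, (t, grade) in zip(scores, grade_on_highest_score(scores)):
    print(f"raw {x:3d} -> T = {t:6.2f} -> {grade}")
```

Note that the top scorer always receives T = 100, whatever the raw score was.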

This system can easily give Fs to half the class or more. For example, suppose that a test has a median of 50 and a standard deviation of 15. Suppose the highest score is 90, and the minimum is 10. Because the median is 50, about half of the students scored 50 or below. The transformed value of 50 is T = 100*50/90, which is 55.56, below the cutoff of 60. In fact, every raw score below 54 transforms to less than 60, so at least half of the class will receive the grade of F.

Under this system, every grade depends on just one person, so if that person should drop the class, or be murdered by the other students, then all other grades would increase. And if the best student should transfer into a section, all other grades will go down. No wonder that students often do not want to admit that they know the right answer: if the instructor uses this method, that student is hurting everyone else by doing well.
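The dependence on a single person can be illustrated with a small sketch. The class scores below are hypothetical and were picked so that each remaining student crosses a cutoff when the top scorer leaves. In general every transformed score rises, but a letter grade changes only when a cutoff is crossed.

```python
def letters_by_highest_score(scores):
    """Return the letter grade for each raw score, scaled by the class maximum."""
    x_max = max(scores)
    letters = []
    for x in scores:
        t = 100.0 * x / x_max
        if t >= 90:
            letters.append("A")
        elif t >= 80:
            letters.append("B")
        elif t >= 70:
            letters.append("C")
        elif t >= 60:
            letters.append("D")
        else:
            letters.append("F")
    return letters


# Hypothetical section; the first entry is the top student.
section = [95, 75, 66, 56, 47]

with_top = letters_by_highest_score(section)
without_top = letters_by_highest_score(section[1:])  # top student drops the class

print("with top student:   ", dict(zip(section, with_top)))
print("without top student:", dict(zip(section[1:], without_top)))
```

Nobody learned anything new between the two runs; only the maximum changed.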

There is an incentive to harass or bully the best student, because that would improve all other grades. In sports, would a team want to get rid of its best player? Would soldiers want to get rid of the best marksman from their squad? In many real-life situations, we want the best person to do well, but this system encourages students to do everything possible to reduce the score of the best student.

Grades under this system are NOT invariant under simple changes in the exam. For example, the grade depends on whether the instructor uses two alternative forced choice or five alternative forced choice problems. To see this, let us assume the simple "know or guess" model. In this model, if a person knows the answer, they will get it correct, and if they do NOT know the answer, they guess and have a probability of 1/n of guessing correctly, where n is the number of alternatives.

For example, if the top student knows 90% of the material on a 100-item true/false test, that student would get 90 items right (what she knows) plus half of the 10 remaining items by guessing, for a top score of 95.

Suppose the median student knows 50% of the material. If the test has two choices (true/false), then the median score would be 50 (what they know) + 25 (what was guessed correctly from the other 50) = 75. So, for this true/false version of the test, the transformed value for the median student is 100*75/95 = 78.9. This corresponds to a grade of C. At least half the students will get C or above.

Now suppose the instructor decides to use a five-alternative exam instead of a true/false test. In this case, the top student, who knows 90 percent of the material, will get 90 items right by knowing plus 2 by guessing (1/5 times 10). In other words, this student will score 92, instead of 95, as she would on the true/false version of the test. However, consider now the median student who knows 50% of the material. That student now gets 50 items right by knowing the answers, but gets only 1/5 of the other 50 items (i.e., 10) by guessing. Thus, this student now scores 50 + 10 = 60. The transformed score for the median student is now 100*60/92 = 65.2, a grade of D. At least half the class will score D or below.

Now suppose a third instructor uses a fill-in-the-blank format for the exam. In this case, the student cannot guess the answer, and so the high score is now 90, the median score is 50, and the transformed score is 100*50/90 = 55.56, a grade of F. So at least half the students fail, even though we assumed that people knew the same amount in each case.
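The three cases can be reproduced with a short sketch of the "know or guess" model. It works with expected scores only (no luck in guessing), and the function names are hypothetical. A fill-in test is represented here by passing no number of alternatives, which is an assumption of this sketch.

```python
def expected_score(known, n_items, n_alternatives=None):
    """Expected number correct: known items plus 1/n of the unknown items.

    n_alternatives=None stands for a fill-in test where guessing is impossible.
    """
    unknown = n_items - known
    guessed = 0.0 if n_alternatives is None else unknown / n_alternatives
    return known + guessed


def letter_grade(t):
    """Cutoffs for grading on the highest score, applied top down."""
    if t >= 90:
        return "A"
    if t >= 80:
        return "B"
    if t >= 70:
        return "C"
    if t >= 60:
        return "D"
    return "F"


formats = {"true/false": 2, "five-alternative": 5, "fill-in": None}
results = {}
for name, n in formats.items():
    top = expected_score(90, 100, n)      # top student knows 90 of 100 items
    median = expected_score(50, 100, n)   # median student knows 50 of 100 items
    t = 100.0 * median / top
    results[name] = (top, median, t, letter_grade(t))
    print(f"{name:17s} top = {top:5.1f}  median = {median:5.1f}  "
          f"T = {t:5.2f}  grade = {letter_grade(t)}")
```

The same two students end up with a median grade of C, D, or F depending only on the item format.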

The grade distribution should be independent of what type of test the instructor gives. This system does not satisfy this basic requirement, so the grades are arbitrary in this system.

Unfortunately, many students think (erroneously) that this system is grading on the curve. It is NOT. There is NO CURVE. The transformation, T, is a linear transformation. So, where is the so-called curve?

Some people think this gives higher grades. Not necessarily; as shown above, it can easily give half or more Fs (failures) on a well-designed exam with a normal distribution of scores. So this system is NOT necessarily lenient, either. Unfortunately, it is used by teachers who do not know enough simple algebra or have not thought through its consequences.

Grading on the Curve

Grading on the curve refers to grading each student based on where that student stands relative to all others in the same section, and giving the same distribution of grades to all sections of the same class. The term "curve" refers to a bell-shaped curve, usually the normal distribution curve.

For example, an instructor might decide to give about 2.5% A+, 13.5% A, 34% B, 34% C, 13.5% D, and 2.5% F. If the test scores are normally distributed, then the instructor can obtain approximately these percentages by calculating the standard score (z-score) for each student,

z_X(i) = (X(i) - M)/s_X

where M = the arithmetic mean of X, and s_X is the standard deviation. Then use the rule,

If z_X(i) >= 2, assign A+
If z_X(i) >= 1, assign A
If z_X(i) >= 0, assign B
If z_X(i) >= -1, assign C
If z_X(i) >= -2, assign D
If z_X(i) < -2, assign F

Other cutoffs could be chosen for stricter or more lenient standards.
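As a minimal sketch of the z-score rule, assuming the rules are applied from the top down and that s_X is the population standard deviation (the text does not say which form is meant), one might write the following. The sample scores and function names are hypothetical.

```python
import statistics


def z_scores(scores):
    """Return z = (X - M) / s for each score in one section."""
    m = statistics.mean(scores)
    s = statistics.pstdev(scores)  # population SD; an assumption of this sketch
    if s == 0:
        raise ValueError("all scores are equal, so z-scores are undefined")
    return [(x - m) / s for x in scores]


def curve_grade(z):
    """Apply the z cutoffs in order; the first matching rule wins."""
    if z >= 2:
        return "A+"
    if z >= 1:
        return "A"
    if z >= 0:
        return "B"
    if z >= -1:
        return "C"
    if z >= -2:
        return "D"
    return "F"


# Hypothetical section scores, chosen only for illustration.
scores = [20, 35, 50, 50, 65, 80]
grades = [curve_grade(z) for z in z_scores(scores)]
for x, z, g in zip(scores, z_scores(scores), grades):
    print(f"raw {x:3d} -> z = {z:6.3f} -> {g}")
```

With a section this small the grade percentages will not match the intended ones closely; the fixed percentages in the text assume a normal distribution of scores.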

This system has the advantage (compared to grading on the highest score) that EVERY score is used to calculate the mean and standard deviation of the exam. It compares each person to the AVERAGE of all of the students in the same section.

Assuming the exam is normally distributed, there will be no change in the percentage of each grade if the instructor uses true/false, multiple choice, or fill-in type items.
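One way to see why the item format does not matter is that, under the "know or guess" model used earlier, the expected score is a linear function of what a student knows, and z-scores do not change under a linear rescaling with positive slope. The sketch below checks this numerically. It uses expected scores only, ignoring luck in guessing, and the knowledge levels and function names are hypothetical.

```python
import math
import statistics


def expected_score(known, n_items, n_alternatives=None):
    """Known items plus 1/n of the unknown items; None means no guessing."""
    unknown = n_items - known
    guessed = 0.0 if n_alternatives is None else unknown / n_alternatives
    return known + guessed


def z_scores(scores):
    """Return z = (X - M) / s, using the population SD (an assumption)."""
    m = statistics.mean(scores)
    s = statistics.pstdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical number of items (out of 100) that each of five students knows.
known_items = [90, 70, 50, 40, 20]

formats = {"true/false": 2, "five-alternative": 5, "fill-in": None}
z_by_format = {
    name: z_scores([expected_score(k, 100, n) for k in known_items])
    for name, n in formats.items()
}

for name, zs in z_by_format.items():
    print(f"{name:17s}", [round(z, 3) for z in zs])
```

Each format prints the same list of z-scores, so the z cutoffs assign the same grades in all three cases.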

Unlike grading on the highest score, this method cannot give half or more failing grades; the percentage of each grade is fixed.

This method has the drawback that students are in competition with each other, and it is in their interest to prevent others from doing well in the class. So if someone asks for help, there is an incentive to withhold help and even try to keep others from learning the material of the course. Hurting others raises one's own grade. But in real life, we often want our team to have the best quarterback, and the quarterback wants the linemen to do as well as possible. The soldier wants his squad members to be the best marksmen possible. So, by creating competition, this method may not be optimal training for a society in which people cooperate.

Another possible drawback of this method is that students may realize that if everyone does poorly, there will still be the same number of each grade as if they all study very hard. So a lazy student has an incentive to prevent material from being presented. By wasting class time or preventing the instructor from presenting material, it would be possible to get the same grade with less work. But poorly educated students end up as poorly paid workers, so this incentive system is not in the best interest of the students in the long run in a society that expects its workers to be educated.

Grading on the Historical Curve

In this method, the instructor calculates the mean and standard deviation for all previous classes and uses the results from PREVIOUS classes to set the standards for the current class. This method has the advantages of grading on the curve, but eliminates the incentives to harm others in the class or to prevent the instructor from presenting material.

Because students are compared to previous classes, it is possible for the entire class to get grades of A, if they all decide to work hard and learn the material to the appropriate level compared to previous semesters. People are not in competition with each other in the class, but they are in competition with the past. It is then also possible for an entire class to screw up and end up with lower grades; therefore, there is no incentive to prevent the instructor from covering the material, because the material will be on the exam whether it is covered in class or not.
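A minimal sketch of the idea follows, assuming the same z cutoffs as in grading on the curve are reused with the historical mean and standard deviation. The historical values (mean 60, standard deviation 12), the class scores, and the function names are all hypothetical.

```python
# Hypothetical statistics pooled from previous classes.
HIST_MEAN = 60.0
HIST_SD = 12.0


def curve_grade(z):
    """Apply the z cutoffs in order; the first matching rule wins."""
    if z >= 2:
        return "A+"
    if z >= 1:
        return "A"
    if z >= 0:
        return "B"
    if z >= -1:
        return "C"
    if z >= -2:
        return "D"
    return "F"


def grade_class(scores, hist_mean=HIST_MEAN, hist_sd=HIST_SD):
    """Grade each score against the historical distribution, not the current class."""
    return [curve_grade((x - hist_mean) / hist_sd) for x in scores]


strong_class = [73, 75, 78, 82]  # everyone well above the historical mean
weak_class = [40, 45, 50, 58]    # everyone below the historical mean

print("strong class:", grade_class(strong_class))
print("weak class:  ", grade_class(weak_class))
```

Because the current class does not enter the mean or standard deviation, one student's score has no effect on anyone else's grade.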

Grades in this system tell how a student compares with a much larger population of people than in the simple grading on the curve method.

The first-time instructor cannot use this method, unless the items have been pre-tested and selected with known distributions. However, an experienced instructor can keep item analysis information on a pool of items and select items to generate a known distribution based on previous classes. For the novice instructor, the standards will change over the first few years, and will tend to drift upwards over time as the instructor improves his or her teaching performance.
