# Scoring an Ordered List

My 7th graders have a question on their exam that asks them to put eight numbers (integers and fractions) in order of their distance from 0 on the number line, starting with the smallest distance.

These types of questions are tricky for me to grade, and because there are eight numbers in this sequence, the task of grading it fairly suddenly becomes thorny and irksome.

Let’s change the question to this:

Put these numbers in order from least to greatest: 5, 7, 2, 3, 1, 4, 6, 8

The correct order is 1, 2, 3, 4, 5, 6, 7, 8 (you’re welcome) — for a possible score of 8 points. How many points would this response earn?

## 1,  2,  4,  5,  3,  7,  8,  6

So, only the first two numbers, 1 and 2, are placed correctly. Is the score just 2 out of 8 then? But I want to give some credit to 4 and 5 being next to each other, likewise with 7 and 8.

I’ve tried to come up with some metrics to score this, and then I would want to apply the same metrics to different sequences to see if any would break my invisible “fairness” barometer. For example, whatever score I came up with for the above sequence, I think the below sequence should get a lower score because the 7 and 8 are farther upstream than they should be.

Anyway, I have some ideas. The above two sets are Sets A and B below. I wonder if there’s a way to score an ordered list that half of us math teachers can agree upon. I’d like for my students to think about this too.

Meanwhile, here is a spreadsheet with my scores if you’d like to take a look and play along. Just enter your name in row 1 (and link your name to your Twitter, if you want) and the scores you’d assign to these sets.

[02/01/18: @MrHonner had a similar question over 4 years ago: Order These Things From Least to Greatest.]


1. Jacob Spear
Posted January 31, 2018 at 8:00 pm | Permalink

What about the number of inversions? For each (i,j) with i<j let f(i,j) be 0 if i and j are in order and 1 if they are out of order. Let I(p) be the sum of f over all such ordered pairs of list elements given a permutation p. The highest possible I is 28. You could assign the score
8-8(I(p)/28).

This makes sense to me because it rewards the ordering for every correct binary comparison the student makes.
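
A minimal Python sketch of this inversion-based score may help make it concrete. The function names are made up for illustration, and the sketch assumes the correct answer is simply the items in ascending order.

```python
from itertools import combinations

def count_inversions(response):
    """Count pairs of items that appear in the wrong relative order."""
    # combinations() keeps positional order, so a comes before b in the response.
    return sum(1 for a, b in combinations(response, 2) if a > b)

def inversion_score(response, max_points=8):
    """Score = max_points - max_points * I(p) / (largest possible I)."""
    n = len(response)
    max_inversions = n * (n - 1) // 2  # 28 when n = 8
    return max_points - max_points * count_inversions(response) / max_inversions

response = [1, 2, 4, 5, 3, 7, 8, 6]
print(count_inversions(response))           # 4
print(round(inversion_score(response), 2))  # 6.86
```

The four inversions in the sample response are the pairs (4, 3), (5, 3), (7, 6) and (8, 6), so the score works out to 8 − 8(4/28), or about 6.86.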

• Jacob Spear
Posted January 31, 2018 at 8:19 pm | Permalink

Another more involved approach that I like a bit more could be based on the probability of getting I(p) “at least this bad”. So, let T(p)=Prob( I(q) >= I(p) | arbitrary permutation q). Then you could score a permutation p as 8*T(p). I think this would produce the same order as the scoring in my original post but it would spread out the scores differently.
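
One way to try this idea out is brute force: enumerate every permutation and count how many have at least as many inversions as the response. The sketch below assumes "arbitrary permutation" means each of the n! orderings is equally likely; the function names are hypothetical.

```python
from itertools import combinations, permutations

def count_inversions(seq):
    """Count pairs of items that appear in the wrong relative order."""
    return sum(1 for a, b in combinations(seq, 2) if a > b)

def tail_probability_score(response, max_points=8):
    """Score = max_points * P(I(q) >= I(p)) over all equally likely orderings q."""
    observed = count_inversions(response)
    total = 0
    at_least_as_bad = 0
    for q in permutations(range(len(response))):
        total += 1
        if count_inversions(q) >= observed:
            at_least_as_bad += 1
    return max_points * at_least_as_bad / total

# 8! = 40,320 orderings, so this takes a moment but is still quick.
print(round(tail_probability_score([1, 2, 4, 5, 3, 7, 8, 6]), 2))  # 7.98
```

Only 111 of the 40,320 orderings have fewer than four inversions, which is why the sample response lands so close to full marks under this scheme. Whether that spread feels fair is a separate question.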

• Fawn
Posted February 1, 2018 at 10:00 am | Permalink

Thanks so much for helping me think through this, Jacob. I really love the probability angle that never entered my mind before, and “at least this bad.” The scores that you have on the spreadsheet, is that based on the first comment or on this comment?

• Jacob Spear
Posted February 1, 2018 at 4:22 pm | Permalink

They’re based on the first comment. I haven’t had time to sit down and calculate the probabilities yet.

2. Paul Jorgen
Posted February 1, 2018 at 5:55 am | Permalink

Does it come down to the learning target and their evidence of understanding the learning target? What evidence do they show in their order? If you looked at only the integers would they be in order? The fractions? Common fractions v. mixed numbers?

Is it possible that two of the sets above may be very similar in relative positions but different in evidence of understanding?

• Fawn
Posted February 1, 2018 at 10:09 am | Permalink

Hey Paul. Yeah, this scoring thing went off in the direction of pure curiosity on my part about scoring an ordered list. In context, many students (thank God) were able to order the 8 numbers correctly. For the few who didn’t get it, I always learn how they think about it by talking with them directly. Thanks!

3. John Golden
Posted February 1, 2018 at 6:53 am | Permalink

I feel like you’re quizzing us.

I like holistic grading in general, but especially for this kind of situation. If their written responses show understanding, and most of the ordering, A. If the ordering or explanation shows a gap in understanding or misconception, B. If they get the gist but lack some essential understanding, C. Less than that, reassess.

• Fawn
Posted February 1, 2018 at 10:31 am | Permalink

Haha! Hi John! Yes, please see my response to Paul above. This type of question, and specific to the question on their test, is about teaching and learning, and not about what score this would get.

4. Howard Phillips
Posted February 1, 2018 at 7:51 am | Permalink

A statistical approach:
Take the total of the differences between the actual order and the correct order, disregarding the signs –
1 2 3 4 5 6 7 8
2 4 1 3 5 6 8 7
differences are 1 1 2 0 0 1 1 with total 6
divide by 8, the number of items
result is 6/8 or 0.75

A more sophisticated approach:
Take the total of the squared differences, divide by 8 as before, and find the square root.
the resulting square root is 8/8 = 1

Method 1 is the mean absolute deviation and method 2 is the standard deviation or root mean square deviation, and in both cases the arithmetic is simple.
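
Both measures are easy to check by machine. The sketch below uses hypothetical function names and the two lists exactly as written above. Note that a machine tally of those lists gives eight differences (1, 2, 2, 1, 0, 0, 1, 1) with total 8, so the figures it prints differ from the hand tally. Both numbers are penalties, so 0 means a perfect ordering.

```python
from math import sqrt

def mean_absolute_deviation(correct, response):
    """Average of |correct - response| taken position by position."""
    return sum(abs(c - r) for c, r in zip(correct, response)) / len(correct)

def root_mean_square_deviation(correct, response):
    """Square root of the average squared difference, position by position."""
    squared = sum((c - r) ** 2 for c, r in zip(correct, response))
    return sqrt(squared / len(correct))

correct = [1, 2, 3, 4, 5, 6, 7, 8]
response = [2, 4, 1, 3, 5, 6, 8, 7]
print(mean_absolute_deviation(correct, response))               # 1.0
print(round(root_mean_square_deviation(correct, response), 2))  # 1.22
```

For a fully reversed list, `mean_absolute_deviation` returns 4.0.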

• Howard Phillips
Posted February 1, 2018 at 7:56 am | Permalink

And Hurricane Maria killed off the power and the internet for 4 and a half months !!!!

• Fawn
Posted February 1, 2018 at 11:29 am | Permalink

Hi Howard. Thank you for playing! :) Did you mean the difference for the 2nd number to be 2 (instead of 1)? Someone asked on the spreadsheet about the case where the order was reversed. And with your first scoring method, I’d get a score of 4, which I kinda like a lot. (With my metrics, it would score negative 4! I suck. Sigh.)

5. Shannon Parvankin
Posted February 1, 2018 at 8:25 am | Permalink

Since the example is only assessing ordering positive integers it would be worth 1 point. Either right or wrong. The student understands (or doesn’t understand) the distance from zero of positive integers.

When you throw in fractions, decimals, and negative integers, more is assessed. Is there an understanding of where these numbers are in relation to each other (since they are in different forms)? This question would definitely consider more time analyzing each student’s answer.

Since we have to assign points, my “fair” way would be similar to John Golden’s:
*7-8 correct = 3 points
*5-6 correct = 2 points
*3-4 correct = 1 point
*0-2 correct = 0 points
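
A rubric like this is simple to automate. The sketch below assumes "correct" means an item sitting in exactly the right position, which is only one possible reading; the function name is made up for illustration.

```python
def rubric_points(correct, response):
    """Map the number of items in exactly the right position to 0-3 points."""
    in_place = sum(1 for c, r in zip(correct, response) if c == r)
    if in_place >= 7:
        return 3
    if in_place >= 5:
        return 2
    if in_place >= 3:
        return 1
    return 0

correct = [1, 2, 3, 4, 5, 6, 7, 8]
print(rubric_points(correct, [1, 2, 4, 5, 3, 7, 8, 6]))  # 0 (only 1 and 2 are in place)
print(rubric_points(correct, [1, 2, 3, 4, 5, 6, 8, 7]))  # 2 (six items are in place)
```

Under this reading the sample response from the post earns 0 points, which shows how a position-only count ignores runs like 4, 5 and 7, 8 that are in the right relative order.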

• Fawn
Posted February 1, 2018 at 11:41 am | Permalink

Thank you, Shannon. I became more curious just about the scoring of any ordered list, not even tied to numbers because I completely agree with “This question would definitely consider more time analyzing each student’s answer.” More like the Queen of Fancypants wants her scarfs placed in a certain order in her closet. :)

6. Mark Stamp
Posted February 1, 2018 at 8:50 am | Permalink

Hi Fawn,
Does it have to be points that come up with an exact score? Why not a leveled system built on criteria, which may also involve conversations and observations as assessment along with the product?
Level 1- Limited understanding with almost all of them in incorrect order.
Level 2- Developing understanding with a few errors in ordering.
Level 3- Consistently accurate or with possibly one error in the order.
Level 4- Order is accurate with no errors.
(I made this up on the spot and am not exactly happy with it but you get my idea)
Now if this was the one for integers and fractions, you could add more to the descriptions about how they showed their thinking/work, what models they used to help them order, etc. If this assessment was being done over a unit, then the levels can be averaged into a percentage mark if that is what your system uses. We use a criteria-based marking system in Ontario, so we would have exemplars of each level for each question on the provincial EQAO tests. Maybe that is too much for a classroom exam, but possibly fewer questions could be used, presenting what the level 1 to 4 criteria are for each question. I know it isn’t exactly what you wanted us to do by showing how we would score the points but just wanted to add my thoughts! Maybe I am way out there! LOL

• Fawn
Posted February 1, 2018 at 11:45 am | Permalink

Hi Mark! I appreciate your thoughts! Your “levels” above are pretty much what I mentally did with the scoring. Luckily, only a few students missed this question. Please see my replies to the other comments. I should have made this point clear in my post. Thanks!

7. John Golden
Posted February 1, 2018 at 9:10 am | Permalink

Now I’m wondering about an activity where you give the learners different lists and ask them to put them in order of the lister’s understanding. Maybe I teach too many preservice teachers.

• Fawn
Posted February 1, 2018 at 11:46 am | Permalink

Ooh, kinda like creating a bunch of “favorite no” ordered lists!

8. Alan Eliasen
Posted February 1, 2018 at 3:24 pm | Permalink

When Mr. Honner posted a similar problem, my suggestion for a scoring function was to calculate the “Levenshtein-Damerau edit distance” between the “correct” result and another result.

The Levenshtein-Damerau edit distance calculates the minimum number of operations needed to transform one sequence into the other, where an operation is an insertion, deletion, replacement, or swap of two adjoining entries. (The standard Levenshtein edit distance does not handle swaps, but rather handles them as a deletion plus an insertion, making a swap twice as expensive, and this problem is largely about swaps.)

This algorithm is commonly used in spell-checkers to suggest ‘likely’ substitutions. This algorithm can grade how ‘wrong’ a word was spelled, by quantifying the number of edits required to make it correct, and thus suggest more “likely” corrections first.

In this problem, we probably want to slightly modify the Levenshtein-Damerau process to not allow outright replacements of numbers.

This turns out to have nice, intuitive results. All possible scorings are obtainable.

To give reasonable results, I had to eliminate the “replacement” operation so I only allow insertion, deletion, and swaps of 2 adjacent letters. Here are the edit distances given by that procedure, assuming that ABCD is the correct answer:

ABCD 0
ABDC 1
ACBD 1
ACDB 2
BACD 1
BCDA 2
BDAC 3
BDCA 3
CABD 2
CBDA 4
CDAB 4
CDBA 4
DABC 2
DACB 3
DBAC 4
DBCA 4
DCAB 4
DCBA 5

These are interesting and have good properties: they allow all of the scores from 0 (all correct) to 5, with the only answer scoring 5 to be the total reversal DCBA. If only one pair are swapped (like BACD), the score is 1. Makes sense.
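
For readers without Frink, here is a rough Python sketch of an edit distance that allows only insertions, deletions, and swaps of two adjacent items, each at a cost of 1. It is not the Frink function: it assumes the restricted "optimal string alignment" variant, in which a swapped pair is not edited again, so its numbers need not agree with every row of the table above. The function name is hypothetical.

```python
def edit_distance_no_replace(source, target):
    """Edit distance with insert, delete, and adjacent swap (no replacement)."""
    m, n = len(source), len(target)
    # d[i][j] = cost of turning the first i items of source into the first j of target
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete everything
    for j in range(n + 1):
        d[0][j] = j  # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(d[i - 1][j] + 1,   # delete source[i-1]
                       d[i][j - 1] + 1)   # insert target[j-1]
            if source[i - 1] == target[j - 1]:
                best = min(best, d[i - 1][j - 1])  # items already agree
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)  # swap an adjacent pair
            d[i][j] = best
    return d[m][n]

print(edit_distance_no_replace("BACD", "ABCD"))  # 1
print(edit_distance_no_replace("DCBA", "ABCD"))  # 5
```

Because replacements are not allowed, a single misplaced item costs 2 (one deletion plus one insertion) unless it can be fixed with one adjacent swap.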

My programming language Frink has functions to efficiently calculate this edit distance, and your favorite programming language may too.

Here’s a sample Frink program, gradingProblem.frink, that I used to calculate the above results; it can be easily modified for longer lists.

9. William Thill
Posted February 1, 2018 at 9:55 pm | Permalink

I can’t help but wonder what are the actual numbers being sorted and ranked? The context here seems to matter a great deal.

I guess I wonder this: What math understandings / actions are required to successfully rank all of them? Document them, and assess whether they were done. The ordering alone does not necessarily show this.

Examples:
* translating “set 1” into correct-enough decimal equivalents (take fractions)
* translating “set 2” into correct-enough decimal equivalents (take irrationals)
* translating “set 3” into correct-enough decimal equivalents (binary, hex)
* solving problems with a numerical answer, expressed in decimal form.
* making mathematically valid pairwise comparisons without knowing how to translate into decimals.

OR…. forget all that… With 8 numbers, there are 56/2 = 28 correct pairwise comparisons – how many of these did they get right, maybe? But assessing by this metric, and getting an 8th grader to understand it? tough.

Comparing eight? Wow – seems like a lot. I wonder if ranking three or four might help isolate the math tasks.

10. Michael McGinnis
Posted August 21, 2018 at 11:17 am | Permalink

Check out this solution:
Programmatic Partial-Credit Put-In-Order Grading

• Fawn
Posted January 4, 2019 at 12:04 am | Permalink

Oh my, thank you so much for this, Michael. The chain length credit makes sense, and I appreciate seeing the wrinkles and how they can be ironed out. Thank you!
