My 7th graders have a question on their exam that asks them to put eight numbers (integers and fractions) in order of their distance from 0 on the number line, starting with the smallest distance.

These types of questions are tricky for me to grade, and because there are *eight* numbers in this sequence, the task of grading it **fairly** suddenly becomes thorny and irksome.

Let’s change the question to this:

Put these numbers in order from least to greatest: 5, 7, 2, 3, 1, 4, 6, 8

The correct order is 1, 2, 3, 4, 5, 6, 7, 8 (you’re welcome) — for a possible score of 8 points. How many points would this response earn?

## 1, 2, 4, 5, 3, 7, 8, 6

So, only the first two numbers, *1* and *2*, are placed correctly. Is the score just 2 out of 8 then? But I want to give some credit to *4* and *5* being next to each other, likewise with *7* and *8*.

I’ve tried to come up with some metrics to score this, and then I would want to apply the same metrics to different sequences to see if any would break my invisible “fairness” barometer. For example, whatever score I came up with for the above sequence, I think the below sequence should get a lower score because the *7* and *8* are farther upstream than they should be.

Anyway, I have some ideas. The above two sets are Sets *A* and *B* below.

I wonder if there’s a way to score an ordered list that half of us math teachers can agree upon. I’d like for my students to think about this too. Meanwhile, here is a spreadsheet with my scores if you’d like to take a look and play along. Just enter your name in row 1 (and link your name to your Twitter, if you want) and the scores you’d assign to these sets.

[02/01/18: @MrHonner had a similar question over 4 years ago: Order These Things From Least to Greatest.]

## 21 Comments

What about the number of inversions? For each (i,j) with i<j let f(i,j) be 0 if i and j are in order and 1 if they are out of order. Let I(p) be the sum of f over all such ordered pairs of list elements given a permutation p. The highest possible I is 28. You could assign the score

8-8(I(p)/28).

This makes sense to me because it rewards the ordering for every correct binary comparison the student makes.
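The inversion-based score described above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the response is a list of numbers whose correct order is ascending, and the function names `count_inversions` and `inversion_score` are made up for illustration.

```python
from itertools import combinations

def count_inversions(response):
    """Count pairs (earlier, later) in the response that are out of order."""
    # combinations() keeps list order, so a always appears before b.
    return sum(1 for a, b in combinations(response, 2) if a > b)

def inversion_score(response, max_points=8):
    """Score = max_points - max_points * I(p) / (largest possible I)."""
    n = len(response)
    max_inversions = n * (n - 1) // 2  # 28 when there are 8 items
    return max_points - max_points * count_inversions(response) / max_inversions

response = [1, 2, 4, 5, 3, 7, 8, 6]
print(count_inversions(response), round(inversion_score(response), 2))  # 4 6.86
```

For the response in the post, the four out-of-order pairs are (4, 3), (5, 3), (7, 6) and (8, 6), so it would earn about 6.86 of the 8 points under this scheme.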

Another more involved approach that I like a bit more could be based on the probability of getting I(p) “at least this bad”. So, let T(p)=Prob( I(q) >= I(p) | arbitrary permutation q). Then you could score a permutation p as 8*T(p). I think this would produce the same order as the scoring in my original post, but it would spread out the scores differently.
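One possible way to compute T(p) without listing all 40,320 permutations is to count how many permutations have each inversion total. The sketch below assumes every permutation q is equally likely, and the helper names (`inversion_distribution`, `tail_probability_score`) are hypothetical, not from the comment.

```python
from itertools import combinations

def count_inversions(response):
    """Count pairs (earlier, later) in the response that are out of order."""
    return sum(1 for a, b in combinations(response, 2) if a > b)

def inversion_distribution(n):
    """counts[k] = number of permutations of n items with exactly k inversions."""
    counts = [1]  # a single item has one arrangement with 0 inversions
    for m in range(2, n + 1):
        # Placing the m-th (largest) item can create 0..m-1 new inversions.
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for extra in range(m):
                new[k + extra] += c
        counts = new
    return counts

def tail_probability_score(response, max_points=8):
    """max_points * Prob(I(q) >= I(p)) for a uniformly random permutation q."""
    counts = inversion_distribution(len(response))
    k = count_inversions(response)
    return max_points * sum(counts[k:]) / sum(counts)

print(tail_probability_score([1, 2, 3, 4, 5, 6, 7, 8]))  # 8.0
```

Because T(p) can only shrink as I(p) grows, this ranks responses the same way the inversion score does; only the spacing between scores changes.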

Thanks so much for helping me think through this, Jacob. I really love the probability angle that never entered my mind before, and “at least this bad.” The scores that you have on the spreadsheet, are those based on the first comment or on this comment?

They’re based on the first comment. I haven’t had time to sit down and calculate the probabilities yet.

Does it come down to the learning target and their evidence of understanding the learning target? What evidence do they show in their order? If you looked at only the integers would they be in order? The fractions? Common fractions v. mixed numbers?

Is it possible that two of the sets above may be very similar in relative positions but different in evidence of understanding?

Hey Paul. Yeah, this scoring thing went off in the direction of pure curiosity on my part about scoring an ordered list. In context, many students (thank God) were able to order the 8 numbers correctly. For the few who didn’t get it, I always learn how they think about it by talking with them directly. Thanks!

I feel like you’re quizzing us.

I like holistic grading in general, but especially for this kind of situation. If their written responses show understanding, and most of the ordering, A. If the ordering or explanation shows a gap in understanding or misconception, B. If they get the gist but lack some essential understanding, C. Less than that, reassess.

Haha! Hi John! Yes, please see my response to Paul above. This type of question, and specific to the question on their test, is about teaching and learning, and not about what score this would get.

A statistical approach:

Take the total of the differences between the actual order and the correct order, disregarding the signs –

1 2 3 4 5 6 7 8

2 4 1 3 5 6 8 7

differences are 1 1 2 0 0 1 1 with total 6

divide by 8, the number of items

result is 6/8 or 0.75

A more sophisticated approach:

Take the total of the squared differences, divide by 8 as before, and find the square root.

the resulting square root is 8/8 = 1

(bad example, do your own!!!)

Method 1 is the mean absolute deviation and method 2 is the standard deviation or root mean square deviation, and in both cases the arithmetic is simple.
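Both methods can be tried on any pair of lists with a short Python sketch. It assumes the "difference" is taken position by position between the response and the correct list, and the function names are invented for illustration. Note that these are deviations, so a lower number means a better ordering.

```python
from math import sqrt

def position_differences(response, correct):
    """Absolute difference between the two lists at each position."""
    return [abs(r - c) for r, c in zip(response, correct)]

def mean_absolute_deviation(response, correct):
    """Method 1: total of the absolute differences divided by the item count."""
    diffs = position_differences(response, correct)
    return sum(diffs) / len(diffs)

def root_mean_square_deviation(response, correct):
    """Method 2: square root of the mean of the squared differences."""
    diffs = position_differences(response, correct)
    return sqrt(sum(d * d for d in diffs) / len(diffs))

correct = [1, 2, 3, 4, 5, 6, 7, 8]
reversed_list = [8, 7, 6, 5, 4, 3, 2, 1]
print(mean_absolute_deviation(reversed_list, correct))  # 4.0
```

For a fully reversed list of eight, the differences are 7, 5, 3, 1, 1, 3, 5, 7, which total 32, so the first method gives 32/8 = 4.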

And Hurricane Maria killed off the power and the internet for 4 and a half months !!!!

Hi Howard. Thank you for playing! :) Did you mean the difference for the 2nd number to be 2 (instead of 1)? Someone asked on the spreadsheet about the case where the order was reversed. And with your first scoring method, I’d get a score of 4, which I kinda like a lot. (With my metrics, it would score negative 4! I suck. Sigh.)

Since the example is only assessing ordering positive integers it would be worth 1 point. Either right or wrong. The student understands (or doesn’t understand) the distance from zero of positive integers.

When you throw in fractions, decimals, and negative integers, more is assessed. Is there an understanding of where these numbers are in relation to each other (since they are in different forms)? This question would definitely consider more time analyzing each student’s answer.

Since we have to assign points, my “fair” way would be similar to John Golden’s:

* 7-8 correct = 3 points

* 5-6 correct = 2 points

* 3-4 correct = 1 point

* 0-2 correct = 0 points

Thank you, Shannon. I became more curious just about the scoring of any ordered list, not even tied to numbers because I completely agree with “This question would definitely consider more time analyzing each student’s answer.” More like the Queen of Fancypants wants her scarfs placed in a certain order in her closet. :)

Hi Fawn,

Does it have to be points that come up with an exact score? Why not a leveled system built on criteria that may also involve conversations and observations as assessment along with the product?

Level 1- limited understanding with almost all of them in incorrect order.

Level 2- Developing understanding with a few errors in ordering.

Level 3- Consistently accurate or with possibly one error in the order.

Level 4- Order is accurate with no errors.

(I made this up on the spot and am not exactly happy with it but you get my idea)

Now if this was the one for integers and fractions, you could add more to the descriptions about how they showed their thinking/work, what models they used to help them order, etc. If this assessment was being done over a unit, then the levels can be averaged into a percentage mark if that is what your system uses. We use a criteria-based marking system in Ontario, so we would have exemplars of each level for each question on the provincial EQAO tests. Maybe that is too much for a classroom exam, but possibly fewer questions could be used, with the level 1 to 4 criteria presented for each question. I know it isn’t exactly what you wanted us to do by showing how we would score the points, but I just wanted to add my thoughts! Maybe I am way out there! LOL

Hi Mark! I appreciate your thoughts! Your “levels” above are pretty much what I mentally did with the scoring. Luckily, only a few students missed this question. Please see my replies to the other comments. I should have made this point clear in my post. Thanks!

Now I’m wondering about an activity where you give the learners different lists and ask them to put them in order of the lister’s understanding. Maybe I teach too many preservice teachers.

Ooh, kinda like creating a bunch of “favorite no” ordered lists!

When Mr. Honner posted a similar problem, my suggestion for a scoring function was to calculate the “Levenshtein-Damerau edit distance” between the “correct” result and another result.

The Levenshtein-Damerau edit distance calculates the minimum number of operations needed to transform one sequence into the other, where an operation is an insertion, deletion, replacement, or swap of two adjoining entries. (The standard Levenshtein edit distance does not handle swaps, but rather handles them as a deletion plus an insertion, making a swap twice as expensive, and this problem is largely about swaps.)

This algorithm is commonly used in spell-checkers to suggest ‘likely’ substitutions. This algorithm can grade how ‘wrong’ a word was spelled, by quantifying the number of edits required to make it correct, and thus suggest more “likely” corrections first.

In this problem, we probably want to slightly modify the Levenshtein-Damerau process to not allow outright replacements of numbers.

This turns out to have nice, intuitive results. All possible scorings are obtainable.

To give reasonable results, I had to eliminate the “replacement” operation so I only allow insertion, deletion, and swaps of 2 adjacent letters. Here are the edit distances given by that procedure, assuming that ABCD is the correct answer:

ABCD 0

ABDC 1

ACBD 1

ACDB 2

ADBC 2

ADCB 3

BACD 1

BADC 2

BCAD 2

BCDA 2

BDAC 3

BDCA 3

CABD 2

CADB 3

CBAD 3

CBDA 4

CDAB 4

CDBA 4

DABC 2

DACB 3

DBAC 4

DBCA 4

DCAB 4

DCBA 5

These are interesting and have good properties: they allow all of the scores from 0 (all correct) to 5, with the only answer scoring 5 being the total reversal DCBA. If only one pair is swapped (like BACD), the score is 1. Makes sense.

My programming language Frink has functions to efficiently calculate this edit distance, and your favorite programming language may too.

Here’s a sample Frink program that I used to calculate the above results: gradingProblem.frink. It can be easily modified for longer lists.
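For readers without Frink, here is a rough Python sketch of an edit distance that allows only insertion, deletion, and a swap of two adjacent entries, with no replacement. It is not the Frink function: it uses the restricted "optimal string alignment" form of the idea, the name `edit_distance_no_replace` is made up, and its results are not guaranteed to match every entry in the list above.

```python
def edit_distance_no_replace(response, correct):
    """Edit distance using insertions, deletions and adjacent swaps only."""
    rows, cols = len(response) + 1, len(correct) + 1
    # d[i][j] = cost of turning the first i response items into the first j correct items
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete everything
    for j in range(cols):
        d[0][j] = j  # insert everything
    for i in range(1, rows):
        for j in range(1, cols):
            best = min(d[i - 1][j] + 1,   # delete an item from the response
                       d[i][j - 1] + 1)   # insert a missing item
            if response[i - 1] == correct[j - 1]:
                best = min(best, d[i - 1][j - 1])  # items match, no cost
            if (i > 1 and j > 1
                    and response[i - 1] == correct[j - 2]
                    and response[i - 2] == correct[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)  # swap adjacent items
            d[i][j] = best
    return d[-1][-1]

print(edit_distance_no_replace("BACD", "ABCD"))  # 1
print(edit_distance_no_replace("DCBA", "ABCD"))  # 5
```

The function works on any sequences, so a list of eight numbers can be passed in place of the four-letter strings.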

I can’t help but wonder what are the actual numbers being sorted and ranked? The context here seems to matter a great deal.

I guess I wonder this: What math understandings / actions are required to successfully rank all of them? Document them, and assess whether they were done. The ordering alone does not necessarily show this.

Examples:

* translating “set 1” into correct-enough decimal equivalents (take fractions)

* translating “set 2” into correct-enough decimal equivalents (take irrationals)

* translating “set 3” into correct-enough decimal equivalents (binary, hex)

* solving problems with a numerical answer, expressed in decimal form.

* making mathematically valid pairwise comparisons without knowing how to translate into decimals.

OR…. forget all that… With 8 numbers, there are 56/2 = 28 correct pairwise comparisons. How many of these did they get right, maybe? But assessing by this metric, and getting an 8th grader to understand it? Tough.

Comparing eight? Wow – seems like a lot. I wonder if ranking three or four might help isolate the math tasks.

Check out this solution:

Programmatic Partial-Credit Put-In-Order Grading

https://populi.co/blog/2014/09/programmatic-partial-credit-put-in-order-grading/

Oh my, thank you so much for this, Michael. The chain length credit makes sense, and I appreciate seeing the wrinkles and how they can be ironed out. Thank you!

## 3 Trackbacks

[…] you this because early this morning, maybe a tad too early, I read Fawn Nguyen’s recent post “Scoring an Ordered List”; at first I took what she was saying seriously because to me, everything she writes is solid gold. […]

[…] Nguyen (on Twitter) has a post on her website about a grading issue that leads to some interesting […]

[…] Nguyen, a middle school math teacher, (@fawnpnguyen on Twitter) has a post on her website about a grading issue that leads to some interesting […]