I'm trying to identify useful performance metrics for my timing evals (which other WUS members are now contributing to), but I'm finding that benchmarks of movement performance tend to vary quite a bit (COSC, METAS, GS, etc.). My background is in Statistics (quant. methods professor), so formulas for accuracy (what we call validity or bias) and precision (lack of variance or dispersion) are a dime a dozen. But I want to make sure that the ratings/scores I create are actually useful to our fellow WIS when comparing various calibers, and I can't think of anyone who might know this material better than you folks. So I need your help. My goal is to create a scoring/rating system for the WUS community that is:
1) Relevant to the watch/movement's timekeeping ability
2) Reasonably thorough (given my equipment/facilities limitations)
3) Reasonably weighted (good balance of positional precision, isochronism, etc.)
In my timing study, I take i=5 measurements (one every 12 seconds, to cover the entire sweep of the seconds hand), in each of j=6 positions, at t=2 time periods (fully wound and after 24 hours rest), for a total of N = i*j*t = 60 measurements of each watch, 30 when fully wound and 30 after 24 hours discharge.
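To make the design concrete, here's a minimal Python sketch of how one watch's readings could be organized (the position labels and dict layout are just my illustration, not part of any standard protocol):

```python
# Hypothetical layout for one watch's timing data: rates in s/day,
# indexed [day][position][reading]. All structure here is illustrative.
POSITIONS = ["DU", "DD", "12-up", "3-up", "6-up", "9-up"]  # j = 6 positions

rates = {
    day: {pos: [0.0] * 5 for pos in POSITIONS}  # i = 5 readings per position
    for day in ("day1", "day2")                 # t = 2: fully wound, +24 hrs
}

n = sum(len(readings) for day in rates.values() for readings in day.values())
print(n)  # → 60, i.e. N = i*j*t
```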
Here are some of the summary measures I've used or considered using to assess each movement's performance. For now let's just focus on Day1 (fully wound):
ACCURACY at t=1
1) Average daily rate (hereafter ADR) across all 30 measurements
2) Weighted ADRs (that assign higher weight to common positions like DD and 9-up)
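The two accuracy measures can be sketched as below, with invented day-1 numbers; the weights are purely illustrative placeholders (in practice you'd choose them to reflect how often each position is actually worn):

```python
from statistics import mean

# Hypothetical day-1 rates (s/day), 5 readings per position; values invented.
day1 = {
    "DU":    [+4.1, +4.3, +4.0, +4.2, +4.4],
    "DD":    [+3.8, +3.9, +4.1, +4.0, +3.7],
    "12-up": [+1.2, +1.0, +1.5, +1.1, +1.3],
    "3-up":  [-0.5, -0.2, -0.4, -0.6, -0.3],
    "6-up":  [+0.8, +0.9, +0.7, +1.0, +0.6],
    "9-up":  [+2.0, +2.2, +1.9, +2.1, +1.8],
}

# 1) Overall ADR: plain mean of all 30 readings
all_rates = [r for readings in day1.values() for r in readings]
adr = mean(all_rates)

# 2) Weighted ADR: up-weight common positions (these weights are made up)
weights = {"DU": 1, "DD": 3, "12-up": 1, "3-up": 1, "6-up": 1, "9-up": 3}
pos_adr = {pos: mean(r) for pos, r in day1.items()}
weighted_adr = sum(weights[p] * pos_adr[p] for p in pos_adr) / sum(weights.values())
```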
PRECISION (between positions, within-position, and overall) at t=1
1a) Range of ADR across positions, i.e. the fastest minus slowest positional ADR. Is "delta" the correct terminology for this measure, or does it refer to something else?
1b) Horizontal ADR - vertical ADR: mean(DU, DD) - mean(12-up, 3-up, 6-up, 9-up)
1c) Max. Deviation of any position-specific ADR from the overall ADR (e.g. DD average - overall ADR)
1d) Avg. Deviation of all six positional rates from the overall ADR
2a) Max. deviation of any position-specific rate (e.g. -10) from its respective position-specific average (e.g. DD average)
2b) Avg. deviation of all position-specific rates from their respective position-specific averages
3a) Avg. deviation from overall ADR across all 30 rate measurements
3b) Max. deviation between the overall ADR and the different weighted averages
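The between-position measures (1a-1d) all reduce to simple operations on the six positional ADRs. A sketch with invented numbers (whether "delta" is the accepted term for 1a, I leave to the watchmakers):

```python
from statistics import mean

# Hypothetical day-1 positional ADRs (s/day); values invented for illustration.
pos_adr = {"DU": 4.2, "DD": 3.9, "12-up": 1.2, "3-up": -0.4, "6-up": 0.8, "9-up": 2.0}
overall_adr = mean(pos_adr.values())

# 1a) Range of positional ADRs: fastest minus slowest position
pos_range = max(pos_adr.values()) - min(pos_adr.values())

# 1b) Horizontal minus vertical positional ADRs
horiz_vert = mean([pos_adr["DU"], pos_adr["DD"]]) - mean(
    [pos_adr["12-up"], pos_adr["3-up"], pos_adr["6-up"], pos_adr["9-up"]]
)

# 1c) Max deviation of any positional ADR from the overall ADR
max_dev = max(abs(v - overall_adr) for v in pos_adr.values())

# 1d) Avg deviation of the six positional ADRs from the overall ADR
avg_dev = mean(abs(v - overall_adr) for v in pos_adr.values())
```

The within-position measures (2a, 2b) are the same idea applied one level down: deviations of the raw readings from their own position's average instead of from the overall ADR.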
Here is where I'm most uncertain. So far my only isochronism measure has been the absolute value of the change in overall ADR between day 1 (fully wound baseline) and day 2 (+24 hrs). But from what I've seen, it's clear that movements do more than just speed up (or slow down) as the mainspring unwinds. Their positional stability also changes--mostly for the worse. So I see two major options for quantifying isochronism, and would love your feedback on the pros/cons of each:
A) Repeat all precision and accuracy measures shown above at t=2, then take the difference b/w the day1 and day2 measurements
B) Compute all measures shown above just once, but use the entire 60 measurement sample, which pools the day1 and day2 measurements
Option A provides direct measurements of isochronism, whereas option B folds the isochronism component into the overall assessment of movement accuracy and precision. I can see the value of both, but am unclear about which approach makes the most sense.
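Computationally the two options look like this, using invented positional ADRs for the two days (for brevity I work from the six positional ADRs rather than the raw 30+30 readings, and use the range as the example precision measure):

```python
from statistics import mean

# Invented positional ADRs (s/day) at t=1 (fully wound) and t=2 (+24 hrs).
day1 = {"DU": 4.2, "DD": 3.9, "12-up": 1.2, "3-up": -0.4, "6-up": 0.8, "9-up": 2.0}
day2 = {"DU": 5.1, "DD": 4.6, "12-up": 0.4, "3-up": -1.8, "6-up": 0.1, "9-up": 2.9}

# Option A: compute each measure per day, then difference the two days
adr_shift = abs(mean(day2.values()) - mean(day1.values()))  # accuracy shift
range1 = max(day1.values()) - min(day1.values())
range2 = max(day2.values()) - min(day2.values())
range_shift = range2 - range1                               # precision shift

# Option B: pool both days and compute each measure once
pooled = list(day1.values()) + list(day2.values())
pooled_adr = mean(pooled)                                   # accuracy, iso folded in
pooled_range = max(pooled) - min(pooled)                    # precision, iso folded in
```

Note that in this toy example the pooled range (Option B) is just the wider day-2 range, so the day-1/day-2 contrast that Option A isolates gets absorbed into a single, larger dispersion number.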
Any and all feedback you can provide is appreciated. Which measures would you add, drop, or revise? Here is a recent capture of some of my timing results for reference (the sample has grown quite a bit since I made this table, but you get the idea).