Adaptive staircase methods: why one number isn't enough

May 21, 2026 · 14 min read · methodology, psychophysics, staircase

A naive way to measure contrast sensitivity is to pick five contrast levels, show each to the observer, and ask which they can see. The procedure is brisk. It is also wrong in ways that are hard to spot. The answer depends on which five contrasts you chose, on how many trials you ran per level, on the lighting in the room, and on whether the observer's attention happened to wander during a critical trial. Most trials are wasted on contrasts that are trivially visible or clearly invisible, where the response carries almost no information about where threshold sits. The trials near threshold — the ones that matter — are too few to pin anything down.

Modern psychophysics gets around this by making the test adaptive: the next contrast to present depends on what the observer just did. Easy responses are followed by harder trials; errors by easier ones. Done correctly, the procedure converges on threshold and most of the trial budget is spent where measurement is informative. The workhorse adaptive procedure — what we run inside this app — is the 2-down-1-up transformed up-down staircase of Levitt (1971). This post explains what threshold means, why 2-down-1-up converges where it does, what the procedure looks like trial by trial, and what tradeoffs we accepted when we chose it.

What "threshold" actually means

Contrast threshold is not a hard line. There is no single contrast above which the observer always succeeds and below which they always fail. What exists instead is a psychometric function: a smooth curve mapping stimulus contrast to the observer's probability of a correct response.

In a two-alternative forced-choice (2AFC) task — "is the patch tilted left or right?" — the psychometric function starts at chance (0.5) when contrast is far below threshold, because the observer is guessing, and rises monotonically toward 1.0 at high contrast. Between these two plateaus is a steep middle region. The convention in psychophysics is to define threshold as the contrast at some specified point on this rising curve. Different procedures target different points: 50% (impossible in 2AFC), 70.7%, 79.4%, 84.1%. The target is chosen to land in the steep middle, where the curve changes quickly with contrast, because that is where each trial carries the most information about where the curve sits horizontally.
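To make the shape of that curve concrete, here is a minimal sketch of a 2AFC psychometric function. The Weibull form, the slope value, and the names `pCorrect`, `logAlpha`, and `beta` are assumptions chosen for illustration; nothing in this post commits the observer to that particular curve.

```javascript
// Hypothetical 2AFC psychometric function (Weibull form, an assumption).
// Returns the probability of a correct response at a given log10 contrast.
// Note: in this parameterization logAlpha is the contrast where
// P(correct) is about 0.816, not the 70.7% point.
function pCorrect(logContrast, logAlpha = -2.0, beta = 3.5) {
  const guess = 0.5; // chance level in a two-alternative task
  const ratio = Math.pow(10, logContrast - logAlpha); // contrast / alpha
  return guess + (1 - guess) * (1 - Math.exp(-Math.pow(ratio, beta)));
}

// Walk from far below to far above the steep region.
for (const lc of [-3.0, -2.5, -2.0, -1.5, -1.0]) {
  console.log(lc.toFixed(1), pCorrect(lc).toFixed(3));
}
```

The printed values start near 0.5, climb steeply around `logAlpha`, and saturate near 1.0, which is the two-plateau shape described above.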

For 2-down-1-up the target is 70.7% correct, derived below. When we report a log contrast sensitivity number, we are reporting the contrast (as its negative log) at which this observer would get 70.7% of orientation judgments right in the long run, not the contrast at which they "can see" the patch in some absolute sense.

Contrast threshold is also a property of the observer plus the task. We use 2AFC orientation discrimination because at low contrast you cannot tell tilt direction unless you can detect the patch, so orientation and detection thresholds collapse to the same number, while the response stays forced and objective (no "I'm not sure" key).

The 2-down-1-up rule

The rule is short. The math behind why it works takes a little longer.

Rule. Start at some easy contrast. After two consecutive correct responses, drop the contrast one step (make the next trial harder). After one wrong response, raise the contrast one step (make it easier). Any wrong answer resets the consecutive-correct counter. Repeat.
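The rule can be sketched as a small state update. This is an illustrative version, not the contents of js/staircase.js: the names `createStaircase` and `update`, and the clamp at 100% contrast, are assumptions.

```javascript
// Minimal sketch of the 2-down-1-up update; names are hypothetical.
function createStaircase(startLogContrast, stepLog) {
  return { logContrast: startLogContrast, stepLog, correctStreak: 0 };
}

// Apply one response and return the move taken: "down", "up", or "stay".
function update(state, wasCorrect) {
  if (!wasCorrect) {
    state.correctStreak = 0; // any miss resets the consecutive-correct counter
    // One wrong response: raise contrast (easier). Clamping at log contrast 0
    // (100%) is an assumption; a display cannot show more than full contrast.
    state.logContrast = Math.min(0, state.logContrast + state.stepLog);
    return "up";
  }
  state.correctStreak += 1;
  if (state.correctStreak === 2) {
    state.correctStreak = 0;
    state.logContrast -= state.stepLog; // two in a row: lower contrast (harder)
    return "down";
  }
  return "stay"; // one correct so far: repeat the same contrast
}
```

A caller would invoke `update` once per trial and present `state.logContrast` on the next one.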

Why 70.7%. Consider the procedure once it has stopped drifting and is bouncing around its asymptote. At that asymptote, the long-run rate of down-moves equals the rate of up-moves — otherwise contrast would still be drifting in one direction. Group the trials into decision units, each ending in either an up-move or a down-move. A down-move requires two consecutive correct responses; its probability is p² (where p is the observer's probability of a correct response). An up-move is the complementary event. So:

P(down) = p²
P(up)   = 1 − p²

Setting the two equal — the asymptotic balance condition — gives p² = 0.5 and p ≈ 0.707. The 2-down-1-up rule therefore homes in on the contrast at which the observer is correct on 70.7% of trials. Levitt (1971) generalizes this to other transformed up-down rules: 3-down-1-up converges to p ≈ 0.794, 4-down-1-up to p ≈ 0.841, and 1-down-1-up to p = 0.5 (useless in 2AFC, where 0.5 is chance). The procedural roots go back to Cornsweet (1962), who introduced the modern adaptive staircase in its 1-up-1-down form.
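The same balance argument gives every n-down-1-up target in one line: solve pⁿ = 0.5 for p. A short sketch (the function name `convergencePoint` is ours, for illustration):

```javascript
// Convergence point of an n-down-1-up rule: the p that satisfies p^n = 0.5.
function convergencePoint(nDown) {
  return Math.pow(0.5, 1 / nDown);
}

for (const n of [1, 2, 3, 4]) {
  console.log(`${n}-down-1-up -> ${convergencePoint(n).toFixed(3)}`);
}
// 1-down-1-up -> 0.500
// 2-down-1-up -> 0.707
// 3-down-1-up -> 0.794
// 4-down-1-up -> 0.841
```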

Step sizes. Early in the run, the staircase needs to cover ground quickly; near threshold, the steps need to shrink for precision. The conventional move is to halve the step size at each reversal — every trial where the staircase changes direction. Our implementation starts at 0.3 log units (a factor of 2 in linear contrast), halves at each reversal, and floors at 0.05 log units.
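With those numbers the step schedule is short enough to write out. A minimal sketch, with `nextStep` as a hypothetical helper name:

```javascript
// Halve the step at each reversal, but never go below the floor.
function nextStep(stepLog, floorLog = 0.05) {
  return Math.max(floorLog, stepLog / 2);
}

let step = 0.3;
const schedule = [step];
for (let reversal = 1; reversal <= 4; reversal++) {
  step = nextStep(step);
  schedule.push(step);
}
console.log(schedule); // [0.3, 0.15, 0.075, 0.05, 0.05]
```

The third halving would give 0.0375, so the floor takes over from the third reversal onward.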

Threshold estimate. The threshold is the geometric mean of the contrasts at the last six reversals, which is the same as averaging the log contrasts at those six reversals and converting back. Geometric, not arithmetic, because contrast is a multiplicative quantity that lives on a log scale.
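A sketch of that estimator, assuming reversal contrasts are stored as log10 values. The function name and the returned field names are illustrative; log contrast sensitivity is taken as the negative of the log threshold, since sensitivity is the reciprocal of threshold contrast.

```javascript
// Average the log contrasts at the last nLast reversals.
// Averaging in log space is the geometric mean in linear contrast.
function thresholdFromReversals(reversalLogContrasts, nLast = 6) {
  const tail = reversalLogContrasts.slice(-nLast);
  if (tail.length === 0) throw new Error("no reversals recorded");
  const meanLog = tail.reduce((sum, x) => sum + x, 0) / tail.length;
  return {
    logThreshold: meanLog,
    thresholdContrast: Math.pow(10, meanLog), // linear contrast, 0..1
    logSensitivity: -meanLog,                 // log10(1 / threshold)
  };
}

// Eight made-up reversal values; only the last six are used.
const est = thresholdFromReversals(
  [-2.1, -1.95, -2.025, -1.975, -2.025, -1.975, -2.025, -1.975]
);
console.log(est.logThreshold.toFixed(2), est.thresholdContrast.toFixed(4));
```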

Termination. The procedure stops at 8 reversals or 40 trials, whichever comes first. Eight reversals are enough for the last-six average to be stable; the 40-trial cap is a safety valve for an observer whose responses are noisy enough that the staircase isn't converging.
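One way to express the stopping rule is to return why the run ended, so that a run stopped by the trial cap can be flagged as possibly unconverged. This is a sketch: the function name and the reason strings are assumptions, not the app's API.

```javascript
// Returns null while the run should continue, otherwise the stop reason.
function stopReason(trialCount, reversalCount, maxReversals = 8, maxTrials = 40) {
  if (reversalCount >= maxReversals) return "reversals"; // normal completion
  if (trialCount >= maxTrials) return "trial-cap";       // safety valve fired
  return null;
}

console.log(stopReason(31, 8));  // "reversals"
console.log(stopReason(40, 5));  // "trial-cap"
console.log(stopReason(12, 0));  // null
```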

A staircase converging to threshold

The diagram below is a simulated trace of 2-down-1-up on a single spatial frequency, with a true threshold sitting at about 1% contrast (log contrast = −2). The y-axis is log contrast, plotted with linear-contrast labels at the right edge for intuition. The x-axis is trial number. Each filled circle is a reversal. The horizontal line is the threshold estimate, taken as the geometric mean of the last six reversals.

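[Figure: simulated 2-down-1-up staircase trace. x-axis: trial number (1 to 30). y-axis: log contrast (0.0 to −2.5), with linear-contrast labels at the right edge (100%, 32%, 10%, 3.2%, 1.0%, 0.3%). Legend: trial, reversal, geometric mean of last 6 reversals. Threshold estimate ≈ log contrast −2.01 (1.0%).]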

A few things to read off the trace. The first dozen trials walk monotonically downward: the observer was correct on every pair, so contrast dropped by a full 0.3 log units every two trials. This is the coarse-search phase; six down-steps of 0.3 carry the staircase 1.8 log units, from the starting log contrast of about −0.3 to −2.1. Trial 13 is the first miss, at log contrast −2.1, and the procedure registers reversal #1. The step size halves to 0.15. From here the trace oscillates near threshold, halving the step at each subsequent reversal until it hits the 0.05 floor. The threshold estimate, the geometric mean of the last six reversals, lands at log contrast −2.01 (about 1.0%), within 0.01 log units of the true threshold the simulation was built around.

The procedure is statistically efficient because it concentrates trials near threshold once it has found it. After trial 12, almost every trial sits within 0.1 log units of true threshold. A method-of-constant-stimuli design with the same budget would have wasted eight trials at obviously-visible contrasts and eight more at obviously-invisible ones.
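A trace like this one can be reproduced with a short simulation. The sketch below is not the code that generated the figure. It assumes a Weibull-shaped observer whose 70.7% point sits at the true threshold, a slope of 3.5, a seeded PRNG (mulberry32), a clamp at 100% contrast, and that the step is halved before the reversing move is applied. All names are illustrative.

```javascript
// Seeded PRNG (mulberry32) so a given seed always replays the same run.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Simulate one 2-down-1-up run against a hypothetical observer.
function simulateStaircase({ trueLogThreshold = -2.0, slope = 3.5, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const k = -Math.log(2 - Math.SQRT2); // makes P(correct) = 0.707 at threshold
  const pCorrect = (logC) =>
    1 - 0.5 * Math.exp(-k * Math.pow(10, slope * (logC - trueLogThreshold)));

  let logC = Math.log10(0.5); // start at 50% contrast
  let step = 0.3;             // initial step, log units
  let streak = 0;             // consecutive correct responses
  let lastMove = null;        // "up" or "down"
  let trials = 0;
  const trace = [];           // log contrast shown on each trial
  const reversals = [];       // log contrast at each change of direction

  while (trials < 40 && reversals.length < 8) {
    trials += 1;
    trace.push(logC);
    const correct = rand() < pCorrect(logC);
    let move = null;
    if (!correct) {
      streak = 0;
      move = "up";
    } else if (++streak === 2) {
      streak = 0;
      move = "down";
    }
    if (move === null) continue; // one correct so far: repeat this contrast
    if (lastMove !== null && move !== lastMove) {
      reversals.push(logC);            // record the turning point
      step = Math.max(0.05, step / 2); // halve the step, floored at 0.05
    }
    lastMove = move;
    logC = Math.min(0, logC + (move === "up" ? step : -step));
  }

  const tail = reversals.slice(-6);
  const estimate = tail.length
    ? tail.reduce((sum, x) => sum + x, 0) / tail.length
    : NaN;
  return { trials, trace, reversals, estimate };
}

const run = simulateStaircase({ seed: 7 });
console.log(
  `trials=${run.trials} reversals=${run.reversals.length} estimate=${run.estimate.toFixed(2)}`
);
```

Changing `seed` gives a different simulated observer history. Individual runs scatter around the true threshold, which is why the estimate averages several reversals rather than trusting the last one.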

Why this beats a fixed chart

The advantage over fixed-step grating charts (FACT, CSV-1000) is easy to summarize. A FACT chart at a single spatial frequency presents nine contrast steps, each shown once, at ~0.15 log unit spacing (Pelli & Bex, 2013). The total information per frequency is one row of nine binary responses.

Compare that to a staircase. Forty trials concentrated near threshold, with step sizes that shrink as the procedure converges, pin down a threshold to ~0.05–0.1 log units in a 2AFC contrast task on a stable observer (Wichmann & Hill, 2001). Replication studies of fixed-step grating charts in healthy adults report ceiling effects at the maximum step in 50–95% of observers at the peak of the CSF — the test cannot resolve "good" from "great" in that population.

The staircase also adapts to the observer. If true threshold is unexpectedly high — because of a visual deficit, mis-calibrated screen, or glare — the staircase walks up and finds it. A fixed-step chart simply hits floor.

The cost is trial count. You cannot estimate a five-point CSF in 40 trials with per-frequency staircases: each frequency needs its own staircase of up to 40 trials, or as many as 200 in total. At a few seconds per trial, that is five to seven minutes, which is what we budget for the full mode. A quick-screen mode runs a single staircase at the most informative frequency (4–6 cpd) in about a minute.
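The budget arithmetic is easy to check. In the sketch below the per-trial pace of 1.5 to 2.1 seconds is an assumption, back-calculated from the five-to-seven-minute figure rather than measured, and `sessionMinutes` is a hypothetical helper.

```javascript
// Worst-case session length when every staircase runs to the trial cap.
function sessionMinutes(nFrequencies, trialsPerStaircase, secondsPerTrial) {
  return (nFrequencies * trialsPerStaircase * secondsPerTrial) / 60;
}

console.log(sessionMinutes(5, 40, 1.5)); // full mode, fast pace: 5
console.log(sessionMinutes(5, 40, 2.1)); // full mode, slower pace: about 7
console.log(sessionMinutes(1, 40, 1.5)); // quick screen: 1
```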

More efficient procedures exist

The staircase is not the limit of what's possible. Bayesian adaptive methods — Watson and Pelli's QUEST (Watson & Pelli, 1983) and its multi-parameter descendant qCSF (Lesmes, Lu, Baek & Albright, 2010) — exploit the fact that the contrast sensitivity function is a smooth, parametric curve across spatial frequencies. A trial at one frequency informs the estimated sensitivity at every other frequency, because a parametric model ties them together. qCSF can estimate an area-under-log-CSF summary in about 25 trials total — not per frequency — and converges on a full four-parameter CSF in around 100 trials, with a reported test-retest correlation near 0.97 (Lesmes et al., 2010).

The price is implementation complexity. A qCSF maintains a four-dimensional posterior, computes an expected-information-gain integral over every candidate (contrast, frequency) pair before each trial, and depends on the observer's true CSF actually matching the chosen parametric form (the truncated log-parabola). Bugs in any of those layers produce thresholds that look plausible but are wrong, and are harder to spot than bugs in a staircase. The detailed comparison is in our Pelli-Robson vs FACT vs qCSF tour.

What we actually run

VCS-Test uses 2-down-1-up on Gabor-patch stimuli at multiple spatial frequencies. Per-frequency parameters, from js/staircase.js:

  • Starting contrast. 0.5 (50% Michelson), well above threshold for typical observers.
  • Step sizes. Initial 0.3 log units, halved at each reversal, floored at 0.05 log units.
  • Termination. 8 reversals or 40 trials, whichever comes first.
  • Threshold estimator. Geometric mean of the contrasts at the last 6 reversals (equivalently, the arithmetic mean of their log contrasts).
  • Per-eye / mode. Full mode runs one staircase per spatial frequency. Quick mode runs a single staircase at one frequency, for a screening read in under a minute.
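Collected into one place, those parameters might look like the object below. This is a hypothetical shape for illustration: the field names and the `makeConfig` helper are assumptions and are not copied from js/staircase.js.

```javascript
// Hypothetical parameter object mirroring the list above.
const STAIRCASE_DEFAULTS = Object.freeze({
  startContrast: 0.5,      // 50% Michelson
  initialStepLog: 0.3,     // log units
  minStepLog: 0.05,        // step floor, log units
  nDown: 2,                // consecutive correct responses before a down-move
  nUp: 1,                  // wrong responses before an up-move
  maxReversals: 8,
  maxTrials: 40,
  reversalsForEstimate: 6, // last N reversals averaged in log space
});

// Build a per-run config, rejecting combinations that cannot work.
function makeConfig(overrides = {}) {
  const cfg = { ...STAIRCASE_DEFAULTS, ...overrides };
  if (!(cfg.startContrast > 0 && cfg.startContrast <= 1)) {
    throw new RangeError("startContrast must be in (0, 1]");
  }
  if (cfg.minStepLog > cfg.initialStepLog) {
    throw new RangeError("minStepLog cannot exceed initialStepLog");
  }
  if (cfg.reversalsForEstimate > cfg.maxReversals) {
    throw new RangeError("cannot average more reversals than the run collects");
  }
  return Object.freeze(cfg);
}

console.log(makeConfig({ startContrast: 0.25 }).startContrast); // 0.25
```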

The choice of 2-down-1-up over qCSF for v0 was deliberate. The math fits in a paragraph and is fully verifiable end-to-end. The procedure handles unusual observers — outside the parametric prior, on a noisy mobile screen — without quietly smoothing them into a model. And the precision per session is appropriate for the variance budget of consumer-hardware testing, where display calibration introduces its own floor and spending extra procedural precision on top is wasted effort. A future version may add a qCSF mode for users who want full-curve estimation in fewer trials.

The implementation is open-source and lives at /methodology; the source file is roughly 80 lines of JavaScript with no dependencies.

Try it

The math here is the engine. The full test takes about five minutes; the quick screen takes about one. The result is a per-frequency log contrast sensitivity curve plotted against age-stratified normative ranges from published cohorts, plus the same number that a Pelli-Robson chart would give you (sensitivity at the low-frequency anchor). Take the test to see what the staircase converges to on your own eyes.

Note: A contrast sensitivity test, however well-implemented, is a screening signal of overall visual function. It is not a diagnostic test for any specific condition.

References

  • Cornsweet, T. N. (1962). The staircase-method in psychophysics. American Journal of Psychology, 75, 485–491. The introduction of the modern adaptive staircase — a 1-up-1-down procedure converging to 50% correct — that Levitt later generalized.
  • Levitt, H. (1971). Transformed up-down methods in psychoacoustics. The Journal of the Acoustical Society of America, 49(2), 467–477. Derivation of the transformed up-down family, including the 2-down-1-up rule's convergence to 70.7% correct on the psychometric function.
  • Watson, A. B., & Pelli, D. G. (1983). QUEST: a Bayesian adaptive psychometric method. Perception & Psychophysics, 33(2), 113–120. The Bayesian alternative to the staircase, two to three times more efficient per trial at the cost of implementation complexity and prior dependence.
  • Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293–1313. Empirical basis for the threshold-precision rule of thumb (0.05–0.1 log units from a manageable trial budget) cited above.
  • Lesmes, L. A., Lu, Z.-L., Baek, J., & Albright, T. D. (2010). Bayesian adaptive estimation of the contrast sensitivity function: the quick CSF method. Journal of Vision, 10(3):17. The qCSF method, extending Bayesian adaptive estimation from a single threshold to the four-parameter CSF curve.
  • Pelli, D. G., & Bex, P. (2013). Measuring contrast sensitivity. Vision Research, 90, 10–14. A methodological review of contrast sensitivity measurement, including critique of fixed-step grating charts.
