Why doesn't the test just show a few fixed contrast levels and ask if I can see them?

Because most of those trials would be wasted — a level far above your threshold is trivially visible, and one far below it is invisible, so the response carries almost no information about where your threshold actually sits. An adaptive staircase instead moves the contrast toward your threshold trial by trial, concentrating measurement where it's informative.

What does '2-down-1-up' mean?

It's the rule governing how contrast changes between trials: after two consecutive correct responses, the contrast is lowered (made harder); after a single wrong response, it's raised (made easier). This rule mathematically converges on the contrast level where you answer correctly 70.7% of the time.

How is my final threshold number calculated?

It's the geometric mean of the contrast values at your last six staircase reversals — the points where the direction changed from getting harder to getting easier, or vice versa. The staircase stops once it reaches 8 reversals or 40 trials, whichever happens first.

Why not use a more advanced method like qCSF instead of a staircase?

Bayesian methods like qCSF can estimate a full contrast sensitivity curve in fewer trials, but they're more complex to implement and rely on assuming your true sensitivity curve matches a specific mathematical shape. The 2-down-1-up staircase was chosen because its logic is simple enough to verify end to end and it handles unusual observers without being constrained by a model.

Adaptive staircase methods: why one number isn't enough

A naive way to measure contrast sensitivity is to pick five contrast levels, show each to the observer, and ask which they can see. The procedure is brisk. It is also wrong in ways that are hard to spot. The answer depends on which five contrasts you chose, on how many trials you ran per level, on the lighting in the room, and on whether the observer's attention happened to wander during a critical trial. Most trials are wasted on contrasts that are trivially visible or clearly invisible, where the response carries almost no information about where threshold sits. The trials near threshold — the ones that matter — are too few to pin anything down.

Modern psychophysics gets around this by making the test adaptive: the next contrast to present depends on what the observer just did. Easy responses are followed by harder trials; errors by easier ones. Done correctly, the procedure converges on threshold and most of the trial budget is spent where measurement is informative. The workhorse adaptive procedure — what we run inside this app — is the 2-down-1-up transformed up-down staircase of Levitt (1971).¹ This post explains what threshold means, why 2-down-1-up converges where it does, what the procedure looks like trial by trial, and what tradeoffs we accepted when we chose it.

What "threshold" actually means

Contrast threshold is not a hard line. There is no single contrast above which the observer always succeeds and below which they always fail. What exists instead is a psychometric function: a smooth curve mapping stimulus contrast to the observer's probability of a correct response.

In a two-alternative forced-choice (2AFC) task — "is the patch tilted left or right?" — the psychometric function starts at chance (0.5) when contrast is far below threshold, because the observer is guessing, and rises monotonically toward 1.0 at high contrast. Between these two plateaus is a steep middle region. The convention in psychophysics is to define threshold as the contrast at some specified point on this rising curve. Different procedures target different points: 50% (impossible in 2AFC), 70.7%, 79.4%, 84.1%. The target is chosen to land in the steep middle, where the curve changes quickly with contrast, because that is where each trial carries the most information about where the curve sits horizontally.
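To make that shape concrete, here is a minimal sketch of a 2AFC psychometric function. Only the 0.5 floor and the rise toward 1.0 come from the text; the logistic form, the `slope` value, and the name `pCorrect2AFC` are illustrative assumptions, not something the app is known to use.

```javascript
// Illustrative 2AFC psychometric function.
// ASSUMPTION: logistic shape on log10 contrast; the post does not specify a form.
// midpointLog: log10 contrast at the curve's midpoint (75% correct for this form).
// slope: assumed steepness in log units; smaller values give a steeper curve.
function pCorrect2AFC(logContrast, midpointLog = -2.0, slope = 0.1) {
  const detect = 1 / (1 + Math.exp(-(logContrast - midpointLog) / slope));
  // Pure guessing gives 0.5; detection lifts performance toward 1.0.
  return 0.5 + 0.5 * detect;
}

// Walk from far below to far above the midpoint.
for (const logC of [-3.0, -2.2, -2.0, -1.8, -1.0]) {
  console.log(logC.toFixed(1), pCorrect2AFC(logC).toFixed(3));
}
```

The printed values sit near 0.5 on the left, near 1.0 on the right, and change fastest around the midpoint, which is the region the staircase tries to sample.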

For 2-down-1-up the target is 70.7% correct, derived below. When we report a log contrast sensitivity number, it is derived from the contrast at which this observer would get 70.7% of orientation judgments right in the long run, not the contrast at which they "can see" the patch in some absolute sense.

Contrast threshold is also a property of the observer plus the task. We use 2AFC orientation discrimination because at low contrast you cannot tell tilt direction unless you can detect the patch, so orientation and detection thresholds collapse to the same number, while the response stays forced and objective (no "I'm not sure" key).

The 2-down-1-up rule

The rule is short. The math behind why it works takes a little longer.

Rule. Start at some easy contrast. After two consecutive correct responses, drop the contrast one step (make the next trial harder). After one wrong response, raise the contrast one step (make it easier). Any wrong answer resets the consecutive-correct counter. Repeat.

Why 70.7%. Consider the procedure once it has stopped drifting and is bouncing around its asymptote. At that asymptote, the long-run rate of down-moves equals the rate of up-moves; otherwise contrast would still be drifting in one direction. Group the trials into decision units, each ending in either an up-move or a down-move. A down-move requires two consecutive correct responses, so its probability is p² (where p is the observer's probability of a correct response at that contrast). An up-move is the complementary event: the unit ends in an up-move if the first response is wrong (probability 1 − p) or if the first is right and the second is wrong (probability p(1 − p)), and those two cases sum to 1 − p². So:

P(down) = p²
P(up)   = 1 − p²

Setting the two equal (the asymptotic balance condition) gives p² = 0.5 and p ≈ 0.707. The 2-down-1-up rule therefore homes in on the contrast at which the observer is correct on 70.7% of trials. Levitt (1971) generalizes this to other transformed up-down rules: 3-down-1-up converges to p ≈ 0.794, 4-down-1-up to p ≈ 0.841, and 1-down-1-up to p = 0.5 (useless in 2AFC, where 0.5 is chance).¹ The procedural roots go back to Cornsweet (1962), who popularized the adaptive staircase in psychophysics in its simple 1-down-1-up form, though the up-down method itself is older, and Levitt cites earlier statistical work behind it.²
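The same balance argument gives pⁿ = 0.5 for an n-down-1-up rule, so the convergence points quoted above can be checked in a few lines. The function name `convergencePoint` is made up for this sketch.

```javascript
// For an n-down-1-up rule, a down-move needs n consecutive correct responses,
// so the balance condition is p^n = 0.5 and p = 0.5^(1/n).
function convergencePoint(nDown) {
  return Math.pow(0.5, 1 / nDown);
}

for (const n of [1, 2, 3, 4]) {
  console.log(`${n}-down-1-up -> ${(100 * convergencePoint(n)).toFixed(1)}% correct`);
}
// 1-down-1-up -> 50.0% correct
// 2-down-1-up -> 70.7% correct
// 3-down-1-up -> 79.4% correct
// 4-down-1-up -> 84.1% correct
```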

Step sizes. Early in the run, the staircase needs to cover ground quickly; near threshold, the steps need to shrink for precision. The conventional move is to halve the step size at each reversal — every trial where the staircase changes direction. Our implementation starts at 0.3 log units (a factor of 2 in linear contrast), halves at each reversal, and floors at 0.05 log units.
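A short sketch of that schedule, using the 0.3 start and 0.05 floor from the text. The function name and signature are hypothetical and are not taken from js/staircase.js.

```javascript
// Step size (in log10 units) in force after a given number of reversals:
// start at `initial`, halve once per reversal, never go below `floor`.
function stepAfterReversals(reversals, initial = 0.3, floor = 0.05) {
  return Math.max(floor, initial / 2 ** reversals);
}

for (let r = 0; r <= 4; r++) {
  const step = stepAfterReversals(r);
  // 10^step is the multiplicative change in linear contrast for one step.
  console.log(r, step.toFixed(4), (10 ** step).toFixed(2));
}
```

The steps run 0.3, 0.15, 0.075, and then stay at 0.05, because the next halving (0.0375) would fall below the floor. So the floor is reached at the third reversal.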

Threshold estimate. The threshold is the geometric mean of the contrasts at the last six reversals — equivalently, the arithmetic mean of the last six log-contrasts. Geometric, not arithmetic, because contrast is a multiplicative quantity that lives on a log scale.
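The estimator is small enough to sketch directly. The name `thresholdFromReversals` and the sample values are invented for illustration; only the rule itself (geometric mean of the last six reversal contrasts) comes from the text.

```javascript
// Geometric mean of the last `lastN` reversal contrasts, computed as the
// arithmetic mean of their log10 values and converted back to linear contrast.
function thresholdFromReversals(reversalContrasts, lastN = 6) {
  const tail = reversalContrasts.slice(-lastN);
  if (tail.length === 0) throw new Error("no reversals recorded");
  const meanLog = tail.reduce((sum, c) => sum + Math.log10(c), 0) / tail.length;
  return { logThreshold: meanLog, threshold: 10 ** meanLog };
}

// Made-up pair of contrasts, a factor of 2 either side of 1%.
const pair = [0.005, 0.02];
console.log(thresholdFromReversals(pair).threshold.toFixed(4)); // geometric mean
console.log(((pair[0] + pair[1]) / 2).toFixed(4));              // arithmetic mean
```

For this pair the geometric mean is 0.0100 while the arithmetic mean is 0.0125. The arithmetic mean is pulled toward the larger value, which is why averaging is done on the log scale.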

Termination. The procedure stops at 8 reversals or 40 trials, whichever comes first. Eight reversals are enough for the last-six average to be stable; the 40-trial cap is a safety valve for an observer whose responses are noisy enough that the staircase isn't converging.
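Putting the rule, the step schedule, and the stopping condition together, a run might look like the sketch below. This is not the code in js/staircase.js: the function name, the `respond` callback, the option names, and the clamp at 100% contrast are all assumptions made for illustration.

```javascript
// Sketch of one 2-down-1-up run using the parameters described above.
// `respond(logContrast)` is a hypothetical callback: true means a correct answer.
function runStaircase(respond, opts = {}) {
  const {
    startContrast = 0.5, initialStep = 0.3, minStep = 0.05,
    maxReversals = 8, maxTrials = 40,
  } = opts;
  let logC = Math.log10(startContrast);
  let step = initialStep;
  let streak = 0;       // consecutive correct responses
  let lastDir = 0;      // -1 = last move was down, +1 = up, 0 = no move yet
  let trials = 0;
  const reversals = []; // log10 contrast at each reversal

  while (reversals.length < maxReversals && trials < maxTrials) {
    trials += 1;
    let dir = 0;
    if (respond(logC)) {
      streak += 1;
      if (streak === 2) { dir = -1; streak = 0; } // two in a row: harder
    } else {
      streak = 0;                                 // any error resets the count
      dir = +1;                                   // one error: easier
    }
    if (dir === 0) continue;                      // first correct of a pair: no move
    if (lastDir !== 0 && dir !== lastDir) {
      reversals.push(logC);                       // direction changed here
      step = Math.max(minStep, step / 2);         // halve the step, respect the floor
    }
    // ASSUMPTION: clamp at log10(1.0) = 0, since contrast cannot exceed 100%.
    logC = Math.min(0, logC + dir * step);
    lastDir = dir;
  }
  return { trials, reversals, finalLogContrast: logC };
}

// Idealized, noise-free observer: correct whenever contrast is above 1%.
const ideal = (logC) => logC > -2.0;
const run = runStaircase(ideal);
console.log(run.trials, run.reversals.map((r) => r.toFixed(3)));
```

With this idealized observer the run ends after 24 trials with 8 reversals, the first at about log contrast −2.10 on trial 13. A real observer responds probabilistically, so actual traces are noisier and settle near the 70.7% point, not on a hard boundary.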

A staircase converging to threshold

The diagram below is a simulated trace of 2-down-1-up on a single spatial frequency, with a true threshold sitting at about 1% contrast (log contrast = −2). The y-axis is log contrast, plotted with linear-contrast labels at the right edge for intuition. The x-axis is trial number. Each larger filled dot marks a reversal. The horizontal line is the threshold estimate, taken as the geometric mean of the last six reversals.

A simulated 2-down-1-up staircase converging on contrast threshold. Each pair of correct responses lowers the contrast (harder); a single error raises it (easier), so the rule homes in on the 70.7%-correct point. The coarse search drops fast for the first dozen trials; the first error near 1% contrast triggers reversal #1, and from there the step size halves at every reversal (the marked turning points) while the trace oscillates in a narrowing band around threshold. The dashed line is the threshold estimate, the geometric mean of the last six reversals, at log contrast −2.0 (about 1% contrast). Illustrative trace of the transformed up-down method of Levitt (1971); the response sequence is schematic, not measured data.

Illustrative; 2-down-1-up transformed up-down method after Levitt (1971), adaptive staircase after Cornsweet (1962). Step sizes start at 0.3 log units and halve at each reversal (0.05 floor); threshold = geometric mean of the last 6 reversals, the parameters the post describes.

A few things to read off the trace. The first dozen trials walk monotonically downward — the observer was correct on every pair, so contrast dropped by a full 0.3 log unit every two trials. This is the coarse-search phase; the staircase covers nearly two log units of contrast in a dozen trials. Trial 13 is the first miss, at log contrast −2.1, and the procedure registers reversal #1. The step size halves to 0.15. From here the trace oscillates near threshold, halving the step at each subsequent reversal until it hits the 0.05 floor. The threshold estimate, the geometric mean of the last six reversals, lands at log contrast −2.0 — about 1.0% — right at the true threshold the simulation was built around.

The procedure is statistically efficient because it concentrates trials near threshold once it has found it. After trial 12, almost every trial sits within 0.1 log units of true threshold. A method-of-constant-stimuli design with the same budget would have wasted eight trials at obviously-visible contrasts and eight more at obviously-invisible ones.

Why this beats a fixed chart

The advantage over fixed-step grating charts (FACT, CSV-1000) is easy to summarize.³ A FACT chart at a single spatial frequency presents nine contrast steps, each shown once, at roughly 0.15 log-unit spacing. The total information per frequency is one row of nine binary responses.

Compare that to a staircase. Forty trials concentrated near threshold, with step sizes that shrink as the procedure converges, can pin down a threshold to roughly 0.05–0.1 log units for a stable observer in a 2AFC contrast task — the kind of precision a well-sampled psychometric function supports.⁴ Replication studies of fixed-step grating charts in healthy adults frequently report ceiling effects at the maximum step near the peak of the CSF — the test often cannot resolve "good" from "great" in that population.

The staircase also adapts to the observer. If true threshold is unexpectedly high — because of a visual deficit, mis-calibrated screen, or glare — the staircase walks up and finds it. A fixed-step chart simply hits floor.

The cost is trial count. You cannot estimate a five-point CSF in 40 trials with per-frequency staircases: each frequency needs its own run of up to 40 trials, so the full curve can take up to roughly 200. At a few seconds per trial, that is five to seven minutes, which is what we budget for the full mode. A quick-screen mode runs a single staircase at the most informative frequency (4–6 cpd) in about a minute.

More efficient procedures exist

The staircase is not the limit of what's possible. Bayesian adaptive methods — Watson and Pelli's QUEST (Watson & Pelli, 1983)⁵ and its multi-parameter descendant qCSF (Lesmes, Lu, Baek & Albright, 2010)⁶ — exploit the fact that the contrast sensitivity function is a smooth, parametric curve across spatial frequencies. A trial at one frequency informs the estimated sensitivity at every other frequency, because a parametric model ties them together. qCSF can estimate an area-under-log-CSF summary in about 25 trials total — not per frequency — and converges on a full four-parameter CSF in around 100 trials, with a reported test–retest correlation of about 0.96 (Lesmes et al., 2010).⁶

The price is implementation complexity. A qCSF implementation maintains a four-dimensional posterior, computes an expected-information-gain integral over every candidate (contrast, frequency) pair before each trial, and depends on the observer's true CSF actually matching the chosen parametric form (the truncated log-parabola). Bugs in any of those layers produce thresholds that look plausible but are wrong, and they are harder to spot than bugs in a staircase. The detailed comparison is in our Pelli-Robson vs FACT vs qCSF tour.

What we actually run

VCS-Test uses 2-down-1-up on Gabor-patch stimuli at multiple spatial frequencies. Per-frequency parameters, from js/staircase.js:

Starting contrast. 0.5 (50% Michelson), well above threshold for typical observers.
Step sizes. Initial 0.3 log units, halved at each reversal, floored at 0.05 log units.
Termination. 8 reversals or 40 trials, whichever comes first.
Threshold estimator. Geometric mean of the last 6 reversals (equivalently, arithmetic mean of the last 6 log-contrasts).
Per-eye / mode. Full mode runs one staircase per spatial frequency. Quick mode runs a single staircase at one frequency, for a screening read in under a minute.

The choice of 2-down-1-up over qCSF for v0 was deliberate. The math fits in a paragraph and is fully verifiable end-to-end. The procedure handles unusual observers — outside the parametric prior, on a noisy mobile screen — without quietly smoothing them into a model. And the precision per session is appropriate for the variance budget of consumer-hardware testing, where display calibration introduces its own floor and spending extra procedural precision on top is wasted effort. A future version may add a qCSF mode for users who want full-curve estimation in fewer trials.

The implementation is open-source and lives at /methodology; the source file is roughly 80 lines of JavaScript with no dependencies.

Try it

The math here is the engine. The full test takes about five minutes; the quick screen takes about one. The result is a per-frequency log contrast sensitivity curve plotted against age-stratified normative ranges from published cohorts, plus the same number that a Pelli-Robson chart would give you (sensitivity at the low-frequency anchor). Take the test to see what the staircase converges to on your own eyes.

Note: A contrast sensitivity test, however well-implemented, is a screening signal of overall visual function. It is not a diagnostic test for any specific condition.

1. Levitt H. Transformed up-down methods in psychoacoustics. J Acoust Soc Am. 1971;49(2):467–477. Derives the transformed up-down family; the 2-down-1-up rule converges on the contrast where the observer is correct 70.7% of the time (Table I confirms 3-down-1-up → 79.4%, 4-down-1-up → 84.1%, 1-down-1-up → 50%).
2. Cornsweet TN. The staircase-method in psychophysics. Am J Psychol. 1962;75:485–491. Popularized the adaptive staircase in psychophysics in its simple 1-down-1-up form (which converges to 50% correct); the up-down method itself predates it.
3. Pelli DG, Bex P. Measuring contrast sensitivity. Vision Res. 2013;90:10–14. A methodological review of contrast sensitivity measurement, including the case for forced-choice adaptive procedures over fixed-step grating charts.
4. Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys. 2001;63(8):1293–1313. Analyses how well a psychometric function, and therefore a threshold, can be estimated from a finite set of trials.
5. Watson AB, Pelli DG. QUEST: a Bayesian adaptive psychometric method. Percept Psychophys. 1983;33(2):113–120. The Bayesian alternative to the staircase: more efficient per trial, at the cost of implementation complexity and dependence on a prior.
6. Lesmes LA, Lu Z-L, Baek J, Albright TD. Bayesian adaptive estimation of the contrast sensitivity function: the quick CSF method. J Vis. 2010;10(3):17. Extends Bayesian adaptive estimation from a single threshold to the four-parameter CSF; reports an area-under-log-CSF summary in about 25 trials and a full CSF in about 100 trials, with a test–retest correlation of roughly 0.96.
