Two friends take the same online vision test on different devices — one on a 27-inch desktop monitor, the other on a 15-inch laptop. They get different scores. The question is: is the test broken, or is one of the screens lying?
Both, but mostly the screens.
Every setup has its own brightness, gamma, pixel pitch and viewing distance, and every user has their own habits. A contrast sensitivity test that does not account for these is, strictly speaking, not testing the eye. It is testing the monitor, with the eye attached as a kind of biological accessory.
This post is about why our test asks you to do a handful of small calibration tasks before showing any stripes, and what each one prevents. None of it is mysterious; it is mostly geometry and the optics of LCD panels. But the magnitudes of the errors are big enough that explaining them feels honest, and the alternative — opaque "we've taken care of it" — is the move of every other tool we have polite professional disagreements with.
What changes from screen to screen
Five physical properties of a display affect what a contrast sensitivity test is actually presenting.
Pixel pitch. How many physical millimetres each pixel occupies. A 4K phone has a pitch of roughly 0.05 mm; a 24-inch 1080p desktop monitor has a pitch closer to 0.28 mm, about six times larger. A Gabor patch drawn at a fixed pixel size is therefore a different physical stimulus on each one. Because contrast sensitivity is specified in cycles per degree of visual angle, you need millimetres-per-pixel to render anything correctly.
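As a rough illustration of the pitch figures above, here is a minimal Python sketch. The function names are hypothetical, and it assumes square pixels and a diagonal quoted in inches.

```python
import math


def pixel_pitch_mm(diagonal_in: float, width_px: int, height_px: int) -> float:
    """Physical size of one pixel in millimetres, assuming square pixels."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_in * 25.4 / diagonal_px


def patch_width_mm(patch_px: int, pitch_mm: float) -> float:
    """Physical width of a patch that is patch_px pixels wide."""
    return patch_px * pitch_mm


desktop_pitch = pixel_pitch_mm(24.0, 1920, 1080)
print(f"24-inch 1080p pitch: {desktop_pitch:.3f} mm")
print(f"200 px on the desktop: {patch_width_mm(200, desktop_pitch):.1f} mm")
print(f"200 px at 0.05 mm pitch: {patch_width_mm(200, 0.05):.1f} mm")
```

With these numbers the same 200-pixel patch is about 55 mm wide on the desktop monitor and about 10 mm wide on the phone.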
Luminance. The peak brightness of "white." A budget laptop in a sunny room maxes out around 250 nits; a premium tablet can push 600 nits or more. Contrast sensitivity itself shifts as a function of mean luminance — the visual system operates in different regimes depending on whether you're adapted to a dim or a bright field — so the same nominal stimulus is perceptually not the same stimulus on different displays.
Gamma. The curve that maps the 0-to-255 pixel value your code writes into the physical luminance your screen emits. Almost every consumer display approximates the sRGB transfer function, close to a 2.2 power gamma. The consequence: if a test writes pixel value 128 expecting "half luminance," the screen actually emits about 22% of the maximum, not 50%. A naïve test that asks for "10% Michelson contrast" without inverting that gamma physically presents about 22% — a factor-of-two error in the variable the test is trying to measure.
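The two 22% figures above can be checked with a few lines of Python. This is a sketch under one stated assumption: the display follows a pure 2.2 power law, which only approximates the real sRGB curve. The function names are illustrative.

```python
def emitted_fraction(pixel_value: float, gamma: float = 2.2,
                     max_value: float = 255.0) -> float:
    """Fraction of peak luminance emitted for a pixel value (pure power law)."""
    return (pixel_value / max_value) ** gamma


def michelson(l_max: float, l_min: float) -> float:
    """Michelson contrast of a pattern with the given luminance extremes."""
    return (l_max - l_min) / (l_max + l_min)


def presented_contrast(nominal: float, gamma: float = 2.2,
                       mean_level: float = 0.5) -> float:
    """Physical contrast when nominal contrast is written straight into
    pixel values (normalised 0..1) with no gamma correction."""
    hi = mean_level * (1 + nominal)
    lo = mean_level * (1 - nominal)
    return michelson(hi ** gamma, lo ** gamma)


print(f"pixel 128 emits {emitted_fraction(128):.1%} of peak luminance")
print(f"nominal 10% contrast is presented as {presented_contrast(0.10):.1%}")
```

Both lines print a value close to 22%: one is the luminance of mid-gray, the other the contrast that actually reaches the eye.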
Viewing distance. For a fixed on-screen grating, spatial frequency in cycles per degree scales almost linearly with distance. A grating that measures 6 cycles per degree at 57 cm measures about 3 cycles per degree at 28 cm (laptop-on-knees distance). Self-reported "arm's length" varies by 30 cm or more across adults, and that uncertainty propagates linearly into spatial frequency.
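The distance dependence can be sketched in Python as follows. The function name and the 1.658 mm grating period are illustrative values chosen to land on 6 cycles per degree at 57 cm; they are not taken from the test itself.

```python
import math


def cycles_per_degree(period_mm: float, distance_mm: float) -> float:
    """Spatial frequency of a grating with the given physical period,
    viewed head-on from distance_mm."""
    degrees_per_cycle = math.degrees(2 * math.atan(period_mm / (2 * distance_mm)))
    return 1.0 / degrees_per_cycle


period_mm = 1.658  # illustrative grating period on the screen
for distance_mm in (570, 280):
    cpd = cycles_per_degree(period_mm, distance_mm)
    print(f"{distance_mm} mm: {cpd:.2f} cycles per degree")
```

Halving the distance roughly halves the spatial frequency: about 6.0 cycles per degree at 570 mm and about 2.9 at 280 mm.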
Colour profile and dynamic range. macOS ships with P3-default displays; Windows colour profiles vary; HDR mode can re-map the luminance range above what an sRGB-aware app assumes. None of this matters for the carrier of an achromatic Gabor patch, but it matters for the mid-gray surround, the reference white, and whether the gamma model you measured holds.
Variation along all five axes is normal. The job of calibration is not to eliminate the variation — that would need lab equipment a web page doesn't have — but to estimate what matters and render the stimulus to a known specification, in degrees of visual angle and Michelson contrast, on whatever screen is in front of you.
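Rendering "to a known specification" runs that geometry in reverse: given a target spatial frequency, an estimated distance and a pixels-per-millimetre figure, work out how many pixels one grating cycle should span. A minimal sketch, with hypothetical names and illustrative inputs:

```python
import math


def pixels_per_cycle(target_cpd: float, distance_mm: float,
                     px_per_mm: float) -> float:
    """Pixels that one grating cycle must span to hit target_cpd."""
    degrees_per_cycle = 1.0 / target_cpd
    period_mm = 2 * distance_mm * math.tan(math.radians(degrees_per_cycle / 2))
    return period_mm * px_per_mm


# Illustrative inputs: 6 cpd target, 570 mm distance, 3.614 px/mm
# (roughly a 24-inch 1080p monitor).
print(f"{pixels_per_cycle(6.0, 570.0, 3.614):.2f} px per cycle")
```

On that example setup a 6 cycles-per-degree grating needs about 6 pixels per cycle, which also shows how little headroom a coarse display leaves for higher spatial frequencies.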
What goes wrong if you don't calibrate
Concretely, here is what an uncalibrated remote test gets wrong:
- Wrong spatial frequency. You think you're testing 6 cpd; the user is sitting at 30 cm instead of the assumed 57 cm, so you're actually presenting about 3 cpd. Threshold at 3 cpd is, in a healthy young adult, roughly twice as good as at 6 cpd. Your result is biased by an entire octave of the contrast sensitivity function.
- Wrong contrast. You think you're showing 5% Michelson contrast. The user's display has a gamma you didn't measure. Physical contrast on the screen is closer to 10%. Your "threshold" is the contrast at which the user could just see a stimulus twice as easy as you thought.
- Wrong adaptation state. The first trials are biased because the user just came from a bright phone notification screen and the visual system has not equilibrated. Calibration research emphasises that proper rendering on consumer displays is a precondition for valid contrast-threshold measurements, not an optional add-on (Allard & Faubert, 2008).
- Test-retest noise dwarfs the signal. The smallest clinically meaningful change on a validated chart is around 0.3 log units (Pelli, Robson & Wilkins, 1988); uncalibrated noise from any of the above easily exceeds that. The "result" is some number — but not a measurement of contrast sensitivity.
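To put the contrast error from the list above on the same scale as the 0.3 log unit figure, a small Python sketch helps. It assumes errors are expressed in log10 units of contrast, the usual convention for contrast sensitivity; the function name is illustrative.

```python
import math


def log_unit_error(assumed_contrast: float, actual_contrast: float) -> float:
    """Size of a contrast error in log10 units."""
    return abs(math.log10(actual_contrast / assumed_contrast))


# The gamma example above: 5% assumed, about 10% physically presented.
error = log_unit_error(0.05, 0.10)
print(f"factor-of-two contrast error: {error:.2f} log units")
```

A factor-of-two contrast error is about 0.30 log units, so an unmeasured gamma alone can be as large as the smallest change the test is supposed to detect.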
The first goal of calibration is to make the test the same test for every user. The second, equally important, is an honest error budget. We can't make the error zero. We can make it characterisable, and small enough that real changes in the visual system register as real changes in the result.
Our calibration steps, explained
There are four of them, and they take a couple of minutes in total. The diagram below shows the flow; each one is doing a specific physical job.
Step 1 — Credit-card resize
A virtual rectangle on screen; you drag a handle until it matches the physical width of a credit card or ID. Credit cards and most national IDs are manufactured to ISO/IEC 7810 ID-1, which fixes the card at 85.60 × 53.98 mm with tolerances of tenths of a millimetre. Once the on-screen rectangle matches, the test knows how many pixels span 85.60 mm on your display, and pixels-per-millimetre falls out by division.
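The division in question is a one-liner. A minimal Python sketch, in which the function name and the 321-pixel matched width are illustrative:

```python
ID1_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card width


def px_per_mm(matched_width_px: float) -> float:
    """Pixels per millimetre, given the on-screen width (in pixels) at which
    the rectangle matched a physical ID-1 card."""
    if matched_width_px <= 0:
        raise ValueError("matched width must be positive")
    return matched_width_px / ID1_WIDTH_MM


scale = px_per_mm(321.0)  # illustrative matched width
print(f"{scale:.2f} px/mm, pitch {1 / scale:.3f} mm")
```

If the rectangle is measured in CSS pixels, the result is CSS pixels per millimetre; multiplying by devicePixelRatio would be one way to convert it to device pixels.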
This step exists because the browser cannot ask the operating system for your screen's physical size. The devicePixelRatio API returns a ratio between device pixels and CSS pixels, but the CSS pixel is defined as an angular unit, not a physical one — roughly the visual angle of 1/96 of an inch at arm's length. No JavaScript API returns millimetres. A physical reference object is the only way out.
Step 2 — Blind-spot distance
You close one eye, fixate a small cross on the left, and slide a dot to the right until it disappears into your blind spot — the patch of retina where the optic nerve exits, which has no photoreceptors. The blind spot sits at a stable eccentricity from the fovea, roughly 13.5° on the temporal side. Once we know how far the dot was from the fixation point in millimetres (from step 1), the angular geometry — distance equals offset divided by tan(13.5°) — gives the screen-to-eye distance directly.
This is the "virtual chinrest" method validated by Li and colleagues in 2020, with a mean absolute error of about 3.25 cm at viewing distances between 43 and 66 cm — accurate enough for psychophysics on most real-world setups, and dramatically better than self-reported "arm's length" (Li, Joo, Yeatman & Reinecke, 2020).
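The geometry of the blind-spot step can be sketched in Python as follows. It assumes the 13.5° eccentricity quoted above and a flat screen viewed head-on; the function name and example numbers are illustrative.

```python
import math

BLIND_SPOT_DEG = 13.5  # eccentricity quoted in the text


def viewing_distance_mm(offset_px: float, px_per_mm: float,
                        blind_spot_deg: float = BLIND_SPOT_DEG) -> float:
    """Eye-to-screen distance from the fixation-to-dot offset at which
    the dot disappears into the blind spot."""
    offset_mm = offset_px / px_per_mm
    return offset_mm / math.tan(math.radians(blind_spot_deg))


# Illustrative numbers: dot vanished 513 px from fixation at 3.75 px/mm.
distance = viewing_distance_mm(513.0, 3.75)
print(f"estimated viewing distance: {distance / 10:.1f} cm")
```

Here a 136.8 mm offset corresponds to a viewing distance of about 57 cm. Any error in the pixels-per-millimetre figure from step 1 carries through proportionally, which is why the card step comes first.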
Step 3 — Stripe-match gamma
The cleverest of the four. The test shows a small patch of very thin, one-pixel-wide alternating black-and-white stripes next to a uniform mid-gray patch. A slider adjusts the gray patch's pixel value. You step back, or squint, until the stripes and the gray blur into the same brightness; then you stop.
The physics: the stripes physically emit, on average, half of the maximum luminance — (L_max + L_min) / 2 ≈ L_max / 2, because half the pixels are at the maximum and half are at the minimum. The gray patch emits L_max · (v/255)^γ, where v is the slider's pixel value and γ is your display's gamma. At the match, those are equal: (v/255)^γ = 0.5, so γ = log(0.5) / log(v/255). With one observation, we have your display's gamma.
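The match equation translates directly into Python. This sketch assumes, as the derivation above does, that black emits approximately no light and that the display follows a single power law; the function name is illustrative.

```python
import math


def gamma_from_match(matched_value: float, max_value: float = 255.0) -> float:
    """Display gamma from the gray level that matched the 50% stripe patch."""
    ratio = matched_value / max_value
    if not 0.0 < ratio < 1.0:
        raise ValueError("matched value must lie strictly between 0 and max")
    return math.log(0.5) / math.log(ratio)


for v in (186, 128):
    print(f"match at pixel value {v}: gamma = {gamma_from_match(v):.2f}")
```

A match near pixel value 186 implies a gamma of about 2.2, while a match near 128 would imply a nearly linear display with a gamma of about 1.0.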
The trick works because the visual system spatially averages high-frequency patterns at a distance. Calibrated luminance probes do the same with a sensor; this does it with an eye. Gamma is recoverable to about ±0.1 — more than enough for our purposes. Once we have it, stimulus rendering happens in linear-light space and the inverse gamma is applied at write time. Sub-1/256 contrast steps use a dithering technique to render cleanly without quantisation banding — the canonical reference is Allard & Faubert's "noisy-bit" method, validated not to bias contrast thresholds (Allard & Faubert, 2008). Bach's Freiburg Acuity and Contrast Test (FrACT) was the first widely-used browser-based implementation to spell out these display issues, and remains the standard reference for calibration on consumer hardware (Bach, 1996).
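The two rendering steps mentioned above, inverse gamma at write time and dithering of fractional pixel values, can be sketched in Python. This is a simplified illustration of the stochastic-rounding idea behind noisy-bit dithering, not the published implementation and not the renderer this test uses. All names are hypothetical.

```python
import math
import random


def linear_to_pixel(linear: float, gamma: float, max_value: float = 255.0) -> float:
    """Inverse gamma: desired luminance fraction (0..1) to a fractional pixel value."""
    return max_value * linear ** (1.0 / gamma)


def noisy_bit(value: float, rng: random.Random) -> int:
    """Round a fractional pixel value up or down at random so that the
    expected result equals the requested value."""
    base = math.floor(value)
    return base + (1 if rng.random() < value - base else 0)


rng = random.Random(0)  # seeded only to make the example repeatable
target = linear_to_pixel(0.5, gamma=2.2)  # fractional value, about 186.08
samples = [noisy_bit(target, rng) for _ in range(20000)]
print(f"target {target:.2f}, mean of dithered pixels {sum(samples) / len(samples):.2f}")
```

Each pixel is still an integer, but the average over many pixels (or frames) lands on the fractional target, which is what lets an 8-bit display present contrast steps finer than 1/256.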
Step 4 — Mid-gray adaptation
A thirty-second uniform mid-gray screen — the same mid-gray that will surround the test stimulus. This is not a calibration of the device; it is a calibration of you. Retinal sensitivity is set by the average luminance you have recently been adapted to, and the visual system has both light/dark and contrast adaptation timescales. The early seconds of any new luminance environment carry a transient. A test that drops you from a colourful instructions page straight into trials is measuring partly your contrast sensitivity and partly that transient. Thirty seconds settles it.
What we can't fix from a web app
The honest part. There are limits to remote calibration, and we hit them.
Really bad displays can't be rescued. A glossy laptop screen in direct sunlight, with ambient veiling glare added to both the bright and dark parts of a stimulus, presents a smaller effective contrast than the calibration assumed. We can detect "very bright" with a webcam frame if you grant access, but we cannot undo glare in software.
OS-level filters silently change luminance. macOS Night Shift, Windows Night Light, True Tone and adaptive-brightness features all shift the white point or brightness in ways our calibration doesn't account for. We ask you to turn them off for the duration of the test. Not life advice; only for this measurement. Dark mode the rest of the time is fine.
HDR mode complicates the gamma model. If your display is in HDR, the luminance range exceeds what an sRGB gamma assumes. We ask you to switch HDR off for the test. Most users never know they have it on.
Ambient lighting matters. Aim for a normal indoor setting — dim but not dark, no direct sunlight on the screen. Full darkness shifts you into a different visual regime (mesopic/scotopic), where contrast sensitivity is markedly different from the photopic baseline our normative bands are drawn from.
Residual variance remains. Even after all of the above, the same person testing in different rooms at different times will see a few tenths of a log unit of drift. That is the visual system doing what it does — fluctuating with fatigue, pupil size, attention, time of day. Take the test more than once; a personal trend on a single setup is the most reliable thing this kind of test gives you.
Why we tell you all this
Trust, mostly. A measurement people don't understand is a measurement they can't act on — they either dismiss it or grant it more authority than it deserves. We'd rather you take the test, roughly understand what each step is doing to make the number real, and read the result knowing what it can and cannot tell you.
The alternative is the marketing-flavoured version: "advanced calibration ensures accurate results." We've read that sentence on enough product pages to find it actively suspicious. The four steps above are not magic; they are credible solutions to specific physical problems, drawn from published methods, that bring a remote test within the same neighbourhood of error as a clinic instrument — not equal to it, but useful and honestly characterised.
If you want to skip the calibration: please don't. Those first two minutes are most of what separates this test from a "click yes/no on a chart" tool. With it, the result is a measurement of your contrast sensitivity. Without it, the result is mostly a measurement of your laptop.
Take the test
Take the test now — the four calibration steps run in the first couple of minutes; the test proper takes three to seven depending on quick mode or full curve. No signup; results stay on your device. The how-to post walks through what taking it feels like, and the primer explains what contrast sensitivity is.
References
- Pelli, D. G., Robson, J. G., & Wilkins, A. J. (1988). The design of a new letter chart for measuring contrast sensitivity. Clinical Vision Sciences, 2, 187–199. Foundational clinical CS chart paper; source of the test-retest repeatability figure (~0.15 log units) and the often-cited smallest clinically meaningful change (~0.30 log units) used to anchor the calibration error budget.
- Bach, M. (1996). The Freiburg Visual Acuity Test — automatic measurement of visual acuity. Optometry and Vision Science, 73(1), 49–53. The original FrACT paper; subsequent FrACT documentation laid out browser-display gamma, dithering, and per-device calibration considerations that remote vision tests still build on.
- Allard, R., & Faubert, J. (2008). The noisy-bit method for digital displays: converting a 256 luminance resolution into a continuous resolution. Behavior Research Methods, 40(3), 735–743. Validates the spatial-dithering approach used to render sub-1/256 contrast steps cleanly on 8-bit displays without biasing contrast thresholds.
- Li, Q., Joo, S. J., Yeatman, J. D., & Reinecke, K. (2020). Controlling for participants' viewing distance in large-scale, psychophysical online experiments using a virtual chinrest. Scientific Reports, 10:904. Peer-reviewed validation of the blind-spot-based virtual chinrest used as step 2 of our calibration. Mean absolute error ≈ 3.25 cm across 43–66 cm distances.
Standards
- ISO/IEC 7810:2019. Identification cards — Physical characteristics. Specifies the ID-1 form factor (85.60 × 53.98 mm) used by credit cards, debit cards, and most national identity cards. Normative source for the physical reference object in step 1.