Why does a contrast sensitivity test need to calibrate my screen first?

Because several properties of a display — pixel pitch, peak brightness, gamma, and how far you're sitting from it — all change what a stimulus physically looks like. Without accounting for these, two people on different devices would see different physical stimuli labeled with the same nominal numbers, so the test would mostly measure their screens instead of their eyes.

What are the calibration steps and what does each one fix?

There are four: a credit-card resize step that establishes pixels-per-millimeter, a blind-spot distance test that estimates how far you're sitting from the screen, a stripe-match gamma test that measures your display's brightness response curve, and a 30-second mid-gray adaptation period that settles your visual system before trials begin.

How accurate is the blind-spot method for estimating viewing distance?

It's based on a validated 'virtual chinrest' technique with a published mean absolute error of about 3.25 centimeters across viewing distances of 43 to 66 centimeters — accurate enough for meaningful psychophysics and far better than a self-reported guess like 'arm's length.'

Can calibration fix a bad monitor or a glare-filled room?

No. Calibration estimates and corrects for known, measurable factors, but things like direct sunlight glare, OS-level color filters, and HDR mode still need to be manually turned off or avoided, and some residual variance in results always remains.

Why Screen Calibration Matters for Vision Tests

Two friends take the same online vision test on different devices — one on a 27-inch desktop monitor, the other on a 15-inch laptop. They get different scores. The question is: is the test broken, or is one of the screens lying?

Both, but mostly the screens.

Every setup has its own brightness, gamma, pixel pitch, and viewing distance, plus a user with personal habits. A contrast sensitivity test that does not account for these is, strictly speaking, not testing the eye. It is testing the monitor, with the eye attached as a kind of biological accessory.

This post is about why our test asks you to do a handful of small calibration tasks before showing any stripes, and what each one prevents. None of it is mysterious; it is mostly geometry and the optics of LCD panels. But the magnitudes of the errors are big enough that explaining them feels honest, and the alternative — opaque "we've taken care of it" — is the move of every other tool we have polite professional disagreements with.

What changes from screen to screen

Five physical properties of a display affect what a contrast sensitivity test is actually presenting.

Pixel pitch. How many physical millimetres each pixel occupies. A 4K phone has a pitch of roughly 0.05 mm; a 24-inch 1080p desktop monitor has a pitch closer to 0.28 mm — about six times larger. A "5 cm Gabor patch" is a different physical stimulus on each one. Because contrast sensitivity is specified in cycles per degree of visual angle, you need millimetres-per-pixel to render anything correctly.
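As a rough check on the desktop figure above, pixel pitch can be worked out from a panel's diagonal and resolution. This is a minimal sketch; the function name is hypothetical and it assumes square pixels.

```python
import math

def pixel_pitch_mm(diagonal_inches: float, width_px: int, height_px: int) -> float:
    """Physical size of one pixel in millimetres, assuming square pixels."""
    diagonal_mm = diagonal_inches * 25.4           # inches to millimetres
    diagonal_px = math.hypot(width_px, height_px)  # pixel count along the diagonal
    return diagonal_mm / diagonal_px

# The 24-inch 1080p desktop monitor from the text.
pitch = pixel_pitch_mm(24, 1920, 1080)
print(f"{pitch:.3f} mm per pixel")  # roughly 0.28 mm
```

A web page cannot run this calculation for you, because it has no reliable way to learn the panel's diagonal; that is the gap the credit-card step fills.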

Luminance. The peak brightness of "white." A budget laptop in a sunny room maxes out around 250 nits; a premium tablet can push 600 nits or more. Contrast sensitivity itself shifts as a function of mean luminance — the visual system operates in different regimes depending on whether you're adapted to a dim or a bright field — so the same nominal stimulus is perceptually not the same stimulus on different displays.

Gamma. The curve that maps the 0-to-255 pixel value your code writes into the physical luminance your screen emits. Almost every consumer display approximates the sRGB transfer function, close to a 2.2 power gamma. The consequence: if a test writes pixel value 128 expecting "half luminance," the screen actually emits about 22% of the maximum, not 50%. A naïve test that asks for "10% Michelson contrast" without inverting that gamma physically presents about 22% — a factor-of-two error in the variable the test is trying to measure.
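The two numbers in that paragraph can be reproduced in a few lines. This sketch assumes a pure 2.2 power-law display (real sRGB is piecewise, so the figures are approximate), and the helper names are illustrative.

```python
GAMMA = 2.2  # assumption: pure power-law response, not the exact sRGB curve

def luminance(pixel_value: float, gamma: float = GAMMA) -> float:
    """Relative luminance (0..1) emitted for an 8-bit pixel value."""
    return (pixel_value / 255.0) ** gamma

def michelson(l_max: float, l_min: float) -> float:
    """Michelson contrast of a pattern with the given peak and trough luminance."""
    return (l_max - l_min) / (l_max + l_min)

# Pixel value 128 is "half" in code, but far less than half in emitted light.
half_gray = luminance(128)

# A naive "10% contrast" grating written directly in pixel values around mid-gray.
mean_value = 127.5
peak, trough = mean_value * 1.10, mean_value * 0.90
physical_contrast = michelson(luminance(peak), luminance(trough))

print(f"pixel 128 emits {half_gray:.0%} of maximum")
print(f"nominal 10% contrast is physically {physical_contrast:.0%}")
```

For small contrasts the physical value is close to the nominal value multiplied by gamma, which is why the error lands near a factor of two.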

Viewing distance. Spatial frequency in cycles per degree scales almost linearly with distance. A grating that measures 6 cycles per degree at 57 cm measures about 3 cycles per degree at 28 cm (laptop-on-knees distance). Self-reported "arm's length" varies by 30 cm or more across adults, and that uncertainty propagates directly into spatial frequency.
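The distance dependence is plain trigonometry. The sketch below, with hypothetical function names, builds a grating that is 6 cycles per degree at 57 cm and then asks what the same physical grating becomes at 28 cm.

```python
import math

def cycles_per_degree(period_mm: float, distance_mm: float) -> float:
    """Spatial frequency of a grating with the given physical period."""
    degrees_per_cycle = math.degrees(2 * math.atan(period_mm / (2 * distance_mm)))
    return 1.0 / degrees_per_cycle

def period_for(cpd: float, distance_mm: float) -> float:
    """Physical period in mm that yields `cpd` at the given distance."""
    return 2 * distance_mm * math.tan(math.radians(1.0 / cpd) / 2)

period = period_for(6.0, 570.0)            # designed for 57 cm
at_57 = cycles_per_degree(period, 570.0)   # 6 cpd by construction
at_28 = cycles_per_degree(period, 280.0)   # same stripes, viewed from 28 cm

print(f"period {period:.2f} mm: {at_57:.2f} cpd at 57 cm, {at_28:.2f} cpd at 28 cm")
```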

Colour profile and dynamic range. macOS ships with P3-default displays; Windows colour profiles vary; HDR mode can re-map the luminance range above what an sRGB-aware app assumes. None of this matters for the carrier of an achromatic Gabor patch (the grey, fuzzy-edged patch of stripes used as the test stimulus), but it matters for the mid-gray surround, the reference white, and whether the gamma model you measured holds.

Variation along all five axes is normal. The job of calibration is not to eliminate the variation — that would need lab equipment a web page doesn't have — but to estimate what matters and render the stimulus to a known specification, in degrees of visual angle and Michelson contrast (the standard 0-to-1 measure of the light–dark difference in a striped pattern), on whatever screen is in front of you.
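Rendering "in degrees of visual angle" comes down to one conversion once pixels-per-millimetre and viewing distance are known. The following is a sketch under those assumptions; the function name and the example values (3.6 and 20 pixels per millimetre, 570 mm) are illustrative, not measurements from the test.

```python
import math

def degrees_to_pixels(size_deg: float, distance_mm: float, px_per_mm: float) -> float:
    """Pixels needed for a stimulus to span `size_deg` of visual angle."""
    size_mm = 2 * distance_mm * math.tan(math.radians(size_deg) / 2)
    return size_mm * px_per_mm

# A 5-degree patch at 57 cm on a coarse desktop panel and on a dense phone panel.
desktop_px = degrees_to_pixels(5.0, 570.0, 3.6)
phone_px = degrees_to_pixels(5.0, 570.0, 20.0)

print(f"desktop: {desktop_px:.0f} px, phone: {phone_px:.0f} px")
```

The same stimulus needs several times more pixels on the denser panel, which is why a fixed pixel size cannot stand in for a visual-angle specification.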

What goes wrong if you don't calibrate

Concretely, here is what an uncalibrated remote test gets wrong:

Wrong spatial frequency. You think you're testing 6 cpd; the user is sitting at 30 cm instead of the assumed 57 cm, so you're actually presenting 3 cpd. Threshold at 3 cpd is, in a healthy young adult, roughly twice as good as at 6 cpd. Your result is biased by an entire octave of the contrast sensitivity function.
Wrong contrast. You think you're showing 5% Michelson contrast. The user's display has a gamma you didn't measure. Physical contrast on the screen is closer to 10%. Your "threshold" is the contrast at which the user could just see a stimulus twice as easy as you thought.
Wrong adaptation state. The first trials are biased because the user just came from a bright phone notification screen and the visual system has not equilibrated. Calibration research emphasises that proper rendering on consumer displays is a precondition for valid contrast-threshold measurements, not an optional add-on.¹
Test-retest noise dwarfs the signal. The smallest clinically meaningful change on a validated contrast chart² is generally taken as around 0.3 log units — about two chart steps, a figure that traces to Elliott, Sanderson and Conkey's 1990 reliability work rather than the original Pelli–Robson design paper;³ uncalibrated noise from any of the above easily exceeds that. The "result" is some number — but not a measurement of contrast sensitivity.
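To see why these errors matter against a 0.3 log unit criterion, it helps to put them on the same scale. A small sketch, with an illustrative helper name:

```python
import math

def log_units(ratio: float) -> float:
    """Express a multiplicative error as a shift in log10 units."""
    return math.log10(ratio)

# A twofold error in physical contrast, as in the uncorrected-gamma case above.
gamma_shift = log_units(2.0)

print(f"twofold contrast error = {gamma_shift:.2f} log units")
```

A factor-of-two contrast error is about 0.30 log units on its own, so a single uncorrected gamma already matches the size of change the chart literature treats as meaningful.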

The first goal of calibration is to make the test the same test for every user. The second, equally important, is an honest error budget. We can't make it zero. We can make it small, characterisable, and small enough that real changes in the visual system register as real changes in the result.

Our calibration steps, explained

There are four of them, and they take a couple of minutes in total. The diagram below shows the flow; each one is doing a specific physical job.

Step 1: credit-card resize. Fixes pixel pitch → mm per pixel.
Step 2: blind-spot distance. Fixes viewing distance → cm from screen.
Step 3: stripe-match gamma. Fixes brightness response → display gamma.
Step 4: mid-gray adaptation. Prepares your visual system → steady baseline.

Step 1 — Credit-card resize

A virtual rectangle on screen; you drag a handle until it matches the physical width of a credit card or ID. Credit cards and most national IDs are manufactured to ISO/IEC 7810 ID-1, which fixes the card at 85.60 × 53.98 mm with tolerances of tenths of a millimetre.⁴ Once the on-screen rectangle matches, the test knows how many pixels span 85.60 mm on your display, and pixels-per-millimetre falls out by division.
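The division is as simple as it sounds. In this sketch the matched width of 323 pixels is a made-up example value, and the names are hypothetical rather than taken from the test's code.

```python
CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card width

def px_per_mm(matched_width_px: float) -> float:
    """Pixels per millimetre, given the on-screen width that matched the card."""
    return matched_width_px / CARD_WIDTH_MM

# Hypothetical result: the user stopped dragging at 323 pixels wide.
density = px_per_mm(323)
pitch_mm = 1.0 / density

print(f"{density:.2f} px/mm, {pitch_mm:.3f} mm per pixel")
```

If the rectangle is measured in CSS pixels, the result is CSS pixels per millimetre, which is the unit the page lays out in anyway.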

This step exists because the browser cannot ask the operating system for your screen's physical size. The devicePixelRatio API returns a ratio between device pixels and CSS pixels, but the CSS pixel is defined as an angular unit, not a physical one — roughly the visual angle of 1/96 of an inch at arm's length. No JavaScript API returns millimetres. A physical reference object is the only way out.

Step 2 — Blind-spot distance

You close your left eye, fixate a small cross on the left with your right eye, and slide a dot to the right until it disappears into your blind spot, the patch of retina where the optic nerve exits, which has no photoreceptors. The blind spot sits at a stable eccentricity from the fovea, roughly 13.5° on the temporal side of the visual field. Once we know how far the dot was from the fixation point in millimetres (from step 1), the angular geometry gives the screen-to-eye distance directly: distance equals offset divided by tan(13.5°).
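The geometry can be written out directly. This is a minimal sketch assuming the 13.5° eccentricity quoted above; the offset of 490 pixels and the density of 3.773 pixels per millimetre are hypothetical inputs.

```python
import math

BLIND_SPOT_DEG = 13.5  # eccentricity assumed in the text

def viewing_distance_mm(offset_px: float, px_per_mm: float,
                        eccentricity_deg: float = BLIND_SPOT_DEG) -> float:
    """Eye-to-screen distance from the fixation-to-dot offset at disappearance."""
    offset_mm = offset_px / px_per_mm
    return offset_mm / math.tan(math.radians(eccentricity_deg))

# Hypothetical trial: the dot vanished 490 px from the cross at 3.773 px/mm.
distance = viewing_distance_mm(490, 3.773)

print(f"estimated viewing distance: {distance / 10:.1f} cm")
```

With these inputs the estimate comes out near 54 cm. In practice several disappearance points would probably be averaged, since a single reading depends on how promptly the user reacts.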

This is the "virtual chinrest" method validated by Li and colleagues in 2020, with a mean absolute error of about 3.25 cm at viewing distances between 43 and 66 cm — accurate enough for psychophysics on most real-world setups, and dramatically better than self-reported "arm's length." (Their method adopts a 13.5° temporal blind-spot eccentricity, the value used above.)⁵

Step 3 — Stripe-match gamma

The cleverest of the four. The test shows a small patch of very thin, one-pixel-wide alternating black-and-white stripes next to a uniform mid-gray patch. A slider adjusts the gray patch's pixel value. You step back, or squint, until the stripes and the gray blur into the same brightness; then you stop.

The physics: the stripes physically emit, on average, half of the maximum luminance — (L_max + L_min) / 2 ≈ L_max / 2, because half the pixels are at the maximum and half are at the minimum. The gray patch emits L_max · (v/255)^γ, where v is the slider's pixel value and γ is your display's gamma. At the match, those are equal: (v/255)^γ = 0.5, so γ = log(0.5) / log(v/255). With one observation, we have your display's gamma.
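The formula translates to one line of code. In this sketch the function name is illustrative, the black level is treated as zero as in the approximation above, and the slider value of 186 is an example.

```python
import math

def gamma_from_match(matched_value: int) -> float:
    """Display gamma implied by the gray level that matched the 50% stripe patch."""
    if not 0 < matched_value < 255:
        raise ValueError("matched value must lie strictly between 0 and 255")
    return math.log(0.5) / math.log(matched_value / 255.0)

# Example: the user judged the patches equal with the slider at 186.
estimated_gamma = gamma_from_match(186)

print(f"estimated gamma: {estimated_gamma:.2f}")
```

A match near 186 implies a gamma close to 2.2; a match near 128 would imply a nearly linear display.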

The trick works because the visual system spatially averages high-frequency patterns at a distance. Calibrated luminance probes do the same with a sensor; this does it with an eye. Gamma is recoverable to about ±0.1 — more than enough for our purposes. Once we have it, stimulus rendering happens in linear-light space and the inverse gamma is applied at write time. Sub-1/256 contrast steps use a dithering technique to render cleanly without quantisation banding — the canonical reference is Allard & Faubert's "noisy-bit" method, validated not to bias contrast thresholds.¹ Bach's FrACT — first published in 1996 as the Freiburg Visual Acuity Test and later extended to contrast — is one of the longest-standing freely available acuity and contrast tools, and its documentation was among the earliest to spell out display gamma, dithering, and per-device calibration for computer-based vision testing.⁶
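The last two ideas, writing through the inverse gamma and dithering below one gray level, can be sketched together. This is not the test's code or the authors' reference implementation: it assumes a pure power-law gamma and uses per-pixel stochastic rounding, which is one simple way to get the averaging behaviour the noisy-bit paper describes.

```python
import math
import random

def encode(linear: float, gamma: float) -> float:
    """Linear-light luminance (0..1) to a continuous 8-bit pixel value."""
    return 255.0 * linear ** (1.0 / gamma)

def noisy_round(value: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part of `value`."""
    base = math.floor(value)
    return base + (1 if rng.random() < value - base else 0)

rng = random.Random(0)  # seeded so the example is repeatable

# Peak of a grating with 0.2% Michelson contrast around half luminance.
target = encode(0.5 * (1 + 0.002), gamma=2.2)

# Many pixels (or frames) sharing this target average out to the fractional level.
samples = [noisy_round(target, rng) for _ in range(20000)]
mean_level = sum(samples) / len(samples)

print(f"target {target:.3f}, mean of dithered pixels {mean_level:.3f}")
```

Each pixel is only ever one of the two neighbouring integers, but the spatial average tracks the fractional target, which is what lets an 8-bit display carry contrast steps smaller than 1/256.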

Step 4 — Mid-gray adaptation

A thirty-second uniform mid-gray screen — the same mid-gray that will surround the test stimulus. This is not a calibration of the device; it is a calibration of you. Retinal sensitivity is set by the average luminance you have recently been adapted to, and the visual system has both light/dark and contrast adaptation timescales. The early seconds of any new luminance environment carry a transient. A test that drops you from a colourful instructions page straight into trials is measuring partly your contrast sensitivity and partly that transient. Thirty seconds settles it.

What we can't fix from a web app

The honest part. There are limits to remote calibration, and we hit them.

Really bad displays can't be rescued. A glossy laptop screen in direct sunlight, with ambient veiling glare added to both the bright and dark parts of a stimulus, presents a smaller effective contrast than the calibration assumed. We can detect "very bright" with a webcam frame if you grant access, but we cannot undo glare in software.
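The loss is easy to quantify, because reflected light adds the same luminance to the peaks and troughs of a grating. In this sketch the luminance values are made up for illustration.

```python
def michelson(l_max: float, l_min: float) -> float:
    """Michelson contrast of a pattern with the given peak and trough luminance."""
    return (l_max - l_min) / (l_max + l_min)

def contrast_with_glare(l_max: float, l_min: float, veil: float) -> float:
    """Effective contrast when a uniform veiling luminance is added by reflections."""
    return michelson(l_max + veil, l_min + veil)

# Hypothetical grating: 110 and 90 cd/m2, i.e. 10% contrast as rendered.
rendered = michelson(110.0, 90.0)
in_sunlight = contrast_with_glare(110.0, 90.0, veil=50.0)

print(f"rendered {rendered:.1%}, effective under glare {in_sunlight:.1%}")
```

The difference between peak and trough is unchanged, but the mean rises, so the contrast that reaches the eye drops from 10% to under 7% in this example.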

OS-level filters silently change luminance. macOS Night Shift, Windows Night Light, True Tone and adaptive-brightness features all shift the white point or brightness in ways our calibration doesn't account for. We ask you to turn them off for the duration of the test. Not life advice; only for this measurement. Dark mode the rest of the time is fine.

HDR mode complicates the gamma model. If your display is in HDR, the luminance range exceeds what an sRGB gamma assumes. We ask you to switch HDR off for the test. Most users never know they have it on.

Ambient lighting matters. Aim for a normal indoor setting — dim but not dark, no direct sunlight on the screen. Full darkness shifts you into a different visual regime (mesopic and scotopic vision — the dim-light and near-dark states where your rod photoreceptors increasingly take over), where contrast sensitivity is markedly different from the photopic (normal daylight) baseline our normative bands are drawn from.

Residual variance remains. Even after all of the above, the same person testing in different rooms at different times will see a few tenths of a log unit of drift. That is the visual system doing what it does — fluctuating with fatigue, pupil size, attention, time of day. Take the test more than once; a personal trend on a single setup is the most reliable thing this kind of test gives you.

Why we tell you all this

Trust, mostly. A measurement people don't understand is a measurement they can't act on — they either dismiss it or grant it more authority than it deserves. We'd rather you take the test, roughly understand what each step is doing to make the number real, and read the result knowing what it can and cannot tell you.

The alternative is the marketing-flavoured version: "advanced calibration ensures accurate results." We've read that sentence on enough product pages to find it actively suspicious. The four steps above are not magic; they are credible solutions to specific physical problems, drawn from published methods, that bring a remote test within the same neighbourhood of error as a clinic instrument — not equal to it, but useful and honestly characterised.

If you want to skip the calibration: please don't. Those first two minutes are most of what separates this test from a "click yes/no on a chart" tool. With it, the result is a measurement of your contrast sensitivity. Without it, the result is mostly a measurement of your laptop.

Take the test

Take the test now. The four calibration steps run in the first couple of minutes; the test proper takes three to seven minutes, depending on quick mode or full curve. No signup; results stay on your device. The how-to post walks through what taking it feels like, and the primer explains what contrast sensitivity is.

¹ Allard R, Faubert J. The noisy-bit method for digital displays: converting a 256 luminance resolution into a continuous resolution. Behav Res Methods. 2008;40(3):735–743. Validates the spatial-dithering ("noisy-bit") approach for rendering sub-1/256 contrast steps on 8-bit displays without biasing contrast thresholds.
² Pelli DG, Robson JG, Wilkins AJ. The design of a new letter chart for measuring contrast sensitivity. Clin Vis Sci. 1988;2(3):187–199. The foundational clinical contrast-chart paper. (Published in Clinical Vision Sciences, which is not indexed in PubMed and carries no registered DOI, so no stable external link is available.)
³ Elliott DB, Sanderson K, Conkey A. The reliability of the Pelli-Robson contrast sensitivity chart. Ophthalmic Physiol Opt. 1990;10(1):21–24. The actual source of the widely quoted repeatability figures: scores are repeatable to within about ±0.15 log units (±1 chart step), so a change of roughly ±0.30 log units (±2 steps) is generally taken as clinically meaningful. Often mis-cited to the 1988 design paper.
⁴ ISO/IEC 7810:2019, Identification cards — Physical characteristics. Specifies the ID-1 form factor (85.60 × 53.98 mm) used by credit cards, debit cards, and most national identity cards; it is the normative source for the physical reference object in step 1. (ISO standards are not freely web-hosted at a stable public URL, so no link is given.)
⁵ Li Q, Joo SJ, Yeatman JD, Reinecke K. Controlling for participants' viewing distance in large-scale, psychophysical online experiments using a virtual chinrest. Sci Rep. 2020;10:904. Peer-reviewed validation of the blind-spot-based virtual chinrest used as step 2 of our calibration: it adopts a 13.5° temporal blind-spot eccentricity and, across laboratory distances of 43–66 cm, estimated viewing distance with a mean absolute error of about 3.25 cm.
⁶ Bach M. The Freiburg Visual Acuity test — automatic measurement of visual acuity. Optom Vis Sci. 1996;73(1):49–53. The original FrACT paper (later extended to contrast); its documentation was among the earliest to spell out display gamma, dithering, and per-device calibration for computer-based vision testing.
