Extracting text from a 1935 federal document — two engines compared

Historical spatial research depends on archival documents that exist only as scanned images. OCR converts those images into machine-readable text. This page runs the same crop through Tesseract.js (open-source, client-side) and Google Cloud Vision (neural API) so you can compare output and confidence side by side.

Tesseract: no key required. It runs in your browser via WebAssembly. The first run downloads ~10 MB of language data; later runs skip the download.

Google Vision: requires an API key (1,000 free requests/month; billing account required). Your key goes directly from your browser to Google; it is never sent anywhere else and is not stored beyond this session.
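For readers curious what a direct browser-to-Google call might look like, here is a minimal sketch. It is not this page's actual code. It assumes the public Vision REST method `images:annotate` with the key passed as a `key` query parameter; the helper name `buildVisionRequest` and the choice of the `DOCUMENT_TEXT_DETECTION` feature are illustrative assumptions.

```javascript
// Sketch only: builds (but does not send) a Vision text-detection request.
// buildVisionRequest is a hypothetical helper name, not part of any library.
function buildVisionRequest(base64Image, apiKey) {
  // Assumed endpoint: the public Vision REST method images:annotate.
  const url =
    "https://vision.googleapis.com/v1/images:annotate?key=" +
    encodeURIComponent(apiKey);

  const body = {
    requests: [
      {
        image: { content: base64Image }, // image bytes, base64-encoded
        // Assumption: DOCUMENT_TEXT_DETECTION, the feature aimed at dense text.
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      },
    ],
  };

  return {
    url,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Placeholder values; a real call would pass the cropped region and your key.
const request = buildVisionRequest("BASE64_IMAGE_BYTES", "YOUR_API_KEY");
```

In a browser, `fetch(request.url, request.options)` would send the request straight to Google with no intermediate server, which matches the key-handling description above.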

Source document
1935 HOLC Security Map of Lansing, Michigan. Colored map showing neighborhood grades A–D. Upper-right: title block and city statistics table. Lower-right: handwritten legend with the four HOLC grades.
1935 HOLC Security Map — City of Lansing, East Lansing and Vicinity, Michigan. Published by Chamber of Commerce; compiled by Pease Engineering Co. Source: Mapping Inequality / University of Richmond.
Extracted text: Tesseract.js
Output appears here. Words are color-coded by confidence: green >80%, yellow 50–80%, red <50%.
High (>80%)  ·  Medium (50–80%)  ·  Low (<50%)
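The color-coding rule can be sketched as a small function. This is a minimal illustration, assuming word confidence arrives on a 0–100 scale; an engine that reports 0–1 scores would need scaling first. The name `confidenceBand` and the sample words are hypothetical, not taken from the page's code or from real output.

```javascript
// Map a word confidence (assumed 0-100) to the legend's three bands.
function confidenceBand(confidence) {
  if (confidence > 80) return "high"; // green
  if (confidence >= 50) return "medium"; // yellow, 50-80 inclusive
  return "low"; // red
}

// Illustrative input only; these are not real results from the map.
const sampleWords = [
  { text: "LANSING", confidence: 93 },
  { text: "MICHIGAN", confidence: 80 },
  { text: "Vicinity", confidence: 50 },
  { text: "Pea5e", confidence: 31 },
];

const banded = sampleWords.map((word) => ({
  ...word,
  band: confidenceBand(word.confidence),
}));
// bands, in order: high, medium, medium, low
```

Note the boundary choice: a score of exactly 80 or exactly 50 lands in the medium band, which is one reading of the legend's "50–80%" range.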
Extracted text: Google Vision
Enter a Google Cloud Vision API key and click Run Google Vision.

Comparison

Why some text is wrong: Both engines struggle with this 1935 scan for the same reasons: faded ink, age-yellowed paper, map color zones bleeding into text, street names printed at angles, and handwriting in the legend. Tesseract runs entirely in your browser on the unprocessed crop; Vision sends the image to Google's neural model, which is trained on degraded documents and typically reports higher confidence on archival material. In a production workflow you would also binarize, deskew, and contrast-enhance the image before sending it to either engine.
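Two of those preprocessing steps can be sketched on a plain array of grayscale pixel values (0–255). This is a rough illustration under stated assumptions, not the workflow any particular archive uses: it applies a linear contrast stretch, then picks a global threshold with Otsu's method, which is one common binarization technique among several. Deskewing is omitted, and all function names here are hypothetical.

```javascript
// Linear contrast stretch: map the input's [min, max] range onto [0, 255].
function stretchContrast(gray) {
  let min = 255;
  let max = 0;
  for (const v of gray) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (max === min) return Uint8Array.from(gray); // flat image: nothing to stretch
  const scale = 255 / (max - min);
  return Uint8Array.from(gray, (v) => Math.round((v - min) * scale));
}

// Otsu's method: choose the threshold that maximizes between-class variance.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;
  const total = gray.length;
  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];

  let sumDark = 0;
  let countDark = 0;
  let bestScore = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    countDark += hist[t];
    if (countDark === 0) continue; // no pixels at or below t yet
    const countLight = total - countDark;
    if (countLight === 0) break; // every pixel is already in the dark class
    sumDark += t * hist[t];
    const meanDark = sumDark / countDark;
    const meanLight = (sumAll - sumDark) / countLight;
    // Proportional to the between-class variance for threshold t.
    const score = countDark * countLight * (meanDark - meanLight) ** 2;
    if (score > bestScore) {
      bestScore = score;
      best = t;
    }
  }
  return best;
}

// Stretch, threshold, and map each pixel to black (0) or white (255).
function binarize(gray) {
  const stretched = stretchContrast(gray);
  const threshold = otsuThreshold(stretched);
  return Uint8Array.from(stretched, (v) => (v > threshold ? 255 : 0));
}

// Toy 6-pixel "image": faded ink (60-70) on yellowed paper (200-210).
const cleaned = binarize([60, 70, 65, 200, 210, 205]);
// ink pixels become 0 (black), paper pixels become 255 (white)
```

A single global threshold is the simplest choice; on a map with colored zones behind the text, a locally adaptive threshold would probably be a better fit, at the cost of more code.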
OCR Demo — 1935 HOLC Security Map | Redlined: Lansing

Extracting text from a 1935 federal document

Historical spatial research depends on archival documents — hand-drawn maps, typed field reports, federal survey forms — that exist only as scanned images. OCR (Optical Character Recognition) converts those images into machine-readable text, making them searchable and citable.

This demo runs Tesseract.js — an open-source OCR engine — directly in your browser on the same 1935 HOLC Security Map used as the base layer in the main app. Select a region, click Run OCR, and see what the engine extracts. Words are color-coded by confidence score.

No data leaves your browser. Tesseract.js runs entirely client-side via WebAssembly. No API key required. The first run downloads ~10 MB of language data; later runs skip the download.

Source document
1935 HOLC Security Map of Lansing, Michigan. A large colored map showing neighborhood grades. The upper-right contains a title block and city statistics. The lower-right contains a handwritten legend with the four HOLC grades.
Extracted text
Extracted text will appear here. Words are color-coded: green = high confidence, yellow = medium, red = low confidence.
High confidence (>80%)  ·  Medium (50–80%)  ·  Low (<50%)

What the engine found

Why some text is wrong: Tesseract performs best on clean, modern, high-contrast printed text. This 1935 scan presents several challenges: faded ink and age-yellowed paper reduce contrast; the map's color zones bleed into text regions; street names are printed at angles following roads; and the handwritten legend at the lower right falls entirely outside Tesseract's printed-text training data. In a production archival workflow, preprocessing steps (binarization, deskew, contrast enhancement) would improve accuracy. For degraded historical documents, Google Cloud Vision or Azure Document Intelligence typically outperforms open-source engines by a wide margin. Google Vision includes 1,000 free requests/month (billing account required); paid usage is $1.50 per 1,000 pages.
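To get a feel for what those Vision figures mean for a digitization project, here is a back-of-the-envelope sketch. It uses only the two numbers quoted above and assumes pro-rata billing beyond the free allowance with no volume tiers; actual pricing may differ or change, and the function name is hypothetical.

```javascript
// Figures quoted above; treat them as assumptions that may be out of date.
const FREE_PAGES_PER_MONTH = 1000;
const USD_PER_1000_PAGES = 1.5;

// Assumes pro-rata billing beyond the free allowance and ignores volume tiers.
function estimateMonthlyCostUsd(pages) {
  const billable = Math.max(0, pages - FREE_PAGES_PER_MONTH);
  return (billable / 1000) * USD_PER_1000_PAGES;
}

estimateMonthlyCostUsd(800); // 0, still inside the free allowance
estimateMonthlyCostUsd(5000); // 6, i.e. 4,000 billable pages at $1.50 per 1,000
```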