
How Our ML Works: Tabular & Vision CNN (and What’s Next)

A practical look at what powers AuthentiScan today—and what we’re training next for video and general AI content detection.

Published Oct 6, 2025 · by AuthentiScan · ~6 min read

AuthentiScan combines two complementary machine learning approaches for AI-image detection today: a Tabular model that ingests engineered forensic signals, and a Vision CNN that learns visual patterns directly from pixels. We fuse and calibrate their outputs to present a transparent, human-readable result.

Tabular model (engineered forensic signals)

The tabular model aggregates measurable signals that often differ between camera-native photos and model-generated images. Examples include:

- EXIF/metadata presence and consistency
- JPEG compression traces and re-encoding artifacts
- Error level analysis (ELA) response statistics
- Frequency-domain (FFT) signatures

These features are normalized and fed to a lightweight classifier (e.g., gradient-boosted trees or logistic ensemble). The tabular model is fast, robust to many image sizes, and excels when metadata and compression traces survive. It’s also highly interpretable—great for our “why did we think this?” breakdown.
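As a sketch of this step, here is a minimal "logistic ensemble" style scorer over a handful of hypothetical signals. The feature names, normalization statistics, and weights are illustrative stand-ins, not our production model:

```python
import math

# Hypothetical engineered signals for one image (names are illustrative).
features = {
    "exif_present": 1.0,        # camera metadata survives
    "jpeg_trace_score": 0.62,   # strength of JPEG compression traces
    "ela_mean": 0.18,           # mean error-level-analysis response
    "fft_peak_ratio": 0.41,     # periodic spectral peaks vs. baseline
}

# Toy normalization stats and logistic weights, stand-ins for fitted values.
MEANS = {"exif_present": 0.5, "jpeg_trace_score": 0.5, "ela_mean": 0.2, "fft_peak_ratio": 0.3}
STDS  = {"exif_present": 0.5, "jpeg_trace_score": 0.2, "ela_mean": 0.1, "fft_peak_ratio": 0.2}
WEIGHTS = {"exif_present": -1.2, "jpeg_trace_score": -0.6, "ela_mean": 0.9, "fft_peak_ratio": 1.4}
BIAS = 0.0

def tabular_score(feats: dict) -> float:
    """Return P(AI-generated) from z-normalized signals via a logistic model."""
    z = BIAS + sum(WEIGHTS[k] * (feats[k] - MEANS[k]) / STDS[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

print(round(tabular_score(features), 3))
```

Because each weight attaches to a named signal, the same structure directly supports the per-signal "why did we think this?" breakdown.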

Vision CNN (learned pixel patterns)

Our vision model is a convolutional neural network trained on diverse real vs. AI-image datasets (multiple generators, prompts, and styles). It ingests resized crops/patches and leverages data augmentation (resize, slight blur, JPEG re-encode, small color jitter) to reduce overfitting to any single source.
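To make the augmentation idea concrete, here is a NumPy-only stand-in for the kind of pipeline a library such as torchvision provides. The crop size, blur, and jitter strength are illustrative, and the JPEG re-encode step is omitted because it needs an image codec:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img: np.ndarray, size: int) -> np.ndarray:
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return img[y:y + size, x:x + size]

def slight_blur(img: np.ndarray) -> np.ndarray:
    # 3x3 box blur via shifted averages (cheap stand-in for a Gaussian blur).
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    acc = np.zeros_like(img, dtype=np.float32)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            acc += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return acc / 9.0

def color_jitter(img: np.ndarray, strength: float = 0.05) -> np.ndarray:
    # Small per-channel gain, clipped back into [0, 1].
    scale = 1.0 + rng.uniform(-strength, strength, size=(1, 1, 3))
    return np.clip(img * scale, 0.0, 1.0)

def augment(img: np.ndarray, crop: int = 64) -> np.ndarray:
    patch = random_crop(img, crop)
    if rng.random() < 0.5:          # apply blur only sometimes
        patch = slight_blur(patch)
    return color_jitter(patch)

img = rng.random((128, 128, 3), dtype=np.float32)  # stand-in image in [0, 1]
patch = augment(img)
```

The point of randomizing these corruptions at train time is that the CNN cannot latch onto one generator's exact resolution, sharpness, or color statistics.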

The CNN shines when metadata is missing or images have been re-encoded: it captures subtle textural cues that are hard to hand-engineer. We still surface limitations (“uncertain” zones) where both models disagree or cues are weak.

How we combine them

We compute both scores, check agreement, then apply a small calibrator (stacking) to produce the final estimate with confidence. When models disagree, we show you the evidence (e.g., ELA overlays, FFT heatmaps) so you can weigh context and decide. Results are probabilistic—we never present a single opaque verdict.
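A minimal sketch of this fusion step, assuming a logistic stacker with a disagreement term (the weights and the 0.4 disagreement cutoff are illustrative, not our fitted calibrator):

```python
import math

# Toy stacking calibrator over the two model scores.
W_TAB, W_CNN, W_GAP, BIAS = 2.1, 2.4, -1.5, -2.2

def fuse(tab_p: float, cnn_p: float):
    """Combine tabular and CNN probabilities into a calibrated estimate."""
    gap = abs(tab_p - cnn_p)                      # disagreement between models
    z = BIAS + W_TAB * tab_p + W_CNN * cnn_p + W_GAP * gap
    p = 1.0 / (1.0 + math.exp(-z))
    if gap > 0.4:                                 # strong disagreement: surface it
        label = "uncertain"
    else:
        label = "likely AI" if p > 0.5 else "likely real"
    return p, label

print(fuse(0.85, 0.9))   # models agree
print(fuse(0.9, 0.2))    # models disagree, so the result is flagged uncertain
```

The disagreement branch is what triggers showing the supporting evidence (ELA overlays, FFT heatmaps) instead of a bare score.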

Evaluation (ongoing)

Privacy note

We process files transiently to compute signals and model outputs. We don’t keep uploads for user scans. For our internal training, we use curated datasets and opt-in collections. See Privacy for details.

Known limitations

What we’re training next

1) AI video detection

We’re training a video pipeline that samples frames and short clips, then aggregates evidence across time. The stack includes:

Output will show a per-frame timeline plus a clip-level estimate, highlighting where artifacts concentrate.
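As a sketch of the temporal aggregation, the snippet below smooths per-frame scores into a timeline and reduces them to one clip-level estimate. The smoothing window, top-quartile reduction, and threshold are illustrative choices, not the shipped pipeline:

```python
import numpy as np

def clip_estimate(frame_scores, window: int = 5, threshold: float = 0.5):
    """Aggregate per-frame AI probabilities into a timeline and a clip score.

    A moving average smooths frame-level noise; the clip score averages the
    top quartile of the timeline so short bursts of strong artifacts dominate.
    """
    s = np.asarray(frame_scores, dtype=np.float32)
    kernel = np.ones(window, dtype=np.float32) / window
    timeline = np.convolve(s, kernel, mode="same")      # per-frame timeline
    k = max(1, len(s) // 4)
    clip_score = float(np.sort(timeline)[-k:].mean())   # top-quartile mean
    flagged = np.flatnonzero(timeline > threshold)      # where artifacts concentrate
    return timeline, clip_score, flagged

# A burst of suspicious frames in the middle of an otherwise clean clip.
scores = [0.1, 0.15, 0.2, 0.8, 0.9, 0.85, 0.2, 0.1]
timeline, clip_score, flagged = clip_estimate(scores)
```

Here `flagged` marks the stretch of frames where the timeline exceeds the threshold, which is exactly what a per-frame highlight view would render.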

2) General AI content detection

Beyond images and video, we’re expanding to broader AI-generated content signals:

As always, we’ll present transparent rationales and uncertainty—detection is one input to your judgment, not a final arbiter.

Try AuthentiScan

Upload an image or paste a link to get a transparent breakdown.

Open app