At-a-glance visuals #
A quick peek at how we calibrate the score and why interpretable cues matter.
Ensemble at a glance #
We compute several detector scores and fuse them into one overall estimate. The ensemble currently includes:
- Our ML image model (live): calibrated classifier over engineered and deep features.
- Our ML text model (live): a supervised block classifier that powers the paragraph heatmap for documents.
- Media forensics: ELA, FFT-based texture analysis, JPEG block/grid cues, EXIF/C2PA checks.
- Platform metadata signal (video): lightweight parsing of creator/title/description/tags for declared “AI-generated” hints.
Scores are combined with weights and confidence into a single probability. We keep the breakdown visible so you can see why a result moved up or down.
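As a rough sketch of how that breakdown can be folded into a single number (the field names, detector names, and the simple confidence-weighted average below are illustrative, not the shipped values):

```python
# A minimal sketch (not the production code) of how per-detector results
# might be folded into one probability while keeping the breakdown visible.
# Field names, detector names, and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DetectorResult:
    name: str          # e.g. "image_ml", "ela", "fft", "platform_label"
    score: float       # 0..1, higher = more AI-like
    confidence: float  # 0..1, how much this score can be trusted
    weight: float      # detector's base weight in the ensemble

def fuse(results: list[DetectorResult]) -> float:
    """Confidence-scaled weighted average of detector scores."""
    num = sum(r.weight * r.confidence * r.score for r in results)
    den = sum(r.weight * r.confidence for r in results)
    return num / den if den else 0.5  # fall back to "unsure" if nothing fired
```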
Images #
- In-house ML (primary): image-level features + calibrated forensics.
- ELA to highlight recompression anomalies.
- FFT to surface periodic textures.
- JPEG cues including block/grid behavior and chroma handling.
- Metadata & provenance (EXIF presence/consistency; surface C2PA if present).
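To make the ELA step concrete, here is a minimal sketch using Pillow; the JPEG quality and the contrast stretch are illustrative choices, not our exact parameters:

```python
# A minimal Error Level Analysis (ELA) sketch: re-save the image as JPEG at a
# known quality, diff against the original, and stretch the differences so
# recompression anomalies stand out. Quality and scaling are illustrative.
from io import BytesIO
from PIL import Image, ImageChops

def ela_map(path: str, quality: int = 90) -> Image.Image:
    original = Image.open(path).convert("RGB")
    # Re-save at a fixed JPEG quality, then diff against the original.
    buf = BytesIO()
    original.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(original, resaved)
    # Stretch the (usually faint) differences to the full 0-255 range.
    extrema = diff.getextrema()
    max_diff = max(hi for _, hi in extrema) or 1
    return diff.point(lambda px: min(255, px * 255 // max_diff))
```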
Video — what we do (developer view) #
- Segment selection: for social URLs (YouTube/TikTok) we fetch four segments, one per quarter of the timeline (Q1–Q4). This gives broad coverage on long videos without scraping everything.
- Codec sanitization: downloaded segments are re-encoded with FFmpeg to H.264 (yuv420p, ~1s GOP, scene-cut disabled). This removes mid-GOP artifacts that can break frame seeking and avoids decoder warnings. A sketch of the re-encode follows this list.
- Frame sampling: we read the stitched clip and sample up to ~2 fps, with a duration-aware target (≈16–40 frames depending on clip length). After each seek we discard a few “warm-up” frames to avoid residual corruption.
- Diversity filter: frames are kept only if their HSV histogram distance clears a threshold (Bhattacharyya > 0.22). This yields distinct frames across scene changes, not near-duplicates. See the histogram sketch after this list.
- Per-frame analysis: each frame runs through the same image pipeline (our ML model + forensics + EXIF-style cues where applicable).
- Metadata signal: we scan video metadata (title/description/tags/uploader) for declared AI markers (e.g., “AI-generated”, tool names, #aivideo). If present, it contributes a modest positive weight as “Platform AI label”.
- Aggregation: for each detector we take the median across frames, then blend detectors with weights and confidence using a calibrated sigmoid. The top contributing signals are exposed in the UI. The median step is sketched after this list.
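A sketch of the codec-sanitization step under the settings described above (H.264, yuv420p, ~1s GOP, scene-cut keyframes disabled); the exact production flags may differ:

```python
# Re-encode a downloaded segment with FFmpeg so that frame seeking is reliable.
# Flags illustrate the approach described above, not the exact shipped command.
import subprocess

def sanitize_segment(src: str, dst: str, fps: int = 30) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-g", str(fps),        # keyframe roughly every second at this fps
            "-sc_threshold", "0",  # no extra keyframes on scene cuts
            "-an",                 # audio is not needed for frame analysis
            dst,
        ],
        check=True,
    )
```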
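A sketch of the diversity filter: the histogram binning and the `is_diverse` helper are illustrative, but the Bhattacharyya threshold matches the value above:

```python
# Keep a frame only if its HSV histogram is far enough (Bhattacharyya
# distance > 0.22) from every frame kept so far. Bin counts are illustrative.
import cv2
import numpy as np

def hsv_hist(frame_bgr: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def is_diverse(candidate: np.ndarray, kept: list[np.ndarray],
               thresh: float = 0.22) -> bool:
    return all(
        cv2.compareHist(h, candidate, cv2.HISTCMP_BHATTACHARYYA) > thresh
        for h in kept
    )
```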
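And a sketch of the per-detector median step that runs before the weighted blend (function and detector names are illustrative):

```python
# Collapse per-frame detector scores to one score per detector via the median,
# which is robust to the occasional odd frame. Names are illustrative.
from statistics import median

def aggregate_frames(per_frame: list[dict[str, float]]) -> dict[str, float]:
    """per_frame: one {detector_name: score} dict per analysed frame."""
    detectors = {name for frame in per_frame for name in frame}
    return {
        name: median(frame[name] for frame in per_frame if name in frame)
        for name in detectors
    }
```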
Text #
Documents are processed with an ML block classifier trained on human-written and AI-written samples. We split the text into small blocks, score each block, and then:
- Overall score: a calibrated blend of block scores summarises the document.
- Estimated AI-written fraction: the share of blocks that exceed our “AI-like” threshold (≥ 60%).
- Paragraph heatmap: each paragraph gets a score by overlap-weighting the ML block scores inside it (sketched below).
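A sketch of that overlap-weighting, assuming blocks and paragraphs are tracked as (start, end) character spans; the span representation and helper name are assumptions, not the exact implementation:

```python
# A paragraph's score is the average of the ML block scores overlapping it,
# weighted by how many characters of each block fall inside the paragraph.
def paragraph_score(par_span: tuple[int, int],
                    block_spans: list[tuple[int, int]],
                    block_scores: list[float]) -> float:
    p_start, p_end = par_span
    weighted, total = 0.0, 0.0
    for (b_start, b_end), score in zip(block_spans, block_scores):
        overlap = max(0, min(p_end, b_end) - max(p_start, b_start))
        weighted += overlap * score
        total += overlap
    return weighted / total if total else 0.0
```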
If the ML text model is unavailable, we fall back to a heuristic analysis (burstiness, repetition, TTR, punctuation, entropy), but ML is used whenever possible.
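For reference, a minimal sketch of two of those fallback features (type-token ratio and character entropy); the full heuristic feature set and its thresholds are not shown here:

```python
# Two of the heuristic fallback features named above; how they are weighted
# into a score is an implementation detail not reproduced here.
import math
from collections import Counter

def type_token_ratio(text: str) -> float:
    """Share of distinct words: lower values suggest repetitive vocabulary."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def char_entropy(text: str) -> float:
    """Shannon entropy (bits) of the character distribution."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values()) if total else 0.0
```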
Web & mixed content #
For URLs and documents, we extract embedded media/text and run the appropriate detectors. As we expand ML coverage, web, audio, and mixed content will receive specialized models.
How we combine signals #
Each detector outputs a score and confidence. We weight the contributions, blend them into an overall probability, and then calibrate. Image ML carries a strong weight for images, text ML for documents; forensic and metadata signals act as corroboration. Strong camera provenance (e.g., consistent EXIF) can counterbalance AI-like cues. A minimal sketch of this blend follows.
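The sketch below assumes a logit-space weighted average followed by a calibrated sigmoid; the weights, calibration constants, and the idea of encoding provenance as a low score are illustrative assumptions, not the shipped values:

```python
# Blend detector scores in logit space, weighted by base weight and confidence,
# then map through a calibrated sigmoid. A provenance detector (e.g. consistent
# EXIF) can report a score below 0.5 to pull the overall probability down.
import math

def logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def blend(scores, weights, confidences, a: float = 1.0, b: float = 0.0) -> float:
    """scores/weights/confidences: parallel lists, one entry per detector.
    a and b are calibration constants fitted on labelled data (illustrative)."""
    num = sum(w * c * logit(s) for s, w, c in zip(scores, weights, confidences))
    den = sum(w * c for w, c in zip(weights, confidences)) or 1.0
    z = num / den
    return 1 / (1 + math.exp(-(a * z + b)))  # calibrated sigmoid
```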
Interpretation #
Outputs are probabilistic—not definitive. Combine detector signals with context, source credibility, and any available provenance. For borderline cases, review the per-signal breakdown and paragraph heatmap shown in the app.