
How AuthentiScan Works

A transparent, multi-signal ensemble that combines our in-house machine-learning models with classic media forensics.

Frame-level video analysis is live • Temporal video ML is on the roadmap

At-a-glance visuals #

A quick peek at how we calibrate the score and why interpretable cues matter.

Figure: calibration overview, showing how detector scores and confidences blend into a final probability.
Figure: AuthentiScan comparing an AI image and a real image, with interpretable cues (ML + forensics for practical judgement).

Ensemble at a glance #

We compute several detector scores and fuse them into one overall estimate. The ensemble currently includes:

  • Our ML image model (live): calibrated classifier over engineered and deep features.
  • Our ML text model (live): a supervised block classifier that powers the paragraph heatmap for documents.
  • Media forensics: Error Level Analysis (ELA), FFT-based texture analysis, JPEG block/grid cues, and EXIF/C2PA provenance checks.
  • Platform metadata signal (video): lightweight parsing of creator/title/description/tags for declared “AI-generated” hints.

Scores are combined with weights and confidence into a single probability. We keep the breakdown visible so you can see why a result moved up or down.

Images #

  • In-house ML (primary): image-level features + calibrated forensics.
  • ELA (Error Level Analysis) to highlight recompression anomalies (see the sketch after this list).
  • FFT to surface periodic textures.
  • JPEG cues including block/grid behavior and chroma handling.
  • Metadata & provenance: EXIF presence/consistency checks, surfacing C2PA credentials when present.
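
As a rough illustration of the ELA step, here is a minimal sketch (illustrative, not our production code); the resave quality and the file name are assumptions.

```python
# Minimal Error Level Analysis (ELA) sketch: re-save the image as JPEG at a
# known quality and look at what changed. Edited or pasted regions often
# recompress differently from the rest of the image.
# The quality setting (90) is an illustrative assumption.
import io

import numpy as np
from PIL import Image

def ela_map(path: str, quality: int = 90) -> np.ndarray:
    """Return a per-pixel error-level map highlighting recompression anomalies."""
    original = Image.open(path).convert("RGB")

    # Re-save at a fixed JPEG quality, then reload.
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf).convert("RGB")

    # Absolute per-pixel difference between the original and the resaved copy.
    diff = np.abs(np.asarray(original, dtype=np.int16) -
                  np.asarray(resaved, dtype=np.int16))
    return diff.astype(np.uint8)

if __name__ == "__main__":
    print("mean error level:", ela_map("sample.jpg").mean())
```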

Video — what we do (developer view) #

  1. Segment selection: for social URLs (YouTube/TikTok) we fetch four segments, one per quarter of the timeline (Q1–Q4). This gives broad coverage on long videos without downloading the entire file.
  2. Codec sanitization: downloaded segments are re-encoded with FFmpeg to H.264 (yuv420p, ~1s GOP, scene-cut disabled). This removes mid-GOP artifacts that can break frame seeking and avoids decoder warnings. A sketch of this step and the diversity filter follows the list.
  3. Frame sampling: we read the stitched clip and sample up to ~2 fps, with a duration-aware target (≈16–40 frames depending on clip length). After each seek we discard a few “warm-up” frames to avoid residual corruption.
  4. Diversity filter: frames are kept only if their HSV histogram distance clears a threshold (Bhattacharyya > 0.22). This yields distinct frames across scene changes, not near-duplicates.
  5. Per-frame analysis: each frame runs through the same image pipeline (our ML model + forensics + EXIF-style cues where applicable).
  6. Metadata signal: we scan video metadata (title/description/tags/uploader) for declared AI markers (e.g., “AI-generated”, tool names, #aivideo). If present, it contributes a modest positive weight as “Platform AI label” (a small sketch appears below).
  7. Aggregation: for each detector we take the median across frames, then blend detectors with weights and confidence using a calibrated sigmoid. The top contributing signals are exposed in the UI.
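
To make steps 2–4 concrete, here is a minimal sketch (illustrative, not our production pipeline): it re-encodes a segment with FFmpeg, derives a duration-aware frame target, and keeps only frames whose HSV histograms clear the Bhattacharyya threshold. The FFmpeg flag values, histogram bin counts, and the simple sequential read (instead of seek-and-discard sampling) are simplifying assumptions.

```python
# Sketch of video steps 2-4: codec sanitization, duration-aware sampling target,
# and the HSV diversity filter. Assumes a ~30 fps source (so "-g 30" is ~1s GOP);
# bin counts and the sequential read are simplifications of the real sampler.
import subprocess

import cv2

def sanitize(src: str, dst: str) -> None:
    """Re-encode a segment to H.264/yuv420p with a short GOP and scene-cut disabled."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-g", "30", "-sc_threshold", "0",
        dst,
    ], check=True)

def target_frame_count(duration_s: float) -> int:
    """~2 fps, clamped to the 16-40 frame range described above."""
    return max(16, min(40, int(duration_s * 2)))

def hsv_hist(frame):
    """Normalized HSV histogram used to compare frames for visual diversity."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [32, 32, 32],
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def diverse_frames(path: str, threshold: float = 0.22, step: int = 15):
    """Keep a sampled frame only if its distance to the last kept frame clears
    the Bhattacharyya threshold (> 0.22)."""
    cap = cv2.VideoCapture(path)
    kept, last_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # coarse ~2 fps sampling on a ~30 fps clip
            hist = hsv_hist(frame)
            if last_hist is None or cv2.compareHist(
                    last_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                kept.append(frame)
                last_hist = hist
        idx += 1
    cap.release()
    return kept
```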
Today: frame-level analysis with platform metadata cues. Next: temporal ML (motion/consistency models) to catch dynamic artifacts that single frames can’t.
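
The platform metadata signal (step 6) boils down to keyword matching over the fields listed there. A minimal sketch follows; the marker list and field names below are illustrative only.

```python
# Sketch of the platform metadata signal (step 6): look for declared AI markers
# in title/description/tags/uploader. The marker list below is illustrative only.
import re

AI_MARKERS = re.compile(
    r"\bai[\s-]generated\b|#aivideo|\bsora\b|\bmidjourney\b|\brunway\b",
    re.IGNORECASE,
)

def platform_ai_label(meta: dict) -> bool:
    """Return True if the metadata declares the video as AI-generated."""
    fields = [
        meta.get("title", ""),
        meta.get("description", ""),
        meta.get("uploader", ""),
        " ".join(meta.get("tags", []) or []),
    ]
    return any(AI_MARKERS.search(field) for field in fields)

# A True result contributes a modest positive weight ("Platform AI label");
# it corroborates, rather than overrides, the frame-level analysis.
```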

Text #

Documents are processed with an ML block classifier trained on real and AI-written samples. We split the text into small blocks, score each block, and then:

  • Overall score: a calibrated blend of block scores summarises the document.
  • Estimated AI-written fraction: the share of blocks whose score meets or exceeds our “AI-like” threshold of 60% (sketched below).
  • Paragraph heatmap: each paragraph gets a score by overlap-weighting the ML block scores inside it.

If the ML text model is unavailable, we fall back to a heuristic analysis (burstiness, repetition, type-token ratio, punctuation, entropy), but the ML model is used whenever possible.
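
For concreteness, here is a minimal sketch of the aggregation described above, assuming per-block scores in [0, 1] and character-offset spans for blocks and paragraphs; the data layout and helper names are illustrative, and the 60% threshold is the one mentioned in the list.

```python
# Sketch of the document-level aggregation: the estimated AI-written fraction and
# the overlap-weighted paragraph heatmap, given per-block ML scores.
# Spans are character offsets; the layout and names are illustrative assumptions.
from typing import List, Tuple

Block = Tuple[int, int, float]   # (start, end, score in [0, 1])
Paragraph = Tuple[int, int]      # (start, end)

def ai_fraction(blocks: List[Block], threshold: float = 0.60) -> float:
    """Share of blocks whose score meets or exceeds the AI-like threshold."""
    if not blocks:
        return 0.0
    return sum(score >= threshold for _, _, score in blocks) / len(blocks)

def paragraph_heatmap(paragraphs: List[Paragraph], blocks: List[Block]) -> List[float]:
    """Score each paragraph by overlap-weighting the block scores inside it."""
    heat = []
    for p_start, p_end in paragraphs:
        weighted, total = 0.0, 0.0
        for b_start, b_end, score in blocks:
            overlap = max(0, min(p_end, b_end) - max(p_start, b_start))
            weighted += overlap * score
            total += overlap
        heat.append(weighted / total if total else 0.0)
    return heat
```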

Web & mixed content #

For URLs and documents, we extract embedded media/text and run the appropriate detectors. As we expand ML coverage, web, audio, and mixed content will receive specialized models.

How we combine signals #

Each detector outputs a score and a confidence. We weight contributions, blend them into an overall probability, and then calibrate. The image ML model carries a strong weight for images, and the text ML model for documents; forensic and metadata signals act as corroboration. Strong camera provenance (e.g., consistent EXIF) can counterbalance AI-like cues.
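
A minimal sketch of this fusion (placeholder weights and calibration constants, not our production values): each detector's score is weighted by its configured weight times its reported confidence, and the weighted average is passed through a calibrated sigmoid. For video, each detector's score is first reduced to a median across frames, as described in the video section.

```python
# Sketch of the signal fusion: weight each detector score by (weight * confidence),
# average, then pass through a calibrated sigmoid. The weights and the calibration
# constants a, b below are placeholders, not production values.
import math
from statistics import median
from typing import Dict, List, Tuple

# detector name -> (score in [0, 1], confidence in [0, 1])
Signals = Dict[str, Tuple[float, float]]

WEIGHTS = {"ml_image": 3.0, "ela": 1.0, "fft": 1.0, "jpeg": 1.0, "platform_ai_label": 0.5}

def fuse(signals: Signals, a: float = 4.0, b: float = -2.0) -> float:
    """Blend detector outputs into one calibrated probability."""
    num = den = 0.0
    for name, (score, confidence) in signals.items():
        w = WEIGHTS.get(name, 1.0) * confidence
        num += w * score
        den += w
    blended = num / den if den else 0.5
    return 1.0 / (1.0 + math.exp(-(a * blended + b)))

def per_frame_to_signal(frame_scores: List[float], confidence: float) -> Tuple[float, float]:
    """For video: reduce a detector's per-frame scores to a median before fusing."""
    return median(frame_scores), confidence

if __name__ == "__main__":
    print(fuse({"ml_image": (0.82, 0.9), "ela": (0.40, 0.6), "platform_ai_label": (1.0, 0.3)}))
```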

Interpretation #

Outputs are probabilistic—not definitive. Combine detector signals with context, source credibility, and any available provenance. For borderline cases, review the per-signal breakdown and paragraph heatmap shown in the app.

Ready to try AuthentiScan?
Open app