EVALUATION REPORT
Private Identity® Facial Age Estimation Algorithm Accuracy - 2025
Prepared by: PrivateID ML Team, August 24, 2025
Purpose and Background
This report provides a detailed regulatory evaluation of Private Identity's Facial Age Estimation Algorithm. The analysis uses precision and recall metrics derived from an independent evaluation by the UK's Age Check Certification Scheme (ACCS). This information is essential for state regulators assessing compliance of facial age estimation technologies with standards protecting minors from inappropriate access to age-restricted products and services.
The Private Identity Facial Age Estimation (FAE) algorithms use biometric analysis to estimate a person's age from facial features. Regulators may use this Private Identity algorithm to ensure minors are protected from age-restricted platforms and services without triggering consent requirements, given the on-device privacy posture described in §8.
Table of Contents
- Executive Summary
- Purpose & Background
- Evaluation Methodology
- Key Definitions & Metric Interpretation
- Detailed Results
- Analysis & Policy Scenarios
- Operational Guidance
- Privacy & Compliance Position
- Conclusion
1. Executive Summary
Evaluation context. The UK Age Check Certification Scheme (ACCS) independently assessed Private Identity's Facial Age Estimation (FAE) on verified faces aged 14–25. Admission as "adult" is defined by a configurable threshold τ applied to the model's predicted age ŷ (admit if ŷ ≥ τ).
What the numbers show. The threshold behaves as an operational dial between minor protection (fewer false positives) and adult convenience (higher pass-through). Key operating points from the evaluation are below.
1.1 Comparative operating points (n=343)
Age Threshold | Minors Allowed (% of minors) | Minors in Admitted (1 − precision) | Adult Pass-Through (recall) | TP | FP | FN |
---|---|---|---|---|---|---|
18 | 83.33% (15/18) | 5.00% | 87.69% | 285 | 15 | 40 |
19 | 38.89% (7/18) | 3.10% | 67.38% | 219 | 7 | 106 |
20 | 27.78% (5/18) | 2.96% | 50.46% | 164 | 5 | 161 |
21 | 16.67% (3/18) | 2.54% | 35.38% | 115 | 3 | 210 |
22 | 0.00% (0/18) | 0.00% | 23.08% | 75 | 0 | 250 |
1.2 Operational profiles
Profile A (throughput-conscious): τ = 20
- Admitted-set purity: 97.04% adults (≈2.96% minors among those admitted).
- Adult pass-through: 50.46% admitted immediately.
- Minor protection at the gate: ~72% of minors blocked pre-fallback.
- Borderline handling (common practice): treat predictions near the threshold (e.g., 20 ≤ ŷ < 21) with on-device ID verification (DOB parse + selfie↔ID portrait match) to collapse residual risk without exporting images or templates.
Profile B (safety-maximizing): τ = 22
- Observed minors in admitted: 0%.
- Adult pass-through: 23.08% admitted immediately.
- Often paired with broader use of the same on-device fallback for users below the threshold.
1.3 Policy economics (for any chosen τ)
A simple loss functional:
L = λ_FP × FP + λ_FN × FN
maps statutory priorities to an operating point. In this evaluation, indifference between τ = 20 and τ = 22 occurs at:
λ_FP / λ_FN = |ΔFN / ΔFP| = 89 / 5 = 17.8
given the observed ΔFP = −5 and ΔFN = +89 when moving from τ = 20 to τ = 22. The same calculation can be repeated for other adjacent thresholds (e.g., 19↔20, 20↔21).
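As an illustration, this indifference calculation can be reproduced directly from the reported counts; the short sketch below assumes only the τ = 20 and τ = 22 counts from the §1.1 table, and the λ weights are left as a ratio.

```python
# Sketch of the Section 1.3 indifference calculation, using the tau = 20 and
# tau = 22 counts from the table above; lambda values are illustrative.

def loss(fp, fn, lam_fp, lam_fn):
    return lam_fp * fp + lam_fn * fn      # L = lambda_FP*FP + lambda_FN*FN

fp_20, fn_20 = 5, 161
fp_22, fn_22 = 0, 250

ratio = abs(fn_22 - fn_20) / abs(fp_22 - fp_20)   # |dFN| / |dFP| = 89 / 5
print(ratio)                                       # 17.8

# At this ratio the two operating points score identically:
print(loss(fp_20, fn_20, lam_fp=ratio, lam_fn=1.0))  # 250.0
print(loss(fp_22, fn_22, lam_fp=ratio, lam_fn=1.0))  # 250.0
```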
1.4 Privacy posture (applies across profiles)
- On-device only: capture, liveness, age estimation, and (when used) ID parsing + selfie↔ID matching execute locally.
- No biometric exhaust: no images, embeddings, or templates are transmitted or stored; only anonymized decision signals are emitted for enforcement and audit.
- Regulatory position: based on this design, Private Identity's position is that privacy law obligations under CCPA/CPRA, BIPA, and GDPR do not attach; operators may still layer notices/DPIA/LIA per jurisdictional preference.
- Optional transitional server-assisted posture: for customers transitioning to on-device processing, facial images may be securely transmitted to Private Identity's servers for inference using the same models (same accuracy). Images are processed in volatile memory and deleted immediately after inference; no PII, facial images, templates, biometric data, or biometric information are retained by the customer or Private Identity for more than one second. Across both on-device and server-assist postures, only anonymized decision signals are emitted for audit.
2. Purpose & Background
2.1 Purpose
This report provides regulators and operators with a transparent, measurement-first view of how a configurable age threshold τ applied to Private Identity's Facial Age Estimation (FAE) translates into concrete operating characteristics: minor protection (low false positives), adult convenience (high pass-through), and privacy preservation (no biometric exhaust). The goal is to enable evidence-based selection of τ and accompanying controls using the same underlying statistics. The read-outs in this report apply to the two supported deployment postures: on-device and transitional server-assisted.
2.2 Regulatory context
Policymakers face a three-variable optimization: (i) preventing minors' access to restricted goods and services; (ii) minimizing unnecessary friction for legitimate adults; and (iii) avoiding creation or transfer of biometric data that could expand privacy risk. A thresholded FAE—with a well-defined fallback for borderline cases—exposes that trade space explicitly, allowing oversight bodies to align outcomes with statutory priorities without mandating a single prescription.
2.3 Where this is used
- Online platforms & app ecosystems: account creation, feature unlocks, age-gated communities.
- E-commerce & delivery: purchase and handoff of age-restricted products.
- In-store & self-service: kiosks and self-checkout for regulated SKUs.
- Creator/live environments: gating of monetization or audience features.
2.4 Role of FAE in the gating pipeline
- Capture & liveness (on device) to confirm a live, human subject under basic capture constraints.
- Face detection & alignment to normalize the region of interest.
- Age estimation returning a predicted age ŷ.
- Threshold decision: admit as "adult" if ŷ ≥ τ; otherwise withhold.
- Borderline handling (common operational pattern): if ŷ lies in a narrow band above τ, invoke on-device ID fallback—parse date of birth from a government ID and perform an on-device selfie↔ID portrait match. Images and templates remain local throughout.
- Optional transitional server-assisted posture (limited duration for migrating customers): capture (client) → encrypted transmit to Private Identity → server-side liveness/age estimation using the same models (same accuracy) → immediate deletion of facial images/templates after inference → anonymized decision signal returned.
2.5 Evaluation background
- Independent evaluator: UK Age Check Certification Scheme (ACCS).
- Cohort: 343 verified faces, ages 14–25, with authenticated ground-truth ages.
- Procedure: Batch inference under evaluator control; no model tuning on the cohort.
- Outputs reported: counts (TP, FP, FN) and rates (precision, recall) at multiple thresholds τ.
2.6 How to read the metrics
- Precision = TP/(TP+FP); its complement, (1 − precision), is the share of minors among admitted users.
- Recall = TP/(TP+FN) reflects adult pass-through prior to any fallback.
- Policy lever: Adjusting τ moves both measures in predictable, monotonic ways; a narrow borderline band with ID fallback concentrates verification where uncertainty is highest while leaving clearly adult users unburdened.
2.7 Threat model and safeguards
- Presentation attacks: Liveness and anti-spoof checks (e.g., print/replay/deepfake cues) run before age estimation.
- Borrowed/third-party IDs: When fallback is invoked, on-device selfie↔ID portrait matching ties the documentary DOB to the live subject.
- Operational misuse: Only anonymized decision signals (e.g., pass/deny, threshold bucket, reason codes) are emitted; no face images, embeddings, or templates are exported.
2.8 Privacy architecture
- On-device posture: PII, images and templates never leave the device.
- Transitional server-assisted posture: facial images are accepted solely to compute a decision using the same models and are deleted immediately after inference (processed in volatile memory; no disk persistence). Only anonymized decision telemetry remains for audit.
2.9 Governance and tuning
Operators typically (a) select τ consistent with their regulatory posture, (b) define a narrow borderline band for deterministic on-device fallback, and (c) monitor anonymized outcomes (decision distribution, fallback utilization, error codes) to keep admitted-set purity and adult pass-through within documented service levels.
3. Evaluation Methodology
3.1 Independent evaluator and scope
- Evaluator: UK Age Check Certification Scheme (ACCS).
- Scope: Offline assessment of Private Identity's Facial Age Estimation (FAE) using evaluator-controlled inputs and procedures; outputs reviewed as classification outcomes at multiple thresholds.
3.2 Cohort and ground truth
- Cohort: 343 verified facial images from individuals aged 14–25.
- Ground truth: Subject ages authenticated by ACCS prior to testing; adult defined as ≥18 years for this report.
- Adult/minor composition: 325 adults, 18 minors.
3.3 Test harness and execution controls
- Execution mode: Batch inference under ACCS control; identical model build and configuration across all runs.
- Environment: Fixed runtime (documented binary hash/model ID); deterministic settings for reproducibility.
- No training/tuning: Model weights and thresholds were not adapted using the cohort; thresholds were only applied for decisioning.
3.4 Inference pipeline
- Capture quality & liveness checks executed locally to reject non-live or invalid frames.
- Face detection & alignment to normalize geometry and illumination where applicable.
- Age estimation to produce a scalar predicted age ŷ in years.
- Threshold decision at τ: classify adult if ŷ ≥ τ, otherwise minor.
Note: ACCS evaluated the FAE threshold decision. The operational on-device ID fallback described here (DOB extraction from government ID + selfie↔ID portrait match) is a deployment control and was not part of ACCS scoring.
3.5 Metrics reported
For each threshold τ, ACCS recorded:
- TP (true positives): Adults correctly classified as adult.
- FP (false positives): Minors classified as adult.
- FN (false negatives): Adults classified as minor.
From these counts:
- Precision = TP/(TP+FP) — proportion of admitted users who are truly adult.
- Recall = TP/(TP+FN) — proportion of true adults admitted directly.
Deployment-oriented views (derived):
- Minors in admitted = FP/(TP+FP) = (1 − precision).
- Minors allowed (% of minors) = FP/18 in this cohort.
- Adult friction = FN/(TP+FN) = (1 − recall).
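For reference, a minimal sketch of these metric definitions is given below; the function names are ours, and the example uses the τ = 20 counts reported in §5.

```python
# Illustrative helpers for the metrics above; names are ours, not part of the
# ACCS report or any Private Identity SDK.

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def minors_in_admitted(tp: int, fp: int) -> float:
    return fp / (tp + fp)          # = 1 - precision

def minors_allowed(fp: int, total_minors: int = 18) -> float:
    return fp / total_minors       # cohort-specific denominator

def adult_friction(tp: int, fn: int) -> float:
    return fn / (tp + fn)          # = 1 - recall

# Example: tau = 20 counts from the evaluation.
tp, fp, fn = 164, 5, 161
print(round(precision(tp, fp) * 100, 2))   # 97.04
print(round(recall(tp, fn) * 100, 2))      # 50.46
```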
3.6 Data integrity and consistency checks
- Mass balance: At τ = 18, 285 (TP) + 40 (FN) = 325 adults and 15 (FP) + 3 (TN) = 18 minors, consistent with the cohort composition.
- Monotonicity: As τ increases, FP decreases and FN increases as expected; reported precision/recall align with the observed TP/FP/FN.
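These checks can be mechanized over the full reported grid; the sketch below transcribes the §5.1 counts and asserts the mass-balance and monotonicity properties. It is an illustrative audit aid, not part of the ACCS harness.

```python
# Consistency checks described in Section 3.6, run over the Section 5.1 counts.
# ADULTS / MINORS reflect the stated cohort composition (325 / 18).

ADULTS, MINORS = 325, 18
COUNTS = {  # tau: (TP, FP, FN)
    18: (285, 15, 40), 19: (219, 7, 106), 20: (164, 5, 161),
    21: (115, 3, 210), 22: (75, 0, 250),  23: (32, 0, 293),
    24: (9, 0, 316),   25: (2, 0, 323),
}

# Mass balance: TP + FN must equal the adult count at every threshold.
for tau, (tp, fp, fn) in COUNTS.items():
    assert tp + fn == ADULTS, f"adult mass balance fails at tau={tau}"
    assert fp <= MINORS, f"FP exceeds minor count at tau={tau}"

# Monotonicity: FP non-increasing and FN non-decreasing as tau rises.
taus = sorted(COUNTS)
for lo, hi in zip(taus, taus[1:]):
    assert COUNTS[hi][1] <= COUNTS[lo][1]   # FP
    assert COUNTS[hi][2] >= COUNTS[lo][2]   # FN
```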
3.7 Privacy handling during evaluation
- On-device processing model: The algorithmic steps mirror production design where capture, liveness, and age estimation execute locally.
- No biometric persistence: No PII, facial images, face embeddings, or templates were stored or transmitted by Private Identity as part of the evaluation; only aggregate metrics and counts were retained.
3.8 Replicability (operator audit recipe)
To reproduce the evaluation on a verified local sample:
- Fix model build: Record model ID, binary hash, and configuration.
- Establish ground truth: Verify subject ages (e.g., documentary checks).
- Run evaluation: Execute the pipeline in evaluation mode; produce ŷ per subject and apply the threshold grid.
- Report: Publish TP/FP/FN and precision/recall at each τ, along with the cohort composition and decision rules.
- Retention: Store only non-identifying decision summaries for audit; do not persist images or biometric templates.
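A minimal scoring sketch for this recipe is shown below; `estimate_age` is a placeholder for the operator's FAE invocation, not a documented API.

```python
# Sketch of the Section 3.8 audit recipe: score a verified sample over a
# threshold grid and report the resulting counts.
from collections import Counter

def score_grid(samples, thresholds):
    """samples: iterable of (true_age, predicted_age); returns {tau: Counter}."""
    results = {}
    for tau in thresholds:
        c = Counter()
        for true_age, y_hat in samples:
            is_adult = true_age >= 18
            admitted = y_hat >= tau
            if admitted and is_adult:
                c["TP"] += 1
            elif admitted and not is_adult:
                c["FP"] += 1
            elif not admitted and is_adult:
                c["FN"] += 1
            else:
                c["TN"] += 1
        results[tau] = c
    return results

# Usage on a verified local sample (ground-truth ages established first):
# samples = [(age, estimate_age(image)) for age, image in verified_sample]
# print(score_grid(samples, thresholds=range(18, 26)))
```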
4. Key Definitions & Metric Interpretation
4.1 Classification framing
- Predicted age: ŷ (years).
- Threshold rule: admit as "adult" iff ŷ ≥ τ.
- Ground truth: "Adult" = age ≥ 18; "Minor" = age < 18.
For any τ, the evaluation yields counts:
- TP — adults admitted; FP — minors admitted; FN — adults withheld. (Consequently TN = minors withheld, derivable from the cohort.)
4.2 Primitive performance metrics
- Precision = TP/(TP+FP). Read-out: purity of the admitted set (higher ⇒ fewer minors among admitted).
- Recall = TP/(TP+FN). Read-out: adult pass-through prior to any fallback (higher ⇒ less friction for adults).
4.3 Deployment-oriented views (derived from the same counts)
- Minors in admitted = FP/(TP+FP) = (1 − precision). Interpretation: fraction of admitted users who are minors.
- Minors allowed (% of minors) = FP/18. Interpretation: fraction of minors who would pass the gate.
- Adult friction = FN/(TP+FN) = (1 − recall). Interpretation: fraction of adults who would be withheld (and typically routed to fallback).
4.4 How the threshold moves the metrics (monotonic behavior)
- As τ increases: FP non-increasing, FN non-decreasing.
- Therefore: precision non-decreasing (admitted set becomes "cleaner"); recall non-increasing (fewer adults pass immediately).
- This monotonicity allows straightforward selection of operating profiles that emphasize either admitted-set purity or adult pass-through, with the same underlying model.
4.5 Interpreting two common operating profiles (read-outs)
Profile A — τ = 20 (balanced throughput/purity, pre-fallback)
- Precision ≈ 97.04% ⇒ minors in admitted ≈ 2.96%.
- Recall ≈ 50.46% ⇒ roughly half of adults pass immediately.
- Minors allowed ≈ 27.78% (blocked ≈ 72.22% at the gate).
- Operational note: uncertainty concentrates near the threshold; many deployments treat a narrow band above τ with an on-device ID check (DOB parse + selfie↔ID portrait match) to resolve residual risk without exporting images or templates.
Profile B — τ = 22 (safety-maximizing, pre-fallback)
- Precision = 100.00% (observed) ⇒ minors in admitted = 0.00% (observed).
- Recall ≈ 23.08% ⇒ a smaller subset of adults pass immediately.
- Operational note: commonly paired with broader use of the same on-device fallback for users below the threshold.
4.6 Decision-analytic view (cost-weighted comparison)
For any τ, a simple loss functional:
L = λ_FP × FP + λ_FN × FN
maps statutory priorities into an operating point. Using the reported counts, one can compute indifference ratios λ_FP/λ_FN between candidate thresholds (e.g., 19↔20, 20↔21, 20↔22) to understand when a higher-purity, lower-throughput setting becomes favored.
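As an illustration, once λ_FP and λ_FN are fixed, the loss-minimizing τ over the reported grid follows mechanically; the sketch below uses the FP/FN counts from §5.1 at τ = 18–22, with illustrative weight ratios.

```python
# Illustration of the Section 4.6 cost-weighted view: given weights for
# admitting a minor (lam_fp) vs. withholding an adult (lam_fn), pick the tau
# minimizing L over the reported grid.
COUNTS = {18: (15, 40), 19: (7, 106), 20: (5, 161), 21: (3, 210), 22: (0, 250)}  # tau: (FP, FN)

def best_threshold(lam_fp: float, lam_fn: float) -> int:
    return min(COUNTS, key=lambda t: lam_fp * COUNTS[t][0] + lam_fn * COUNTS[t][1])

# If admitting one minor is weighted 30x worse than withholding one adult,
# tau = 22 minimizes L; at a 10x weighting, tau = 19 does.
print(best_threshold(lam_fp=30, lam_fn=1))  # 22
print(best_threshold(lam_fp=10, lam_fn=1))  # 19
```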
4.7 Telemetry for audit (anonymized, model-centric)
Deployments typically retain only non-identifying signals sufficient to reconstruct these metrics: decision (pass/deny/fallback), threshold_bucket, fallback_used/result, liveness_pass, error_code, latency_ms, model_id/hash, and config_version. No images, embeddings, or face templates are transmitted or stored.
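A sketch of such a record is shown below; the field names mirror this list, while the types and container are our assumptions rather than a published schema.

```python
# Sketch of an anonymized decision record carrying only the Section 4.7 fields;
# types and structure are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DecisionRecord:
    decision: str                 # "pass" | "deny" | "fallback"
    threshold_bucket: str         # e.g. "<T", "[T,T+delta)", ">=T+delta"
    fallback_used: bool
    fallback_result: Optional[str]
    liveness_pass: bool
    error_code: Optional[str]
    latency_ms: int
    model_id: str                 # model identifier / binary hash
    config_version: str
    # Note: no image, embedding, or template fields exist on this record.
```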
5. Detailed Results
Cohort: 343 verified faces (ages 14–25). Decision rule: admit as "adult" if ŷ ≥ τ.
5.1 ACCS-reported counts and rates
Threshold τ | Precision (%) | Recall (%) | TP | FP | FN |
---|---|---|---|---|---|
18 | 95.00 | 87.69 | 285 | 15 | 40 |
19 | 96.90 | 67.38 | 219 | 7 | 106 |
20 | 97.04 | 50.46 | 164 | 5 | 161 |
21 | 97.46 | 35.38 | 115 | 3 | 210 |
22 | 100.00 | 23.08 | 75 | 0 | 250 |
23 | 100.00 | 9.85 | 32 | 0 | 293 |
24 | 100.00 | 2.77 | 9 | 0 | 316 |
25 | 100.00 | 0.62 | 2 | 0 | 323 |
Read-outs: As τ increases, FP decreases (admitted set becomes purer) and FN increases (adult pass-through falls), with monotone precision↑ and recall↓.
5.2 Deployment-oriented derivations (same counts)
Threshold τ | Minors Allowed (% of minors) | Minors in Admitted (1 − precision) | Adult Friction (1 − recall) | Predicted Adults (TP+FP) | Predicted Minors* |
---|---|---|---|---|---|
18 | 83.33% (15/18) | 5.00% | 12.31% | 300 | 43 |
19 | 38.89% (7/18) | 3.10% | 32.62% | 226 | 117 |
20 | 27.78% (5/18) | 2.96% | 49.54% | 169 | 174 |
21 | 16.67% (3/18) | 2.54% | 64.62% | 118 | 225 |
22 | 0.00% (0/18) | 0.00% | 76.92% | 75 | 268 |
23 | 0.00% | 0.00% | 90.15% | 32 | 311 |
24 | 0.00% | 0.00% | 97.23% | 9 | 334 |
25 | 0.00% | 0.00% | 99.38% | 2 | 341 |
*Predicted Minors = 343 - Predicted Adults.
Read-outs:
- Minors Allowed (% of minors) quantifies gate strength against minors.
- Minors in Admitted (1 − precision) quantifies admitted-set purity.
- Adult Friction (1 − recall) quantifies how many adults are withheld pre-fallback.
- Predicted Adults is the admitted set size at each τ.
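The derived columns in §5.2 follow mechanically from the §5.1 counts; the sketch below reproduces them (an illustrative transcription, not ACCS tooling).

```python
# Reproducing the Section 5.2 derivations from the Section 5.1 counts.
COUNTS = {  # tau: (TP, FP, FN)
    18: (285, 15, 40), 19: (219, 7, 106), 20: (164, 5, 161),
    21: (115, 3, 210), 22: (75, 0, 250),  23: (32, 0, 293),
    24: (9, 0, 316),   25: (2, 0, 323),
}
COHORT, MINORS = 343, 18

for tau, (tp, fp, fn) in sorted(COUNTS.items()):
    minors_allowed = fp / MINORS
    minors_in_admitted = fp / (tp + fp)
    adult_friction = fn / (tp + fn)
    predicted_adults = tp + fp
    predicted_minors = COHORT - predicted_adults
    print(f"{tau}: {minors_allowed:.2%} | {minors_in_admitted:.2%} | "
          f"{adult_friction:.2%} | {predicted_adults} | {predicted_minors}")
# At tau = 20 this prints 27.78% | 2.96% | 49.54% | 169 | 174, matching the table.
```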
5.3 Two common profiles (concise summaries)
- τ = 20: admitted-set purity ≈ 97.04% adults (≈2.96% minors among admitted); adult pass-through ≈ 50.46%; minors allowed ≈ 27.78% pre-fallback.
- τ = 22: observed 0 minors among admitted; adult pass-through ≈ 23.08%; minors allowed 0%.
6. Analysis & Policy Scenarios
6.1 Trade-space at a glance (counts drive policy)
Raising the threshold τ monotonically decreases FP (fewer minors admitted) and increases FN (more adults withheld). The deltas between adjacent thresholds are below.
Move (↑ τ) | ΔFP (minors admitted) | ΔFN (adults withheld) | Indifference ratio* |
---|---|---|---|
18 → 19 | −8 | +66 | 8.25 |
19 → 20 | −2 | +55 | 27.50 |
20 → 21 | −2 | +49 | 24.50 |
21 → 22 | −3 | +40 | 13.33 |
*Indifference ratio defined as |ΔFN/ΔFP|: the relative penalty on admitting one minor vs. blocking one adult that makes moving to the higher threshold neutral, given the observed counts.
6.2 Two frequently referenced operating profiles (read-outs)
Profile A — τ = 20 (balanced throughput/purity, pre-fallback)
- Admitted-set purity (1 − minors in admitted): precision 97.04% ⇒ minors among admitted 2.96%.
- Adult pass-through: recall 50.46% (roughly half of adults pass immediately).
- Minors allowed at the gate: 27.78% (blocked ≈ 72.22%).
- Admitted set size: 169 (TP+FP).
- FP/FN counts: FP 5, FN 161.
Profile B — τ = 22 (safety-maximizing, pre-fallback)
- Minors among admitted: 0.00% (observed).
- Adult pass-through: 23.08%.
- Minors allowed at the gate: 0.00% (observed).
- Admitted set size: 75 (TP+FP).
- FP/FN counts: FP 0, FN 250.
Movement between these profiles (e.g., 20 ↔ 22) can be assessed with the same delta arithmetic: from τ = 22 down to τ = 20, the system admits +89 additional adults for +5 additional minors; equivalently, 17.8 admitted adults per admitted minor.
6.3 Borderline concentration and targeted verification
Uncertainty concentrates near the chosen τ. A narrow band just above τ (e.g., 20 ≤ ŷ < 21 when τ = 20) captures a disproportionate share of borderline predictions. Deployments commonly treat this band with on-device ID verification (DOB parsed from ID + selfie↔ID portrait match on device) to compress residual risk while leaving clearly adult users unburdened. Images, embeddings, and templates remain local.
6.4 Jurisdictional and product-category read-across
Observed practice varies by statutory exposure and product class: higher-consequence categories emphasize admitted-set purity (higher τ and/or wider borderline handling), while general access flows emphasize adult convenience (mid-range τ with a tight band). The same tables in §5 and deltas in §6.1 support either posture without changing the underlying model.
6.5 Monitoring signals typically tracked (anonymized)
- Admitted-set purity: precision (minors in admitted).
- Adult pass-through: recall.
- Borderline utilization: share of traffic entering the band; fallback conversion rate.
- Latency & reliability: end-to-end times; undecidable rates.
All are reconstructible from non-identifying telemetry (decision, threshold bucket, fallback_used/result, error codes, latency, model/config IDs).
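For illustration, the sketch below aggregates anonymized telemetry rows (using the §7.4 field names) into band-utilization, fallback-conversion, undecidable-rate, and latency signals; the row format and bucket labels are assumptions.

```python
# Minimal aggregation sketch over anonymized telemetry rows (dicts carrying
# the Section 7.4 fields); the row format and statistics shown are illustrative.
from statistics import median

def monitoring_signals(rows):
    total = len(rows)
    in_band = [r for r in rows if r["threshold_bucket"] == "[T,T+delta)"]
    fallback = [r for r in rows if r["fallback_used"]]
    converted = [r for r in fallback if r["fallback_result"] == "pass"]
    return {
        "band_utilization": len(in_band) / total if total else 0.0,
        "fallback_conversion": len(converted) / len(fallback) if fallback else 0.0,
        "undecidable_rate": sum(1 for r in rows if r["error_code"]) / total if total else 0.0,
        "median_latency_ms": median(r["latency_ms"] for r in rows) if rows else None,
    }
```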
6.6 Typical controls available to regulators/operators
- Risk envelope: raise/lower τ; widen/narrow the borderline band.
- Circuit breakers: temporary step-up of τ or ID-first mode for designated SKUs or periods.
- Post-update hygiene: re-score a verified sample after model/config updates; publish the new counts and rates using the same tables.
7. Operational Guidance (read-outs & common patterns)
7.1 Pipeline topology (on-device)
A typical deployment surfaces the following on-device stages: capture → liveness/anti-spoof → face detection & alignment → age estimation ŷ → threshold decision at τ, with an optional borderline handling step performed locally. Images, embeddings, and templates remain on the device; systems export only anonymized decision signals.
7.2 Policy parameters exposed by the system
- Threshold τ: minimum predicted age to admit as "adult."
- Borderline band δ: optional narrow band just above τ where additional verification is applied; δ is typically modest.
- Category toggles: per-SKU or per-feature variants of τ and δ.
- Safeguard switches: temporary step-ups of τ, ID-first mode, or liveness sensitivity shifts.
These parameters are sufficient to reproduce any of the operating profiles summarized in §5–§6 using the same underlying counts.
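A policy snapshot carrying these parameters might look like the sketch below; the structure and example values are illustrative, not shipped defaults.

```python
# Illustrative policy snapshot for the parameters listed above; field names
# and example values are assumptions, not product defaults.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class GatePolicy:
    tau: float = 20.0                      # minimum predicted age to admit
    delta: float = 1.0                     # borderline band width above tau
    category_overrides: Dict[str, dict] = field(default_factory=dict)  # per-SKU tau/delta
    id_first: bool = False                 # safeguard switch: ID-first mode
    step_up_tau: Optional[float] = None    # temporary elevated threshold

policy = GatePolicy(category_overrides={"restricted_sku": {"tau": 22.0, "delta": 2.0}})
```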
7.3 Borderline handling (common pattern)
Deployments often reserve extra scrutiny for predictions in a narrow interval just above τ. A common approach is on-device ID verification: parse date of birth from a government ID and perform an on-device selfie↔ID portrait comparison with liveness. This concentrates verification where uncertainty is highest while leaving clearly adult users unburdened. No images or templates are transmitted or stored.
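A minimal routing sketch for this pattern is shown below; the threshold and band values are illustrative, and the on-device ID fallback itself is represented only by the returned route.

```python
# Illustrative routing logic for Sections 7.2-7.3: admit, route to the
# on-device ID fallback, or withhold, based on tau and the borderline band
# delta. Parameter values are placeholders, not product defaults.

def gate_decision(y_hat: float, tau: float = 20.0, delta: float = 1.0) -> str:
    if y_hat >= tau + delta:
        return "pass"                  # clearly above the band: admit directly
    if y_hat >= tau:
        return "fallback"              # borderline band [tau, tau + delta)
    return "deny"                      # below threshold: withhold

# Example: predictions near the threshold get routed, not admitted outright.
for y_hat in (23.4, 20.3, 18.9):
    print(y_hat, "->", gate_decision(y_hat))
# 23.4 -> pass, 20.3 -> fallback, 18.9 -> deny
```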
7.4 Telemetry available for audit (anonymized)
Typical anonymized fields allow reconstruction of all metrics in §4–§6 without biometric exhaust:
- decision (pass / deny / routed)
- threshold_bucket (e.g., <τ, [τ, τ+δ), ≥τ+δ)
- fallback_used and fallback_result (boolean / code)
- liveness_pass (boolean) and error_code
- latency_ms, device_class
- model_id / hash and config_version
7.5 Rollout observations (what operators tend to evaluate)
- Admitted-set purity precision for the chosen τ (with or without the band in use).
- Adult pass-through (recall) at τ, jointly with fallback conversion within the band.
- Band utilization: proportion of traffic landing in [τ, τ+δ).
- Latency & reliability: end-to-end time to decision; undecidable rates.
- Category effects: differences across SKUs/features when τ and δ vary by category.
7.6 Drift checks and retuning (signals typically watched)
- Movement in precision (minors in admitted) and recall relative to historical baselines.
- Shifts in band utilization (capture conditions, device mix, or user demographics).
- Changes following model/runtime updates (tracked via model_id/config_version). Re-scoring a verified sample with the current τ and δ reproduces the same tables as §5 for comparison over time.
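One way to mechanize such a check is sketched below; the tolerance and the "current" numbers are hypothetical, while the baseline uses the τ = 20 evaluation rates.

```python
# Sketch of a Section 7.6 drift check: compare re-scored rates on a verified
# sample against stored baselines. Tolerances are illustrative, not recommendations.

def drift_flags(current: dict, baseline: dict, tol: float = 0.02) -> dict:
    """current/baseline: {'precision': float, 'recall': float} at the same tau."""
    return {
        metric: abs(current[metric] - baseline[metric]) > tol
        for metric in ("precision", "recall")
    }

# Example with the tau = 20 evaluation rates as the baseline:
baseline = {"precision": 0.9704, "recall": 0.5046}
current = {"precision": 0.9712, "recall": 0.4710}   # hypothetical re-score
print(drift_flags(current, baseline))  # {'precision': False, 'recall': True}
```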
7.7 Elevated-risk periods (controls commonly employed)
When incident signals rise (e.g., spoof patterns, policy events, or product launches), deployments commonly:
- increase τ for designated categories,
- widen the borderline band δ, or
- apply ID-first for targeted flows, and later revert to the steady-state profile once signals normalize.
7.8 Accessibility and edge conditions
Capture UX typically enforces forward gaze, eyes-open, and adequate illumination. Where users face environmental or accessibility constraints, flows often permit fast routing to the local verification band (or human support) without altering the privacy posture.
7.9 Versioning & attestations (artifacts frequently shared)
- Model identity: model_id, binary hash, build date.
- Policy snapshot: threshold/band settings and category toggles.
- Audit packet: ACCS table (counts & rates), local verification results, and telemetry schema demonstrating absence of stored images/templates.
8. Privacy & Compliance Position
Contact: compliance@privateid.com
Context: This section describes the product's technical posture and commonly referenced legal characterizations. It is informational and not legal advice.
8.1 Architectural posture (data minimization by design)
- On-device only: Capture, liveness/anti-spoof, face detection/alignment, age estimation ŷ, and—when invoked—ID parsing + selfie↔ID portrait comparison all execute locally.
- No PII or biometric exhaust: The system does not transmit or persist PII, facial images, embeddings, or templates.
- Output minimization: Only anonymized decision signals are emitted (e.g., pass/deny/routed, threshold bucket, fallback_used/result, error codes, latency, model/config IDs).
8.2 Legal characterization
Private Identity's on-device position: Given the on-device architecture and the absence of PII and of biometric transmission, processing, or retention, the solution does not "collect, capture, or store" biometric identifiers as commonly defined, and privacy law obligations under CCPA/CPRA, BIPA, and GDPR do not attach. Consent is not required for FAE gating or the on-device document-verification fallback in this posture.
Alternative controller framings seen in the field:
- Legitimate Interests (child protection; fraud prevention) with LIA/DPIA documenting necessity, proportionality, and safeguards.
- Legal obligation where age checks are mandated.
- Consent (explicit or contextual) where preferred; disclosures emphasize on-device processing and zero retention of images/templates or (during migration) sub-second server deletion and zero retention.
8.3 Jurisdictional read-outs (typical controller approaches)
- United States: BIPA/analogs—avoid server-side possession; short notice language is commonly used even when consent is not strictly required under the product posture. COPPA flows (under-13) are outside this ACCS report's 14–25 evaluation scope.
- EU/EEA & UK: Some controllers document an LIA/DPIA while relying on on-device processing to keep personal data off servers; DSRs typically resolve as "no data held" beyond anonymized decision logs.
- Other regimes (AUS, CA-PIPEDA, BR, SG): Similar emphasis on purpose limitation, non-export of biometrics, and minimal anonymized telemetry for audit.
8.4 Data map (what exists where)
On-device posture:
- Local/ephemeral: selfie frames; facial landmarks; predicted age; optional ID image and parsed DOB; selfie↔ID portrait score; liveness artifacts.
- Not transmitted/Not retained: facial images, embeddings/templates, parsed PII.
- Emitted (anonymized): decision outcome, threshold bucket, fallback_used/result, liveness_pass, error_code, latency_ms, device_class, model_id/hash, config_version.
Transitional server-assisted posture:
- In-memory only, transient (<~1s): uploaded facial (and, if used, document) images; derived features necessary for the single inference.
- Immediately deleted: all images/biometric artifacts (no disk writes; memory zeroized).
- Emitted (anonymized): same non-identifying decision signals as above.
8.5 Security controls (non-exhaustive)
- Transport: TLS 1.3 for policy/config and any transient uploads; optional mTLS and key pinning.
- Runtime (server-assisted): ephemeral containers/VMs; in-memory processing; no media logging; strict IAM; audit trails capture only metadata about anonymized decisions.
- Runtime (on-device): signed binaries; integrity/attestation hooks; anti-debug/hardening.
- Config governance: signed, versioned policy artifacts (thresholds/bands, category toggles) with rollback protection.
- Audit trail: append-only, hash-chained logs of anonymized events; no images/templates.
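As an illustration of the audit-trail property, the sketch below chains anonymized events with SHA-256 digests; the serialization and field choices are assumptions, not the product's implementation.

```python
# Sketch of an append-only, hash-chained log of anonymized events: each entry
# is chained to the previous entry's digest, so tampering is detectable.
import hashlib, json

def append_event(chain: list, event: dict) -> list:
    prev_digest = chain[-1]["digest"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_digest + payload).encode()).hexdigest()
    chain.append({"event": event, "prev": prev_digest, "digest": digest})
    return chain

def verify_chain(chain: list) -> bool:
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["digest"] != expected:
            return False
        prev = entry["digest"]
    return True

log = []
append_event(log, {"decision": "pass", "threshold_bucket": ">=T+delta", "latency_ms": 412})
append_event(log, {"decision": "fallback", "threshold_bucket": "[T,T+delta)", "latency_ms": 655})
print(verify_chain(log))  # True
```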
8.6 Telemetry & auditability (reconstructing metrics without biometrics)
- The anonymized fields listed in §8.4 suffice to reconstruct precision/recall, minors-in-admitted, adult pass-through, band utilization, and latency distributions.
- Typical retention windows are policy-bound (e.g., 12–24 months) and exclude any media or biometric templates.
8.7 Notice language commonly used (illustrative)
- Inline notice: "Age is checked on your device to protect minors. No images are stored or sent to our servers."
- Fallback notice: "To finish, scan your government ID and take a selfie on this device so we can confirm the ID matches you. All images stay on your device."
8.8 Artifacts frequently provided to oversight bodies
- Model identity: model ID, binary hash, build date.
- Evaluation tables: ACCS counts & rates; local verified-sample read-outs using the same grid of thresholds.
- Policy snapshot: current threshold/band settings and category toggles.
- Telemetry schema: proof that only anonymized decision signals are logged; absence of image/template fields.
- Change log: model/policy revisions with pre/post read-outs using the same metrics.
9. Conclusion
The evaluation characterizes how a single policy parameter—the age threshold τ applied to predicted age ŷ—maps directly to outcomes that matter for oversight: minor protection (false positives), adult convenience (pass-through), and privacy preservation (no biometric exhaust). The observed operating profiles show:
- With τ = 20, the admitted set remains high-purity (precision ≈ 97.04%; minors among admitted ≈ 2.96%) while adult pass-through is materially higher (recall ≈ 50.46%).
- With τ = 22, the admitted set is observed free of minors in this evaluation (precision 100.00%), with lower immediate pass-through (recall ≈ 23.08%).
In practice, uncertainty concentrates close to the chosen threshold; many deployments address this by applying on-device verification within a narrow band above τ (document DOB parsing plus selfie↔ID portrait comparison, processed locally). This focuses additional checks where they are most informative while leaving clearly adult users unburdened and preserving the privacy posture: no images, embeddings, or templates transmitted or stored; only anonymized decision signals are emitted for enforcement and audit.
The report provides, for any τ, the counts (TP/FP/FN), the rates (precision/recall), and the deltas between neighboring thresholds, enabling straightforward policy analyses (including cost-weighted views) without altering the underlying model or data flow. These same tables, combined with anonymized telemetry, allow regulators and operators to reproduce, audit, and monitor outcomes over time using the identical measurement framework.
These read-outs are model-centric and posture-neutral. They hold whether inference runs on-device or, during a limited migration period, via a transitional server-assisted path using the same models (same accuracy). In all cases, no PII, facial images, templates, biometric data, or biometric information are retained by the customer or Private Identity for more than one second; only anonymized decision signals persist for audit.
Thank you for reading this report. We welcome your comments, thoughts and suggestions. Please contact us at: compliance@privateid.com