Privacy Principles

This specification is privacy-first:

k-anonymity (k ≥ 100): Aggregates are exposed only when the underlying group has at least 100 members.
No content, no IDs: Only behavioral metadata is modeled.
Bucketed values: Account age, automation flags, and client families are categorical.
Daily-salted hashes: Duplicate-detection hashes use a salt that rotates daily, which prevents cross-dataset correlation.
Geographic limits: Location is reported at country level, or at the level of large subdivisions (≥1M population), and never finer.
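
As an illustration of the daily-salted hashes and geographic limits listed above, here is a minimal Python sketch. It is not part of the specification. MASTER_KEY, daily_salt, duplicate_hash, and geo_bucket are hypothetical names, and the use of HMAC-SHA256 is an assumption, since the specification does not name a hash construction. The sketch also assumes the hash input is normalized content that is processed in memory and then discarded.

```python
import hashlib
import hmac
from datetime import date
from typing import Optional

# Hypothetical long-lived secret; a real deployment would load it from a secret store.
MASTER_KEY = b"example-master-key"

# Population floor for subdivisions, taken from the "≥1M population" rule.
MIN_SUBDIVISION_POPULATION = 1_000_000


def daily_salt(day: date, master_key: bytes = MASTER_KEY) -> bytes:
    """Derive a per-day salt from the master key and the calendar date."""
    return hmac.new(master_key, day.isoformat().encode("utf-8"), hashlib.sha256).digest()


def duplicate_hash(normalized_text: str, day: date) -> str:
    """Keyed hash for duplicate detection; comparable only within the same day."""
    salt = daily_salt(day)
    return hmac.new(salt, normalized_text.encode("utf-8"), hashlib.sha256).hexdigest()


def geo_bucket(country: str, subdivision: Optional[str], population: Optional[int]) -> str:
    """Return the subdivision only if it meets the population floor, else the country."""
    if (
        subdivision is not None
        and population is not None
        and population >= MIN_SUBDIVISION_POPULATION
    ):
        return f"{country}-{subdivision}"
    return country
```

Under these assumptions, the same content hashes to the same value within a day and to a different value on the next day, so hashes from two daily exports cannot be matched against each other.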

Guiding Principles

No personal data. The system does not collect or expose account IDs, handles, user names, or message text.
Bucketed fields. Potentially sensitive attributes (e.g., account age, follower count) are reported only in broad, non-identifying ranges.
Aggregate-only outputs. All metrics are computed across large groups; no individual-level records are exposed.
Minimum group size thresholds. Public outputs are generated only when k-anonymity with k ≥ 100 is satisfied, which reduces re-identification risk.
No cross-platform joins. The schema contains no identifiers that are linkable across platforms, so records cannot readily be joined with external datasets.
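
The bucketed-fields principle above can be sketched as follows. The bucket edges and labels are hypothetical, chosen only for illustration; the specification does not define the actual ranges.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical bucket edges and labels; the specification does not fix these values.
ACCOUNT_AGE_EDGES_DAYS = [30, 365, 1825]
ACCOUNT_AGE_LABELS = ["<30d", "30d-1y", "1y-5y", "5y+"]

FOLLOWER_EDGES = [100, 1_000, 10_000, 100_000]
FOLLOWER_LABELS = ["<100", "100-999", "1k-9.9k", "10k-99.9k", "100k+"]


def to_bucket(value: int, edges: Sequence[int], labels: Sequence[str]) -> str:
    """Map an exact value to a coarse label; each edge is the inclusive lower bound of the next bucket."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


def bucket_account_age(days: int) -> str:
    """Replace an exact account age in days with its range label."""
    return to_bucket(days, ACCOUNT_AGE_EDGES_DAYS, ACCOUNT_AGE_LABELS)


def bucket_followers(count: int) -> str:
    """Replace an exact follower count with its range label."""
    return to_bucket(count, FOLLOWER_EDGES, FOLLOWER_LABELS)
```

In this sketch the exact value is discarded as soon as the label is computed, so only the categorical range would enter any downstream record.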

Small-count suppression. Metrics for small groups are suppressed to prevent triangulation attacks.
Raw-post decoupling. Public outputs are derived from aggregate computations, not individual post logs.
Internal-only provenance tags. Tags are used to compute trends and coordination patterns but are never exposed at the individual-post level.
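
A minimal sketch of how small-count suppression and raw-post decoupling could fit together is shown below. PostRecord and publishable_counts are hypothetical names, and the grouping key (day, country, client family) is an assumption. Group size is counted in records here; a real deployment might count distinct accounts instead.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

K_MIN = 100  # minimum group size, from the k ≥ 100 rule


class PostRecord(NamedTuple):
    """Hypothetical internal record; it carries no account ID, handle, or message text."""
    day: str
    country: str
    client_family: str


GroupKey = Tuple[str, str, str]


def publishable_counts(records: Iterable[PostRecord], k_min: int = K_MIN) -> Dict[GroupKey, int]:
    """Count records per group and keep only groups with at least k_min records."""
    counts = Counter((r.day, r.country, r.client_family) for r in records)
    # Groups below the threshold are dropped entirely rather than reported as small numbers.
    return {key: n for key, n in counts.items() if n >= k_min}
```

The returned mapping contains only group keys and counts, so nothing in the public output refers back to an individual post.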

Privacy-aligned. Meets the intent of privacy regulations such as the GDPR and the CCPA, and of comparable standards in other jurisdictions.
Transparency-compliant. Enables platforms to meet obligations under emerging transparency laws (e.g., the EU Digital Services Act) without revealing personally identifiable information.