Type Reference Overview

The civic transparency simulation core provides a structured type system for representing temporal activity patterns, content fingerprints, and aggregated metrics. This type system enables reproducible research and standardized analysis across different transparency studies.

Core Concepts

Window Aggregation

The fundamental unit of analysis is a time window containing aggregated activity data. Each window represents a slice of time (typically 10-15 minutes) with calculated metrics and content fingerprints.

Content Clustering

Content is identified through hash-based clustering. Similar content gets grouped under topic identifiers, enabling analysis of how specific topics or themes spread through systems.

Fingerprinting

Content fingerprints use multiple techniques: - SimHash: Locality-sensitive hashing for near-duplicate detection - MinHash: Set similarity estimation for clustering analysis

Temporal Patterns

Activity patterns are captured through: - Time-series data within windows - Cross-window trend analysis
- Burst detection and anomaly scoring

Type Categories

Core Types

Essential data structures for window-based analysis:

WindowAgg: Complete window aggregation with all metrics
ContentHash: Hash-based content identification
Digests: Content fingerprinting data structures

Configuration Types

Control structures for generation and analysis:

EventConfig: Configuration for temporal events and scenarios

Utility Types

Supporting structures for data handling:

ID Management: Consistent identifier schemes
I/O Schema: Serialization and database integration
Registry: Type registration and discovery

Design Principles

Immutability: Core data structures are immutable to ensure consistency across analysis pipelines.

Composability: Types can be combined and extended for different research scenarios.

Serialization: All types support JSON serialization for cross-platform compatibility.

Validation: Built-in validation ensures data integrity throughout the analysis pipeline.

Documentation: Comprehensive docstrings and type hints support IDE integration and static analysis.

Import Patterns

# Core analysis types
from ci.transparency.sdk import WindowAgg, ContentHash, TopHash

# Content fingerprinting
from ci.transparency.sdk import Digests, SimHash64, MinHashSig

# I/O and serialization
from ci.transparency.sdk import windowagg_to_json, windowagg_from_json

# Utility functions
from ci.transparency.sim.metrics import herfindahl, cv_of_bins

Next Steps

Window Aggregation - Core analysis data structure
Content Hashing - Content identification and clustering
I/O Schema - Serialization and database integration