API Reference¶

Core Modules¶

Configuration¶

civic_interconnect.paperkit.config ¶

Configuration module for paper kit metadata handling.

This module provides:

- TypedDict definitions for asset and metadata configuration
- Functions to load and normalize metadata from YAML files
- Default file extension configurations for allowed assets

File: src/civic_interconnect/paperkit/config.py

DirectAssetTD ¶

Bases: TypedDict

TypedDict for direct asset configuration.

Attributes:

Name	Type	Description
`url`	`str`	The URL of the asset.
`filename`	`NotRequired[str]`	Optional filename for the asset.
`checksum`	`NotRequired[str]`	Optional checksum for the asset.

Source code in src/civic_interconnect/paperkit/config.py

class DirectAssetTD(TypedDict, total=False):
    """TypedDict for direct asset configuration.

    Attributes
    ----------
    url : str
        The URL of the asset.
    filename : NotRequired[str]
        Optional filename for the asset.
    checksum : NotRequired[str]
        Optional checksum for the asset.
    """

    url: str
    filename: NotRequired[str]
    checksum: NotRequired[str]

EntryMetaTD ¶

Bases: TypedDict

TypedDict for entry metadata configuration.

Attributes:

Name	Type	Description
`notes`	`NotRequired[str]`	Optional notes about the entry.
`out_dir`	`NotRequired[str]`	Optional output directory for the entry.
`assets`	`NotRequired[list[AssetTD]]`	Optional list of assets associated with the entry.

Source code in src/civic_interconnect/paperkit/config.py

class EntryMetaTD(TypedDict, total=False):
    """TypedDict for entry metadata configuration.

    Attributes
    ----------
    notes : NotRequired[str]
        Optional notes about the entry.
    out_dir : NotRequired[str]
        Optional output directory for the entry.
    assets : NotRequired[list[AssetTD]]
        Optional list of assets associated with the entry.
    """

    notes: NotRequired[str]
    out_dir: NotRequired[str]
    assets: NotRequired[list[AssetTD]]

PageAssetTD ¶

Bases: TypedDict

TypedDict for page-based asset configuration.

Attributes:

Name	Type	Description
`page_url`	`str`	The URL of the page to scrape for assets.
`allow_ext`	`NotRequired[list[str]]`	Optional list of allowed file extensions.
`href_regex`	`NotRequired[str]`	Optional regex pattern to match href attributes.
`limit`	`NotRequired[int]`	Optional limit on number of assets to collect.
`base_url`	`NotRequired[str]`	Optional base URL for relative links.

Source code in src/civic_interconnect/paperkit/config.py

class PageAssetTD(TypedDict, total=False):
    """TypedDict for page-based asset configuration.

    Attributes
    ----------
    page_url : str
        The URL of the page to scrape for assets.
    allow_ext : NotRequired[list[str]]
        Optional list of allowed file extensions.
    href_regex : NotRequired[str]
        Optional regex pattern to match href attributes.
    limit : NotRequired[int]
        Optional limit on number of assets to collect.
    base_url : NotRequired[str]
        Optional base URL for relative links.
    """

    page_url: str
    allow_ext: NotRequired[list[str]]
    href_regex: NotRequired[str]
    limit: NotRequired[int]
    base_url: NotRequired[str]
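
The three TypedDicts above describe the shape of one metadata entry. As a minimal sketch, the mapping below shows what a loaded metadata file might look like as plain Python data. The bibkeys, URLs, and values are invented for illustration, and `AssetTD` is assumed here to accept either a direct asset (keyed by `url`) or a page asset (keyed by `page_url`).

```python
from typing import Any

# Hypothetical metadata mapping: bibkey -> entry (EntryMetaTD-shaped).
meta: dict[str, dict[str, Any]] = {
    "smith2020": {
        "notes": "Single file fetched directly.",
        "out_dir": "raw",
        "assets": [
            # DirectAssetTD-shaped: identified by the "url" key.
            {"url": "https://example.org/files/report.pdf", "filename": "report.pdf"},
        ],
    },
    "lee2019": {
        "assets": [
            # PageAssetTD-shaped: identified by the "page_url" key.
            {
                "page_url": "https://example.org/data/",
                "allow_ext": [".csv", ".pdf"],
                "href_regex": r"2019",
                "limit": 5,
            },
        ],
    },
}

# Split assets by the key that identifies their type.
all_assets = [a for entry in meta.values() for a in entry.get("assets", [])]
direct_assets = [a for a in all_assets if "url" in a]
page_assets = [a for a in all_assets if "page_url" in a]
```

Every field other than `url` or `page_url` is optional, so the smallest useful asset is a dictionary with just one of those two keys.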

load_meta ¶

load_meta(meta_path: Path) -> MetaTD

Load metadata from a YAML file.

Parameters:

Name	Type	Description	Default
`meta_path`	`Path`	Path to the YAML metadata file.	required

Returns:

Type	Description
`MetaTD`	Dictionary containing the loaded and normalized metadata entries.

Raises:

Type	Description
`ValueError`	If the YAML file does not contain a mapping of bibkeys.

Source code in src/civic_interconnect/paperkit/config.py

def load_meta(meta_path: Path) -> MetaTD:
    """Load metadata from a YAML file.

    Parameters
    ----------
    meta_path : Path
        Path to the YAML metadata file.

    Returns
    -------
    MetaTD
        Dictionary containing the loaded and normalized metadata entries.

    Raises
    ------
    ValueError
        If the YAML file does not contain a mapping of bibkeys.
    """
    raw_text = meta_path.read_text(encoding="utf-8")
    loaded: Any = yaml.safe_load(raw_text)
    if loaded is None:
        data: MetaTD = {}
    elif isinstance(loaded, dict):
        data = cast("MetaTD", loaded)
    else:
        raise ValueError("refs_meta.yaml must be a mapping of bibkeys")

    # Normalize each entry
    for key, entry in list(data.items()):
        data[key] = _normalize_entry(entry)

    logger.info("Loaded meta for %d keys from %s", len(data), meta_path)
    return data
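
The top-level shape check in `load_meta` can be sketched on its own, without YAML parsing. The helper name `coerce_meta` is hypothetical, and the per-entry normalization step is left out because `_normalize_entry` is not shown on this page.

```python
from typing import Any


def coerce_meta(loaded: Any) -> dict[str, Any]:
    """Mirror the shape check in load_meta (hypothetical helper, no normalization)."""
    if loaded is None:
        # An empty YAML document parses to None and is treated as "no entries".
        return {}
    if isinstance(loaded, dict):
        return loaded
    raise ValueError("refs_meta.yaml must be a mapping of bibkeys")


empty = coerce_meta(None)
mapping = coerce_meta({"smith2020": {"notes": "census tables"}})

# A top-level list is rejected, as in load_meta.
try:
    coerce_meta(["smith2020"])
    rejected = False
except ValueError:
    rejected = True
```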

Bibliography¶

civic_interconnect.paperkit.bib ¶

Bibliography handling utilities for the paper kit.

This module provides functionality for loading and processing BibTeX files:

- BibEntry: TypedDict for bibliography entries
- BibDatabaseLike: Protocol for bibliography database objects
- load_bib_keys: Function to extract citation keys from BibTeX files

File: src/civic_interconnect/paperkit/bib.py

BibDatabaseLike ¶

Bases: Protocol

Protocol for bibliography database objects.

This protocol defines the interface for bibliography database objects that contain a list of bibliography entries and support attribute access.

Attributes:

Name	Type	Description
`entries`	`list[BibEntry]`	A list of bibliography entries from the database.

Methods:

Name	Description
`__getattr__`	Provide access to additional attributes on the database object.

Source code in src/civic_interconnect/paperkit/bib.py

class BibDatabaseLike(Protocol):
    """Protocol for bibliography database objects.

    This protocol defines the interface for bibliography database objects
    that contain a list of bibliography entries and support attribute access.

    Attributes
    ----------
    entries : list[BibEntry]
        A list of bibliography entries from the database.

    Methods
    -------
    __getattr__(name: str) -> object
        Provide access to additional attributes on the database object.
    """

    entries: list[BibEntry]

    def __getattr__(self, name: str) -> object:
        """Provide access to additional attributes on the database object."""
        ...

__getattr__ ¶

__getattr__(name: str) -> object

Provide access to additional attributes on the database object.

Source code in src/civic_interconnect/paperkit/bib.py

def __getattr__(self, name: str) -> object:
    """Provide access to additional attributes on the database object."""
    ...

BibEntry ¶

Bases: TypedDict

A bibliography entry from a BibTeX file.

Attributes:

Name	Type	Description
`ID`	`str`	The citation key/identifier for the bibliography entry.

Source code in src/civic_interconnect/paperkit/bib.py

class BibEntry(TypedDict, total=False):
    """A bibliography entry from a BibTeX file.

    Attributes
    ----------
    ID : str
        The citation key/identifier for the bibliography entry.
    """

    ID: str

load_bib_keys ¶

load_bib_keys(bib_path: Path) -> list[str]

Load citation keys from a BibTeX file.

Source code in src/civic_interconnect/paperkit/bib.py

def load_bib_keys(bib_path: Path) -> list[str]:
    """Load citation keys from a BibTeX file."""
    with bib_path.open("r", encoding="utf-8") as f:
        db_raw: BibDatabaseLike = bibtexparser.load(f)  # type: ignore[assignment]

    entries: list[BibEntry] = db_raw.entries
    keys: list[str] = [e["ID"] for e in entries if "ID" in e]

    logger.debug("Loaded %d keys from %s", len(keys), bib_path)
    return keys
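
The key-extraction step can be illustrated without a BibTeX parser. This sketch uses a hand-written list of entry dictionaries in place of the parsed database; the entries themselves are invented.

```python
# Stand-in for db_raw.entries; real entries would come from the parsed .bib file.
entries = [
    {"ID": "smith2020", "title": "Census Tables"},
    {"title": "Entry without a citation key"},
    {"ID": "lee2019"},
]

# Same comprehension as load_bib_keys: entries lacking "ID" are skipped silently.
keys = [e["ID"] for e in entries if "ID" in e]
```

Order follows the entries as listed, and duplicates are not removed at this stage; `run` converts the result to a set before matching it against the metadata.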

Orchestration¶

civic_interconnect.paperkit.orchestrate ¶

Orchestration module for downloading and managing assets linked to bibliography entries.

This module provides:

- DownloadRecord and Summary dataclasses for tracking downloads
- Functions to guess filenames, run the download process, and handle asset scraping

File: src/civic_interconnect/paperkit/orchestrate.py

DownloadRecord `dataclass` ¶

Represents a record of downloaded assets for a bibliography entry.

Attributes:

Name	Type	Description
`bibkey`	`str`	The bibliography key associated with the entry.
`paths`	`list[Path]`	List of file paths to successfully downloaded assets.
`errors`	`list[str]`	List of error messages encountered during download.

Source code in src/civic_interconnect/paperkit/orchestrate.py

@dataclass
class DownloadRecord:
    """Represents a record of downloaded assets for a bibliography entry.

    Attributes
    ----------
    bibkey : str
        The bibliography key associated with the entry.
    paths : list[Path]
        List of file paths to successfully downloaded assets.
    errors : list[str]
        List of error messages encountered during download.
    """

    bibkey: str
    paths: list[Path] = field(default_factory=lambda: [])
    errors: list[str] = field(default_factory=lambda: [])

Summary `dataclass` ¶

Summary of the download process for bibliography entries.

Attributes:

Name	Type	Description
`processed`	`list[DownloadRecord]`	List of records for processed entries.
`skipped`	`list[str]`	List of keys that were skipped.

Source code in src/civic_interconnect/paperkit/orchestrate.py

@dataclass
class Summary:
    """Summary of the download process for bibliography entries.

    Attributes
    ----------
    processed : list[DownloadRecord]
        List of records for processed entries.
    skipped : list[str]
        List of keys that were skipped.
    """

    processed: list[DownloadRecord] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

guess_filename_from_url ¶

guess_filename_from_url(url: str) -> str

Guess a safe filename from a URL.

Parameters:

Name	Type	Description	Default
`url`	`str`	The URL from which to extract the filename.	required

Returns:

Type	Description
`str`	A sanitized filename derived from the URL.

Source code in src/civic_interconnect/paperkit/orchestrate.py

def guess_filename_from_url(url: str) -> str:
    """Guess a safe filename from a URL.

    Parameters
    ----------
    url : str
        The URL from which to extract the filename.

    Returns
    -------
    str
        A sanitized filename derived from the URL.
    """
    base = Path(urlparse(url).path).name or "download"
    return safe_filename(base)
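
The first half of `guess_filename_from_url` (taking the last path component, with a fallback) can be tried in isolation. The helper name `url_basename` is hypothetical, and the sanitizing pass through `safe_filename` is omitted here.

```python
from pathlib import Path
from urllib.parse import urlparse


def url_basename(url: str) -> str:
    """Last path component of the URL, or "download" when the path has none."""
    # urlparse(...).path excludes the query string and fragment.
    return Path(urlparse(url).path).name or "download"


with_query = url_basename("https://example.org/files/report.pdf?version=2")
bare_host = url_basename("https://example.org/")
trailing_slash = url_basename("https://example.org/data/")
```

Note that a URL ending in a slash yields the directory name (`data` above) rather than the fallback, so page-style URLs may produce extensionless filenames.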

run ¶

run(
    bib_path: Path,
    meta_path: Path,
    out_root: Path,
    client: Any,
) -> Summary

Orchestrate the download of assets for bibliography entries.

Parameters:

Name	Type	Description	Default
`bib_path`	`Path`	Path to the bibliography file.	required
`meta_path`	`Path`	Path to the metadata file.	required
`out_root`	`Path`	Root directory for output files.	required
`client`	`Any`	HTTP client for downloading files.	required

Returns:

Type	Description
`Summary`	Summary of processed entries and any errors encountered.

Source code in src/civic_interconnect/paperkit/orchestrate.py

def run(bib_path: Path, meta_path: Path, out_root: Path, client: Any) -> Summary:
    """Orchestrate the download of assets for bibliography entries.

    Parameters
    ----------
    bib_path : Path
        Path to the bibliography file.
    meta_path : Path
        Path to the metadata file.
    out_root : Path
        Root directory for output files.
    client : Any
        HTTP client for downloading files.

    Returns
    -------
    Summary
        Summary of processed entries and any errors encountered.
    """
    keys = set(load_bib_keys(bib_path))
    meta = load_meta(meta_path)
    common = sorted(keys.intersection(meta.keys()))
    summary = Summary()

    if not common:
        logger.warning("No overlapping keys between .bib and meta; nothing to do.")
        return summary

    for key in common:
        rec = DownloadRecord(bibkey=key)
        entry_meta = meta[key] or {}
        subdir = entry_meta.get("out_dir")
        assets = entry_meta.get("assets", [])

        for a in assets:
            try:
                # direct file
                if "url" in a:
                    out_dir = out_root / key / (subdir or ".")
                    ensure_dir(out_dir)
                    fname = a.get("filename") or guess_filename_from_url(a["url"])
                    p = out_dir / fname
                    download_file(client, a["url"], p, a.get("checksum"))
                    rec.paths.append(p)
                # page scrape
                elif "page_url" in a:
                    logger.info("[%s] Scraping page %s", key, a["page_url"])
                    allow = a.get("allow_ext") or DEFAULT_ALLOWED_EXTS
                    rx = a.get("href_regex")
                    limit = a.get("limit")
                    resp = client.get(a["page_url"])
                    links = extract_links(resp.text, a.get("base_url") or a["page_url"], allow, rx)
                    if limit is not None:
                        links = links[: int(limit)]
                    out_dir = out_root / key / (subdir or ".")
                    ensure_dir(out_dir)
                    for u in links:
                        p = out_dir / guess_filename_from_url(u)
                        download_file(client, u, p)
                        rec.paths.append(p)
                else:
                    msg = "unknown asset type"
                    rec.errors.append(msg)
                    logger.warning("[%s] %s", key, msg)
            except Exception as exc:
                rec.errors.append(str(exc))
                logger.error("[%s] %s", key, exc)
        summary.processed.append(rec)
    return summary
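
The selection and dispatch logic in `run` can be sketched without any network access. The function `plan_downloads` below is a hypothetical dry-run helper, not part of the package: it reports which keys would be processed, how each asset would be classified, and where files would be written.

```python
from pathlib import Path
from typing import Any


def plan_downloads(
    bib_keys: list[str], meta: dict[str, Any], out_root: Path
) -> list[tuple[str, str, Path]]:
    """Return (bibkey, asset kind, output directory) for each asset run would visit."""
    plan: list[tuple[str, str, Path]] = []
    # Only keys present in both the .bib file and the metadata are processed.
    for key in sorted(set(bib_keys).intersection(meta)):
        entry = meta[key] or {}
        out_dir = out_root / key / (entry.get("out_dir") or ".")
        for asset in entry.get("assets", []):
            if "url" in asset:
                kind = "direct"
            elif "page_url" in asset:
                kind = "page"
            else:
                kind = "unknown"  # run records an error for these
            plan.append((key, kind, out_dir))
    return plan


plan = plan_downloads(
    ["smith2020", "lee2019", "unused"],
    {
        "smith2020": {"assets": [{"url": "https://example.org/a.pdf"}]},
        "lee2019": {
            "out_dir": "tables",
            "assets": [{"page_url": "https://example.org/data/"}, {"note": "?"}],
        },
        "orphan": {"assets": []},
    },
    Path("data"),
)
```

Keys that appear in only one of the two inputs (`unused` and `orphan` above) are dropped by the intersection. In the `run` listing above, nothing is ever appended to `Summary.skipped`, so those keys are not reported there.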

HTTP Client¶

civic_interconnect.paperkit.http_client ¶

HTTP client wrapper for making GET requests with retries and logging.

This module provides the HttpClient dataclass for robust HTTP GET requests, including configurable timeout, retries, backoff, and user-agent.

File: src/civic_interconnect/paperkit/http_client.py

HttpClient `dataclass` ¶

HTTP client for making GET requests with retries, backoff, and custom user-agent.

Attributes:

Name	Type	Description
`session`	`Session`	The requests session used for HTTP requests.
`timeout`	`int`	Timeout for each request in seconds.
`retries`	`int`	Total number of attempts per request, including the first.
`backoff_seconds`	`int`	Base seconds to wait between retries (multiplied by attempt number).
`user_agent`	`str`	User-Agent header for requests.

Source code in src/civic_interconnect/paperkit/http_client.py

@dataclass
class HttpClient:
    """HTTP client for making GET requests with retries, backoff, and custom user-agent.

    Attributes
    ----------
    session : requests.Session
        The requests session used for HTTP requests.
    timeout : int
        Timeout for each request in seconds.
    retries : int
        Total number of attempts per request, including the first.
    backoff_seconds : int
        Base seconds to wait between retries (multiplied by attempt number).
    user_agent : str
        User-Agent header for requests.
    """

    session: requests.Session
    timeout: int = 30
    retries: int = 3
    backoff_seconds: int = 2
    user_agent: str = "ci-paper-fetcher/1.0"

    def get(self, url: str) -> requests.Response:
        """Perform an HTTP GET request with retries and linearly increasing backoff.

        Parameters
        ----------
        url : str
            The URL to send the GET request to.

        Returns
        -------
        requests.Response
            The HTTP response object.

        Raises
        ------
        Exception
            If all retry attempts fail, the last exception is raised.
        """
        last_exc: Exception | None = None
        headers = {"User-Agent": self.user_agent}
        for attempt in range(1, self.retries + 1):
            try:
                logger.debug("HTTP GET %s (attempt %s)", url, attempt)
                resp = self.session.get(url, timeout=self.timeout, headers=headers)
                resp.raise_for_status()
                return resp
            except Exception as exc:
                logger.warning("HTTP GET failed for %s on attempt %s: %s", url, attempt, exc)
                last_exc = exc
                if attempt < self.retries:
                    time.sleep(self.backoff_seconds * attempt)
        logger.error("HTTP GET giving up for %s", url)
        raise last_exc if last_exc else RuntimeError("HTTP get failed unexpectedly")

get ¶

get(url: str) -> requests.Response

Perform an HTTP GET request with retries and linearly increasing backoff.

Parameters:

Name	Type	Description	Default
`url`	`str`	The URL to send the GET request to.	required

Returns:

Type	Description
`Response`	The HTTP response object.

Raises:

Type	Description
`Exception`	If all retry attempts fail, the last exception is raised.

Source code in src/civic_interconnect/paperkit/http_client.py

def get(self, url: str) -> requests.Response:
    """Perform an HTTP GET request with retries and linearly increasing backoff.

    Parameters
    ----------
    url : str
        The URL to send the GET request to.

    Returns
    -------
    requests.Response
        The HTTP response object.

    Raises
    ------
    Exception
        If all retry attempts fail, the last exception is raised.
    """
    last_exc: Exception | None = None
    headers = {"User-Agent": self.user_agent}
    for attempt in range(1, self.retries + 1):
        try:
            logger.debug("HTTP GET %s (attempt %s)", url, attempt)
            resp = self.session.get(url, timeout=self.timeout, headers=headers)
            resp.raise_for_status()
            return resp
        except Exception as exc:
            logger.warning("HTTP GET failed for %s on attempt %s: %s", url, attempt, exc)
            last_exc = exc
            if attempt < self.retries:
                time.sleep(self.backoff_seconds * attempt)
    logger.error("HTTP GET giving up for %s", url)
    raise last_exc if last_exc else RuntimeError("HTTP get failed unexpectedly")
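
The retry loop can be exercised without `requests` by passing in the operation and the sleep function. This is a sketch of the same control flow, not the class itself; `get_with_retries` and its `fetch` and `sleep` parameters are hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable


def get_with_retries(
    fetch: Callable[[], Any],
    retries: int = 3,
    backoff_seconds: int = 2,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call fetch up to `retries` times, waiting backoff_seconds * attempt between tries."""
    last_exc: Exception | None = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:
            last_exc = exc
            if attempt < retries:  # no wait after the final attempt
                sleep(backoff_seconds * attempt)
    raise last_exc if last_exc else RuntimeError("no attempts were made")


# A fetch that fails twice, then succeeds; waits are recorded instead of slept.
waits: list[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = get_with_retries(flaky, sleep=waits.append)
```

With the defaults, the waits are 2 and 4 seconds: the delay grows by a fixed step per attempt rather than doubling.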

Download¶

civic_interconnect.paperkit.download ¶

File download utilities with checksum validation and safe filename handling.

This module provides:

- ensure_dir: Create directories recursively if they don't exist
- safe_filename: Convert strings to filesystem-safe filenames
- sha256_file: Calculate SHA256 hash of a file
- write_bytes: Write bytes to a file with directory creation
- download_file: Download files with optional checksum verification

File: src/civic_interconnect/paperkit/download.py

download_file ¶

download_file(
    client: Any,
    url: str,
    out_path: Path,
    checksum: str | None = None,
) -> Path

Download a file from a URL, save it to a path, and optionally verify its checksum.

Parameters:

Name	Type	Description	Default
`client`	`Any`	HTTP client with a .get(url) method returning a response with .content.	required
`url`	`str`	The URL to download the file from.	required
`out_path`	`Path`	The path to save the downloaded file.	required
`checksum`	`str \| None`	Optional SHA256 checksum to verify the downloaded file.	`None`

Returns:

Type	Description
`Path`	The path to the saved file.

Raises:

Type	Description
`ValueError`	If the checksum does not match.

Source code in src/civic_interconnect/paperkit/download.py

def download_file(client: Any, url: str, out_path: Path, checksum: str | None = None) -> Path:
    """Download a file from a URL, save it to a path, and optionally verify its checksum.

    Parameters
    ----------
    client : Any
        HTTP client with a .get(url) method returning a response with .content.
    url : str
        The URL to download the file from.
    out_path : Path
        The path to save the downloaded file.
    checksum : str | None, optional
        Optional SHA256 checksum to verify the downloaded file.

    Returns
    -------
    Path
        The path to the saved file.

    Raises
    ------
    ValueError
        If the checksum does not match.
    """
    logger.info("Downloading %s -> %s", url, out_path)
    resp = client.get(url)
    write_bytes(out_path, resp.content)
    if checksum:
        actual = sha256_file(out_path)
        if actual.lower() != checksum.lower():
            logger.error("Checksum mismatch for %s", out_path)
            raise ValueError(f"checksum mismatch for {out_path}")
    return out_path

ensure_dir ¶

ensure_dir(p: Path) -> None

Create the directory at the given path, including any necessary parent directories.

Parameters:

Name	Type	Description	Default
`p`	`Path`	The directory path to create.	required

Source code in src/civic_interconnect/paperkit/download.py

def ensure_dir(p: Path) -> None:
    """Create the directory at the given path, including any necessary parent directories.

    Parameters
    ----------
    p : Path
        The directory path to create.
    """
    p.mkdir(parents=True, exist_ok=True)

safe_filename ¶

safe_filename(name: str) -> str

Convert a string to a filesystem-safe filename.

Parameters:

Name	Type	Description	Default
`name`	`str`	The original filename or string.	required

Returns:

Type	Description
`str`	A sanitized, filesystem-safe filename.

Source code in src/civic_interconnect/paperkit/download.py

def safe_filename(name: str) -> str:
    """Convert a string to a filesystem-safe filename.

    Parameters
    ----------
    name : str
        The original filename or string.

    Returns
    -------
    str
        A sanitized, filesystem-safe filename.
    """
    name = unescape(name).strip()
    name = re.sub(r"[\\/:*?\"<>|\s]+", "_", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    name = re.sub(r"_+", "_", name).strip("_")
    return name or "file"
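
A few sample inputs show what each step of `safe_filename` does. The function body is copied from the listing above so the example runs on its own; the input strings are invented.

```python
import re
from html import unescape


def safe_filename(name: str) -> str:
    name = unescape(name).strip()                      # decode HTML entities
    name = re.sub(r"[\\/:*?\"<>|\s]+", "_", name)      # reserved chars and whitespace
    name = name.encode("ascii", "ignore").decode("ascii")  # drop non-ASCII characters
    name = re.sub(r"_+", "_", name).strip("_")         # collapse and trim underscores
    return name or "file"


reserved = safe_filename("My Report: final?.pdf")
entity = safe_filename("a&amp;b.pdf")
accented = safe_filename("résumé.pdf")
blank = safe_filename("   ")
```

Non-ASCII characters are removed rather than transliterated, so `résumé.pdf` becomes `rsum.pdf`, and distinct names can collide after sanitizing.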

sha256_file ¶

sha256_file(path: Path) -> str

Calculate the SHA256 hash of a file.

Parameters:

Name	Type	Description	Default
`path`	`Path`	The path to the file to hash.	required

Returns:

Type	Description
`str`	The SHA256 hexadecimal digest of the file.

Source code in src/civic_interconnect/paperkit/download.py

def sha256_file(path: Path) -> str:
    """Calculate the SHA256 hash of a file.

    Parameters
    ----------
    path : Path
        The path to the file to hash.

    Returns
    -------
    str
        The SHA256 hexadecimal digest of the file.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

write_bytes ¶

write_bytes(path: Path, content: bytes) -> None

Write bytes to a file, creating parent directories if necessary.

Parameters:

Name	Type	Description	Default
`path`	`Path`	The file path to write to.	required
`content`	`bytes`	The bytes content to write.	required

Returns:

Type	Description
`None`

Source code in src/civic_interconnect/paperkit/download.py

def write_bytes(path: Path, content: bytes) -> None:
    """Write bytes to a file, creating parent directories if necessary.

    Parameters
    ----------
    path : Path
        The file path to write to.
    content : bytes
        The bytes content to write.

    Returns
    -------
    None
    """
    ensure_dir(path.parent)
    with path.open("wb") as f:
        f.write(content)
    logger.info("Saved %s", path)

Web Scraping¶

civic_interconnect.paperkit.scrape ¶

Functions for extracting and filtering links from HTML documents.

This module provides utilities to parse HTML, extract anchor links, filter them by extension and regular expression, and log the results.

File: src/civic_interconnect/paperkit/scrape.py

extract_links ¶

extract_links(
    html: str,
    base_url: str,
    allow_ext: list[str],
    href_regex: str | None,
) -> list[str]

Extract and filter anchor links from an HTML document.

Parameters:

Name	Type	Description	Default
`html`	`str`	The HTML content to parse.	required
`base_url`	`str`	The base URL to resolve relative links.	required
`allow_ext`	`list[str]`	List of allowed file extensions (e.g., ['.pdf', '.html']).	required
`href_regex`	`str \| None`	Optional regular expression to further filter hrefs.	required

Returns:

Type	Description
`list[str]`	List of filtered, absolute URLs extracted from the HTML.

Source code in src/civic_interconnect/paperkit/scrape.py

def extract_links(
    html: str, base_url: str, allow_ext: list[str], href_regex: str | None
) -> list[str]:
    """Extract and filter anchor links from an HTML document.

    Parameters
    ----------
    html : str
        The HTML content to parse.
    base_url : str
        The base URL to resolve relative links.
    allow_ext : list[str]
        List of allowed file extensions (e.g., ['.pdf', '.html']).
    href_regex : str | None
        Optional regular expression to further filter hrefs.

    Returns
    -------
    list[str]
        List of filtered, absolute URLs extracted from the HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    rx = re.compile(href_regex, re.I) if href_regex else None
    out: list[str] = []
    seen: set[str] = set()

    allow = [e.lower() for e in allow_ext] if allow_ext else []

    for a in soup.find_all("a", href=True):
        href = str(a["href"]).strip()
        full = urljoin(base_url, href)
        ext = Path(urlparse(full).path).suffix.lower()
        if allow and ext not in allow:
            continue
        if rx and not rx.search(href):
            continue
        if full not in seen:
            seen.add(full)
            out.append(full)
    logger.debug("Extracted %d links from %s", len(out), base_url)
    return out

CLI¶

civic_interconnect.paperkit.cli ¶

Command-line interface for the paperkit tool.

This module provides the main CLI entry point for fetching public data for bibliography references, including argument parsing and orchestration of the fetch process.

File: src/civic_interconnect/paperkit/cli.py

main ¶

main() -> int

Run the paperkit CLI.

Source code in src/civic_interconnect/paperkit/cli.py

def main() -> int:
    """Run the paperkit CLI."""
    ap = argparse.ArgumentParser(description="Fetch public data for .bib references")
    ap.add_argument("--bib", type=Path, default=Path("paper/refs.bib"))
    ap.add_argument("--meta", type=Path, default=Path("paper/refs_meta.yaml"))
    ap.add_argument("--out", type=Path, default=DEFAULT_OUTPUT_ROOT)
    ap.add_argument("--log-level", type=str, default="INFO", help="DEBUG, INFO, WARNING, ERROR")
    args = ap.parse_args()

    configure(args.log_level)
    logger.info("Starting paperkit fetch with bib=%s meta=%s out=%s", args.bib, args.meta, args.out)

    client: HttpClient = HttpClient(session=requests.Session())
    summary = run(args.bib, args.meta, args.out, client)

    for rec in summary.processed:
        for p in rec.paths:
            logger.info("[%s] saved %s", rec.bibkey, p)
        for e in rec.errors:
            logger.error("[%s] ERROR %s", rec.bibkey, e)
    return 0
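
The argument defaults can be checked by building the same parser and parsing an explicit argument list. In this sketch the `--out` default is a placeholder (`Path("data")`), because the value of `DEFAULT_OUTPUT_ROOT` is not shown on this page.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    ap = argparse.ArgumentParser(description="Fetch public data for .bib references")
    ap.add_argument("--bib", type=Path, default=Path("paper/refs.bib"))
    ap.add_argument("--meta", type=Path, default=Path("paper/refs_meta.yaml"))
    # Placeholder default: the real CLI uses DEFAULT_OUTPUT_ROOT.
    ap.add_argument("--out", type=Path, default=Path("data"))
    ap.add_argument("--log-level", type=str, default="INFO")
    return ap


# Passing a list to parse_args avoids reading sys.argv.
args = build_parser().parse_args(["--bib", "refs.bib", "--log-level", "debug"])
```

Note that `main` returns 0 even when individual downloads record errors, so scripts that need to detect failures would have to inspect the log output or call `run` directly and check each record's `errors` list.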

Logging¶

civic_interconnect.paperkit.log ¶

Logging utilities for the civic_interconnect.paperkit module.

Provides a library-wide logger and optional configuration for console output.

File: src/civic_interconnect/paperkit/log.py

configure ¶

configure(level: str = 'INFO') -> None

Configure basic console output for logging.

Only used by the CLI or by applications that explicitly opt in.

Source code in src/civic_interconnect/paperkit/log.py

def configure(level: str = "INFO") -> None:
    """Configure basic console output for logging.

    Only used by the CLI or by applications that explicitly opt in.
    """
    level = level.upper()
    # If root already has handlers, do not reconfigure.
    if logging.getLogger().handlers:
        logging.getLogger().setLevel(getattr(logging, level, logging.INFO))
        return

    fmt = "%(asctime)s | %(levelname)-7s | %(name)s | %(message)s"
    logging.basicConfig(level=getattr(logging, level, logging.INFO), format=fmt)
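
The level lookup in `configure` falls back to `INFO` when the name is not recognized. A small sketch of that lookup, with the hypothetical helper name `resolve_level`:

```python
import logging


def resolve_level(level: str) -> int:
    """Map a level name to its numeric value, defaulting to INFO."""
    return getattr(logging, level.upper(), logging.INFO)


debug_level = resolve_level("debug")      # case-insensitive
fallback_level = resolve_level("verbose")  # unknown name falls back silently
```

An unrecognized name does not raise, so a typo in `--log-level` results in `INFO` output rather than an error.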
