
API Reference

Core Modules

Configuration

civic_interconnect.paperkit.config

Configuration module for paper kit metadata handling.

This module provides:

- TypedDict definitions for asset and metadata configuration
- Functions to load and normalize metadata from YAML files
- Default file extension configurations for allowed assets

File: src/civic_interconnect/paperkit/config.py

DirectAssetTD

Bases: TypedDict

TypedDict for direct asset configuration.

Attributes:

- url (str): The URL of the asset.
- filename (NotRequired[str]): Optional filename for the asset.
- checksum (NotRequired[str]): Optional checksum for the asset.

Source code in src/civic_interconnect/paperkit/config.py
class DirectAssetTD(TypedDict, total=False):
    """TypedDict for direct asset configuration.

    Attributes
    ----------
    url : str
        The URL of the asset.
    filename : NotRequired[str]
        Optional filename for the asset.
    checksum : NotRequired[str]
        Optional checksum for the asset.
    """

    url: str
    filename: NotRequired[str]
    checksum: NotRequired[str]

EntryMetaTD

Bases: TypedDict

TypedDict for entry metadata configuration.

Attributes:

- notes (NotRequired[str]): Optional notes about the entry.
- out_dir (NotRequired[str]): Optional output directory for the entry.
- assets (NotRequired[list[AssetTD]]): Optional list of assets associated with the entry.

Source code in src/civic_interconnect/paperkit/config.py
class EntryMetaTD(TypedDict, total=False):
    """TypedDict for entry metadata configuration.

    Attributes
    ----------
    notes : NotRequired[str]
        Optional notes about the entry.
    out_dir : NotRequired[str]
        Optional output directory for the entry.
    assets : NotRequired[list[AssetTD]]
        Optional list of assets associated with the entry.
    """

    notes: NotRequired[str]
    out_dir: NotRequired[str]
    assets: NotRequired[list[AssetTD]]

PageAssetTD

Bases: TypedDict

TypedDict for page-based asset configuration.

Attributes:

- page_url (str): The URL of the page to scrape for assets.
- allow_ext (NotRequired[list[str]]): Optional list of allowed file extensions.
- href_regex (NotRequired[str]): Optional regex pattern to match href attributes.
- limit (NotRequired[int]): Optional limit on the number of assets to collect.
- base_url (NotRequired[str]): Optional base URL for resolving relative links.

Source code in src/civic_interconnect/paperkit/config.py
class PageAssetTD(TypedDict, total=False):
    """TypedDict for page-based asset configuration.

    Attributes
    ----------
    page_url : str
        The URL of the page to scrape for assets.
    allow_ext : NotRequired[list[str]]
        Optional list of allowed file extensions.
    href_regex : NotRequired[str]
        Optional regex pattern to match href attributes.
    limit : NotRequired[int]
        Optional limit on number of assets to collect.
    base_url : NotRequired[str]
        Optional base URL for relative links.
    """

    page_url: str
    allow_ext: NotRequired[list[str]]
    href_regex: NotRequired[str]
    limit: NotRequired[int]
    base_url: NotRequired[str]

load_meta

load_meta(meta_path: Path) -> MetaTD

Load metadata from a YAML file.

Parameters:

- meta_path (Path, required): Path to the YAML metadata file.

Returns:

- MetaTD: Dictionary containing the loaded and normalized metadata entries.

Raises:

- ValueError: If the YAML file does not contain a mapping of bibkeys.

Source code in src/civic_interconnect/paperkit/config.py
def load_meta(meta_path: Path) -> MetaTD:
    """Load metadata from a YAML file.

    Parameters
    ----------
    meta_path : Path
        Path to the YAML metadata file.

    Returns
    -------
    MetaTD
        Dictionary containing the loaded and normalized metadata entries.

    Raises
    ------
    ValueError
        If the YAML file does not contain a mapping of bibkeys.
    """
    raw_text = meta_path.read_text(encoding="utf-8")
    loaded: Any = yaml.safe_load(raw_text)
    if loaded is None:
        data: MetaTD = {}
    elif isinstance(loaded, dict):
        data = cast("MetaTD", loaded)
    else:
        raise ValueError("refs_meta.yaml must be a mapping of bibkeys")

    # Normalize each entry
    for key, entry in list(data.items()):
        data[key] = _normalize_entry(entry)

    logger.info("Loaded meta for %d keys from %s", len(data), meta_path)
    return data
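
For orientation, a hypothetical refs_meta.yaml sketch mixing a direct asset with a page-scrape asset. The bibkey, URLs, and checksum below are purely illustrative:

```yaml
smith2020:
  notes: "Primary dataset for Section 3"
  out_dir: "data"
  assets:
    - url: "https://example.org/files/report.pdf"
      filename: "smith2020_report.pdf"
      checksum: "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    - page_url: "https://example.org/downloads/"
      allow_ext: [".pdf", ".csv"]
      limit: 5
```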

Bibliography

civic_interconnect.paperkit.bib

Bibliography handling utilities for the paper kit.

This module provides functionality for loading and processing BibTeX files:

- BibEntry: TypedDict for bibliography entries
- BibDatabaseLike: Protocol for bibliography database objects
- load_bib_keys: Function to extract citation keys from BibTeX files

File: src/civic_interconnect/paperkit/bib.py

BibDatabaseLike

Bases: Protocol

Protocol for bibliography database objects.

This protocol defines the interface for bibliography database objects that contain a list of bibliography entries and support attribute access.

Attributes:

- entries (list[BibEntry]): A list of bibliography entries from the database.

Methods:

- __getattr__(name: str) -> object: Provide access to additional attributes on the database object.

Source code in src/civic_interconnect/paperkit/bib.py
class BibDatabaseLike(Protocol):
    """Protocol for bibliography database objects.

    This protocol defines the interface for bibliography database objects
    that contain a list of bibliography entries and support attribute access.

    Attributes
    ----------
    entries : List[BibEntry]
        A list of bibliography entries from the database.

    Methods
    -------
    __getattr__(name: str) -> object
        Provide access to additional attributes on the database object.
    """

    entries: list[BibEntry]

    def __getattr__(self, name: str) -> object:
        """Provide access to additional attributes on the database object."""
        ...

__getattr__

__getattr__(name: str) -> object

Provide access to additional attributes on the database object.

Source code in src/civic_interconnect/paperkit/bib.py
def __getattr__(self, name: str) -> object:
    """Provide access to additional attributes on the database object."""
    ...

BibEntry

Bases: TypedDict

A bibliography entry from a BibTeX file.

Attributes:

- ID (str): The citation key/identifier for the bibliography entry.

Source code in src/civic_interconnect/paperkit/bib.py
class BibEntry(TypedDict, total=False):
    """A bibliography entry from a BibTeX file.

    Attributes
    ----------
    ID : str
        The citation key/identifier for the bibliography entry.
    """

    ID: str

load_bib_keys

load_bib_keys(bib_path: Path) -> list[str]

Load citation keys from a BibTeX file.

Source code in src/civic_interconnect/paperkit/bib.py
def load_bib_keys(bib_path: Path) -> list[str]:
    """Load citation keys from a BibTeX file."""
    with bib_path.open("r", encoding="utf-8") as f:
        db_raw: BibDatabaseLike = bibtexparser.load(f)  # type: ignore[assignment]

    entries: list[BibEntry] = db_raw.entries
    keys: list[str] = [e["ID"] for e in entries if "ID" in e]

    logger.debug("Loaded %d keys from %s", len(keys), bib_path)
    return keys
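
The real function delegates parsing to bibtexparser. A hedged stdlib-only sketch of the same idea, approximating key extraction with a regex (the sample entries are illustrative):

```python
import re

# Hedged sketch: pull citation keys out of BibTeX text with a regex,
# approximating what load_bib_keys obtains via bibtexparser.
def extract_bib_keys(bibtex: str) -> list[str]:
    # An entry opens as "@type{key," so capture up to the first comma.
    return re.findall(r"@\w+\{([^,\s]+),", bibtex)

sample = "@article{smith2020,\n  title={X}\n}\n@book{doe2021,\n  title={Y}\n}"
print(extract_bib_keys(sample))  # ['smith2020', 'doe2021']
```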

Orchestration

civic_interconnect.paperkit.orchestrate

Orchestration module for downloading and managing assets linked to bibliography entries.

This module provides:

- DownloadRecord and Summary dataclasses for tracking downloads
- Functions to guess filenames, run the download process, and handle asset scraping

File: src/civic_interconnect/paperkit/orchestrate.py

DownloadRecord dataclass

Represents a record of downloaded assets for a bibliography entry.

Attributes:

- bibkey (str): The bibliography key associated with the entry.
- paths (list[Path]): List of file paths to successfully downloaded assets.
- errors (list[str]): List of error messages encountered during download.

Source code in src/civic_interconnect/paperkit/orchestrate.py
@dataclass
class DownloadRecord:
    """Represents a record of downloaded assets for a bibliography entry.

    Attributes
    ----------
    bibkey : str
        The bibliography key associated with the entry.
    paths : list[Path]
        List of file paths to successfully downloaded assets.
    errors : list[str]
        List of error messages encountered during download.
    """

    bibkey: str
    paths: list[Path] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

Summary dataclass

Summary of the download process for bibliography entries.

Attributes:

- processed (list[DownloadRecord]): List of records for processed entries.
- skipped (list[str]): List of keys that were skipped.

Source code in src/civic_interconnect/paperkit/orchestrate.py
@dataclass
class Summary:
    """Summary of the download process for bibliography entries.

    Attributes
    ----------
    processed : list[DownloadRecord]
        List of records for processed entries.
    skipped : list[str]
        List of keys that were skipped.
    """

    processed: list[DownloadRecord] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

guess_filename_from_url

guess_filename_from_url(url: str) -> str

Guess a safe filename from a URL.

Parameters:

- url (str, required): The URL from which to extract the filename.

Returns:

- str: A sanitized filename derived from the URL.

Source code in src/civic_interconnect/paperkit/orchestrate.py
def guess_filename_from_url(url: str) -> str:
    """Guess a safe filename from a URL.

    Parameters
    ----------
    url : str
        The URL from which to extract the filename.

    Returns
    -------
    str
        A sanitized filename derived from the URL.
    """
    base = Path(urlparse(url).path).name or "download"
    return safe_filename(base)
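
A self-contained sketch of the same behavior. The safe_filename here is simplified (it skips the HTML-unescape and ASCII-normalization steps of the full implementation), and the URLs are illustrative:

```python
import re
from pathlib import Path
from urllib.parse import urlparse

def safe_filename(name: str) -> str:
    # Simplified sanitizer: collapse path separators, wildcards,
    # and whitespace into underscores.
    name = re.sub(r"[\\/:*?\"<>|\s]+", "_", name.strip())
    name = re.sub(r"_+", "_", name).strip("_")
    return name or "file"

def guess_filename_from_url(url: str) -> str:
    # Take the last path segment; fall back to "download" for bare URLs.
    base = Path(urlparse(url).path).name or "download"
    return safe_filename(base)

print(guess_filename_from_url("https://example.org/data/Annual Report.pdf"))  # Annual_Report.pdf
print(guess_filename_from_url("https://example.org/"))  # download
```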

run

run(
    bib_path: Path,
    meta_path: Path,
    out_root: Path,
    client: Any,
) -> Summary

Orchestrate the download of assets for bibliography entries.

Parameters:

- bib_path (Path, required): Path to the bibliography file.
- meta_path (Path, required): Path to the metadata file.
- out_root (Path, required): Root directory for output files.
- client (Any, required): HTTP client for downloading files.

Returns:

- Summary: Summary of processed entries and any errors encountered.

Source code in src/civic_interconnect/paperkit/orchestrate.py
def run(bib_path: Path, meta_path: Path, out_root: Path, client: Any) -> Summary:
    """Orchestrate the download of assets for bibliography entries.

    Parameters
    ----------
    bib_path : Path
        Path to the bibliography file.
    meta_path : Path
        Path to the metadata file.
    out_root : Path
        Root directory for output files.
    client : Any
        HTTP client for downloading files.

    Returns
    -------
    Summary
        Summary of processed entries and any errors encountered.
    """
    keys = set(load_bib_keys(bib_path))
    meta = load_meta(meta_path)
    common = sorted(keys.intersection(meta.keys()))
    summary = Summary()

    if not common:
        logger.warning("No overlapping keys between .bib and meta; nothing to do.")
        return summary

    for key in common:
        rec = DownloadRecord(bibkey=key)
        entry_meta = meta[key] or {}
        subdir = entry_meta.get("out_dir")
        assets = entry_meta.get("assets", [])

        for a in assets:
            try:
                # direct file
                if "url" in a:
                    out_dir = out_root / key / (subdir or ".")
                    ensure_dir(out_dir)
                    fname = a.get("filename") or guess_filename_from_url(a["url"])
                    p = out_dir / fname
                    download_file(client, a["url"], p, a.get("checksum"))
                    rec.paths.append(p)
                # page scrape
                elif "page_url" in a:
                    logger.info("[%s] Scraping page %s", key, a["page_url"])
                    allow = a.get("allow_ext") or DEFAULT_ALLOWED_EXTS
                    rx = a.get("href_regex")
                    limit = a.get("limit")
                    resp = client.get(a["page_url"])
                    links = extract_links(resp.text, a.get("base_url") or a["page_url"], allow, rx)
                    if limit is not None:
                        links = links[: int(limit)]
                    out_dir = out_root / key / (subdir or ".")
                    ensure_dir(out_dir)
                    for u in links:
                        p = out_dir / guess_filename_from_url(u)
                        download_file(client, u, p)
                        rec.paths.append(p)
                else:
                    msg = "unknown asset type"
                    rec.errors.append(msg)
                    logger.warning("[%s] %s", key, msg)
            except Exception as exc:
                rec.errors.append(str(exc))
                logger.error("[%s] %s", key, exc)
        summary.processed.append(rec)
    return summary
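
The driver only processes keys present in both the .bib file and the metadata file, which the set intersection above computes. A tiny demonstration with illustrative keys:

```python
# Keys must appear in BOTH sources to be processed; everything else
# is effectively skipped. Keys here are illustrative.
bib_keys = {"smith2020", "doe2021", "lee2019"}
meta_keys = {"smith2020", "doe2021", "extra2022"}
common = sorted(bib_keys & meta_keys)
print(common)  # ['doe2021', 'smith2020']
```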

HTTP Client

civic_interconnect.paperkit.http_client

HTTP client wrapper for making GET requests with retries and logging.

This module provides the HttpClient dataclass for robust HTTP GET requests, including configurable timeout, retries, backoff, and user-agent.

File: src/civic_interconnect/paperkit/http_client.py

HttpClient dataclass

HTTP client for making GET requests with retries, backoff, and custom user-agent.

Attributes:

- session (requests.Session): The requests session used for HTTP requests.
- timeout (int): Timeout for each request in seconds.
- retries (int): Number of retry attempts for failed requests.
- backoff_seconds (int): Base seconds to wait between retries (multiplied by attempt number).
- user_agent (str): User-Agent header for requests.

Source code in src/civic_interconnect/paperkit/http_client.py
@dataclass
class HttpClient:
    """HTTP client for making GET requests with retries, backoff, and custom user-agent.

    Attributes
    ----------
    session : requests.Session
        The requests session used for HTTP requests.
    timeout : int
        Timeout for each request in seconds.
    retries : int
        Number of retry attempts for failed requests.
    backoff_seconds : int
        Base seconds to wait between retries (multiplied by attempt number).
    user_agent : str
        User-Agent header for requests.
    """

    session: requests.Session
    timeout: int = 30
    retries: int = 3
    backoff_seconds: int = 2
    user_agent: str = "ci-paper-fetcher/1.0"

    def get(self, url: str) -> requests.Response:
        """Perform an HTTP GET request with retries and exponential backoff.

        Parameters
        ----------
        url : str
            The URL to send the GET request to.

        Returns
        -------
        requests.Response
            The HTTP response object.

        Raises
        ------
        Exception
            If all retry attempts fail, the last exception is raised.
        """
        last_exc: Exception | None = None
        headers = {"User-Agent": self.user_agent}
        for attempt in range(1, self.retries + 1):
            try:
                logger.debug("HTTP GET %s (attempt %s)", url, attempt)
                resp = self.session.get(url, timeout=self.timeout, headers=headers)
                resp.raise_for_status()
                return resp
            except Exception as exc:
                logger.warning("HTTP GET failed for %s on attempt %s: %s", url, attempt, exc)
                last_exc = exc
                if attempt < self.retries:
                    time.sleep(self.backoff_seconds * attempt)
        logger.error("HTTP GET giving up for %s", url)
        raise last_exc if last_exc else RuntimeError("HTTP get failed unexpectedly")
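
The wait between attempts is backoff_seconds multiplied by the attempt number, and no sleep follows the final attempt. With the defaults above (retries=3, backoff_seconds=2) the schedule works out as:

```python
# Sleep schedule of HttpClient.get with the default settings:
# the wait grows linearly with the attempt number, and the loop
# does not sleep after the last attempt.
retries, backoff_seconds = 3, 2
waits = [backoff_seconds * attempt for attempt in range(1, retries)]
print(waits)  # [2, 4]
```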

get

get(url: str) -> requests.Response

Perform an HTTP GET request with retries and linear backoff.

Parameters:

- url (str, required): The URL to send the GET request to.

Returns:

- requests.Response: The HTTP response object.

Raises:

- Exception: If all retry attempts fail, the last exception is raised.

Source code in src/civic_interconnect/paperkit/http_client.py
def get(self, url: str) -> requests.Response:
    """Perform an HTTP GET request with retries and exponential backoff.

    Parameters
    ----------
    url : str
        The URL to send the GET request to.

    Returns
    -------
    requests.Response
        The HTTP response object.

    Raises
    ------
    Exception
        If all retry attempts fail, the last exception is raised.
    """
    last_exc: Exception | None = None
    headers = {"User-Agent": self.user_agent}
    for attempt in range(1, self.retries + 1):
        try:
            logger.debug("HTTP GET %s (attempt %s)", url, attempt)
            resp = self.session.get(url, timeout=self.timeout, headers=headers)
            resp.raise_for_status()
            return resp
        except Exception as exc:
            logger.warning("HTTP GET failed for %s on attempt %s: %s", url, attempt, exc)
            last_exc = exc
            if attempt < self.retries:
                time.sleep(self.backoff_seconds * attempt)
    logger.error("HTTP GET giving up for %s", url)
    raise last_exc if last_exc else RuntimeError("HTTP get failed unexpectedly")

Download

civic_interconnect.paperkit.download

File download utilities with checksum validation and safe filename handling.

This module provides:

- ensure_dir: Create directories recursively if they don't exist
- safe_filename: Convert strings to filesystem-safe filenames
- sha256_file: Calculate SHA256 hash of a file
- write_bytes: Write bytes to a file with directory creation
- download_file: Download files with optional checksum verification

File: src/civic_interconnect/paperkit/download.py

download_file

download_file(
    client: Any,
    url: str,
    out_path: Path,
    checksum: str | None = None,
) -> Path

Download a file from a URL, save it to a path, and optionally verify its checksum.

Parameters:

- client (Any, required): HTTP client with a .get(url) method returning a response with .content.
- url (str, required): The URL to download the file from.
- out_path (Path, required): The path to save the downloaded file.
- checksum (str | None, default None): Optional SHA256 checksum to verify the downloaded file.

Returns:

- Path: The path to the saved file.

Raises:

- ValueError: If the checksum does not match.

Source code in src/civic_interconnect/paperkit/download.py
def download_file(client: Any, url: str, out_path: Path, checksum: str | None = None) -> Path:
    """Download a file from a URL, save it to a path, and optionally verify its checksum.

    Parameters
    ----------
    client : Any
        HTTP client with a .get(url) method returning a response with .content.
    url : str
        The URL to download the file from.
    out_path : Path
        The path to save the downloaded file.
    checksum : str | None, optional
        Optional SHA256 checksum to verify the downloaded file.

    Returns
    -------
    Path
        The path to the saved file.

    Raises
    ------
    ValueError
        If the checksum does not match.
    """
    logger.info("Downloading %s -> %s", url, out_path)
    resp = client.get(url)
    write_bytes(out_path, resp.content)
    if checksum:
        actual = sha256_file(out_path)
        if actual.lower() != checksum.lower():
            logger.error("Checksum mismatch for %s", out_path)
            raise ValueError(f"checksum mismatch for {out_path}")
    return out_path

ensure_dir

ensure_dir(p: Path) -> None

Create the directory at the given path, including any necessary parent directories.

Parameters:

- p (Path, required): The directory path to create.
Source code in src/civic_interconnect/paperkit/download.py
def ensure_dir(p: Path) -> None:
    """Create the directory at the given path, including any necessary parent directories.

    Parameters
    ----------
    p : Path
        The directory path to create.
    """
    p.mkdir(parents=True, exist_ok=True)

safe_filename

safe_filename(name: str) -> str

Convert a string to a filesystem-safe filename.

Parameters:

- name (str, required): The original filename or string.

Returns:

- str: A sanitized, filesystem-safe filename.

Source code in src/civic_interconnect/paperkit/download.py
def safe_filename(name: str) -> str:
    """Convert a string to a filesystem-safe filename.

    Parameters
    ----------
    name : str
        The original filename or string.

    Returns
    -------
    str
        A sanitized, filesystem-safe filename.
    """
    name = unescape(name).strip()
    name = re.sub(r"[\\/:*?\"<>|\s]+", "_", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    name = re.sub(r"_+", "_", name).strip("_")
    return name or "file"

sha256_file

sha256_file(path: Path) -> str

Calculate the SHA256 hash of a file.

Parameters:

- path (Path, required): The path to the file to hash.

Returns:

- str: The SHA256 hexadecimal digest of the file.

Source code in src/civic_interconnect/paperkit/download.py
def sha256_file(path: Path) -> str:
    """Calculate the SHA256 hash of a file.

    Parameters
    ----------
    path : Path
        The path to the file to hash.

    Returns
    -------
    str
        The SHA256 hexadecimal digest of the file.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()
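
The 1 MiB chunking means large files never need to fit in memory, and the chunked digest matches a one-shot hash of the same bytes. A self-contained check against a temporary file:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path) -> str:
    # Same chunked approach as above: hash 1 MiB at a time so large
    # files never have to be loaded into memory at once.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "sample.bin"
    p.write_bytes(b"x" * (3 * 1024 * 1024))  # spans multiple chunks
    digest = sha256_file(p)
    assert digest == hashlib.sha256(b"x" * (3 * 1024 * 1024)).hexdigest()
    print("chunked digest matches one-shot digest")
```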

write_bytes

write_bytes(path: Path, content: bytes) -> None

Write bytes to a file, creating parent directories if necessary.

Parameters:

- path (Path, required): The file path to write to.
- content (bytes, required): The bytes content to write.

Returns:

- None
Source code in src/civic_interconnect/paperkit/download.py
def write_bytes(path: Path, content: bytes) -> None:
    """Write bytes to a file, creating parent directories if necessary.

    Parameters
    ----------
    path : Path
        The file path to write to.
    content : bytes
        The bytes content to write.

    Returns
    -------
    None
    """
    ensure_dir(path.parent)
    with path.open("wb") as f:
        f.write(content)
    logger.info("Saved %s", path)

Web Scraping

civic_interconnect.paperkit.scrape

Functions for extracting and filtering links from HTML documents.

This module provides utilities to parse HTML, extract anchor links, filter them by extension and regular expression, and log the results.

File: src/civic_interconnect/paperkit/scrape.py

extract_links

extract_links(
    html: str,
    base_url: str,
    allow_ext: list[str],
    href_regex: str | None,
) -> list[str]

Extract and filter anchor links from an HTML document.

Parameters:

- html (str, required): The HTML content to parse.
- base_url (str, required): The base URL to resolve relative links.
- allow_ext (list[str], required): List of allowed file extensions (e.g., ['.pdf', '.html']).
- href_regex (str | None, required): Optional regular expression to further filter hrefs.

Returns:

- list[str]: List of filtered, absolute URLs extracted from the HTML.

Source code in src/civic_interconnect/paperkit/scrape.py
def extract_links(
    html: str, base_url: str, allow_ext: list[str], href_regex: str | None
) -> list[str]:
    """Extract and filter anchor links from an HTML document.

    Parameters
    ----------
    html : str
        The HTML content to parse.
    base_url : str
        The base URL to resolve relative links.
    allow_ext : list[str]
        List of allowed file extensions (e.g., ['.pdf', '.html']).
    href_regex : str | None
        Optional regular expression to further filter hrefs.

    Returns
    -------
    list[str]
        List of filtered, absolute URLs extracted from the HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    rx = re.compile(href_regex, re.I) if href_regex else None
    out: list[str] = []
    seen: set[str] = set()

    allow = [e.lower() for e in allow_ext] if allow_ext else []

    for a in soup.find_all("a", href=True):
        href = str(a["href"]).strip()
        full = urljoin(base_url, href)
        ext = Path(urlparse(full).path).suffix.lower()
        if allow and ext not in allow:
            continue
        if rx and not rx.search(href):
            continue
        if full not in seen:
            seen.add(full)
            out.append(full)
    logger.debug("Extracted %d links from %s", len(out), base_url)
    return out
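
The same filtering logic (resolve against the base URL, filter by extension, deduplicate while preserving order) can be sketched with only the standard library's html.parser, in place of the BeautifulSoup dependency the real module uses. The regex filter is omitted for brevity and the sample HTML is illustrative:

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    # Collects href values from <a> tags as the parser streams the HTML.
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())

def extract_links(html: str, base_url: str, allow_ext: list[str]) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    out: list[str] = []
    seen: set[str] = set()
    allow = [e.lower() for e in allow_ext]
    for href in parser.hrefs:
        full = urljoin(base_url, href)           # resolve relative links
        ext = Path(urlparse(full).path).suffix.lower()
        if allow and ext not in allow:           # extension filter
            continue
        if full not in seen:                     # order-preserving dedupe
            seen.add(full)
            out.append(full)
    return out

html = '<a href="/a.pdf">A</a><a href="/b.html">B</a><a href="/a.pdf">dup</a>'
links = extract_links(html, "https://example.org/docs/", [".pdf"])
print(links)  # ['https://example.org/a.pdf']
```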

CLI

civic_interconnect.paperkit.cli

Command-line interface for the paperkit tool.

This module provides the main CLI entry point for fetching public data for bibliography references, including argument parsing and orchestration of the fetch process.

File: src/civic_interconnect/paperkit/cli.py

main

main() -> int

Run the paperkit CLI.

Source code in src/civic_interconnect/paperkit/cli.py
def main() -> int:
    """Run the paperkit CLI."""
    ap = argparse.ArgumentParser(description="Fetch public data for .bib references")
    ap.add_argument("--bib", type=Path, default=Path("paper/refs.bib"))
    ap.add_argument("--meta", type=Path, default=Path("paper/refs_meta.yaml"))
    ap.add_argument("--out", type=Path, default=DEFAULT_OUTPUT_ROOT)
    ap.add_argument("--log-level", type=str, default="INFO", help="DEBUG, INFO, WARNING, ERROR")
    args = ap.parse_args()

    configure(args.log_level)
    logger.info("Starting paperkit fetch with bib=%s meta=%s out=%s", args.bib, args.meta, args.out)

    client: HttpClient = HttpClient(session=requests.Session())
    summary = run(args.bib, args.meta, args.out, client)

    for rec in summary.processed:
        for p in rec.paths:
            logger.info("[%s] saved %s", rec.bibkey, p)
        for e in rec.errors:
            logger.error("[%s] ERROR %s", rec.bibkey, e)
    return 0

Logging

civic_interconnect.paperkit.log

Logging utilities for the civic_interconnect.paperkit module.

Provides a library-wide logger and optional configuration for console output.

File: src/civic_interconnect/paperkit/log.py

configure

configure(level: str = 'INFO') -> None

Configure basic console output for logging.

Only used by the CLI or by applications that explicitly opt in.

Source code in src/civic_interconnect/paperkit/log.py
def configure(level: str = "INFO") -> None:
    """Configure basic console output for logging.

    Only used by the CLI or by applications that explicitly opt in.
    """
    level = level.upper()
    # If root already has handlers, do not reconfigure.
    if logging.getLogger().handlers:
        logging.getLogger().setLevel(getattr(logging, level, logging.INFO))
        return

    fmt = "%(asctime)s | %(levelname)-7s | %(name)s | %(message)s"
    logging.basicConfig(level=getattr(logging, level, logging.INFO), format=fmt)