Module utils.chunk_utils
Classes
Path(self, *args, **kwargs)
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
Functions
chunk_geojson_file(geojson_file: pathlib.Path, output_dir: pathlib.Path, max_features: int) -> None
Chunk a single GeoJSON file into smaller pieces in the output_dir.
Skips the file if it's a directory or already chunked.
chunk_geojson_folder(input_folder: pathlib.Path, max_features: int, output_folder: pathlib.Path) -> None
Chunk all eligible GeoJSON files in a folder.
Args:
    input_folder (Path): Folder containing GeoJSON files.
    max_features (int): Maximum features per chunk.
    output_folder (Path): Destination folder for chunked files.
chunk_one(path: pathlib.Path, max_features: int, output_dir: pathlib.Path)
Chunk a single GeoJSON file and write the output files.
Args:
    path (Path): Path to input GeoJSON file.
    max_features (int): Max features per chunk.
    output_dir (Path): Output folder to store chunks.
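The splitting step can be illustrated with a minimal sketch. It assumes the input is a GeoJSON FeatureCollection. The names `split_feature_collection` and `chunk_one_sketch`, and the `<stem>_part<N>_chunked.geojson` naming scheme, are hypothetical and are not taken from this module.

```python
import json
from pathlib import Path


def split_feature_collection(collection: dict, max_features: int) -> list[dict]:
    """Split a FeatureCollection dict into collections of at most max_features."""
    if max_features < 1:
        raise ValueError("max_features must be at least 1")
    features = collection.get("features", [])
    return [
        {"type": "FeatureCollection", "features": features[i : i + max_features]}
        for i in range(0, len(features), max_features)
    ]


def chunk_one_sketch(path: Path, max_features: int, output_dir: Path) -> list[Path]:
    """Write each chunk to output_dir and return the written paths."""
    collection = json.loads(path.read_text(encoding="utf-8"))
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    chunks = split_feature_collection(collection, max_features)
    for n, chunk in enumerate(chunks, start=1):
        # Assumed naming scheme; the real module may name chunks differently.
        out = output_dir / f"{path.stem}_part{n}_chunked.geojson"
        out.write_text(json.dumps(chunk), encoding="utf-8")
        written.append(out)
    return written
```

Under these assumptions, a file with five features and max_features=2 would yield three output files holding 2, 2, and 1 features.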
chunk_or_copy_file(geojson_file: pathlib.Path, max_features: int, output_dir: pathlib.Path) -> None
Decide whether to chunk a GeoJSON file or simply copy it.
Args:
    geojson_file (Path): The file to process.
    max_features (int): Threshold for chunking.
    output_dir (Path): Destination folder.
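One plausible form of the decision is sketched below. It assumes that a file is chunked only when its feature count strictly exceeds max_features, and that a copied file keeps its original name. Both are assumptions, as is the name `chunk_or_copy_sketch`. The sketch also returns the chosen action as a string purely for illustration, whereas the documented function returns None.

```python
import json
import shutil
from pathlib import Path


def chunk_or_copy_sketch(geojson_file: Path, max_features: int, output_dir: Path) -> str:
    """Return "chunk" if the file exceeds the threshold, else copy it and return "copy"."""
    data = json.loads(geojson_file.read_text(encoding="utf-8"))
    count = len(data.get("features", []))
    if count > max_features:
        # A real implementation would hand the file to the chunking routine here.
        return "chunk"
    output_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as modification time.
    shutil.copy2(geojson_file, output_dir / geojson_file.name)
    return "copy"
```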
copy_geojson_file(src: pathlib.Path, dest: pathlib.Path) -> None
Copy a GeoJSON file from src to dest.
geojson_feature_count(path: pathlib.Path) -> int
Return the number of features in a GeoJSON file.
Args: path (Path): Path to the GeoJSON file.
Returns: int: Feature count or 0 if reading fails.
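The "0 if reading fails" behavior might look roughly like the following sketch. The name `count_features_sketch` is hypothetical, and the choice of which errors count as a read failure is an assumption.

```python
import json
from pathlib import Path


def count_features_sketch(path: Path) -> int:
    """Return the number of features, or 0 if the file cannot be read or parsed."""
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return 0
    features = data.get("features") if isinstance(data, dict) else None
    return len(features) if isinstance(features, list) else 0
```

Returning 0 on failure means a caller cannot tell an unreadable file from an empty one, so a caller that needs that distinction would have to check the file separately.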
get_chunking_params() -> dict
Load chunking and simplification parameters from YAML layer configs.
Returns: dict: Dictionary with chunking parameters.
get_repo_root() -> pathlib.Path
Return the root directory of the civic-data-boundaries-us-forests repository.
This function first checks whether it is running from a cloned source repo (using __file__ as a reference). If that does not locate the repo, it searches upward from the current working directory until it finds a folder containing a data-config directory.
Returns: Path: Path to the repository root.
Raises: RuntimeError: If the repo root cannot be found.
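The upward search can be sketched as follows. This covers only the fallback walk from a starting directory, not the __file__-based check. The function name `find_root_with_marker` and its parameters are hypothetical.

```python
from pathlib import Path


def find_root_with_marker(start: Path, marker: str = "data-config") -> Path:
    """Walk from start toward the filesystem root until a folder contains marker."""
    start = start.resolve()
    # Check the starting directory itself, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise RuntimeError(f"No parent of {start} contains a '{marker}' directory")
```

A caller would typically pass Path.cwd() as the starting point.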
is_chunked_file(path: pathlib.Path) -> bool
Return True if the file is already a chunked GeoJSON.
Args: path (Path): Path to the file.
Returns: bool: True if the file name ends with '_chunked.geojson'.
load_all_layer_configs() -> list[dict]
Load and merge all YAML layer configs into a single list.
Returns: list[dict]: List of all configured layers.
should_skip_file(path: pathlib.Path) -> bool
Determine whether a file should be skipped during chunking.
Args: path (Path): Path to the file or directory.
Returns: bool: True if the path is a directory or already chunked.
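The two predicates, and how a folder scan might use them to pick eligible files, can be sketched as below. The function names and the "*.geojson" glob pattern are assumptions. Only the '_chunked.geojson' suffix and the directory check come from the descriptions above.

```python
from pathlib import Path

CHUNKED_SUFFIX = "_chunked.geojson"


def is_chunked_sketch(path: Path) -> bool:
    """True if the file name ends with the chunked suffix."""
    return path.name.endswith(CHUNKED_SUFFIX)


def should_skip_sketch(path: Path) -> bool:
    """True if the path is a directory or an already chunked file."""
    return path.is_dir() or is_chunked_sketch(path)


def eligible_files(folder: Path) -> list[Path]:
    """List GeoJSON files in folder that are not skipped (assumed glob pattern)."""
    return sorted(p for p in folder.glob("*.geojson") if not should_skip_sketch(p))
```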