All notable changes to facetmask are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Nothing yet.
- `site/version.json`, read from the public GitHub raw URL so it works without the site's Cloudflare Access OTP) and, if a newer build exists, shows a popup with the version, a bullet list of what's new, and Update now / Skip this version / Later, plus a link to the full changelog. Skipped versions are remembered. Help → Check for updates… runs it on demand; `facetmask --version` prints the build. `update.py` is a small Tk-free, unit-tested module (12 tests). Set `FACETMASK_FAKE_UPDATE=1` to force the prompt for a UI self-test. Replaces the brittle git fetch/pull in the launchers as the update path.
- `install.bat` now detects the GPU's compute capability via `nvidia-smi` and auto-selects the matching PyTorch wheel (cu128 for RTX 50-series (Blackwell, compute 12.x), cu124 for older NVIDIA cards, CPU build when there's no NVIDIA GPU) instead of hardcoding cu124. This permanently fixes the "CUDA error: no kernel image is available" crash on Blackwell cards without anyone needing to know CUDA versions. Installs torch with `--no-cache-dir` (so pip can't silently reuse a wrong-variant cached wheel) and adds a real GPU compute test to the verify step (`is_available()` lies on the no-kernel trap; a trivial GPU op is the honest check). Detection validated on RTX 4070 (→ cu124) and is the right mapping for RTX 50-series (→ cu128).
- `torch.inference_mode`, casts pixel values to the model dtype, moves results off-GPU and frees the CUDA cache between frames (the fragmentation that makes long runs fail where short ones don't), and **auto-picks the processed square size from the GPU's VRAM** (~560px on 8 GB, native 1008px on 16 GB+). On a mid-run OOM it auto-steps the resolution down and reloads, so a too-high guess self-corrects instead of crashing. New `--sam-image-size` CLI flag (and `mask_image_size` pipeline kwarg) for a manual override; masks are still written at the source resolution.
The launchers export `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to further cut fragmentation OOMs.
- `facetmask-app` console-script, `launch-app.bat`). A real Windows window (no browser, no local webserver, no port) built on Tkinter/ttk (already a dependency, so zero new packages). Native OS file dialogs that belong to the window, a single-screen form (input/output, preset, interval, SAM prompts, collapsible Advanced), and a richer status panel than the web UI: a per-stage checklist (extract → mask → voronoi → reframe), a live progress bar with percent + elapsed timer, a calibration-based ETA, a per-file batch queue showing each input's detected type (360 / photo), a scrolling run log, and a working Stop button (cooperative cancel via the progress callback). Same pipeline and output as the CLI and Gradio UI. The Gradio web UI is retained as a cross-platform fallback. New module `desktop.py`; pure logic (run controller, stage helpers, estimator) is Tk-free and unit-tested headless. The desktop app follows the OS light/dark theme, renders crisply on high-DPI displays (process DPI-awareness + Tk scaling), keeps Advanced settings in an always-open right-hand side panel, pairs every slider with a live typable numeric field, and shows hover tooltips on the non-obvious controls.
- `photos/` with their masks in `photo_masks/` (`{stem}_mask.png`, Metashape convention). A single batch may freely mix 360 clips and ordinary media, so a folder ripped off a device routes each input correctly. Works identically across the desktop app, the Gradio UI, and the CLI with no new flags (detection is automatic). `reframe_log.json` records a `processing_mode` per source plus `passthrough_images_written` / `passthrough_masks_written` totals.
- `--input-dir` (CLI) and the folder pickers (both UIs) collect video *and* image files (.jpg/.jpeg/.png/.webp/.tif/.tiff/.bmp), not just videos, so a mixed device dump batches in one pass.
- `facetmask-gui` console-script).
Drag-drop MP4, preset dropdown, comma-separated SAM 3 prompts, collapsible advanced panel for interval / overlap / Voronoi / JPEG quality, "Process" button, live status + run log output. Designed for non-technical operators who shouldn't have to learn the CLI: same pipeline, same output structure, just point-and-click. New `[gui]` optional dependency group brings in Gradio 4.x / 5.x.
- `gr.File` widget copies uploaded files to its temp dir even when `type="filepath"`, which is wasteful for multi-GB Insta360 videos. The new `gr.Textbox` accepts a full filesystem path and the pipeline reads from disk in place: no copy, works equally well for local files and NAS-mounted UNC paths.
- `person, car`, `bus, truck, bicycle, motorcycle, sky`, `traffic cone, garbage can`) are checkboxes that pre-populate the prompt string; a separate freeform textbox accepts open-vocabulary additions. The two sources are deduplicated and concatenated automatically.
- `gr.Progress`. The pipeline's existing `progress_callback` is now plumbed through to the UI so the operator sees `stage: current/total` updates during long runs (was: no feedback at all between click and completion).
- `HF_TOKEN` env var nor a `.hf_token` file is auto-discoverable from the facetmask install location.
- `combine_prompts()` helper plus 6 tests covering: checkbox-only, freeform-only, both, deduplication, whitespace handling, empty inputs. Plus 4 new path-validation tests (none / empty / nonexistent / directory-not-file).

Test count: 121 passing.
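
For readers who want a feel for the prompt merging described above, here is a minimal sketch of how checkbox presets and freeform text could be deduplicated and concatenated. It is not the shipped `combine_prompts()`: the signature, the case-insensitive comparison, and the `", "` separator are all assumptions made for illustration.

```python
def combine_prompts(checked, freeform):
    """Merge checkbox presets and freeform text into one prompt string.

    Hypothetical signature: `checked` is a list of preset strings (each may
    itself be comma-separated); `freeform` is the raw textbox contents.
    """
    seen = set()
    merged = []
    for chunk in list(checked or []) + [freeform or ""]:
        for term in chunk.split(","):
            term = term.strip()          # whitespace handling
            key = term.lower()           # assumption: dedupe ignores case
            if term and key not in seen:
                seen.add(key)
                merged.append(term)      # first occurrence wins, order kept
    return ", ".join(merged)


print(combine_prompts(["person, car", "traffic cone, garbage can"], " car ,  drone "))
```

Keeping first-seen order means the checkbox presets always lead the prompt string and freeform additions follow.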
Initial release. Feature parity with the equirect-to-COLMAP-dataset path of the alexmgee/lichtfeld-360-plugin, running standalone as a CLI tool.
- `presets.py`: seven view-layout presets covering the common equirect-to-pinhole reframing layouts:
  - `cubemap-6`: classic 6-face cube
  - `default-16`: two ±35° rings of 8, no poles (alexmgee plugin's default; street-capture optimised)
  - `fibonacci-10/14/20/24`: Fibonacci-spiral sphere sampling, full-sphere coverage
  - `icosahedral-20`: face-centroid sampling, exactly uniform by construction
- `reframer.py`: `cv2.remap`-based equirect → pinhole reprojection. Map caching per (preset, equirect resolution) so batch reframing of many frames with the same preset costs one map computation + per-frame `cv2.remap` calls. Supports `overlap_degrees` to widen each view's effective FOV.
- `extractor.py`: Tenengrad sharpness scoring per frame, interval partitioning with scene-change splits (threshold 0.3), pick of the sharpest frame per (sub-)chunk. Port of the alexmgee plugin's "Best" extraction mode.
- `masker.py`: SAM 3 (`facebook/sam3` via transformers >= 5.0) with open-vocabulary text prompts. Lazy imports of heavy ML deps: module import is free; torch/transformers only load when `Sam3Masker` is instantiated. Token lookup order: explicit arg → `HF_TOKEN` env var → `.hf_token` file at cwd or repo root → error.
- `overlap_masks.py`: Voronoi sphere partitioning. For each preset view, assigns every direction on the sphere to the view whose centre direction is closest, producing per-view exclusion masks that prevent COLMAP from extracting duplicate features across overlapping views. `combine_masks()` helper OR-combines binary masks.
- `pipeline.py`: orchestrates extract → (optional SAM) → reframe + per-view mask projection → (optional Voronoi) → combined per-view mask output. Writes `frames/`, `masks/`, `views/`, `view_masks/`, `overlap_masks/`, `rig_config.json`, `reframe_log.json`.
- `cli.py` + `__main__.py`: argparse entry point. Subcommands: `extract` (run the pipeline), `list-presets` (show available presets and exit).
Optional tqdm progress bars, falls back to periodic stderr prints if tqdm isn't installed.
- `pyproject.toml` with `facetmask` console-script entry point. Apache-2.0 licensed. Optional `[sam]` and `[dev]` extras.
- person, car, bus, truck, bicycle, produces correctly-shaped masks on the operator and visible vehicles + pedestrians, projects masks per view, combines with Voronoi, writes alexmgee-compatible output.
- `get_vision_features()` optimization that should drop multi-prompt latency by 3–4× on top of the hardware speedup.
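
The Voronoi sphere partitioning noted for `overlap_masks.py` comes down to a nearest-centre rule, which the sketch below illustrates for six cube-face centres. This is an illustration only, not the module's code: the function names, the yaw/pitch axis convention, and the face ordering are assumptions, and the real module produces per-pixel masks rather than answering one direction at a time.

```python
import math


def direction(yaw_deg, pitch_deg):
    """Unit vector for a yaw/pitch pair in degrees (axis convention is illustrative)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def nearest_view(d, centres):
    """Index of the centre with the largest dot product with d, i.e. the smallest angle."""
    return max(
        range(len(centres)),
        key=lambda i: sum(a * b for a, b in zip(d, centres[i])),
    )


# Assumed ordering: front, right, back, left, up, down.
CUBE = [
    direction(y, p)
    for y, p in [(0, 0), (90, 0), (180, 0), (-90, 0), (0, 90), (0, -90)]
]


def owns(view_index, yaw_deg, pitch_deg, centres=CUBE):
    """True if the direction lies in this view's Voronoi cell (features are kept there)."""
    return nearest_view(direction(yaw_deg, pitch_deg), centres) == view_index


print(owns(0, 10, 5))  # direction near the front centre
print(owns(0, 80, 0))  # direction closer to the right face
```

A pixel of a view whose direction is not owned by that view would fall in the exclusion mask, so overlapping views never contribute the same region twice.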