AI Fine‑Tuning Code Dataset Creation Tool
Open‑source (planned)
Build curated, deduplicated training sets for code models with fine-grained control over sources, licensing, and metadata.
- Pipeline orchestration for parsing, cleaning, and labeling
- License-aware filtering and provenance tracking
- Exports to JSONL, Parquet, and custom schemas
- CLI automation and REST integration hooks
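
The features above can be illustrated with a minimal sketch of license-aware filtering, exact-duplicate removal, provenance tracking, and JSONL export. This is not the tool's actual implementation or API, which has not been published. The function names (`normalize`, `build_dataset`, `to_jsonl`), the record fields, and the license allow-list are all hypothetical, and the sketch uses only the Python standard library.

```python
import hashlib
import json

# Hypothetical allow-list of SPDX identifiers; the tool's real policy is not specified.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def normalize(code: str) -> str:
    """Trim surrounding whitespace and trailing spaces so trivial variants hash alike."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(lines)


def build_dataset(records):
    """Keep allowed-license records and drop exact duplicates by content hash."""
    seen = set()
    kept = []
    for rec in records:
        # License-aware filtering: skip anything outside the allow-list.
        if rec.get("license") not in ALLOWED_LICENSES:
            continue
        digest = hashlib.sha256(normalize(rec["code"]).encode("utf-8")).hexdigest()
        # Deduplication: the first occurrence of a given content hash wins.
        if digest in seen:
            continue
        seen.add(digest)
        # Provenance: carry the source and hash alongside the code.
        kept.append({
            "code": rec["code"],
            "license": rec["license"],
            "source": rec.get("source"),
            "sha256": digest,
        })
    return kept


def to_jsonl(rows) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


# Made-up sample records for illustration only.
sample = [
    {"code": "def f():\n    return 1\n", "license": "MIT", "source": "repo-a"},
    {"code": "def f():\n    return 1   \n\n", "license": "Apache-2.0", "source": "repo-b"},
    {"code": "def g(): pass", "license": "GPL-3.0", "source": "repo-c"},
]
dataset = build_dataset(sample)
jsonl = to_jsonl(dataset)
```

In this sample, the second record is dropped as a whitespace-only variant of the first, and the third is dropped by the license filter. Hashing normalized text catches only exact duplicates; near-duplicate detection (for example, MinHash) would need a different technique.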