Microsoft has open-sourced SkillOpt, a framework that treats an AI agent's markdown skill file as a trainable object. It borrows optimization ideas from deep learning (learning-rate-like edit budgets, validation gates, and momentum) to propose, test, and accept edits to the skill document automatically, while the underlying model stays frozen.
The system starts with a baseline skill, runs a batch of tasks to collect execution trajectories, and then an offline optimizer suggests structural changes. Proposed edits are filtered, ranked, and evaluated on a held‑out set; only those that improve validation scores are kept, while rejected edits feed a negative‑memory buffer to prevent repetition. This disciplined approach avoids the drift and instability that plague manual prompt engineering.
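The accept-or-reject loop described above can be sketched in a few lines of Python. This is an illustrative toy, not SkillOpt's actual code or API: the names `optimize_skill`, `SkillState`, `toy_propose`, and `toy_evaluate` are hypothetical, edits are modeled simply as lines appended to the skill text, the proposer is assumed to return candidates already ranked, and the evaluator stands in for scoring on a held-out task set. Momentum is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class SkillState:
    text: str                                        # current skill document
    score: float                                     # validation score of that document
    rejected: Set[str] = field(default_factory=set)  # negative memory of failed edits


def optimize_skill(
    baseline: str,
    propose: Callable[[str], List[str]],   # hypothetical: returns ranked candidate edits
    evaluate: Callable[[str], float],      # hypothetical: held-out validation score
    edit_budget: int = 2,                  # max edits tried per round (learning-rate-like)
    rounds: int = 3,
) -> SkillState:
    state = SkillState(text=baseline, score=evaluate(baseline))
    for _ in range(rounds):
        # Filter out edits that were already tried and rejected.
        candidates = [e for e in propose(state.text) if e not in state.rejected]
        for edit in candidates[:edit_budget]:
            trial = state.text + "\n" + edit
            trial_score = evaluate(trial)
            if trial_score > state.score:
                # Validation gate: keep the edit only if the score improves.
                state.text, state.score = trial, trial_score
            else:
                state.rejected.add(edit)
    return state


def toy_propose(skill: str) -> List[str]:
    # Stand-in for an optimizer that reads execution trajectories.
    pool = [
        "Check units before extracting totals.",
        "Always answer in uppercase.",
        "Retry a failed tool call once.",
    ]
    return [e for e in pool if e not in skill]


def toy_evaluate(skill: str) -> float:
    # Stand-in for running held-out tasks: reward helpful hints, penalize harmful ones.
    helpful = ["Check units", "Retry a failed tool call"]
    harmful = ["uppercase"]
    return sum(h in skill for h in helpful) - sum(h in skill for h in harmful)


result = optimize_skill("Extract invoice fields.", toy_propose, toy_evaluate)
print(result.score)
print(sorted(result.rejected))
```

In this toy run the two helpful hints pass the validation gate, while the uppercase instruction is rejected once and then filtered out of later rounds by the negative memory.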
In tests across 52 model-benchmark-harness combinations, covering models such as GPT-5.5, GPT-5.4-mini, and Qwen3.5, SkillOpt delivered an average gain of +23.5 points over a no-skill baseline and outperformed prior prompt-optimization methods. Gains were especially pronounced for smaller models, with scores doubling or tripling on some tasks. The optimized skill files stay under 2,000 tokens, transfer cleanly between models and execution environments, and cost only a few dollars to train. That makes them a practical tool for enterprise automation of document extraction, tool use, and multi-step workflows.



