How we turned Claude Code from a powerful assistant into a team-level engineering system.
When most teams adopt AI coding tools, they give engineers access to Claude or Copilot and call it a day. We tried that. It sort of worked. But we kept running into the same problem: engineers ultimately spent more time fixing and debugging low-quality AI code than actually writing it themselves.
So we built Scout: an orchestration layer that turns Claude Code into an opinionated, team-aware AI engineering system designed to tackle 80-90% of day-to-day coding tasks with high confidence. This post explains what Scout is, how it works, and why we think this approach matters.


On its own, Claude is extremely capable. But it is also:
Forgetful: Claude doesn’t reliably remember your codebase conventions, your testing patterns, or how to repeat common workflows, even with recent memory improvements.
Generic: Claude is trained on the entire internet, but it doesn’t know your specific stack, your file structure, or your team’s way of approaching problems, so it tends to drift away from your ideal approach.
Unstructured: Process is entirely up to the human. How you plan, when you test, how you structure PRs: you have to remember and enforce all of this yourself, and that becomes very tricky with AI.
Biased toward a single agent: You’re interacting with one assistant. If you want multiple perspectives or parallel exploration, you’re manually juggling multiple chats. Claude doesn’t leverage subagents enough on its own when working on complex tasks.
For occasional or personal use, this is fine. For production engineering work, we needed something more systematic.
Scout is a toolbox built on top of Claude Code that biases it toward reliability over speed. Our goal is for engineers to architect solutions and have Claude handle the foundational code, with the engineer wrapping up the final 10%. (We found that pushing Scout to 100% complete a task severely diminishes code quality, reliability, and speed.)
It’s built on a few core principles: reliability over speed, specialization over generality, and a human in the loop. In other words, we need Scout to be a partner in an engineer’s daily workflow, one they can hand large tasks to and come back to something they can work with. Today, we achieve this by implementing the following:
The most impactful Scout feature is its bias toward subagents. Why subagents? When working in a large, complex codebase, you constantly hit the 200k-token context limit. The main agent is looking at multiple files, documentation, architecture, patterns, and more, trying to extract the key information and fit it all into a single window.
Subagents change that. At the cost of extra tokens, Scout leverages Claude’s subagent feature to send smaller prompts to individual agents, each of which has a full 200k window to research as needed and outputs only what the main agent actually requires.
You may ask: why is Scout special when subagents are a built-in Claude Code feature? We do a few things differently:
We found great success in thoughtfully creating a subagent for each specialized area of focus. These agents are not allowed to work outside that area, which means we have a graphql-engineer, services-engineer, rspec-test-engineer, code-quality-engineer, and best-practices-engineer. It may appear excessive, but the quality of work improved: the full 200k context of each agent goes toward doing that one area of work well. If a generic subagent is tasked with creating a GraphQL query, a service, and specs, it has to fill its context with the conventions of all three, which is no different from what the main agent would have done anyway.
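As a sketch, a specialized subagent is just a markdown file with YAML frontmatter. The path and prompt body below are illustrative (e.g., .claude/agents/graphql-engineer.md), but the frontmatter fields follow Claude Code’s standard subagent format:

```markdown
---
name: graphql-engineer
description: Builds and modifies GraphQL queries, mutations, and types.
  Use for GraphQL work only, never for services, tests, or UI code.
model: sonnet
---

You are a GraphQL specialist for this codebase. Follow our existing
schema conventions. Stay strictly within GraphQL files; if a task
requires service or test changes, report that back to the main agent
instead of doing the work yourself.
```

The narrow description matters: it is what the main agent reads when deciding which subagent to delegate to.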
Another key aspect of Scout is designing subagents so that the end product is high quality. We do this in a few ways:
Going back to Scout’s core principles, redundant subagents fit perfectly. When working in a complex codebase, it’s difficult to trust that Claude will consistently get a feature right on the first try, and that means engineers spend more time reviewing and debugging AI slop. We drastically reduce this by using subagents to research in parallel, to focus on specific phases of development (e.g., a test subagent is only concerned with tests), or to independently attempt the same solution so their outputs can be compared.
One huge value proposition of subagents is that you can assign a specific model to each one. Our rule of thumb when choosing a subagent model: Opus for reasoning, Sonnet for execution and exploration, and Haiku for simple subagents. For example, our planning subagent runs on Opus to build plans, while most execution subagents run on Sonnet.
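Concretely, the model is just a frontmatter field on each subagent definition. Two illustrative examples following that rule of thumb (the descriptions are ours to tune):

```markdown
---
name: planning-engineer
description: Drafts and revises implementation plans. Reasoning-heavy.
model: opus
---
```

```markdown
---
name: rspec-test-engineer
description: Writes and fixes RSpec tests only. Pure execution.
model: sonnet
---
```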
At first, Claude didn’t use subagents as much as we’d like. We ended up creating a subagents_directory.md file and pointing to it from claude.md, which got Claude to use subagents more often. We also tried referencing it from a prompt injection hook, but found Claude then used subagents far too often.
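The pointer in claude.md is just a short standing instruction. Ours is more detailed, but the idea is roughly:

```markdown
## Subagents
Before starting any multi-file task, read subagents_directory.md and
delegate each specialized area of work to the matching subagent.
```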
We shifted our complex custom commands to be orchestrators of subagents. These are commands expected to be long-running (anywhere from 10 minutes to an hour or more), and they include /scout-plan, /scout-plan-max, /scout-document-architecture, and /scout-swarm, all covered below. Each command leverages the approaches mentioned above.
For example, the /scout-plan command uses anywhere from two to five discovery subagents during the exploration phase to ensure we gather all the necessary context, no matter the size of the request. During the planning phase it uses the planning subagent on Opus to build the plan, plus validators to confirm the plan looks correct.
It also plans out which subagent each task should be assigned to, and the execution phase invokes those subagents in a specific order. For a React Native feature, you might expect the graphql subagent to figure out how to build the query, the zustand subagent to build the store, a util subagent to build the necessary utils, the component subagent to build the components, and a jest subagent to write the tests.
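A generated plan then maps each task to a subagent. A hypothetical excerpt (the feature and exact format here are illustrative, not Scout’s actual output):

```markdown
## Execution plan (excerpt)
1. graphql-engineer:    add the savedSites query and response types
2. zustand-engineer:    create the saved-sites store and hydration
3. util-engineer:       date and distance formatting helpers
4. component-engineer:  SavedSitesList and SavedSiteRow components
5. jest-engineer:       unit tests for the store and utils
```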
Again, this might seem excessive, but our goal is for Scout to almost always write the foundational 90% of the code correctly (assuming the engineer writes a quality prompt).
Another huge value proposition of Scout is that shared configuration syncs across the team: subagents, commands, plan-phase steps, and more can all be feature-flagged and updated centrally.
We found that the thorough documentation our custom commands produce makes a drastic impact. Commands such as /scout-plan write very thorough documentation of their research so they can reference it later as needed. This fixes some of the forgetfulness and hallucination Claude exhibits on long-running tasks.
As engineers work on more similar features, a scout-memory subagent discovers the existing docs and leverages them to save tokens and speed up discovery. Because our own subagent manages this memory, we can thoroughly customize what gets stored and tune it as needed.
With Claude Code alone, the process is entirely up to you. Scout bakes process into commands.
As mentioned before, each command leverages subagents heavily as needed. Many are structured as a sequence of phases; plan commands are the most complex, with prompt-building, discovery, execution, test, validation, and QA phases.
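Structurally, each command is a markdown prompt in Claude Code’s commands directory. A heavily stripped-down sketch of a phased plan command (the real /scout-plan is far more detailed):

```markdown
---
description: Plan and execute a feature using specialized subagents
argument-hint: <feature description>
---

## Phase 1: Prompt building
Restate the request in $ARGUMENTS. If requirements are ambiguous or
multiple approaches exist, use AskUserQuestion before continuing.

## Phase 2: Discovery
Dispatch 2-5 discovery subagents in parallel. Write all findings to a
research doc for later phases to reference.

## Phase 3: Planning
Have the planning subagent (Opus) draft the plan, then run the plan
validators against it.

## Phase 4: Execution
Assign each task to its specialized subagent and run them in
dependency order.

## Phase 5: Test, validation, and QA
Run the test subagent, then the final validators. Report any gaps
left for the engineer.
```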
We found the /scout-plan-max command consistently one-shots simple bug requests (even with poor prompts) and writes the majority of the code for a complex feature when given a clear, well-architected prompt. Remember, engineering isn’t always about coding; it’s about architecting solutions. The result is engineers spending less time debugging or re-implementing existing patterns, and more time working on impactful, large features.
A crucial piece of getting Scout to output high-quality code is spending time validating its input and output.
We validate at two points: during planning, validator subagents review the plan before any code is written, and during final validation, reviewers check the finished work against the original request.
We found that having multiple opinionated reviewers makes a noticeable difference. Have your engineers decide on the different perspectives a reviewer should take and create a subagent for each; this is perfect for reviewing plans before execution.
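Each reviewer is just another subagent with a narrow charter. A hypothetical example of one perspective (not one of our actual reviewers):

```markdown
---
name: performance-reviewer
description: Reviews plans and diffs strictly for performance concerns
  (N+1 queries, unnecessary re-renders, payload size). Use during plan
  validation and final review.
model: opus
---

Review only for performance. For each issue, cite the file and line,
explain the cost, and suggest a fix. Do not comment on style, naming,
or anything outside your perspective.
```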
We noticed our token usage went up because of this, especially on Opus calls. But we also saw a decrease in AI slop and a much higher rate of PRs approved on first review.
Scout improves every week. Whenever we find something Scout has trouble with, we add a skill. This also includes replacing MCPs with skills where possible, for reliability and speed.
Claude doesn’t do a great job of triggering these skills where needed. So we added a list of skills, with descriptions and keywords, to subagents_directory.md, and then added a hook that injects text at the end of each prompt telling Claude to check that file and determine whether any skills are necessary. (Scout only reads it once, and rereads it automatically when it forgets.) As a result, skills now get triggered consistently rather than only sometimes.
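The injection can be done with a Claude Code UserPromptSubmit hook, whose stdout is appended to the prompt context. A minimal sketch of the settings entry in .claude/settings.json (the reminder text is illustrative):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Check subagents_directory.md for applicable skills before starting.'"
          }
        ]
      }
    ]
  }
}
```

We keep the injected reminder to a single sentence; as noted earlier, heavier prompt injection pushed Claude to overcorrect.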
Unless it’s a small task, AI is not yet at a place where we can blindly trust it to write code in a large-scale codebase. We designed Scout with the human in the loop in mind, where it makes sense. Some examples of this are:
Scout is pushed to ask questions during the planning commands; it does not start writing the plan until it has a clear understanding of the task at hand. For example, if Scout finds two or more ways to accomplish a task and is unable to decide on the right approach, it will use the AskUserQuestion tool to ask the engineer which to take.
This saves a considerable number of tokens and a lot of engineering time, because you steer the agent before it makes a mistake.
We designed some skills around the expectation that you simply can’t expect AI to work efficiently on its own.
For example, the scout-debug skill has the engineer work alongside Scout to debug an error. The engineer manually tests and reads code while Scout handles error logging, reading errors, researching GitHub issues, and directing which files the engineer should explore.
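The skill follows Claude Code’s standard SKILL.md format; the body below is an illustrative condensation of the idea, not our exact prompt:

```markdown
---
name: scout-debug
description: Pair-debugging workflow. Use when an engineer reports a
  bug and wants to drive the investigation while Claude handles
  research and logging.
---

You are the navigator, not the driver.
- Add targeted logging and interpret the resulting errors.
- Search GitHub issues for the packages involved.
- Suggest the next file or code path for the engineer to inspect.
- Never apply a fix until the engineer confirms the root cause.
```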
We found that Scout’s quality increased when we designed it to do 90% of the coding, not 100%. This doesn’t mean explicitly telling it to leave work for the engineer; rather, we don’t let it do more when doing so diminishes quality. For example, having it review its own PR and loop on itself without human interaction causes it to confuse itself and work on the wrong things, leading to more AI slop. Instead, we operate with the expectation that some form of cleanup will always be needed; often this means an engineer doing a brief review and handing feedback off to Scout to fix.
We built Scout to be useful at every part of the engineering process.
The most obvious use: we use Scout to build features. Engineers take the time to understand the high-level architecture, key acceptance criteria, and any key engineering considerations. They then send it off to a /scout-plan and come back when it’s done.
Sometimes you don’t have time to build a feature but want to understand the scope of work. Scout excels at creating proofs of concept. We run the /scout-plan-max command with a quick prompt and observe its output. While the code isn’t always shippable, we can see the surface area needed to implement the feature.
Scout can leverage subagents for deep codebase exploration. We use the /scout-document-architecture command to thoroughly document key areas of our application. These notes are used automatically during /scout-plan commands, or manually if you prefer to leverage Scout as a knowledge base. We are essentially “caching” what Scout would research anyway when given a prompt, except with a much larger token budget.
Many times, when even /scout-plan-max struggles, running /scout-document-architecture beforehand turns broken code into shippable, production-ready code. It is good practice to run this command for every large project an engineer starts.
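The generated docs are plain markdown that planning commands can re-read later. A hypothetical excerpt of the shape they take (the area, paths, and notes are invented for illustration):

```markdown
# Architecture: search and discovery
## Entry points
- app/graphql/queries/search.rb - top-level search query
## Key services
- SearchService#run - orchestrates filtering and ranking
## Conventions
- All geo math goes through GeoUtils; never inline the formulas
## Gotchas
- Result caching is invalidated by a nightly job, not per-write
```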
You might ask: why not just embed the codebase into a vector database and rely on similarity search? In our testing, the documentation approach produced significantly fewer hallucinations and inaccuracies, particularly for complex, cross-cutting questions that span many files. It doesn’t eliminate the need for a vector database (the approach still has a number of drawbacks), but when accuracy and reliability matter, we’d rather dispatch multiple subagents to parse architecture docs than risk an engineer spending valuable time filling in the gaps that similarity search inevitably misses.
Scout is great as a debugging sidekick: it’s not necessarily driving, but it handles the busywork. We leverage the scout-debug skill to help the engineer locate the root cause of a bug. A paired session with Scout where the human drives leads to a higher debugging success rate where pure AI may fail.
Alternatively, we leverage /scout-swarm, which swarms on an issue with a large number of subagents in parallel. These subagents also thoroughly research GitHub issues and package versions.
To summarize, Scout transforms Claude from a capable but inconsistent assistant into a reliable engineering tool. By prioritizing quality over speed through specialized subagents, opinionated workflows, and deliberate human-in-the-loop design, we drastically scale each engineer’s impact: fewer feedback loops, less time debugging AI slop, and more time spent on the engineering work that actually matters.