Building Scout: Our In-House AI Agent for Engineering

How we turned Claude Code from a powerful assistant into a team-level engineering system.

When most teams adopt AI coding tools, they give engineers access to Claude or Copilot and call it a day. We tried that. It sort of worked. But we kept running into the same problems:

  • AI ignored our internal patterns or confused itself with older code
  • Engineers struggled to keep up with the latest best practices for AI-driven development
  • Claude’s output quality was unreliable, often missing important logic and considerations

Ultimately we found that engineers spent more time fixing and debugging low-quality AI code than actually writing it themselves.

So we built Scout: an orchestration layer that turns Claude Code into an opinionated, team-aware AI engineering system designed to tackle 80-90% of day-to-day coding tasks with high confidence. This post explains what Scout is, how it works, and why we think this approach matters.

The Problem with Raw AI Tools

On its own, Claude is extremely capable. But it is also:

Forgetful: Claude doesn’t reliably remember your codebase conventions, your testing patterns, or even how to repeat common workflows, even with recent memory improvements.

Generic: Claude is trained on the entire internet, but it doesn’t know your specific stack, your file structure, or your team’s way of approaching problems. It tends to drift away from the ideal approaches.

Unstructured: Process is entirely up to the human. How you plan, when you test, how you structure PRs—you have to remember and enforce all of this yourself. This becomes very tricky with AI.

Biased toward a single agent: You’re interacting with one assistant. If you want multiple perspectives or parallel exploration, you’re manually juggling multiple chats. Claude doesn’t leverage subagents on its own enough when working on more complex tasks.

For occasional or personal use, this is fine. For production engineering work, we needed something more systematic.

What is Scout?

Scout is a toolbox built on top of Claude Code. It takes Claude Code and biases it toward reliability over speed. Our goal is for engineers to architect solutions and have Claude handle the foundational code before the engineer wraps up the final 10%. (We found that pushing Scout to 100% complete a task severely diminishes code quality, reliability, and speed.)

It’s built on the following core principles:

  1. An engineer’s time is valuable and should be spent more on engineering solutions than coding.
  2. If we can spend tokens to improve the reliability of the AI agent, we should.
  3. Consistent, high-quality output is far more valuable than speed.
  4. Manual coding is still crucial, but it isn’t necessary for the foundational code.

In other words, we need Scout to be a partner in an engineer’s daily workflow where they can hand off large tasks and come back to something they can work with. Today, we achieve this by implementing the following:

  1. Bias toward subagents
  2. Shared configurations
  3. AI notes
  4. Opinionated workflows
  5. Heavy validation checks
  6. Custom skills
  7. Humans in the loop

1. Bias Toward Subagents

The most impactful Scout feature is its bias toward subagents. Why subagents? When working with a large, complex codebase, you constantly hit the 200k-token context limit. The main agent is looking at multiple files, documentation, architecture, patterns, and more, trying to extract the key information and fit it all into a small window.

Subagents change that. At the cost of tokens, Scout will leverage Claude’s subagent feature to send smaller prompts to individual agents that then have a full 200k window to research as needed and only output exactly what’s necessary for the main agent.

You may ask: why is Scout special when subagents are a built-in Claude Code feature? We do a few things differently:

Network of Subagents

We found large success in thoughtfully creating a subagent for each specialized area of focus. These agents are not allowed to work outside that specialization, so we have a graphql-engineer, services-engineer, rspec-test-engineer, code-quality-engineer, and best-practices-engineer. While this appears excessive, the quality of work improved: each agent’s full 200k context goes toward doing its specific area of work well. If a generic subagent is tasked with creating a GraphQL layer plus specs plus a service, it has to fill its context with the conventions of each of those, which is no different from what the main agent would have done anyway.
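As a minimal sketch, a specialized subagent like this can be defined as a markdown file with YAML frontmatter in `.claude/agents/` (Claude Code's agent format). The file contents and scoping rules below are illustrative, not Scout's actual prompts:

```markdown
---
name: graphql-engineer
description: Writes and modifies GraphQL schemas, queries, and resolvers. Use for any GraphQL-layer work. Do not use for services, specs, or UI code.
tools: Read, Grep, Glob, Edit, Write
model: sonnet
---

You are a GraphQL specialist. You only work on GraphQL schemas,
queries, mutations, and resolvers.

- Follow the existing schema conventions in the GraphQL directory.
- Never touch service objects, specs, or frontend code; report
  anything out of scope back to the main agent instead.
- Output only the changes and context the main agent needs.
```

The narrow `description` matters: it is what the main agent reads when deciding which subagent to delegate to.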

Leveraging Redundant Subagents

Another key aspect of Scout is designing subagents so that the end product is high quality. We do this in a few ways:

  • Redundant subagents for exploration
  • Swarms for complex tasks
  • Categorizing each subagent to one of the following: Exploration, Planning, Execution, Testing, Validation

Going back to the core principles of Scout, redundant subagents fit perfectly. When working in a complex codebase, it’s difficult to trust that Claude will consistently get a feature right on the first try, which can mean more engineer time spent reviewing and debugging AI slop. We drastically reduce this by using subagents to research in parallel, focus on specific phases of development (e.g., a test subagent worries only about tests), or attempt a solution multiple times independently and compare the outputs.

Using The Right Model

One huge value prop of subagents is that you can assign specific models to specific subagents. Our rule of thumb when choosing a subagent’s model: Opus for reasoning, Sonnet for execution and exploration, and Haiku for simple subagents.

For example:

  • Opus: Planning subagent, Test architect subagent
  • Sonnet: React Zustand subagent, Rspec subagent
  • Haiku: CI/CD Monitor subagent

Encouraging Subagent Use

At first, Claude didn’t use subagents as much as we’d like. We ended up creating a subagents_directory.md file, pointing to it inside CLAUDE.md, and found Claude used subagents more often. We also attempted a prompt-injection hook that referenced the directory, but found Claude then used subagents way too often.
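The pointer in CLAUDE.md can be as simple as a short section telling Claude when to consult the directory. This is an illustrative sketch, not our actual wording:

```markdown
## Subagents

Before starting any multi-file or multi-layer task, read
.claude/subagents_directory.md and delegate each specialized
piece of work (GraphQL, services, specs, validation) to the
matching subagent instead of doing it yourself.
```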

Complex commands are subagent orchestrators

We shifted our complex custom commands to be orchestrators of subagents. These are commands expected to be long-running (anywhere from 10 minutes to an hour or more). Some of these commands include:

  • Planning: /scout-plan, /scout-plan-max, /scout-plan-lite
  • Documentation: /scout-document-architecture, /scout-read-documentation
  • Execution: /scout-swarm

Each command leverages the approaches mentioned above.

For example, the /scout-plan command uses anywhere from two to five discovery subagents during the exploration phase to ensure we gather all the necessary context, no matter the size of the request. During the planning phase it uses the planning subagent on Opus to build the plan, plus validators to confirm the plan looks correct.

It also plans out which subagent each task should be assigned to. The execution phase then leverages subagents in a specific order. For a React Native feature, you might expect the GraphQL subagent to figure out how to build the query, the Zustand subagent to build the store, a util subagent to build the necessary utils, the component subagent to actually build the components, and a Jest subagent to write the tests.

Again, this might seem excessive but our goal is to almost always have Scout write the foundational 90% of code correctly (assuming the engineer writes a quality prompt).
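To make the orchestration concrete, here is a heavily simplified sketch of how a command like /scout-plan could be structured as a Claude Code custom command (a markdown file in `.claude/commands/`, where `$ARGUMENTS` receives the engineer's prompt). The phase wording is illustrative, not Scout's actual prompt:

```markdown
---
description: Plan a feature with parallel discovery and validated output
---

Task: $ARGUMENTS

Phase 1 – Discovery: launch 2-5 discovery subagents in parallel,
each scoped to one area (schema, services, frontend, tests).
Phase 2 – Planning: hand their findings to the planning subagent
(Opus) to draft the plan.
Phase 3 – Validation: run the over-engineering and existing-patterns
validators against the draft; revise until both pass.
Phase 4 – Output: write the final plan, with each task assigned to
a specific execution subagent.
```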

2. Shared Configuration and Repo Awareness

One huge value prop of Scout is that we can sync shared configurations and feature-flag subagents, commands, plan-phase steps, and more. Examples of what is synced/configured:

  1. Preapproved/predenied commands
  2. Toggles to turn off certain subagent categories
  3. Disabled plan steps (e.g., note taking)
  4. MCPs
  5. Repo paths (lets agents on one repo research other repos as needed)
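As an illustration, the preapproved/predenied commands map onto Claude Code's standard permissions block in `.claude/settings.json`. The specific command patterns below are examples, not our actual allowlist:

```json
{
  "permissions": {
    "allow": [
      "Bash(bundle exec rspec:*)",
      "Bash(yarn test:*)"
    ],
    "deny": [
      "Bash(git push:*)",
      "Read(.env*)"
    ]
  }
}
```

Syncing this file across engineers means nobody has to hand-approve the same safe commands, and nobody can accidentally allow the dangerous ones.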

3. Institutional Memory via aiDocs

We found that thorough documentation from our custom commands makes a drastic impact. Commands such as /scout-plan write very thorough documentation of their research so they can reference it later as needed. This fixes some of the forgetfulness and hallucinations Claude exhibits on long-running tasks.

As engineers work on similar features over time, a scout-memory subagent discovers the existing docs and leverages them to save tokens and speed up discovery.

By using our own subagent to manage the memory, we are able to thoroughly customize what gets stored and can tune it as needed.

4. Opinionated Workflows

With Claude Code alone, the process is entirely up to you. Scout bakes process into commands.

Planning levels:

  • /scout-plan-lite – Quick exploration for smaller tasks
  • /scout-plan – Standard planning with best-practices lookup
  • /scout-plan-max – Deep exploration for complex features

Git workflows:

  • /scout-breakup-files – Split changes into logical commits for review
  • /scout-breakup-pr-into-prs – Create PR chains that follow our conventions and are biased toward being easy to review

As mentioned before, each command leverages subagents heavily as needed. Many are structured as phases; plan commands are the most complex, with prompt-building, discovery, execution, test, validation, and QA phases.

We found the /scout-plan-max command consistently one-shots simple bug requests (even with poor prompts) and writes the majority of the code for complex features when given clear, well-architected prompts. Remember, engineering isn’t always about coding; it’s about architecting solutions. This alone means engineers spend less time chasing bugs or re-coding existing patterns and more time on large, impactful features.

5. Thorough Validation Checks

A crucial piece of getting Scout to output high-quality code is spending time validating its input and output.

For planning, some of our validators are:

  • Over-engineering validator
  • Existing Patterns validator
  • Online Research validator (e.g., if context was pulled from a forum, does the logic make sense? Is it safe and secure to follow the forum’s advice?)

For final validation we have:

  • aiDocs compliance validator
  • Code validators
  • Test/lint runners validators

We found that having multiple opinionated reviewers makes a noticeable difference. This could mean having your engineers decide on the different perspectives a reviewer should have and creating a subagent for each; it’s perfect for reviewing plans before execution.

Token usage went up because of this, especially for Opus calls, but we also saw a decrease in AI slop and a much higher rate of PRs being approved on first review.

6. Custom Skills

Scout improves every week. Whenever we find something Scout has trouble with, we add a skill. This includes replacing MCPs with skills where possible, for reliability and speed.
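For context, a Claude Code skill is a directory under `.claude/skills/` containing a SKILL.md file with frontmatter that tells Claude when to invoke it. Here is a hedged sketch of what a skill like scout-debug might look like; the steps are illustrative, not Scout's actual skill:

```markdown
---
name: scout-debug
description: Pair-debugging workflow. Use when the engineer reports a runtime error or failing behavior and wants to debug together.
---

Work alongside the engineer rather than solo:
1. Collect error logging and read the stack traces.
2. Search GitHub issues for the packages involved.
3. Point the engineer at the files worth manually inspecting.
4. Wait for their findings before proposing a fix.
```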

Claude doesn’t do a great job of triggering these skills where needed, so we created a markdown file called subagents_directory.md with a list of skills, descriptions, and keywords. We then added a hook that injects text at the end of each prompt telling Claude to consult that file and determine if any skills are necessary. (Scout only reads it once, and will reread it automatically when it forgets.) As a result, skills now get triggered consistently rather than occasionally.
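A rough sketch of that injection using Claude Code's UserPromptSubmit hook in `.claude/settings.json`, whose command output is appended to the prompt as context. The reminder wording below is illustrative:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Check .claude/subagents_directory.md for any skills relevant to this request before starting.'"
          }
        ]
      }
    ]
  }
}
```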

7. Humans in the loop

Unless it’s a small task, AI is not yet at a place where we can blindly trust it to write code in a large-scale codebase. We designed Scout with humans in the loop wherever it makes sense. Some examples:

Encouraging Scout to Ask Questions

Scout is pushed to ask questions during the planning commands. It does not start writing the plan until it has a clear understanding of the task at hand. For example, if Scout finds two or more ways to accomplish a task and is unable to decide on the right approach, it will use the AskUserQuestion tool to ask the engineer to choose.

This saves a considerable amount of tokens and engineering time as you steer the agent before it makes a mistake.

Designing skills around human in the loop

We designed some skills around the reality that you just can’t expect AI to work efficiently on its own.

For example, the scout-debug skill has the engineer work alongside Scout to debug an error. The engineer manually tests and reads code while Scout handles error logging, reading errors, researching GitHub issues, and directing which files the engineer should explore.

The 90% AI / 10% Engineer coding rule

We found that Scout’s quality increased when we designed it to do 90% of the coding rather than 100%. This doesn’t mean specifically telling it to leave work for the engineer; rather, we don’t let it do more when that diminishes quality. For example, having it review its own PR and loop on itself without human interaction causes it to confuse itself and work on the wrong things, leading to more AI slop. Instead, we operate with the expectation that some form of cleanup will always be needed. Often this involves an engineer doing a brief review and handing feedback back to Scout to fix.

Key Use Cases For Scout

We built Scout to be useful at every part of the engineering process.

Feature Building

The most obvious one: we use Scout to build features. Engineers take time to understand the high-level architecture, key acceptance criteria, and any key engineering considerations. They then send it off to /scout-plan and come back when it’s done.

POC Building

Sometimes you don’t have time to build a feature but want to understand the scope of work. Scout excels at creating proofs of concept: we run the /scout-plan-max command with a quick prompt and observe its output. While the code isn’t always shippable, it shows the surface area needed to implement the feature.

Deep Codebase Search

Scout can leverage subagents for deep codebase exploration. We utilize the /scout-document-architecture command to thoroughly document key areas of our application. These notes are automatically used during /scout-plan commands, or manually if you prefer to leverage Scout as a knowledge base. We are essentially “caching” what Scout would research anyway when given prompts, except with a much larger token budget.

Often, when even /scout-plan-max struggles, running /scout-document-architecture beforehand turns broken code into shippable, production-ready code. It’s good practice to run this command at the start of every large project.

You might ask: why not just embed the codebase into a vector database and rely on similarity search? In our testing, this documentation approach produces significantly fewer hallucinations and inaccuracies, particularly for complex, cross-cutting questions that span many files. It doesn’t eliminate the need for a vector database (this approach still has a number of drawbacks), but when accuracy and reliability matter, we’d rather dispatch multiple subagents to parse architecture docs than risk an engineer spending valuable time filling in the gaps that similarity search inevitably misses.

Debugging Sidekick

Scout is great as a debugging sidekick: it’s not necessarily driving, but it handles the busy work. We leverage the scout-debug skill to help the engineer locate the root cause of a bug. A paired session where the human drives leads to a higher debugging success rate where pure AI may fail.

Alternatively, we leverage /scout-swarm, which swarms on an issue with a large number of subagents in parallel. These subagents also research GitHub issues and package versions thoroughly.

Conclusion

To summarize, Scout transforms Claude from a capable but inconsistent assistant into a reliable engineering tool. By prioritizing quality over speed through specialized subagents, opinionated workflows, and deliberate human-in-the-loop design, we drastically scale each engineer’s impact. Fewer feedback loops, less time debugging AI slop, and more time spent on the engineering work that actually matters.

Armaiz is a Mobile Software Engineer at Hipcamp, working on the iOS and Android apps that connect campers with unique outdoor destinations.
