2026.05.19 methodology

How I wrote DECISIONS.md before any code, and shipped Spotter in 22 hours

A take-home methodology: lock nine architectural choices on paper first, then let AI execute. FMCSA HOS trip planner, 51 tests, live URL.

A US trucking AI startup sent me a take-home assessment last week. Build a working trip planner for dispatchers. Django plus React. FMCSA Hours-of-Service compliant. Live URL, GitHub repo, three-to-five-minute Loom walkthrough. Standard FDE-shape submission.

The default approach is to start coding immediately. Pick a stack, scaffold a project, see what Claude generates, iterate. Most engineers will do exactly this. Most engineers also will not be able to defend every choice on a follow-up call two weeks later.

I did the opposite. Before I wrote a single line of code, I wrote a document called DECISIONS.md. It contains nine architectural choices, each with the alternative I rejected and the tradeoff I accepted. Then I wrote a separate document called CONTEXT.md explaining who the user actually is. Then BUILD-BRIEF.md describing the finished application.

Only after all three documents were locked did I open my editor.

The whole thing shipped in 22 hours. 51 passing tests. Every HOS constant cited to the federal regulation it implements. A live URL.

This post is the methodology, not the marketing.

The user is not the trucker

The first decision, before any code or framework, was who the user actually is.

The obvious answer is "trucker." The obvious answer is wrong.

A trucker doesn't plan their own routes. They execute them. The dispatcher plans the route. The dispatcher gets a phone call from a freight broker. The dispatcher has three minutes to decide whether to accept the load. Their mental math: can my driver legally complete this trip given the hours they've already used in this rolling 8-day cycle? Do I have time to reroute through a different driver? Should I pass on this revenue and wait for a better load?

Small fleets, between 1 and 50 trucks, are the bulk of US trucking. Most of them run on paper logs and Excel. The dispatcher is often the owner. The dispatcher loses money every week from bad mental math: accepting loads they can't legally complete, declining loads they could have run, mis-pricing because they didn't factor cycle costs.

This is the actual product surface. Not "trucker planning their trip." Dispatcher answering the broker's question.

CONTEXT.md captured this in one sentence: "The user isn't the trucker, it's the dispatcher fielding broker calls."

Everything else in the build was downstream of this insight.

The nine decisions

I'll walk each one fast. Each got its own template in DECISIONS.md: Choice, Why, Alternatives Rejected, Tradeoff Accepted.

1. OpenRouteService with the driving-hgv profile. ORS has a free truck-routing profile that respects bridge clearances, axle weight limits, and HazMat restrictions per road segment. Mapbox's free tier is car-only. OSRM has no built-in truck profile. For an FMCSA app, car routing would generate physically impossible truck routes. Interstate semi-trailers sent through residential streets. Decision 1 is the floor below which the build is wrong.

2. Hybrid log-sheet fidelity. The 4-row 24-hour duty-status grid is pixel-faithful to the FMCSA paper RODS format. DOT inspectors are trained on that exact layout. Reinventing the grid would alienate the trained user. Everything around the grid uses modern MUI typography. Best of both: domain credibility plus modern engineering taste.

3. Conservative HOS scope. Implement the 4 core clocks per §395.3, namely the 14-hour shift window, the 11-hour driving cap, the 30-minute break trigger after 8 cumulative driving hours, and the 70-hour rolling cycle. Defer sleeper-berth split, 34-hour restart, short-haul exemption, 16-hour big-day exception, personal conveyance, yard moves, co-drivers. Each deferral documented with the trigger condition that would justify shipping it in v2.

4. Mobile-responsive single column. Real dispatchers work from their phones. Brokers call them on the move. The "can my driver take this load" question gets answered between meetings, not at a desk.

5. SQLite default. Zero deploy friction. Single-tenant single-trip per request. No concurrency story to solve. The data is fully relational, so the document-store debate was over before it started.

6. Four-model domain. Trip, Stop, LogDay, LogEntry. Each model serves one view. Stop owns the map markers with lat/lng. LogDay owns the admin block and the 24-hour-totals invariant. LogEntry owns one row in the duty-status grid. The duplication implied by "a pickup is also a stop" is wrapped in a helper function. Roughly 50 lines of duplicate writes accepted to keep the read paths clean.

7. Single-page form-then-results. Form at top, summary below, map below that, log sheets stacked at the bottom. One scroll, one URL. The Loom demo flows linearly. Tabs would force a click during recording and kill the narration.

8. Greedy event-loop scheduler. HOS clocks are monotonic between resets. They only run in one direction. There is no scenario where "drive less now to drive more later" beats "drive until forced to stop, take the required reset, resume." For the four clocks in scope, greedy is provably optimal. Backtracking has nothing to undo. A constraint solver would be heavyweight for a problem of this shape.

9. Repo layout. git init runs inside app/, not at the parent. Planning docs (CONTEXT, DECISIONS, BUILD-BRIEF) live one level up and never touch git. The reviewer gets a clean public repo. My work surface stays private.
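Decision 6 can be sketched with plain dataclasses standing in for the Django models. The field names below are illustrative assumptions, not the repo's actual schema. The point is the 24-hour-totals invariant that LogDay owns.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four rows of the paper RODS grid.
DUTY_STATUSES = ("off_duty", "sleeper_berth", "driving", "on_duty")


@dataclass
class LogEntry:
    """One duty-status segment on the grid (hypothetical field names)."""
    status: str
    start_hour: float  # hours since midnight, 0.0 to 24.0
    end_hour: float

    @property
    def hours(self) -> float:
        return self.end_hour - self.start_hour


@dataclass
class LogDay:
    """One daily log sheet; owns the 24-hour-totals invariant."""
    date: str
    entries: List[LogEntry] = field(default_factory=list)

    def totals(self) -> Dict[str, float]:
        totals = {status: 0.0 for status in DUTY_STATUSES}
        for entry in self.entries:
            totals[entry.status] += entry.hours
        return totals

    def is_valid(self) -> bool:
        # A complete log sheet accounts for exactly 24 hours.
        return abs(sum(self.totals().values()) - 24.0) < 1e-6


@dataclass
class Stop:
    """A map marker: pickup, dropoff, fuel, or rest."""
    kind: str
    lat: float
    lng: float


@dataclass
class Trip:
    """Root object tying stops and log days together."""
    cycle_used_hours: float
    stops: List[Stop] = field(default_factory=list)
    log_days: List[LogDay] = field(default_factory=list)
```

Putting the invariant on LogDay means a sheet can be checked on its own, before anything renders it.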

Nine decisions. Each defensible in under 20 seconds of speech. None of them made during the build.
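Decisions 3 and 8 together fit in one short sketch: the four clocks as constants cited to 49 CFR 395.3, and a greedy loop that drives until a clock forces a stop. This is a simplified illustration, not the repo's scheduler. The names Clocks and plan are hypothetical, and it ignores pickup and dropoff on-duty time and fuel stops, so it will not reproduce the worked-example numbers later in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Limits from 49 CFR 395.3 (property-carrying drivers).
MAX_DRIVING_HOURS = 11.0          # 395.3(a)(3)(i)
MAX_SHIFT_WINDOW_HOURS = 14.0     # 395.3(a)(2)
BREAK_AFTER_DRIVING_HOURS = 8.0   # 395.3(a)(3)(ii)
BREAK_HOURS = 0.5                 # 395.3(a)(3)(ii)
RESET_HOURS = 10.0                # 395.3(a)(1)
CYCLE_LIMIT_HOURS = 70.0          # 395.3(b)(2), 70 hours in 8 days
EPS = 1e-9


@dataclass
class Clocks:
    driving: float = 0.0      # driving hours since the last 10-hour reset
    shift: float = 0.0        # hours elapsed in the current 14-hour window
    since_break: float = 0.0  # driving hours since the last 30-minute break
    cycle: float = 0.0        # hours used in the 70-hour cycle

    def drivable(self) -> float:
        """Hours the driver may drive before any one clock runs out."""
        return max(0.0, min(
            MAX_DRIVING_HOURS - self.driving,
            MAX_SHIFT_WINDOW_HOURS - self.shift,
            BREAK_AFTER_DRIVING_HOURS - self.since_break,
            CYCLE_LIMIT_HOURS - self.cycle,
        ))


def plan(drive_hours: float, cycle_used: float) -> Tuple[List[Tuple[str, float]], Clocks]:
    """Greedy loop: drive until a clock forces a stop, reset, resume."""
    clocks = Clocks(cycle=cycle_used)
    events: List[Tuple[str, float]] = []
    remaining = drive_hours
    while remaining > EPS:
        available = clocks.drivable()
        if available > EPS:
            step = min(available, remaining)
            clocks.driving += step
            clocks.shift += step
            clocks.since_break += step
            clocks.cycle += step
            remaining -= step
            events.append(("driving", step))
        elif clocks.cycle >= CYCLE_LIMIT_HOURS - EPS:
            # The 34-hour restart is out of scope, so the trip is infeasible.
            raise ValueError("70-hour cycle exhausted")
        elif (clocks.driving >= MAX_DRIVING_HOURS - EPS
              or clocks.shift >= MAX_SHIFT_WINDOW_HOURS - EPS):
            events.append(("off_duty", RESET_HOURS))
            clocks.driving = clocks.shift = clocks.since_break = 0.0
        else:
            # Assumption: the break is logged off duty, so it advances
            # the 14-hour window but does not add to the cycle.
            events.append(("break", BREAK_HOURS))
            clocks.shift += BREAK_HOURS
            clocks.since_break = 0.0
    return events, clocks
```

Calling plan(24.0, 20.0) yields driving blocks of 8, 3, 8, 3, and 2 hours separated by two breaks and two 10-hour resets. The loop never looks ahead or backtracks, which is the whole argument for greedy.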

The unfair advantage

The real lesson isn't any of those nine choices. It's that I wrote them down before I wrote any code.

Most engineers reverse this. They start coding, hit a fork, pick whatever feels reasonable, keep moving, hit the next fork, pick whatever feels reasonable, keep moving. Two days later they have working software and zero defense for any individual choice. "Why did you pick X over Y?" gets answered with "I don't remember exactly, it seemed right at the time."

When you write the decisions doc first, the build becomes a translation exercise. Every fork was pre-decided. The AI types, you review. The reviewer who asks "why X" gets a paragraph that was written calmly before any panic-coding happened, with the alternative explicitly named and the tradeoff named in plain English.

This is the inversion that makes AI-leveraged engineering actually work. AI accelerates execution. It does not improve judgment. If the judgment was made at code-time, under fatigue, with a deadline closing, AI just speeds up the bad code. If the judgment was made when you had time to think and the tradeoffs were visible on the page, AI is a multiplier on quality.

The Anthropic AI Fluency course frames this skill as four practices: Delegation (deciding what to hand off), Description (specifying what you want), Discernment (evaluating what came back), and Diligence (taking responsibility for what ships). Writing DECISIONS.md first is the Delegation step done well. Everything downstream follows.

What got shipped

Twenty-two hours of work. Most of it in two focused sessions.

A Django backend with five files of real logic: hos_constants.py for the §395 constants, hos_clocks.py for the four-counter state machine, hos_scheduler.py for the greedy event loop, ors_client.py for the routing wrapper, trip_builder.py for the orchestration layer that ties geocode plus route plus scheduler plus DB writes into one atomic transaction.
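For a sense of what a routing wrapper like ors_client.py has to send, here is a minimal sketch of a request to the driving-hgv profile. It builds the request without sending it. The function name and API-key placeholder are hypothetical, and the endpoint shape follows my reading of the public ORS v2 directions API, not the repo's code.

```python
import json
import urllib.request
from typing import List

ORS_DIRECTIONS_URL = "https://api.openrouteservice.org/v2/directions/driving-hgv"


def build_hgv_request(api_key: str, coordinates: List[List[float]]) -> urllib.request.Request:
    """Build (but do not send) a truck-routing request.

    coordinates are [longitude, latitude] pairs in visiting order.
    """
    body = json.dumps({"coordinates": coordinates}).encode("utf-8")
    return urllib.request.Request(
        ORS_DIRECTIONS_URL,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Approximate Houston and Chicago coordinates, for illustration only.
request = build_hgv_request("YOUR_ORS_API_KEY", [[-95.37, 29.76], [-87.63, 41.88]])
```

Passing the result to urllib.request.urlopen would perform the call. Note that ORS expects longitude first, which is the opposite of the lat/lng order Leaflet uses.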

A React frontend with five components. TripForm with 4 inputs. TripSummary rendering the legal/illegal verdict, distance, days, cycle remaining. TripMap built on Leaflet with the ORS polyline and custom DivIcon markers. LogSheets with 3 daily log cards, expandable. LogSheetGrid as the 24-hour SVG renderer that mimics the FMCSA paper format down to the stepped duty-status line and 15-minute tick marks.

A test suite of 51 passing tests across the clocks layer, scheduler layer, ORS client, and an end-to-end worked example that locks the canonical Houston-to-Chicago sequence. The worked example asserts that for a trip with 20 cycle hours already used and a 1,324-mile route including a 236-mile deadhead, the planner produces a 3-day schedule ending with cycle used at 51.91 of 70 hours.

A live URL. A public GitHub repo.

What it does not have: authentication, multi-trip planning, sleeper-berth split, 34-hour restart, short-haul exemption. Each of those is named in DECISIONS.md with the specific condition that would trigger its inclusion in v2.

Three things DECISIONS.md didn't predict

The doc-before-code pattern catches most forks before they happen. But "most" is not "all." Three moments during the build taught me something the planning didn't.

The average-speed constant was wrong. Decision 8 originally specified AVG_MPH = 55 as a fixed constant, on the assumption that highway trucks roughly do 55 mph. The first ORS test came back with 24 hours of drive time for 1,088 miles, which averages about 45 mph. The driving-hgv profile factors in HGV speed limits, weight-based slowdowns, and route-specific limits I hadn't thought about. I rewrote the scheduler to derive avg_mph per-trip from the ORS distance and duration response. Lesson: even "obvious" constants can be wrong if they're not anchored to the real-world data source. If your data source has opinions, ask it.
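The per-trip derivation is a few lines. This sketch assumes the default ORS JSON response shape, with routes[0].summary.distance in meters and routes[0].summary.duration in seconds. The function name is hypothetical.

```python
METERS_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600.0


def avg_mph_from_ors(response: dict) -> float:
    """Derive per-trip average speed from an ORS directions response."""
    summary = response["routes"][0]["summary"]
    miles = summary["distance"] / METERS_PER_MILE
    hours = summary["duration"] / SECONDS_PER_HOUR
    if hours <= 0:
        raise ValueError("route duration must be positive")
    return miles / hours


# The numbers from the first ORS test: 1,088 miles in 24 hours.
sample = {"routes": [{"summary": {
    "distance": 1088 * METERS_PER_MILE,
    "duration": 24 * SECONDS_PER_HOUR,
}}]}
print(round(avg_mph_from_ors(sample), 1))  # 45.3
```

Against the original constant of 55, that is roughly 4 extra hours of driving on this route (1,088 / 55 is about 19.8 hours), enough to change how many days the trip takes.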

The fuel stop at 2:33 AM was correct and absurd. The greedy scheduler emitted a fuel stop at 2:33 AM on Day 3 because the odometer hit exactly 1,000 miles 33 minutes into the day's first drive segment. Mathematically correct. Visually weird. A human dispatcher would have fueled during the previous 10-hour rest, not woken the driver up, driven 33 minutes, then fueled. Greedy is provably optimal for the rules as written. It is not optimal for "the schedule a human dispatcher would actually produce." v2 would add a stop-coalescing pass that bundles fuel into adjacent rests when they're within a tolerance. Lesson: optimal-by-rules and usable-by-humans are different problems. The post-pass for human readability is its own scope of work.
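One possible shape for that v2 coalescing pass, as a sketch only. The event format, function name, 60-minute tolerance, and 30-minute fuel duration are all assumptions. A real pass would also have to re-run the clocks afterward, because moving the fuel stop earlier adds on-duty time to the previous shift and changes the mileage at which fueling happens.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (kind, duration in minutes)


def coalesce_fuel_stops(events: List[Event], tolerance_minutes: int = 60) -> List[Event]:
    """Move a fuel stop that follows a rest by a short drive to just before that rest."""
    out = list(events)
    for i in range(len(out) - 2):
        (k0, _), (k1, m1), (k2, _) = out[i], out[i + 1], out[i + 2]
        if (k0 == "rest" and k1 == "driving" and k2 == "fuel"
                and m1 <= tolerance_minutes):
            # rest, short drive, fuel  ->  fuel, rest, short drive
            out[i], out[i + 1], out[i + 2] = out[i + 2], out[i], out[i + 1]
    # Merge driving segments that became adjacent after the move.
    merged: List[Event] = []
    for kind, minutes in out:
        if merged and kind == "driving" and merged[-1][0] == "driving":
            merged[-1] = ("driving", merged[-1][1] + minutes)
        else:
            merged.append((kind, minutes))
    return merged


# The 2:33 AM case: 33 minutes of driving after a 10-hour rest, then fuel.
schedule = [("driving", 660), ("rest", 600), ("driving", 33),
            ("fuel", 30), ("driving", 240)]
print(coalesce_fuel_stops(schedule))
# [('driving', 660), ('fuel', 30), ('rest', 600), ('driving', 273)]
```

The total minutes are unchanged. Only the order moves, so the driver fuels before resting instead of being woken for a 33-minute hop.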

The mid-build amendment cost more in doc rewrites than in code. Three hours into the build, I realized the deadhead leg (Dallas to Houston, before pickup) should count against the driver's HOS clock. That's the FMCSA-correct behavior, not the simplified single-leg interpretation I'd locked in DECISIONS.md. The actual code change was about 30 minutes: extend the scheduler signature with optional deadhead_miles and deadhead_drive_hours kwargs, reuse the same HOSClocks instance across legs. The doc rewrites cost over two hours: BUILD-BRIEF, CONTEXT, the worked-example unit test, and the on-camera numbers in the demo. Lesson: locked planning docs are great until you reverse a foundational assumption. When you do, the cost is not the code. It's the cascade through everything the doc was anchoring.
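The shape of that 30-minute code change, sketched under heavy simplification. The kwarg names come from the amendment described above; everything else is a stand-in. HOSClocks here tracks only the cycle, and the 5-hour deadhead drive time in the usage note is a made-up number.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HOSClocks:
    """Simplified stand-in that tracks only the 70-hour cycle."""
    cycle_hours: float = 0.0

    def drive(self, hours: float) -> None:
        self.cycle_hours += hours


def schedule_trip(
    loaded_drive_hours: float,
    cycle_used: float,
    deadhead_miles: float = 0.0,
    deadhead_drive_hours: float = 0.0,
) -> Tuple[List[Tuple[str, float]], HOSClocks]:
    """Run every leg against ONE clocks instance so hours accumulate."""
    clocks = HOSClocks(cycle_hours=cycle_used)
    legs: List[Tuple[str, float]] = []  # (leg name, cycle hours after the leg)
    if deadhead_miles > 0:
        # The deadhead leg counts against the driver's clocks too.
        clocks.drive(deadhead_drive_hours)
        legs.append(("deadhead", clocks.cycle_hours))
    clocks.drive(loaded_drive_hours)
    legs.append(("loaded", clocks.cycle_hours))
    return legs, clocks
```

With schedule_trip(24.0, 20.0, deadhead_miles=236.0, deadhead_drive_hours=5.0), the loaded leg starts at 25 cycle hours instead of 20. Because the kwargs default to zero, existing single-leg callers keep working, which is presumably why the code change was cheap while the documents anchored to the old numbers were not.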

What this is, what it isn't

This is a methodology for shipping defensible engineering work in the AI-leveraged era. It's a tool, not a religion. There are projects where DECISIONS.md is overkill: a one-off internal script, a learning exercise, a throwaway prototype. The cost of writing the doc is real.

But for take-home assessments, client work, anything where the work is going to be reviewed by people who ask hard questions, the doc pays for itself. By my rough estimate, every hour spent writing it saves five hours of post-hoc rationalization later.

The Spotter take-home was the test of this pattern. Twenty-two hours of work, every choice defensible, AI as the executor and me as the architect. DECISIONS.md is the primary artifact. The code is the receipt.

Live: spotter-assessment-five.vercel.app
Code: github.com/rohitsux/spotter-assessment