Code Review Won't Catch Your Pricing Exploits. An Adversarial Audit Will.

There’s a category of bug that code review is structurally unable to catch.

A normal review checks: does this code do what the spec says? Are the types right? Are the tests passing? Is the SQL safe? Those are correctness questions, and a careful reviewer catches them.

What a reviewer does not catch is whether the spec itself is exploitable. If the spec says “customers can cancel any unfulfilled item in a bundle and receive a pro-rata refund,” your reviewer will check that the math implements pro-rata correctly. They will not pause to ask: what happens if a customer buys a 5-pack at $20/unit, gets one item fulfilled, then cancels the other four — do they end up with one item for less than the unbundled $25 single price? That’s not a code question. That’s an adversary question.

The bugs behind adversary questions are the ones that ship to production, get found by exactly one customer, and then quietly bleed margin until someone sees a number trending the wrong way.

This is the checklist we now run before shipping anything that touches money. Twelve questions, every time.

Why this is worth a checklist

You will be tempted to skip it. The feature seems simple. The discount math is a fifth-grade word problem. You’re under deadline. The audit feels paranoid.

The audit is paranoid. That’s the point. The customer who exploits your pricing logic is also paranoid — they read the terms, they screenshot the receipt, they figure out that buying-and-canceling produces a better outcome than buying-correctly. You only need to be paranoid for the same fifteen minutes that they will be.

We landed on a written checklist after shipping a refund formula that was math-correct, spec-correct, and passing its tests, yet let a customer buy below our break-even point. Nobody on the review caught it because nobody on the review was thinking like a customer trying to win. Now we have a checklist that forces the adversarial frame.

The checklist

Run this on any feature that creates, modifies, or refunds a charge. Every question. Don’t skim.

1. What’s the cheapest path to the desired outcome?

For every flow your customer enters, ask: if I were trying to spend the least money to end up in the same final state, what would I do? Is that path the same as the “intended” path?

A bundle with a refund clause has at least two paths to “I own one item”: buy the single, or buy the bundle and cancel four. If those produce different costs, you have a problem.
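Here is a minimal sketch of that comparison, using the numbers from the opening example: a $25 single, a 5-pack at $20/unit, and a pro-rata refund on cancel. The constants and function names are hypothetical, chosen only to make the arithmetic explicit.

```python
from fractions import Fraction

SINGLE_PRICE = 25   # hypothetical standalone price, dollars
BUNDLE_SIZE = 5
BUNDLE_UNIT = 20    # hypothetical per-unit price inside the bundle

def cost_buy_single() -> Fraction:
    """Intended path: buy one item at the standalone price."""
    return Fraction(SINGLE_PRICE)

def cost_bundle_then_cancel(kept: int) -> Fraction:
    """Alternate path: buy the bundle, keep `kept` items, cancel the rest
    under a pro-rata rule (refund = cancelled share of total paid)."""
    total_paid = Fraction(BUNDLE_SIZE * BUNDLE_UNIT)
    refund = Fraction(BUNDLE_SIZE - kept, BUNDLE_SIZE) * total_paid
    return total_paid - refund

def cheaper_path_exists() -> bool:
    """True if bundle-and-cancel beats the intended single purchase."""
    return cost_bundle_then_cancel(kept=1) < cost_buy_single()

print(cost_buy_single(), cost_bundle_then_cancel(kept=1), cheaper_path_exists())
# prints: 25 20 True
```

Writing both paths as functions that return a cost makes it easy to assert in a test that the intended path is never the more expensive one.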

2. Can this transaction be split to avoid a threshold?

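If you offer a discount above $X, can a customer split one purchase into two so that each order clears the threshold and collects the discount separately? If a fee, review, or limit kicks in above $X, can they split to come in under it on each one and still get everything they wanted?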

Less obvious: if you offer a discount above N items, what happens when a customer cancels one item from a 6-pack and falls to 5? Does the order keep the price tier it qualified for at purchase, or is it repriced for what the customer actually keeps?
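Both questions can be checked mechanically. The sketch below assumes a hypothetical "$10 off any order of $50 or more" promotion and a hypothetical two-tier quantity price ($18/unit at 6+, $20/unit otherwise); it shows the split exploit and one way to reprice after a partial cancel.

```python
# Hypothetical promotion: flat $10 off any order of $50 or more.
THRESHOLD = 50
FLAT_DISCOUNT = 10

def order_discount(subtotal: int) -> int:
    return FLAT_DISCOUNT if subtotal >= THRESHOLD else 0

def total_discount(order_subtotals: list[int]) -> int:
    """Discount collected across several separate orders."""
    return sum(order_discount(s) for s in order_subtotals)

# Hypothetical quantity tiers: (minimum quantity, unit price), checked in order.
TIERS = [(6, 18), (1, 20)]

def tier_price(qty: int) -> int:
    for min_qty, unit in TIERS:
        if qty >= min_qty:
            return unit
    raise ValueError("quantity must be at least 1")

def price_after_cancel(original_qty: int, cancelled: int) -> int:
    """Re-derive the tier from what the customer keeps, rather than
    leaving the remaining items at the tier they qualified for at purchase."""
    kept = original_qty - cancelled
    return kept * tier_price(kept)

print(total_discount([100]), total_discount([50, 50]))  # prints: 10 20
print(5 * tier_price(6), price_after_cancel(6, 1))      # prints: 90 100
```

Whether repricing is the right policy is a business decision; the point is that the code should make one choice deliberately instead of inheriting whatever the cancel path happens to do.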

3. Can canceling and re-buying produce a better outcome than just keeping it?

This is the version of #1 that bites the most. The customer doesn’t beat your system at purchase time. They beat it after fulfillment, when the partial-cancel + new-purchase path produces a strictly better state than the original order.

If your “cancel any unfulfilled item” feature lets the customer harvest the bundle discount on the items they keep and avoid paying for the items they don’t want, you’ve turned the bundle into a free option.
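One possible fix, sketched under the assumption that you want to keep the cancel feature: refund what was paid minus what the kept items would have cost at the standalone price, floored at zero. This is only one policy (another is disallowing partial cancels on bundles), and the prices are hypothetical.

```python
SINGLE_PRICE = 25  # hypothetical standalone price, dollars
BUNDLE_UNIT = 20   # hypothetical per-unit bundle price, dollars

def cancel_refund(bundle_size: int, kept: int) -> int:
    """Refund for cancelling part of a bundle under a 'reprice what you keep'
    policy: kept items lose the bundle discount, so the customer pays the
    standalone price for them, capped at the bundle price already paid."""
    if not 0 <= kept <= bundle_size:
        raise ValueError("kept must be between 0 and bundle_size")
    total_paid = bundle_size * BUNDLE_UNIT
    kept_at_single_price = kept * SINGLE_PRICE
    return max(0, total_paid - kept_at_single_price)

print(cancel_refund(5, 1))  # prints: 75, so one kept item nets out at $25
```

Under this rule the bundle stops being a free option: keeping one item costs exactly what buying one item costs.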

4. Can two discounts stack when they shouldn’t?

You probably documented “coupons cannot be combined.” Did you implement it? Are coupons checked against bundle pricing, loyalty pricing, referral credit, gift-card credit, and grandfathered legacy pricing — or just against each other?

The classic stacking bug is not “two coupon codes work simultaneously.” It’s “the coupon code is applied to a price that has already had a different discount baked in.”
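A minimal sketch of a stacking check that looks at every discount source recorded on the price, not only other coupon codes. It assumes prices in integer cents and a hypothetical `applied` list that records where each discount came from; the source names are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical discount sources that may not be combined with a coupon.
EXCLUSIVE_WITH_COUPON = {"coupon", "bundle", "loyalty", "referral", "legacy_plan"}

@dataclass
class PriceBreakdown:
    list_price: int  # cents, before any discount
    price: int       # cents, after every discount applied so far
    applied: list[str] = field(default_factory=list)  # source of each discount

def apply_coupon(pb: PriceBreakdown, coupon_pct: int) -> PriceBreakdown:
    """Apply a percentage coupon only if no exclusive discount is already
    baked into the price; otherwise reject it explicitly."""
    conflicts = EXCLUSIVE_WITH_COUPON.intersection(pb.applied)
    if conflicts:
        raise ValueError(f"coupon cannot stack with: {sorted(conflicts)}")
    discounted = pb.price - pb.price * coupon_pct // 100
    return PriceBreakdown(pb.list_price, discounted, pb.applied + ["coupon"])
```

Carrying the discount history with the price is what makes the check possible; a bare number that already has a bundle discount folded in looks identical to a list price.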

5. What’s the refund path on a partial fulfillment?

If a customer pays for X items, receives Y < X, and then asks for a refund, what does the refund equal? Is it (X - Y) / X * total_paid? Is it (X - Y) * unit_price, where unit_price is the standalone single-item price? Are those two formulas the same?

They are almost never the same when the bundle is discounted. Pick one. Document which one. Verify the customer cannot toggle between them by phrasing the support request differently.
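To see how far apart the two formulas can land, here they are applied to the opening example ($100 paid for a 5-pack, one item delivered, $25 standalone price). The function names are hypothetical.

```python
from fractions import Fraction

def refund_pro_rata(ordered: int, received: int, total_paid: int) -> Fraction:
    """(X - Y) / X * total_paid: refund the undelivered share of what was paid."""
    return Fraction(ordered - received, ordered) * total_paid

def refund_per_unit(ordered: int, received: int, unit_price: int) -> int:
    """(X - Y) * unit_price, with unit_price taken as the standalone price."""
    return (ordered - received) * unit_price

print(refund_pro_rata(5, 1, 100), refund_per_unit(5, 1, 25))  # prints: 80 100
```

Under the per-unit formula the customer gets back the full $100 and keeps the delivered item for free, which is why the choice has to be written down and enforced in one place.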

6. What happens at the boundaries — zero, one, negative, max?

A bundle priced at “buy 2-10 songs, $20 each” — what does the form do at quantity 0? At quantity 1? At quantity 11? At quantity -1 (yes, somebody will try)?

If the answer is “the validation library catches it,” verify on the server, not just in the UI. If the answer is “the schema constrains it,” verify that constraints fire on every code path that creates an order, not just the happy path.
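A minimal sketch of a server-side check for the "buy 2-10 songs" example, meant to run on every path that creates an order regardless of what the UI allowed. The exception and function names are hypothetical.

```python
from __future__ import annotations

MIN_QTY = 2   # from the example bundle: "buy 2-10 songs"
MAX_QTY = 10

class InvalidQuantity(ValueError):
    pass

def validate_quantity(raw: str | int) -> int:
    """Parse and bounds-check a quantity on the server. Rejects non-integers
    and anything outside MIN_QTY..MAX_QTY, including 0 and negatives."""
    try:
        qty = int(raw)
    except (TypeError, ValueError):
        raise InvalidQuantity(f"not an integer: {raw!r}") from None
    if not MIN_QTY <= qty <= MAX_QTY:
        raise InvalidQuantity(f"quantity {qty} outside {MIN_QTY}-{MAX_QTY}")
    return qty
```

Keeping this in one function that every order-creating path calls is what makes the "every code path" part of the question answerable.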

7. What happens to existing orders when the price changes tomorrow?

You will change a price. Probably soon. Probably during a promotion. Probably while orders are mid-flight.

What does an in-progress order do when the unit price changes between purchase and fulfillment? What happens to the customer’s stored receipt? What happens to your refund formula, which probably reads the current unit price and not the purchase-time unit price?

This bug usually doesn’t show up until the second time you run a sale.
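One common guard is to snapshot the unit price onto the order line at purchase time and have refunds read only that snapshot. A minimal sketch, assuming a hypothetical in-memory `CATALOG` standing in for your current price list:

```python
from dataclasses import dataclass

CATALOG = {"song": 20}  # hypothetical current price list; changes over time

@dataclass(frozen=True)
class OrderLine:
    sku: str
    qty: int
    unit_price_at_purchase: int  # snapshot taken when the order was paid

def create_line(sku: str, qty: int) -> OrderLine:
    return OrderLine(sku, qty, CATALOG[sku])

def refund_for_cancelled(line: OrderLine, cancelled: int) -> int:
    """Refund from the price the customer actually paid, never from CATALOG."""
    return cancelled * line.unit_price_at_purchase

line = create_line("song", 5)
CATALOG["song"] = 15  # a sale starts while the order is mid-flight
print(refund_for_cancelled(line, 2), 2 * CATALOG["song"])  # prints: 40 30
```

The second number is what a formula reading the current price would refund. A price increase would push it the other way, so the bug can cost either you or the customer.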

8. Can the customer move backward through the state machine in a way that re-grants benefits?

If your order has states like pending → paid → fulfilled → completed, and somewhere there’s a “request changes” or “ask for revision” button that moves the order back to an earlier state, ask: does that state transition reset any one-time grants? Promotional credit? Free-revision count? Loyalty points?

The bug is: customer pays, gets fulfilled, requests a revision, lands back in paid with a fresh revision counter, then completes. Free revision, every time, by design.
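A minimal sketch of the fix: keep one-time grants on the order itself so that re-entering an earlier state cannot reset them. The states come from the example above; the transition table, the one-free-revision allowance, and the class are hypothetical.

```python
from dataclasses import dataclass

# Allowed transitions, including one backward edge for "request a revision".
ALLOWED = {
    ("pending", "paid"),
    ("paid", "fulfilled"),
    ("fulfilled", "completed"),
    ("fulfilled", "paid"),  # revision request moves the order back
}
FREE_REVISIONS = 1  # hypothetical per-order allowance

@dataclass
class Order:
    state: str = "pending"
    revisions_used: int = 0  # lives on the order; entering "paid" never resets it

    def transition(self, new_state: str) -> None:
        edge = (self.state, new_state)
        if edge not in ALLOWED:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        if edge == ("fulfilled", "paid"):
            if self.revisions_used >= FREE_REVISIONS:
                raise ValueError("no free revisions left")  # or route to a paid revision
            self.revisions_used += 1
        self.state = new_state
```

The design choice is that the counter is attached to the order's lifetime, not to any state, so no path through the state machine can hand it back.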

9. Is anything derived from customer-controlled input and then trusted?

The price they’re charged should not be a function of the price the form submitted. The discount they got should not be a function of a hidden field on the page. The bundle they bought should not be a function of a query parameter.

This is obvious when stated. It is non-obvious in practice because frameworks make it easy to round-trip “intended price” through the form and discover three sprints later that it’s the only place the price was ever computed.
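A minimal sketch of computing the charge entirely on the server: only the SKU and quantity are read from the form, and any submitted price or discount field is ignored. The catalog contents and function name are hypothetical.

```python
from decimal import Decimal

# Hypothetical server-side catalog: the only source of truth for prices.
CATALOG = {"song": Decimal("25.00"), "song-5pack": Decimal("100.00")}

def price_order(form: dict[str, str]) -> Decimal:
    """Compute the charge from SKU and quantity only. Client-submitted
    'price' or 'discount' fields are never read."""
    sku = form["sku"]
    if sku not in CATALOG:
        raise ValueError(f"unknown sku {sku!r}")
    qty = int(form["qty"])
    if qty < 1:
        raise ValueError("qty must be positive")
    return CATALOG[sku] * qty

submitted = {"sku": "song", "qty": "2", "price": "0.01", "discount": "99"}
print(price_order(submitted))  # prints: 50.00
```

If a test like this fails because the server has nowhere else to get the price, that is the three-sprints-later discovery happening early.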

10. If the worst-case outcome is “unbounded,” fix that first.

A bug that loses you $5 per occurrence is a small problem. A bug that lets one customer extract arbitrary refunds because of a missing balance check is an unbounded problem.

When you triage adversary-mode findings, sort by worst-case dollar impact, not likelihood. The unlikely-but-unbounded bugs are the ones that wipe out a month.
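As a sketch, the triage rule amounts to a sort key that puts worst-case impact first and uses likelihood only as a tiebreaker, with unbounded findings represented as infinity. The `Finding` type and the sample entries are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Finding:
    name: str
    worst_case_loss: float  # dollars per occurrence; math.inf if unbounded
    likelihood: float       # rough 0-1 estimate, used only to break ties

def triage(findings: list[Finding]) -> list[Finding]:
    """Sort by worst-case impact first, likelihood second, highest first."""
    return sorted(findings, key=lambda f: (f.worst_case_loss, f.likelihood), reverse=True)

findings = [
    Finding("partial-refund rounding", 5, 0.9),
    Finding("refund without balance check", math.inf, 0.01),
    Finding("tier kept after cancel", 5, 0.5),
]
print([f.name for f in triage(findings)])
```

The unlikely but unbounded finding sorts to the top even though it is the least likely entry on the list.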

11. What does the audit log show for each of these scenarios?

For every bug above: if it happens, will you know? Does your audit log capture pre- and post-discount totals? Does it capture which coupon was applied? Does it capture state-machine transitions that affect refunds?

If the answer is “we’d have to reconstruct it from Stripe webhooks and database timestamps,” you don’t have an audit log — you have a forensic project.
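A minimal sketch of an audit record that answers those questions directly: pre- and post-discount totals, the coupon, any state transition, and who triggered it. The field names are assumptions, and writing the line to durable append-only storage is left out.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class PricingAuditEvent:
    order_id: str
    event: str                  # e.g. "charge", "refund", "state_change"
    pre_discount_cents: int     # price before any discount
    post_discount_cents: int    # amount actually charged or refunded against
    coupon_code: Optional[str]  # which coupon, if any, was applied
    from_state: Optional[str]   # state transition caused by this event, if any
    to_state: Optional[str]
    actor: str                  # "customer", "support:<agent id>", or "system"
    at: str                     # ISO 8601 timestamp in UTC

def audit_line(order_id: str, event: str, pre: int, post: int, *,
               coupon: Optional[str] = None, from_state: Optional[str] = None,
               to_state: Optional[str] = None, actor: str = "system") -> str:
    """Build one JSON line for an append-only log."""
    rec = PricingAuditEvent(order_id, event, pre, post, coupon, from_state,
                            to_state, actor, datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(rec))
```

Recording the actor alongside the money fields also makes the support-tool question that follows answerable after the fact.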

12. Could a refund script triggered by a support ticket execute this scenario?

The customer doesn’t have to find the exploit. A support agent acting in good faith — refunding a customer’s “wrong order” — can trigger the same code path that an adversary would. If your system gives support agents tools that bypass checks, ask whether those tools could be used to execute every bug above.

This is a common source of refund losses at large companies: a polite-sounding ticket, an agent reaching for a refund button, a code path that didn't anticipate that combination.
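One way to close this off is to give the support tool no separate refund path at all. A minimal sketch, assuming a hypothetical order record with `paid_cents` and `refunded_cents`; the balance check here is also the guard against the unbounded case in question 10.

```python
class RefundRejected(Exception):
    """Raised when a refund request fails a check, whoever made it."""

def issue_refund(order: dict[str, int], amount_cents: int, actor: str) -> int:
    """The single refund path shared by the customer flow and the support tool.
    Returns the new refunded total."""
    if amount_cents <= 0:
        raise RefundRejected("refund must be positive")
    remaining = order["paid_cents"] - order["refunded_cents"]
    if amount_cents > remaining:
        raise RefundRejected(f"{actor} requested {amount_cents}, only {remaining} refundable")
    order["refunded_cents"] += amount_cents
    return order["refunded_cents"]

def support_refund(order: dict[str, int], amount_cents: int, agent_id: str) -> int:
    # Deliberately no "force" or "override" parameter: support goes through
    # exactly the same checks as the customer-facing flow.
    return issue_refund(order, amount_cents, actor=f"support:{agent_id}")
```

If support genuinely needs an override, it is safer to make it a separate, logged, reviewed action than a flag on the everyday refund button.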

How to actually run it

We do this as a pre-merge checklist on the PR description. Each of the twelve questions, copy-pasted, with a one-sentence answer underneath. “N/A” is a valid answer when it’s truly N/A — but if every answer is “N/A,” that’s a smell that the author didn’t engage.
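If you want a CI step to nag about this, a small check can parse the answers out of the PR description. The sketch assumes a hypothetical convention of one `A<n>: <answer>` line per question; adapt the pattern to whatever template you actually use.

```python
import re

# Hypothetical convention: one "A<n>: <answer>" line per checklist question.
ANSWER = re.compile(r"^A(\d{1,2}):[ \t]*(.*)$", re.MULTILINE)

def review_checklist(pr_description: str, questions: int = 12) -> list[str]:
    """Return problems with the adversarial-audit answers in a PR description:
    missing or empty answers, and the all-N/A smell."""
    answers = {int(n): text.strip() for n, text in ANSWER.findall(pr_description)}
    problems = [f"missing answer to question {n}"
                for n in range(1, questions + 1) if not answers.get(n)]
    if answers and all(a.upper() == "N/A" for a in answers.values()):
        problems.append("every answer is N/A; did the author engage?")
    return problems
```

A script can only check that answers exist, not that they are good; it keeps the ritual from being skipped, and the reviewer still has to read the answers.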

The whole audit takes fifteen to thirty minutes for a real money-touching feature. It is the cheapest half hour of work in the entire ship cycle. It catches the bugs that don't show up in any test, that no reviewer is structurally able to find, and that, if you ship them, will quietly cost you money for as long as they exist.

The mindset is: code review proves the code matches the spec. The adversarial audit asks whether the spec is wrong. You need both, and only one of them is a normal engineering activity.

If you ship features that touch money and you don’t have something like this in your process: write your version of the list. Use ours, fork it, change every word. The specific list matters less than the ritual of pausing — every time, no exceptions, especially when you think it’s a small change — to spend fifteen minutes thinking like the person who wants to win against you.

Most of them won't try. The ones who do only have to find one thing.