Most posts open with a preamble. Let me skip mine.
Here's what we built for a client in the mining-intelligence space: a self-directed mining investor types a ticker into a box, hits go, and ninety seconds later a full diligence report lands in their inbox. Fifty sections. Synthesized judgment in every one. Lifecycle stage. Peer-relative valuation. A management credibility score. A macro meltdown dial. A live breakout status with price targets. All of it keyed specifically to the mining and metals sector — not a generic screener pretending to understand the space.
The thesis behind all of that is our client's, not ours. They came in with decades of sector memory, a precise view of the gap they wanted to close for their users, and a clear picture of what "decision-ready" should look like in this space. Our job was to take that thesis and turn it into software — reliable, fast, and true to the sector expertise behind it. This post is the story of the build, with the credit where it belongs: the product judgment is theirs, the engineering is ours.
The gap the client kept pointing at
Mining and metals investing has always lived in this strange split.
On one side, professional analysts with Bloomberg terminals, NI 43-101 technical reports, country-risk models, decades of sector memory. On the other, everybody else (retail investors, family-office analysts, operators with skin in the game), navigating one of the most complex sectors on the market with forums, press releases, and gut feel.
Our client had spent years inside that split, and the shape of the gap was something they'd already mapped in detail. Smaller investors miss the valley before a run-up. They hold through a meltdown that professionals saw coming three months earlier. Either way, the same pattern — the asymmetry wasn't in information (almost all of it is public), it was in who had the time and taste to connect the dots.
Their bet was that this asymmetry could be closed. Not by dumbing the sector down. Not by pushing more charts into the user's face. By automating the kind of diligence a senior analyst would do over a week, and delivering it as a single, readable report in minutes.
We agreed to build it.
The thesis was already sharp when the client brought it in. Our job was to make sure the engineering didn't blunt it — that every sector insight still read clearly after thirty data feeds and a dozen LLM calls had been through it.
The questions the client had already mapped
The easy temptation in any platform project is to build a richer dashboard — more feeds, more charts, more indicators — and call it a day. Our client walked in already past that temptation. They had spent years sitting with actual sector users. Those users were drowning in data already. What they couldn't get was synthesized judgment.
The client had narrowed it down to the same five questions, asked over and over:
- Is this company in an accumulation window, or am I buying at the top?
- Is the whole sector about to get hit by a macro event I don't have the model for?
- Is the management team actually credible, or am I looking at a revolving door of sector-adjacent names?
- Is this deposit genuinely cheap compared to peers, or does it just feel cheap?
- If gold goes to $5,000, what does this company's production actually throw off in free cash?
Every one of those has an answer sitting in public data. None of them are easy to answer quickly. The brief handed to our engineering team was explicit: close the question-to-answer gap. Stop trying to be a dashboard and start being a report. One report per ticker. Decision-ready. End-to-end.
Inputs get fanned out, scored, and collapsed into a single decision-ready report. The work is in the middle pane.
Signal #1 — the lifecycle curve, their framing, our encoding
Mining companies don't behave like tech companies. An exploration junior, a mine developer, and a mid-tier producer are genuinely different animals, with different risk profiles, different capital needs, different catalysts. A generic "buy / sell / hold" screen that doesn't know about those stages is going to miss the most important call every time: where is this company on its own life cycle?
The client's central product insight was built around that question. They handed us the framework on day one — ten stages, their order, and what each stage meant for an investor who knew what they were looking at: exploration, speculation, discovery, resource definition, orphan period, development, financing, startup, operating mine, closure. Our job was to encode the curve so every company in the platform got placed on it automatically, cleanly, and consistently across every view in the product.
The ten-stage curve, simplified. Position determines signal: climbing is accumulation, at a peak is profit-taking, in the orphan valley is the classic under-the-radar buy.
The design decision that really earns its keep here is keeping the signal binary. No "hold." If a company is climbing, the directional recommendation is accumulation. If it's rolling over off a peak, it's profit-taking. If it's sitting in the orphan valley, it's the classic under-the-radar buy window. That call was the client's, and they made it with real conviction: forcing the engine to commit on direction, they argued, is what makes the output useful, and a soft middle option lets a system hide from the very question it exists to answer. Their sector users had told them the same thing, over and over.
Signal #2 — a dial that remembers 2008
The mining sector has a long memory of 2008. Equities fell 70–90% in an eighteen-month window, and the playbook to avoid the worst of it was visible in macro data months before the headlines. Credit spreads widening. Banking stress. Volatility regime shift. Monetary policy posture. The client had lived through it professionally, and they arrived with a precise specification for what a "macro meltdown dial" should look like: the five phases, the threshold bands, and what a user in each band ought to do.
Our job was the macro risk engine underneath. It pulls from central-bank economic data, credit spreads, banking-stress indicators, and volatility regimes, and distills all of it into a single 0–100 probability with a clear phase label and the matching recommendation the client had already defined for each band.
One dial, five phases. Each phase carries its own directional recommendation, including a buy signal during bottoming — the phase most systems leave unspoken.
Most risk systems stop at a sell signal. The client insisted this one not stop there. The phase they cared about most was the bottoming window: risk still elevated but falling, with monetary conditions flipped. Backtested against 2007–2009, the dial calls the sell well before the bottom and the re-entry near it; historically, that re-entry zone has preceded the 200–400% recoveries that make sector veterans rich while everyone else is paralyzed.
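To make that bottoming condition concrete, here is a minimal sketch; the threshold, the trend test, and the recommendation strings are our illustrative assumptions, not the client's actual bands:

def is_bottoming(risk_score: float, score_trend: float, policy_easing: bool) -> bool:
    # bottoming window: risk still elevated but falling, policy flipped
    return risk_score >= 60 and score_trend < 0 and policy_easing

def phase_recommendation(risk_score: float, score_trend: float,
                         policy_easing: bool) -> str:
    if is_bottoming(risk_score, score_trend, policy_easing):
        return "begin re-entry"          # the buy signal most systems leave unspoken
    if risk_score >= 60:
        return "reduce exposure / hedge"
    return "normal allocation"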
Our side of that was making sure the math didn't get in the way of the framing. The point, as the client put it, isn't to predict the future. It's to make sure their users are never the last ones holding the bag, and never the last ones back in.
Signal #3 — valuation that knows what a peer is
Producers and juniors live in two very different valuation universes. The client was adamant the report treat them that way — the worst thing a sector tool could do, they said, was apply the same lens to both.
For producers, the brief asked us to project annual production value across metal-price scenarios (+20%, +50%, +100%), so an investor sees instantly what the same mine looks like at $3,000 gold, $5,000 gold, or $8,000 gold. That's the "if my commodity thesis is right, what do I actually own?" question, answered in one glance.
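The arithmetic is deliberately simple. A sketch of the scenario table (the function and parameter names are ours, for illustration):

def production_value_scenarios(annual_output_oz: float, spot_price: float) -> dict:
    # annual production value across the +20% / +50% / +100% price scenarios
    bumps = (0.20, 0.50, 1.00)
    return {f"+{int(b * 100)}%": round(annual_output_oz * spot_price * (1 + b))
            for b in bumps}

# e.g. production_value_scenarios(250_000, 2_500)
# -> {"+20%": 750000000, "+50%": 937500000, "+100%": 1250000000}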
For juniors with no production yet, the lens flips. The metric the client wanted is the one sector analysts have used forever — dollars per ounce in the ground, compared against direct sector peers.
A junior priced at $40/oz in a sector averaging $250/oz is a very different story than one priced at $400/oz. The report puts that comparison in the front seat. Every deposit is contextualized against direct peers on grade, AISC, balance sheet, and jurisdiction — so the comparison is real, not cosmetic.
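The metric itself is one line. Sector practice often uses enterprise value rather than raw market cap, so, as a hedged sketch:

def dollars_per_oz(market_cap: float, debt: float, cash: float,
                   resource_oz: float) -> float:
    # dollars per ounce in the ground; using enterprise value is one common
    # convention, an assumption here rather than the report's exact formula
    enterprise_value = market_cap + debt - cash
    return enterprise_value / resource_oz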
The filter that saved the management section
Early in the build, the client caught a problem and flagged it to us bluntly: the extraction pipeline kept pulling the auditor from Deloitte and outside legal counsel into the management team and presenting them as executives. See a law-firm partner listed as CEO in a diligence report once, and the whole report loses credibility, even when everything else is right. They were right to push us on it, hard.
The fix is the piece of engineering on this project we're most proud of: a three-layer defense we designed and shipped from the ground up. Each layer does one job; each layer is boring; together they are very hard to slip past.
Three layers, each cheaply defeating one class of failure. The rejection log at layer two taught us more than the extractions themselves.
Layer one is prompt-level prevention — an explicit do-not-include list with examples of the service providers that used to slip through. Cheapest stage, catches most.
Layer two is a structural validator. It runs every extracted entry against a whitelist of genuine executive titles and a blacklist of patterns that indicate an external service provider. Names get checked for format violations — no commas inside names, no email-shaped structures, no all-caps strings that look like company names, not people. Every rejection is logged with its reason, and the rejection log is honestly where most of our real learning about the space came from.
Layer three is the safety net. If fewer than three valid executives survive — which happens with sparse or older filings — a second, tightly scoped call targets only the missing key roles: chief executive, chief financial, chief operating, chairman. Anything pulled in that way needs to clear a confidence threshold and is labeled transparently in the rendered report. We don't smuggle supplemented data past the reader.
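A condensed sketch of the layer-two validator gives the flavor; the title whitelist, provider patterns, and format checks below are illustrative, not the production rule set:

import re

EXEC_TITLES = {"chief executive officer", "chief financial officer",
               "chief operating officer", "chairman", "president", "director"}
PROVIDER_PATTERNS = re.compile(r"\b(llp|auditor|counsel|advisors?|consultants?)\b", re.I)

def validate_entry(name: str, title: str, rejections: list) -> bool:
    if title.lower() not in EXEC_TITLES:
        rejections.append((name, title, "title not on executive whitelist"))
        return False
    if PROVIDER_PATTERNS.search(f"{name} {title}"):
        rejections.append((name, title, "matches service-provider pattern"))
        return False
    if "," in name or "@" in name or name.isupper():
        rejections.append((name, title, "name format violation"))
        return False
    return True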
The result: the management section is now something the client's users rely on without checking. Support tickets about mislabeled executives went to zero.
What it looks like from the engineering side
The headline fits on a napkin: take a ticker, return a 50-section report in under two minutes. The part that doesn't fit on a napkin is the list of constraints that sentence hides.
Over thirty heterogeneous data sources. LLM calls at multiple layers — extraction, classification, validation, narrative. Sector-specific analytical frameworks with custom math. A user-facing latency budget south of two minutes. And enough reliability to shrug off outages, rate limits, partial data, and the occasional malformed filing.
We built it in four tiers. The web front door takes requests. A background worker does the actual work. A data layer orchestrates the dozens of parallel external calls. A templating layer renders the final report.
Four tiers, one job. The front door never blocks. The data layer fans out. The cache pays for itself weekly.
Two decisions here are worth calling out, because they punch well above their weight.
The front door stays dumb. It validates the ticker, writes a job record, kicks off a worker, and returns a job ID. That's it. The browser then polls a status endpoint every three seconds. Everything expensive happens out of the request–response path. That separation is what lets us hold a two-minute generation time without ever showing a frozen page.
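A minimal sketch of that front door, assuming a Flask-style app; the routes, the job table, and the enqueue_worker helper are stand-ins, not the production stack:

from uuid import uuid4
from flask import Flask, jsonify

app = Flask(__name__)
JOBS: dict = {}                          # stand-in for the job table

def enqueue_worker(job_id: str) -> None:
    ...                                  # stand-in: a queue kicks off the background worker

@app.post("/reports/<ticker>")
def create_report(ticker: str):
    if not (ticker.isalnum() and len(ticker) <= 6):
        return jsonify(error="invalid ticker"), 400
    job_id = str(uuid4())
    JOBS[job_id] = {"status": "queued", "ticker": ticker.upper()}
    enqueue_worker(job_id)
    return jsonify(job_id=job_id), 202   # nothing expensive happens in this path

@app.get("/reports/status/<job_id>")
def report_status(job_id: str):
    # the browser polls here every three seconds
    return jsonify(JOBS.get(job_id, {"status": "unknown"}))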
Modules are independent. Every analysis module can be called on its own, tested in a REPL, and cached in isolation. No module assumes another has run. Every module returns a well-defined shape — even on failure, even on a rate-limited upstream. Graceful degradation is a feature of the platform, not a bolted-on afterthought. When a social API flaps, the report still ships; the sentiment section just renders "data unavailable" for that source, and the composite score rebalances its weights.
Parallelization happens at the orchestrator with a thread pool. Most modules are I/O-bound, which is nearly ideal for that model. The pre-parallelization baseline was around six minutes per report. Fan-out got it under two. That single change did more for perceived quality than any UI work we've ever shipped.
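The fan-out itself is a few lines. In this sketch the module registry and the fallback shape are illustrative:

from concurrent.futures import ThreadPoolExecutor, as_completed

def safe_call(name, fn, ticker):
    try:
        return name, fn(ticker)
    except Exception as exc:
        # graceful degradation: a failed module still returns a valid shape,
        # so the section renders "data unavailable" instead of sinking the report
        return name, {"status": "data unavailable", "error": str(exc)}

def run_modules(ticker: str, modules: dict) -> dict:
    results = {}
    with ThreadPoolExecutor(max_workers=16) as pool:   # modules are I/O-bound
        futures = [pool.submit(safe_call, name, fn, ticker)
                   for name, fn in modules.items()]
        for fut in as_completed(futures):
            name, payload = fut.result()
            results[name] = payload
    return results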
The cache nobody sees
A naive version re-hits every API on every request. That's unaffordable, latency-expensive, and frankly rude to upstream providers. So we put a document-store-backed cache in front of every external call, with per-domain TTLs.
Lifecycle classifications expire in weeks because lifecycle stages change slowly. Production data expires in months because the source is annual reports. Sentiment and prices stay intraday because they must be fresh. Executive extractions persist for weeks because leadership turnover is rare and the LLM work is expensive.
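Written down as data, with durations that illustrate the policy rather than quote the production values:

from datetime import timedelta

CACHE_TTLS = {
    "lifecycle_classification": timedelta(weeks=2),   # stages change slowly
    "production_data": timedelta(days=90),            # sourced from annual reports
    "sentiment": timedelta(hours=4),                  # must stay intraday-fresh
    "prices": timedelta(minutes=15),
    "management_extraction": timedelta(weeks=4),      # turnover is rare, LLM calls are costly
}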
The trick that matters most: every cache entry is keyed not just by subject and type, but by a schema version. Bump the version, every stale entry gets invalidated globally — without touching the bits the change didn't affect. Cache versioning as a deployment primitive is deeply underrated. We use it routinely, and we wouldn't operate without it.
# A single field change looks like this:
CACHE_SCHEMA_VERSIONS = {
    "management_extraction": 7,      # ← bump to invalidate
    "lifecycle_classification": 3,
    "breakout_detection": 4,
    # ...
}

def cache_key(subject: str, kind: str) -> str:
    return f"{kind}:v{CACHE_SCHEMA_VERSIONS[kind]}:{subject}"

Structured LLM output, or nothing
Early versions of the pipeline parsed free-form model responses with regex, and broke in ways that were miserable to diagnose — one malformed field, five rendering steps later, a blank section in production.
The commit that fixed it was simple and absolute: every LLM-facing task declares a typed schema up front, and the model is constrained to conform. Invalid responses fail fast and retry cleanly. No silent corruption downstream.
Two small details move this approach from "usually works" to "boring and reliable":
- Default missing numerics to zero, not null. The model is less likely to hallucinate a plausible-looking wrong value when "unknown" is a valid answer.
- Default collection fields to empty lists. Downstream template code stops needing null checks, and a whole class of bugs disappears.
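In practice the pattern looks something like the sketch below, assuming a pydantic-style validation layer; the model and field names are illustrative:

from pydantic import BaseModel, Field, ValidationError

class ExecutiveEntry(BaseModel):
    name: str
    title: str
    confidence: float = 0.0             # missing numerics default to zero, not null

class ManagementExtraction(BaseModel):
    executives: list[ExecutiveEntry] = Field(default_factory=list)  # empty list, never null

def parse_extraction(raw_json: str) -> ManagementExtraction:
    try:
        return ManagementExtraction.model_validate_json(raw_json)
    except ValidationError:
        raise   # fail fast; the caller retries the LLM call cleanly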
Small things. Boring things. Together, they moved the pipeline from "usually works" to "a thing you stop worrying about" — which is exactly where infrastructure is supposed to live.
Hybrid rules + LLM, in the right proportions
The lifecycle classifier is the cleanest example of a pattern we now use everywhere.
A rules-based pre-pass extracts deterministic features from structured fields we already trust: does this company have production? are they actively drilling? are they in financing? are they operating a mine? We don't want an LLM guessing at facts we can already see.
The LLM then gets those pre-signals alongside a narrative summary, and emits a single structured output: the lifecycle stage, a confidence score, and a plain-English rationale the report renders directly.
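A sketch of the pre-pass; the field names are assumptions for illustration:

def lifecycle_presignals(company: dict) -> dict:
    # deterministic features from structured fields we already trust;
    # the result is passed to the LLM alongside the narrative summary
    return {
        "has_production": company.get("annual_production_oz", 0) > 0,
        "actively_drilling": bool(company.get("recent_drill_results")),
        "in_financing": bool(company.get("open_financings")),
        "operating_mine": company.get("mine_status") == "operating",
    }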
That division of labor keeps each tool doing what it's best at. Rules do consistent feature extraction. The LLM does synthesis, narrative, and the edge-case judgment that hand-written rules will never fully cover. Neither tool stretches beyond its strength. And every stage, its signal direction, the color it uses, the icon in the curve, the tooltip copy — all of it lives in a single authoritative data structure. The curve visualization, the stage chip, the signal badge, and the tooltip all read from the same source. One edit propagates everywhere.
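A sketch of that single source of truth; the stage keys, colors, copy, and signal assignments below are invented for illustration:

from dataclasses import dataclass

@dataclass(frozen=True)
class StageSpec:
    label: str
    signal: str        # "accumulation" or "profit-taking"; the binary call, no "hold"
    color: str
    icon: str
    tooltip: str

LIFECYCLE_STAGES = {
    "exploration": StageSpec("Exploration", "accumulation", "#2e7d32", "pickaxe",
                             "Early-stage drilling; highest risk, earliest entry."),
    "orphan_period": StageSpec("Orphan Period", "accumulation", "#6a1b9a", "valley",
                               "Post-discovery lull; the classic under-the-radar window."),
    "operating_mine": StageSpec("Operating Mine", "profit-taking", "#c62828", "factory",
                                "Producing asset; the re-rating is largely behind it."),
    # ... remaining stages elided
}
# The curve visualization, stage chip, signal badge, and tooltip all read from here.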
Detection is cheap — decision-ready is the work
One product principle from the client shaped the build more than any single feature.
A pattern-based breakout detector is not hard. Plenty of platforms have one. What almost none of them have is what happens after the breakout. Detection without a target is just a notification.
The breakout module pairs detection with pattern-specific target math. Consolidation patterns use the range between resistance and support. Triangles use the base width. Flags estimate a flagpole. Cup-and-handle uses cup depth. The minimum target is breakout price plus (or minus) the pattern height. Conservative and aggressive targets layer in historical resistance, long-range highs and lows, and extension levels from the recent price range.
Then the live piece: on every refresh, the current price is re-evaluated against the targets, and the system emits a status — in-progress, minimum target reached, extended target achieved, or failed breakout — with a matching recommendation.
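A simplified sketch of the consolidation-pattern case; the conservative and aggressive multipliers stand in for the historical-resistance and extension-level logic, and every name here is ours:

def consolidation_targets(breakout_price: float, resistance: float,
                          support: float, direction: int = 1) -> dict:
    height = resistance - support                     # the pattern height
    return {
        "minimum": breakout_price + direction * height,
        "conservative": breakout_price + direction * height * 1.5,
        "aggressive": breakout_price + direction * height * 2.0,
    }

def breakout_status(price: float, breakout_price: float,
                    targets: dict, direction: int = 1) -> str:
    # re-evaluated against the stored targets on every refresh
    if direction * (price - targets["aggressive"]) >= 0:
        return "extended target achieved"
    if direction * (price - targets["minimum"]) >= 0:
        return "minimum target reached"
    if direction * (price - breakout_price) < 0:
        return "failed breakout"
    return "in progress"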
A breakout alert is a notification.
A breakout alert with minimum, conservative, and aggressive targets plus a live status is a decision.
That distinction — detection versus decision — became the phrase the whole build orbited around. The client said it early; we mounted it above our desks. Detection is cheap; decision-ready output is what takes the work. Once you've internalized that, every feature debate answers itself.
Two voices, same data — the client's call
One product decision the client pushed hard for, late in the build, turned out to be a quiet winner: every section in the report carries two layers of commentary. A plain-English voice for newer investors, and an expert voice for sector professionals. Same data, two readers, one report.
Honestly, it sounded like scope creep when they first brought it up. It wasn't. Writing the two voices for every section took real time, but it has repeatedly been the single most-cited feature in user feedback. New investors stop feeling like the report is written for someone else; professionals stop feeling like the report is condescending. The dual voice is a bridge, and the bridge is worth more than either side. Credit where credit is due: we wouldn't have proposed it. They did.
What's working
The metric the client cares about is retention — their users coming back, week after week, to generate the next report. That's been the clearest signal the product is doing its job, and watching it tick up has been the best feedback loop on our build.
The engineering side effects have been just as good. Support tickets about mislabeled executives went to zero after the three-layer filter landed. Generation time cut by two-thirds after parallel fan-out. Zero downtime caused by cache-schema drift since versioned cache keys went in. Every "boring and reliable" piece of work has paid for itself.
The referrals coming back to our client show up in surprising places. Family-office analysts recommending the report to friends-of-friends who run small positions. Sector operators running the tool against their own neighbors to gut-check what they already know. Self-directed investors saying, more or less verbatim: this is the first tool that doesn't make me feel underprepared against the institutional desks.
That sentence is basically the client's original product vision in disguise. Hearing it come back from their users was the moment we all knew we'd landed the build right.
Who the platform is for
The client defined three audiences from day one, in this order of priority:
- Self-directed investors with real positions in the sector — people making five- and six-figure allocation decisions who want institutional-grade diligence without a thirty-thousand-dollar-a-year data terminal.
- Family offices and boutique funds — teams that need to cover a universe of mining names quickly and want consistent, auditable analytical frameworks across every position.
- Sector professionals — operators, bankers, and consultants who want a fast external read on any company in the space, generated the same way every time.
That prioritization shaped every engineering tradeoff we made. Whenever a shiny feature went up against a boring reliability investment, we picked boring — because the user who clicks "generate" for the fourth week in a row is the user this was built for.
What the roadmap looks like
The core architecture has room to grow, and the client's roadmap is already shaping the next cycle of work:
- Streaming partial results — render sections as they complete, so users don't wait on the slowest module.
- Per-user portfolios — run the full engine across a holdings list, not just individual tickers.
- Change-driven alerts — re-run on a schedule, diff against the previous report, notify only when a signal has meaningfully shifted.
- Broader sector coverage — the framework generalizes, and mining was the first vertical, not the last.
Each of those is a client-led thesis about where their users go next. Our side of the table is wiring it into a codebase built from day one to absorb that kind of growth without rewriting.
When the product thesis is this sharp, the engineering job is to disappear into it. Not to invent ideas — to make sure the client's ideas land, every section of the report, every generation, every time. That's the build we wanted, and that's the build we shipped.
