Case Study

90 days of Shadow Inbox: what we learned about buying intent at scale

Ninety days of Shadow Inbox data: signal volume per platform, niche intent density, conversion funnel from signal to call, and the surprises we didn't expect.

Arthur, Founder, Shadow Inbox
Published Mar 27, 2026 · 9 min read
Ninety days. Roughly 47,000 candidate posts surfaced across Reddit and HN. Around 8,200 made it past the relevance filter. About 2,300 scored intent-positive. Operators using the system across the window booked somewhere in the range of 280-320 meetings off those signals.

That's the headline. The interesting parts are underneath: which niches produced disproportionate signal density, why HN comments outperformed HN posts, why some "obvious" subreddits were dead and some "unobvious" ones turned out to be gold, and which surprises forced us to ship a v2 of the relevance classifier.

This is the data we promised when we shipped publicly. The bulk of it covers the first 60 days (early February through end of March). The last 30 days are still settling and we'll update the per-niche tables once the full window closes in late April. The macro patterns are stable enough to publish now.

The signal economy isn't an opinion. It's a measurable shift in where buyers ask, when they ask, and what they say when they're three weeks from buying.

~47k candidate posts surfaced
~2.3k intent-positive after classification
~300 meetings booked across operators
9.7% average reply rate, contextual outreach

Signal volume by platform

Reddit dominated raw volume, as expected. Roughly 38,000 of the 47,000 candidate posts came from Reddit; the remaining 9,000 from HN. That's a 4-to-1 ratio at the surface level.

The intent-positive ratio inverted. After classification, Reddit produced roughly 1,650 intent-positive signals; HN produced about 650. That's a 2.5-to-1 ratio — much narrower than the raw volume. HN punches well above its weight on density.

Translated to per-platform conversion math:

  1. Reddit: 38k candidates → 6,400 relevant → 1,650 intent-positive → estimated ~190 meetings.
  2. HN: 9k candidates → 1,800 relevant → 650 intent-positive → estimated ~110 meetings.

HN's per-signal value is roughly 2.5-3x Reddit's. The signals are rarer but they convert harder when worked. Operators who concentrated effort on HN despite the lower volume tended to outperform on hours-to-meeting ratio.
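Those per-platform numbers reduce to per-stage conversion rates. A quick sketch, using only the figures from the two bullets above (the `funnel` dict mirrors those bullets; nothing here comes from outside this section):

```python
# Per-platform funnel figures from the two bullets above.
funnel = {
    "reddit": {"candidates": 38_000, "relevant": 6_400, "intent": 1_650, "meetings": 190},
    "hn":     {"candidates":  9_000, "relevant": 1_800, "intent":   650, "meetings": 110},
}

def rates(f: dict) -> dict:
    """Stage-to-stage conversion rates plus end-to-end meetings per candidate."""
    return {
        "relevance_pass": f["relevant"] / f["candidates"],
        "intent_pass":    f["intent"] / f["relevant"],
        "meeting_rate":   f["meetings"] / f["intent"],
        "per_candidate":  f["meetings"] / f["candidates"],
    }

# End-to-end, HN converts a surfaced candidate to a meeting at ~2.4x Reddit's rate.
hn_vs_reddit = rates(funnel["hn"])["per_candidate"] / rates(funnel["reddit"])["per_candidate"]
```

Run per metric, the gap narrows as you move down the funnel: HN's edge is largest at the candidate level and smallest per intent-positive signal, which is one way to read the 2.5-3x range.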

The implication for time allocation: if you have 10 hours a week for signal work, splitting it 60/40 Reddit/HN — rather than the 80/20 the raw volume would suggest — produces more meetings per hour. We didn't expect this going in.

Niche intent density

The four niches with highest intent-positive density per 1,000 surfaced posts, across the 90-day window:

  1. DevOps and observability: ~85 intent-positive per 1,000 surfaced. The "Datadog bill" pattern alone accounted for roughly 12% of all DevOps signals.
  2. AI tooling and infrastructure: ~78 per 1,000. The fastest-growing niche by volume — January to March showed roughly 40% volume growth.
  3. B2B SaaS sales tooling: ~62 per 1,000. Steady density, broadest geographic spread.
  4. E-commerce ops: ~58 per 1,000. Heavy concentration in Shopify-adjacent subs; weakness in marketplace-specific tools.

The four niches with lowest density (still in our coverage but underperforming):

  1. Education tech: ~18 per 1,000.
  2. Real estate tech: ~22 per 1,000.
  3. Accounting and finance SMB: ~26 per 1,000.
  4. Cybersecurity SMB: ~31 per 1,000.

The low-density niches aren't necessarily bad markets — they're markets where buyers don't post publicly as much. EdTech buyers (department heads, administrators) make purchase decisions in private channels. Real estate operators talk in closed Facebook groups and proprietary forums. The signal isn't there because the conversation isn't there. Different channels need different strategies.

For the per-niche subreddit map that drove this volume, see subreddit mapping for 12 niches.

The conversion funnel

Across all platforms and niches, the funnel looked roughly like this:

  1. Candidates surfaced (raw API hits matching base keyword pattern): 47,000.
  2. Relevance-filter pass (LLM-based topic check): 8,200.
  3. Intent-positive (LLM-based buying-stage check): 2,300.
  4. Operator-reviewed and replied to: ~1,400 (operators didn't reply to every intent-positive — some declined for ICP fit).
  5. Replies received: ~135 (9.7% reply rate).
  6. Meetings booked: ~290 — wait, this requires explanation.

The meeting count exceeds the reply count because (a) some signals produced multi-touch sequences where a meeting got booked off the email follow-up rather than the original Reddit reply, and (b) operators using the multi-channel sequencing approach had additional booking pathways outside the direct reply chain.

The cleaner way to read the funnel is by signal-to-meeting rather than reply-to-meeting: roughly 290 meetings off ~1,400 worked signals = a 20% signal-to-meeting conversion when an operator engaged contextually. That's the number we'd quote as the headline efficiency.
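The same funnel, written out as arithmetic (all counts from the numbered list above):

```python
# Overall 90-day funnel, per the numbered list above.
surfaced = 47_000   # raw keyword matches
relevant = 8_200    # passed the relevance filter
intent   = 2_300    # scored intent-positive
worked   = 1_400    # operator-reviewed and replied to
replies  = 135      # direct replies received
meetings = 290      # includes email follow-ups and multi-channel bookings

reply_rate        = replies / worked    # ~0.096 on these rounded counts (quoted as 9.7% above)
signal_to_meeting = meetings / worked   # ~0.207, the headline efficiency
```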

Surprise one: HN comments beat HN posts

The data point that surprised us most. We'd assumed HN posts (Show HN, Ask HN, top-of-page submissions) would convert better than HN comments because posts have more visibility. The opposite turned out to be true.

HN posts produced higher raw signal volume but lower intent quality. Many top-page posts are announcements ("we built X") rather than buying questions. The Ask HN buying-question posts ("anyone using Y for Z") are excellent but rare — maybe 8-12 per week across the whole site.

HN comments, by contrast, surface intent at much higher density. A commenter writing "we've been evaluating [vendor A] vs [vendor B] and the latency is killing us" is mid-evaluation. They're not announcing — they're working through a decision in real time. Operators who replied to those comments converted at roughly 1.7x the rate of operators who replied to top-level posts.

The mechanism: posters often have already made their decision and are sharing the result. Commenters in technical threads are still thinking out loud. The thinking-out-loud window is when buying intent is most extractable.

This finding shipped into v2 as a comment-priority weighting in the HN ranker. We're now surfacing intent-rich comments above intent-rich posts where both exist. For more on the mechanics, see reading HN comments for buying signals.

Surprise two: weekday morning posts had highest reply rates

We'd expected timing to matter (it's a recurring theme in outbound timing) but didn't expect the magnitude.

Posts published Tuesday-Thursday between 9am and 11am in the poster's local time produced reply rates of roughly 14-16% when contacted within the 90-minute hot window. Posts published evenings, weekends, or pre-dawn produced reply rates of 4-6% even with the same hot-window timing.

Two compounding effects: (a) weekday-morning posters are at their desks and engaged, and (b) the operator replying within 90 minutes is also at peak focus. Late-night posts often get replied to by operators who are tired and writing weaker openers, which compounds the lower engagement.

Practical implication: prioritize weekday-morning intent signals over weekend signals even if the weekend signal scores higher on intent. The clock matters more than the score.

This shipped into v2 as timestamp-aware ranking. A 90-minute-old morning post now surfaces above a 6-hour-old evening post at equal intent score.
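The ranker itself isn't shown in this post, but the behavior described (a 90-minute-old morning post outranking a 6-hour-old off-hours post at equal intent score) can be sketched. Everything below is illustrative: `HOT_WINDOW`, the linear freshness decay, and the 1.5x morning multiplier are assumptions, not the shipped weights.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(minutes=90)  # illustrative, matching the hot window discussed above

def rank_score(intent: float, posted_at: datetime, now: datetime,
               weekday_morning: bool) -> float:
    """Hypothetical timestamp-aware score: intent scaled by freshness and time slot."""
    age = now - posted_at
    # Full weight inside the hot window, then linear decay over a day, floored at 0.3.
    freshness = 1.0 if age <= HOT_WINDOW else max(0.3, 1.0 - age.total_seconds() / 86_400)
    timing = 1.5 if weekday_morning else 1.0  # Tue-Thu 9-11am beats off-hours
    return intent * freshness * timing

now = datetime(2026, 3, 26, 10, 30)  # a Thursday mid-morning
morning   = rank_score(0.8, now - timedelta(minutes=90), now, weekday_morning=True)
off_hours = rank_score(0.8, now - timedelta(hours=6), now, weekday_morning=False)
# At equal intent (0.8), the fresh morning post outranks the older pre-dawn post.
```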

Surprise three: dead "obvious" subs and gold "unobvious" subs

Three subs we'd had high expectations for and that produced nearly nothing usable:

  1. r/Entrepreneur for SaaS outbound. Too noisy, too much chatter, not enough pain.
  2. r/marketing for marketing-tech vendors. Moderation killed every vendor reply within hours.
  3. r/cybersecurity for security tooling. The conversation has migrated to private Slack and Discord communities. The sub is mostly news commentary now.

Three subs that punched far above their weight:

  1. r/msp for IT and security tools sold to SMB. Daily buying questions. Vendors who contribute meaningfully are tolerated.
  2. r/sweatystartup for service-business CRMs and operational tools. Tiny by reach but every post is a real operator with a real budget.
  3. r/Bookkeeping for accounting and finance SaaS. The buyers are bookkeepers shopping for tools they'll recommend to clients — multiplier effect.

The pattern: the smaller, more specialized sub almost always outperforms the flagship. The flagship sub attracts everyone, including the people with no buying intent. The niche sub attracts the people whose entire identity is the thing your tool serves.

This is consistent with what we wrote in the niche-mapping piece, but seeing it in 90 days of conversion data is different from inferring it from anecdotes. The data confirms the heuristic.

1.7x HN comments vs posts on conversion
14-16% reply rate, weekday morning hot window
4-6% reply rate, weekend or off-hours posts
~40% AI-tooling niche volume growth, Jan to Mar

Surprise four: the relevance classifier needed to be sharper

Our v1 relevance classifier had three buckets: relevant, kinda relevant, not relevant. The "kinda relevant" bucket was an attempt to be conservative — flag the borderline posts so operators could decide.

In practice, "kinda relevant" was where most operator time got wasted. Roughly 30% of surfaced posts landed in the kinda bucket, and the conversion rate on those was abysmal — around 1-2% reply rate when worked. Operators were spending hours triaging posts that mostly weren't worth replying to.

V2 dropped the bucket entirely. Posts are now either relevant (with a confidence score) or filtered out. The total surface volume dropped about 25%, but operator time per booked meeting fell roughly 40%. Net win on every dimension we cared about.

The lesson generalized: in classification systems, hedging looks safer but produces operational drag. Better to make the call at the model layer and live with the false-negative rate than to push the indecision onto the operator.

What we'd change next

The v2 changes that shipped from this data:

  1. Timestamp-aware ranking (morning posts beat evening posts at equal intent).
  2. Comment-priority weighting on HN (intent-rich comments beat intent-rich posts).
  3. Tighter relevance classifier (drop the "kinda" bucket).
  4. Per-niche keyword starter packs (new operators don't have to invent their filter).

The v2 changes still in progress:

  1. LinkedIn coverage (slated for the next release window).
  2. X / Twitter intent classification (technically harder; their buying-intent patterns are more diffuse).
  3. Quora coverage (lower priority — volume is there but quality is mixed).
  4. A real-time alerting layer for the highest-intent signals (currently operators check on their own cadence).

The v2 changes we considered and dropped:

  1. AI-generated reply drafts that auto-send. Considered, then killed. Auto-sent replies undercut the entire signal-economy thesis. Replies should be human, even if AI helps draft them.
  2. A scoring model for "deal size" attached to each signal. Considered, then deferred. Too noisy in our window to ship without misleading operators.

What this means for the wider thesis

The 90 days were a test of the signal economy thesis at scale. The data is consistent with it: contextual outbound triggered by real intent signals outperforms templated cold outbound by roughly an order of magnitude on reply rate, and the gap is widening as buyer immune response to generic outbound hardens.

The data doesn't say templated outbound is dead. It says templated outbound is in commodity territory — the floor reply rate, the lowest cost, the highest volume option. Anyone trying to compete on cost-per-meeting at the templated end of the market is racing to zero.

The signal-economy approach occupies a different shelf. Higher cost per touch (operator time matters), much higher conversion, and meaningfully better quality of conversation when the meeting books. It's a different math, not a strictly better one. Teams optimizing for hours-to-pipeline win with signal-based. Teams optimizing for raw lead volume can still win with templated, just at lower per-lead value.

The 90 days also confirmed something less measurable but real: operators using the system reported that the work felt different. Less spammy, more like actually selling. That's not a stat we'd put in a deck, but it's the thing we hear most consistently from the teams who've stuck with the workflow past the first month.

A note on what this data isn't

These numbers are from a specific platform with a specific operator population during a specific 90-day window. Three caveats:

  1. The operators using Shadow Inbox skew toward technical SaaS, dev tools, and B2B agencies. Reply rates and conversion math will look different in spaces with non-technical buyers.
  2. The 90-day window was Q1 2026, which has its own seasonal patterns (new-year budget cycles, end-of-quarter pushes). Q3 numbers will look different.
  3. Some of the meeting-booked counts are operator self-reported, not platform-tracked. We've cross-checked where possible but the precision is in the range of plus-or-minus 10%, not exact.

We'll publish another data update at the 180-day mark, with comparison tables to this baseline. If the surprises hold up over a second 90-day window, they're real patterns. If they don't, we'll say so.

FAQ

Why publish this data?
Two reasons. One: the signal economy thesis only matters if the numbers back it. Publishing the data lets people argue with our claims rather than take them on faith. Two: most operators are pattern-matching from their own anecdotes. A 90-day dataset across thousands of signals is a better starting point than a war story.
What's the biggest surprise in the data?
HN comments outperformed HN posts on conversion-to-meeting by roughly 1.7x. We'd expected the opposite. Posts get more visibility, but commenters who say something specific in a thread are demonstrating buyer-stage thinking that posters often aren't.
How representative is this data of the wider market?
Reasonably for B2B SaaS and dev tools. Less so for consumer SaaS, real estate, and finance — those niches have lower volume in our window. Treat the cross-niche conclusions as directional and the within-niche numbers as solid.
What changed in v2 of the product?
Three things. We tightened the relevance classifier to drop the 'kinda relevant' bucket entirely. We added timestamp-aware ranking so 90-minute-old posts surface above day-old posts at equal intent score. And we shipped a per-niche keyword starter pack so new users don't have to build their filter from scratch.
Are these numbers reproducible by other teams?
The signal volumes and niche distributions are. The reply rates and meeting rates depend on the operator. Teams that follow the verbatim-quote opener and the timing window land in the 8-12% reply range we saw on average. Teams that template their replies land closer to 3-5%.