The anatomy of a high-intent Reddit post: 10 signals we extract at Shadow Inbox
Ten extractable signals separate a buyer post from a vent post on Reddit. Here's how we score each one, the thresholds, and the false-positive traps.

A Reddit post that closes a deal looks different from a Reddit post that wastes your afternoon. The difference isn't gut feel — it's ten extractable signals that you can score with a regex, an embedding, or a single LLM call.
We score every post that hits Shadow Inbox against this anatomy. The top decile by score converts roughly 6x better than the median post in the same subreddit. The signals are the moat; the model is the conveyor belt.
Buying intent is not a vibe. It's ten observable features that compound into a number you can sort by.
The four signals that pull most of the weight.
Before the full ten: the four highest-precision features we extract are the first four signals below. If a post hits all four, our internal eval shows a 73 percent likelihood the OP buys something in the category within 30 days.
Now the full ten, in order of how much weight they carry in our scoring rubric.
Signal 1: Explicit budget mention.
The OP writes a number with a currency symbol or a per-month qualifier. "Looking for something under $80/mo," "we have around 5k for this," "happy to spend up to $200 a seat."
Extraction: A regex pass for currency tokens (\$\d+, \d+/mo, \d+k) followed by an LLM verifier that checks if the number is the OP's budget vs a number quoted from somewhere else (a competitor's pricing, a benchmark stat). The verifier matters — naive regex flags 30% false positives.
Threshold: Any explicit budget gets +15 points in the rubric. Budget over $1K/mo gets +20.
Trap: OPs in r/Entrepreneur often quote prices they saw on a competitor's site as a complaint ("Salesforce wants $300/seat, that's insane"). That's not their budget, that's their anchor. The verifier catches this.
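A minimal sketch of that first regex pass (the names `BUDGET_RE` and `extractBudgetCandidates` are illustrative, not our production code — and remember the LLM verifier still has to run on every candidate this emits):

```javascript
// First-pass currency-token scan. Catches $80, $80/mo, 5k, 80/mo, etc.
// Deliberately over-matches; the downstream verifier decides whether the
// number is actually the OP's budget.
const BUDGET_RE =
  /\$\s?\d[\d,]*k?(?:\s?(?:\/|per\s)(?:mo|month|seat|yr|year))?|\b\d+k\b|\b\d+\/mo\b/gi;

function extractBudgetCandidates(text) {
  return text.match(BUDGET_RE) || [];
}
```

Run it on the examples above and you get the raw tokens ("$80/mo", "5k") that the verifier then classifies as budget vs anchor.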
Signal 2: Competitor name + frustration verb.
The OP names a tool and a pain in the same sentence. "HubSpot is killing us on pricing." "Calendly's UI got worse." "Tried Notion, it's too slow for our team."
Extraction: We maintain a per-customer competitor list (built from the user's search profile + a one-time auto-suggest pass). We then look for any competitor token within 6 tokens of a frustration verb (hate, sucks, broken, killing, leaving, switching, alternative, replace).
Threshold: Each hit is +10. Multiple competitors named with frustration is capped at +20 (diminishing returns — past two complaints they're often just venting).
Trap: Past tense matters. "We used to use Mailchimp" is a switching signal. "We use Mailchimp" with no negative verb is a brag, not a buyer. The classifier prompt is instructed to weight tense.
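The token-window check can be sketched like this (verb list from the text; the tokenizer and the 6-token default are simplified stand-ins for the real pipeline):

```javascript
// True if any competitor name appears within `window` tokens of a
// frustration verb. Competitors come in as a lowercase Set per customer.
const FRUSTRATION_VERBS = new Set([
  "hate", "sucks", "broken", "killing", "leaving",
  "switching", "alternative", "replace",
]);

function hasFrustrationHit(text, competitors, window = 6) {
  const tokens = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
  const compIdx = tokens.flatMap((t, i) => (competitors.has(t) ? [i] : []));
  const verbIdx = tokens.flatMap((t, i) => (FRUSTRATION_VERBS.has(t) ? [i] : []));
  return compIdx.some((c) => verbIdx.some((v) => Math.abs(c - v) <= window));
}
```

"HubSpot is killing us on pricing" hits (distance 2); "We use Mailchimp for newsletters" does not, because no frustration verb is present.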
Signal 3: Deadline language.
"Need to pick something by end of Q2." "Onboarding the team next week." "Trial ends Friday and I need a backup." Time pressure is one of the cleanest predictors of action.
Extraction: A small classifier looks for temporal phrases bound to a decision verb (need, have to, must, deciding, choosing). We extract the relative time horizon and convert it to a date.
Threshold: Deadline within 7 days: +15. Within 30 days: +10. "By end of year": +4.
Trap: "I'll get to this eventually" or "someday I want to" reads as deadline language to a naive classifier but is actually the opposite signal. We treat indefinite future markers as a -3 penalty.
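Once the temporal classifier has parsed the phrase, the rubric mapping is a few lines (the `{ days, indefinite }` shape is an assumed interface for the upstream extractor):

```javascript
// Maps an extracted time horizon to deadline points. Assumes an upstream
// classifier already parsed the phrase into { days, indefinite }.
function deadlinePoints({ days = null, indefinite = false }) {
  if (indefinite) return -3;   // "eventually", "someday" — anti-signal
  if (days == null) return 0;  // no deadline language found
  if (days <= 7) return 15;
  if (days <= 30) return 10;
  return 4;                    // the "by end of year" tier
}
```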
Signal 4: Tools-already-tried list.
The OP lists 2 or more tools they've evaluated and rejected. This is the single strongest indicator that they've done the work and are ready to commit.
Extraction: Look for list patterns ("tried X, Y, and Z", "evaluated X but it's missing Y", "X is too A, Y is too B") and validate that each item is a real tool in the category, not a generic noun.
Threshold: 2 tools tried: +10. 3+: +14.
Trap: People list tools they've heard of without trying them. "I've heard about Salesforce, HubSpot, and Pipedrive" is research mode, not evaluation mode. Listen for verbs of actual usage (used, tried, ran, tested) vs verbs of awareness (heard of, seen, considering).
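The usage-vs-awareness distinction can be gated with two verb patterns before any points are awarded (verb lists here are illustrative, not exhaustive):

```javascript
// Award tools-tried points only when the sentence uses verbs of actual
// usage and none of awareness. `toolCount` is the validated tool count.
const USAGE_VERBS = /\b(used|tried|ran|tested)\b/i;
const AWARENESS_VERBS = /\b(heard (?:of|about)|seen|considering)\b/i;

function toolsTriedPoints(sentence, toolCount) {
  if (AWARENESS_VERBS.test(sentence) || !USAGE_VERBS.test(sentence)) return 0;
  if (toolCount >= 3) return 14;
  if (toolCount === 2) return 10;
  return 0;
}
```

"Tried Asana, Trello, and ClickUp" scores; "I've heard about Salesforce, HubSpot, and Pipedrive" scores zero.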
Signal 5: First-time-asker on this topic.
We check the OP's recent comment and post history for prior asks in the same category. If they've been asking about CRMs for 18 months in r/sales, they're a serial researcher. If this is their first CRM post in 2 years, they're transitioning from "we don't need one" to "we need one now."
Extraction: Pull the last 200 posts/comments from the OP. Embed each one. Cluster against the current post's category. If the new post is the first cluster member in 90+ days, flag as first-time-asker.
Threshold: First-time-asker: +8. Serial researcher (3+ posts in category in 90 days): -5.
Trap: Some buyers research thoroughly before posting publicly. They might have lurked for months and finally posted once. The signal is directional, not absolute, which is why the weight is modest.
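The clustering step reduces to a similarity scan once embeddings exist. A sketch, assuming embeddings are computed upstream and `history` arrives as `[{ vec, daysAgo }]` (hypothetical shape; the 0.8 threshold is illustrative):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// First-time-asker: no history item in the window is close enough to the
// new post's embedding to count as the same category.
function isFirstTimeAsker(history, newPostVec, threshold = 0.8, windowDays = 90) {
  return !history.some(
    (h) => h.daysAgo <= windowDays && cosine(h.vec, newPostVec) >= threshold
  );
}
```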
Signal 6: Comment depth ratio.
The OP is engaging in the comments under their own post. They're replying to suggestions, asking follow-up questions, debating tradeoffs. That's a buyer doing the work in real time.
Extraction: After the post is 30 minutes old, count the OP's replies in their own thread. Divide by total comments. A ratio over 0.25 means the OP is highly engaged.
Threshold: Ratio >0.25: +10. >0.4: +14. Ratio of 0 after 2+ hours with 5+ comments existing: -8 (OP has ghosted).
Trap: Engagement can be defensive. Some OPs reply only to argue with everyone. We have a small classifier on the OP's reply tone — combative replies are weighted half.
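Putting the ratio, the ghost penalty, and the combative half-weight together (the `combativeShare` input is an assumed interface to the tone classifier):

```javascript
// Comment-depth scoring. `combativeShare` is the fraction of the OP's
// replies the tone classifier marks combative; those count at half weight.
function commentDepthPoints({ opReplies, totalComments, ageMinutes, combativeShare = 0 }) {
  if (ageMinutes < 30 || totalComments === 0) return 0; // too early to judge
  if (ageMinutes >= 120 && opReplies === 0 && totalComments >= 5) return -8; // ghosted
  const effectiveReplies = opReplies * (1 - combativeShare / 2);
  const ratio = effectiveReplies / totalComments;
  if (ratio > 0.4) return 14;
  if (ratio > 0.25) return 10;
  return 0;
}
```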
Signal 7: Weekday morning post timing.
Posts written between 7am and 11am Monday-Thursday in the OP's likely timezone correlate strongly with work-context posts. Posts at 11pm on Saturday correlate with hobby projects, side hustles, or venting.
Extraction: Read the post timestamp, infer timezone from the OP's posting history pattern, bucket into time-of-day-and-week.
Threshold: Weekday 7-11am local: +6. Friday 4pm to Sunday 11pm: -3.
Trap: Asynchronous teams and remote workers post whenever. We never use this signal in isolation; it's a tiebreaker between two otherwise-equal posts.
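The bucketing itself is simple once the timestamp has been shifted into the OP's inferred local time (timezone inference happens upstream and is the hard part):

```javascript
// Buckets a Date (already in the OP's inferred local time) into the
// timing score: weekday mornings positive, the weekend window negative.
function timingPoints(localDate) {
  const day = localDate.getDay();   // 0 = Sunday ... 6 = Saturday
  const hour = localDate.getHours();
  const weekdayMorning = day >= 1 && day <= 4 && hour >= 7 && hour < 11;
  const weekendWindow =
    (day === 5 && hour >= 16) || day === 6 || (day === 0 && hour < 23);
  if (weekdayMorning) return 6;
  if (weekendWindow) return -3;
  return 0;
}
```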
Signal 8: Specific use case description.
A high-intent post tells you exactly what they want to do. "We're a 6-person agency running 4 client retainers and need a way to track hours per client." A low-intent post asks an abstract category question. "What's the best time-tracking tool?"
Extraction: We score the post's specificity by counting concrete nouns (team size, industry, current workflow, integrations needed). An LLM pass scores 0-5 on specificity.
Threshold: Specificity score 4-5: +8. Score 0-1: -4.
Trap: Specificity can mean someone has rigid requirements that no tool meets. It's a positive signal but not a guarantee. Pair with the tools-tried signal to confirm they're actually buying, not designing the perfect tool in their head.
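A cheap pre-LLM heuristic for the concrete-noun count looks like this — the cue patterns are illustrative, and the production 0-5 score still comes from the LLM pass:

```javascript
// Counts concrete-detail cues (team size, industry, integrations,
// workload numbers) as a rough specificity proxy before the LLM scores it.
const SPECIFICITY_CUES = [
  /\b\d+[- ]person\b/i,                              // team size
  /\b(agency|saas|ecommerce|law firm|nonprofit)\b/i, // industry mention
  /\b(integrat\w*|api|webhook)\b/i,                  // integration needs
  /\b\d+\s+(clients?|retainers?|seats?|reps?)\b/i,   // workload numbers
];

function specificityHeuristic(text) {
  return SPECIFICITY_CUES.filter((re) => re.test(text)).length; // 0-4
}
```

The agency example above scores 3; "What's the best time-tracking tool?" scores 0.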
Signal 9: Account age and karma sanity.
A 4-day-old account with 12 karma asking about enterprise tools is a throwaway, a sock puppet, or a competitor doing recon. A 6-year-old account with 8K karma is a real human.
Extraction: Pull account creation date and total karma at scan time. Bucket into age cohorts.
Threshold: Account >2 years old, karma >500: +5. Account <30 days: -10. Account <7 days: -25 (effectively excluded).
Trap: New legitimate accounts exist. People create burner accounts to ask sensitive questions ("how do I leave HubSpot when our CTO loves it"). We flag young accounts but let the operator override. About 1 in 30 young-account posts converts.
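The age and karma buckets map straight to the rubric; the operator override lives outside this function:

```javascript
// Account-sanity buckets from the rubric. 730 days approximates the
// "older than 2 years" threshold.
function accountSanityPoints({ ageDays, karma }) {
  if (ageDays < 7) return -25;  // effectively excluded
  if (ageDays < 30) return -10;
  if (ageDays > 730 && karma > 500) return 5;
  return 0;
}
```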
Signal 10: Subreddit-specific weight.
Same post, different subreddit, different score. A "looking for a CRM" post in r/sales is worth roughly 1.4x the same post in r/Entrepreneur because the audience is more pre-qualified.
Extraction: Per-subreddit multiplier set during onboarding and tuned based on the operator's accept/reject feedback after 2 weeks.
Threshold: Multipliers range from 0.5x (r/smallbusiness — too horizontal) to 1.4x (r/sales for sales tools). r/SaaS sits at 1.0x as the baseline.
Trap: Multipliers can mask genuine signal in unloved subreddits. We re-evaluate them every 60 days against the actual close-rate data per subreddit.
How the scoring rolls up.
The ten signals plug into a 100-point base score, then get multiplied by the subreddit weight and modified by the LLM intent classifier verdict. The shape:
```javascript
const baseScore =
  budgetSignal +          // 0-20
  competitorFrustration + // 0-20
  deadlineSignal +        // 0-15
  toolsTried +            // 0-14
  firstTimeAsker +        // -5 to +8
  commentDepth +          // -8 to +14
  timing +                // -3 to +6
  specificity +           // -4 to +8
  accountSanity +         // -25 to +5
  intentClassBonus;       // 0-40 from LLM verdict

const finalScore = baseScore * subredditMultiplier;
```

A post over 65 lights up red in the dashboard. Over 80 triggers the immediate alert. The full pipeline this slots into is documented in the Reddit playbook, and the LLM classifier that produces the intent class bonus is broken down in our LLM classification piece.
Why we don't use upvotes, awards, or post length.
Three features we tested and dropped:
Upvotes. Correlates with entertainment value. Funny posts about HubSpot pricing rack up upvotes from people who will never buy a CRM.
Reddit awards. Mostly dead since the gold/silver economy changed. Even when they existed, they correlated with humor.
Post length. Naively, longer posts seemed more buyer-like. Turns out the longest posts are usually rants. The strongest predictor is specificity per word, not raw length.
We mention these because operators ask. They feel like signals; they aren't.
When this stops working.
The ten signals are tuned for English-language posts in B2B-ish subreddits. They degrade when:
- The subreddit is non-English (we have a separate Spanish/Portuguese signal pack for r/empreendedorismo and similar).
- The category is consumer (the budget signal weight changes — consumers don't post per-month numbers).
- The OP is using a brand-new account intentionally (some buyers segregate work and personal Reddit, and the account-age penalty is unfair to them).
We retune signal weights per cohort every 60 days based on close-rate data. The thresholds in this piece are starting points, not commandments.
The signal extraction connects directly into enrichment workflows — once a post scores high, the username goes into the enrichment pipeline to convert it into a verified work email.
FAQ
- Which signal is the most predictive of a closed deal?
- Competitor name plus a frustration verb is consistently our highest-precision single signal — about 71 percent in the last cohort we measured. The OP has already done the comparison work and disqualified the incumbent. Your job is just to show up before they re-Google.
- Why don't we use upvote count as a signal?
- Upvotes correlate with entertainment value, not buying intent. A funny rant about HubSpot pricing gets thousands of upvotes from people who will never buy a CRM. We tested it as a feature and dropped it — it added noise without lifting precision.
- How do we keep students and devs from polluting our buyer queue?
- We score account age, comment-to-post ratio, and whether the OP mentions a current employer or company name. A 3-month-old account with a 0.4 comment-to-post ratio asking about CRM features is almost always a student or dev evaluating, not a buyer. We weight those signals down 0.3x.
- Can we run these signals on subreddits Shadow Inbox doesn't cover by default?
- Yes — the signal extractors are subreddit-agnostic. The only thing that changes per subreddit is the score multiplier we apply on top, since r/SaaS posts skew toward real buyers and r/Entrepreneur posts skew toward wantrepreneurs. Add the subreddit, let signal volume accumulate for two weeks, then tune.
- Do these signals work on platforms other than Reddit?
- Most of them, yes. Budget mention, deadline language, and competitor frustration are channel-agnostic and we use them on HackerNews and Quora threads too. First-time-asker and comment depth ratio are Reddit-specific because they depend on Reddit's account and threading model.