📖 Vocabulary rarity analysis
Hapax legomena counter
Paste a passage to count the words that appear exactly once, measure the hapax ratio, type-token ratio, and vocabulary richness, and inspect the filtered list of rare words.
Load a realistic sample, then adjust token rules, stopword exclusion, minimum word length, and rare-word display settings.
This table updates with the once-used words that survive your selected stopword, length, case, hyphen, apostrophe, and number filters.
| # | Hapax word | Length | First position | Segment | Context |
|---|---|---|---|---|---|
| Load a preset or paste text to see rare words. | |||||
| Rank | Frequent word | Count | Token share | Repeat class |
|---|---|---|---|---|
| Frequency table appears after calculation. | ||||
| Hapax band | Hapax ratio | Typical reading | Editing signal |
|---|---|---|---|
| Lean | 0-28% | Vocabulary is reused heavily. | Check repeated phrasing or narrow topic terms. |
| Balanced | 29-45% | Common for clear prose samples. | Usually a stable mix of anchors and variety. |
| Rich | 46-62% | Many one-off terms add texture. | Confirm rare words are intentional and clear. |
| Sparse | 63%+ | Very many single-use terms. | Compare against length, genre, and filtering. |
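One way to apply the band table programmatically is sketched below in Python. The table gives whole-number ranges, so treating each band as a half-open interval (for example, Lean covers everything below 29%) is an assumption of this sketch, and `hapax_band` is a hypothetical name.

```python
def hapax_band(ratio_percent: float) -> str:
    """Map a hapax ratio (in percent) to the hapax band table.

    Assumption: the whole-number ranges 0-28, 29-45, 46-62 and 63+ are read
    as half-open intervals, so a fractional value such as 28.5 is still "Lean".
    """
    if ratio_percent < 0:
        raise ValueError("hapax ratio cannot be negative")
    if ratio_percent < 29:
        return "Lean"
    if ratio_percent < 46:
        return "Balanced"
    if ratio_percent < 63:
        return "Rich"
    return "Sparse"


for value in (12.0, 28.5, 38.0, 50.0, 70.0):
    print(value, hapax_band(value))
```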
| Text type | Expected TTR pattern | Hapax behavior | Filter suggestion |
|---|---|---|---|
| Short excerpt | Often high because sample is small. | Can look inflated. | Compare equal word counts. |
| Fiction scene | Names and setting words affect the count. | Proper nouns may dominate. | Add names as custom stopwords if needed. |
| Academic abstract | Technical terms repeat by design. | Lower hapax can be normal. | Keep key terms unless testing style only. |
| Poetry or notes | Compact language can raise TTR. | Rare words may be structural. | Review context before cutting. |
| Control | What it changes | Best when | Risk if changed mid-project |
|---|---|---|---|
| Minimum length | Removes short tokens before counting. | Ignoring tiny function words. | Ratios shift sharply in short text. |
| Stopword set | Excludes common or editorial terms. | Comparing content vocabulary. | May hide repeated structure words. |
| Case handling | Merges or separates capital variants. | Checking proper nouns or acronyms. | Preserve mode can overcount variants. |
| Hyphen mode | Treats compounds as one or many. | Technical or compound-heavy prose. | May change both the token count (N) and the distinct-word count (V). |
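To make the control table concrete, here is a Python sketch of a tokenizer whose flags mirror the case, hyphen, apostrophe, and number controls. The function and flag names are hypothetical, not the tool's own settings, and the regular expression is one simple choice among many.

```python
import re


def tokenize(text: str, fold_case: bool = True, split_hyphens: bool = False,
             keep_apostrophes: bool = True, keep_numbers: bool = False) -> list[str]:
    """Hypothetical tokenizer showing how each control changes N and V."""
    # Normalize curly apostrophes to straight ones before matching.
    text = text.replace("\u2019", "'")
    if split_hyphens:
        # Hyphen mode "many": "well-known" becomes "well" and "known".
        text = text.replace("-", " ")
    if not keep_apostrophes:
        # Apostrophe off: "don't" splits into "don" and "t".
        text = text.replace("'", " ")
    # Letter/digit runs, optionally joined by internal apostrophes or hyphens.
    tokens = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    if not keep_numbers:
        # Number filter: drop any token containing a digit.
        tokens = [t for t in tokens if not any(c.isdigit() for c in t)]
    if fold_case:
        # Case merge: "Paris" and "paris" become one type.
        tokens = [t.lower() for t in tokens]
    return tokens


sample = "A well-known Paris cafe; paris in 2024."
for flags in ({}, {"split_hyphens": True}, {"fold_case": False}):
    toks = tokenize(sample, **flags)
    print(flags, "N =", len(toks), "V =", len(set(toks)))
```

On that sample, the defaults give 6 tokens and 5 types; splitting hyphens raises both N and V by one, and preserving case adds a type because "Paris" and "paris" no longer merge.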
In other words, this is a counter for the words used exactly once in a passage of writing, which are known as hapax legomena. It lets you spot rare-word patterns and gauge how rich the vocabulary is. It also reports the type-token ratio, the hapax ratio, and the effect of excluding stopwords.
Basically, these are all the words that show up exactly once in whatever text you’re looking at. If you’ve ever read something where you recognized every word yet still weren’t quite sure what happened, the balance of those single-use words may be part of the reason. The tool on this page makes that effect easy to see.
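A minimal Python sketch of the core counts follows. It assumes a simple tokenizer (lowercased letter runs, keeping internal apostrophes) and defines the hapax ratio as hapaxes divided by distinct words (V). The page does not state its tokenizer or denominator, so both are assumptions, and `hapax_stats` is a hypothetical name.

```python
import re
from collections import Counter


def hapax_stats(text: str) -> dict:
    """Count hapax legomena and report TTR and hapax ratio (hypothetical helper)."""
    # Normalize curly apostrophes so "don’t" and "don't" count as the same word.
    text = text.replace("\u2019", "'").lower()
    # Assumed tokenizer: runs of letters, keeping internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    counts = Counter(tokens)
    # Counter keeps first-appearance order, so hapaxes come out in text order.
    hapaxes = [word for word, n in counts.items() if n == 1]
    n_tokens = len(tokens)  # N: running words
    n_types = len(counts)   # V: distinct words
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapaxes": hapaxes,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        # Assumption: hapax ratio = hapaxes / V (some tools divide by N instead).
        "hapax_ratio": len(hapaxes) / n_types if n_types else 0.0,
    }


stats = hapax_stats("The cat sat on the mat.")
print(stats["hapaxes"], round(stats["ttr"], 2), stats["hapax_ratio"])
```

Here "the" appears twice, so it is the only word missing from the hapax list; every other word is a hapax.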
What Are Words Used Only Once?
A word that appears only once in your text is worth a closer look: it may signal inventiveness or precision, or it may show that the author isn’t repeating helpful anchor words. Depending on the counting rules you choose, the single-use list may be dominated by ordinary vocabulary or by names and proper nouns. Ordinary vocabulary tends to reflect the structure of the writing; names and proper nouns tend to reflect its decorative surface.
Whether you include stopwords makes a huge difference. Leave in every “and” and “the” and your ratios stay small, because those heavily repeated words pad the totals. Remove them and you’ll see how much the content words drive the result: a passage that looked balanced may suddenly appear full of one-off character names or technical terms. Run both settings and consider which one best serves your needs.
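To see the stopword effect for yourself, here is a small Python sketch that computes the hapax ratio with and without a stopword list. The ten-word `STOPWORDS` set is a hypothetical toy list (real tools ship much longer ones), and the ratio is assumed to be hapaxes over distinct words (V).

```python
import re
from collections import Counter

# Hypothetical toy stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "was", "is"}


def hapax_ratio(text: str, drop_stopwords: bool) -> float:
    """Hapaxes divided by distinct words (V), optionally after removing stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return hapaxes / len(counts)


text = "The fox and the dog ran to the river and the fox swam."
print(hapax_ratio(text, drop_stopwords=False), hapax_ratio(text, drop_stopwords=True))
```

For that sentence, the ratio is 5/8 with stopwords kept and 4/5 with them removed. “to” was itself a hapax and disappears, but the repeated “the” and “and” leave V as well, so the ratio still rises.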
Length filters work the same way. Short words are more likely to be glue words than meaningful ones, so raising the minimum length usually raises the hapax ratio by concentrating on substantive vocabulary. Set it too high, though, and you may throw out useful short verbs. There is no magic number; what matters most is using the same setting for every passage you compare.
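The minimum-length filter can be sketched the same way. This Python function (a hypothetical helper, again assuming hapaxes over distinct words) drops tokens shorter than `min_len` before counting.

```python
import re
from collections import Counter


def hapax_ratio_min_len(text: str, min_len: int) -> float:
    """Hapax ratio over distinct words after removing tokens shorter than min_len."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= min_len]
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return hapaxes / len(counts)


text = "It is a cold day and it is a grey day, so we sip cocoa."
for min_len in (1, 3, 4):
    print(min_len, round(hapax_ratio_min_len(text, min_len), 3))
```

On that sentence, the ratio climbs from 7/11 at `min_len=1` to 5/6 at 3 and 1.0 at 4. At 4, though, the short verb “sip” is gone, which is exactly the trade-off of setting the threshold too high.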
To get useful data, compare two passages of similar length that have been filtered the same way. A high number of unique words makes writers anxious: Am I losing my reader? Is that sometimes true? Yes. More often, though, the topic simply demands variation (a review that distinguishes subtle shades of meaning; a poem listing varieties of rain).
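A simple way to compare passages fairly is to cut both to the same number of tokens before computing the type-token ratio. The Python sketch below truncates to the shorter passage’s length; the function names are hypothetical, and truncation is only one possible approach.

```python
import re


def _tokens(text: str) -> list[str]:
    # Assumed tokenizer: lowercased runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def ttr_at_length(text: str, n_tokens: int) -> float:
    """Type-token ratio of the first n_tokens tokens of text."""
    tokens = _tokens(text)
    if n_tokens <= 0 or len(tokens) < n_tokens:
        raise ValueError(f"need {n_tokens} tokens, text has {len(tokens)}")
    window = tokens[:n_tokens]
    return len(set(window)) / n_tokens


def compare_passages(a: str, b: str) -> tuple[float, float]:
    """Truncate both passages to the shorter one's token count, then compare TTRs."""
    n = min(len(_tokens(a)), len(_tokens(b)))
    return ttr_at_length(a, n), ttr_at_length(b, n)


short = "The cat sat."
long = "The cat sat on the mat and the cat slept."
print(ttr_at_length(long, 10), compare_passages(short, long))
```

On its own, the longer passage scores 0.7 against the short one’s 1.0; cut to the same three tokens, both score 1.0. For longer texts, averaging the TTR over consecutive fixed-size segments is a common alternative to simple truncation.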
The type-token ratio, hapax share, and repetition balance combine into the richness score, which tells you whether your draft leans toward repetition or variety. Strong writing usually keeps a few thematic anchors while letting some one-off terms in, and that mix gives the reader a sense of rhythm. The frequency contrast table shows this by lining up your most repeated words next to your single-use words, so you can see how much repetition there is relative to rarity.
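The page doesn’t say how it weights those three measures into the richness score, so this Python sketch covers only the frequency contrast: the most repeated words with their token share (count divided by total tokens), set beside the single-use words. The function name and the `top` cutoff are assumptions for illustration.

```python
import re
from collections import Counter


def frequency_contrast(text: str, top: int = 5):
    """Return (repeated, hapaxes): top repeated words with share, plus single-use words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Most repeated words (count > 1) with their token share.
    repeated = [(w, n, n / total) for w, n in counts.most_common() if n > 1][:top]
    # Single-use words in order of first appearance.
    hapaxes = [w for w, n in counts.items() if n == 1]
    return repeated, hapaxes


rep, hap = frequency_contrast("The fox saw the dog and the fox ran.")
for word, count, share in rep:
    print(f"{word:<6} {count:>2} {share:.0%}")
print("single-use:", ", ".join(hap))
```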
Numbers don’t represent taste; they’re merely a way to see your options more clearly. A word that occurs once could be an exact fit, or it could be a missed chance to repeat a word you used earlier. Either way, the decision is still yours. If your draft feels overly ornamented, or just flat, run it through the counter. Perhaps what you called “repetition” was really “nervous circling.” Or maybe what you thought was “variety” was mostly weather words and proper nouns.
Who cares about the numbers? What matters is the clearer perspective you gain when you look back at your sentences: the words you should keep because they serve your purpose, and the ones you shouldn’t.

