Conversion options
Pass these options as properties of the GukhanmunOptions object given to load().
Preset
preset selects a preconfigured set of defaults:
Unlike the Rust crate, the JavaScript packages never include the bundled
dictionary automatically; always pass it via dictionaries.
Segmentation strategy
segmentation controls how Gukhanmun finds word boundaries within hanja runs:
"lattice" (default): evaluates all dictionary matches at every position and selects the globally optimal segmentation using dynamic programming. Most accurate, especially for compound words and ambiguous boundaries.
"eager": greedy left-to-right longest-match. Faster, but may mis-segment compound words.
Prefer "eager" only when throughput matters more than accuracy.
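
The difference between the two strategies can be illustrated with a toy model. The following is a minimal sketch, not Gukhanmun's actual implementation: the four-word dictionary, the cost values, and the function names eagerSegment and latticeSegment are all assumptions made up for this example.

```typescript
// Toy dictionary for illustration only; NOT the bundled Gukhanmun dictionary.
const words = ["大學", "大學生", "學生", "生活"];
const dictionary = new Set(words);
const maxLen = Math.max(...words.map((w) => Array.from(w).length));

// Greedy left-to-right longest match.
function eagerSegment(text: string): string[] {
  const chars = Array.from(text);
  const out: string[] = [];
  let i = 0;
  while (i < chars.length) {
    let matched = 1; // fall back to a single character
    for (let len = Math.min(maxLen, chars.length - i); len > 1; len--) {
      if (dictionary.has(chars.slice(i, i + len).join(""))) {
        matched = len;
        break;
      }
    }
    out.push(chars.slice(i, i + matched).join(""));
    i += matched;
  }
  return out;
}

// Dynamic programming over all dictionary matches: pick the path with the
// lowest total cost. The costs are arbitrary (hypothetical) values that
// penalize characters not covered by a dictionary word.
function latticeSegment(text: string): string[] {
  const WORD_COST = 1;
  const UNKNOWN_COST = 10;
  const chars = Array.from(text);
  const n = chars.length;
  const best = new Array<number>(n + 1).fill(Infinity);
  const prev = new Array<number>(n + 1).fill(-1);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    for (let len = 1; len <= Math.min(maxLen, end); len++) {
      const start = end - len;
      const known = dictionary.has(chars.slice(start, end).join(""));
      if (!known && len > 1) continue; // unknown spans are single characters
      const cost = best[start] + (known ? WORD_COST : UNKNOWN_COST);
      if (cost < best[end]) {
        best[end] = cost;
        prev[end] = start;
      }
    }
  }
  // Walk the back-pointers from the end to recover the segmentation.
  const out: string[] = [];
  for (let end = n; end > 0; end = prev[end]) {
    out.unshift(chars.slice(prev[end], end).join(""));
  }
  return out;
}

console.log(eagerSegment("大學生活"));   // ["大學生", "活"]
console.log(latticeSegment("大學生活")); // ["大學", "生活"]
```

In this toy case the greedy matcher consumes 大學生 first and strands 活, while the lattice approach finds the two-word reading 大學 + 生活.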
Numeral handling
numerals controls how hanja numeral characters such as 二〇一六 are rendered.
Chinese-style numerals can represent numbers in multiple ways depending on
whether they encode positions or quantities:
"smart" chooses positional notation for year-like four-digit sequences and
additive notation for quantities; it is a good default for general-purpose
documents.
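
To make the positional/additive distinction concrete, here is a rough sketch that computes the numeric value of a numeral string under each interpretation. It is an illustration only and does not reflect how Gukhanmun is implemented: the function names, the restriction to values below 10,000, and the "exactly four digits" heuristic in smartValue are assumptions for this example.

```typescript
const DIGITS = new Map<string, number>([
  ["〇", 0], ["零", 0], ["一", 1], ["二", 2], ["三", 3], ["四", 4],
  ["五", 5], ["六", 6], ["七", 7], ["八", 8], ["九", 9],
]);
const UNITS = new Map<string, number>([["十", 10], ["百", 100], ["千", 1000]]);

// Positional: every character is one decimal digit (二〇一六 -> 2016).
function positionalValue(numeral: string): number | null {
  let value = 0;
  for (const ch of Array.from(numeral)) {
    const digit = DIGITS.get(ch);
    if (digit === undefined) return null;
    value = value * 10 + digit;
  }
  return value;
}

// Additive: digits multiply the unit that follows them (二千十六 -> 2016).
// Simplified: handles only 十, 百, 千, so values stay below 10,000.
function additiveValue(numeral: string): number | null {
  let total = 0;
  let pending = 0; // digit waiting for its unit
  for (const ch of Array.from(numeral)) {
    const digit = DIGITS.get(ch);
    const unit = UNITS.get(ch);
    if (digit !== undefined) {
      pending = digit;
    } else if (unit !== undefined) {
      total += (pending === 0 ? 1 : pending) * unit; // bare 十 means 1 × 10
      pending = 0;
    } else {
      return null;
    }
  }
  return total + pending;
}

// Crude stand-in for a "smart" choice (an assumption, not the real rule):
// exactly four digit characters with no units are treated as year-like.
function smartValue(numeral: string): number | null {
  const chars = Array.from(numeral);
  const yearLike = chars.length === 4 && chars.every((ch) => DIGITS.has(ch));
  return yearLike ? positionalValue(numeral) : additiveValue(numeral);
}

console.log(smartValue("二〇一六")); // 2016 (positional)
console.log(smartValue("二千十六")); // 2016 (additive)
console.log(smartValue("三百五十")); // 350 (additive)
```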
Initial sound law
The initial sound law (頭音法則) is a South Korean phonological rule that changes or drops an initial ㄹ or ㄴ at the start of a word (for example, 老人 is read 노인 rather than 로인). The rule applies to fallback readings for characters not found in any dictionary; dictionary entries already encode their correct readings.
Disable it for North Korean orthography ("ko-kp" preset) or when processing
text that follows North Korean spelling conventions.
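
For readers unfamiliar with the rule, the following sketch applies the basic word-initial substitutions to the first syllable of a hangul reading. It is a simplified illustration, not Gukhanmun's code: the function name applyInitialSoundLaw is hypothetical, and the sketch ignores the rule's exceptions and non-initial special cases.

```typescript
const HANGUL_BASE = 0xac00; // first precomposed syllable, 가
const SYLLABLE_COUNT = 11172;
const NIEUN = 2;  // index of initial ㄴ
const RIEUL = 5;  // index of initial ㄹ
const IEUNG = 11; // index of initial ㅇ (silent)
// Medial vowel indices for ㅑ, ㅕ, ㅖ, ㅛ, ㅠ, ㅣ (ㅣ or a y-glide vowel).
const Y_OR_I = new Set([2, 6, 7, 12, 17, 20]);

function applyInitialSoundLaw(reading: string): string {
  if (reading.length === 0) return reading;
  const code = reading.charCodeAt(0) - HANGUL_BASE;
  if (code < 0 || code >= SYLLABLE_COUNT) return reading; // not a hangul syllable

  // Decompose the syllable: code = (initial * 21 + medial) * 28 + final.
  const initial = Math.floor(code / (21 * 28));
  const medial = Math.floor((code % (21 * 28)) / 28);
  const final = code % 28;

  let newInitial = initial;
  if (initial === RIEUL) {
    // ㄹ drops before ㅣ or a y-glide vowel, otherwise becomes ㄴ.
    newInitial = Y_OR_I.has(medial) ? IEUNG : NIEUN;
  } else if (initial === NIEUN && Y_OR_I.has(medial)) {
    // ㄴ drops before ㅣ or a y-glide vowel.
    newInitial = IEUNG;
  }

  const first = String.fromCharCode(
    HANGUL_BASE + (newInitial * 21 + medial) * 28 + final,
  );
  return first + reading.slice(1);
}

console.log(applyInitialSoundLaw("로인")); // 노인 (老人)
console.log(applyInitialSoundLaw("녀자")); // 여자 (女子)
console.log(applyInitialSoundLaw("량심")); // 양심 (良心)
```

With the law disabled, the unmodified readings (로인, 녀자, 량심) correspond to North Korean spelling.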
Homophone disambiguation window
Different hanja words can share the same hangul reading (for example, 連霸 and
連敗 are both 연패). In "hangul-only" rendering mode, Gukhanmun can keep the
hanja in parentheses for such words so readers can tell them apart.
homophoneWindow sets the scope across which a reading is considered ambiguous:
Wider windows are appropriate for dense hanja texts where readings recur across many sections.
Homophone detection strategy
homophoneDetection chooses which readings count as ambiguous within the
window:
"context-local" keeps hangul-only output clean: a word is glossed only when
the surrounding text genuinely makes it ambiguous. "dictionary-wide" is
broader, but with a large reference dictionary such as the Standard Korean
Dictionary nearly every common reading has some homophone, so it glosses most
Sino-Korean words. To always gloss a specific word regardless of context, use
a requireHanja directive instead (see User directives).
Only recognized words are disambiguated
Homophone disambiguation operates on words the dictionary recognizes as units.
A hanja sequence with no dictionary entry of its own is not treated as a single
word, and its fallback (non-dictionary) characters are never glossed; any
recognized single-character entries inside it (such as 紫) are still handled
on their own. For example, with the Standard Korean Dictionary loaded, 自由
and 子游 are both entries read 자유, so 自由와 子游 renders as
자유(自由)와 자유(子游); but 紫楡 has no entry of its own, so under the
default context-local strategy 自由와 紫楡 renders as 자유와 자유 with no
gloss, because the engine never sees a second 자유 unit to collide with
自由. To disambiguate the whole term, add it to a
custom dictionary so the engine treats it as a single unit.
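
The behavior described above can be modeled in a few lines. This is a conceptual sketch under stated assumptions, not the library's implementation: the Token shape and the renderContextLocal function are hypothetical, the input is assumed to be already segmented, and the whole token list is treated as a single window.

```typescript
interface Token {
  text: string;        // original text (hanja word or other text)
  reading: string;     // hangul reading, or the text itself for non-hanja
  recognized: boolean; // true only for units found in a dictionary
}

function renderContextLocal(tokens: Token[]): string {
  // Collect, per reading, the distinct recognized hanja spellings in the window.
  const spellings = new Map<string, Set<string>>();
  for (const t of tokens) {
    if (!t.recognized) continue; // fallback characters never participate
    const set = spellings.get(t.reading) ?? new Set<string>();
    set.add(t.text);
    spellings.set(t.reading, set);
  }
  // Gloss a recognized word only if another spelling shares its reading.
  return tokens
    .map((t) => {
      const ambiguous =
        t.recognized && (spellings.get(t.reading)?.size ?? 0) > 1;
      return ambiguous ? `${t.reading}(${t.text})` : t.reading;
    })
    .join("");
}

// 自由 and 子游 are both recognized units read 자유, so both are glossed.
console.log(renderContextLocal([
  { text: "自由", reading: "자유", recognized: true },
  { text: "와 ", reading: "와 ", recognized: false },
  { text: "子游", reading: "자유", recognized: true },
])); // 자유(自由)와 자유(子游)

// 紫楡 is not a unit: 紫 is recognized on its own, 楡 is a fallback reading.
console.log(renderContextLocal([
  { text: "自由", reading: "자유", recognized: true },
  { text: "와 ", reading: "와 ", recognized: false },
  { text: "紫", reading: "자", recognized: true },
  { text: "楡", reading: "유", recognized: false },
])); // 자유와 자유
```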
First-occurrence clearing window
When enabled, first-occurrence clearing stops annotating a hanja after its first occurrence within the window. This is useful for documents that introduce each character once and then use it freely; subsequent occurrences are left as plain hangul without parenthetical hanja.
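
As a rough illustration of the idea, the sketch below keeps the parenthetical hanja on the first occurrence and drops it afterwards. It is an assumption-laden model, not the library's code: the Word shape and clearAfterFirst function are hypothetical, and the whole input array stands in for one window.

```typescript
interface Word {
  hanja: string;
  reading: string;
}

// Annotate each hanja word only the first time it appears in the window.
function clearAfterFirst(words: Word[]): string[] {
  const seen = new Set<string>();
  return words.map((w) => {
    if (seen.has(w.hanja)) return w.reading; // later occurrence: plain hangul
    seen.add(w.hanja);
    return `${w.reading}(${w.hanja})`;       // first occurrence: keep hanja
  });
}

console.log(clearAfterFirst([
  { hanja: "歷史", reading: "역사" },
  { hanja: "文化", reading: "문화" },
  { hanja: "歷史", reading: "역사" },
])); // ["역사(歷史)", "문화(文化)", "역사"]
```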
Error recovery
recovery controls what happens when the HTML parser encounters markup it
cannot interpret. It has no effect for plain text or Markdown input.
Use "lenient" when processing HTML from external sources that may contain
fragments or non-standard markup; it skips problematic parts rather than
throwing a GukhanmunError.