Conversion options
All options are set on Builder before calling .build().
Preset
Builder::with_preset(preset) configures a coherent set of defaults:
Individual options below override the preset.
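One way to picture this layering is a builder that remembers the preset, records explicit settings separately, and resolves them at build time. This toy is hypothetical: OptionsBuilder, the Fast and Accurate presets, and the option values they select are invented for illustration and are not Gukhanmun's API. Only the precedence rule (an explicit option beats the preset default) comes from the text above.

```rust
// Hypothetical toy, not Gukhanmun's API: preset names and their defaults are invented.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Segmentation {
    Lattice,
    Eager,
}

#[derive(Clone, Copy, Debug)]
enum Preset {
    Accurate,
    Fast,
}

#[derive(Debug, PartialEq)]
struct Options {
    segmentation: Segmentation,
    initial_sound_law: bool,
}

impl Preset {
    /// The coherent set of defaults each (hypothetical) preset selects.
    fn defaults(self) -> Options {
        match self {
            Preset::Accurate => Options { segmentation: Segmentation::Lattice, initial_sound_law: true },
            Preset::Fast => Options { segmentation: Segmentation::Eager, initial_sound_law: true },
        }
    }
}

struct OptionsBuilder {
    preset: Preset,
    // `None` means "not set explicitly; fall back to the preset".
    segmentation: Option<Segmentation>,
    initial_sound_law: Option<bool>,
}

impl OptionsBuilder {
    fn with_preset(preset: Preset) -> Self {
        Self { preset, segmentation: None, initial_sound_law: None }
    }

    fn segmentation(mut self, s: Segmentation) -> Self {
        self.segmentation = Some(s);
        self
    }

    fn initial_sound_law(mut self, on: bool) -> Self {
        self.initial_sound_law = Some(on);
        self
    }

    /// Explicit settings win; anything left unset comes from the preset.
    fn build(self) -> Options {
        let base = self.preset.defaults();
        Options {
            segmentation: self.segmentation.unwrap_or(base.segmentation),
            initial_sound_law: self.initial_sound_law.unwrap_or(base.initial_sound_law),
        }
    }
}
```

Storing overrides as Option values keeps the preset and the explicit choices distinct until build(), so an explicitly set option is never silently replaced by a preset default.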
Segmentation strategy
Lattice finds the globally optimal segmentation using dynamic programming.
Eager performs greedy left-to-right longest matching; it is faster but less
accurate for compound words.
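To make the difference concrete, here is a self-contained sketch of both strategies over a toy dictionary. The cost model (1 per dictionary word, 10 per unknown single character), the max_len parameter, and the functions eager and lattice are assumptions for illustration; Gukhanmun's actual scoring is not described here.

```rust
use std::collections::HashSet;

// Illustrative costs (assumptions, not Gukhanmun's scoring): a dictionary
// word costs 1, an unknown single character (fallback reading) costs 10.
const WORD_COST: u32 = 1;
const FALLBACK_COST: u32 = 10;

/// Greedy left-to-right longest match, in the spirit of `Eager`.
fn eager(text: &str, dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        // Take the longest dictionary word starting at i, else one character.
        let mut taken = 1;
        for len in (1..=max_len.min(chars.len() - i)).rev() {
            let cand: String = chars[i..i + len].iter().collect();
            if dict.contains(cand.as_str()) {
                taken = len;
                break;
            }
        }
        out.push(chars[i..i + taken].iter().collect());
        i += taken;
    }
    out
}

/// Minimum-cost segmentation by dynamic programming, in the spirit of `Lattice`.
fn lattice(text: &str, dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[j] = (lowest cost of segmenting chars[..j], start of its last segment)
    let mut best: Vec<(u32, usize)> = vec![(u32::MAX, 0); n + 1];
    best[0] = (0, 0);
    for j in 1..=n {
        for len in 1..=max_len.min(j) {
            let i = j - len;
            let cand: String = chars[i..j].iter().collect();
            let cost = if dict.contains(cand.as_str()) {
                WORD_COST
            } else if len == 1 {
                FALLBACK_COST
            } else {
                continue; // multi-character spans must be dictionary words
            };
            let total = best[i].0 + cost;
            if total < best[j].0 {
                best[j] = (total, i);
            }
        }
    }
    // Follow the back-pointers from the end to recover the segments.
    let mut out: Vec<String> = Vec::new();
    let mut j = n;
    while j > 0 {
        let i = best[j].1;
        out.push(chars[i..j].iter().collect());
        j = i;
    }
    out.reverse();
    out
}
```

With a dictionary containing 大學生, 大學, and 生活, greedy matching on 大學生活 commits to 大學生 and strands 活 as an unknown character, whereas the lattice search prefers 大學 + 生活 because two dictionary words cost less than one word plus a fallback.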
Numeral handling
NumeralStrategy controls how hanja numeral characters such as 二〇一六 are
rendered. Chinese-style numerals can represent numbers in positional or
additive notation depending on context:
Smart chooses positional notation for year-like four-digit sequences and
additive notation for quantities; use it for general-purpose documents.
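The two notations differ in how characters combine. Positional notation reads each character as one decimal digit (二〇一六 is 2, 0, 1, 6, that is, 2016), while additive notation multiplies a digit by the unit that follows it and sums the products (二千十六 is 2 × 1000 + 10 + 6). The sketch below parses both into integers. It is not NumeralStrategy's rendering logic, it only handles values below 10,000 (no 萬), and its smart function is a hypothetical approximation of the year-like heuristic described above.

```rust
/// Value of a single hanja digit; 〇 and 零 are both zero.
fn digit(c: char) -> Option<u64> {
    match c {
        '〇' | '零' => Some(0),
        '一' => Some(1),
        '二' => Some(2),
        '三' => Some(3),
        '四' => Some(4),
        '五' => Some(5),
        '六' => Some(6),
        '七' => Some(7),
        '八' => Some(8),
        '九' => Some(9),
        _ => None,
    }
}

/// Multiplier of a unit character (萬 and larger are omitted in this sketch).
fn unit(c: char) -> Option<u64> {
    match c {
        '十' => Some(10),
        '百' => Some(100),
        '千' => Some(1000),
        _ => None,
    }
}

/// Positional: every character is one decimal digit (二〇一六 -> 2016).
fn positional(s: &str) -> Option<u64> {
    s.chars().try_fold(0u64, |acc, c| digit(c).map(|d| acc * 10 + d))
}

/// Additive: a digit multiplies the unit after it and the products are
/// summed (二千十六 -> 2016). Zeros act only as placeholders.
fn additive(s: &str) -> Option<u64> {
    let mut total = 0u64;
    let mut pending: Option<u64> = None;
    for c in s.chars() {
        if let Some(d) = digit(c) {
            if d == 0 {
                continue; // as in 二千零六, the zero only marks skipped places
            }
            if pending.is_some() {
                return None; // two digits in a row is not additive notation
            }
            pending = Some(d);
        } else if let Some(u) = unit(c) {
            total += pending.take().unwrap_or(1) * u; // a bare 十 means 1 × 10
        } else {
            return None;
        }
    }
    Some(total + pending.unwrap_or(0))
}

/// Hypothetical approximation of `Smart`: four bare digits read as a year.
fn smart(s: &str) -> Option<u64> {
    let is_year_like = s.chars().count() == 4 && s.chars().all(|c| digit(c).is_some());
    if is_year_like {
        positional(s)
    } else {
        additive(s)
    }
}
```

Note that additive rejects 二〇一六: a run of bare digits only makes sense positionally, which is why a year-like heuristic can pick the notation without guessing.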
Initial sound law
Applies the South Korean phonetic rule (頭音法則) to fallback readings for characters not found in any dictionary:
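The core rewrites of the rule (Hangul Orthography, articles 10 to 12) are mechanical on precomposed Hangul syllables: a word-initial ㄹ becomes silent ㅇ before ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ and ㄴ before any other vowel, and a word-initial ㄴ becomes ㅇ before ㅕ ㅛ ㅠ ㅣ. The sketch below applies those rewrites to the first syllable of a string using Unicode syllable arithmetic. The function name is hypothetical, the orthography's listed exceptions are not handled, and deciding where a word begins is left to the engine.

```rust
// Hangul syllable arithmetic (Unicode): code = 0xAC00 + (L * 21 + V) * 28 + T.
const SYLLABLE_FIRST: u32 = 0xAC00;
const SYLLABLE_LAST: u32 = 0xD7A3;
const VOWELS: u32 = 21;
const FINALS: u32 = 28;
// Indices of initial consonants in the Unicode ordering.
const NIEUN: u32 = 2; // ㄴ
const RIEUL: u32 = 5; // ㄹ
const IEUNG: u32 = 11; // ㅇ (silent)

/// Rewrites the first syllable of `word` under the core South Korean
/// initial sound law; the rest of the word is copied unchanged.
fn apply_initial_sound_law(word: &str) -> String {
    let mut chars = word.chars();
    let Some(first) = chars.next() else {
        return String::new();
    };
    let code = first as u32;
    if !(SYLLABLE_FIRST..=SYLLABLE_LAST).contains(&code) {
        return word.to_string(); // not a precomposed Hangul syllable
    }
    let offset = code - SYLLABLE_FIRST;
    let initial = offset / (VOWELS * FINALS);
    let vowel = (offset / FINALS) % VOWELS;
    let last = offset % FINALS;
    // Vowel indices: ㅑ=2, ㅕ=6, ㅖ=7, ㅛ=12, ㅠ=17, ㅣ=20
    let new_initial = match initial {
        RIEUL if matches!(vowel, 2 | 6 | 7 | 12 | 17 | 20) => IEUNG, // 력 -> 역
        RIEUL => NIEUN,                                              // 로 -> 노
        NIEUN if matches!(vowel, 6 | 12 | 17 | 20) => IEUNG,         // 녀 -> 여
        other => other,
    };
    let rebuilt = SYLLABLE_FIRST + (new_initial * VOWELS + vowel) * FINALS + last;
    let mut out = String::with_capacity(word.len());
    out.push(char::from_u32(rebuilt).expect("result stays in the Hangul syllable block"));
    out.extend(chars);
    out
}
```

For example, fallback readings 력사, 로인, and 녀자 would become 역사, 노인, and 여자, while a final consonant is preserved (락원 becomes 낙원).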
Homophone disambiguation window
Different hanja words can share the same hangul reading (for example, 連霸 and
連敗 are both 연패). In RenderMode::HangulOnly, Gukhanmun can keep the hanja
in parentheses for such words so readers can tell them apart.
homophone_window sets the scope across which a reading is considered
ambiguous:
Wider windows are appropriate for dense hanja texts where readings recur across many sections.
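A rough model of the window: group recognized words into scopes and flag a word when the same reading appears with a different hanja form in the same scope. The Word struct, the use of a paragraph index as the scope, and the document_wide flag are hypothetical stand-ins for homophone_window's real settings; the collision test itself follows the context-local idea described under Homophone detection strategy.

```rust
use std::collections::{HashMap, HashSet};

/// A recognized word with its reading, hanja form, and the scope it sits in.
/// Using a paragraph index as the scope is an assumption of this sketch.
struct Word<'a> {
    reading: &'a str,
    hanja: &'a str,
    scope: usize,
}

/// For each word, reports whether a different hanja form with the same
/// reading occurs in the same scope. `document_wide` merges all scopes,
/// modelling the widest possible window.
fn ambiguous_in_window(words: &[Word<'_>], document_wide: bool) -> Vec<bool> {
    let scope_of = |w: &Word| -> usize { if document_wide { 0 } else { w.scope } };
    // (scope, reading) -> distinct hanja forms seen with that reading there
    let mut forms: HashMap<(usize, &str), HashSet<&str>> = HashMap::new();
    for w in words {
        forms.entry((scope_of(w), w.reading)).or_default().insert(w.hanja);
    }
    words
        .iter()
        .map(|w| forms[&(scope_of(w), w.reading)].len() > 1)
        .collect()
}
```

With 連霸 and 連敗 (both 연패) in different paragraphs, a paragraph-sized window sees no collision, while a document-wide window flags both.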
Homophone detection strategy
homophone_detection chooses which readings count as ambiguous within the
window:
ContextLocal keeps hangul-only output clean: a word is glossed only when the
surrounding text genuinely makes it ambiguous. DictionaryWide is broader, but
with a large reference dictionary such as the bundled Standard Korean Dictionary,
nearly every common reading has some homophone, so it glosses most Sino-Korean
words. To always gloss a specific word regardless of context, use a
DirectiveAction::RequireHanja directive instead (see
User directives).
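The difference between the two strategies comes down to where the competing homophone must exist. In this sketch, ContextLocal looks only at the words inside the window, while DictionaryWide consults every dictionary entry with the same reading. The Detection enum, the Word alias, and should_gloss are hypothetical; the toy dictionary holds only the 自由/子游 pair mentioned under Only recognized words are disambiguated.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the two homophone_detection values.
#[derive(Clone, Copy)]
enum Detection {
    ContextLocal,
    DictionaryWide,
}

/// A recognized word as a (reading, hanja) pair.
type Word<'a> = (&'a str, &'a str);

/// Decides whether `word` gets a parenthetical gloss. `window` holds the
/// recognized words inside the homophone window; `dictionary` maps each
/// reading to every hanja entry that has it.
fn should_gloss(
    word: Word<'_>,
    window: &[Word<'_>],
    dictionary: &HashMap<&str, Vec<&str>>,
    detection: Detection,
) -> bool {
    let (reading, hanja) = word;
    match detection {
        // Only homophones that actually occur in the window count.
        Detection::ContextLocal => window.iter().any(|&(r, h)| r == reading && h != hanja),
        // Any other dictionary entry with the same reading counts.
        Detection::DictionaryWide => dictionary
            .get(reading)
            .map_or(false, |forms| forms.iter().any(|&h| h != hanja)),
    }
}
```

With only 自由 in the window, ContextLocal leaves it unglossed, but DictionaryWide glosses it because 子游 exists in the dictionary. Scaled up to a full reference dictionary, that second branch fires for almost every Sino-Korean word.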
Only recognized words are disambiguated
Homophone disambiguation operates on words the dictionary recognizes as units.
A hanja sequence with no dictionary entry of its own is not treated as a single
word, and its fallback (non-dictionary) characters are never glossed; any
recognized single-character entries inside it (such as 紫) are still handled
on their own. For example, 自由 and 子游 are both bundled entries read
자유, so 自由와 子游 renders as 자유(自由)와 자유(子游); but 紫楡 has no
entry of its own, so under the default context-local strategy 自由와 紫楡
renders as 자유와 자유 with no gloss, because the engine never sees a second
자유 unit to collide with 自由. To disambiguate the whole term, add it to a
custom dictionary so the engine treats it as a single unit.
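The 紫楡 example can be reproduced at the level of segmented units. In this hypothetical sketch each unit carries a recognized flag, and only recognized units can be glossed or cause a collision; particles such as 와 are left out for brevity. It models the behaviour described above, not Gukhanmun's internals.

```rust
/// A segmented unit: its hanja, its reading, and whether it came from a
/// dictionary entry (true) or from per-character fallback (false).
struct Unit<'a> {
    hanja: &'a str,
    reading: &'a str,
    recognized: bool,
}

/// Context-local glossing at unit level: a recognized unit is glossed only if
/// another recognized unit has the same reading but different hanja.
fn gloss_units(units: &[Unit<'_>]) -> Vec<String> {
    units
        .iter()
        .map(|u| {
            let collides = u.recognized
                && units
                    .iter()
                    .any(|o| o.recognized && o.reading == u.reading && o.hanja != u.hanja);
            if collides {
                format!("{}({})", u.reading, u.hanja)
            } else {
                u.reading.to_string()
            }
        })
        .collect()
}
```

With the bundled segmentation, 紫楡 splits into 紫 (자) and a fallback 楡 (유), so nothing collides with 自由 and the output is plain 자유 자유. Once a custom entry makes 紫楡 a single recognized unit, both 자유 words are glossed.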
First-occurrence clearing window
When enabled, first-occurrence clearing annotates a hanja only at its first occurrence within the window; subsequent occurrences are left as plain hangul without parenthetical hanja. This is useful for documents that introduce each character once and then use it freely.
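A minimal sketch of first-occurrence clearing, assuming every hanja word would otherwise be annotated and that each word is already tagged with the window it belongs to. Word, its window field, and render_with_clearing are hypothetical names.

```rust
use std::collections::HashSet;

/// A hanja word plus the index of the clearing window it falls in
/// (how windows are delimited is an assumption of this sketch).
struct Word<'a> {
    reading: &'a str,
    hanja: &'a str,
    window: usize,
}

/// Annotates each hanja form at its first occurrence per window and renders
/// later occurrences in the same window as plain hangul.
fn render_with_clearing(words: &[Word<'_>]) -> Vec<String> {
    let mut seen: HashSet<(usize, &str)> = HashSet::new();
    words
        .iter()
        .map(|w| {
            // insert returns true only the first time this (window, hanja) is seen
            if seen.insert((w.window, w.hanja)) {
                format!("{}({})", w.reading, w.hanja)
            } else {
                w.reading.to_string()
            }
        })
        .collect()
}
```

Keying the set on (window, hanja) means a new window starts with a clean slate, so a term reintroduced in a later section is annotated again once.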
Error recovery
Relevant for HTML conversion; plain text and Markdown do not produce recoverable errors.