Documentation ¶
Overview ¶
Package normalize provides text normalization utilities for embedding preprocessing.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Normalize ¶
Normalize applies the configured transformations to the input text. The transformations always run in the same order: Unicode → StripMarkdown → CompactSpace → Lowercase.
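The fixed ordering can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: `normalizeSketch`, `markdownSymbols`, and `nfc` are hypothetical names, the set of Markdown symbols is an assumption, and the NFC step is a pass-through placeholder (real NFC normalization would need something like `golang.org/x/text/unicode/norm`). Only the `Config` field names and the step order come from the documentation above.

```go
package main

import (
	"fmt"
	"strings"
)

// Config mirrors the four documented fields of the package's Config type.
type Config struct {
	Unicode       bool
	CompactSpace  bool
	StripMarkdown bool
	Lowercase     bool
}

// markdownSymbols is an assumed, deliberately small set of Markdown syntax
// characters; the real package may strip a different set.
var markdownSymbols = strings.NewReplacer("*", "", "_", "", "#", "", "`", "", ">", "")

// nfc is a placeholder for NFC normalization. It returns its input unchanged
// so that this sketch depends only on the standard library.
var nfc = func(s string) string { return s }

// normalizeSketch is a hypothetical stand-in for Normalize. It applies each
// enabled step in the documented order:
// Unicode -> StripMarkdown -> CompactSpace -> Lowercase.
func normalizeSketch(text string, cfg Config) string {
	if cfg.Unicode {
		text = nfc(text)
	}
	if cfg.StripMarkdown {
		text = markdownSymbols.Replace(text)
	}
	if cfg.CompactSpace {
		// strings.Fields splits on any run of whitespace and drops
		// leading and trailing whitespace as a side effect.
		text = strings.Join(strings.Fields(text), " ")
	}
	if cfg.Lowercase {
		text = strings.ToLower(text)
	}
	return text
}

func main() {
	cfg := Config{StripMarkdown: true, CompactSpace: true, Lowercase: true}
	fmt.Printf("%q\n", normalizeSketch("# Hello   **World**", cfg)) // prints "hello world"
}
```

Running StripMarkdown before CompactSpace matters in this sketch: removing `#` leaves a leading space behind, and the whitespace step then cleans it up. With the steps reversed, that stray space would survive.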
func Normalizer ¶
Normalizer returns a normalizing function configured with the given options. The returned function can be passed to embedding options that accept a normalizer.
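The pattern described here, a constructor that captures options and returns a reusable function, might look like the following sketch. Every name in it is hypothetical: `options`, `newNormalizer`, and `prepareInputs` are illustrative stand-ins, and the `func(string) string` shape is an assumption, since the documentation above does not show the real signature of Normalizer or of the embedding option that consumes it.

```go
package main

import (
	"fmt"
	"strings"
)

// options is a hypothetical, reduced option set used only for this sketch.
type options struct {
	compactSpace bool
	lowercase    bool
}

// newNormalizer returns a closure that captures opts, so callers can
// configure normalization once and reuse the resulting function.
func newNormalizer(opts options) func(string) string {
	return func(text string) string {
		if opts.compactSpace {
			text = strings.Join(strings.Fields(text), " ")
		}
		if opts.lowercase {
			text = strings.ToLower(text)
		}
		return text
	}
}

// prepareInputs is a hypothetical consumer standing in for an embedding
// call that accepts a normalizing function and applies it to each input.
func prepareInputs(texts []string, normalize func(string) string) []string {
	out := make([]string, len(texts))
	for i, t := range texts {
		out[i] = normalize(t)
	}
	return out
}

func main() {
	normalize := newNormalizer(options{compactSpace: true, lowercase: true})
	fmt.Printf("%q\n", prepareInputs([]string{"Hello   World", "  Go  "}, normalize))
	// prints ["hello world" "go"]
}
```

Returning a function keeps the consumer independent of the configuration type: it only needs to know that it receives something mapping a string to a string.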
Types ¶
type Config ¶
type Config struct {
Unicode bool // Apply NFC Unicode normalization
CompactSpace bool // Collapse runs of whitespace into a single space
StripMarkdown bool // Remove Markdown syntax symbols
Lowercase bool // Convert to lowercase
}
Config holds normalization options.
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a reasonable default configuration.