normalize

package
v0.1.2
Published: Feb 9, 2026 | License: MIT | Imports: 4 | Imported by: 0

Documentation

Overview

Package normalize provides text normalization utilities for embedding preprocessing.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Normalize

func Normalize(text string, cfg Config) string

Normalize applies the transformations enabled in cfg to text and returns the result. Enabled steps always run in a fixed order: Unicode → StripMarkdown → CompactSpace → Lowercase.
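
The fixed ordering can be illustrated with a minimal, stdlib-only sketch. It is not the package's implementation: normalizeSketch, the local Config copy, and the markdownSymbols pattern are hypothetical stand-ins, and the Unicode (NFC) step is left out because it would need an external package such as golang.org/x/text/unicode/norm.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Config mirrors the fields documented for this package's Config type.
type Config struct {
	Unicode       bool
	CompactSpace  bool
	StripMarkdown bool
	Lowercase     bool
}

// markdownSymbols is an assumed pattern; the package's actual
// Markdown-stripping rules are not documented on this page.
var markdownSymbols = regexp.MustCompile("[#*_`]")

// normalizeSketch applies the enabled steps in the documented order.
// Step 1 (Unicode/NFC) is omitted in this sketch, so cfg.Unicode is ignored.
func normalizeSketch(text string, cfg Config) string {
	if cfg.StripMarkdown { // step 2: drop Markdown symbols
		text = markdownSymbols.ReplaceAllString(text, "")
	}
	if cfg.CompactSpace { // step 3: collapse whitespace runs, trim the ends
		text = strings.Join(strings.Fields(text), " ")
	}
	if cfg.Lowercase { // step 4: lowercase last
		text = strings.ToLower(text)
	}
	return text
}

func main() {
	cfg := Config{StripMarkdown: true, CompactSpace: true, Lowercase: true}
	fmt.Println(normalizeSketch("# Hello   **World**", cfg)) // prints "hello world"
}
```

In this sketch, running StripMarkdown before CompactSpace means any gaps left behind by removed symbols are collapsed in the same pass, which is one plausible reason for the documented order.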

func Normalizer

func Normalizer(cfg Config) func(string) string

Normalizer returns a function that normalizes its input using cfg. Because the result has the type func(string) string, it can be passed directly to embedding options that accept a preprocessing function.
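
A function like this is typically a closure over the configuration. The sketch below shows that pattern under stated assumptions: normalizerSketch and the trimmed Config are hypothetical stand-ins, and preprocessAll is an invented consumer that plays the role of an embedding option accepting a func(string) string.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a trimmed, hypothetical stand-in for the package's Config.
type Config struct {
	CompactSpace bool
	Lowercase    bool
}

// normalizerSketch captures cfg once and returns a reusable function.
func normalizerSketch(cfg Config) func(string) string {
	return func(text string) string {
		if cfg.CompactSpace {
			text = strings.Join(strings.Fields(text), " ")
		}
		if cfg.Lowercase {
			text = strings.ToLower(text)
		}
		return text
	}
}

// preprocessAll is a hypothetical consumer: it applies fn to every
// text, as an embedding pipeline might do before encoding.
func preprocessAll(texts []string, fn func(string) string) []string {
	out := make([]string, len(texts))
	for i, t := range texts {
		out[i] = fn(t)
	}
	return out
}

func main() {
	norm := normalizerSketch(Config{CompactSpace: true, Lowercase: true})
	fmt.Printf("%q\n", preprocessAll([]string{"Hello   World", "  GO  docs "}, norm))
	// prints ["hello world" "go docs"]
}
```

Building the function once and reusing it keeps the configuration in one place, so callers only need to handle a plain func(string) string.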

Types

type Config

type Config struct {
	Unicode       bool // Apply NFC Unicode normalization
	CompactSpace  bool // Collapse runs of whitespace into a single space
	StripMarkdown bool // Remove Markdown syntax symbols
	Lowercase     bool // Convert to lowercase
}

Config holds normalization options.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a reasonable default configuration.
