Supported Languages
Complete list of languages supported by @visulima/content-safety for banned word detection.
@visulima/content-safety includes banned word lists for 19 languages spanning European, Middle Eastern, and Asian language groups. All lists are checked simultaneously when analyzing text, so input does not have to be written in a single language.
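As a rough illustration of what checking every list in one pass means, the sketch below scans a text against a small stand-in map of word lists and records which language each hit came from. The names `SAMPLE_LISTS`, `SketchMatch`, and `scanAllLanguages` are hypothetical, the entries are harmless placeholders, and the plain substring search is a simplification. It is not the library's implementation. The `word` and `language` fields mirror the match fields used in the examples further down this page.

```typescript
// Hypothetical stand-in for per-language word lists (placeholder entries only).
const SAMPLE_LISTS: Record<string, readonly string[]> = {
    en: ["badword"],
    de: ["schlechteswort"],
    ja: ["悪い言葉"],
};

// Hypothetical match shape for this sketch.
interface SketchMatch {
    word: string;
    language: string;
}

// Checks the text against every list, regardless of the text's language.
function scanAllLanguages(text: string, lists: Record<string, readonly string[]>): SketchMatch[] {
    const haystack = text.normalize("NFC").toLowerCase();
    const matches: SketchMatch[] = [];

    for (const language of Object.keys(lists)) {
        for (const word of lists[language]) {
            // Simplification: substring search without word-boundary rules.
            if (haystack.includes(word.toLowerCase())) {
                matches.push({ word, language });
            }
        }
    }

    return matches;
}

const found = scanAllLanguages("A badword, ein schlechteswort, 悪い言葉", SAMPLE_LISTS);
console.log(found.map((m) => m.language)); // ['en', 'de', 'ja']
```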
Language Coverage
European Languages
English (en)
- Script: Latin
- Coverage: Comprehensive profanity, slurs, hate speech
- Variants: Includes leet-speak (b1tch, etc.)
- Multi-word phrases: Yes (e.g., "white trash")
German (de)
- Script: Latin (with umlauts)
- Coverage: German profanity and slurs
- Regional variants: Standard German
Spanish (es)
- Script: Latin (with diacritics)
- Coverage: Spanish profanity and slurs
- Regional variants: Multiple Spanish-speaking regions
French (fr)
- Script: Latin (with diacritics)
- Coverage: French profanity and slurs
- Regional variants: European French
Italian (it)
- Script: Latin
- Coverage: Italian profanity and slurs
Dutch (nl)
- Script: Latin
- Coverage: Dutch profanity and slurs
Polish (pl)
- Script: Latin (with Polish diacritics)
- Coverage: Polish profanity and slurs
Portuguese (pt)
- Script: Latin (with diacritics)
- Coverage: Portuguese profanity and slurs
- Regional variants: Both European and Brazilian Portuguese
Swedish (sv)
- Script: Latin (with Swedish characters)
- Coverage: Swedish profanity and slurs
Russian (ru)
- Script: Cyrillic
- Coverage: Russian profanity and slurs
- Unicode support: Full Cyrillic character handling
Irish (ga)
- Script: Latin
- Coverage: Irish profanity and slurs
Middle Eastern Languages
Arabic (ar)
- Script: Arabic (RTL)
- Coverage: Arabic profanity and slurs
- Script features: Right-to-left text support
- Regional variants: Modern Standard Arabic
Persian/Farsi (fa)
- Script: Perso-Arabic (RTL)
- Coverage: Persian profanity and slurs
- Script features: Right-to-left text support
Turkish (tr)
- Script: Latin (with Turkish-specific letters)
- Coverage: Turkish profanity and slurs
Azerbaijani (az)
- Script: Latin
- Coverage: Azerbaijani profanity and slurs
Asian Languages
Japanese (ja)
- Script: Mixed (Hiragana, Katakana, Kanji)
- Coverage: Japanese profanity and slurs
- Word boundaries: No word boundaries (CJK handling)
Korean (ko)
- Script: Hangul
- Coverage: Korean profanity and slurs
- Word boundaries: No word boundaries (CJK handling)
Chinese (zh)
- Script: Simplified and Traditional Chinese
- Coverage: Chinese profanity and slurs
- Word boundaries: No word boundaries (CJK handling)
Hindi (hi)
- Script: Devanagari
- Coverage: Hindi profanity and slurs
- Script features: Complex script support
Language Detection
Automatic Multi-Language Checking
The library automatically checks all languages simultaneously:
import { checkBannedWords } from "@visulima/content-safety";
// Mixed language text - all languages checked
const result = checkBannedWords(`
English bad word
German bad word
Japanese bad word
`);
// All languages will be detected
console.log(result.matches.map((m) => m.language));
// ['en', 'de', 'ja']
Language Attribution
Each match includes the language code it was detected from:
const result = checkBannedWords("text with badword");
result.matches.forEach((match) => {
console.log(`Found "${match.word}" from language: ${match.language}`);
});
Script Support
Latin Scripts
Standard Latin alphabet with full diacritic support:
- English, Spanish, French, Portuguese, etc.
- Accents: é, ñ, ü, ø, etc.
Cyrillic
Full support for Cyrillic script:
- Russian (ru): А-Я, а-я
- Unicode-aware word boundaries
Arabic Script
Right-to-left (RTL) text support:
- Arabic (ar): ا-ي
- Persian (fa): Perso-Arabic extensions
- Proper handling of RTL text direction
CJK (Chinese, Japanese, Korean)
Special handling for CJK scripts:
- No word boundaries required (continuous text)
- Character-level matching
- Support for mixed Hiragana, Katakana, Kanji, Hangul
Complex Scripts
Support for complex scripts with combining characters:
- Devanagari (Hindi)
- Thai
- Other Indic scripts
Unicode Normalization
All text is normalized to NFC (Canonical Composition) form for consistent matching:
// Both forms will match identically
checkBannedWords("caf\u00e9"); // NFC form (precomposed é)
checkBannedWords("cafe\u0301"); // NFD form (e + combining acute accent)
This ensures:
- Consistent matching across different Unicode representations
- Proper handling of accented characters
- Correct detection regardless of input normalization
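To see why normalization matters, the following sketch compares the two spellings of "café" before and after calling the standard `String.prototype.normalize("NFC")`. The helper name `normalizeForMatching` is hypothetical and only stands in for whatever the library does internally.

```typescript
// "é" as a single precomposed code point (NFC).
const composed: string = "caf\u00e9";
// "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD).
const decomposed: string = "cafe\u0301";

// Hypothetical helper, not part of @visulima/content-safety.
function normalizeForMatching(text: string): string {
    return text.normalize("NFC");
}

// Without normalization the strings differ, even though they render the same.
const rawEqual = composed === decomposed;
// After normalization both collapse to the precomposed form.
const normalizedEqual = normalizeForMatching(composed) === normalizeForMatching(decomposed);

console.log(rawEqual); // false
console.log(normalizedEqual); // true
console.log(composed.length, decomposed.length); // 4 5
```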
Word Boundary Handling
Space-Delimited Scripts (Latin, Cyrillic, Arabic, etc.)
Uses Unicode-aware word boundaries:
- \p{L} - Unicode letter property
- \p{N} - Unicode number property
- Respects word boundaries in all of these scripts
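One way to express such a boundary rule is with lookarounds built on `\p{L}` and `\p{N}`: a hit counts only when no Unicode letter or number touches either side of the word. This is a minimal sketch under that assumption. `escapeRegExp` and `containsWholeWord` are hypothetical helpers, and the library's actual pattern may differ.

```typescript
// Escapes regex metacharacters so the word is matched literally.
function escapeRegExp(value: string): string {
    return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `word` appears with no letter or digit on either side.
function containsWholeWord(text: string, word: string): boolean {
    const pattern = new RegExp(
        "(?<![\\p{L}\\p{N}])" + escapeRegExp(word) + "(?![\\p{L}\\p{N}])",
        "iu"
    );
    return pattern.test(text);
}

console.log(containsWholeWord("badword here", "badword")); // true
console.log(containsWholeWord("notabadwordhere", "badword")); // false
console.log(containsWholeWord("это плохо.", "плохо")); // true
console.log(containsWholeWord("неплохой", "плохо")); // false
```

The ASCII-only `\b` assertion would treat Cyrillic or accented letters as non-word characters, which is why the sketch relies on Unicode property escapes instead.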
// Matches (standalone word)
checkBannedWords("badword here"); // ✓ detected
// May not match (part of compound word)
checkBannedWords("notabadwordhere"); // depends on word list
CJK Scripts
No word boundaries applied:
- Characters can appear anywhere in continuous text
- More sensitive matching for CJK content
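Character-level matching can be pictured as a plain substring search that applies only to terms containing CJK characters. The sketch below detects such terms with Unicode script properties. `CJK_PATTERN`, `isCjkTerm`, and `containsCjkTerm` are hypothetical names, the sample term is a harmless placeholder, and the library may decide this differently.

```typescript
// Matches any Han, Hiragana, Katakana, or Hangul character.
const CJK_PATTERN = new RegExp(
    "[\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}]",
    "u"
);

// Hypothetical helper: does the term contain at least one CJK character?
function isCjkTerm(term: string): boolean {
    return CJK_PATTERN.test(term);
}

// For CJK terms a substring search is enough, because neighbouring
// words are not separated by spaces.
function containsCjkTerm(text: string, term: string): boolean {
    return isCjkTerm(term) && text.normalize("NFC").includes(term.normalize("NFC"));
}

console.log(isCjkTerm("悪い")); // true
console.log(isCjkTerm("badword")); // false
console.log(containsCjkTerm("これは悪い言葉です", "悪い")); // true
```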
// CJK example
checkBannedWords("前badword後"); // ✓ detected
Data Sources
The banned word lists are curated from multiple sources:
- Manual curation: Carefully reviewed lists
- google-profanity-words: Community-maintained lists
- safetext: Multi-language profanity database
Adding Custom Words
While the library doesn't support runtime customization, you can access the word lists:
import { BANNED_WORDS } from "@visulima/content-safety";
// View all languages
console.log(Object.keys(BANNED_WORDS));
// ['ar', 'az', 'de', 'en', 'es', 'fa', 'fr', 'ga', 'hi', 'it', 'ja', 'ko', 'nl', 'pl', 'pt', 'ru', 'sv', 'tr', 'zh']
// View English words (sensitive content - be careful)
console.log(BANNED_WORDS.en);
Language Codes (ISO 639-1)
| Code | Language | Native Name |
|---|---|---|
| ar | Arabic | العربية |
| az | Azerbaijani | Azərbaycanca |
| de | German | Deutsch |
| en | English | English |
| es | Spanish | Español |
| fa | Persian | فارسی |
| fr | French | Français |
| ga | Irish | Gaeilge |
| hi | Hindi | हिन्दी |
| it | Italian | Italiano |
| ja | Japanese | 日本語 |
| ko | Korean | 한국어 |
| nl | Dutch | Nederlands |
| pl | Polish | Polski |
| pt | Portuguese | Português |
| ru | Russian | Русский |
| sv | Swedish | Svenska |
| tr | Turkish | Türkçe |
| zh | Chinese | 中文 |
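If application code needs to validate a language code against this table, for example before filtering matches by `language`, a small type guard is one option. The codes below are copied from the table. `SUPPORTED_LANGUAGE_CODES`, `LanguageCode`, and `isSupportedLanguage` are hypothetical names and are not exported by the library.

```typescript
// Codes copied from the table above.
const SUPPORTED_LANGUAGE_CODES = [
    "ar", "az", "de", "en", "es", "fa", "fr", "ga", "hi", "it",
    "ja", "ko", "nl", "pl", "pt", "ru", "sv", "tr", "zh",
] as const;

// Union type of the 19 codes: "ar" | "az" | ... | "zh".
type LanguageCode = (typeof SUPPORTED_LANGUAGE_CODES)[number];

// Hypothetical type guard: narrows an arbitrary string to a supported code.
function isSupportedLanguage(code: string): code is LanguageCode {
    return (SUPPORTED_LANGUAGE_CODES as readonly string[]).indexOf(code) !== -1;
}

console.log(SUPPORTED_LANGUAGE_CODES.length); // 19
console.log(isSupportedLanguage("ja")); // true
console.log(isSupportedLanguage("th")); // false
```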
Contributing Languages
To add support for additional languages:
1. Create a new file: src/words/{language-code}.ts
2. Export an array of banned words:
   const words: readonly string[] = [
       // word list here
   ];
   export default words;
3. Add it to src/banned-words.ts:
   import newLang from "./words/new-lang";
   export const BANNED_WORDS = {
       // ...
       "new-lang": newLang,
   };
4. Submit a pull request to visulima/visulima
Next Steps
- Usage Guide - Learn how to use the library
- API Reference - Complete API documentation
- Installation - Setup instructions