Indexing Non-English Content

Stork is not an inherently English-language-specific tool. If your content is in a different language, Stork can parse and index that content.

The only language-specific functionality of Stork is the stemmer: the functionality that lets you search for the word "tributaries" and get results for the word "tributary." In this case, Stork's stemmer will recognize "tributar" as the root of the word "tributary" and use that to improve search results.

This transformation from "tributaries" to "tributar" is English-specific. Stork supports stemming in the following languages:

  • Arabic
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Spanish
  • Swedish
  • Tamil
  • Turkish

The stemming language algorithm used for each file is determined by your Stork config file. With no language specified, Stork will default to English stemming. To change the global stemming algorithm, set it in your config file:

basic.toml (partial)
[input]
base_directory = "my_files/"
stemming = "Dutch"
files = [
...
]

All the files in your files array will be indexed using the Dutch stemming algorithm.

If you don't want the words in your content to be stemmed (or if it's in an unsupported language), you can also specify stemming = "None" in the configuration file to turn off stemming.

The stemming configuration can be set on a per-file basis, as well. For example, if you have three files in Spanish but one in French, you can specify that the Spanish stemming algorithm be used generally, but the French algorithm be used for the French language file:

longer.toml (partial)
[input]
base_directory = "my_files"
stemming = "Spanish"
[[input.files]]
path = "document-1.txt"
url = "/document1.html"
title = "Mi primer documento"
[[input.files]]
path = "document-2.txt"
url = "/document2.html"
title = "Mon document en français"
stemming_override = "French"
[[input.files]]
path = "document-3.txt"
url = "/document3.html"
title = "Mi segundo documento"

Was this page helpful?

If you see an issue, please file a bug!

© 2019–2023. Stork is maintained by James Little, who's really excited that you're checking it out.

This site is open source. Please file a bug or open a PR if you see something confusing or incorrect.

Logo art by Bruno Monts, with special thanks to the fission.codes team. Please contact James Little before using the logo for anything.