Indexing Non-English Content

Stork is not tied to the English language. If your content is in another language, Stork can still parse and index it.

The only language-specific part of Stork is the stemmer: the feature that lets you search for the word "tributaries" and get results for the word "tributary." In this case, Stork's stemmer recognizes "tributar" as the root shared by both words and uses that root to match them.

This transformation from "tributaries" to "tributar" follows English rules, so content in another language needs that language's own stemming algorithm. Stork supports stemming in the following languages:

  • Arabic
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Spanish
  • Swedish
  • Tamil
  • Turkish

The stemming algorithm used for each file is determined by your Stork config file. If no language is specified, Stork defaults to English stemming. To change the stemming algorithm for every file, set the stemming key in the [input] section of your config file:

basic.toml (partial)
[input]
base_directory = "my_files/"
stemming = "Dutch"
files = [
...
]

All the files in your files array will be indexed using the Dutch stemming algorithm.

If you don't want the words in your content to be stemmed, or if your content is in an unsupported language, set stemming = "None" in the configuration file to turn off stemming.
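
As a minimal sketch, a config with stemming turned off might look like the following. The file entry is hypothetical and only there to make the fragment complete; the keys are the same ones used in the other examples in this guide.

```toml
[input]
base_directory = "my_files/"
# Turn off stemming for every file in the index.
stemming = "None"
files = [
    # Hypothetical file entry, for illustration only.
    { path = "document-1.txt", url = "/document1.html", title = "My first document" },
]
```

With this setting, words are indexed as written, so a search for "tributaries" would presumably no longer match "tributary."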

The stemming configuration can also be set per file with the stemming_override key. For example, if you have three files and two are in Spanish but one is in French, you can set Spanish as the general stemming algorithm and override it with the French algorithm for the French-language file:

longer.toml (partial)
[input]
base_directory = "my_files"
stemming = "Spanish"
[[input.files]]
path = "document-1.txt"
url = "/document1.html"
title = "Mi primer documento"
[[input.files]]
path = "document-2.txt"
url = "/document2.html"
title = "Mon document en français"
stemming_override = "French"
[[input.files]]
path = "document-3.txt"
url = "/document3.html"
title = "Mi segundo documento"
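
The [[input.files]] tables above are one of two equivalent TOML spellings of the files array. As a sketch, the same configuration can be written with inline tables, which matches the files = [ ... ] shape used in basic.toml. This relies only on standard TOML syntax and assumes Stork treats both spellings the same way, since a TOML parser produces the same data for each:

```toml
[input]
base_directory = "my_files"
# Default stemming algorithm for every file below.
stemming = "Spanish"
files = [
    { path = "document-1.txt", url = "/document1.html", title = "Mi primer documento" },
    # This file alone is stemmed with the French algorithm.
    { path = "document-2.txt", url = "/document2.html", title = "Mon document en français", stemming_override = "French" },
    { path = "document-3.txt", url = "/document3.html", title = "Mi segundo documento" },
]
```

Files without a stemming_override fall back to the stemming value set under [input].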

© 2019–2020

Stork is built and shepherded by James Little, who's really excited that you're checking it out. If you have any questions or comments, feel free to get in touch or open an issue on GitHub.

This site is also on GitHub; feel free to put up a PR or open an issue if you see something worth changing.