Stork Configuration File Reference

Just getting started? Learn how to build a search index

A Stork configuration file is a TOML file that you pass into the Stork build command. This file defines the way your index is created and processed, and also controls some aspects of how your search results are displayed.

$ stork build --input my-config-file.toml --output my-index.st

The configuration file parser relies heavily on intuitive default values: if a field is inapplicable to your search index (or if you're happy with the listed default value), you can leave the field out of your configuration file.
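
As a rough illustration of leaning on those defaults, the sketch below sets only the list of files and leaves every other field out. The title, URL, and path are hypothetical placeholders, not values from a real project.

```toml
# minimal.toml: a hypothetical minimal configuration.
# Only `files` is set; every other input and output option
# falls back to its documented default.
[input]
files = [
    {title = "Home", url = "/", path = "index.html"},
]
```

A file like this would be passed to the build command in the same way as any other configuration file, for example with `--input minimal.toml`.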

Stork configuration files are guaranteed to be parseable in all future versions of Stork, though the Stork build phase will warn you if you're using deprecated fields in your configuration file. Fields may be deprecated in point releases of Stork, at which point they will also be undocumented. Setting deprecated fields will have no effect on your index.

Input Options

Input options define how your files will be read and processed. Input options are key/value pairs under the [input] TOML Table:

stork-config.toml
[input]
key1 = "value"
key2 = 123

#files Array of File objects

Default: Empty Array

The list of documents Stork should index.

#base_directory String

Default: Empty String

If Stork is indexing files on your filesystem, this is the base directory that should be used to resolve relative paths. This path will be in relation to the working directory when you run the stork build command.

#url_prefix String

Default: Empty String

Each file has a target URL to which it links. If all those target URLs have the same prefix, you can set that prefix here to make shorter file objects.
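
The following sketch shows how base_directory and url_prefix might be combined so that each file object only carries the part that differs. The directory, domain, and file names are assumptions made up for illustration.

```toml
[input]
# Relative `path` values below are resolved against this directory,
# which is itself relative to where `stork build` is run.
base_directory = "./docs"
# Shared prefix for every file's target URL (hypothetical domain).
url_prefix = "https://example.com/docs/"

[[input.files]]
path = "getting-started.html"   # read from ./docs/getting-started.html
url = "getting-started"         # presumably links to https://example.com/docs/getting-started
title = "Getting Started"

[[input.files]]
path = "faq.html"
url = "faq"
title = "Frequently Asked Questions"
```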

#title_boost String

Default: "Moderate"

One of: Minimal, Moderate, Large, or Ridiculous. Determines how much a result is boosted when the search query matches its title.

#html_selector String

Default: "main"

For all HTML files, this controls the container element whose content Stork will index. Expects a CSS selector.

#exclude_html_selector Optional String

Default: null

For all HTML files, content within this CSS selector will be excluded from the index.
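
A small sketch of the two selector options used together. The class names are hypothetical and would need to match the markup of the pages being indexed.

```toml
[input]
# Index only the content inside the element matching this selector
# (hypothetical class name).
html_selector = "article.post"
# Skip anything inside elements matching this selector
# (hypothetical class name).
exclude_html_selector = ".sidebar"
```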

#frontmatter_handling String

Default: "Omit"

One of Ignore, Omit, or Parse. If frontmatter is detected in your content, Ignore will not handle the frontmatter in any special way, effectively including it in the index. Omit will parse and remove frontmatter from indexed content. Parse does nothing.

#stemming String

Default: "English"

The stemming algorithm the indexer should use while analyzing words. Should be None or one of the languages supported by Snowball Stem, e.g. Dutch.

#srt_config SRT Config Object

Default: See below

For all SRT files, this object describes how Stork handles the timestamp information embedded in the file. See the SRT Configuration object section.

#minimum_indexed_substring_length Integer

Default: 3

The minimum substring length that gets indexed and is available to be searched. Setting this too low will make your index file gigantic.

#minimum_index_ideographic_substring_length Integer

Default: 1

If a string is made of CJK ideographs, its substrings should be shorter. This defines the minimum indexed substring length when indexing an ideographic string.

#break_on_file_error Boolean

Default: false

If a single document fails to be indexed, this flag controls what happens next. If true, the entire indexing process fails. If false, indexing continues with the failing document omitted.
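
The indexing options above can sit side by side in the [input] table. The values in this sketch are arbitrary examples chosen to differ from the defaults, not recommendations.

```toml
[input]
# Index substrings of 4 or more characters instead of the default 3,
# trading some search flexibility for a smaller index file.
minimum_indexed_substring_length = 4
# Keep the default for ideographic strings.
minimum_index_ideographic_substring_length = 1
# Stop the whole build as soon as one document cannot be indexed.
break_on_file_error = true
```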

The File object

Each file object represents a document that will be indexed by Stork, and an entry that will be displayed when your users search in a search box.

Files can either be specified using an "Array of Objects" syntax, which is more compact...

array-of-objects.toml
[input]
base_directory = "./files"
files = [
    {title = "Introduction", url = "https://google.com", path = "federalist-1.txt"},
    {title = "Concerning Dangers from Foreign Force and Influence", url = "https://yahoo.com", path = "federalist-2.txt"},
]

... or using the TOML "Array of Tables" syntax:

array-of-tables.toml
[input]
base_directory = "./files"
[[input.files]]
path = "federalist-1.txt"
url = "https://google.com"
title = "Introduction"
[[input.files]]
path = "federalist-2.txt"
url = "https://yahoo.com"
title = "Concerning Dangers from Foreign Force and Influence"

The two are equivalent.

#title String

Required

The document title. Used mainly for display purposes, but a result is also given a boost when the search query matches words in its title.

#url String

Required

The location this search result links to online. This value eventually becomes the href of the search result link.

#path Optional String

Default: null

The location of the document/file on disk, where the indexer can find it. Each file object must have either a path, contents, or src_url field, but not more than one.

#contents Optional String

Default: null

The contents of the document, embedded inline in the configuration file. Each file object must have either a path, contents, or src_url field, but not more than one.

#src_url Optional String

Default: null

The URL that Stork should scrape to get the contents of the document. Each file object must have either a path, contents, or src_url field, but not more than one. However, if src_url and url are the same, you may omit src_url altogether.
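
To make the three content sources concrete, here is a sketch with one file object for each of path, contents, and src_url. All titles, URLs, and file names are hypothetical.

```toml
# Content read from a file on disk.
[[input.files]]
title = "From Disk"
url = "https://example.com/from-disk"
path = "from-disk.txt"

# Content embedded directly in the configuration file.
[[input.files]]
title = "Inline"
url = "https://example.com/inline"
contents = "The full text of this document lives in the configuration file."

# Content scraped from a URL that differs from the link target.
[[input.files]]
title = "Scraped"
url = "https://example.com/post"
src_url = "https://example.com/post/raw.html"
```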

#html_selector_override Optional String

Default: null

Overrides the global html_selector configuration option for this document.

#exclude_html_selector_override Optional String

Default: null

Overrides the global exclude_html_selector configuration option for this document.

#stemming_override Optional String

Default: null

Overrides the stemming algorithm used for this document. You will likely set this if this document is written in a different language from the rest of the documents in your corpus.
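
A sketch of per-file overrides layered on top of global settings. The selectors, file names, and URLs are assumptions for illustration. Dutch is used because it is the example language named in the stemming option above.

```toml
[input]
stemming = "English"
html_selector = "main"

# Uses the global settings above.
[[input.files]]
title = "Welcome"
url = "https://example.com/welcome"
path = "welcome.html"

# Written in Dutch, with its content in a different container
# (hypothetical selectors).
[[input.files]]
title = "Welkom"
url = "https://example.com/nl/welkom"
path = "nl/welkom.html"
stemming_override = "Dutch"
html_selector_override = "div.content"
exclude_html_selector_override = ".language-switcher"
```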

#filetype Optional String

Default: null

If specified, one of: PlainText, SRTSubtitle, HTML, or Markdown. Stork needs to know what kind of file it is indexing so it can parse the file's contents properly. Sometimes, Stork can determine what kind of file it is looking at automatically. If Stork cannot detect the type of a file, you should manually set the filetype with this option.
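
For instance, a subtitle file saved with an unexpected extension is a case where setting the filetype by hand may help. The file name and URL in this sketch are hypothetical.

```toml
[[input.files]]
title = "Episode 12 Transcript"
url = "https://example.com/episodes/12"
# The extension does not reveal that this is a subtitle file,
# so the filetype is stated explicitly.
path = "episode-12.txt"
filetype = "SRTSubtitle"
```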

The SRT Configuration object

Read more about configuring SRT behavior on the SRT documentation page.

You can add SRT configuration options by adding key-value pairs under an [input.srt_config] table:

srt-config.toml
[input]
url_prefix = "https://vimeo.com/"
[input.srt_config]
timestamp_template_string = "#t={ts}"
timestamp_format = "NumberOfSeconds"
[[input.files]]
...

#timestamp_linking Boolean

Default: true

Determines whether Stork should use the timestamp data embedded in the subtitle file to append the timestamp to the search result's URL. Setting this to false will effectively strip timestamp data from the subtitle files.

#timestamp_template_string String

Default: "&t={ts}"

The string that gets appended to the URL to add timestamp data to the link. {ts} gets replaced with the timestamp of that result's excerpt. The default template string is the string used by YouTube.

#timestamp_format String

Default: "NumberOfSeconds"

Currently, the only supported value is NumberOfSeconds. Determines the format of the timestamp that replaces {ts} in the template string. The default format, "number of seconds", is the timestamp format used in YouTube links.
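
As a worked sketch of how these three options fit together, the configuration below spells out the defaults for a video hosted on YouTube. VIDEO_ID and the file name are placeholders, and the resulting URL in the comment is an assumption about how the pieces are concatenated rather than verified output.

```toml
[input]
url_prefix = "https://www.youtube.com/watch?v="

[input.srt_config]
timestamp_linking = true
timestamp_template_string = "&t={ts}"
timestamp_format = "NumberOfSeconds"

[[input.files]]
title = "Conference Talk"
url = "VIDEO_ID"        # placeholder, not a real video ID
path = "talk.srt"       # hypothetical subtitle file
# An excerpt that starts 95 seconds into the video would presumably
# link to https://www.youtube.com/watch?v=VIDEO_ID&t=95
```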

Output Configuration

The optional output section of a Stork configuration file defines how the index file is written to disk and how the search results should be displayed when using the JavaScript API.

output.toml
[input]
...
[output]
excerpts_per_result = 2
displayed_results_count = 5

#excerpt_buffer Integer

Default: 8

The number of words that will surround each search term in the displayed search results. Also determines what defines a nearby word when grouping nearby search results into a single match.

#excerpts_per_result Integer

Default: 5

Defines the maximum number of excerpts that will be shown for each search result, if multiple excerpts match the search query. If set to 0, the indexer will be able to optimize the search index filesize, making it up to 40% smaller.

#displayed_results_count Integer

Default: 10

Defines the maximum number of search results displayed in the list. Pushing this too high will result in performance issues.

#save_nearest_html_id Boolean

Default: false

If true, correlates each word in an HTML document with the nearest ID in the document. The Stork web interface will link directly to that ID, helping your users jump directly to the content they search for.

#debug Boolean

Default: false

When true, Stork will output a pretty-printed JSON representation of the index, instead of the index file. This option should only be used for debugging.
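
For reference, a sketch of an [output] table that sets every option described above. The values are arbitrary examples, not tuning advice.

```toml
[output]
# Show 8 words of context around each matched term (the default).
excerpt_buffer = 8
# Show at most 3 excerpts per search result.
excerpts_per_result = 3
# List at most 10 results (the default).
displayed_results_count = 10
# Link results to the nearest HTML ID in the source document.
save_nearest_html_id = true
# Leave debug off so the build writes a normal index file.
debug = false
```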

© 2019–2022. Stork is maintained by James Little, who's really excited that you're checking it out.

Logo art by Bruno Monts, with special thanks to the fission.codes team. Please contact James Little before using the logo for anything.