Stork Configuration File Reference

Just getting started? Learn how to build a search index

A Stork configuration file is a TOML file that you pass into the Stork build command. This file defines the way your index is created and processed, and also controls some aspects of how your search results are displayed.

$ stork build --input my-config-file.toml --output my-index.st

The configuration file parser relies heavily on intuitive default values: if a field is inapplicable to your search index (or if you're happy with the listed default value), you can leave the field out of your configuration file.

Stork configuration files are guaranteed to be parseable in all future versions of Stork, though the Stork build phase will warn you if your configuration file uses deprecated fields. Fields may be deprecated in point releases of Stork, at which point they are also removed from this documentation. Setting a deprecated field has no effect on your index.

Input Options

Input options define how your files will be read and processed. Input options are key/value pairs under the [input] TOML Table:

stork-config.toml

[input]
key1 = "value"
key2 = 123

#files • Array of File objects

Default: Empty Array

The list of documents Stork should index.

#base_directory • String

Default: Empty string

If Stork is indexing files on your filesystem, this is the base directory that should be used to resolve relative paths. This path will be in relation to the working directory when you run the stork build command.

#url_prefix • String

Default: Empty String

Each file has a target URL to which it links. If all those target URLs have the same prefix, you can set that prefix here to make shorter file objects.
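
As a rough sketch of how base_directory and url_prefix can shorten file objects, the fragment below assumes a hypothetical site at https://example.com/essays/ with source files under ./files. Both values are placeholders, and the comments assume the prefix is simply prepended to each file's url.

```toml
[input]
# Relative `path` values are resolved against this directory, which is
# itself relative to the working directory where `stork build` is run.
base_directory = "./files"
# Hypothetical prefix shared by every target URL.
url_prefix = "https://example.com/essays/"

[[input.files]]
title = "Introduction"
path = "federalist-1.txt"   # read from ./files/federalist-1.txt
url = "federalist-1"        # assumed to link to https://example.com/essays/federalist-1
```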

#title_boost • String

Default: "Moderate"

One of: Minimal, Moderate, Large, and Ridiculous. Determines how much a result will be boosted if the search query matches the title.

#html_selector • String

Default: "main"

For all HTML files, this controls the container element whose content Stork indexes. Expects a CSS selector.

#exclude_html_selector • Optional String

Default: null

For all HTML files, content within this CSS selector will be excluded from the index.
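
For illustration, the two selector options might be combined as follows. The selectors themselves ("article" and ".sidebar") are hypothetical and depend entirely on your own page markup.

```toml
[input]
# Hypothetical: index only the content inside the <article> element.
html_selector = "article"
# Hypothetical: leave out anything inside elements with class "sidebar".
exclude_html_selector = ".sidebar"
```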

#frontmatter_handling • String

Default: "Omit"

One of Ignore, Omit, or Parse. If frontmatter is detected in your content, Ignore will not handle the frontmatter in any special way, effectively including it in the index. Omit will parse and remove frontmatter from indexed content. Parse does nothing.

#stemming • String

Default: "English"

The stemming algorithm the indexer should use while analyzing words. Should be None or one of the languages supported by Snowball Stem, e.g. Dutch.

#srt_config • SRT Config Object

Default: See below

For all SRT files, this object describes how Stork handles the timestamp information embedded in the file. Its fields are listed under The SRT Configuration object.

#minimum_indexed_substring_length • Integer

Default: 3

The minimum substring length that gets indexed and is available to be searched. Setting this too low will make your index file gigantic.

#minimum_index_ideographic_substring_length • Integer

Default: 1

If a string is made of CJK Ideographs, its substrings should be shorter. This defines the minimum indexed substring length when indexing an ideographic string.

#break_on_file_error • Boolean

Default: false

If a single document fails to be indexed, this flag controls whether the entire indexing process fails or if indexing continues with the failing document omitted.
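
A minimal sketch of the indexing-related options set together. The values are arbitrary examples rather than recommendations, and the comment on break_on_file_error assumes that true means the whole build fails on the first bad document.

```toml
[input]
# Index substrings of four or more characters (one more than the default of 3),
# trading some search flexibility for a smaller index file.
minimum_indexed_substring_length = 4
# Keep the default for ideographic strings.
minimum_index_ideographic_substring_length = 1
# Assumed: abort the whole build if any single document fails to index.
break_on_file_error = true
```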

The File object

Each file object represents a document that will be indexed by Stork, and an entry that will be displayed when your users search in a search box.

Files can either be specified using an "Array of Objects" syntax, which is more compact...

array-of-objects.toml

[input]
base_directory = "./files"
files = [
  {title = "Introduction", url = "https://google.com", path = "federalist-1.txt"},
  {title = "Concerning Dangers from Foreign Force and Influence", url = "https://yahoo.com", path = "federalist-2.txt"}
]

... or using the TOML "Array of Tables" syntax:

array-of-tables.toml

[input]
base_directory = "./files"
 
[[input.files]] 
path = "federalist-1.txt"
url = "https://google.com"
title = "Introduction"
 
[[input.files]]
path = "federalist-2.txt"
url = "https://yahoo.com"
title = "Concerning Dangers from Foreign Force and Influence"

The two are equivalent.

#title • String

Required

The document title. Used mainly for display purposes, but search queries with words in the title are given a boost.

#url • String

Required

The location this search result links to online. This value eventually becomes the href of the search result link.

#path • Optional String

Default: null

The location of the document/file on disk, where the indexer can find it. Each file object must have either a path, contents, or src_url field, but not more than one.

#contents • Optional String

Default: null

The contents of the document, embedded inline in the configuration file. Each file object must have either a path, contents, or src_url field, but not more than one.

#src_url • Optional String

Default: null

The URL that Stork should scrape to get the contents of the document. Each file object must have either a path, contents, or src_url field, but not more than one. However, if src_url and url are the same, you may omit src_url altogether.
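
The three content sources can be sketched side by side, one per file object. All titles, URLs, and paths below are placeholders.

```toml
# Content read from a file on disk.
[[input.files]]
title = "From disk"
url = "https://example.com/disk"
path = "disk.txt"

# Content embedded directly in the configuration file.
[[input.files]]
title = "Inline"
url = "https://example.com/inline"
contents = "This text is indexed directly from the configuration file."

# Content scraped from a URL that differs from the link target.
[[input.files]]
title = "Scraped"
url = "https://example.com/scraped"
src_url = "https://example.com/scraped-source.html"
```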

#html_selector_override • Optional String

Default: null

Overrides the global html_selector configuration option for this document.

#exclude_html_selector_override • Optional String

Default: null

Overrides the global exclude_html_selector configuration option for this document.

#stemming_override • Optional String

Default: null

Overrides the stemming algorithm used for this document. You will likely set this if this document is written in a different language from the rest of the documents in your corpus.
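
For example, a mostly English corpus with a single Dutch document might look like the sketch below. The file names and URLs are hypothetical.

```toml
[input]
stemming = "English"   # applies to every document unless overridden

[[input.files]]
title = "Welcome"
url = "/welcome"
path = "welcome.txt"

[[input.files]]
title = "Welkom"
url = "/nl/welkom"
path = "welkom.txt"
stemming_override = "Dutch"   # this document only
```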

#filetype • Optional String

Default: null

If specified, one of: PlainText, SRTSubtitle, HTML, or Markdown. Stork needs to know what kind of file it is indexing so it can parse the file's contents properly. Sometimes, Stork can determine what kind of file it is looking at automatically. If Stork cannot detect the type of a file, you should manually set the filetype with this option.
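
As a sketch, a file whose name gives no hint about its format could declare its type explicitly. The file name and URL here are made up for illustration.

```toml
[[input.files]]
title = "Release Notes"
url = "https://example.com/notes"
path = "NOTES"           # hypothetical file with no extension
filetype = "Markdown"    # tell Stork how to parse it
```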

The SRT Configuration object

Read more about configuring SRT behavior on the SRT documentation page.

You can add SRT configuration options by adding the key-value pairs under a [input.srt_config] table:

srt-config.toml

[input]
url_prefix = "https://vimeo.com/"
  
[input.srt_config]
timestamp_template_string = "#t={ts}"
timestamp_format = "NumberOfSeconds"
  
[[input.files]]
...

#timestamp_linking • Boolean

Default: true

Determines whether Stork should use the timestamp data embedded in the subtitle file to append the timestamp to the search result's URL. Setting this to false will effectively strip timestamp data from the subtitle files.

#timestamp_template_string • String

Default: "&t={ts}"

The string that gets appended to the URL to add timestamp data to the link. {ts} gets replaced with the timestamp of that result's excerpt. The default template string is the string used by YouTube.

#timestamp_format • String

Default: "NumberOfSeconds"

One of: NumberOfSeconds. Determines the format of the timestamp that replaces {ts} in the template string. The default format, "number of seconds", is the timestamp format used in YouTube links.
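
To make the substitution concrete, the sketch below spells out the default values for a YouTube-style link. The video URL and subtitle path are placeholders, and the resulting link in the final comment is what the descriptions above imply rather than verified output.

```toml
[input.srt_config]
timestamp_linking = true
timestamp_template_string = "&t={ts}"
timestamp_format = "NumberOfSeconds"

[[input.files]]
title = "Conference Talk"
url = "https://www.youtube.com/watch?v=VIDEO_ID"   # placeholder video ID
path = "talk.srt"                                  # hypothetical subtitle file
filetype = "SRTSubtitle"
# An excerpt starting at 1 minute 30 seconds would be expected to link to
# https://www.youtube.com/watch?v=VIDEO_ID&t=90
```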

Output Configuration

The optional output section of a Stork configuration file defines how the index file is written to disk and how the search results should be displayed when using the JavaScript API.

output.toml

[input]
...
 
[output]
excerpts_per_result = 2
displayed_results_count = 5

#excerpt_buffer • Integer

Default: 8

The number of words that will surround each search term in the displayed search results. Also determines what defines a nearby word when grouping nearby search results into a single match.

#excerpts_per_result • Integer

Default: 5

Defines the maximum number of excerpts that will be shown for each search result, if multiple excerpts match the search query. If set to 0, the indexer will be able to optimize the search index filesize, making it up to 40% smaller.
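
If index size matters more than showing excerpts, a configuration along these lines may be a reasonable starting point. The values are illustrative only.

```toml
[output]
# Show titles only; lets the indexer optimize the index file size.
excerpts_per_result = 0
# Keep the result list short.
displayed_results_count = 5
```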

#displayed_results_count • Integer

Default: 10

Defines the maximum number of search results displayed in the list. Pushing this too high will result in performance issues.

#save_nearest_html_id • Boolean

Default: false

If true, correlates each word in an HTML document with the nearest ID in the document. The Stork web interface will link directly to that ID, helping your users jump directly to the content they search for.

#debug • Boolean

Default: false

When true, Stork will output a pretty-printed JSON representation of the index, instead of the index file. This option should only be used for debugging.
