Stork Configuration File Reference
Just getting started? Learn how to build a search index
A Stork configuration file is a TOML file that you pass into the Stork build command. This file defines the way your index is created and processed, and also controls some aspects of how your search results are displayed.
$ stork build --input my-config-file.toml --output my-index.st
The configuration file parser relies heavily on intuitive default values: if a field is inapplicable to your search index (or if you're happy with the listed default value), you can leave the field out of your configuration file.
Stork configuration files are guaranteed to be parseable in all future versions of Stork, though the Stork build phase will warn you if you're using deprecated fields in your configuration file. Fields may be deprecated in point releases of Stork, at which point they will also be undocumented. Setting deprecated fields will have no effect on your index.
Input Options
Input options define how your files will be read and processed. Input options are key/value pairs under the [input] TOML table:
[input]
key1 = "value"
key2 = 123
#base_directory • String
Default: Empty string
If Stork is indexing files on your filesystem, this is the base directory that should be used to resolve relative paths. This path is resolved relative to the working directory from which you run the stork build command.
#url_prefix • String
Default: Empty string
Each file has a target URL to which it links. If all those target URLs have the same prefix, you can set that prefix here to make shorter file objects.
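As a rough sketch of how these two options can work together, the fragment below assumes that url_prefix is prepended to each file's url and that each path is resolved against base_directory. The directory, domain, and file names are placeholders, not values Stork requires.

```toml
[input]
# Relative `path` values below are resolved against this directory,
# which is itself relative to where `stork build` is run.
base_directory = "./content"
# Assumed to be prepended to every file's `url` value.
url_prefix = "https://example.com/docs/"

[[input.files]]
title = "Getting Started"
path = "getting-started.txt"
url = "getting-started"

[[input.files]]
title = "Advanced Usage"
path = "advanced.txt"
url = "advanced"
```

Under that assumption, the first result would link to https://example.com/docs/getting-started without repeating the prefix in every file object.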
#title_boost • String
Default: "Moderate"
One of: Minimal, Moderate, Large, or Ridiculous. Determines how much a result will be boosted if the search query matches the title.
#html_selector • String
Default: "main"
For all HTML files, this controls the container element whose content Stork will index. Expects a CSS selector.
#exclude_html_selector • Optional String
Default: null
For all HTML files, content within this CSS selector will be excluded from the index.
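A minimal sketch of the two HTML selector options used together is shown below. The selectors, path, and URL are illustrative placeholders, and the comments assume the exclusion applies to elements nested inside the indexed container.

```toml
[input]
base_directory = "./public"
# Index only the content inside <article class="post"> ...
html_selector = "article.post"
# ... and leave out any <nav> elements (assumed to apply within that container).
exclude_html_selector = "nav"

[[input.files]]
title = "Hello, World"
path = "hello-world.html"
url = "https://example.com/hello-world"
```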
#frontmatter_handling • String
Default: "Omit"
One of Ignore, Omit, or Parse. If frontmatter is detected in your content, Ignore will not handle the frontmatter in any special way, effectively including it in the index. Omit will parse and remove frontmatter from indexed content. Parse does nothing.
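To make the difference between Ignore and Omit concrete, here is a small sketch that embeds a Markdown document with frontmatter inline. The document text, title, and URL are made up for illustration, and the example assumes frontmatter is detected in inline contents the same way as in files on disk.

```toml
[input]
# With "Omit", the frontmatter block below is removed before indexing.
# With "Ignore", its text would be indexed along with the body.
frontmatter_handling = "Omit"

[[input.files]]
title = "Release Notes"
url = "https://example.com/release-notes"
filetype = "Markdown"
contents = """
---
author: Jane
---
Version 2 adds faster indexing.
"""
```

Going by the descriptions above, a search for "Jane" should find nothing with Omit, while the same search with Ignore should match this document.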
#stemming • String
Default: "English"
The stemming algorithm the indexer should use while analyzing words. Should be None or one of the languages supported by Snowball Stem, e.g. Dutch.
#srt_config • SRT Config Object
Default: See below
For all SRT files, this object describes how Stork will handle the timestamp information embedded in the file. See the SRT Configuration object section below.
#minimum_indexed_substring_length • Integer
Default: 3
The minimum substring length that gets indexed and is available to be searched. Setting this too low will make your index file gigantic.
#minimum_index_ideographic_substring_length • Integer
Default: 1
If a string is made of CJK ideographs, shorter substrings of it should be indexed. This defines the minimum indexed substring length when indexing an ideographic string.
#break_on_file_error • Boolean
Default: false
If a single document fails to be indexed, this flag controls whether the entire indexing process fails (true) or indexing continues with the failing document omitted (false).
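The following fragment sketches how the last three options might be set for a stricter, smaller build. It is meant to be combined with a files list, and the values are illustrative rather than recommended. The effects described in the comments are inferred from the descriptions above.

```toml
[input]
# Index substrings of four or more characters instead of the default three.
# Assumed trade-off: a smaller index file, at the cost of matching
# shorter partial words.
minimum_indexed_substring_length = 4
# Keep the default, shorter minimum for CJK ideographic strings.
minimum_index_ideographic_substring_length = 1
# Fail the whole build if any single document cannot be indexed.
break_on_file_error = true
```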
The File object
Each file object represents a document that will be indexed by Stork, and an entry that will be displayed when your users search in a search box.
Files can either be specified using an "Array of Objects" syntax, which is more compact...
[input]
base_directory = "./files"
files = [
  {title = "Introduction", url = "https://google.com", path = "federalist-1.txt"},
  {title = "Concerning Dangers from Foreign Force and Influence", url = "https://yahoo.com", path = "federalist-2.txt"}
]
... or using the TOML "Array of Tables" syntax:
[input]
base_directory = "./files"

[[input.files]]
path = "federalist-1.txt"
url = "https://google.com"
title = "Introduction"

[[input.files]]
path = "federalist-2.txt"
url = "https://yahoo.com"
title = "Concerning Dangers from Foreign Force and Influence"
The two are equivalent.
#title • String
Required
The document title. Used mainly for display purposes, but search queries with words in the title are given a boost.
#url • String
Required
The location this search result links to online. This value eventually becomes the href of the search result link.
#path • Optional String
Default: null
The location of the document/file on disk, where the indexer can find it. Each file object must have either a path, contents, or src_url field, but not more than one.
#contents • Optional String
Default: null
The contents of the document, embedded inline in the configuration file. Each file object must have either a path, contents, or src_url field, but not more than one.
#src_url • Optional String
Default: null
The URL that Stork should scrape to get the contents of the document. Each file object must have either a path, contents, or src_url field, but not more than one. However, if src_url and url are the same, you may omit src_url altogether.
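The three ways of supplying a document's contents can be sketched side by side as follows. Each file object uses exactly one of path, contents, or src_url, and all titles, URLs, and file names here are placeholders.

```toml
[input]
base_directory = "./files"

# 1. Contents read from a file on disk (relative to base_directory).
[[input.files]]
title = "Introduction"
url = "https://example.com/federalist-1"
path = "federalist-1.txt"

# 2. Contents embedded inline in the configuration file.
[[input.files]]
title = "Short Note"
url = "https://example.com/note"
contents = "This text is indexed directly from the configuration file."

# 3. Contents scraped from a URL that differs from the result link.
[[input.files]]
title = "Remote Page"
url = "https://example.com/remote"
src_url = "https://example.com/remote-source.html"
```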
#html_selector_override • Optional String
Default: null
Overrides the global html_selector configuration option for this document.
#exclude_html_selector_override • Optional String
Default: null
Overrides the global exclude_html_selector configuration option for this document.
#stemming_override • Optional String
Default: null
Overrides the stemming algorithm used for this document. You will likely set this if this document is written in a different language from the rest of the documents in your corpus.
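As an illustration of per-document overrides, the sketch below indexes a mostly English site that contains one Dutch page with a different page layout. The paths, URLs, and the "article" selector are hypothetical.

```toml
[input]
base_directory = "./site"
html_selector = "main"
stemming = "English"

# Uses the global settings above.
[[input.files]]
title = "Welcome"
url = "https://example.com/en/welcome"
path = "en/welcome.html"

# Written in Dutch and laid out differently, so both settings are overridden.
[[input.files]]
title = "Welkom"
url = "https://example.com/nl/welkom"
path = "nl/welkom.html"
stemming_override = "Dutch"
html_selector_override = "article"
```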
#filetype • Optional String
Default: null
If specified, one of: PlainText, SRTSubtitle, HTML, or Markdown. Stork needs to know what kind of file it is indexing so it can parse the file's contents properly. Sometimes, Stork can determine what kind of file it is looking at automatically. If Stork cannot detect the type of a file, you should manually set the filetype with this option.
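For instance, a file without an extension is a plausible case where automatic detection could fail. The fragment below is a sketch with a placeholder file name and URL that sets the type explicitly.

```toml
[[input.files]]
title = "Changelog"
url = "https://example.com/changelog"
# No file extension here, so the type is stated explicitly.
path = "CHANGELOG"
filetype = "Markdown"
```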
The SRT Configuration object
Read more about configuring SRT behavior on the SRT documentation page.
You can add SRT configuration options by adding the key-value pairs under an [input.srt_config] table:
[input]
url_prefix = "https://vimeo.com/"

[input.srt_config]
timestamp_template_string = "#t={ts}"
timestamp_format = "NumberOfSeconds"

[[input.files]]
...
#timestamp_linking • Boolean
Default: true
Determines whether Stork should use the timestamp data embedded in the subtitle file to append the timestamp to the search result's URL. Setting this to false will effectively strip timestamp data from the subtitle files.
#timestamp_template_string • String
Default: "&t={ts}"
The string that gets appended to the URL to add timestamp data to the link. {ts} gets replaced with the timestamp of that result's excerpt. The default template string is the string used by YouTube.
#timestamp_format • String
Default: "NumberOfSeconds"
One of: NumberOfSeconds (currently the only listed value). Determines the format of the timestamp that replaces {ts} in the template string. The default format, "number of seconds", is the timestamp format used in YouTube links.
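Putting the three SRT options together, a configuration for a subtitle file that accompanies a YouTube video might look like the sketch below. The video ID abc123, the directory, and the file name are placeholders, and the values shown are simply the documented defaults written out explicitly.

```toml
[input]
base_directory = "./subtitles"

[input.srt_config]
timestamp_linking = true
# {ts} is replaced with the timestamp of the matching excerpt.
timestamp_template_string = "&t={ts}"
timestamp_format = "NumberOfSeconds"

[[input.files]]
title = "Conference Talk"
# `abc123` is a placeholder video ID.
url = "https://www.youtube.com/watch?v=abc123"
path = "conference-talk.srt"
filetype = "SRTSubtitle"
```

Under these settings, an excerpt whose subtitle begins 90 seconds into the video would be expected to link to https://www.youtube.com/watch?v=abc123&t=90.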
Output Configuration
The optional output section of a Stork configuration file defines how the index file is written to disk and how the search results should be displayed when using the JavaScript API.
[input]
...

[output]
excerpts_per_result = 2
displayed_results_count = 5
#excerpt_buffer • Integer
Default: 8
The number of words that will surround each search term in the displayed search results. Also determines what defines a nearby word when grouping nearby search results into a single match.
#excerpts_per_result • Integer
Default: 5
Defines the maximum number of excerpts that will be shown for each search result, if multiple excerpts match the search query. If set to 0, the indexer will be able to optimize the search index filesize, making it up to 40% smaller.
#displayed_results_count • Integer
Default: 10
Defines the maximum number of search results displayed in the list. Setting this too high can cause performance issues.
#save_nearest_html_id • Boolean
Default: false
If true, correlates each word in an HTML document with the nearest ID in the document. The Stork web interface will link directly to that ID, helping your users jump directly to the content they search for.
#debug • Boolean
Default: false
When true, Stork will output a pretty-printed JSON representation of the index, instead of the index file. This option should only be used for debugging.
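As one possible combination of these options, the sketch below configures a compact, titles-only result list. The specific values are illustrative, not recommendations, and the fragment is meant to sit alongside an [input] section.

```toml
[output]
# Show no excerpts, which lets the indexer optimize the index file size.
excerpts_per_result = 0
# Keep the result list short.
displayed_results_count = 5
# Keep writing the real index file rather than the debug JSON.
debug = false
```

If excerpts are needed, a nonzero excerpts_per_result together with excerpt_buffer controls how many excerpts are shown and how many words surround each match.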