Components

Transformers

A transformer is called during indexing. It takes an iterator of attribute values and yields those values again, possibly changed or filtered.

For each source definition and content-item, the values of the specified attribute are passed to the first defined transformer, which performs some kind of transformation on them. If multiple transformers are defined, the output of this transformer is then fed into the next transformer and so on, until all transformers defined for this source have been applied. The output of the last transformer is then passed to the indexer.

A simple transformer looks like this:

def lowercase(terms):
    # Yield each incoming term converted to lower case.
    for term in terms:
        yield term.lower()

Transformers may expect the individual values of their input to be of a specific type (such as unicode, integer, or datetime) and may in turn output values of a different type. It is up to the creator of the index-schema to make sure that the input- and output-types of the defined transformers match.
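
The chaining and type-matching described above can be sketched in plain Python. This is only an illustration, not Fuse's actual indexing code: `apply_transformers`, `to_text`, and `lowercase` are hypothetical names, and each transformer is assumed to be a callable that takes an iterable and returns an iterator.

```python
def to_text(values):
    """Hypothetical transformer: convert any value to a string."""
    for value in values:
        yield str(value)


def lowercase(terms):
    """Hypothetical transformer: expects strings, yields lower-cased strings."""
    for term in terms:
        yield term.lower()


def apply_transformers(values, transformers):
    """Feed the output of each transformer into the next one, in order."""
    for transformer in transformers:
        values = transformer(values)
    return values


# to_text runs first so that lowercase only ever receives strings.
result = list(apply_transformers(["ABC", 42, "Xyz"], [to_text, lowercase]))
print(result)  # ['abc', '42', 'xyz']
```

Swapping the order would fail on the value 42, because integers have no `lower()` method. This is the kind of type mismatch the schema author has to rule out.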

Built-In Transformers

Below is the list of built-in transformers Fuse provides.

ToUnicode

Takes any input and transforms it to unicode strings. Usually used as a pre-processing step for other transformers that expect string terms as input.

input
anything
output
string

To use this transformer, add the following block to your components section:

{
    "name": "unicode",
    "component": "qc.index.transform:ToUnicode",
    "type": "transformer"
}

CapitalizeLower

Takes a string input and returns the capitalized version of that string, if it consists only of lower-case letters. ‘abc’ turns into ‘Abc’, ‘aBc’ stays unchanged. ‘abc xyz’ turns into ‘Abc xyz’.

input
string
output
string

To use this transformer, add the following block to your components section:

{
    "name": "capitalize",
    "component": "qc.index.transform:CapitalizeLower",
    "type": "transformer"
}
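
The rule above can be approximated in a few lines of Python. This is a sketch of the described behaviour, not the source of `CapitalizeLower`; the function name `capitalize_lower` is hypothetical, and the built-in may treat digits or punctuation differently from `str.islower()`.

```python
def capitalize_lower(terms):
    """Sketch: capitalize a term only if it contains no upper-case letters."""
    for term in terms:
        # str.islower() is True when all cased characters are lower-case
        # and there is at least one cased character.
        yield term.capitalize() if term.islower() else term


print(list(capitalize_lower(["abc", "aBc", "abc xyz"])))
# ['Abc', 'aBc', 'Abc xyz']
```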

TitleCaseLower

Takes a string input and returns the title-cased version of that string. That is, every word of the string that consists only of lowercase letters is capitalized.

‘abc’ turns into ‘Abc’, ‘aBc’ stays unchanged. ‘abc xyz’ turns into ‘Abc Xyz’.

To use this transformer, add the following block to your components section:

{
    "name": "titlecase",
    "component": "qc.index.transform:TitleCaseLower",
    "type": "transformer"
}
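
A minimal sketch of the per-word rule, assuming that words are delimited by whitespace (the original does not say how `TitleCaseLower` splits words). The name `title_case_lower` is hypothetical.

```python
import re


def title_case_lower(terms):
    """Sketch: capitalize every all-lower-case word, leave other words alone."""
    def fix_word(match):
        word = match.group(0)
        return word.capitalize() if word.islower() else word

    for term in terms:
        # \S+ matches each whitespace-delimited word; whitespace is preserved.
        yield re.sub(r"\S+", fix_word, term)


print(list(title_case_lower(["abc", "aBc", "abc xyz", "abc XyZ"])))
# ['Abc', 'aBc', 'Abc Xyz', 'Abc XyZ']
```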

ISODate

This transformer takes a datetime field and outputs the field as a Date string in ISO format.

input
datetime
output
string

To use this transformer, add the following block to your components section:

{
    "name": "ISODate",
    "component": "qc.index.transform:ISODate",
    "type": "transformer"
}

ISODateTime

This transformer takes a datetime field and outputs the field as a DateTime string in ISO format.

input
datetime
output
string

To use this transformer, add the following block to your components section:

{
    "name": "ISODateTime",
    "component": "qc.index.transform:ISODateTime",
    "type": "transformer"
}
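
The difference between ISODate and ISODateTime can be illustrated with Python's standard `datetime` module. The functions `iso_date` and `iso_datetime` are hypothetical stand-ins, and the exact output format of the built-in transformers (for example, how time zones or microseconds are rendered) is not specified here.

```python
from datetime import datetime


def iso_date(values):
    """Sketch: yield only the date part of each datetime."""
    for value in values:
        yield value.date().isoformat()


def iso_datetime(values):
    """Sketch: yield date and time of each datetime."""
    for value in values:
        yield value.isoformat()


stamp = datetime(2020, 1, 2, 3, 4, 5)
print(list(iso_date([stamp])))      # ['2020-01-02']
print(list(iso_datetime([stamp])))  # ['2020-01-02T03:04:05']
```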

LowerCase

This transformer takes a string and returns the lower case version of that string. For example, aBC becomes abc.

input
string
output
string

To use this transformer, add the following block to your components section:

{
    "name": "lowercase",
    "component": "qc.index.transform:LowerCase",
    "type": "transformer"
}

tokens

Tokenizes strings. Strings are split at every character that is neither a letter nor a digit, except for the characters /&+_-. For example, this is-a;test would be tokenized as this is-a test.

input
string
output
string

To use this transformer, add the following block to your components section:

{
    "name": "token",
    "component": "qc.index.transform:tokens",
    "type": "transformer"
}
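
The splitting rule can be sketched with a regular expression. This is an approximation under stated assumptions: the function below yields each token as a separate term, whereas the built-in `tokens` transformer may emit its result differently, and `\w` is used to stand in for "letters and digits" (it also covers the underscore).

```python
import re

# Split at runs of characters that are not letters, digits, or one of / & + _ -
# (\w already covers letters, digits, and the underscore).
_SEPARATORS = re.compile(r"[^\w/&+\-]+")


def tokens(terms):
    """Sketch: yield each token of each input string as a separate term."""
    for term in terms:
        for token in _SEPARATORS.split(term):
            if token:  # skip empty strings from leading/trailing separators
                yield token


print(list(tokens(["this is-a;test"])))  # ['this', 'is-a', 'test']
```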

filter

Drops any terms that appear in the given list (the skip_terms parameter).

To use this transformer, add the following block to your components section:

{
    "name": "filter_1",  // name it as you like
    "factory": "qc.index.transform:filter",
    "type": "transformer",
    "params": {"skip_terms": ["term1", "term2"]}  // put terms to skip here
}
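
Because this transformer is configured through a factory with params, one plausible shape is a function that receives `skip_terms` and returns the actual transformer. The sketch below assumes exact-match comparison and hashable terms; `make_filter` is a hypothetical name, not the signature of `qc.index.transform:filter`.

```python
def make_filter(skip_terms):
    """Hypothetical factory: build a transformer that drops the given terms."""
    skip = set(skip_terms)  # a set gives fast membership tests

    def transformer(terms):
        for term in terms:
            if term not in skip:
                yield term

    return transformer


drop = make_filter(["term1", "term2"])
print(list(drop(["term1", "keep", "term2", "other"])))  # ['keep', 'other']
```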

MapVocabulary

Takes string terms and maps them according to a given vocabulary.

To use this transformer, add the following block to your components section:

{
    "name": "map_1",  // name it as you like
    "factory": "qc.index.transform:MapVocabulary",
    "type": "transformer",
    "params": {"vocabulary": {"de": "Germany", "us": "United States"}}  // put vocabulary here
}

Or, to map non-string values (JSON only supports string keys in mappings):

{
    "name": "map_1",  // name it as you like
    "factory": "qc.index.transform:MapVocabulary",
    "type": "transformer",
    "params": {"vocabulary": [{"key": 0, "value": "off"}, {"key": 1, "value": "on"}]}
}
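
Both vocabulary forms can be normalized into a single lookup table. The sketch below is an assumption about how such a mapper could work, not the implementation of `MapVocabulary`: `make_mapper` is a hypothetical name, and passing unmapped terms through unchanged is a guess, since the behaviour for unknown terms is not documented here.

```python
def make_mapper(vocabulary):
    """Hypothetical factory accepting either vocabulary form shown above."""
    if not isinstance(vocabulary, dict):
        # List form: [{"key": ..., "value": ...}, ...] allows non-string keys.
        vocabulary = {entry["key"]: entry["value"] for entry in vocabulary}

    def transformer(terms):
        for term in terms:
            # Assumption: terms without a mapping pass through unchanged.
            yield vocabulary.get(term, term)

    return transformer


countries = make_mapper({"de": "Germany", "us": "United States"})
print(list(countries(["de", "fr"])))  # ['Germany', 'fr']

switch = make_mapper([{"key": 0, "value": "off"}, {"key": 1, "value": "on"}])
print(list(switch([1, 0])))  # ['on', 'off']
```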

select_year

Takes string terms and maps them according to a given vocabulary.

To use this transformer, add the following block to your components section:

{
    "name": "map_1",  // name it as you like
    "component": "qc.index.transform:select_year",
    "type": "transformer"
}