Data handling containers

Pipeline

 class Pipeline(exp=None, name=None, out=None) 

Shell for storing and handling an experiment pipeline.

| Parameter | Type | Doc |
| --- | --- | --- |
| exp | class, optional, default None | Instance of Experiment with fitted pipes. If not supplied, name and source should be set. |
| out | tuple, optional, default None | Tuple with storage options: "json" (JSON serialization) or "db" (database storage, requires blitzdb). |
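The save/load flow described by these options can be sketched with a minimal stand-in class. Note that `MiniPipeline`, its constructor arguments, and its JSON layout are illustrative assumptions, not the library's actual implementation:

```python
import json

class MiniPipeline:
    """Illustrative shell mirroring the interface above: it stores an
    experiment and saves/loads it when "json" is among the storage options.
    Names and internals here are assumptions, not the library's own code."""

    def __init__(self, exp=None, name=None, out=("json",)):
        self.exp, self.name, self.out = exp, name, out

    def save(self, path):
        # "db" storage via blitzdb is omitted in this sketch.
        if "json" in self.out:
            with open(path, "w") as f:
                json.dump({"name": self.name, "exp": self.exp}, f)

    def load(self, path):
        with open(path) as f:
            state = json.load(f)
        self.name, self.exp = state["name"], state["exp"]
```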

Methods

| Function | Doc |
| --- | --- |
| _make_tab | Table-level experiment representation. |
| save | Save experiment and classifier in the format specified. |
| load | Load experiment and classifier from the source specified. |
| classify | Given a data point, return a (label, probability) tuple. |

_make_tab

    _make_tab() 

Table-level experiment representation.

Generates a table-level representation of an experiment. This stores JSON-native information ONLY, and is used for the experiment table in the front-end, since deserializing many experiments would be expensive in terms of loading time.
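The idea of keeping only JSON-native information can be sketched as a filter over an experiment's fields. The `make_tab` helper and the dict structure below are hypothetical, chosen only to illustrate the pattern:

```python
import json

def make_tab(experiment):
    """Reduce an experiment dict (hypothetical structure) to its JSON-native
    fields, so a front-end table can render it without full deserialization."""
    json_native = (str, int, float, bool, type(None))
    return {k: v for k, v in experiment.items() if isinstance(v, json_native)}

# The fitted classifier object is dropped; only cheap-to-load fields remain.
tab = make_tab({"name": "exp1", "accuracy": 0.91, "clf": object()})
json.dumps(tab)  # serializes without error
```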

save

    save() 

Save experiment and classifier in the format specified.

load

    load() 

Load experiment and classifier from the source specified.

classify

    classify(data) 

Given a data point, return a (label, probability) tuple.
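The final step of such a method, selecting the winning (label, probability) tuple, can be sketched as below. The per-label probability dict is an assumed input shape; the real `classify(data)` first runs the fitted pipeline on a raw data point:

```python
def top_label(probabilities):
    """Return the (label, probability) tuple for the most likely label.
    `top_label` is a hypothetical helper, not the library's own function."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]
```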

CSV

 class CSV 

Quick and dirty CSV loader.

| Parameter | Type | Doc |
| --- | --- | --- |
| text | integer | Index of the .csv column where the text is located. |
| parse | integer, optional, default None | Index of the .csv column where the annotations are provided. These are currently assumed to be, per instance, a list of (token, lemma, POS) tuples, one per word. Frog and spaCy are implemented to provide these for you. |
| header | boolean, optional, default False | If the file has a header, you can skip it by setting this to True. |

Methods

| Function | Doc |
| --- | --- |
| iter | Standard iter method. |
| next | Iterate through the .csv file. |

iter

    __iter__() 

Standard iter method.

next

    __next__() 

Iterate through the .csv file.
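The iterator protocol described above can be sketched with a minimal loader. `MiniCSV` and its internals are illustrative assumptions (the `parse` handling is omitted), not the library's own code:

```python
import csv

class MiniCSV:
    """Minimal sketch of the loader interface: `text` is the column index of
    the text field and `header=True` skips the first row."""

    def __init__(self, path, text=0, header=False):
        self._file = open(path, newline="")
        self._reader = csv.reader(self._file)
        self._text = text
        if header:
            next(self._reader)  # skip the header row

    def __iter__(self):
        return self

    def __next__(self):
        try:
            row = next(self._reader)
        except StopIteration:
            self._file.close()  # close the file once the rows run out
            raise
        return row[self._text]
```

Because `__iter__` returns the object itself and `__next__` raises StopIteration at end of file, instances can be consumed directly in a `for` loop.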