Vectorizer and optimization

Vectorizer

 class Vectorizer(conf=None, featurizer=None, normalizers=None, decomposers=None)

Small text mining vectorizer.

This class provides a small set of methods that operate on the data provided in the Experiment class. It can load data from an iterator or a .csv file, and passes that data through a sequence of modules such as feature extraction, tf*idf weighting, SVD, etc. Its behaviour is controlled through the settings dict provided in conf.

Parameters	Type	Doc
conf	dict	Configuration dictionary passed to the experiment class.

Attributes	Type	Doc
conf	dict	Configuration dictionary passed to the experiment class.
hasher	class	DictVectorizer class from sklearn.
decomposers	class	TruncatedSVD class from sklearn.

Methods

Function	Doc
transform	Send the data through all applicable steps to vectorize.

transform

    transform(data, fit=False)

Send the data through all applicable steps to vectorize.

Optimizer

 class Optimizer(object)

Currently a placeholder for grid search methods; it still needs to be fleshed out.

Parameters	Type	Doc
classifiers	dict, optional, default None	Dictionary where each key is an instantiated model class and each value is a dictionary of parameter settings in a (string, array) format, the same as used in the scikit-learn pipeline. For example: {LinearSVC(class_weight='balanced'): {'C': np.logspace(-3, 2, 6)}}. Note that a pipeline requires a namespace prefix (like clf__C), but the class already handles that.
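The namespace handling mentioned for classifiers can be pictured as a simple key rewrite. The following is a minimal sketch under the assumption that the pipeline step is called clf; the helper name namespace_grid is hypothetical and not part of the Optimizer API, and plain Python floats stand in for np.logspace(-3, 2, 6).

```python
def namespace_grid(grid, step="clf"):
    """Prefix each parameter name with '<step>__', as sklearn pipelines expect.

    Hypothetical helper for illustration; not part of the documented class.
    """
    return {f"{step}__{name}": values for name, values in grid.items()}


# A plain grid as the user would write it: six values from 1e-3 to 1e2.
grid = {"C": [10.0 ** e for e in range(-3, 3)]}

prefixed = namespace_grid(grid)
print(sorted(prefixed))  # ['clf__C']
```

A dictionary in this prefixed form is what scikit-learn's GridSearchCV accepts as param_grid when the estimator is a Pipeline whose final step is named clf.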

Methods

Function	Doc
best_model	Choose best parameters of trained classifiers.
choose_classifier	Choose a classifier based on settings.

best_model

    best_model()

Choose best parameters of trained classifiers.
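One plausible reading of this method is that it compares the results of several fitted grid searches and keeps the highest-scoring one. The sketch below assumes exactly that; the function pick_best is hypothetical, the SimpleNamespace objects merely imitate the best_score_ and best_params_ attributes of a fitted scikit-learn GridSearchCV, and the scores are placeholder values rather than real results.

```python
from types import SimpleNamespace


def pick_best(searches):
    """Return the search result with the highest best_score_ (assumed criterion)."""
    return max(searches, key=lambda s: s.best_score_)


# Stand-ins for fitted grid searches; values are placeholders, not measurements.
searches = [
    SimpleNamespace(name="svm", best_score_=0.81, best_params_={"clf__C": 1.0}),
    SimpleNamespace(name="nb", best_score_=0.77, best_params_={"clf__alpha": 0.1}),
]

best = pick_best(searches)
print(best.name, best.best_params_)
```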

choose_classifier

    choose_classifier(X, y, seed)

Choose a classifier based on settings.