Vectorizer and optimization

Vectorizer

 class Vectorizer(conf=None, featurizer=None, normalizers=None, decomposers=None) 

Small text mining vectorizer.

This class provides a small set of methods that operate on the data provided in the Experiment class. It can load data from an iterator or a .csv file, and it passes that data through a sequence of modules such as feature extraction, tf*idf weighting, and SVD. Its behaviour is controlled through a settings dict provided in conf.

Parameters | Type | Doc
conf | dict | Configuration dictionary passed to the experiment class.

Attributes | Type | Doc
conf | dict | Configuration dictionary passed to the experiment class.
hasher | class | DictVectorizer class from sklearn.
decomposers | class | TruncatedSVD class from sklearn.

Methods

Function | Doc
transform | Send the data through all applicable steps to vectorize.

transform

    transform(data, fit=False) 

Send the data through all applicable steps to vectorize.
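
As a rough illustration of what "all applicable steps" could look like, the sketch below chains the scikit-learn classes named in the attributes (DictVectorizer, TruncatedSVD) with a TfidfTransformer in between. The class SketchVectorizer and the helper featurize are hypothetical stand-ins and not the library's own code; in particular, the meaning given to fit (fit each step before transforming) is an assumption based on the signature.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfTransformer


class SketchVectorizer:
    """Hypothetical stand-in for illustration; not the library's Vectorizer."""

    def __init__(self, n_components=2):
        self.hasher = DictVectorizer()        # feature dicts -> sparse matrix
        self.tfidf = TfidfTransformer()       # raw counts -> tf*idf weights
        self.decomposer = TruncatedSVD(n_components=n_components,
                                       random_state=0)

    def transform(self, data, fit=False):
        # Assumption: fit=True fits every step on the data before
        # transforming it; fit=False reuses the already fitted steps.
        for step in (self.hasher, self.tfidf, self.decomposer):
            data = step.fit_transform(data) if fit else step.transform(data)
        return data


def featurize(text):
    """Hypothetical featurizer: token counts as a dict."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


train = ["the cat sat on the mat",
         "the dog ate my homework",
         "a cat and a dog"]
test = ["the cat ate the dog"]

vec = SketchVectorizer(n_components=2)
X_train = vec.transform([featurize(t) for t in train], fit=True)
X_test = vec.transform([featurize(t) for t in test])
print(X_train.shape, X_test.shape)
```

Training data is passed with fit=True so that the vocabulary, idf weights, and SVD components are learned once; test data then goes through the same fitted steps with the default fit=False.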

Optimizer

 class Optimizer(object) 

Currently a placeholder for grid search methods; it still needs to be fleshed out.

Parameters | Type | Doc
classifiers | dict, optional, default None | Dictionary where each key is an instantiated model class and each value is a dictionary of parameter settings in a (string: array) format, the same as used in the scikit-learn pipeline. For example: {LinearSVC(class_weight='balanced'): {'C': np.logspace(-3, 2, 6)}}. Note that a pipeline requires a namespace prefix (like clf__C), but the class already handles that.
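
To make the namespace remark concrete, here is a minimal sketch of how a plain parameter dictionary could be rewritten into the step-prefixed form that a scikit-learn Pipeline expects. The helper namespace_grid and the step name 'clf' are assumptions chosen for illustration; how the Optimizer does this internally is not documented here.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# The classifiers dictionary in the format described above.
classifiers = {LinearSVC(class_weight='balanced'): {'C': np.logspace(-3, 2, 6)}}


def namespace_grid(grid, step='clf'):
    """Hypothetical helper: prefix every parameter name with '<step>__'."""
    return {f'{step}__{name}': values for name, values in grid.items()}


for clf, grid in classifiers.items():
    # Assumed step name 'clf'; it must match the prefix used in the grid.
    pipe = Pipeline([('clf', clf)])
    param_grid = namespace_grid(grid)
    # Every namespaced key should be a valid parameter of the pipeline.
    assert all(key in pipe.get_params() for key in param_grid)
    print(list(param_grid))
```

Here np.logspace(-3, 2, 6) yields six values of C, from 0.001 to 100 in powers of ten.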

Methods

Function | Doc
best_model | Choose best parameters of trained classifiers.
choose_classifier | Choose a classifier based on settings.

best_model

    best_model() 

Choose best parameters of trained classifiers.

choose_classifier

    choose_classifier(X, y, seed) 

Choose a classifier based on settings.
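
Since the class is described as a placeholder, the following is only one plausible reading of how choosing a classifier from such a dictionary might work: run a grid search per classifier and keep the one with the best cross-validated score. The function choose_classifier_sketch, the 3-fold stratified split, the f1_macro scoring, and the synthetic data are all assumptions and are not taken from the library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def choose_classifier_sketch(classifiers, X, y, seed):
    """Hypothetical: grid-search each classifier, return the best search."""
    # Assumption: the seed controls how the cross-validation folds are drawn.
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    searches = []
    for clf, grid in classifiers.items():
        pipe = Pipeline([('clf', clf)])
        # Prefix parameter names with the pipeline step name.
        param_grid = {f'clf__{name}': vals for name, vals in grid.items()}
        search = GridSearchCV(pipe, param_grid, cv=cv, scoring='f1_macro')
        searches.append(search.fit(X, y))
    # Keep the search whose best parameter setting scored highest.
    return max(searches, key=lambda s: s.best_score_)


# Synthetic data, for illustration only.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
classifiers = {
    LinearSVC(class_weight='balanced', max_iter=5000):
        {'C': np.logspace(-3, 2, 6)},
    LogisticRegression(max_iter=1000): {'C': [0.1, 1.0, 10.0]},
}

best = choose_classifier_sketch(classifiers, X, y, seed=42)
print(best.best_params_, round(best.best_score_, 3))
```

The returned object is a fitted GridSearchCV, so best.best_estimator_ gives the winning pipeline refit on all of the data and best.best_params_ gives its parameter setting.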