Vectorizer and optimization
Vectorizer
class Vectorizer(conf=None, featurizer=None, normalizers=None, decomposers=None)
Small text mining vectorizer.
The purpose of this class is to provide a small set of methods that operate on the data provided in the Experiment class. It can load data from an iterator or a .csv file, and passes that data through a sequence of modules such as feature extraction, tf*idf weighting, and SVD. Its behavior is controlled through the settings dict provided in conf.
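The idea of accepting either an iterator or a .csv file can be illustrated with a minimal sketch. The helper name `iter_rows` is hypothetical and not part of this class; it only assumes that a string or path points to a .csv file and that anything else is already an iterable of rows.

```python
import csv
from pathlib import Path


def iter_rows(source):
    """Yield rows from a .csv path or from any iterable of rows.

    Hypothetical helper for illustration; not the class's own loader.
    """
    if isinstance(source, (str, Path)):
        # Treat strings and paths as a .csv file on disk.
        with open(source, newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle)
    else:
        # Anything else is assumed to already yield rows.
        yield from source


# Usage with an in-memory iterator of rows.
rows = list(iter_rows(iter([["label", "text"], ["pos", "nice film"]])))
```

Either source ends up as the same stream of rows, so later steps do not need to know where the data came from.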
Parameters | Type | Doc |
---|---|---|
conf | dict | Configuration dictionary passed to the experiment class. |
Attributes | Type | Doc |
---|---|---|
conf | dict | Configuration dictionary passed to the experiment class. |
hasher | class | DictVectorizer class from sklearn. |
decomposers | class | TruncatedSVD class from sklearn. |
Methods
Function | Doc |
---|---|
transform | Send the data through all applicable steps to vectorize. |
transform
transform(data, fit=False)
Send the data through all applicable steps to vectorize.
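What the `fit` flag might mean in practice can be sketched with plain scikit-learn components. This is an assumption-laden illustration, not the class's implementation: the function `vectorize`, the choice of `TfidfTransformer` for the tf*idf step, and the toy feature dicts are all made up for the example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfTransformer


def vectorize(rows, steps, fit=False):
    """Send rows through every step; fit the steps only when fit=True."""
    X = rows
    for step in steps:
        X = step.fit_transform(X) if fit else step.transform(X)
    return X


# Assumed step order: hashing of feature dicts, tf*idf weighting, SVD.
steps = [
    DictVectorizer(),
    TfidfTransformer(),
    TruncatedSVD(n_components=2, random_state=0),
]

# Toy feature dicts (hypothetical output of a feature extraction step).
train = [
    {"cat": 2, "dog": 1},
    {"dog": 3, "fish": 1},
    {"cat": 1, "fish": 2, "bird": 1},
]
test = [{"dog": 1, "bird": 2, "unseen": 5}]

X_train = vectorize(train, steps, fit=True)   # fit and transform
X_test = vectorize(test, steps, fit=False)    # reuse the fitted steps
print(X_train.shape, X_test.shape)
```

Fitting only on training data and reusing the fitted steps on test data keeps both sets in the same feature space; features not seen during fitting (here `unseen`) are ignored by `DictVectorizer.transform`.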
Optimizer
class Optimizer(object)
Currently a placeholder for grid search methods; it still needs to be fleshed out.
Parameters | Type | Doc |
---|---|---|
classifiers | dict, optional, default None | Dictionary where each key is an instantiated model class and each value is a dictionary of parameter settings in (string-array) format, the same format used in a scikit-learn pipeline. For example: {LinearSVC(class_weight='balanced'): {'C': np.logspace(-3, 2, 6)}}. Note that a pipeline requires a namespace prefix (like clf__C), but the class already handles that. |
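The namespace handling mentioned for `classifiers` can be pictured with a small, dependency-free sketch. The names `StandInModel` and `namespace_grid`, and the step name `clf`, are hypothetical; the class's own handling may differ.

```python
class StandInModel:
    """Hypothetical placeholder for an instantiated scikit-learn estimator."""


def namespace_grid(params, step="clf"):
    """Prefix each parameter name with '<step>__', as a pipeline expects."""
    return {f"{step}__{name}": values for name, values in params.items()}


model = StandInModel()
# The list holds the same values as np.logspace(-3, 2, 6).
classifiers = {model: {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}}

grids = {m: namespace_grid(p) for m, p in classifiers.items()}
print(grids[model])
```

With this kind of prefixing, the user can write plain parameter names such as `C` while the grid passed to a pipeline uses `clf__C`.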
Methods
Function | Doc |
---|---|
best_model | Choose best parameters of trained classifiers. |
choose_classifier | Choose a classifier based on settings. |
best_model
best_model()
Choose best parameters of trained classifiers.
choose_classifier
choose_classifier(X, y, seed)
Choose a classifier based on settings.
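How `choose_classifier` and `best_model` could fit together is sketched below with scikit-learn's `GridSearchCV`. This is only one possible reading of the two methods: the function names ending in `_sketch`, the 3-fold stratified split, the default accuracy scoring, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def choose_classifier_sketch(classifiers, X, y, seed):
    """Grid-search every candidate model and return the fitted searches."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    searches = []
    for model, params in classifiers.items():
        pipe = Pipeline([("clf", model)])
        # Add the pipeline namespace so callers can write plain names.
        grid = {f"clf__{name}": values for name, values in params.items()}
        searches.append(GridSearchCV(pipe, grid, cv=cv).fit(X, y))
    return searches


def best_model_sketch(searches):
    """Return the search with the highest cross-validated score."""
    return max(searches, key=lambda search: search.best_score_)


# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)
classifiers = {
    LinearSVC(class_weight="balanced"): {"C": np.logspace(-3, 2, 6)},
    LogisticRegression(max_iter=1000): {"C": [0.1, 1.0, 10.0]},
}

searches = choose_classifier_sketch(classifiers, X, y, seed=42)
best = best_model_sketch(searches)
print(best.best_params_, round(best.best_score_, 3))
```

Passing the seed to the cross-validation splitter keeps the folds identical across candidates, so their scores are compared on the same splits.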