Omesa Only - End-To-End In 2 Minutes
With the end-to-end Experiment pipeline and a configuration dictionary, several experiments or set-ups can be run and evaluated with very little code. One of the test examples provided is n-gram classification of Wikipedia documents. In this experiment, we are provided with a toy set, n_gram.csv, that features 20 articles about Machine Learning and 20 random other articles. To run the experiment, the following configuration is used:
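from omesa.experiment import Experiment
from omesa.featurizer import Ngrams
from omesa.containers import CSV  # assumed location of CSV; adjust to your Omesa version
from sklearn.naive_bayes import MultinomialNB

Experiment({
    "project": "unit_tests",
    "name": "gram_experiment",
    # column 1 holds the text, column 0 the label; the first row is a header
    "train_data": CSV("n_gram.csv", data=1, label=0, header=True),
    "lime_data": CSV("n_gram.csv", data=1, label=0, header=True),
    # character trigrams as features
    "features": [Ngrams(level='char', n_list=[3])],
    "classifiers": [
        {'clf': MultinomialNB()}
    ],
    "save": ("log")
})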
This cross-validates performance on the .csv file, selecting the text column (data=1) and the label column (label=0) and indicating that the file has a header row (header=True).
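
To make the column selection concrete, here is a minimal sketch of what reading a CSV with data=1, label=0 and header=True amounts to, using only Python's standard csv module. The helper read_labelled_csv and the sample rows are hypothetical and only illustrate the idea; this is not how Omesa's CSV container is implemented.

```python
import csv
import io


def read_labelled_csv(handle, data=1, label=0, header=True):
    """Yield (text, label) pairs from a CSV file-like object.

    Hypothetical helper, not part of Omesa: `data` and `label` are
    zero-based column indices, and `header=True` skips the first row.
    """
    reader = csv.reader(handle)
    if header:
        next(reader, None)  # discard the header row
    for row in reader:
        yield row[data], row[label]


# Tiny in-memory stand-in for n_gram.csv (made-up rows).
sample = io.StringIO(
    "label,text\n"
    "ml,Gradient descent minimises a loss function.\n"
    "other,The river flows through the old town.\n"
)
pairs = list(read_labelled_csv(sample, data=1, label=0, header=True))
print(pairs)
```

Each pair holds the document text first and its label second, regardless of the order in which the columns appear in the file.
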
We provide the Ngrams featurizer and its parameters to be used as features, and store the log.
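
As an illustration of what character n-gram features with n=3 look like, the sketch below extracts overlapping character trigrams and builds a small document-by-n-gram count matrix in plain Python. The functions char_ngrams and count_matrix are hypothetical names chosen for this example; Omesa's Ngrams featurizer may tokenize, weight, and store features differently (for instance in a sparse matrix).

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_matrix(docs, n=3):
    """Build a dense document-by-ngram count matrix and its vocabulary."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    vocab = sorted(set().union(*counts))  # every distinct n-gram, sorted
    rows = [[c[gram] for gram in vocab] for c in counts]
    return rows, vocab


docs = ["banana", "bandana"]
rows, vocab = count_matrix(docs)
print(vocab)  # ['ana', 'and', 'ban', 'dan', 'nan', 'nda']
print(rows)   # [[2, 0, 1, 0, 1, 0], [1, 1, 1, 1, 0, 1]]
```

The matrix has one row per document and one column per distinct trigram, so two short documents already give a shape of (2, 6). A classifier such as multinomial naive Bayes can then be trained on these count vectors.
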
Output
The log file will be printed during run time, as well as stored in the script's directory. The output of the current experiment is as follows:
---- Omesa ----
Config:
feature: char_ngram
n_list: [3]
name: gram_experiment
seed: 111
Sparse train shape: (20, 1287)
Tf-CV Result: 0.8