Train a classifier based on labeled geometries and a list of features to consider.
Tags
Learning
Long Description
This application trains a classifier based on labeled geometries and a list of features to consider for classification.
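As a rough illustration of how the parameters listed in this reference fit together, the sketch below assembles a command line for the application in Python. The launcher name `otbcli_TrainVectorClassifier`, the field names `area` and `perimeter`, and the helper `build_args` are assumptions made for the example and do not come from this reference. The sub-parameters of the `-io` and `-valid` groups are not listed here, so they are left out and would have to be added before the command could actually run.

```python
def build_args(params):
    """Flatten a {key: value} mapping into a list of command-line tokens.

    List values are expanded into several tokens after the key, which is how
    a multi-valued parameter such as -feat is assumed to be passed.
    """
    args = []
    for key, value in params.items():
        args.append("-" + key)
        if isinstance(value, (list, tuple)):
            args.extend(str(v) for v in value)
        else:
            args.append(str(value))
    return args


# Hypothetical launcher name; it is not stated in this reference.
LAUNCHER = "otbcli_TrainVectorClassifier"

params = {
    "cfield": "class",              # documented default
    "feat": ["area", "perimeter"],  # hypothetical field names
    "classifier": "rf",
    "classifier.rf.max": 5,         # documented default
    "classifier.rf.nbtrees": 100,   # documented default
    "rand": 0,
}

command = [LAUNCHER] + build_args(params)
print(" ".join(command))
```

Keeping the parameters in a mapping makes it easy to switch classifiers by replacing only the `classifier.*` keys.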
Parameters
[param] -io <string> This group of parameters allows setting input and output data. Mandatory: True. Default Value: "0"
[param] -cfield <string> Field containing the class id for supervision. Only geometries with this field available will be taken into account. Mandatory: True. Default Value: "class"
[param] -layer <int32> Index of the layer to use in the input vector file. Mandatory: False. Default Value: "0"
[param] -valid <string> This group of parameters defines validation data. Mandatory: True. Default Value: "0"
[param] -rand <int32> Set a specific random seed with an integer value. Mandatory: False. Default Value: "0"
[param] -inxml <string> Load the OTB application from an XML file. Mandatory: False. Default Value: ""
[param] -outxml <string> Save the OTB application to an XML file. Mandatory: False. Default Value: ""
[choice] -feat List of field names in the input vector data to be used as features for training. Mandatory: True. Default Value: ""
[choice] -classifier Choice of the classifier to use for the training. Available choices: libsvm, boost, dt, gbt, ann, bayes, rf, knn. Mandatory: True. Default Value: "libsvm"
[param] -classifier.libsvm.m <string> Type of SVM formulation. Mandatory: True. Default Value: "csvc"
[param] -classifier.libsvm.c <float> SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins. Mandatory: True. Default Value: "1"
[param] -classifier.boost.t <string> Type of Boosting algorithm. Mandatory: True. Default Value: "real"
[param] -classifier.boost.w <int32> The number of weak classifiers. Mandatory: True. Default Value: "100"
[param] -classifier.boost.r <float> A threshold between 0 and 1 used to save computational time. Samples with summary weight <= (1 - weight_trim_rate) do not participate in the next iteration of training. Set this parameter to 0 to turn off this functionality. Mandatory: True. Default Value: "0.95"
[param] -classifier.boost.m <int32> Maximum depth of the tree. Mandatory: True. Default Value: "1"
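The weight trimming controlled by `-classifier.boost.r` can be pictured with a small Python sketch. This is one possible reading of the description above, not the library's implementation: samples are sorted by weight, and the lightest ones are skipped as long as their cumulative share of the total weight stays at or below 1 - weight_trim_rate. The function name `trim_samples` and the sample weights are hypothetical.

```python
def trim_samples(weights, weight_trim_rate):
    """Return the indices of samples kept for the next boosting iteration.

    Assumption: "summary weight" is read as the cumulative weight of the
    lightest samples, normalised by the total weight.
    """
    n = len(weights)
    # Per the parameter description, a rate of 0 turns trimming off.
    if weight_trim_rate <= 0 or weight_trim_rate >= 1:
        return list(range(n))

    threshold = (1.0 - weight_trim_rate) * sum(weights)
    order = sorted(range(n), key=lambda i: weights[i])  # lightest first

    dropped = set()
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        if cumulative <= threshold:
            dropped.add(i)
        else:
            break
    return [i for i in range(n) if i not in dropped]


# Hypothetical sample weights that sum to 1.
weights = [0.01, 0.02, 0.47, 0.50]
kept = trim_samples(weights, 0.95)
print(kept)
```

With the default rate of 0.95, only samples whose combined weight is at most 5 percent of the total are skipped, so the heavily weighted (hard) samples always remain in training.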
[group] -dt
[param] -classifier.dt.max <int32> The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned. Mandatory: True. Default Value: "65535"
[param] -classifier.dt.min <int32> If the number of samples in a node is smaller than this parameter, then the node will not be split. Mandatory: True. Default Value: "10"
[param] -classifier.dt.cat <int32> Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split. Mandatory: True. Default Value: "10"
[param] -classifier.dt.f <int32> If cv_folds > 1, the tree is pruned using K-fold cross-validation, where K is equal to cv_folds. Mandatory: True. Default Value: "10"
[param] -classifier.dt.r <boolean> If true, pruning is harsher. This makes the tree more compact and more resistant to noise in the training data, but slightly less accurate. Mandatory: False. Default Value: "True"
[param] -classifier.dt.t <boolean> If true, then pruned branches are physically removed from the tree. Mandatory: False. Default Value: "True"
[group] -gbt
[param] -classifier.gbt.w <int32> Number "w" of boosting algorithm iterations, with w*K being the total number of trees in the GBT model, where K is the output number of classes. Mandatory: True. Default Value: "200"
[param] -classifier.gbt.p <float> Portion of the whole training set used for each algorithm iteration. The subset is generated randomly. Mandatory: True. Default Value: "0.8"
[param] -classifier.gbt.max <int32> The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned. Mandatory: True. Default Value: "3"
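The arithmetic behind `-classifier.gbt.w` and `-classifier.gbt.p` can be checked with a short Python sketch. The function names, the class count, and the training set size are hypothetical, and rounding the subset size to the nearest integer is an assumption, since the reference does not say how the portion is converted to a sample count.

```python
def gbt_tree_count(w, n_classes):
    """Total number of trees in the model: w iterations times K classes."""
    return w * n_classes


def gbt_subset_size(p, n_samples):
    """Approximate number of samples drawn at each iteration.

    Assumption: the portion p is applied to the full training set and the
    result is rounded to the nearest integer.
    """
    return round(p * n_samples)


# Documented defaults (w=200, p=0.8) with a hypothetical 4-class problem
# and a hypothetical training set of 1000 samples.
print(gbt_tree_count(200, 4))
print(gbt_subset_size(0.8, 1000))
```

Because the tree count scales with the number of classes, a problem with many classes may call for a smaller `w` to keep training and prediction time manageable.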
[group] -ann
[param] -classifier.ann.t <string> Type of training method for the multilayer perceptron (MLP) neural network. Mandatory: True. Default Value: "reg"
[param] -classifier.ann.sizes <string> The number of neurons in each intermediate layer (excluding input and output layers). Mandatory: True. Default Value: ""
[param] -classifier.ann.a <float> Alpha parameter of the activation function (used only with sigmoid and gaussian functions). Mandatory: True. Default Value: "1"
[param] -classifier.ann.b <float> Beta parameter of the activation function (used only with sigmoid and gaussian functions). Mandatory: True. Default Value: "1"
[param] -classifier.ann.bpdw <float> Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1. Mandatory: True. Default Value: "0.1"
[param] -classifier.ann.bpms <float> Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. A value of about 0.1 is usually sufficient. Mandatory: True. Default Value: "0.1"
[param] -classifier.ann.rdw <float> Initial value Delta_0 of update-values Delta_{ij} in RPROP method (default = 0.1). Mandatory: True. Default Value: "0.1"
[param] -classifier.ann.rdwm <float> Update-values lower limit Delta_{min} in RPROP method. It must be positive (default = 1e-7). Mandatory: True. Default Value: "1e-07"
[param] -classifier.ann.eps <float> Epsilon value used in the Termination criteria. Mandatory: True. Default Value: "0.01"
[param] -classifier.ann.iter <int32> Maximum number of iterations used in the Termination criteria. Mandatory: True. Default Value: "1000"
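To make the role of `-classifier.ann.sizes`, `-classifier.ann.eps` and `-classifier.ann.iter` more concrete, here is a minimal Python sketch. It assumes that the input layer has one neuron per feature, that the output layer has one neuron per class, and that training stops as soon as either termination criterion is met. None of these assumptions is stated in this reference, and the function names are hypothetical.

```python
def mlp_layer_sizes(n_features, hidden_sizes, n_classes):
    """Full layer layout: input layer, intermediate layers, output layer.

    Assumption: input size equals the number of features and output size
    equals the number of classes.
    """
    return [n_features] + list(hidden_sizes) + [n_classes]


def should_stop(iteration, change, eps=0.01, max_iter=1000):
    """Combined termination test using the documented default values.

    Assumption: training stops when the iteration budget is exhausted or
    when the change between iterations falls below eps.
    """
    return iteration >= max_iter or change < eps


# Two hypothetical intermediate layers of 10 and 5 neurons,
# for 3 features and 4 classes.
print(mlp_layer_sizes(3, [10, 5], 4))
print(should_stop(iteration=120, change=0.5))
```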
[group] -bayes
[group] -rf
[param] -classifier.rf.max <int32> The depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods. Mandatory: True. Default Value: "5"
[param] -classifier.rf.min <int32> If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data, e.g. 1 percent. Mandatory: True. Default Value: "10"
[param] -classifier.rf.ra <float> If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split. Mandatory: True. Default Value: "0"
[param] -classifier.rf.cat <int32> Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split. Mandatory: True. Default Value: "10"
[param] -classifier.rf.var <int32> The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of features. Mandatory: True. Default Value: "0"
[param] -classifier.rf.nbtrees <int32> The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Keep in mind that increasing the number of trees also increases the prediction time linearly. Mandatory: True. Default Value: "100"
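The default behaviour of `-classifier.rf.var` can be sketched in a few lines of Python. The function name `rf_subset_size` is hypothetical, and taking the integer part of the square root is an assumption, since the reference does not say how a non-integer square root is rounded.

```python
import math


def rf_subset_size(var, n_features):
    """Number of features examined at each tree node.

    A value of 0 selects the square root of the total number of features,
    as described for -classifier.rf.var. Flooring the square root (and
    keeping at least one feature) is an assumption of this sketch.
    """
    if var == 0:
        return max(1, math.isqrt(n_features))
    return var


print(rf_subset_size(0, 16))   # default: square root of 16 features
print(rf_subset_size(3, 16))   # explicit subset size
```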