A classification approach for heterotic performance prediction based on molecular marker data
Abstract
A number of statistical methods based on molecular data are currently available for assigning new inbreds to heterotic groups in maize (Zea mays L.), with variable results. We conjecture that the main flaw of such models is that they do not capture the non-linear relation between parental data and progeny performance. In this paper, we propose the use of supervised learning methods to handle this non-linearity. Standard and novel multiclass classification methods are evaluated. The best results are obtained with the recently introduced class of multiclass, binary-based Recursive ECOC (RECOC) classifiers. RECOC classifiers are inspired by state-of-the-art Coding Theory solutions to the problem of transmitting symbols over noisy channels. For molecular marker data, the noisy channel abstraction captures the difficulty of learning a classification function from noisy and scarce samples. Field data (top crosses between 26 inbred lines and four tester populations), processed by cluster analysis in previous work, were integrated with molecular marker data and used for training RECOC-AdaBoost Support Vector Machine RBF classifiers. A 3-fold cross-validation (3-CV) error of 34.10 % was achieved, clearly improving on previously reported results for this task.
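The noisy channel view of multiclass learning can be illustrated with a minimal sketch of plain (non-recursive) ECOC decoding. This is not the RECOC construction evaluated in this work: the code matrix, the class labels and the function names are hypothetical, chosen only for illustration. Each class is assigned a binary codeword, each bit stands for the output of one binary classifier, and a prediction is decoded to the class whose codeword is nearest in Hamming distance, so that a limited number of binary classifier errors can be corrected.

```python
# Illustrative ECOC decoding (assumption: a hand-picked 7-bit exhaustive code
# for four hypothetical classes; not the recursive codes used in the paper).
CODE_MATRIX = {
    "group_A": (1, 1, 1, 1, 1, 1, 1),
    "group_B": (0, 0, 0, 0, 1, 1, 1),
    "group_C": (0, 0, 1, 1, 0, 0, 1),
    "group_D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code_matrix):
    """Smallest pairwise Hamming distance between the class codewords."""
    words = list(code_matrix.values())
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(bits, code_matrix[label]))


# Simulate one binary classifier making a mistake on a group_C sample.
noisy = list(CODE_MATRIX["group_C"])
noisy[2] ^= 1  # flip the third bit
print(min_distance(CODE_MATRIX), decode(tuple(noisy)))
```

With a minimum distance of d between codewords, nearest-codeword decoding recovers the correct class whenever fewer than d/2 of the binary classifiers err, which is the sense in which the binary learners behave like a noisy channel.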