A new report details how artificial intelligence (AI) can be used to create efficient models for the genomic selection of sugarcane and forage grass varieties, while predicting their performance in the field based on their DNA.
This is the first time that a highly efficient machine learning-based genomic selection method has been proposed for polyploid plants – in which cells have more than two complete sets of chromosomes.
The methodology, published in Scientific Reports, improved the predictive power of machine learning by more than 50%. This means that the model is much more accurate than traditional breeding techniques.
The complexity of breeding techniques
Machine learning is a branch of AI that combines computational statistics and optimization, and it has countless applications. Its main goal is to create algorithms that automatically extract patterns from datasets. In plant breeding, it can be applied to a wide range of traits, including plant performance, resistance, and tolerance to biotic and abiotic stresses, the latter including cold, drought, salinity and soil nutrient deficiency.
In traditional breeding programs, crossbreeding is the most widely used technique. Alexandre Hild Aono, computer scientist and lead author of the study, said: “You establish populations by crossing interesting plants. In the case of sugarcane, we cross a variety that produces a lot of sugar with another that is more resistant, for example. You cross them and then evaluate the performance of the genotypes obtained in the field.”
He continued: “But this evaluation process is very time-consuming and very expensive. Our genomic selection method makes it possible to predict the performance of these plants even before they grow. We were able to predict yield based on genetic material. This is important because it saves many years of evaluation.”
In the case of sugarcane, the prediction of plant performance is very complex. Traditional breeding techniques take nine to 12 years and come with high costs, according to Anete Pereira de Souza, professor of plant genetics at the Center for Molecular Biology and Genetic Engineering at the State University of Campinas.
“When breeders identify an interesting plant, they multiply it by cloning so as not to lose the genotype, but this takes time and is expensive. An extreme example is rubber tree farming, which can take up to 30 years,” Souza explained.
One method that can be used to overcome these difficulties is “Plant Breeding 4.0”, which makes extensive use of data analysis and highly efficient computational and statistical tools. Each genomic selection model can involve up to a billion sequences.
However, the main obstacle scientists face in breeding better polyploid plant varieties is the complexity of their genomes. “In this case, we didn’t even know if genomic selection would be possible, given the limited resources and the difficulty of working with this complexity,” Aono said.
Developing a new method to predict plant performance
The researchers started the genomic selection process with diploid plants because they have chromosomes similar to those of polyploid plants. Souza said, “The problem is that high-value tropical plants like sugarcane are not diploid but polyploid, which is a complication.”
While humans and most animals are diploid, sugarcane can have up to 12 copies of each chromosome. Any individual of the species Homo sapiens can have up to two variants of each gene, one inherited from the father and one from the mother. Sugarcane is more complex because any gene can have many variants in the same individual, with some genomes having six, eight, ten or even 12 sets of chromosomes.
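To make the difference in complexity concrete, here is a minimal sketch of how a genotype at a single site can be summarized as an allele "dosage", meaning the number of copies of one variant an individual carries. The function names and the example genotype strings are illustrative assumptions, not taken from the study.

```python
def allele_dosage(genotype: str, variant: str) -> int:
    """Count how many chromosome copies carry the given variant.

    `genotype` is a hypothetical string with one letter per chromosome
    copy, e.g. "AG" for a diploid or "AAAGGGGGGGGG" for 12 copies.
    """
    return genotype.count(variant)


def possible_dosages(ploidy: int) -> list[int]:
    """All dosage values an individual with `ploidy` copies can have."""
    return list(range(ploidy + 1))


# A diploid individual can carry 0, 1 or 2 copies of a variant.
print(possible_dosages(2))
# With 12 copies of a chromosome there are 13 possible dosage classes.
print(len(possible_dosages(12)))
# Hypothetical sugarcane-like genotype: 9 of the 12 copies carry G.
print(allele_dosage("AAAGGGGGGGGG", "G"))
```

The more copies there are, the more dosage classes a model has to tell apart, which is one way to picture why polyploid genomes are harder to work with.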
“The genetics are so complex that breeders work with sugarcane as if it were diploid,” Souza said.
Can genomic selection effectively predict plant performance?
In 2001, Theodorus Meuwissen, a Dutch scientist, proposed genomic selection to predict complex traits in animals and plants from their genomes in association with their phenotypes – which are observable characteristics resulting from the interaction of their genotypes with the environment. The advantage of this plant breeding approach is the link between the phenotypic traits of interest, such as yield, sugar level or earliness, and single nucleotide polymorphisms (SNPs). A SNP is a genomic variant at a single base position in DNA.
Souza explained, “It’s the difference in the genomes of two individuals. For example, we can have one individual with an A [corresponding to the nucleotide adenine] that produces a little more than another with a G [guanine] at the same place in the genome. It changes everything.
“When you find an association between what you’re looking for, such as a high level of sugar production, and specific SNPs at different locations in the genome, you need only sequence the population that your breeding work focuses on.”
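The idea of associating SNPs with a trait can be sketched as a simple scan that correlates each marker with the phenotype. This is only an illustration of the general concept: the function name, the toy dosage matrix and the phenotype values are assumptions made for the example, and the study's actual models are more sophisticated than a per-marker correlation.

```python
import numpy as np


def marker_trait_correlation(dosages, phenotype):
    """Pearson correlation between each SNP column and the phenotype.

    dosages:   (n_individuals, n_snps) array of variant copy counts
    phenotype: (n_individuals,) array, e.g. sugar production
    Markers with no variation get a correlation of 0.
    """
    X = np.asarray(dosages, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    Xc = X - X.mean(axis=0)          # center each marker
    yc = y - y.mean()                # center the trait
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.zeros(X.shape[1])
    varying = denom > 0
    r[varying] = (Xc.T @ yc)[varying] / denom[varying]
    return r


# Toy data (hypothetical): 4 individuals, 2 SNPs.
toy_dosages = np.array([[0, 2], [1, 0], [2, 1], [3, 1]])
# Production rises by 2 units per copy of the variant at the first SNP.
toy_production = 10 + 2 * toy_dosages[:, 0]

r = marker_trait_correlation(toy_dosages, toy_production)
strongest = int(np.argmax(np.abs(r)))
print("correlations:", r)
print("most strongly associated SNP index:", strongest)
```

In this toy case the first SNP is the one flagged, because production was constructed to depend on it alone.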
The genomic selection method proposed by the team eliminates the need to plant and phenotype throughout the selection cycle. “We do field experiments in the early stages of the program to get the phenotype of interest for each clone,” Souza said.
“At the same time, we quite simply sequence all the clones of the breeding population, without needing to have the complete genome for each clone. This is called genotyping by sequencing – partial sequencing looking for differences and similarities in the base pairs of clones, and their association with the production of each clone.
“The association between phenotype and genome shows who produces the most and which SNPs are associated with higher production. In this way, we can identify clones with a large proportion of SNPs that contribute to the higher production observed in the initial experiments and obtain the most productive variety faster and at lower cost.”
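The workflow Souza describes, learning from clones that were both genotyped and evaluated in the field and then ranking clones from their DNA alone, can be sketched as follows. This is not the team's method: it swaps in plain ridge regression, a common baseline for genomic prediction, and runs on simulated data. The ploidy of 6, the marker counts, the penalty `lam` and all function names are assumptions chosen for illustration.

```python
import numpy as np


def fit_ridge(dosages, phenotype, lam=1.0):
    """Estimate one effect per SNP with ridge regression."""
    X = np.asarray(dosages, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    effects = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return effects, x_mean, y_mean


def predict(dosages, effects, x_mean, y_mean):
    """Predict the phenotype of clones from their SNP dosages."""
    return (np.asarray(dosages, dtype=float) - x_mean) @ effects + y_mean


def simulate(n_clones, ploidy, true_effects, rng):
    """Simulated dosages (0..ploidy) and a noisy additive phenotype."""
    X = rng.integers(0, ploidy + 1, size=(n_clones, true_effects.size))
    y = X @ true_effects + rng.normal(0.0, 1.0, size=n_clones)
    return X, y


rng = np.random.default_rng(0)
true_effects = rng.normal(0.0, 1.0, size=100)   # 100 simulated SNPs

# Clones with both genotypes and field data (early-stage experiments).
X_train, y_train = simulate(200, 6, true_effects, rng)
# New clones: genotyped only; y_new is kept just to check the sketch.
X_new, y_new = simulate(50, 6, true_effects, rng)

model = fit_ridge(X_train, y_train, lam=1.0)
predicted = predict(X_new, *model)

top5 = np.argsort(predicted)[::-1][:5]          # best predicted clones
accuracy = np.corrcoef(predicted, y_new)[0, 1]  # predicted vs. observed
print("top 5 clones by predicted production:", top5)
print("correlation between prediction and simulated field value:", accuracy)
```

Because the simulated trait is purely additive with little noise, this sketch predicts far more easily than real sugarcane data would allow; it only shows the shape of the train-then-rank procedure.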