Under the intense early summer sun of China’s southernmost Hainan Province, researchers are accelerating the propagation of China’s four staple grains (rice, wheat, corn and soybean) as the ambitious “Tian Shu” (Heavenly Book) initiative shifts from laboratory sequencing to large-scale field data collection.
Over the next few months, the multiplied seeds will be sown at four trial sites across the country’s major grain-producing regions, marking a crucial step in creating a standardized, high-quality dataset that integrates genotype, phenotype, and environmental data for intelligent crop breeding.
Launched in 2023 by the Institute of Crop Sciences of the Chinese Academy of Agricultural Sciences (CAAS) and China’s digital-tech giant Tencent, the “Tian Shu” initiative aims to decode the genetic information carried by crop germplasm resources — the hereditary materials that hold the keys to higher yields, stress tolerance and improved quality.
“These germplasm resources are like a heavenly book recording the mysteries of crop growth, yet they remain largely underutilized due to a lack of in-depth decoding,” said Qian Qian, an academician of the Chinese Academy of Sciences.
According to the Food and Agriculture Organization of the United Nations, as of the end of 2024, less than 10 percent of the more than 6 million conserved crop germplasm accessions globally were effectively used.
To tackle this bottleneck, the “Tian Shu” team first screened over 200,000 accessions of food-crop germplasm to select more than 10,000 representative core accessions, covering at least 95 percent of the genetic diversity of each crop while considering geographic distribution and specific traits.
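The article does not say how the team chose its core accessions. As a rough illustration of the general idea, picking a small subset that still covers most of the observed genetic diversity, here is a minimal greedy sketch in Python. The function name `select_core`, the allele labels and the toy accessions are all hypothetical, and "diversity" is simplified here to the share of distinct alleles covered.

```python
from typing import Dict, List, Set


def select_core(alleles: Dict[str, Set[str]], target: float = 0.95) -> List[str]:
    """Greedily pick accessions until `target` of all distinct alleles is covered."""
    all_alleles = set().union(*alleles.values())
    needed = target * len(all_alleles)
    covered: Set[str] = set()
    core: List[str] = []
    remaining = dict(alleles)
    while len(covered) < needed and remaining:
        # Pick the accession that adds the most not-yet-covered alleles;
        # iterating over sorted IDs makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda acc: len(remaining[acc] - covered))
        if not remaining[best] - covered:
            break  # nothing left adds new alleles
        covered |= remaining.pop(best)
        core.append(best)
    return core


# Toy data (hypothetical): accession ID -> set of allele labels.
toy = {
    "ACC1": {"a1", "b1", "c1"},
    "ACC2": {"a1", "b2"},
    "ACC3": {"a2", "b1", "c2"},
    "ACC4": {"a1", "b1"},
}
core = select_core(toy)
print(core)
```

In this toy case ACC4 is never selected, because every allele it carries is already represented by ACC1. A real selection would also have to weigh geographic distribution and specific traits, as the article notes.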
These core accessions are then decoded using high-depth whole-genome resequencing.
He Qiang, a researcher at the CAAS Institute of Crop Sciences, noted that for wheat, the technology can now obtain up to 15 gigabytes of genetic information per accession, raising data resolution by about 250,000 times compared to conventional methods.
“It’s like moving from recognizing a face to examining every detail,” He said.
BEYOND THE GENETIC CODE
But genetic sequences alone are far from sufficient. A trait is shaped by both genes and the environment, which is often summed up in the Chinese saying that “oranges grown south of the Huai River are sweet, but north of it they become bitter.”
To capture this complexity, the team has established four field trial sites across different ecological zones in the country, covering Northeast, North and South China.
At a site in Xinxiang, in central China’s Henan Province, researchers are using drone-based multispectral imaging and ground-based lidar to monitor thousands of wheat accessions around the clock. Automated weather stations and multi-parameter soil sensors record temperature, precipitation, soil salinity, pH levels, and fertility in real time.
“The core of ‘Tian Shu’ is to build a large-scale, standardized dataset that aligns genotype, phenotype and environment data for each core germplasm accession,” said He Qiang. “This enables traceability and reusability, providing solid data support for precision breeding.”
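To make the idea of an "aligned" dataset concrete, the sketch below joins three toy tables on accession ID and trial site so that each record carries genotype, phenotype and environment together. It is only an illustration under stated assumptions: the record fields, the site labels and every value are hypothetical, and the initiative's actual schema is not described in the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlignedRecord:
    accession_id: str
    site: str
    genotype_file: str       # hypothetical path to resequencing output
    plant_height_cm: float   # example phenotype measurement
    mean_temp_c: float       # example environment variable


def align(genotypes: Dict[str, str],
          phenotypes: Dict[Tuple[str, str], float],
          environments: Dict[str, float]) -> List[AlignedRecord]:
    """Join the three tables, keeping only rows present in all of them."""
    records = []
    for (acc, site), height in sorted(phenotypes.items()):
        if acc in genotypes and site in environments:
            records.append(
                AlignedRecord(acc, site, genotypes[acc], height, environments[site])
            )
    return records


# Toy inputs (all values hypothetical).
genotypes = {"ACC1": "acc1.vcf", "ACC2": "acc2.vcf"}
phenotypes = {
    ("ACC1", "SITE_A"): 82.5,
    ("ACC1", "SITE_B"): 90.0,
    ("ACC3", "SITE_A"): 75.0,  # no genotype on file, so it is dropped
}
environments = {"SITE_A": 14.8, "SITE_B": 5.2}

records = align(genotypes, phenotypes, environments)
for r in records:
    print(r)
```

Keying every row on the same accession ID and site is what makes a measurement traceable to its source material and reusable across analyses.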
Zhou Wenbin, director of the CAAS Institute of Crop Sciences and head of the “Tian Shu” initiative, said the large-scale field data collection is key to building the dataset.
“While genotypic data is now relatively easy to obtain at scale, multi-environment field phenotypic data remains the main bottleneck,” Zhou said.
He stressed that the core objective is not simply to verify the field performance of genes pre-selected in the lab.
“It’s about generating high-quality foundational data for deciphering the genetic rules behind key agronomic traits and training breeding prediction models,” Zhou explained.
The value of such a dataset is already evident from past setbacks.
Li Liang, a corn breeder at CAAS, recalled a once-promising maize germplasm that showed high and stable yields and disease resistance in early trials but failed in later regional tests due to unstable performance and poor grain setting under varying climate and soil conditions.
“If we had relevant data for prediction at an early stage, such problems might have been better avoided,” Li said.
FROM EXPERIENCE TO DATA
Zhou Wenbin said that the standardized dataset will eventually feed into a national crop germplasm information platform, providing breeders with references for germplasm requests, gene discovery, phenotype prediction and breeding design.
With the help of AI algorithms, researchers can train predictive models to recommend optimal parent combinations and breeding strategies.
“For example, to develop a corn variety suitable for the Yellow River-Huaihe River-Haihe River region, breeders will only need to input local environmental data of temperature, precipitation and soil conditions, along with target traits such as high yield, disease resistance, and lodging resistance. The model will then recommend the best solutions,” He Qiang said.
This could shorten the breeding cycle from the current eight to 10 years to just three to five years.
For Qian, the transition is profound. “Traditional breeding relies on experience and time, a process that is full of trial and error. By contrast, ‘Tian Shu’ will drive China’s seed industry from experience-driven to data-driven,” he said.
“Decoding this heavenly book of crop genetics will open up endless possibilities for the future of breeding in China.”
The current field data collection is expected to be completed within two years, after which the tripartite dataset will be opened to domestic research institutions and, conditionally, to enterprises, enabling data sharing and collaborative model building.
Reference link: https://english.news.cn/20260512/14aa84965d584906aa6d66e4a969aee7/c.html
