Last year marked the 10th anniversary of the Human Genome Project, which identified each of the 22,000 genes in human DNA. But as chemistry professor William Hancock pointed out, this was only a beginning.

He is co-organizing an international effort to map, one chromosome at a time, the more than 500,000 proteins encoded by our DNA (collectively called the proteome).

“The proteome tells us what is happening in an individual, whereas the genome measures the potential for disease,” said Hancock, the Bradstreet Chair in Bioanalytical Chemistry at Northeastern’s Barnett Institute of Chemical and Biological Analysis in the College of Science. “C-HPP [the Chromosome-Centric Human Proteome Project] will give a more complete parts list of what proteins are expressed in health and disease,” he said.

The project began in September, and Hancock and colleagues presented the C-HPP concept to the community in an article in the journal Nature Biotechnology in March. In April, the Journal of Proteome Research published the Standard Guidelines for the project. Next week, Hancock will join more than 1,000 researchers from around the world in Beijing for the 6th biannual congress of the Asia Oceania Human Proteome Organization to plan the next steps in the initiative.

“The Human Genome Project was largely concentrated in the west,” Hancock said. “The proteome is a much more international effort.” So far, 16 nations on three continents have signed on, with Asian nations taking a significantly more prominent role than in the past. “Such a huge project requires the resources of many countries. Also, there are genetic and disease differences between different ethnic groups and geological locations,” he said.

Every cell in the human body contains 23 pairs of chromosomes, the structural units that organize our DNA in a linear string of genes. Each C-HPP team will devote its efforts to defining the full protein parts list of a single chromosome, which contains, on average, about 1,000 protein-coding genes. Depending on genetic and environmental factors, Hancock said, each of these thousand genes can code for anywhere from one to approximately 40 protein isoforms, which will subtly alter an individual’s disease/health status. Hancock’s team at Northeastern will work on Chromosome 17, which is a particularly unstable chromosome and includes many genes associated with cancer, including the gene with the highest inherited risk for breast cancer.

The rationale for using a chromosome-centric approach, he explained, comes down to the need to integrate massive amounts of genomic and proteomic data. This is much easier to do in the context of gene location, which is determined by the chromosomes.

In addition to taking a unique approach, the C-HPP will use novel data presentations to manage the massive quantities of information that the teams will generate. For example, an interactive color map will make accessible several levels of data for each protein-coding gene, such as gene activity, protein mass spectrometry, antibody reagents, tissue localization and disease information.

“This is a major scientific undertaking and will greatly aid medical research and development of innovative, personalized drugs,” Hancock said.