
HybPiper: A bioinformatic pipeline for processing target-enrichment data

November 01, 2016

With the rapid rise of next-generation sequencing technologies, disparate fields from cancer research to evolutionary biology have seen a drastic shift in the way DNA sequence data is obtained. It is now possible to sequence many genes across large numbers of species in an incredibly short period of time. And the price tag keeps getting smaller and smaller. However, the deluge of sequence data obtained using these high-throughput sequencing techniques requires a substantial amount of computational input to process--a daunting task for many biologists. A recently developed bioinformatics pipeline allows researchers with limited computational skills to quickly and efficiently extract gene regions of interest from data obtained with the increasingly popular targeted sequence capture approach.

Targeted sequence capture is a technique used to focus sequencing efforts on specific regions of the genome. By reducing the size of the target genome to only those gene regions of interest, many more samples can be sequenced concurrently. A recent study led by scientists at the Chicago Botanic Garden and available in Applications in Plant Sciences describes the pipeline, HybPiper, for recovering gene regions from sequence data obtained using this technique.

"We set out to design a tool to reliably extract gene sequences from high-throughput sequencing projects to build phylogenetic trees," explains Dr. Matthew Johnson, lead author of the study. "Scientists using next-generation sequencing technologies get their data delivered in a big pile of DNA fragments. HybPiper decides which fragments belong to which gene, assembles the fragments into a gene region, and returns the full gene sequence, including introns, in a format that can be used for downstream analysis."
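The sorting step Johnson describes, deciding which fragments belong to which gene, can be illustrated with a toy example. The sketch below is not HybPiper's method and makes no claim about its internals: it is a simplified stand-in that assigns each read to the target gene with which it shares the most short subsequences (k-mers). The function names, the value of k, and the sequences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sort_reads(reads, targets, k=8):
    """Assign each read to the target gene sharing the most k-mers with it.

    reads   -- dict mapping read name to sequence
    targets -- dict mapping gene name to reference sequence
    Reads that share no k-mer with any target are left unassigned.
    (Toy illustration only; not how HybPiper itself sorts reads.)
    """
    target_kmers = {gene: kmers(seq, k) for gene, seq in targets.items()}
    bins = defaultdict(list)
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        # Score each gene by the number of k-mers it shares with the read.
        scores = {gene: len(read_kmers & tk) for gene, tk in target_kmers.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            bins[best].append(name)
    return dict(bins)


# Hypothetical target genes and sequencing reads.
targets = {
    "geneA": "ATGGCGTACGTTAGCCTAGGCTAACGTTAGC",
    "geneB": "ATGCCCTTTGGGAAACCCTTTGGGAAATTTC",
}
reads = {
    "read1": "GCGTACGTTAGCCTAG",  # drawn from geneA
    "read2": "CCTTTGGGAAACCCTT",  # drawn from geneB
    "read3": "TTTTTTTTTTTTTTTT",  # off-target, matches neither gene
}

print(sort_reads(reads, targets))
```

In this toy run, read1 lands in the geneA bin, read2 in the geneB bin, and the off-target read3 is discarded. A real pipeline would then assemble the reads in each bin into a contiguous gene sequence.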

The pipeline brings together a number of Python scripts and free-standing programs to create a simple-to-use workflow for processing large amounts of sequence data. "We used a variety of tools at each phase, and tweaked the parameter settings until we were consistently recovering the right sequence. We also tried to be sensitive to different targeted sequencing designs--for example, not everyone will be able to design probes from a closely related genome. This flexibility is reflected in a large number of customizable parameters in HybPiper to better fit each individual project," explains Johnson.

One feature that is particularly useful, especially for those researchers working with plants, is HybPiper's ability to detect duplicate genes. Because all flowering plants, for example, have at least one whole genome duplication in their shared evolutionary history, the detection of paralogous gene copies is an essential part of accurately estimating species relationships. This, however, can be an exceedingly difficult and time-consuming task. Enter HybPiper. Built into the pipeline is the ability to detect duplicate genes within a molecular dataset. Johnson explains, "Sorting DNA sequencing fragments can be tricky when what seems like one gene is really two closely related genes. HybPiper has tools that will allow users to avoid this issue and detect whether a gene has been duplicated in their study organism."

Dr. Johnson concludes, "Development of HybPiper is ongoing. We have set up a website that helps users with installation issues and a comprehensive tutorial using an example dataset. We encourage users to provide feedback and suggest new features that will help them with their target enrichment analysis."

Johnson, M. G., E. M. Gardner, Y. Liu, R. Medina, B. Goffinet, A. J. Shaw, N. J. C. Zerega, and N. J. Wickett. 2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Applications in Plant Sciences 4(7): 1600016. doi:10.3732/apps.1600016

Applications in Plant Sciences (APPS) is a monthly, peer-reviewed, open access journal focusing on new tools, technologies, and protocols in all areas of the plant sciences. It is published by the Botanical Society of America, a nonprofit membership society with a mission to promote botany, the field of basic science dealing with the study and inquiry into the form, function, development, diversity, reproduction, evolution, and uses of plants and their interactions within the biosphere. APPS is available as part of BioOne's Open Access collection.

For further information, please contact the APPS staff at

Botanical Society of America
