////
Copyright (c) 2024 The C++ Alliance, Inc. (https://cppalliance.org)

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Official repository: https://github.com/boostorg/website-v2-docs
////
= Bio-Tech Engineering
:navtitle: Bio-Tech Engineering

Boost libraries give you low-level, auditable control over algorithms and numeric behavior. That makes them well suited to computational biology, genomics pipelines, and regulated biomedical software, where results must be verifiable, reproducible, and stable.

This topic surveys the libraries that can help when building a bio-tech tool, then works through an example of a data structure that evolves over time: a phylogenetic tree.

[square]
* <<Libraries>>
* <<Working with Phylogenetic Trees>>
* <<Create a Phylogenetic Tree>>
* <<Persist a Phylogenetic Tree>>
* <<Next Steps>>
* <<Footnotes>>
* <<See Also>>

== Libraries

* boost:graph[] : Provides the features needed for phylogenetic trees (explored further in this topic), as well as interaction networks, pathway graphs, and other structures common in bio-tech work. The library supports efficient graph traversal, shortest-path, and labeled-graph algorithms.

* boost:filesystem[] and boost:serialization[] : Provide the persistent storage needed for long runs and simulations.

* boost:asio[] : Supports distributed bio-tech platforms, which might be used for cloud-based analysis, multi-lab collaboration, and working with streaming or experimental data.

* boost:multi-array[] : Particularly useful for medical and biological image and signal processing, where multi-dimensional image or sensor data is a key component.

* boost:math[], boost:multiprecision[], boost:random[] : Provide numerical and simulation support for tasks such as energy accumulation, long-running simulations, validation, protein folding, molecular dynamics, docking simulations, and energy minimization, among many others.

* boost:container[] : Useful for memory-stable storage of work in progress. Notable scenarios include particle and atom collections, cache-friendly structures, and deterministic memory layout.

Note:: The code in this tutorial was written and tested using Microsoft Visual Studio (Visual C++ 2022, Console App project) with Boost version 1.88.0.

== Working with Phylogenetic Trees

[#footnote1-location]
In biotechnology and molecular biology, _phylogenetic trees_ link:#footnote1[(1)] have become go-to data structures: researchers compare the DNA, RNA, or protein sequences of multiple organisms to infer evolutionary relationships and similarities. This is particularly important in drug discovery, where understanding conserved sequences across species can indicate potential targets for antibiotics, antivirals, or cancer therapies. Similarly, phylogenetic analysis helps scientists track how viruses like influenza or SARS-CoV-2 mutate over time, and predict which strains are most likely to dominate.

[#footnote2-location]
Beyond genomics, phylogenetic trees are also widely used in synthetic biology and _metagenomics_ link:#footnote2[(2)]. In synthetic biology, phylotrees guide the design of engineered organisms or genetic circuits.

Phylogenetic trees have also proven valuable in epidemiology and biotechnology safety monitoring. By mapping the evolutionary history of pathogens, scientists can trace the source of outbreaks, identify transmission patterns, and evaluate the emergence of drug resistance.

[#footnote3-location]
In medicine, metagenomic analyses of the _human microbiome_ link:#footnote3[(3)] help researchers understand the links between microbial communities and diseases such as obesity, diabetes, and autoimmune disorders. In environmental science, metagenomics can identify microorganisms capable of degrading pollutants or producing biofuels. In industrial biotech, it guides the optimization of microbial consortia for fermentation, waste treatment, or bioproduct synthesis.

A key challenge in metagenomics is handling the vast, complex datasets generated by high-throughput sequencing. Computational tools that efficiently model, manipulate, and analyze these data, including phylogenetic trees derived from metagenomic sequences, are essential for drawing sound conclusions.

== Create a Phylogenetic Tree

Phylogenetic trees provide a natural way to organize and visualize complex communities. With a Boost-based framework like the following example, each step of tree construction (adding new sequences, refining branches, removing erroneous data) can be captured and displayed:

[source,cpp]
----
#include <boost/graph/adjacency_list.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
#include <iterator>
#include <string>

// ------------------------------
// Vertex data
// ------------------------------
struct Species
{
    std::string name;
};

// ------------------------------
// Graph type
// ------------------------------
using PhyloTree = boost::adjacency_list<
    boost::vecS,
    boost::vecS,
    boost::directedS,
    Species
>;

using Vertex = boost::graph_traits<PhyloTree>::vertex_descriptor;

// ------------------------------
// Recursive ASCII tree printer
// ------------------------------
void print_tree(const PhyloTree& tree, Vertex v,
    const std::string& prefix = "",
    bool is_last = true)
{
    std::cout << prefix
        << (is_last ? "|__ " : "|_ ")
        << tree[v].name << "\n";

    auto children = boost::make_iterator_range(out_edges(v, tree));

    for (auto it = children.begin(); it != children.end(); ++it)
    {
        Vertex child = target(*it, tree);
        bool last = (std::next(it) == children.end());

        // Children of a last node are indented with spaces; otherwise
        // a vertical bar continues the parent's branch.
        print_tree(tree,
            child,
            prefix + (is_last ? "    " : "|   "),
            last);
    }
}

// ------------------------------
// Print full tree from root
// ------------------------------
void show_tree(const PhyloTree& tree, Vertex root,
    const std::string& title)
{
    std::cout << "\n=== " << title << " ===\n";
    print_tree(tree, root);
}

int main()
{
    PhyloTree tree;

    // ------------------------------
    // Create root
    // ------------------------------
    auto luca = add_vertex({ "LUCA" }, tree);
    show_tree(tree, luca, "Initial tree");

    // ------------------------------
    // Add primary domains
    // ------------------------------
    auto bacteria = add_vertex({ "Bacteria" }, tree);
    auto archaea = add_vertex({ "Archaea" }, tree);
    auto eukarya = add_vertex({ "Eukarya" }, tree);

    add_edge(luca, bacteria, tree);
    add_edge(luca, archaea, tree);
    add_edge(luca, eukarya, tree);

    show_tree(tree, luca, "After adding domains");

    // ------------------------------
    // Add Eukarya branches
    // ------------------------------
    auto animals = add_vertex({ "Animals" }, tree);
    auto plants = add_vertex({ "Plants" }, tree);
    auto fungi = add_vertex({ "Fungi" }, tree);

    add_edge(eukarya, animals, tree);
    add_edge(eukarya, plants, tree);
    add_edge(eukarya, fungi, tree);

    show_tree(tree, luca, "After Eukarya speciation");

    // ------------------------------
    // Add animal subclasses
    // ------------------------------
    auto mammals = add_vertex({ "Mammals" }, tree);
    auto reptiles = add_vertex({ "Reptiles" }, tree);

    add_edge(animals, mammals, tree);
    add_edge(animals, reptiles, tree);

    show_tree(tree, luca, "After animal diversification");

    // ------------------------------
    // Scientific correction: prune plants
    // ------------------------------
    // With vecS vertex storage, remove_vertex() renumbers every vertex
    // added after plants, so fungi, mammals, and reptiles must not be
    // used after this point. luca (index 0) is unaffected.
    clear_vertex(plants, tree);
    remove_vertex(plants, tree);

    show_tree(tree, luca, "After pruning Plants");

    return 0;
}
----

Note:: Functions such as `add_vertex` and `add_edge`, and many others, are provided by the boost:graph[] library, so little additional code is needed to manipulate a tree.

Run the code:

[source,text]
----
=== Initial tree ===
|__ LUCA

=== After adding domains ===
|__ LUCA
    |_ Bacteria
    |_ Archaea
    |__ Eukarya

=== After Eukarya speciation ===
|__ LUCA
    |_ Bacteria
    |_ Archaea
    |__ Eukarya
        |_ Animals
        |_ Plants
        |__ Fungi

=== After animal diversification ===
|__ LUCA
    |_ Bacteria
    |_ Archaea
    |__ Eukarya
        |_ Animals
        |   |_ Mammals
        |   |__ Reptiles
        |_ Plants
        |__ Fungi

=== After pruning Plants ===
|__ LUCA
    |_ Bacteria
    |_ Archaea
    |__ Eukarya
        |_ Animals
        |   |_ Mammals
        |   |__ Reptiles
        |__ Fungi
----

Consider adding your own calls to create and prune branches and leaves as appropriate.

[#footnote4-location]
Note:: _LUCA_ stands for "Last Universal Common Ancestor", a key concept in evolutionary modeling link:#footnote4[(4)].
+
LUCA : _Primordial ocean of glowing protocells_ :
+
image:bio-tech-luca.png[LUCA]

== Persist a Phylogenetic Tree

Next, extend the program so that it saves the tree to disk and loads it back:

[source,cpp]
----
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/adj_list_serialize.hpp>
#include <boost/range/iterator_range.hpp>

#include <boost/serialization/vector.hpp>
#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/nvp.hpp>

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/filesystem.hpp>

#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <thread>

// ------------------------------
// Vertex data
// ------------------------------
struct Species
{
    std::string name;

    // Called by Boost.Serialization for both saving and loading
    template<class Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        ar & BOOST_SERIALIZATION_NVP(name);
    }
};

// ------------------------------
// Graph type
// ------------------------------
using PhyloTree = boost::adjacency_list<
    boost::vecS,
    boost::vecS,
    boost::directedS,
    Species
>;

using Vertex = boost::graph_traits<PhyloTree>::vertex_descriptor;

// ------------------------------
// ASCII tree printer
// ------------------------------
void print_tree(const PhyloTree& tree, Vertex v,
    const std::string& prefix = "",
    bool is_last = true)
{
    std::cout << prefix
        << (is_last ? "|_ " : "|__")
        << tree[v].name << "\n";

    auto children = boost::make_iterator_range(out_edges(v, tree));

    for (auto it = children.begin(); it != children.end(); ++it)
    {
        Vertex child = target(*it, tree);
        bool last = (std::next(it) == children.end());

        print_tree(tree,
            child,
            prefix + (is_last ? "    " : "|   "),
            last);
    }
}

// ------------------------------
// Animated display
// ------------------------------
void animate(const PhyloTree& tree, Vertex root,
    const std::string& title,
    int delay_ms = 800)
{
    std::cout << "\n=== " << title << " ===\n";
    print_tree(tree, root);

    // Pause so that each stage is visible before the next is printed
    std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
}

// ------------------------------
// Save / load helpers using Boost.Filesystem
// ------------------------------
void save_tree(const PhyloTree& tree,
    const boost::filesystem::path& file)
{
    boost::filesystem::create_directories(file.parent_path());

    std::ofstream ofs(file.string());
    boost::archive::text_oarchive oa(ofs);

    oa << BOOST_SERIALIZATION_NVP(tree);
}

void load_tree(PhyloTree& tree,
    const boost::filesystem::path& file)
{
    if (!boost::filesystem::exists(file))
        throw std::runtime_error("Tree file does not exist");

    std::ifstream ifs(file.string());
    boost::archive::text_iarchive ia(ifs);

    ia >> BOOST_SERIALIZATION_NVP(tree);
}

// ------------------------------
// Main demo
// ------------------------------
int main()
{
    PhyloTree tree;

    auto luca = add_vertex({ "LUCA" }, tree);
    animate(tree, luca, "Initial tree");

    auto bacteria = add_vertex({ "Bacteria" }, tree);
    auto archaea = add_vertex({ "Archaea" }, tree);
    auto eukarya = add_vertex({ "Eukarya" }, tree);

    add_edge(luca, bacteria, tree);
    add_edge(luca, archaea, tree);
    add_edge(luca, eukarya, tree);

    animate(tree, luca, "Domains of life");

    auto animals = add_vertex({ "Animals" }, tree);
    auto plants = add_vertex({ "Plants" }, tree);
    auto fungi = add_vertex({ "Fungi" }, tree);

    add_edge(eukarya, animals, tree);
    add_edge(eukarya, plants, tree);
    add_edge(eukarya, fungi, tree);

    animate(tree, luca, "Eukaryotic diversification");

    auto mammals = add_vertex({ "Mammals" }, tree);
    auto reptiles = add_vertex({ "Reptiles" }, tree);

    add_edge(animals, mammals, tree);
    add_edge(animals, reptiles, tree);

    animate(tree, luca, "Animal diversification");

    // Scientific correction: prune plants
    // (vecS renumbers the vertices added after plants, so fungi,
    // mammals, and reptiles must not be used after this point)
    clear_vertex(plants, tree);
    remove_vertex(plants, tree);

    animate(tree, luca, "After pruning Plants");

    // ------------------------------
    // Persist tree to disk
    // ------------------------------
    boost::filesystem::path data_dir = "data";
    boost::filesystem::path tree_file = data_dir / "phylo_tree.dat";

    save_tree(tree, tree_file);
    std::cout << "\nTree saved to " << tree_file << "\n";

    // ------------------------------
    // Reload and verify
    // ------------------------------
    PhyloTree loaded;
    load_tree(loaded, tree_file);

    std::cout << "\n=== Reloaded tree ===\n";

    // luca is index 0 under vecS, so the same descriptor also
    // identifies the root of the reloaded graph
    print_tree(loaded, luca);

    return 0;
}
----

Run the code:

[source,text]
----
=== Initial tree ===
|_ LUCA

=== Domains of life ===
|_ LUCA
    |__Bacteria
    |__Archaea
    |_ Eukarya

=== Eukaryotic diversification ===
|_ LUCA
    |__Bacteria
    |__Archaea
    |_ Eukarya
        |__Animals
        |__Plants
        |_ Fungi

=== Animal diversification ===
|_ LUCA
    |__Bacteria
    |__Archaea
    |_ Eukarya
        |__Animals
        |   |__Mammals
        |   |_ Reptiles
        |__Plants
        |_ Fungi

=== After pruning Plants ===
|_ LUCA
    |__Bacteria
    |__Archaea
    |_ Eukarya
        |__Animals
        |   |__Mammals
        |   |_ Reptiles
        |_ Fungi

Tree saved to "data\phylo_tree.dat"

=== Reloaded tree ===
|_ LUCA
    |__Bacteria
    |__Archaea
    |_ Eukarya
        |__Animals
        |   |__Mammals
        |   |_ Reptiles
        |_ Fungi
----

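Locate the folder containing your program, then examine the contents of `data\phylo_tree.dat`:

[source,text]
----
22 serialization::archive 20 0 0 8 7 0 0 4 LUCA 8 Bacteria 7 Archaea 7 Eukarya 7 Animals 5 Fungi 7 Mammals 8 Reptiles 0 1 0 0 0 2 0 3 3 4 3 5 4 6 4 7
----

Reading this text archive: after the header, `8 7` gives the vertex and edge counts, each name is preceded by its length, and the trailing numbers list the edges as source and target indices (for example, `3 4` is the edge from Eukarya to Animals).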

== Next Steps

Trees are a form of graph. If manipulating them is part of your bio-tech projects, consider exploring the full range of structures and algorithms available in boost:graph[].

== Footnotes

[#footnote1]
link:#footnote1-location[(1)]
A phylogenetic tree, often called a “phylotree”, is a diagrammatic representation of the evolutionary relationships among various biological species or entities, based on similarities and differences in their genetic, morphological, or biochemical characteristics. Each node represents a species (or other taxonomic unit), and the edges (or branches) denote the evolutionary paths connecting them. The root of the tree represents the _most recent common ancestor_ (often shortened to _MRCA_) of all the species in the diagram, while the leaves correspond to current-day organisms or sequences. Phylogenetic trees can be either _rooted_, showing directionality of evolution, or _unrooted_, highlighting relationships without assuming an ancestral origin.

[#footnote2]
link:#footnote2-location[(2)]
Metagenomics is the study of genetic material recovered directly from environmental samples, rather than from individual, isolated organisms. It allows scientists to analyze entire microbial communities in their natural habitats (such as soil, water, the human gut, or industrial bioreactors) without the need to culture the organisms in a lab. By sequencing and analyzing the collective genomes of these communities, metagenomics provides insights into the diversity, function, and interactions of microorganisms that might otherwise remain hidden.

[#footnote3]
link:#footnote3-location[(3)]
The human microbiome refers to the vast community of microorganisms that live in and on the human body, including bacteria, viruses, fungi, and archaea. These microbes inhabit diverse environments such as the gut, skin, mouth, respiratory tract, and urogenital system. Collectively, the genomes of these microbes contain far more genes than the human genome itself, forming a complex biological system that interacts continuously with its host.

Gut microbes, for example, help digest complex carbohydrates, synthesize vitamins, regulate the immune system, and protect against pathogenic organisms. Disruptions to the microbiome, often called dysbiosis, have been linked to a wide range of conditions including inflammatory bowel disease, obesity, diabetes, allergies, and even neurological disorders. As a result, microbiome research has become a major focus in areas such as precision medicine, probiotic development, and microbiome-targeted therapies.

From a computational perspective, studying the human microbiome relies heavily on metagenomics and phylogenetic analysis.

[#footnote4]
link:#footnote4-location[(4)]
LUCA, or _Last Universal Common Ancestor_, refers to the most recent organism (or often population of organisms) from which all life on Earth descends. In phylogenetic trees, LUCA is often used as the root node, representing the point before life diversified into the major domains (such as Bacteria, Archaea, and Eukarya). Importantly, LUCA is not thought of as the very first living thing, but rather as the latest shared ancestor of all existing life: a complex organism that already possessed fundamental biological machinery like genetic coding, ribosomes, and basic metabolic pathways. That is why using LUCA as the root in the examples in this topic is both scientifically meaningful and intuitive: it provides a natural starting point from which the evolutionary branches unfold.

== See Also

* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Algorithms[Category: Algorithms]
* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Containers[Category: Containers]
* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Data[Category: Data structures]
* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Math[Category: Math and numerics]