////
Copyright (c) 2024 The C++ Alliance, Inc. (https://cppalliance.org)

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Official repository: https://github.com/boostorg/website-v2-docs
////
= High-Performance Database Engine
:navtitle: Database Engine

Creating a high-performance database application in pass:[C++] involves a range of tasks, including efficient data structures, shared and optimized memory management, safe message and network communication, persistent storage, and so much more. This section examines how to get started.

[square]
* <<Libraries>>
* <<Sample Database Engine using Containers>>
* <<Optimize Memory Allocation>>
* <<Use Persistent Shared Memory>>
* <<Safely Allow Access from Multiple Processes>>
* <<Add Serialization to Archive the Database>>
* <<Next Steps>>
* <<See Also>>

== Libraries

Here are some Boost libraries that might be useful when planning and building your database app:

[circle]
* boost:container[] : Provides STL-compatible containers, including stable vector, flat set/map and more. The containers provided by this library can offer performance benefits over their standard library equivalents, making them a good fit for a high-performance database application.

* boost:pool[] : This library is used for simple, fast memory allocation and can improve efficiency in some scenarios by managing memory in chunks.

* boost:interprocess[] : This library allows for shared memory communication and synchronization between processes. In a database context, this can be useful for inter-process communication (IPC) and shared memory databases.

* boost:lockfree[] : Provides lock-free data structures which could be useful in multi-threaded database applications where you want to avoid locking overhead.

* boost:serialization[] : If you need to serialize objects for storage, boost:serialization[] can be a useful tool. However, be aware that for many database applications, more specialized serialization formats (like Protocol Buffers, Thrift, etc.) might be more appropriate.
* boost:asio[] : Provides a consistent asynchronous model using a modern pass:[C++] approach for network and low-level I/O programming. It supports a variety of network protocols, which could be helpful if your database needs to communicate over a network.

* boost:thread[] : Provides a portable interface for multithreading, which can be crucial when creating a high-performance database that can handle multiple queries concurrently.

* boost:fiber[] : Allows you to write code that works with fibers, which are user-space threads that can be used to write concurrent code. This can be useful in situations where you have many tasks that need to run concurrently but are I/O-bound rather than CPU-bound.

* boost:polygon[] or boost:geometry[] : For storing and querying spatial data, these libraries can provide the necessary data types and algorithms.

* boost:filesystem[] : Provides a portable way of querying and manipulating paths, files, and directories.

Note:: The code in this tutorial was written and tested using Microsoft Visual Studio (Visual C++ 2022, Console App project) with Boost version 1.88.0.

== Sample Database Engine using Containers

A database engine requires efficient data structures for handling indexes, caches, and storage layouts. The boost:container[] library provides STL-compatible counterparts to standard containers such as `std::vector` and `std::map`, along with alternatives such as `flat_map` and `stable_vector` that trade off memory layout, insertion cost, and lookup speed differently.

In the following sample code, we will use in-memory indexing as the basis of a database engine. The boost:container[] `flat_map` container stores a sorted index for quick lookups, and the `stable_vector` container stores the records so that their addresses remain stable as the table grows. The sample demonstrates inserting and retrieving records efficiently.
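To make the "sorted index" idea concrete before the Boost-based sample, here is a minimal sketch that uses only the standard library. `SortedIndex` is a hypothetical name invented for this illustration, and the code is not how boost:container[] implements `flat_map`. It only shows the general layout idea: keys kept sorted in one contiguous array and found by binary search.

[source,cpp]
----
#include <algorithm>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// SortedIndex is a hypothetical name used only for this illustration.
// It maps a key to a record position using one sorted, contiguous array,
// which is the general layout idea behind a flat map.
class SortedIndex {
public:
    // Insert a key, or overwrite the position of an existing key.
    // Later elements are shifted to keep the array sorted, so this is O(n).
    void put(int key, std::size_t position) {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key, key_less);
        if (it != entries_.end() && it->first == key) {
            it->second = position;
        } else {
            entries_.emplace(it, key, position);
        }
    }

    // Binary search over contiguous memory: O(log n) comparisons.
    std::optional<std::size_t> get(int key) const {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key, key_less);
        if (it != entries_.end() && it->first == key) {
            return it->second;
        }
        return std::nullopt;
    }

    std::size_t size() const { return entries_.size(); }

private:
    using Entry = std::pair<int, std::size_t>;

    // Comparator for std::lower_bound: orders an entry against a bare key.
    static bool key_less(const Entry& entry, int key) { return entry.first < key; }

    std::vector<Entry> entries_;
};
----

Calling `put(102, 1)` and then `get(102)` yields the stored position, while `get` on a missing key yields an empty `std::optional`. The trade-off is visible in `put`: lookups touch contiguous memory, but each insertion may shift later elements, so this layout suits tables that are read far more often than they are modified.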
[source,cpp]
----
#include <boost/container/flat_map.hpp>
#include <boost/container/stable_vector.hpp>
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

// Define a Simple Database Table Structure
struct Record {
    int id;            // Primary Key
    std::string name;  // Represents record data

    Record(int id, std::string name) : id(id), name(std::move(name)) {}
};

// Implement a Database Table Class
class DatabaseTable {
public:
    using RecordStorage = boost::container::stable_vector<Record>;
    using IndexMap = boost::container::flat_map<int, std::size_t>;  // Fast lookup

    void insert(int id, const std::string& name) {
        std::size_t index = records.size();
        records.emplace_back(id, name);
        index_map[id] = index;
    }

    const Record* find(int id) const {
        auto it = index_map.find(id);
        if (it != index_map.end()) {
            return &records[it->second];
        }
        return nullptr;
    }

    void print_all() const {
        for (const auto& record : records) {
            std::cout << "ID: " << record.id << ", Name: " << record.name << "\n";
        }
    }

private:
    RecordStorage records;  // Stores records in a stable manner
    IndexMap index_map;     // Provides fast ID lookups
};

// Demonstrate Database Operations
int main() {
    DatabaseTable db;

    // Insert records
    db.insert(101, "Alice");
    db.insert(102, "Bob");
    db.insert(103, "Charlie");

    // Retrieve a record
    const Record* record = db.find(102);
    if (record) {
        std::cout << "Found: ID = " << record->id << ", Name = " << record->name << "\n";
    } else {
        std::cout << "Record not found!\n";
    }

    // Print all records
    std::cout << "All records:\n";
    db.print_all();

    return 0;
}
----

Note:: Key features of this sample: `flat_map` keeps its entries in sorted, contiguous storage, which typically makes lookups and iteration faster than `std::map` at the cost of slower insertions into large maps, and `stable_vector` keeps references and iterators to existing records valid when the container grows.

Run the program. The output should be:

[source,text]
----
Found: ID = 102, Name = Bob
All records:
ID: 101, Name: Alice
ID: 102, Name: Bob
ID: 103, Name: Charlie
----

== Optimize Memory Allocation

As we are dealing with frequent allocations of small objects (the database records), we'll enhance our database engine by using boost:pool[].
This library requests memory from the system in large blocks and hands it out in fixed-size chunks, which avoids a separate call to `malloc`, `new`, or `delete` for every record.

[source,cpp]
----
#include <boost/container/flat_map.hpp>
#include <boost/pool/pool.hpp>
#include <iostream>
#include <new>
#include <string>
#include <utility>

struct Record {
    int id;
    std::string name;

    Record(int id, std::string name) : id(id), name(std::move(name)) {}
};

class DatabaseTable {
public:
    using IndexMap = boost::container::flat_map<int, Record*>;

    DatabaseTable() : recordPool(sizeof(Record)) {}

    // Note: this sketch assumes the id is not already present in the table
    Record* insert(int id, const std::string& name) {
        void* memory = recordPool.malloc();  // Allocate memory from the pool
        if (!memory) {
            throw std::bad_alloc();
        }

        Record* newRecord = new (memory) Record(id, name);  // Placement new
        index_map[id] = newRecord;
        return newRecord;
    }

    void remove(int id) {
        auto it = index_map.find(id);
        if (it != index_map.end()) {
            it->second->~Record();         // Call destructor
            recordPool.free(it->second);   // Free memory back to the pool
            index_map.erase(it);
        }
    }

    Record* find(int id) {
        auto it = index_map.find(id);
        return (it != index_map.end()) ? it->second : nullptr;
    }

    void print_all() const {
        for (const auto& pair : index_map) {
            std::cout << "ID: " << pair.first << ", Name: " << pair.second->name << "\n";
        }
    }

    ~DatabaseTable() {
        // Destroy any remaining records and return their memory to the pool
        for (const auto& pair : index_map) {
            pair.second->~Record();
            recordPool.free(pair.second);
        }
    }

private:
    boost::pool<> recordPool;
    IndexMap index_map;
};

// Demonstrate Efficient Memory Use
int main() {
    DatabaseTable db;

    // Insert records
    db.insert(101, "Alice");
    db.insert(102, "Bob");
    db.insert(103, "Charlie");

    // Retrieve a record
    Record* record = db.find(102);
    if (record) {
        std::cout << "Found: ID = " << record->id << ", Name = " << record->name << "\n";
    }

    // Remove a record
    db.remove(102);
    if (!db.find(102)) {
        std::cout << "Record 102 removed successfully.\n";
    }

    // Print all records
    std::cout << "All records:\n";
    db.print_all();

    return 0;
}
----

Note:: Custom _Object Pools_ can be tuned for your specific object sizes.

The output should be:

[source,text]
----
Found: ID = 102, Name = Bob
Record 102 removed successfully.
All records:
ID: 101, Name: Alice
ID: 103, Name: Charlie
----

== Use Persistent Shared Memory

In a realistic database environment, you would probably want a shared-memory database table that multiple processes can access simultaneously. For this, we need the features of boost:interprocess[]. This library lets multiple processes share the same data, typically faster than exchanging it through files or sockets, and provides interprocess mutexes and condition variables for synchronization.

[source,cpp]
----
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <cstdio>
#include <iostream>
#include <string>

namespace bip = boost::interprocess;

const char* SHM_NAME = "SharedDatabase";
const char* TABLE_NAME = "UserTable";
const std::size_t MAX_USERS = 10;

struct UserRecord {
    int id;
    char name[32];
};

using ShmemAllocator = bip::allocator<UserRecord, bip::managed_shared_memory::segment_manager>;
using UserTable = bip::vector<UserRecord, ShmemAllocator>;

void create_table() {
    // Remove any previous segment, then create a fresh 64 KB segment
    bip::shared_memory_object::remove(SHM_NAME);
    bip::managed_shared_memory segment(bip::create_only, SHM_NAME, 65536);

    const ShmemAllocator alloc_inst(segment.get_segment_manager());
    UserTable* table = segment.construct<UserTable>(TABLE_NAME)(alloc_inst);

    for (int i = 0; i < 3; ++i) {
        UserRecord user;
        user.id = 1 + static_cast<int>(table->size());
        std::snprintf(user.name, sizeof(user.name), "User%d", user.id);
        table->push_back(user);
    }

    std::cout << "Shared memory table created with 3 initial users.\n";
}

void show_table() {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        UserTable* table = segment.find<UserTable>(TABLE_NAME).first;

        if (!table) {
            std::cerr << "Table not found.\n";
            return;
        }

        std::cout << "User Table:\n";
        for (const auto& user : *table) {
            std::cout << " ID: " << user.id << ", Name: " << user.name << "\n";
        }
    }
    catch (...)
    {
        std::cerr << "Shared Memory error - create a table\n";
    }
}

void add_user() {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        UserTable* table = segment.find<UserTable>(TABLE_NAME).first;

        if (!table) {
            std::cerr << "Table not found.\n";
            return;
        }

        if (table->size() >= MAX_USERS) {
            std::cerr << "Table is full (max " << MAX_USERS << " users).\n";
            return;
        }

        std::string name;
        std::cout << "Enter user name: ";
        std::getline(std::cin, name);

        UserRecord user;
        user.id = 1 + static_cast<int>(table->size());
        // snprintf truncates long names and always null-terminates
        std::snprintf(user.name, sizeof(user.name), "%s", name.c_str());

        table->push_back(user);
        std::cout << "User added.\n";
    }
    catch (...)
    {
        std::cerr << "Shared Memory error - create a table\n";
    }
}

void print_menu() {
    std::cout << "\n=== Shared Memory User Table Menu ===\n";
    std::cout << "1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: ";
}

int main() {
    while (true) {
        print_menu();

        int choice = 0;
        if (!(std::cin >> choice)) {
            return 0;  // Stop on end-of-input or non-numeric input
        }
        std::cin.ignore();  // discard newline

        switch (choice) {
        case 1:
            create_table();
            show_table();
            break;
        case 2:
            show_table();
            break;
        case 3:
            add_user();
            break;
        case 4:
            bip::shared_memory_object::remove(SHM_NAME);
            break;
        case 5:
            std::cout << "Exiting...\n";
            return 0;
        default:
            std::cout << "Invalid option. Try again.\n";
        }
    }
}
----

Shared memory created with boost:interprocess[] outlives the process that created it: it persists until it is explicitly removed or, depending on the platform, until the system restarts. Run the program, add some user records, and exit without choosing option `4`. Then run the program again and note that the records you added have persisted.

First run:

[source,text]
----

=== Shared Memory User Table Menu ===
1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: 1
Shared memory table created with 3 initial users.
User Table:
 ID: 1, Name: User1
 ID: 2, Name: User2
 ID: 3, Name: User3

=== Shared Memory User Table Menu ===
1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: 3
Enter user name: Nigel
User added.

=== Shared Memory User Table Menu ===
1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: 2
User Table:
 ID: 1, Name: User1
 ID: 2, Name: User2
 ID: 3, Name: User3
 ID: 4, Name: Nigel

=== Shared Memory User Table Menu ===
1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: 5
Exiting...
----

Second run:

[source,text]
----

=== Shared Memory User Table Menu ===
1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit: 2
User Table:
 ID: 1, Name: User1
 ID: 2, Name: User2
 ID: 3, Name: User3
 ID: 4, Name: Nigel
----

== Safely Allow Access from Multiple Processes

To safely allow multiple processes to access and modify shared memory concurrently in your boost:interprocess[] program, use interprocess synchronization primitives, such as `interprocess_mutex`, to guard critical sections.

[source,cpp]
----
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <cstdio>
#include <iostream>
#include <string>

namespace bip = boost::interprocess;

const char* SHM_NAME = "SharedDatabase";
const std::size_t MAX_USERS = 10;

struct UserRecord {
    int id;
    char name[32];
};

using SegmentManager = bip::managed_shared_memory::segment_manager;
using ShmemAllocator = bip::allocator<UserRecord, SegmentManager>;
using UserTable = bip::vector<UserRecord, ShmemAllocator>;

// Wrap the shared data and the mutex
struct SharedData {
    bip::interprocess_mutex mutex;
    UserTable table;

    SharedData(const ShmemAllocator& alloc) : table(alloc) {}
};

const char* TABLE_NAME = "SharedUserTable";

void create_table() {
    bip::shared_memory_object::remove(SHM_NAME);
    bip::managed_shared_memory segment(bip::create_only, SHM_NAME, 65536);

    ShmemAllocator alloc_inst(segment.get_segment_manager());

    // Construct SharedData in shared memory
    segment.construct<SharedData>(TABLE_NAME)(alloc_inst);

    std::cout << "Shared memory table created.\n";
}

void show_table() {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        SharedData* data = segment.find<SharedData>(TABLE_NAME).first;

        if (!data) {
            std::cerr << "Table not found.\n";
            return;
        }

        // Hold the mutex while reading so no other process modifies the table
        bip::scoped_lock<bip::interprocess_mutex> lock(data->mutex);

        std::cout << "User Table:\n";
        for (const auto& user : data->table) {
            std::cout << " ID: " << user.id << ", Name: " <<
                user.name << "\n";
        }
    }
    catch (...)
    {
        std::cerr << "Error accessing shared memory. Is it created?\n";
    }
}

void add_user() {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        SharedData* data = segment.find<SharedData>(TABLE_NAME).first;

        if (!data) {
            std::cerr << "Table not found.\n";
            return;
        }

        // Read the name before taking the lock, so other processes are not
        // blocked while this one waits for console input
        std::string name;
        std::cout << "Enter user name: ";
        std::cin.ignore();
        std::getline(std::cin, name);

        bip::scoped_lock<bip::interprocess_mutex> lock(data->mutex);

        if (data->table.size() >= MAX_USERS) {
            std::cerr << "Table is full (max " << MAX_USERS << " users).\n";
            return;
        }

        UserRecord user;
        user.id = 1 + static_cast<int>(data->table.size());
        // snprintf truncates long names and always null-terminates
        std::snprintf(user.name, sizeof(user.name), "%s", name.c_str());

        data->table.push_back(user);
        std::cout << "User added.\n";
    }
    catch (...)
    {
        std::cerr << "Error accessing shared memory. Is it created?\n";
    }
}

void print_menu() {
    std::cout << "\n=== Shared Memory User Table Menu ===\n";
    std::cout << "1. Create table 2. Show table 3. Add user 4. Clear shared memory 5. Exit\n";
    std::cout << "Choose an option: ";
}

int main() {
    while (true) {
        print_menu();

        int choice = 0;
        if (!(std::cin >> choice)) {
            return 0;  // Stop on end-of-input or non-numeric input
        }

        switch (choice) {
        case 1:
            create_table();
            show_table();
            break;
        case 2:
            show_table();
            break;
        case 3:
            add_user();
            break;
        case 4:
            bip::shared_memory_object::remove(SHM_NAME);
            std::cout << "Shared memory cleared.\n";
            break;
        case 5:
            std::cout << "Exiting...\n";
            return 0;
        default:
            std::cout << "Invalid option. Try again.\n";
        }
    }
}
----

Reads and writes of the table are now guarded by the mutex, so the program can be run from two or more terminal sessions at the same time.

== Add Serialization to Archive the Database

Finally, let's add the features of boost:serialization[] to save and restore snapshots of our shared-memory database. A snapshot file survives even after the shared memory has been cleared, so the data can be restored in a later run. We will extend our sample to serialize the records into a text archive.
[source,cpp]
----
#include <boost/interprocess/managed_shared_memory.hpp>  // For managing shared memory segments
#include <boost/interprocess/containers/vector.hpp>      // STL-like vector that works inside shared memory
#include <boost/interprocess/sync/named_mutex.hpp>       // Mutex across processes
#include <boost/interprocess/sync/scoped_lock.hpp>       // RAII lock for the mutex
#include <boost/serialization/array_wrapper.hpp>         // make_array for raw arrays
#include <boost/serialization/vector.hpp>                // Serialization support for std::vector
#include <boost/archive/text_oarchive.hpp>               // For saving serialized data to text files
#include <boost/archive/text_iarchive.hpp>               // For loading serialized data from text files
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

namespace bip = boost::interprocess;

// ---- Global configuration constants ----
const char* SHM_NAME = "SharedDatabase";      // Name of the shared memory segment
const char* TABLE_NAME = "UserTable";         // Name of the container object inside shared memory
const char* MUTEX_NAME = "SharedTableMutex";  // Name of the interprocess mutex
const std::size_t MAX_USERS = 10;             // Maximum number of users allowed in table

// ---- User Record structure, supports Boost.Serialization ----
struct UserRecord {
    int id;         // Unique user ID
    char name[32];  // Fixed-size character buffer for username

    // Serialization function used by Boost.Archive
    template <class Archive>
    void serialize(Archive& ar, const unsigned int) {
        ar & id;
        // Wrap raw array in make_array so Boost knows how to handle it
        ar & boost::serialization::make_array(name, sizeof(name));
    }
};

// ---- Type aliases for clarity ----
using ShmemAllocator = bip::allocator<UserRecord, bip::managed_shared_memory::segment_manager>;

// Vector of UserRecords in shared memory
using UserTable = bip::vector<UserRecord, ShmemAllocator>;

// ---- Create a new table in shared memory ----
void create_table() {
    // Remove any old shared memory segment and mutex (cleanup)
    bip::shared_memory_object::remove(SHM_NAME);
    bip::named_mutex::remove(MUTEX_NAME);

    // Create new shared memory segment of fixed size (64 KB here)
    bip::managed_shared_memory segment(bip::create_only, SHM_NAME, 65536);
    ShmemAllocator alloc(segment.get_segment_manager());

    // Construct a UserTable object inside shared memory
    UserTable* table = segment.construct<UserTable>(TABLE_NAME)(alloc);

    // Pre-populate with three sample users
    for (int i = 0; i < 3; ++i) {
        UserRecord user{};  // Zero-initialize: all 32 name bytes are serialized later
        user.id = 1 + static_cast<int>(table->size());
        std::snprintf(user.name, sizeof(user.name), "User%d", user.id);
        table->push_back(user);
    }

    std::cout << "Shared memory table created with 3 initial users.\n";
}

// ---- Display the contents of the table ----
void show_table() {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        bip::named_mutex mutex(bip::open_or_create, MUTEX_NAME);

        // Lock table to prevent concurrent modifications
        bip::scoped_lock<bip::named_mutex> lock(mutex);

        // Find UserTable in shared memory
        UserTable* table = segment.find<UserTable>(TABLE_NAME).first;
        if (!table) {
            std::cerr << "Table not found.\n";
            return;
        }

        // Print all users
        std::cout << "User Table:\n";
        for (const auto& user : *table) {
            std::cout << " ID: " << user.id << ", Name: " << user.name << "\n";
        }
    }
    catch (...)
    {
        std::cerr << "Unable to access shared memory.\n";
    }
}

// ---- Add a user to the shared memory table ----
void add_user() {
    try {
        // Get the new user name before taking the lock, so other processes
        // are not blocked while this one waits for console input
        std::string name;
        std::cin.ignore();  // Discard leftover newline from previous input
        std::cout << "Enter user name: ";
        std::getline(std::cin, name);

        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        bip::named_mutex mutex(bip::open_or_create, MUTEX_NAME);
        bip::scoped_lock<bip::named_mutex> lock(mutex);

        UserTable* table = segment.find<UserTable>(TABLE_NAME).first;
        if (!table || table->size() >= MAX_USERS) {
            std::cerr << "Table not found or full.\n";
            return;
        }

        // Create new record and append
        UserRecord user{};  // Zero-initialize: all 32 name bytes are serialized later
        user.id = 1 + static_cast<int>(table->size());
        // snprintf truncates long names and always null-terminates
        std::snprintf(user.name, sizeof(user.name), "%s", name.c_str());
        table->push_back(user);

        std::cout << "User added.\n";
    }
    catch (...)
    {
        std::cerr << "Failed to add user.\n";
    }
}

// ---- Save snapshot of current table to a text file ----
void save_snapshot(const std::string& filename) {
    try {
        bip::managed_shared_memory segment(bip::open_only, SHM_NAME);
        bip::named_mutex mutex(bip::open_or_create, MUTEX_NAME);
        bip::scoped_lock<bip::named_mutex> lock(mutex);

        UserTable* table = segment.find<UserTable>(TABLE_NAME).first;
        if (!table) {
            std::cerr << "Table not found.\n";
            return;
        }

        // Copy data from shared memory into std::vector (heap memory)
        std::vector<UserRecord> snapshot(table->begin(), table->end());

        // Save serialized snapshot to file
        std::ofstream ofs(filename);
        boost::archive::text_oarchive oa(ofs);
        oa << snapshot;

        std::cout << "Snapshot saved to " << filename << "\n";
    }
    catch (...)
    {
        std::cerr << "Failed to save snapshot.\n";
    }
}

// ---- Load snapshot from text file into shared memory ----
void load_snapshot(const std::string& filename) {
    try {
        // Open file and load into vector
        std::ifstream ifs(filename);
        if (!ifs) {
            std::cerr << "Snapshot file not found.\n";
            return;
        }

        std::vector<UserRecord> snapshot;
        boost::archive::text_iarchive ia(ifs);
        ia >> snapshot;

        // Reset shared memory segment and mutex
        bip::shared_memory_object::remove(SHM_NAME);
        bip::managed_shared_memory segment(bip::create_only, SHM_NAME, 65536);

        bip::named_mutex::remove(MUTEX_NAME);
        bip::named_mutex mutex(bip::create_only, MUTEX_NAME);
        bip::scoped_lock<bip::named_mutex> lock(mutex);

        // Recreate UserTable and repopulate
        ShmemAllocator alloc(segment.get_segment_manager());
        UserTable* table = segment.construct<UserTable>(TABLE_NAME)(alloc);
        for (const auto& user : snapshot) {
            table->push_back(user);
        }

        std::cout << "Snapshot loaded from " << filename << "\n";
    }
    catch (...)
    {
        std::cerr << "Failed to load snapshot.\n";
    }
}

// ---- Clear all shared memory resources ----
void clear_shared_memory() {
    bip::shared_memory_object::remove(SHM_NAME);
    bip::named_mutex::remove(MUTEX_NAME);
    std::cout << "Shared memory cleared.\n";
}

// ---- Print the interactive menu ----
void print_menu() {
    std::cout << "\n=== Shared Memory Menu ===\n"
              << "1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:";
}

// ---- Program entry point ----
int main() {
    while (true) {
        print_menu();

        int choice = 0;
        if (!(std::cin >> choice)) {
            return 0;  // Stop on end-of-input or non-numeric input
        }

        switch (choice) {
        case 1:
            create_table();
            // Show immediately after creation
            show_table();
            break;
        case 2:
            show_table();
            break;
        case 3:
            add_user();
            break;
        case 4:
            save_snapshot("snapshot.txt");
            break;
        case 5:
            load_snapshot("snapshot.txt");
            show_table();
            break;
        case 6:
            clear_shared_memory();
            break;
        case 7:
            return 0;
        default:
            std::cout << "Invalid choice.\n";
        }
    }
}
----

Run the sample, and verify that the snapshot file persists, and can be reloaded, after the shared memory has been cleared.

[source,text]
----

=== Shared Memory Menu ===
1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:1
Shared memory table created with 3 initial users.
User Table:
 ID: 1, Name: User1
 ID: 2, Name: User2
 ID: 3, Name: User3

=== Shared Memory Menu ===
1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:3
Enter user name: Nigel
User added.

=== Shared Memory Menu ===
1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:4
Snapshot saved to snapshot.txt

=== Shared Memory Menu ===
1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:6
Shared memory cleared.

=== Shared Memory Menu ===
1. Create table 2. Show table 3. Add user 4. Save snapshot 5. Load snapshot 6. Clear shared memory 7. Exit:5
Snapshot loaded from snapshot.txt
User Table:
 ID: 1, Name: User1
 ID: 2, Name: User2
 ID: 3, Name: User3
 ID: 4, Name: Nigel
----

== Next Steps

In the design of a database, consider all the independent processes, and how they might access persistent memory, for example:

image::database-persistent-memory.png[]

As next steps, consider boost:filesystem[] for file management and, for a heavier-duty database engine, boost:asio[] to handle remote database transactions. The xref:task-networking.adoc[] sample is a good place to start.

The Boost libraries have a lot to offer this particular scenario!

== See Also

* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Containers[Category: Containers]
* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Data[Category: Data structures]
* https://www.boost.org/doc/libs/latest/libs/libraries.htm#Memory[Category: Memory]