website-v2/docs/first_time_data_import.md
Lacey Williams Henschel 214f37986b Streamline import commands and add docs
Add data JSONField to LibraryVersion model

Add fields to library list admin display

Add commands to update authors and maintainers individually

Exclude data JSONField from view querysets

Silence some output from the library-version import management command

Remove unused field from the library-version import management command

Save library-versions more cleanly in the library-version import management command

Remove loading maintainers from the import command, since they now have their own command

Add docs for new commands

Add boost_setup command to run one command to import all data

Add docs on first-time data import

Better exception handling, quieter flow, reduce GH API calls

Graceful handling if there is not a github repo

Pass most recent 12 months to commit counts command

Add some user-friendly output to setup command
2023-09-19 11:17:05 -07:00


Populating the Database for the First Time

This document explains how to import Boost Versions (also called Releases), Libraries, and the data associated with those objects. It focuses on importing data in deployed environments; a section at the bottom of the page covers importing data for local development.

Deployed Environments

There are several steps to populating the database with historical Boost data, because we retrieve Boost data from multiple sources.

You can run all of these steps in sequence with a single command:

./manage.py boost_setup

The boost_setup command will run all of the processes listed here:


# Import Boost releases
./manage.py import_versions
# import_versions also runs import_artifactory_release_data

# Import Boost libraries
./manage.py update_libraries

# Save which Boost releases include which libraries
./manage.py import_library_versions
# import_library_versions retrieves documentation urls, so boost_setup
# doesn't run import_library_version_docs_urls

# Save other data we need for Libraries and LibraryVersions
./manage.py update_maintainers
./manage.py update_authors
./manage.py import_commit_counts
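
To make the ordering explicit, here is a minimal Python sketch of how a wrapper like boost_setup could chain these commands. It is not the project's actual implementation: the names SETUP_COMMANDS and run_setup are hypothetical, and the runner is passed in as a parameter so the sketch runs without a Django project.

```python
# Hypothetical sketch: the order mirrors the command list above.
# SETUP_COMMANDS and run_setup are illustrative names, not part of the project.
SETUP_COMMANDS = [
    "import_versions",          # releases (also pulls Artifactory download links)
    "update_libraries",         # libraries and categories
    "import_library_versions",  # which release includes which library
    "update_maintainers",
    "update_authors",
    "import_commit_counts",
]


def run_setup(call_command, commands=SETUP_COMMANDS):
    """Run each command in order and return the names that completed.

    `call_command` is any callable that accepts a command name. An exception
    raised by one command stops the sequence, so later steps never run
    against data that an earlier step failed to import.
    """
    completed = []
    for name in commands:
        call_command(name)
        completed.append(name)
    return completed
```

Inside a configured Django project, passing django.core.management.call_command as the runner would invoke the real management commands, for example run_setup(call_command). The order matters because later steps depend on objects created by earlier ones: LibraryVersion objects need both Version and Library objects to exist.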

Read more about these management commands.

Collectively, this is what these management commands accomplish:

  1. import_versions: Imports Boost releases as Version objects, and imports links to Boost downloads hosted on Artifactory.
  2. update_libraries: Imports Boost libraries and categories as Library and Category objects.
  3. import_library_versions: Establishes which Boost libraries are included in which Boost versions. That information is stored in LibraryVersion objects. This process also stores the link to the version-specific Boost documentation for this library.
  4. update_maintainers: For each LibraryVersion, saves the maintainers as User objects and makes sure they are associated with the LibraryVersion.
  5. update_authors: For each Library, saves the authors as User objects and makes sure they are associated with the Library.
  6. import_commit_counts: For each Library, uses the GitHub API to save the last 12 months of commit history. One CommitData object is created per library per month to store the number of commits to that library's master branch for that month.
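
The "one CommitData object per library, per month" idea in step 6 can be illustrated with a short sketch that lists the monthly buckets for a 12-month window. This is an assumption-laden illustration, not the command's real code: last_twelve_months is a hypothetical helper, and it assumes the window ends with the current month, which the actual command may handle differently.

```python
from datetime import date


def last_twelve_months(today):
    """Return (year, month) pairs for a 12-month window, oldest first.

    Assumption: the window ends with the month containing `today`.
    Each pair would correspond to one CommitData row per library.
    """
    months = []
    year, month = today.year, today.month
    for _ in range(12):
        months.append((year, month))
        month -= 1
        if month == 0:
            # Step back across the year boundary.
            year, month = year - 1, 12
    return list(reversed(months))
```

Under that assumption, calling last_twelve_months(date(2023, 9, 19)) yields twelve pairs running from (2022, 10) through (2023, 9), so a single library would end up with twelve monthly commit counts.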

Further Reading