website-v2/docs/first_time_data_import.md
Brian Perrett 48c09d3d5e Import commits per release and create release reports (#1263)
View stats per release. We do this by running log diffs between release
tags, e.g. `git log boost-1.78.0..boost-1.79.0`. The output is parsed and
the commits are saved with a foreign key to the `LibraryVersion` they
relate to.

- Commits are imported by doing "bare" clones (no project files, only
git data) of repos into temporary directories, as created by Python's
built-in `tempfile.TemporaryDirectory`
- Added Commit model
- Added CommitAuthor model
- Added CommitAuthorEmail model
  - One CommitAuthor can have many emails.
- Added task for importing commits. (and admin link to trigger it)
- Added task for importing CommitAuthor GitHub data (avatar and profile
URL, with admin link to trigger it)
- Added a basic Library stat page which can be viewed by going to the
admin -> library -> view stats.
- Added a `Get Release Report` button in the `LibraryAdmin` which allows
a staff member to select a boost version and up to 8 libraries to
generate a report for. The report is just a webpage which attempts to
convert cleanly to a pdf using the browser's print to pdf functionality.
- Updated the Library Detail page to show commits per release instead of
per month.
- Updated the Library Detail page to show `Maintainers & Contributors`
sorted by maintainers, then the top contributors for the selected
release, then the top contributors overall by commits descending.
- Removed CommitData, which was tracking monthly commit stats
2024-09-25 15:09:07 -07:00

Populating the Database for the First Time

This document contains information about importing Boost Versions (also called Releases), Libraries, and the data associated with those objects. It is concerned with importing data in deployed environments, but at the bottom of the page there is a section on importing data for local development.

Deployed Environments

There are several steps to populating the database with historical Boost data, because we retrieve Boost data from multiple sources.

You can run all of these steps in sequence with a single command:

./manage.py boost_setup

The boost_setup command will run all of the processes listed here:


# Import Boost releases
./manage.py import_versions
# import_versions also runs import_artifactory_release_data

# Import Boost libraries
./manage.py update_libraries

# Save which Boost releases include which libraries
./manage.py import_library_versions
# import_library_versions retrieves documentation urls, so boost_setup
# doesn't run import_library_version_docs_urls

# Save other data we need for Libraries and LibraryVersions
./manage.py update_maintainers
./manage.py update_authors
./manage.py import_commits

# Get the most recent beta release, and delete old beta releases
./manage.py import_beta_release --delete-versions

Read more about these management commands.
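
As a rough illustration of the sequence above, the following sketch runs the same management commands in order and stops at the first failure. It is not the actual `boost_setup` implementation: `SETUP_STEPS`, `run_manage`, `run_setup`, and the `runner` parameter are hypothetical names, and the sketch assumes `./manage.py` is invoked from the project root.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands in the order listed above. import_versions also runs
# import_artifactory_release_data, so it is not listed separately.
SETUP_STEPS: List[List[str]] = [
    ["import_versions"],
    ["update_libraries"],
    ["import_library_versions"],
    ["update_maintainers"],
    ["update_authors"],
    ["import_commits"],
    ["import_beta_release", "--delete-versions"],
]


def run_manage(args: Sequence[str]) -> None:
    """Run one management command; raises CalledProcessError on failure."""
    subprocess.run(["./manage.py", *args], check=True)


def run_setup(runner: Callable[[Sequence[str]], None] = run_manage) -> None:
    """Hypothetical helper: run every step in order, aborting on the first error."""
    for step in SETUP_STEPS:
        runner(step)
```

Calling `run_setup()` with no arguments would shell out to `./manage.py` for each step. Passing a different `runner` makes it possible to log or dry-run the sequence without touching the database.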

Collectively, this is what these management commands accomplish:

  1. import_versions: Imports Boost releases as Version objects, and imports links to Boost downloads hosted on Artifactory.
  2. update_libraries: Imports Boost libraries and categories as Library and Category objects.
  3. import_library_versions: Establishes which Boost libraries are included in which Boost versions. That information is stored in LibraryVersion objects. This process also stores the link to the version-specific Boost documentation for each library.
  4. update_maintainers: For each LibraryVersion, saves the maintainers as User objects and makes sure they are associated with the LibraryVersion.
  5. update_authors: For each Library, saves the authors as User objects and makes sure they are associated with the Library.
  6. import_commits: For each Library, iterates through the LibraryVersions and creates Commit, CommitAuthor, and CommitAuthorEmail objects. Also attempts to update CommitAuthors with their GitHub profile URL and avatar URL.
  7. import_beta_release: Retrieves the most recent beta release from GitHub and imports it. If --delete-versions is passed, it also deletes the existing beta releases in the database.
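
To make the import_commits step more concrete, here is a minimal sketch of listing the commits between two release tags from a bare clone in a temporary directory. It is an assumption-laden illustration, not the project's parser: `ParsedCommit`, `parse_git_log`, `commits_between`, and the chosen `--format` string are all hypothetical, and the real command stores its results in the Commit, CommitAuthor, and CommitAuthorEmail models instead of a dataclass.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from typing import List

FIELD_SEP = "\x1f"  # ASCII unit separator; unlikely to appear in a subject line
# Assumed format: hash, author name, author email, ISO 8601 author date, subject.
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%s"


@dataclass
class ParsedCommit:
    sha: str
    author_name: str
    author_email: str
    authored_at: str  # ISO 8601 string exactly as printed by git
    subject: str


def parse_git_log(output: str) -> List[ParsedCommit]:
    """Turn `git log --format=LOG_FORMAT` output into ParsedCommit records."""
    commits = []
    for line in output.split("\n"):
        if not line.strip():
            continue  # skip blank lines, including the trailing one
        sha, name, email, date, subject = line.split(FIELD_SEP, 4)
        commits.append(ParsedCommit(sha, name, email, date, subject))
    return commits


def commits_between(repo_url: str, old_tag: str, new_tag: str) -> List[ParsedCommit]:
    """Bare-clone repo_url into a temporary directory and list old_tag..new_tag."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "clone", "--bare", repo_url, tmp], check=True)
        result = subprocess.run(
            ["git", "-C", tmp, "log", f"--format={LOG_FORMAT}", f"{old_tag}..{new_tag}"],
            check=True,
            capture_output=True,
            text=True,
        )
    return parse_git_log(result.stdout)
```

For example, `commits_between(repo_url, "boost-1.78.0", "boost-1.79.0")` would return the commits that `git log boost-1.78.0..boost-1.79.0` reports for that repository. The temporary directory, and the clone inside it, is removed when the `with` block exits.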

Further Reading