Writing a package manager
Writing a package manager is not one of the most common programming tasks. After all, there are many out-of-the-box ones available. Yet, somehow I've found myself in exactly this situation.
How so?
I'm a big fan of SQLite and its extensions. Given the large number of such extensions in the wild, I wanted a structured approach to managing them. Which usually involves, well, a package manager. Except there is none for SQLite. So I decided to build one!
If you haven't seen them before, SQLite extensions are just libraries (.dll, .dylib or .so, depending on the operating system). To make an extension work, you download it and load it into SQLite.
Needless to say, building a package manager is not an easy task. In fact, Sam Boyer has written a great article about the problems involved, so I won't dwell on them here.
This article explains the design choices and implementation details that allowed me to actually build a working package manager in a couple of weeks (mostly evenings and nights, to be honest). I tried to leave out most of the SQLite specifics, so hopefully you can apply this approach to any package manager should you decide to build one.
Design decisions
Package management is complex by nature, and there is no magic bullet that will make it simple (unless you are willing to radically narrow the scope). So let's go through the building blocks together, tackling problems as they arise.
Spec file • Folder structure • Scope • Registry • Version • Latest version • Lockfile • Source of truth • Checksums • Dependencies • Install and update
Spec file
To work with packages, the manager needs some information about them. At least the package ID and the download location. So let's design a package spec file that describes a package.
Here is a simple one:
{
    "owner": "sqlite",
    "name": "stmt",
    "assets": {
        "path": "https://github.com/nalgeon/sqlean/releases/download/incubator",
        "files": {
            "darwin-amd64": "stmt.dylib",
            "darwin-arm64": "stmt.dylib",
            "linux-amd64": "stmt.so",
            "windows-amd64": "stmt.dll"
        }
    }
}
owner + name form a unique package identifier (we don't want any name conflicts, thank you very much, Python).
The assets.path is a base URL for the package assets. The assets themselves are listed in assets.files. When the manager installs the package, it chooses the asset name according to the user's operating system, combines it with the assets.path, and downloads the asset.
> install sqlite/stmt
↓
download spec
┌───────────────┐
│ sqlite/stmt │
└───────────────┘
↓
check platform
↪ OS: darwin
↪ arch: arm64
↓
download asset
┌───────────────┐
│ stmt.dylib │
└───────────────┘
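To make this concrete, here is a minimal Go sketch of the platform lookup. The Assets struct mirrors the simple spec above; the assetURL helper and its error handling are illustrative, not the exact sqlpkg code.
import (
    "fmt"
    "runtime"
)

// Assets mirrors the "assets" section of the spec file.
type Assets struct {
    Path  string            // base URL for the package assets
    Files map[string]string // platform -> asset file name
}

// assetURL picks the asset for the current platform and builds its download URL.
func assetURL(a Assets) (string, error) {
    platform := runtime.GOOS + "-" + runtime.GOARCH // e.g. "darwin-arm64"
    file, ok := a.Files[platform]
    if !ok {
        return "", fmt.Errorf("no asset for platform %s", platform)
    }
    return a.Path + "/" + file, nil
}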
Good start!
Folder structure
Let's say there is a package hosted somewhere on GitHub. I tell the manager (called sqlpkg from now on) to install it:
sqlpkg install sqlite/stmt
The manager downloads the package and stores it locally in a folder named .sqlpkg:
.sqlpkg
└── sqlite
    └── stmt
        ├── sqlpkg.json
        └── stmt.dylib
(sqlpkg.json is the spec file and stmt.dylib is the package asset)
Let's add another one:
sqlpkg install asg017/vss
.sqlpkg
├── asg017
│   └── vss
│       ├── sqlpkg.json
│       └── vss0.dylib
│
└── sqlite
    └── stmt
        ├── sqlpkg.json
        └── stmt.dylib
As you can probably see, given this folder structure, it's quite easy for the manager to reason about the installed packages.
For example, if I run sqlpkg update OWNER/NAME, it does the following:
- Reads the spec file from the path .sqlpkg/OWNER/NAME/sqlpkg.json.
- Downloads the latest asset using the assets.path from the spec.
- Replaces the old .dylib with the new one.
When I run sqlpkg uninstall OWNER/NAME, it deletes the corresponding directory.
And when I run sqlpkg list, it searches for all paths that match .sqlpkg/*/*/sqlpkg.json.
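For illustration, here is roughly what the listing logic could look like. This is a sketch with a made-up helper name, not necessarily sqlpkg's actual code:
import "path/filepath"

// listInstalled returns the "owner/name" identifiers of all installed packages
// by globbing for their spec files.
func listInstalled(rootDir string) ([]string, error) {
    pattern := filepath.Join(rootDir, "*", "*", "sqlpkg.json")
    matches, err := filepath.Glob(pattern)
    if err != nil {
        return nil, err
    }
    var names []string
    for _, m := range matches {
        dir := filepath.Dir(m)                    // .sqlpkg/OWNER/NAME
        owner := filepath.Base(filepath.Dir(dir)) // OWNER
        name := filepath.Base(dir)                // NAME
        names = append(names, owner+"/"+name)
    }
    return names, nil
}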
Simple, isn't it?
Project vs. global scope
Some package managers (e.g. npm) use per-project scope by default, but also allow you to install packages globally using flags (npm install -g). Others (e.g. brew) use global scope.
I like the idea of allowing both project and global scope, but I do not like the flags approach. Why don't we apply a heuristic:
- If there is a .sqlpkg folder in the current directory, use project scope.
- Otherwise, use global scope.
This way, if users don't need separate project environments, they will just run sqlpkg as is and install packages in their home folder (e.g. ~/.sqlpkg). Otherwise, they'll create a separate .sqlpkg folder for each project (we can provide a helper init command for this).
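In code, the heuristic boils down to a few lines. This is a sketch with made-up helper names; the real implementation may differ:
import (
    "os"
    "path/filepath"
)

// workDir returns the packages directory: project scope if ./.sqlpkg exists,
// global scope (~/.sqlpkg) otherwise.
func workDir() string {
    if info, err := os.Stat(".sqlpkg"); err == nil && info.IsDir() {
        return ".sqlpkg"
    }
    home, err := os.UserHomeDir()
    if err != nil {
        return ".sqlpkg" // fall back to the current directory
    }
    return filepath.Join(home, ".sqlpkg")
}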
Project scope:
$ cd /my/project
$ sqlpkg init
$ sqlpkg install sqlite/stmt
$ tree .sqlpkg
.sqlpkg
└── sqlite
    └── stmt
        ├── sqlpkg.json
        └── stmt.dylib
Global scope:
$ cd /some/other/path
$ sqlpkg install sqlite/stmt
$ tree ~/.sqlpkg
/Users/anton/.sqlpkg
└── sqlite
    └── stmt
        ├── sqlpkg.json
        └── stmt.dylib
No flags involved!
Package registry
For a package manager to be useful, it should support existing extensions (which, of course, are completely unaware of it at the moment). Maybe extension authors will eventually add spec files to their packages, maybe they won't — we can't rely on that.
So let's add a simple fallback algorithm. When the user runs sqlpkg install OWNER/NAME, the manager does the following:
- Attempts to fetch the spec from the owner's GitHub repo https://github.com/OWNER/NAME.
- If the spec is not found, fetches it from the package registry.
owner/name
↓
┌─────────────────┐ found ┌───────────┐
│ owner's repo │ → │ install │
└─────────────────┘ └───────────┘
↓ not found
┌─────────────────┐ found ┌───────────┐
│ pkg registry │ → │ install │
└─────────────────┘ └───────────┘
↓ not found
✗ error
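A sketch of the fallback lookup is below. The exact location of a spec inside the owner's repo (default branch, file name in the root) is my assumption, readRemoteSpec is a hypothetical helper, and the registry URL follows the pattern used in the lockfile example later in the article:
import "fmt"

// findSpec tries the owner's repo first, then falls back to the package registry.
// readRemoteSpec is a hypothetical helper that fetches and parses a spec file.
func findSpec(owner, name string) (*Package, error) {
    candidates := []string{
        // 1. the owner's own repo (assumed layout)
        fmt.Sprintf("https://github.com/%s/%s/raw/main/sqlpkg.json", owner, name),
        // 2. the package registry
        fmt.Sprintf("https://github.com/nalgeon/sqlpkg/raw/main/pkg/%s/%s.json", owner, name),
    }
    for _, url := range candidates {
        if pkg, err := readRemoteSpec(url); err == nil {
            return pkg, nil
        }
    }
    return nil, fmt.Errorf("package %s/%s not found", owner, name)
}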
The package registry is just another GitHub repo with a two-level owner/name structure:
pkg/
├── asg017
│   ├── fastrand.json
│   ├── hello.json
│   ├── html.json
│   └── ...
├── daschr
│   └── cron.json
├── dessus
│   ├── besttype.json
│   ├── fcmp.json
│   └── ...
├── ...
...
We'll bootstrap the registry with known packages, so the manager will work right out of the box. As package authors catch up and add sqlpkg.json to their repos, the manager will gradually switch to using them instead of the registry.
The manager should also support links to specific GitHub repos (in case the repo has a different name than the package):
sqlpkg install github.com/asg017/sqlite-vss
And other URLs, because not everyone uses GitHub:
sqlpkg install https://antonz.org/downloads/stats.json
And also local paths:
sqlpkg install ./stats.json
All this "locator" logic complicates the design quite a bit. So if you are comfortable with requiring package authors to provide the specs, feel free to omit the fallback step and the registry altogether.
Version
What's a package without a version, right? Let's add it:
{
    "owner": "asg017",
    "name": "vss",
    "version": "v0.1.1",
    "repository": "https://github.com/asg017/sqlite-vss",
    "assets": {
        "path": "{repository}/releases/download/{version}",
        "files": {
            "darwin-amd64": "vss-{version}-macos-x86_64.tar.gz",
            "darwin-arm64": "vss-{version}-macos-aarch64.tar.gz",
            "linux-amd64": "vss-{version}-linux-x86_64.tar.gz"
        }
    }
}
We also introduced variables like {repository} and {version} so package authors don't have to repeat themselves.
When updating a package, the manager must now compare local and remote versions according to the semantic versioning rules:
local spec │ remote spec
│
> update │
┌─────────────┐ │ ┌─────────────┐
│ v0.1.0 │ < │ v0.1.1 │
└─────────────┘ │ └─────────────┘
↓ │
updating... │
┌─────────────┐ │
│ v0.1.1 │ │
└─────────────┘ │
Nice!
Latest version
While not required, it would be nice to support the latest
version placeholder and automatically resolve it via API for GitHub-hosted packages:
{
    "owner": "asg017",
    "name": "vss",
    "version": "latest",
    "repository": "https://github.com/asg017/sqlite-vss",
    "assets": {
        "path": "{repository}/releases/download/{version}",
        "files": {
            "darwin-amd64": "vss-{version}-macos-x86_64.tar.gz",
            "darwin-arm64": "vss-{version}-macos-aarch64.tar.gz",
            "linux-amd64": "vss-{version}-linux-x86_64.tar.gz"
        }
    }
}
This way, package authors don't have to change the spec when releasing a new version. When installing the package, the manager will fetch the latest version from GitHub:
local spec │ remote spec │ github api
│ │
> update │ │
┌─────────────┐ │ │
│ v0.1.0 │ │ │
└─────────────┘ │ │
↓ │ │
wait a sec... │ │
┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐
│ v0.1.0 │ ? │ latest │ → │ v0.1.1 │
└─────────────┘ │ └─────────────┘ │ └─────────────┘
↓ │ │
┌─────────────┐ │ ┌─────────────┐ │
│ v0.1.0 │ < │ v0.1.1 │ │
└─────────────┘ │ └─────────────┘ │
↓ │ │
updating... │ │
┌─────────────┐ │ │
│ v0.1.1 │ │ │
└─────────────┘ │ │
In this scenario, it's important to store the specific version in the local spec, not the "latest" placeholder. Otherwise, the manager won't be able to reason about the currently installed version when the user runs an update command.
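Resolving "latest" typically means one call to the GitHub releases API. Here is a hedged sketch of how it could be done; the article's github helper package presumably does something similar:
import (
    "encoding/json"
    "fmt"
    "net/http"
)

// resolveLatest asks the GitHub API for the latest release tag, e.g. "v0.1.1".
func resolveLatest(owner, repo string) (string, error) {
    url := fmt.Sprintf("https://api.github.com/repos/%s/%s/releases/latest", owner, repo)
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("github api: %s", resp.Status)
    }
    var release struct {
        TagName string `json:"tag_name"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&release); err != nil {
        return "", err
    }
    return release.TagName, nil
}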
Lockfile
Having an .sqlpkg folder with package specs and assets is enough to implement all manager commands. We can install, uninstall, update and list packages based on the .sqlpkg data only.
.sqlpkg
├── asg017
│   └── vss
│       ├── sqlpkg.json
│       └── vss0.dylib
│
└── sqlite
    └── stmt
        ├── sqlpkg.json
        └── stmt.dylib
But what if the user wants to reinstall the packages on another machine or CI server? That's where the lockfile comes in.
The lockfile stores a list of all installed packages with just enough information to reinstall them if needed:
{
    "packages": {
        "asg017/vss": {
            "owner": "asg017",
            "name": "vss",
            "version": "v0.1.1",
            "specfile": "https://github.com/nalgeon/sqlpkg/raw/main/pkg/asg017/vss.json",
            "assets": {
                // ...
            }
        },
        "sqlite/stmt": {
            "owner": "sqlite",
            "name": "stmt",
            "version": "",
            "specfile": "https://github.com/nalgeon/sqlpkg/raw/main/pkg/sqlite/stmt.json",
            "assets": {
                // ...
            }
        }
    }
}
The only new field here is the specfile — it's a path to a remote spec file to fetch the rest of the package information (e.g. description, license, and authors).
Now the user can commit the lockfile along with the rest of the project, and run install on another machine to install all the packages listed in the lockfile:
local spec │ lockfile │ remote spec
│ │
> install │ ┌─────────────┐ │ ┌─────────────┐
└─ (empty) → │ asg017/vss │ → │ asg017/vss │
│ │ sqlite/stmt │ │ └─────────────┘
│ └─────────────┘ │ ┌─────────────┐
┌─ ← ← │ sqlite/stmt │
installing... │ │ └─────────────┘
┌─────────────┐ │ │
│ asg017/vss │ │ │
└─────────────┘ │ │
┌─────────────┐ │ │
│ sqlite/stmt │ │ │
└─────────────┘ │ │
So far, so good.
Source of truth
The lockfile sounds like a no-brainer, but it actually introduces a major problem — we no longer have a single source of truth for any given local package.
Let's consider one of the simpler commands — list, which displays all installed packages. Previously, all it had to do was scan the .sqlpkg folder for spec files:
> list
↓
glob .sqlpkg/*/*/sqlpkg.json
┌─────────────┐
│ asg017/vss │
│ sqlite/stmt │
└─────────────┘
But now we have two sources of package information — the .sqlpkg folder and the lockfile. Imagine that for some reason they are out of sync:
local spec │ lockfile
│
> list │
↓ │
let's see... │
┌─────────────┐ │ ┌──────────────┐
│ asg017/vss │ │ │ asg017/vss │
└─────────────┘ │ │ nalgeon/text │
┌─────────────┐ │ └──────────────┘
│ sqlite/stmt │ │
└─────────────┘ │
↓ │
???
Instead of the simple "just list the .sqlpkg contents", we now have 4 possible situations for any given package:
- The package is listed in both .sqlpkg and the lockfile with the same version.
- The package is listed in both .sqlpkg and the lockfile, but the versions are different.
- The package is listed in .sqlpkg, but not in the lockfile.
- The package is listed in the lockfile, but not in .sqlpkg.
➊ is easy, but what should the manager do with ➋, ➌ and ➍?
Instead of coming up with clever conflict resolution strategies, let's establish the following ground rule:
There is a single source of truth, and it's the contents of the .sqlpkg folder.
This immediately solves all kinds of lockfile-related problems. For the list command example, we now only look in .sqlpkg (as we did before) and then synchronize the lockfile with it, adding the missing packages if necessary:
local spec │ lockfile
│
> list │
↓ │
glob .sqlpkg/*/*/sqlpkg.json
┌─────────────┐ │ ┌─────────────┐
│ asg017/vss │ │ │ does not │
└─────────────┘ │ │ matter │
┌─────────────┐ │ └─────────────┘
│ sqlite/stmt │ │
└─────────────┘ │
↓ │
sync the lockfile
┌─────────────┐ │ ┌─────────────┐
│ asg017/vss │ → │ asg017/vss │
│ sqlite/stmt │ │ │ sqlite/stmt │
└─────────────┘ │ └─────────────┘
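With that rule in place, syncing the lockfile is trivial — rebuild it from the installed specs. A minimal sketch using simplified Lockfile and Package types (the real ones are described in the implementation section below):
// syncLockfile makes the lockfile reflect the .sqlpkg contents,
// which are the single source of truth.
func syncLockfile(lck *Lockfile, installed []*Package) {
    lck.Packages = make(map[string]*Package, len(installed))
    for _, pkg := range installed {
        lck.Packages[pkg.Owner+"/"+pkg.Name] = pkg
    }
}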
Phew.
Checksums
So far we've assumed that there will be no problems downloading remote package assets to the user's machine. And indeed, in most cases there will be none. But just to be sure that everything was downloaded correctly, we'd better check the asset's checksum.
Calculating the actual checksum of the downloaded asset is easy — we'll just use the SHA-256 algorithm. But we also need a value to compare it to — the expected asset checksum.
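Computing and comparing the checksum takes a few lines of Go. A minimal sketch, assuming the expected value is hex-encoded with a sha256- prefix (as in the spec example below):
import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// verifyChecksum compares the SHA-256 hash of a file with the expected value.
func verifyChecksum(path, expected string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        return err
    }
    actual := "sha256-" + hex.EncodeToString(h.Sum(nil))
    if actual != expected {
        return fmt.Errorf("checksum mismatch: want %s, got %s", expected, actual)
    }
    return nil
}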
We can specify checksums right in the package spec file:
{
    "owner": "asg017",
    "name": "vss",
    "version": "v0.1.1",
    "repository": "https://github.com/asg017/sqlite-vss",
    "assets": {
        "path": "https://github.com/asg017/sqlite-vss/releases/download/v0.1.1",
        "files": {
            "darwin-amd64": "vss-macos-x86_64.tar.gz",
            "darwin-arm64": "vss-macos-aarch64.tar.gz",
            "linux-amd64": "vss-linux-x86_64.tar.gz"
        },
        "checksums": {
            "vss-macos-x86_64.tar.gz": "sha256-a3694a...",
            "vss-macos-aarch64.tar.gz": "sha256-04dc3c...",
            "vss-linux-x86_64.tar.gz": "sha256-f9cc84..."
        }
    }
}
But this would require the package author to edit the spec after each release, since the checksums are not known in advance.
It's much better to provide the checksums in a separate file (e.g. checksums.txt) that is auto-generated with each new release. Such a file is hosted along with other package assets:
https://github.com/asg017/sqlite-vss/releases/download/v0.1.1
├── checksums.txt
├── vss-macos-x86_64.tar.gz
├── vss-macos-aarch64.tar.gz
└── vss-linux-x86_64.tar.gz
When installing the package, the manager fetches checksums.txt, injects it into the local spec file, and validates the downloaded asset checksum against the expected value:
local assets │ local spec │ remote assets
│ │
> install │ │ ┌──────────────────┐
└─ (empty) → (empty) → │ asg017/vss │
│ │ ├──────────────────┤
│ │ │ checksums.txt │
┌─ ← ┌─ ← │ macos-x86.tar.gz │
download asset │ save spec w/checksums │ └──────────────────┘
┌──────────────────┐ │ ┌──────────────────┐ │
│ macos-x86.tar.gz │ │ │ asg017/vss │ │
└──────────────────┘ │ ├──────────────────┤ │
↓ │ │ macos-x86.tar.gz │ │
calculate checksum │ │ sha256-a3694a... │ │
┌──────────────────┐ │ └──────────────────┘ │
│ sha256-a3694a... │ │ │
└──────────────────┘ │ │
↓ │ │
verify checksum │ │
↪ ✗ abort if failed │ asg017/vss │
┌──────────────────┐ │ ┌──────────────────┐ │
│ macos-x86.tar.gz │ │ │ macos-x86.tar.gz │ │
│ sha256-a3694a... │ = │ sha256-a3694a... │ │
└──────────────────┘ │ └──────────────────┘ │
↓ │ │
install asset │ │
┌──────────────────┐ │ │
│ vss0.dylib │ │ │
└──────────────────┘ │ │
✓ done!
If the remote package is missing checksums.txt, the manager can warn the user or even refuse to install such a package.
Package dependencies
Okay, it's time to talk about the elephant in the room — package dependencies.
A package dependency is when package A depends on package B:
┌─────┐ ┌─────┐
│ A │ ──> │ B │
└─────┘ └─────┘
A transitive dependency is when package A depends on B, and B depends on C, so A depends on C:
┌─────┐ ┌─────┐ ┌─────┐
│ A │ ──> │ B │ ──> │ C │
└─────┘ └─────┘ └─────┘
Dependencies, especially transitive ones, are a major headache (read Sam's article if you don't believe me). Fortunately, in the SQLite world, extensions are usually self-contained and don't depend on other extensions. So getting rid of the dependency feature is an obvious choice. It radically simplifies things.
┌─────┐ ┌─────┐ ┌─────┐
│ A │ │ B │ │ C │
└─────┘ └─────┘ └─────┘
Since all packages are independent, the manager can install and update them individually without worrying about version conflicts.
I understand that dropping dependencies altogether may not be something you are ready to accept. But all the other building blocks we've discussed are still relevant regardless of the dependency handling strategy, so let's leave it at that.
Install and update
Now that we've seen all the building blocks, let's look at two of the most complex commands: install and update.
Suppose I tell the manager to install the asg017/vss package:
local spec │ lockfile │ remote spec
│ │
> install asg017/vss │
↓ │ │
read remote spec │ │ ┌─────────────┐
└─ → → → │ asg017/vss │
│ │ │ latest │
│ │ └─────────────┘
│ │ ↓
│ │ resolve version
│ │ ┌─────────────┐
│ │ │ asg017/vss │
┌─ ← ← ← │ v0.1.0 │
download spec │ │ └─────────────┘
┌─────────────┐ │ │
│ asg017/vss │ │ │
│ v0.1.0 │ │ │
└─────────────┘ │ │
↓ │ │
download assets │ │
validate checksums │
↪ ✗ abort if failed │
↓ │ │
install assets │ │
┌─────────────┐ │ │
│ vss0.dylib │ │ │
└─────────────┘ │ │
└─ → add to lockfile │
│ ┌─────────────┐ │
│ │ asg017/vss │ │
│ │ v0.1.0 │ │
│ └─────────────┘ │
✓ done!
Now let's say I heard there was a new release, so I tell the manager to update the package:
local spec │ lockfile │ remote spec
│ │
> update asg017/vss │
↓ │ │
read local spec │ │
↪ abort if failed │
┌─────────────┐ │ ┌─────────────┐ │
│ asg017/vss │ │ │ does not │ │
│ v0.1.0 │ │ │ matter │ │
└─────────────┘ │ └─────────────┘ │
↓ │ │
read remote spec │ │
resolve version │ │ ┌─────────────┐
└─ → → → │ asg017/vss │
┌─ ← ← ← │ v0.1.1 │
has new version? │ │ └─────────────┘
↪ ✗ abort if not │
┌─────────────┐ │ │ ┌─────────────┐
│ v0.1.0 │ < is less than < │ v0.1.1 │
└─────────────┘ │ │ └─────────────┘
↓ │ │
download assets │ │
validate checksums │
↪ ✗ abort if failed │
↓ │ │
install assets │ │
add to lockfile │ │
┌─────────────┐ │ ┌─────────────┐ │
│ asg017/vss │ → │ asg017/vss │ │
│ v0.1.1 │ │ │ v0.1.1 │ │
└─────────────┘ │ └─────────────┘ │
┌─────────────┐ │ │
│ vss0.dylib │ │ │
└─────────────┘ │ │
✓ done!
Not so complicated after all, huh?
Implementation details
I've written the package manager in Go. I believe Go is a great choice: not only is it reasonably fast and compiles to native code, but it's also the simplest of the mainstream languages. So I think you'll be able to easily follow the code even if you don't know Go. Also, porting the code to another language should not be a problem.
Another benefit of using Go is its well-thought-out standard library. It allowed me to implement the whole project with zero dependencies, which is always nice.
spec • assets • checksums • lockfile • cmd • top-level
spec package
The spec package provides data structures and functions related to the spec file.
spec
┌─────────────────────────────────────┐
│ Package{} Read() Dir() │
│ Assets{} ReadLocal() Path() │
│ AssetPath{} ReadRemote() │
└─────────────────────────────────────┘
The spec file and its associated data structures are the heart of the system:
// A Package describes the package spec.
type Package struct {
    Owner       string
    Name        string
    Version     string
    Homepage    string
    Repository  string
    Specfile    string
    Authors     []string
    License     string
    Description string
    Keywords    []string
    Symbols     []string
    Assets      Assets
}

// Assets are archives of package files, each for a specific platform.
type Assets struct {
    Path      *AssetPath
    Pattern   string
    Files     map[string]string
    Checksums map[string]string
}

// An AssetPath describes a local file path or a remote URL.
type AssetPath struct {
    Value    string
    IsRemote bool
}
We've already discussed the most important Package fields in the Design section. The rest (Homepage, Authors, License, and so on) provide additional package metadata.
The Package structure provides the basic spec management methods:
- ExpandVars substitutes variables in Assets with real values (see the sketch below).
- ReplaceLatest forces a specific package version instead of the "latest" placeholder.
- AssetPath determines the asset URL for a specific platform (OS + architecture).
- Save writes the package spec file to the specified directory.
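Here is a rough sketch of what ExpandVars might look like; the real method may handle more variables:
import "strings"

// ExpandVars substitutes {repository} and {version} in the asset path and file names.
func (p *Package) ExpandVars() {
    expand := func(s string) string {
        s = strings.ReplaceAll(s, "{repository}", p.Repository)
        return strings.ReplaceAll(s, "{version}", p.Version)
    }
    if p.Assets.Path != nil {
        p.Assets.Path.Value = expand(p.Assets.Path.Value)
    }
    for platform, file := range p.Assets.Files {
        p.Assets.Files[platform] = expand(file)
    }
}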
Assets.Pattern provides a way to selectively extract files from the archive. It accepts a glob pattern. For example, if the package asset contains many libraries, and we only want to extract the text one, the Assets.Pattern would be text.*.
The Read family of functions loads the package spec file from the specified path (either local or remote).
Finally, the Dir and Path functions return the directory and spec file path of the installed package.
assets package
The assets package provides functions for managing the actual package assets.
assets
┌──────────────────────┐
│ Asset{} Download() │
│ Copy() │
│ Unpack() │
└──────────────────────┘
An Asset is a binary file or an archive of package files for a particular platform:
type Asset struct {
    Name     string
    Path     string
    Size     int64
    Checksum []byte
}
The Asset provides a Validate method that checks the asset's checksum against the provided checksum string.
The Download, Copy and Unpack package-level functions perform the corresponding actions on the asset.
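For a sense of what Download involves, here is a hedged sketch; the real function also fills in the checksum and handles errors more carefully:
import (
    "fmt"
    "io"
    "net/http"
    "os"
    "path"
    "path/filepath"
)

// Download fetches the asset at url into dir and returns its metadata.
func Download(dir, url string) (*Asset, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("download %s: %s", url, resp.Status)
    }

    name := path.Base(url)
    f, err := os.Create(filepath.Join(dir, name))
    if err != nil {
        return nil, err
    }
    defer f.Close()

    size, err := io.Copy(f, resp.Body)
    if err != nil {
        return nil, err
    }
    // the real implementation would also calculate the Checksum here
    return &Asset{Name: name, Path: f.Name(), Size: size}, nil
}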
The assets and spec packages are independent, but both are used by a higher-level cmd package, which we'll discuss later.
checksums package
The checksums package has one job — it loads asset checksums from a file (checksums.txt) into a map (which can be assigned to the spec.Package.Assets.Checksums field).
checksums
┌──────────┐
│ Exists() │
│ Read() │
└──────────┘
Exists checks if a checksum file exists in the given path. Read loads checksums from a local or remote file into a map, where keys are filenames and values are checksums. Pretty simple stuff.
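A sketch of the parsing, assuming the usual "checksum  filename" line format produced by tools like sha256sum; the sha256- prefix and the helper name are mine, and the real Read works with a local or remote path rather than an io.Reader:
import (
    "bufio"
    "io"
    "strings"
)

// parseChecksums reads "checksum  filename" lines into a filename -> checksum map.
func parseChecksums(r io.Reader) (map[string]string, error) {
    checksums := map[string]string{}
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        if len(fields) != 2 {
            continue // skip blank or malformed lines
        }
        sum, file := fields[0], fields[1]
        checksums[file] = "sha256-" + sum // prefix assumed to match the spec format
    }
    return checksums, scanner.Err()
}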
Similar to assets, the checksums and spec packages are independent, but both are used by the higher-level cmd package.
lockfile package
Just like the spec package works with the spec file, the lockfile package works with the lockfile.
lockfile
┌──────────────────────────┐
│ Lockfile{} ReadLocal() │
│ Path() │
└──────────────────────────┘
A Lockfile describes a collection of installed packages:
type Lockfile struct {
    Packages map[string]*spec.Package
}
It has a bunch of package-related methods:
- Has checks if a package is in the lockfile (see the sketch below).
- Add adds a package to the lockfile.
- Remove removes a package from the lockfile.
- Range iterates over packages from the lockfile.
- Save writes the lockfile to the specified directory.
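Two of these are small enough to sketch right here (illustrative, not the exact sqlpkg code):
// Has checks if a package is in the lockfile.
func (lf *Lockfile) Has(fullName string) bool {
    _, ok := lf.Packages[fullName]
    return ok
}

// Add adds a package to the lockfile.
func (lf *Lockfile) Add(pkg *spec.Package) {
    if lf.Packages == nil {
        lf.Packages = map[string]*spec.Package{}
    }
    lf.Packages[pkg.Owner+"/"+pkg.Name] = pkg
}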
Since the Lockfile is always local, there is only one read function — ReadLocal. The Path function returns the path to the lockfile.
The lockfile package depends on the spec package:
┌──────────┐ ┌──────────┐
│ lockfile │ → │ spec │
└──────────┘ └──────────┘
cmd package
The cmd package provides command steps — the basic building blocks for top-level commands like install or update.
cmd
┌─────────────────────────────────────────────────────────────────────────────┐
│ assets spec lockfile version │
├─────────────────────────────────────────────────────────────────────────────┤
│ BuildAssetPath ReadSpec ReadLockfile ResolveVersion │
│ DownloadAsset FindSpec AddToLockfile HasNewVersion │
│ ValidateAsset ReadInstalledSpec RemoveFromLockfile │
│ UnpackAsset ReadChecksums │
│ InstallFiles │
│ DequarantineFiles │
└─────────────────────────────────────────────────────────────────────────────┘
Each step falls into a specific domain category, such as "assets" or "spec".
Steps use the spec, assets and lockfile packages we've discussed earlier. Let's look at the DownloadAsset step, for example (error handling omitted for brevity):
// DownloadAsset downloads the package asset.
func DownloadAsset(pkg *spec.Package, assetPath *spec.AssetPath) *assets.Asset {
    logx.Debug("downloading %s", assetPath)
    dir := spec.Dir(os.TempDir(), pkg.Owner, pkg.Name)
    fileio.CreateDir(dir)

    var asset *assets.Asset
    if assetPath.IsRemote {
        asset = assets.Download(dir, assetPath.Value)
    } else {
        asset = assets.Copy(dir, assetPath.Value)
    }

    sizeKb := float64(asset.Size) / 1024
    logx.Debug("downloaded %s (%.2f Kb)", asset.Name, sizeKb)
    return asset
}
I think it's pretty obvious what's going on here: we create a temporary directory and then download (or copy) the asset file into it.
The logx and fileio packages provide helper functions for logging and working with the file system. There are also httpx for HTTP and github for GitHub API calls.
Let's look at another one — HasNewVersion:
// HasNewVersion checks if the remote package is newer than the installed one.
func HasNewVersion(remotePkg *spec.Package) bool {
    installPath := spec.Path(WorkDir, remotePkg.Owner, remotePkg.Name)
    if !fileio.Exists(installPath) {
        return true
    }

    installedPkg := spec.ReadLocal(installPath)
    logx.Debug("local package version = %s", installedPkg.Version)

    if installedPkg.Version == "" {
        // not explicitly versioned, always assume there is a later version
        return true
    }
    if installedPkg.Version == remotePkg.Version {
        return false
    }
    return semver.Compare(installedPkg.Version, remotePkg.Version) < 0
}
It's pretty simple, too: we load the locally installed spec file and compare its version with the version from the remote spec. The semver helper package does the actual comparison.
The cmd package depends on all the packages we've already discussed:
┌────────────────────────────────────────────────┐
│ cmd │
└────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌──────────┐ ┌──────┐ ┌────────┐ ┌───────────┐
│ lockfile │ → │ spec │ │ assets │ │ checksums │
└──────────┘ └──────┘ └────────┘ └───────────┘
Command packages
There is a top-level package for each package manager command:
- cmd/install installs packages.
- cmd/update updates installed packages.
- cmd/uninstall removes installed packages.
- cmd/list shows installed packages.
- cmd/info shows package information.
- and so on.
Let's look at one of the most complex commands — update (error handling omitted for brevity):
func Update(args []string) {
    fullName := args[0]
    installedPkg := cmd.ReadLocal(fullName)
    pkg := cmd.ReadSpec(installedPkg.Specfile)
    cmd.ResolveVersion(pkg)
    if !cmd.HasNewVersion(pkg) {
        return
    }

    cmd.ReadChecksums(pkg)
    assetUrl := cmd.BuildAssetPath(pkg)
    asset := cmd.DownloadAsset(pkg, assetUrl)
    cmd.ValidateAsset(pkg, asset)
    cmd.UnpackAsset(pkg, asset)
    cmd.InstallFiles(pkg, asset)
    cmd.DequarantineFiles(pkg)

    lck := cmd.ReadLockfile()
    cmd.AddToLockfile(lck, pkg)
}
Thanks to the building blocks in the cmd package, the update logic has become straightforward and self-explanatory. Just a linear sequence of steps with a single "does it have a new version?" branch.
Here is a complete package diagram (some arrows omitted to make it less noisy):
┌─────────┐ ┌────────┐ ┌───────────┐ ┌──────┐
│ install │ │ update │ │ uninstall │ │ list │ ...
└─────────┘ └────────┘ └───────────┘ └──────┘
↓ ↓ ↓ ↓
┌─────────────────────────────────────────────────┐
│ cmd │
└─────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌──────────┐ ┌──────┐ ┌────────┐ ┌───────────┐
│ lockfile │ → │ spec │ │ assets │ │ checksums │
└──────────┘ └──────┘ └────────┘ └───────────┘
┌────────┐ ┌────────┐ ┌───────┐ ┌──────┐ ┌────────┐
│ fileio │ │ github │ │ httpx │ │ logx │ │ semver │
└────────┘ └────────┘ └───────┘ └──────┘ └────────┘
And that's it!
Summary
We've explored design choices for a simple general-purpose package manager:
- A package spec file describing a package.
- A hierarchical owner/name folder structure for installed packages.
- Project and global scope for installed packages.
- Spec file locator with fallback to the package registry.
- Versioning and latest versions.
- The lockfile and single source of truth.
- Asset checksums.
- Package dependencies (or lack thereof).
We've also explored implementation details in Go:
- spec package with data structures and functions related to the spec file.
- assets package for managing the package assets.
- checksums package for loading asset checksums from a file.
- lockfile package for working with the lockfile.
- cmd package with basic building blocks for top-level commands.
- top-level packages for individual commands.
Thanks for reading! I hope you'll find this article useful if you ever need to implement a package manager (or parts of it).