Anton ZhiyanovEverything about Go, SQL, and software in general.https://antonz.org/https://antonz.org/assets/favicon/favicon.pngAnton Zhiyanovhttps://antonz.org/Hugo -- gohugo.ioen-usSat, 23 Mar 2024 11:30:00 +0000Grep by example: Interactive guidehttps://antonz.org/grep-by-example/Sat, 23 Mar 2024 11:30:00 +0000https://antonz.org/grep-by-example/Interactive introduction to grep with real-world use cases.grep is the ultimate text search tool available on virtually all Linux machines. While there are now better alternatives (such as ripgrep), you will still often find yourself on a server where grep is the only search tool available. So it's nice to have a working knowledge of it.

That's why is I've created this interactive step-by-step guide to grep operations. You can read it from start to finish to (hopefully) learn more about grep, or jump to a specific use case that interests you.

Feel free to experiment with the examples by changing the commands and clicking Run.

Basics · Recursive search · Search options · Output options · Final thoughts

This guide is also available in other formats:

Basics

Basically, grep works like this:

  • You give it a search pattern and a file.
  • grep reads the file line by line, printing the lines that match the pattern and ignoring others.

Let's look at an example. We'll search the httpurr source code, which I've already downloaded to the /opt/httpurr directory like this:

cd /opt
curl -OL https://github.com/rednafi/httpurr/archive/refs/tags/v0.1.2.tar.gz
tar xvzf v0.1.2.tar.gz
mv httpurr-0.1.2 httpurr
cd httpurr

Search in file · Matches · Regular expressions · Fixed strings · Multiple patterns

Search in file

Let's find all occurrences of the word codes in README.md:

grep -n codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

grep read the contents of README.md, and for each line that contained codes, grep printed it to the terminal.

grep also included the line number for each line, thanks to the -n (--line-number) option.

Not all grep versions support the long option syntax (e.g. --line-number). If you get an error using the long version, try the short one (e.g. -n) — it may work fine.

Matches

grep uses partial matches by default:

grep -n descr README.md
81:* Display the description of a status code:
127:		  Print the description of an HTTP status code

The word description matches the descr search pattern.

To search for whole words instead, use the -w (--word-regexp) option:

grep -n --word-regexp code README.md
81:* Display the description of a status code:
84:	httpurr --code 410
94:	The HyperText Transfer Protocol (HTTP) 410 Gone client error response code
99:	code should be used instead.
126:	  -c, --code [status code]
127:		  Print the description of an HTTP status code

grep found strings containing the word code, but not codes. Try removing --word-regexp and see how the results change.

When using multiple short options, you can combine them like this: grep -nw code README.md. This gives exactly the same result as using the separate options (-n -w).

To search for whole lines instead of partial matches of whole words, use the -x (--line-regexp) option:

grep -n --line-regexp end httpurr.rb
47:end

Try removing --line-regexp and see how the results change.

Regular expressions

To make grep use regular expressions (Perl-compatible regular expressions in grep terminology), use the -P (--perl-regexp) option.

Let's find all lines with a word that contains res followed by other letters:

grep -Pn 'res\w+' README.md
94:	The HyperText Transfer Protocol (HTTP) 410 Gone client error response code
95:	indicates that access to the target resource is no longer available at the
152:of the rest.

\w+ means "one or more word-like characters" (e.g. letters like p or o, but not punctuation like . or !), so response, resource, and rest all match.

Regular expression dialects in grep

Without --perl-regexp, grep treats the search pattern as something called a basic regular expression. While regular expressions are quite common in the software world, the basic dialect is really weird, so it's better not to use it at all.

Another dialect supported by grep is extended regular expressions. You can use the -E (--extended-regexp) option to enable them. Extended regular expressions are almost like normal regular expressions, but not quite. So I wouldn't use them either.

Some grep versions do not support --perl-regexp. For those, --extended-regexp is the best you can get.

Suppose we are only interested in 4 letter words starting with res:

grep -Pn 'res\w\b' README.md
152:of the rest.

\b means "word boundary" (e.g. a space, a punctuation character, or the end of a line), so rest matches, but response and resource don't.

Finally, let's search for 3-digit numbers (showing first 10 matches with head):

grep -Pn '\d\d\d' README.md | head
45:	100    Continue
46:	101    Switching Protocols
47:	102    Processing
48:	103    Early Hints
69:	200    OK
70:	201    Created
71:	202    Accepted
72:	203    Non-Authoritative Information
73:	204    No Content
74:	205    Reset Content

A full tutorial on regular expressions is beyond the scope of this guide, but grep's "Perl-compatible" syntax is documented in the PCRE2 manual.

Fixed strings

What if we want to search for a literal string instead of a regular expression? Suppose we are interested in a word code followed by a dot:

grep -Pn 'code.' src/data.go | head
8:The HTTP 100 Continue informational status response code indicates that
14:status code in response before sending the body.
31:The HTTP 101 Switching Protocols response code indicates a protocol to which the
53:Deprecated: This status code is deprecated. When used, clients may still accept
56:The HTTP 102 Processing informational status response code indicates to client
59:This status code is only sent if the server expects the request to take
112:The HTTP 200 OK success status response code indicates that the request has
141:The HTTP 201 Created success status response code indicates that the request has
149:The common use case of this status code is as the result of a POST request.
165:The HTTP 202 Accepted response status code indicates that the request has been

Since . means "any character" in regular expressions, our pattern also matches code , codes and other cases we are not interested in.

To treat the pattern as a literal string, use the -F (--fixed-strings) option:

grep -Fn 'code.' src/data.go
197:to responses with any status code.
283:Browsers accessing web pages will never encounter this status code.
695:to an error code.
1027:erroneous cases it happens, they will handle it as a generic 400 status code.
1051:Regular web servers will normally not return this status code. But some other
1418:then the server responds with the 510 status code.

Much better!

Multiple patterns

To search for multiple patterns, list them with the -e (--regexp) option. grep will output lines matching at least one of the specified patterns.

For example, search for make or run:

grep -En -e make -e run README.md
139:* Go to the root directory and run:
141:	make init
145:	make lint
149:	make test

Unfortunately, grep can't use Perl-compatible regular expressions (-P) with multiple patterns. So we are stuck with the extended (-E) dialect.

If you have many patterns, it may be easier to put them in a file and point grep to it with -f (--file):

echo 'install' > /tmp/patterns.txt
echo 'make' >> /tmp/patterns.txt
echo 'run' >> /tmp/patterns.txt

grep -En --file=/tmp/patterns.txt README.md
13:* On MacOS, brew install:
17:	    && brew install httpurr
20:* Or elsewhere, go install:
23:	go install github.com/rednafi/httpurr/cmd/httpurr
139:* Go to the root directory and run:
141:	make init
145:	make lint
149:	make test

grep searches directories recursively when called with the -r (--recursive) option.

Search in directory · File globs · Binary files

Search in directory

Let's find all unexported functions (they start with a lowercase letter):

grep -Pnr 'func [a-z]\w+' .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {

This search returned matches from both the cmd and src directories. If you are only interested in cmd, specify it instead of .:

grep -Pnr 'func [a-z]\w+' cmd
cmd/httpurr/main.go:12:func main() {

To search multiple directories, list them all like this:

grep -Pnr 'func [a-z]\w+' cmd src
cmd/httpurr/main.go:12:func main() {
src/cli.go:16:func formatStatusText(text string) string {
src/cli.go:21:func printHeader(w *tabwriter.Writer) {
src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {

File globs

Let's search for httpurr:

grep -Pnr --max-count=5 httpurr .
./README.md:2:    <h1>ᗢ httpurr</h1>
./README.md:16:	brew tap rednafi/httpurr https://github.com/rednafi/httpurr \
./README.md:17:	    && brew install httpurr
./README.md:23:	go install github.com/rednafi/httpurr/cmd/httpurr
./README.md:33:	httpurr --list
./cmd/httpurr/main.go:4:	"github.com/rednafi/httpurr/src"
./go.mod:1:module github.com/rednafi/httpurr
./httpurr.rb:7:  homepage "https://github.com/rednafi/httpurr"
./httpurr.rb:12:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_x86_64.tar.gz"
./httpurr.rb:16:        bin.install "httpurr"
./httpurr.rb:20:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_arm64.tar.gz"
./httpurr.rb:24:        bin.install "httpurr"
./src/cli.go:24:	fmt.Fprintf(w, "\nᗢ httpurr\n")
./src/cli_test.go:64:	want := "\nᗢ httpurr\n==========\n\n"

Note that I have limited the number of results per file to 5 with the -m (--max-count) option to keep the results readable in case there are many matches.

Quite a lot of results. Let's narrow it down by searching only in .go files:

grep -Pnr --include='*.go' httpurr .
./cmd/httpurr/main.go:4:	"github.com/rednafi/httpurr/src"
./src/cli.go:24:	fmt.Fprintf(w, "\nᗢ httpurr\n")
./src/cli_test.go:64:	want := "\nᗢ httpurr\n==========\n\n"

The --include option (there is no short version) takes a glob (filename pattern), typically containing a fixed part (.go in our example) and a wildcard * ("anything but the path separator").

Another example — search in files named http-something:

grep -Pnr --include='http*' httpurr .
./httpurr.rb:7:  homepage "https://github.com/rednafi/httpurr"
./httpurr.rb:12:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_x86_64.tar.gz"
./httpurr.rb:16:        bin.install "httpurr"
./httpurr.rb:20:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_arm64.tar.gz"
./httpurr.rb:24:        bin.install "httpurr"
./httpurr.rb:31:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Linux_arm64.tar.gz"
./httpurr.rb:35:        bin.install "httpurr"
./httpurr.rb:39:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Linux_x86_64.tar.gz"
./httpurr.rb:43:        bin.install "httpurr"

To negate the glob, use the --exclude option. For example, search everywhere except the .go files:

grep -Pnr --exclude '*.go' def .
./.goreleaser.yml:1:# This is an example .goreleaser.yml file with some sensible defaults.
./httpurr.rb:15:      def install
./httpurr.rb:21:      sha256 "82acefd1222f6228636f2cda6518e0316f46624398adc722defb55c68ac3bb30"
./httpurr.rb:23:      def install
./httpurr.rb:34:      def install
./httpurr.rb:42:      def install

To apply multiple filters, specify multiple glob options. For example, find all functions except those in test files:

grep -Pnr --include '*.go' --exclude '*_test.go' 'func ' .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {
./src/cli.go:123:func Cli(w *tabwriter.Writer, version string, exitFunc func(int)) {

Binary files

By default, grep does not ignore binary files:

grep -Pnr aha .
grep: ./data.bin: binary file matches

Most of the time, this is probably not what you want. If you're searching in a directory that might contain binary files, it's better to ignore them altogether with the -I (--binary-files=without-match) option:

grep -Pnr --binary-files=without-match aha .
(not found)

If for some reason you want grep to search binary files and print the actual matches (as it does with text files), use the -a (--text) option.

Search options

grep supports a couple of additional search options you may find handy.

Ignore case · Inverse matching

Ignore case

Let's find all occurrences of the word codes in README.md:

grep -Pnr codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

It returns codes matches, but not Codes because grep is case-sensitive by default. To change this, use -i (--ignore-case):

grep -Pnr --ignore-case codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
40:	Status Codes
54:* Filter the status codes by categories:
64:	Status Codes
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

Inverse matching

To find lines that do not contain the pattern, use -v (--invert-match). For example, find the non-empty lines without the @ symbol:

grep -Enr --invert-match -e '@' -e '^$' Makefile
1:.PHONY: lint
2:lint:
8:.PHONY: lint-check
9:lint-check:
14:.PHONY: test
15:test:
20:.PHONY: clean
21:clean:
27:.PHONY: init
28:init:

Output options

grep supports a number of additional output options you may find handy.

Count matches · Limit matches · Show matches only · Show files only · Show context · Silent mode · Colors

Count matches

To count the number of matched lines (per file), use -c (--count). For example, count the number of functions in each .go file:

grep -Pnr --count --include '*.go' 'func ' .
./cmd/httpurr/main.go:1
./src/cli.go:5
./src/cli_test.go:10
./src/data_test.go:2

Note that --count counts the number of lines, not the number of matches. For example, there are 6 words string in src/cli.go, but two of them are on the same line, so --count reports 5:

grep -nrw --count string src/cli.go
5

Limit matches

To limit the number of matching lines per file, use the -m (--max-count) option:

grep -Pnrw --max-count=5 func .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {
./src/cli.go:123:func Cli(w *tabwriter.Writer, version string, exitFunc func(int)) {
./src/cli_test.go:15:func TestFormatStatusText(t *testing.T) {
./src/cli_test.go:54:func TestPrintHeader(t *testing.T) {
./src/cli_test.go:71:func TestPrintStatusCodes(t *testing.T) {
./src/cli_test.go:159:		t.Run(want, func(t *testing.T) {
./src/cli_test.go:168:func TestPrintStatusText(t *testing.T) {
./src/data_test.go:9:func TestStatusCodes(t *testing.T) {
./src/data_test.go:99:func TestStatusCodeMap(t *testing.T) {

With --max-count=N, grep stops searching the file after finding the first N matching lines (or non-matching lines if used with --invert-match).

Show matches only

By default, grep prints the entire line containing the match. To show only the matching part, use -o (--only-matching).

Suppose we want to find functions named print-something:

grep -Pnr --only-matching --include '*.go' 'func print\w+' .
./src/cli.go:21:func printHeader
./src/cli.go:35:func printStatusCodes
./src/cli.go:105:func printStatusText

The results are much cleaner than without --only-matching (try removing the option in the above command and see for yourself).

Show files only

If there are too many matches, you may prefer to show only the files where the matches occurred. Use -l (--files-with-matches) to do this:

grep -Pnr --files-with-matches 'httpurr' .
./README.md
./cmd/httpurr/main.go
./go.mod
./httpurr.rb
./src/cli.go
./src/cli_test.go

Show context

Let's search for GitHub action jobs:

grep -Pnr 'jobs:' .github/workflows
.github/workflows/automerge.yml:8:jobs:
.github/workflows/lint.yml:11:jobs:
.github/workflows/release.yml:10:jobs:
.github/workflows/test.yml:11:jobs:

These results are kind of useless, because they don't return the actual job name (which is on the next line after jobs). To fix this, let's use -C (--context), which shows N lines around each match:

grep -Pnr --context=1 'jobs:' .github/workflows
.github/workflows/automerge.yml-7-
.github/workflows/automerge.yml:8:jobs:
.github/workflows/automerge.yml-9-  dependabot:
--
.github/workflows/lint.yml-10-
.github/workflows/lint.yml:11:jobs:
.github/workflows/lint.yml-12-  golangci:
--
.github/workflows/release.yml-9-
.github/workflows/release.yml:10:jobs:
.github/workflows/release.yml-11-  goreleaser:
--
.github/workflows/test.yml-10-
.github/workflows/test.yml:11:jobs:
.github/workflows/test.yml-12-  test:

It might be even better to show only the next line after the match, since we are not interested in the previous one. Use -A (--after-context) for this:

grep -Pnr --after-context=1 'jobs:' .github/workflows
.github/workflows/automerge.yml:8:jobs:
.github/workflows/automerge.yml-9-  dependabot:
--
.github/workflows/lint.yml:11:jobs:
.github/workflows/lint.yml-12-  golangci:
--
.github/workflows/release.yml:10:jobs:
.github/workflows/release.yml-11-  goreleaser:
--
.github/workflows/test.yml:11:jobs:
.github/workflows/test.yml-12-  test:

There is also -B (--before-context) for showing N lines before the match.

Nice!

Silent mode

Sometimes you just want to know if a file contains a certain string; you don't care about the number or positions of the matches.

To make grep quit immediately after the first match and not print anything, use the -q (--quiet or --silent) option. Use the return code ($?) to see if grep found anything (0 — found, 1 — not found):

grep -Pnrw --quiet main cmd/httpurr/main.go
if [ $? = "0" ]; then echo "found!"; else echo "not found"; fi
found!

Try changing the search pattern from main to Main and see how the results change.

When searching in multiple files with --quiet, grep stops after the first match in any file and does not check other files:

grep -Pnrw --quiet main .
if [ $? = "0" ]; then echo "found!"; else echo "not found"; fi
found!

Colors

To highlight matches and line numbers, use the --color=always option:

grep -Pnr --color=always codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

Use --color=auto to let grep decide whether to use colors based on your terminal. Use --color=never to force no-color mode.

Final thoughts

That's it! We've covered just about everything grep can do. Unfortunately, it doesn't support replacing text, reading options from a configuration file, or other fancy features provided by grep alternatives like ack or ripgrep. But grep is still quite powerful, as you can probably see now.

Use grep --help to quickly see all supported options and see the official guide for option descriptions.

Have fun grepping!

──

P.S. Interactive examples in this post are powered by codapi — an open source tool I'm building. Use it to embed live code snippets into your product docs, online course or blog.

]]>
Git by example: Interactive guidehttps://antonz.org/git-by-example/Thu, 14 Mar 2024 10:30:00 +0000https://antonz.org/git-by-example/Interactive introduction to Git with real-world use cases.Git is the distributed version control system used in software development today. It's very powerful, but also known for its not-so-obvious syntax.

I got tired of googling the same Git commands over and over again. So I created an interactive step-by-step guide to Git operations, from basic to advanced. You can read it from start to finish to (hopefully) learn more about Git, or jump to a specific use case that interests you.

Feel free to experiment with the examples by changing the commands and clicking Run.

ConceptsBasicsBranch & mergeLocal & remoteUndoAdvancedFinal thoughts

This guide is also available in other formats:

Concepts

This is the only piece of theory in the guide. I'll keep it short and simplified to the π == 3 level. Please don't judge me if you're a Git master.

Working tree, staging area, repository

┌──────────────┐         ┌──────────────┐
│ local        │ push ─> │ remote       │
│ repo         │ <- pull │ repo         │
└──────────────┘         └──────────────┘
check │  ↑↓ commit / reset
out   │ ┌──────────────┐
      │ │ staging area │
      │ └──────────────┘
      ▽  ↑↓ add / restore
┌──────────────┐
│ working tree │
│ .            │
│ ├── go.mod   │
│ └── main.go  │
└──────────────┘

A working tree is the slice of the project at any given moment (usually it's the current moment). When you add or edit code, you change the working tree.

A staging area is where you stage the changes from the working tree before making them permanent.

A repo (repository) is the collection of permanent changes (commits) made throughout the history of the project. Typically, there is a single remote repo (managed by GitHub/GitLab/etc) and many local repos — one for each developer involved in a project.

When you make a change in the staging area permanent, it is removed from the staging area and committed to the local repo. A commit is the permanent record of that change. The repo contains all the commits that have been made.

When you checkout a specific commit, the working tree is updated to reflect the project state at the time of that commit.

Local and remote repos are frequently synchronized so that all repos contain all commits from all developers.

Branch, tag, HEAD

      main             ○ v1.1
feat-2 │               │
      ╲│               •
       │ feat-1        │
       │╱              ○ v1.0
       │               │

A branch is an alternate version of the project reality. Typically, there is a main branch, and separate branches for features under development. When work on a feature branch is complete, it is merged into the main branch (or discarded).

A tag is a named state of the project. Typically, tags are created on the main branch for important milestones such as releases.

The currently checked-out commit (usually the latest commit in a branch) is referenced as HEAD.

Now that the boring stuff is out of the way, let's get to the recipes!

Basics

Let's start with basic Git operations on a local repo.

init repoadd fileedit filerename filedelete fileshow statusshow logshow commitsearch

Init repo

Create an empty repo:

git init
Initialized empty Git repository in /tmp/repo/.git/

Set user name and email for the repo (they are required):

git config user.email alice@example.com
git config user.name "Alice Zakas"
ok

Use the --global flag to set the name and email at the OS user level instead of the repo level.

Show user and repo configs:

git config --list --show-origin
file:/sandbox/.gitconfig	user.email=sandbox@example.com
file:/sandbox/.gitconfig	user.name=sandbox
file:/sandbox/.gitconfig	init.defaultbranch=main
file:.git/config	core.repositoryformatversion=0
file:.git/config	core.filemode=true
file:.git/config	core.bare=false
file:.git/config	core.logallrefupdates=true
file:.git/config	user.email=alice@example.com
file:.git/config	user.name=Alice Zakas

Add file

Create a file and add it to the staging area:

echo "git is awesom" > message.txt
git add message.txt
ok

View changes in the staging area:

git diff --cached
diff --git a/message.txt b/message.txt
new file mode 100644
index 0000000..0165e86
--- /dev/null
+++ b/message.txt
@@ -0,0 +1 @@
+git is awesom

Commit to the local repo:

git commit -m "add message"
[main (root-commit) 3a2bd8f] add message
 1 file changed, 1 insertion(+)
 create mode 100644 message.txt

Edit file

Edit the previously committed file:

echo "git is awesome" > message.txt
ok

View local changes:

git diff
diff --git a/message.txt b/message.txt
index 0165e86..118f108 100644
--- a/message.txt
+++ b/message.txt
@@ -1 +1 @@
-git is awesom
+git is awesome

Add modified files and commit in one command:

git commit -am "edit message"
[main ecdeb79] edit message
 1 file changed, 1 insertion(+), 1 deletion(-)

Note that -a does not add new files, only changes to the already committed files.

Rename file

Rename the previously committed file:

git mv message.txt praise.txt
ok

The change is already in the staging area, so git diff won't show it. Use --cached:

git diff --cached
diff --git a/message.txt b/praise.txt
similarity index 100%
rename from message.txt
rename to praise.txt

Commit the change:

git commit -m "rename message.txt"
[main d768287] rename message.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename message.txt => praise.txt (100%)

Delete file

Delete the previously committed file:

git rm message.txt
rm 'message.txt'

The change is already in the staging area, so git diff won't show it. Use --cached:

git diff --cached
diff --git a/message.txt b/message.txt
deleted file mode 100644
index 0165e86..0000000
--- a/message.txt
+++ /dev/null
@@ -1 +0,0 @@
-git is awesom

Commit the change:

git commit -m "delete message.txt"
[main 6a2d19b] delete message.txt
 1 file changed, 1 deletion(-)
 delete mode 100644 message.txt

Show current status

Edit the previously committed file and add the changes to the staging area:

echo "git is awesome" > message.txt
git add message.txt
ok

Create a new file:

echo "git is great" > praise.txt
ok

Show the working tree status:

git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   message.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	praise.txt

Note that message.txt is in the staging area, while praise.txt is not tracked.

Show commit log

Show commits:

git log
commit ecdeb79aad4565d8d7d725678ffadc48b3cdec52
Author: sandbox <sandbox@example.com>
Date:   Thu Mar 14 15:00:00 2024 +0000

    edit message

commit 3a2bd8f0929c605193120bd1ad12732f49457e99
Author: sandbox <sandbox@example.com>
Date:   Thu Mar 14 15:00:00 2024 +0000

    add message

Show only the commit message and the short hash:

git log --oneline
ecdeb79 edit message
3a2bd8f add message

Show commits as an ASCII graph:

git log --graph
* commit ecdeb79aad4565d8d7d725678ffadc48b3cdec52
| Author: sandbox <sandbox@example.com>
| Date:   Thu Mar 14 15:00:00 2024 +0000
|
|     edit message
|
* commit 3a2bd8f0929c605193120bd1ad12732f49457e99
  Author: sandbox <sandbox@example.com>
  Date:   Thu Mar 14 15:00:00 2024 +0000

      add message

Show compact ASCII graph:

git log --oneline --graph
* ecdeb79 edit message
* 3a2bd8f add message

Show specific commit

Show the last commit contents:

git show HEAD
commit ecdeb79aad4565d8d7d725678ffadc48b3cdec52
Author: sandbox <sandbox@example.com>
Date:   Thu Mar 14 15:00:00 2024 +0000

    edit message

diff --git a/message.txt b/message.txt
index 0165e86..118f108 100644
--- a/message.txt
+++ b/message.txt
@@ -1 +1 @@
-git is awesom
+git is awesome

Show the second-to-last commit:

git show HEAD~1
commit 3a2bd8f0929c605193120bd1ad12732f49457e99
Author: sandbox <sandbox@example.com>
Date:   Thu Mar 14 15:00:00 2024 +0000

    add message

diff --git a/message.txt b/message.txt
new file mode 100644
index 0000000..0165e86
--- /dev/null
+++ b/message.txt
@@ -0,0 +1 @@
+git is awesom

Use HEAD~n to show the nth-before-last commit or use the specific commit hash instead of HEAD~n.

Search repo

There are 3 commits, each adding a new line to message.txt:

git log --oneline
cc5b883 no debates
2774a8b is great
31abe57 is awesome

The current message.txt state:

cat message.txt
git is awesome
git is great
there is nothing to debate

Search in working tree (current state):

git grep "debate"
message.txt:there is nothing to debate

Search the project as of the second-to-last commit:

git grep "great" HEAD~1
HEAD~1:message.txt:git is great

You can use the specific commit hash instead of HEAD~n.

Branch and merge

Let's dive into the wondrous world of merging.

branchmergerebasesquashcherry-pick

Branch

Show branches (there is only main now):

git branch
* main

Create and switch to a new branch:

git branch ohmypy
git switch ohmypy
Switched to branch 'ohmypy'

Show branches (the current one is ohmypy):

git branch
  main
* ohmypy

Add and commit a file:

echo "print('git is awesome')" > ohmy.py
git add ohmy.py
git commit -m "ohmy.py"
[ohmypy a715138] ohmy.py
 1 file changed, 1 insertion(+)
 create mode 100644 ohmy.py

Show only commits from the ohmypy branch:

git log --oneline main..ohmypy
a715138 ohmy.py

Merge

Show commits from all branches (two commits in main, one in ohmypy):

git log --all --oneline --graph
* ecdeb79 edit message
| * a715138 ohmy.py
|/
* 3a2bd8f add message

We are now on the main branch, let's merge the ohmypy branch back into main:

git merge ohmypy
Merge made by the 'ort' strategy.
 ohmy.py | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 ohmy.py

There are no conflicts, so git commits automatically. Show the new commit history:

git log --all --oneline --graph
*   7d5ac4f Merge branch 'ohmypy'
|\
| * a715138 ohmy.py
* | ecdeb79 edit message
|/
* 3a2bd8f add message

Rebase

Show commits from all branches (two commits in main, one in ohmypy):

git log --all --oneline --graph
* ecdeb79 edit message
| * a715138 ohmy.py
|/
* 3a2bd8f add message

We are now on the main branch, let's rebase the ohmypy branch back into main:

git rebase ohmypy
Rebasing (1/1)
Successfully rebased and updated refs/heads/main.

Note that the new commit history is linear, unlike when we do a git merge ohmypy:

git log --all --oneline --graph
* c2b0c60 edit message
* a715138 ohmy.py
* 3a2bd8f add message

Rebasing rewrites history. So it's better not to rebase branches that have already been pushed to remote.

Squash

Show commits from all branches (two commits in main, three in ohmypy):

git log --all --oneline --graph
* ecdeb79 edit message
| * b9a7d0f ohmy.lua
| * 5ca4d55 ohmy.sh
| * a715138 ohmy.py
|/
* 3a2bd8f add message

If we do git merge ohmypy to merge the ohmypy branch into main, the main branch will receive all three commits from ohmypy.

Sometimes we prefer to "squash" all the branch commits into a single commit, and then merge it into main. Let's do it.

Switch to the ohmypy branch:

git switch ohmypy
Switched to branch 'ohmypy'

Combine all ohmypy changes into a single commit in the working directory:

git merge --squash main
Squash commit -- not updating HEAD

Commit the combined changes:

git commit -m "ohmy[py,sh,lua]"
[ohmypy 4f2a17f] ohmy[py,sh,lua]
 1 file changed, 1 insertion(+), 1 deletion(-)

Switch back to the main branch:

git switch main
ok

Merge the ohmypy branch into main:

git merge --no-ff ohmypy -m "ohmy[py,sh,lua]"
Merge made by the 'ort' strategy.
 ohmy.lua | 1 +
 ohmy.py  | 1 +
 ohmy.sh  | 1 +
 3 files changed, 3 insertions(+)
 create mode 100644 ohmy.lua
 create mode 100644 ohmy.py
 create mode 100644 ohmy.sh

Note the single commit in main made of three commits in ohmypy:

git log --all --oneline --graph
*   008dce6 ohmy[py,sh,lua]
|\
| * 4f2a17f ohmy[py,sh,lua]
| * b9a7d0f ohmy.lua
| * 5ca4d55 ohmy.sh
| * a715138 ohmy.py
* | ecdeb79 edit message
|/
* 3a2bd8f add message

Cherry-pick

I have a typo in message.txt:

cat message.txt
git is awesom

And I accidentally fixed it in the ohmypy branch instead of main:

git log --all --oneline --graph --decorate
* 568193c (HEAD -> main) add praise
| * bbce161 (ohmypy) ohmy.sh
| * cbb09c6 fix typo
| * a715138 ohmy.py
|/
* 3a2bd8f add message

I'm not ready to merge the entire ohmypy branch, so I will cherry-pick the commit:

git cherry-pick cbb09c6
[main b23d3ee] fix typo
 Date: Thu Mar 14 15:00:00 2024 +0000
 1 file changed, 1 insertion(+), 1 deletion(-)

cherry-pick applied the comment to the main branch:

git log --all --oneline --graph --decorate
* b23d3ee (HEAD -> main) fix typo
* 568193c add praise
| * bbce161 (ohmypy) ohmy.sh
| * cbb09c6 fix typo
| * a715138 ohmy.py
|/
* 3a2bd8f add message

The typo is fixed:

cat message.txt
git is awesome

Local and remote

Working with a local repo is fun, but adding a remote repo is even funnier.

pushpullresolvepush branchfetch branchtags

Push

Alice wants to clone our repo and make some changes.

Clone the remote repo:

git clone /tmp/remote.git /tmp/alice
Cloning into '/tmp/alice'...
done.

Normally you'd see a GitHub/GitLab/etc URL here, but our "remote" repo is on the same machine in /tmp/remote.git.

Set user name and email:

cd /tmp/alice
git config user.email alice@example.com
git config user.name "Alice Zakas"
ok

Make some changes and commit:

echo "Git is awesome!" > message.txt
git commit -am "edit from alice"
[main b9714f2] edit from alice
 1 file changed, 1 insertion(+), 1 deletion(-)

Push locally committed changes to the remote repo:

git push
ok

Pull

I want to pull Alice's changes to the local repo.

No commits from Alice yet:

git log --oneline
3a2bd8f add message

Pull the latest changes from the remote repo:

git pull
Updating 3a2bd8f..b9714f2
Fast-forward
 message.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

The local repo now contains commits from Alice:

git log --oneline
b9714f2 edit from alice
3a2bd8f add message

Resolve conflict

I have a local commit (not yet pushed to the remote) that conflicts with Alice's changes (already pushed to the remote), so I need to resolve it.

Pull the changes from the remote repo:

git pull
Auto-merging message.txt
CONFLICT (content): Merge conflict in message.txt
Automatic merge failed; fix conflicts and then commit the result.From /tmp/remote
   3a2bd8f..b9714f2  main       -> origin/main (exit status 1)

There is a conflict in message.txt! Let's show it:

cat message.txt
<<<<<<< HEAD
git is awesome
=======
Git is awesome!
>>>>>>> b9714f2c59c7dbd1205cf20e0a99939b7a686d97

I like Alice's version better, so let's choose it:

git checkout --theirs -- message.txt
# to choose our version, use --ours
ok

Add the resolved file to the staging area and complete the merge:

git add message.txt
git commit -m "merge alice"
[main cbb6112] merge alice

Push branch

Create the local ohmypy branch:

git branch ohmypy
git switch ohmypy
Switched to branch 'ohmypy'

Add and commit a file:

echo "print('git is awesome')" > ohmy.py
git add ohmy.py
git commit -m "ohmy.py"
[ohmypy c64073e] ohmy.py
 1 file changed, 1 insertion(+)
 create mode 100644 ohmy.py

Push the local branch to remote:

git push -u origin ohmypy
branch 'ohmypy' set up to track 'origin/ohmypy'.

Show both local and remote branches:

git branch --all
  main
* ohmypy
  remotes/origin/main
  remotes/origin/ohmypy

Fetch branch

Fetch remote branches:

git fetch
ok

Remote has the ohmypy branch, but it's not checked out locally:

git branch
* main

Checkout the ohmypy branch:

git switch ohmypy
# or: git checkout ohmypy
branch 'ohmypy' set up to track 'origin/ohmypy'.

Show branches:

git branch
  main
* ohmypy

Tags

Create a tag for the latest commit:

git tag 0.1.0 HEAD
ok

Create a tag for the nth-before-last commit:

git tag 0.1.0-alpha HEAD~1
ok

You can use the commit hash instead of HEAD~n.

Show tags:

git tag -l
0.1.0
0.1.0-alpha

Show compact log with tags:

git log --decorate --oneline
ecdeb79 (HEAD -> main, tag: 0.1.0, origin/main) edit message
3a2bd8f (tag: 0.1.0-alpha) add message

Delete tag:

git tag -d 0.1.0-alpha
Deleted tag '0.1.0-alpha' (was 3a2bd8f)

Push tags to the remote:

git push --tags
ok

Undo

"Damn, how do I undo what I just did?" — is the eternal Git question. Let's answer it once and for all.

amend commitundo uncommittedundo localundo remoterewind historystash changes

Amend commit

Edit a file and commit:

echo "git is awesome" > message.txt
git commit -am "edit nessage"
[main c0206a0] edit nessage
 1 file changed, 1 insertion(+), 1 deletion(-)

Show commits:

git log --oneline
c0206a0 edit nessage
3a2bd8f add message

I made a typo, so I want to change the commit message:

git commit --amend -m "edit message"
[main ecdeb79] edit message
 Date: Thu Mar 14 15:00:00 2024 +0000
 1 file changed, 1 insertion(+), 1 deletion(-)

Git has replaced the last commit:

git log --oneline
ecdeb79 edit message
3a2bd8f add message

To change the commit message for one of the last n commits, use git rebase -i HEAD~n (interactive) and follow the instructions on the screen.

Amend only works if the commit has not yet been pushed to the remote repo!

Undo uncommitted changes

Edit the previously committed file and add the changes to the staging area:

echo "git is awesome" > message.txt
git add message.txt
ok

Show the working tree status:

git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   message.txt

Remove the changes from the staging area:

git restore --staged message.txt
ok

The local file is still modified, but it's not staged for commit:

git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   message.txt

no changes added to commit (use "git add" and/or "git commit -a")

Now let's discard the changes altogether:

git restore message.txt
# or: git checkout message.txt
ok

Show the file contents:

cat message.txt
git is awesom

The changes are gone.

Undo local commit

I changed my mind about the last commit and I want to undo it.

Show commits:

git log --oneline
ecdeb79 edit message
3a2bd8f add message

Undo the last one:

git reset --soft HEAD~
ok

The commit is gone:

git log --oneline
3a2bd8f add message

But the changes are still in the staged area:

git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   message.txt

To remove both the commit and the local changes, use --hard instead of --soft:

git reset --hard HEAD~
git status
HEAD is now at 3a2bd8f add message
On branch main
nothing to commit, working tree clean

Reset only works if the commit has not yet been pushed to the remote repo!

Undo remote commit

I changed my mind about the last commit and I want to undo it, but the commit is already pushed to the remote repo.

Show commits:

git log --oneline
ecdeb79 edit message
3a2bd8f add message

Undo the last one:

git revert HEAD --no-edit
[main 9ffa044] Revert "edit message"
 Date: Thu Mar 14 15:00:00 2024 +0000
 1 file changed, 1 insertion(+), 1 deletion(-)

You can revert to nth-before-last commit by using HEAD~n or use the specific commit hash instead of HEAD~n.

Since the commit has already been pushed, git can't delete it. Instead it creates an "undo" commit:

git log --oneline
9ffa044 Revert "edit message"
ecdeb79 edit message
3a2bd8f add message

Push the "undo" commit to the remote:

git push
ok

Rewind history

Show commits:

git log --oneline --graph
*   7d5ac4f Merge branch 'ohmypy'
|\
| * a715138 ohmy.py
* | ecdeb79 edit message
|/
* 3a2bd8f add message

Show all repo states in reverse chronological order:

git reflog
7d5ac4f HEAD@{0}: merge ohmypy: Merge made by the 'ort' strategy.
ecdeb79 HEAD@{1}: commit: edit message
3a2bd8f HEAD@{2}: checkout: moving from ohmypy to main
a715138 HEAD@{3}: commit: ohmy.py
3a2bd8f HEAD@{4}: checkout: moving from main to ohmypy
3a2bd8f HEAD@{5}: commit (initial): add message

Suppose I want to go back to HEAD@{3}:

git reset --hard HEAD@{3}
HEAD is now at a715138 ohmy.py

This resets the entire repo and the working tree to the moment of HEAD@{3}:

git log --oneline --graph
* a715138 ohmy.py
* 3a2bd8f add message

Stash changes

Edit the previously committed file:

echo "git is awesome" > message.txt
git add message.txt
ok

Let's say we need to switch to another branch, but we don't want to commit the changes yet.

Stash the local changes (i.e. save them in "drafts"):

git stash
Saved working directory and index state WIP on main: 3a2bd8f add message

Stash is a stack, so you can push multiple changes onto it:

echo "Git is awesome!" > message.txt
git stash
Saved working directory and index state WIP on main: 3a2bd8f add message

Show stash contents:

git stash list
stash@{0}: WIP on main: 3a2bd8f add message
stash@{1}: WIP on main: 3a2bd8f add message

Now we can switch to another branch and do something:

...(omitted for brevity)...

Switch back to the main branch and re-apply the latest changes from the stash:

git switch main
git stash pop
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   message.txt

no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (96af1a51462f29d7b947a7563938847efd5d5aeb)

pop returns changes from the stack in "last in, first out" order.

Clear the stash:

git stash clear
ok

Advanced stuff

While git gurus probably know all about these features, most developers have never heard of them. Let's fix that.

log summaryworktreebisectpartial checkoutpartial clone

Log summary

Since the 1.0 release (tag v1.0), we have 6 commits from 3 contributors:

git log --pretty=format:'%h %an %s %d'
7611979 bob ohmy.lua  (HEAD -> main, origin/main)
ef4f23e bob ohmy.sh
3d8f700 bob ohmy.py
c61962c alice no debates
2ab82f6 alice go is great
ecdeb79 sandbox edit message
3a2bd8f sandbox add message  (tag: v1.0)

Note the --pretty option which customizes the log fields:

%h   commit hash
%an  author
%s   message
%d   decoration (e.g. branch name or tag)

List the commits grouped by contributors:

git shortlog v1.0..
alice (2):
      go is great
      no debates

bob (3):
      ohmy.py
      ohmy.sh
      ohmy.lua

sandbox (1):
      edit message

A couple of useful options:

  • -n (--numbered) sorts the output by descending number of commits per contributor.
  • -s (--summary) omits commit descriptions and prints only counts.

List contributors along with the number of commits they have authored:

git shortlog -ns v1.0..
3	bob
2	alice
1	sandbox

Worktree

I'm in the middle of something important in the ohmypy branch:

echo "-- pwd --"
pwd
echo "-- branches --"
git branch
echo "-- status --"
git status
-- pwd --
/tmp/repo
-- branches --
  main
* ohmypy
-- status --
On branch ohmypy
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   ohmy.py

Suddenly I need to fix an annoying typo in the main branch. I can stash the local changes with git stash, or I can checkout multiple branches at the same time with git worktree.

Checkout the main branch into /tmp/hotfix:

git worktree add -b hotfix /tmp/hotfix main
HEAD is now at 3a2bd8f add message

Fix the typo and commit:

cd /tmp/hotfix
echo "git is awesome" > message.txt
git commit -am "fix typo"
[hotfix c3485cd] fix typo
 1 file changed, 1 insertion(+), 1 deletion(-)

Push to remote main:

git push --set-upstream origin main
branch 'main' set up to track 'origin/main'.

Now I can return to /tmp/repo and continue working on the ohmypy branch.

Bisect

I have 5 poorly named commits:

git log --oneline
2f568eb main.sh
31ed915 main.sh
f8b2baf main.sh
5e0cf35 main.sh
8f0f1e4 main.sh
51c34ff test.sh

And a failing test:

sh test.sh
FAIL (exit status 1)

I will use the bisection algorithm to find the commit that introduced the bug:

git bisect start
status: waiting for both good and bad commits

The current state is obviously buggy, but I'm pretty sure the first "main.sh" commit was good:

git bisect bad HEAD
git bisect good HEAD~4
status: waiting for good commit(s), bad commit known
Bisecting: 1 revision left to test after this (roughly 1 step)
[f8b2baf93964ec9e0daa87c9ed262bbf5cf66b67] main.sh

Git has automatically checked out the middle commit. Let's test it:

sh test.sh
PASS

The test passes. Mark the commit as good:

git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[31ed915660c42a00aa30b51520a16e3f48201299] main.sh

Git has automatically checked out the middle commit. Let's test it:

sh test.sh
FAIL (exit status 1)

The test fails. Show the commit details:

git show
commit 31ed915660c42a00aa30b51520a16e3f48201299
Author: sandbox <sandbox@example.com>
Date:   Thu Mar 14 15:00:00 2024 +0000

    main.sh

diff --git a/main.sh b/main.sh
index 7f8f78c..ce533e0 100644
--- a/main.sh
+++ b/main.sh
@@ -1,2 +1,2 @@
 # sum two numbers
-echo $(expr $1 + $2)
+echo $(expr $1 - $2)

This is the commit that introduced the bug (subtraction instead of addition)!

Partial checkout

The remote repo looks like this:

.
├── go.mod
├── main.go
├── products
│   └── products.go
└── users
    └── users.go

We will selectively checkout only some of the directories.

Clone the repo, but do not checkout the working tree:

git clone --no-checkout /tmp/remote.git /tmp/repo
cd /tmp/repo
Cloning into '/tmp/repo'...
done.

Tell git to checkout only the root and users directories:

git sparse-checkout init --cone
git sparse-checkout set users
ok

Checkout the directories:

git checkout main
Your branch is up to date with 'origin/main'.

Only the root and users directories are checked out:

tree
.
├── go.mod
├── main.go
└── users
    └── users.go

1 directories, 3 files

The products directory was not checked out.

Partial clone

The partial checkout approach we tried earlier still clones the entire repo. So if the repo itself is huge (which is often the case if it has a long history or large binary files), the clone step can be slow and traffic-intensive.

To reduce the amount of data downloaded during cloning, use partial clone with one of the following commands:

# Download commits and trees (directories),
# but not blobs (file contents):
git clone --filter=blob:none file:///tmp/remote.git

# Download commits only, without trees (directories)
# or blobs (file contents):
git clone --filter=tree:0 file:///tmp/remote.git

In both cases, git will lazily fetch the missing data later when needed.

Note that for this to work, the remote server should support partial cloning (GitHub does).

Final thoughts

We've covered important Git operations, from basic editing to branching and merging, remote syncing, undoing changes, and performing some moderate magic.

To learn more about Git, check out the reference manual and the Pro Git book by Scott Chacon and Ben Straub.

And may Git be with you!

──

P.S. Interactive examples in this post are powered by codapi — an open source tool I'm building. Use it to embed live code snippets into your product docs, online course or blog.

]]>
Banned for self-promohttps://antonz.org/self-promo/Sun, 10 Mar 2024 11:30:00 +0000https://antonz.org/self-promo/Policies that discourage self-promotion are harmful.I just learned that a good tech author was banned from Lobsters1 (a Hacker News2 like platform) for "self-promotion". I'm so angry.

This may come as a surprise, but as a blogger, it's very difficult to reach your audience and to be heard, no matter how good or deep or useful your articles are. Unless you post "hot takes" or have a huge marketing budget, that is.

One of the few options is to post to HN/Lobsters and hope for a (slim) chance of getting noticed. And yes, authors often have to submit their own stuff because readers rarely do (again, no matter how good the article is).

And yet the platform gatekeepers do everything in their power to keep authors from posting. They enforce these ridiculous "anti-self-promotion" policies (where's the promotion, by the way? It's not like authors are trying to sell anything, they're just trying to be heard).

Platforms like HN and Lobsters already have powerful collaboration-based mechanisms to protect them from bad content: voting and flagging. Bad articles don't get upvoted, so no one sees them. Offensive content and spam are flagged and removed. The community is perfectly capable of deciding what content it wants to see on the platform. And yet moderators choose to abuse their power by banning authors for "self-promotion".

Despite what the platform owners say, they don't give a damn about "quality content". Instead, they welcome all kinds of hype (which is what people generally submit), corporate blogs with fat budgets, and a small number of tech celebrities.

Well, you know what? Fuck you, Lobsters. I'm going to submit every single article by this author.

Eat that.

]]>
I'm a programmer and I'm stupidhttps://antonz.org/stupid/Wed, 06 Mar 2024 06:30:00 +0000https://antonz.org/stupid/But it kind of works.I've been writing code for money for 15 years. I've tried other roles — product management, analytics, testing — but they didn't stick. And over the years, I've learned that I'm pretty dumb. Unfortunately.

I haven't been diagnosed with any specific medical condition, but my mental capacity is very limited. I find even easier Leetcode problems challenging. Reading about a basic consensus algorithm makes my head explode. I can't really follow complex dependencies in a code base. I can't learn a fancy language like Rust (I tried, but honestly, it's too much). I hate microservices and modern frontends because there are so many moving parts, I can't keep track of them all.

So what do I do about it?

I use the simplest mainstream language available (Go) and very basic Python. I write simple (though sometimes verbose) code that is easy to understand and maintain. I avoid deep abstractions and always choose composition over inheritance or mixins. I only use generics when absolutely necessary. I prefer flat data structures whenever possible.

I introduce as few external dependencies as possible (ideally zero1). I design modules with clear APIs (not in a Robert Martin's definition of "clear"), but almost never extract them into microservices. I use JSON-over-HTTP APIs, but never GraphQL. I took the time to learn SQL and use it a lot2. I apply basic resilience patterns like timeouts, circuit breakers, and backpressure.

I try to use as few software components as possible. Ideally, just the application itself, SQLite or PostgreSQL for data storage, and Docker with a sprinkle of shell for deployments. Nginx/HAProxy as needed. No API gateways, no sharding, no distributed caches, no message queues, no NoSQL/NewSQL/Graph/whatever databases, no service discovery, no federation, no cloud native, no FAANG-level best practices.

I draw dependency graphs and sequence diagrams to understand legacy code. I write comments to remind future me why a certain function does what it does, or why a certain if branch is necessary. I write documentation, trying to make it concise and readable. I write examples, lots of them. Sometimes even interactive3.

The software I build seems to work okay. It won't impress a Google engineer, that's for sure. But it serves its users and the business reasonably well.

So being stupid kind of works for me.

]]>
Try X in Y minuteshttps://antonz.org/try-x-in-y-minutes/Tue, 20 Feb 2024 10:00:00 +0000https://antonz.org/try-x-in-y-minutes/Get a quick taste of your next $thing by studying interactive examples in the browser.My favorite way to learn is through practice. In fact, it's the only type of learning that helps me internalize the material. That's why I'm a big fan of Learn X in Y minutes, which teaches programming languages by showing the code (with comments).

The only thing that bothered me was the setup step. Sometimes I'd prefer to try something new quickly, without downloading and installing a bunch of stuff. So an ideal solution for me would be to try the code in the browser without any setup. Like this:

# Define a function named `greet`.
# Arguments are passed implicitly and
# are available in variables $1, $2, ...
greet() {
    echo "Hello, $1!"
}

# Call the function with a single argument:
greet "World"
Hello, World!

Fortunately, I had all the tools I needed to implement this vision. So I did it.

Try X in Y minutes is an interactive version of Learn X in Y minutes, where you can run (and modify) the code in your browser. And not just programming languages — databases, libraries and CLI tools too.

Try C in Y minutes
Read the explanations, play with the code.

Of course, making all the original guides interactive is a huge undertaking. I've prepared a few, but that's just a start. There is no way that I could do it all alone in any reasonable amount of time. So I invite you to join! Try X in Y minutes is completely open-source, and your contributions are very welcome.

Here are the currently available guides:

Let's Try X in Y minutes!

]]>
Code playground visualizationshttps://antonz.org/playground-visualizations/Fri, 09 Feb 2024 11:00:00 +0000https://antonz.org/playground-visualizations/Presenting playground output in a visual manner.I'm a big fan of interactive code snippets in all kinds of technical writing, from product docs to online courses to blog posts. Like this one:

def greet(name):
    print(f"Hello, {name}!")

greet("World")

And while having plain text output is fine in many cases, sometimes it's better to present results in a more visual way. Let me show you some examples, and then move on to the implementation.

Plotting with Python

Everyone knows Python's matplotlib and its pyplot module, popularized by Jupyter notebooks. But wouldn't it be nice to visualize the data in any article, not just notebooks?

Using some random data:

import numpy as np
import matplotlib.pyplot as plt

data = {
    "a": np.arange(50),
    "c": np.random.randint(0, 50, 50),
    "d": np.random.randn(50)
}
data["b"] = data["a"] + 10 * np.random.randn(50)
data["d"] = np.abs(data["d"]) * 100

plt.scatter("a", "b", c="c", s="d", data=data)
plt.xlabel("alice")
plt.ylabel("bob")
plt.title("Alice vs. Bob")
plt.tight_layout()
plt.show()

Or a dataset (employees.json):

import json
import seaborn as sns

employees = json.load(open("employees.json"))
salaries = {"hr": 0, "it": 0, "sales": 0}
for empl in employees:
    salaries[empl["department"]] += empl["salary"]

fig, ax = plt.subplots(figsize=(3.5, 3.5))
ax.pie(
    salaries.values(), labels=salaries.keys(),
    colors=sns.color_palette('Set2'),
    autopct='%1d%%',
)
plt.title("Salary by department")
plt.tight_layout()
plt.show()

Sumulations with JavaScript

We can also use JavaScript to display data (here using Chart.js):

const resp = await fetch("employees.json");
const employees = await resp.json();
const names = employees.map((empl) => empl.name);
const salaries = employees.map((empl) => empl.salary);

const data = {
    labels: names,
    datasets: [
        {
            label: "Salaries",
            data: salaries,
            borderWidth: 1,
        },
    ],
};

const el = document.createElement("canvas");
new Chart(el, { type: "bar", data: data });
return el;

But we've already tried that with Python. I'm thinking of something else. How about a dynamic simulation? A bouncing ball is a classic example, so let's go for it:

const canvas = document.createElement("canvas");
canvas.width = 340;
canvas.style.border = "1px solid #ccc";

// Ball class definition hidden for brevity
const ball = new Ball({
    canvas: canvas,
    x: canvas.width / 2,
    y: canvas.height / 2,
    dx: 2,
    dy: 2,
});
ball.bounce();

return canvas;

Try changing dx or dy to increase speed and re-run the snippet (by the way, you can use Ctrl+Enter or Cmd+Enter to run the code while editing).

Full code

Drawing with Go

Who said we should limit ourselves to "conventional" dataviz languages like Python or JavaScript? Go can be fun too!

// Circle definition hidden for brevity
circles := []Circle{
    {100, "#babeee"},
    {75, "#7573b6"},
    {50, "#70acb4"},
    {25, "#90d2c3"},
    {5, "#ecf4be"},
}

// SVG definition hidden for brevity
svg := SVG{200, 200}

svg.printHeader()
for _, c := range circles {
    svg.printCircle(c, 100, 100)
}
svg.printFooter()

Try changing the radius and color in circles and re-run the snippet.

Full code

Charting with SQL

How about drawing a bar chart for our employees dataset right in SQL? A little crazy, maybe, but doable:

with data as (
  select
    row_number() over w as rnum,
    name, salary
  from employees
  window w as (order by id)
)
-- chart rendering query part hidden for brevity

Try changing the order by clause from id to salary desc or name and re-run the snippet.

Full query

Combining playgrounds

Suppose we have an SQL playground that selects employees according to some criteria:

select id, name, salary
from employees
where department = 'it';

What I'd like to do is to take the query results and feed them into the Python playground:

import json
import seaborn as sns

employees = json.load(open("employees.json"))
names = [empl["name"] for empl in employees]
salaries = [empl["salary"] for empl in employees]

plt.figure(figsize=(5, 3))
plt.barh(names, salaries)
plt.title("Employee salaries")
plt.tight_layout()
plt.show()

Voilà! Try changing the department in the SQL playground to hr or sales, re-run the SQL playground, then re-run the Python playground, and see how the bar graph reflects the SQL results.

Implementation

Did I already told you I'm a big fan of interactive code snippets in all kinds of technical writing? I even built an open source tool called Codapi for embedding such snippets.

Suppose you have a static code example in your documentation:

```python
def greet(name):
    print(f"Hello, {name}!")

greet("World")
```

To make it interactive, add a codapi-snippet widget directly below:

<codapi-snippet sandbox="python" editor="basic">
</codapi-snippet>

The widget attaches itself to the preceding code block, allowing the reader to run and edit the code:

def greet(name):
    print(f"Hello, {name}!")

greet("World")

The code runs on the sandbox server or (for some playgrounds) directly in the browser using browser APIs or WebAssembly.

Previously, the widget only supported text output. Now it supports different output modes: text, SVG images, custom HTML, and even interactive DOM.

SVG images

To display an image, set the output-mode attribute to svg and print the image as an SVG string from your code snippet.

Code snippet (Python + matplotlib):

import io
import numpy as np
import matplotlib.pyplot as plt

data = {"a": np.arange(50), "c": np.random.randint(0, 50, 50), "d": np.random.randn(50)}
data["b"] = data["a"] + 10 * np.random.randn(50)
data["d"] = np.abs(data["d"]) * 100

plt.scatter("a", "b", c="c", s="d", data=data)
plt.xlabel("entry a")
plt.ylabel("entry b")
plt.show()

stream = io.StringIO()
plt.savefig(stream, format="svg")
print(stream.getvalue())

Widget:

<codapi-snippet sandbox="python" output-mode="svg">
</codapi-snippet>

Output when run:

SVG chart

HTML fragments

To display HTML content, set the output-mode attribute to html and print an HTML string from your code snippet.

Code snippet (Python):

html = """<blockquote>
    I am <em>so</em> <strong>excited</strong>!
</blockquote>"""
print(html)

Widget:

<codapi-snippet sandbox="python" output-mode="html">
</codapi-snippet>

Output when run:

I am so excited!

Interactive DOM

When using the JavaScript playground (engine=browser, sandbox=javascript), you can render a DOM node as an output. To do so, set the output-mode attribute to dom and return a DOM node from your code snippet.

Code snippet (JavaScript + Chart.js):

const el = document.createElement("canvas");

new Chart(el, {
    type: "bar",
    data: {
        labels: ["Red", "Blue", "Yellow", "Green", "Purple", "Orange"],
        datasets: [
            {
                label: "# of Votes",
                data: [12, 19, 3, 5, 2, 3],
                borderWidth: 1,
            },
        ],
    },
});

return el;

Widget:

<codapi-snippet engine="browser" sandbox="javascript" output-mode="dom">
</codapi-snippet>

Output when run:

Summary

Interactive code snippets are a great addition to tutorials, how-to guides, and reference documentation. Playgrounds with visual output are even better: they help readers to make sense of the data, preview an interface, or see a process simulation.

With the latest Codapi release, you can easily create visual interactive examples using the output-mode attribute.

If you are interested — try using Codapi in your product docs, blog or online course. You can start with the widget and add the sandbox server as needed (or use the cloud instance).

Let's make the world of technical documentation a little better!

]]>
SQLite 3.45: Interactive release noteshttps://antonz.org/sqlite-3-45/Tue, 09 Jan 2024 00:00:00 +0000https://antonz.org/sqlite-3-45/JSONB has landed.Based on the SQLite 3.45 release notes from the SQLite team, with interactive examples added.

SQLite 3.45 is out, and it has some decent features, so I think now it's a good time to show them off. Let's get started!

JSONB format

Starting with this release, SQLite allows its internal "parse tree" representation of JSON to be stored on disk, as a BLOB, in a format that called "JSONB". By storing SQLite's internal binary representation of JSON directly in the database, applications can bypass the overhead of parsing and rendering JSON when reading and updating JSON values. The internal JSONB format is also uses slightly less disk space then text JSON.

Any SQL function parameter that accepts text JSON as an input also accepts a BLOB in the JSONB format. The function operates the same in either case, except that it runs faster when the input is JSONB, since it does not need to run the JSON parser:

select json_extract(json('{"name":"alice"}'), '$.name');
select json_extract(jsonb('{"name":"alice"}'), '$.name');

select json('{"name":"alice"}') -> '$.name';
select jsonb('{"name":"alice"}') -> '$.name';

select json('[11,12,13]') -> 1;
select jsonb('[11,12,13]') -> 1;
alice
alice
"alice"
"alice"
12
12

JSONB is a binary representation of JSON used by SQLite and is intended for internal use by SQLite only. Applications should not use JSONB outside of SQLite nor try to reverse-engineer the JSONB format.

The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but are not binary compatible.

  • The PostgreSQL JSONB format claims to offer O(1) lookup of elements in objects and arrays. SQLite's JSONB format makes no such claim.
  • SQLite's JSONB has O(N) time complexity for most operations in SQLite, just like text JSON.

The advantage of JSONB in SQLite is that it is smaller and faster than text JSON — potentially several times faster. There is space in the on-disk JSONB format to add enhancements and future versions of SQLite might include options to provide O(1) lookup of elements in JSONB, but no such capability is currently available.

JSONB functions

Most SQL functions that return JSON text have a corresponding function that returns the equivalent JSONB. The functions that return JSON in the text format begin with json_ and functions that return the binary JSONB format begin with jsonb_.

The jsonb, jsonb_array and jsonb_object return the binary JSONB representation:

select typeof(jsonb('{"name":"alice"}'));
select jsonb('{"name":"alice"}') -> '$';

select typeof(jsonb_array(11,12,13));
select jsonb(jsonb_array(11,12,13)) -> '$';

select typeof(jsonb_object('name', 'alice'));
select jsonb(jsonb_object('name', 'alice')) -> '$';
blob
{"name":"alice"}
blob
[11,12,13]
blob
{"name":"alice"}

When extracting text, numeric, null, or boolean values jsonb_extract works exactly the same as json_extract. When extracting an array or object — returns the value in JSONB format:

select jsonb_extract(jsonb('{"name":"alice"}'), '$.name');
select jsonb_extract(jsonb('[11,12,13]'), '$[1]');

select typeof(
  jsonb_extract(jsonb('{"vals":[11,12,13]}'), '$.vals')
);
alice
12
blob

The -> operator returns JSON always returns the RFC 8565 text representation of that JSON, not JSONB:

select json('{"vals":[11,12,13]}') -> '$.vals';
select jsonb('{"vals":[11,12,13]}') -> '$.vals';
[11,12,13]
[11,12,13]

The jsonb_insert, jsonb_replace, and jsonb_set functions work just like their json_ counterparts, except that they return the result in binary JSONB format:

select jsonb_insert('[11,12]', '$[#]', 13) -> 2;
select jsonb_replace('{"name":"alice"}', '$.name', 'bob') -> '$.name';
select jsonb_set('{"name":"alice"}', '$.city', 'berlin') -> '$.city';
13
"bob"
"berlin"

The same goes for the jsonb_patch and jsonb_remove functions:

select jsonb_patch(
  '{"name":"alice"}',
  '{"name":"bob", "city":"berlin"}'
) -> '$.city';

select jsonb_remove('[11,12,13]', '$[1]') -> 1;
"berlin"
13

And the jsonb_group_array and jsonb_group_object functions:

select
  department,
  json(jsonb_group_array(name)) as names
from employees
group by department;
┌────────────┬──────────────────────────────────┐
│ department │              names               │
├────────────┼──────────────────────────────────┤
│ hr         │ ["Diane","Bob"]                  │
│ it         │ ["Emma","Grace","Henry","Irene"] │
│ sales      │ ["Cindy","Dave"]                 │
└────────────┴──────────────────────────────────┘

There are no jsonb_ counterparts for the following functions:

json_array_length
json_error_position
json_type
json_valid
json_quote
json_each
json_tree

json_valid function

The new json_valid(X,Y) function return 1 if the argument X is well-formed JSON, or returns 0 if X is not well-formed.

The optional Y parameter is an integer bitmask that defines what is meant by "well-formed". The following bits of Y are currently defined:

  • 0x01 → The input is text that strictly complies with canonical RFC-8259 JSON, without any extensions.
  • 0x02 → The input is text that is JSON with JSON5 extensions.
  • 0x04 → The input is a BLOB that superficially appears to be JSONB.
  • 0x08 → The input is a BLOB that strictly conforms to the internal JSONB format.

The most useful Y value is 6 (0x02 | 0x04), which checks if X is JSON5 text or JSONB.

If Y is omitted, it defaults to 1 (X is strictly conforming RFC-8259 JSON text without any extensions). This makes the one-argument version of json_valid compatible with older versions of SQLite, prior to the addition of support for JSON5 and JSONB.

Some examples:

.mode line
select json_valid('{"x":42}');
select json_valid('{x:42}');
select json_valid('{x:42}', 6);
select json_valid(null);
json_valid('{"x":42}') = 1
json_valid('{x:42}') = 0
json_valid('{x:42}', 6) = 1
json_valid(null) =

Performance considerations

Most JSON functions do their internal processing using JSONB. So if the input is text, they first translate the input text to JSONB. If the input is already in the JSONB format, no translation is needed, so this step can be skipped, and performance is better.

For this reason, when an argument to a JSON function is supplied by another JSON function, it is usually more efficient to use the jsonb_ variant of the function used as the argument.

... json_insert(A, '$.b', json(C)) ...    ← Less efficient
... json_insert(A, '$.b', jsonb(C)) ...   ← More efficient

The aggregate JSON SQL functions are an exception to this rule. These functions all do their processing using text instead of JSONB. So for the aggregate JSON SQL functions, it is more efficient to pass arguments using json_ functions rather than jsonb_ functions.

... json_group_array(json(A))) ...    ← More efficient
... json_group_array(jsonb(A))) ...   ← Less efficient

And a few other things

See the full release notes for details.

──

Interactive examples in this post are powered by codapi — an open source tool I'm building. Use it to embed live code snippets into your product docs, online course or blog.

]]>
In-browser code playgroundshttps://antonz.org/in-browser-code-playgrounds/Sat, 06 Jan 2024 10:00:00 +0000https://antonz.org/in-browser-code-playgrounds/Embed and run interactive code snippets entirely in your browser, no server required.I'm a big fan of interactive code snippets in all kinds of technical writing, from product docs to online courses to blog posts. Like this one:

def greet(name):
    print(f"Hello, {name}!")

greet("World")

In fact, I even built an open source tool called Codapi1 for embedding such snippets.

Typically, a code playground consists of a client-side widget and a server-side part that executes the code and returns the result:

  browser
┌───────────────────────────────┐
│ def greet(name):              │
│   print(f"Hello, {name}!")    │
│                               │
│ greet("World")                │
└───────────────────────────────┘
  Run ►
    ↓
  server
┌───────────────────────────────┐
│ docker run codapi/python      │
│ python main.py                │
└───────────────────────────────┘
    ↓
  browser
┌───────────────────────────────┐
│ Hello, World!                 │
└───────────────────────────────┘

Personally, I'm quite happy with this setup. But often people prefer not to depend on a server and run the code entirely in the browser. So I decided to look into it and implemented embeddable in-browser code playgrounds for JavaScript, Python, PHP, Ruby, Lua, and SQLite.

Running language runtimes in the browser

The modern way to run arbitrary programs in the browser seems to be WebAssembly System Interface (WASI2) — an executable binary format based on WebAssembly. With WASI, you compile a program (originally written in C, Rust, Go, or some other language) into a WASI binary and then run it with a WASI runtime (there are a number of these runtimes from different vendors).

Just as we can compile an arbitrary program into WASI binary, we can take a language interpreter like Lua or CPython, compile it into WASI, and run it with the WASI runtime to execute Lua or Python code. In practice, however, it's not that easy, because WASI compilers do not (yet) implement all the features of traditional compilers like GCC.

Fortunately, VMWare Labs has already done the hard part and compiled PHP3, Python4 and Ruby5 into WASI. So all I had to do was publish the WASI binaries as NPM packages to make them available on the CDN. I've also compiled Lua6 and SQLite7 to WASI.

There is also Kohei Tokunaga's container2wasm initiative8, which converts arbitrary Docker images into WASI binaries. It looks promising, but it generates 100+ MB binaries for even the smallest Alpine-based images. And since downloading hundreds of megabytes just to read an interactive article is probably not the best idea, this approach is not very practical (yet).

Language runtimes compiled into WASI are one part of the equation. The other one is the WASI runtime (the thing that runs the binaries) capable of working in the browser. I chose the Runno9 runtime by Ben Taylor because it's simple and lightweight (27 KB).

The last step was to modify the JavaScript widget10 to support pluggable engines (WASI is one of them).

And that was it!

Showcase

Here are some interactive code snippets implemented as described above. Note that the language runtime is downloaded when you click the Run button, so the first run may take some time. Subsequent runs are almost instantaneous.

Python

Executes the code using the Python 3.12 WASI runtime (26.3 MB).

def greet(name):
    print(f"Hello, {name}!")

greet("World")

PHP

Executes the code using the PHP 8.2 WASI runtime (13.2 MB).

function greet($name) {
    echo "Hello, $name!";
}

greet("World");

Ruby

Executes the code using the Ruby 3.2 WASI runtime (24.5 MB).

def greet(name)
  puts "Hello, #{name}!"
end

greet("World")

Lua

Executes the code using the Lua 5.4 WASI runtime (330 KB).

function greet(name)
  print("Hello, " .. name .. "!")
end

greet("World")

JavaScript

Executes the code using the AsyncFunction11.

const greet = (name) => {
    console.log(`Hello, ${name}!`);
};

greet("World");

Fetch

Executes the code using the Fetch API12.

POST https://httpbingo.org/dump/request
content-type: application/json

{ "message": "hello" }

SQLite

Executes the code using the SQLite 3.44 WASI runtime (2.1 MB).

select id, name, department
from employees
order by id limit 3;

Advanced features

Because the WASI runtime plugs into the existing architecture, WASI-powered code snippets support advanced Codapi features such as templates or code cells.

Templates13 allow you to hide some code behind the scenes and show only the relevant part. For example, in the SQLite example above, the employees table is created as part of the template, so the snippet can take it for granted:

select id, name, department
from employees
order by id limit 3;

Code cells14 allow you to make code snippets depend on each other. For example, the first snippet defines the wrap function, while the second snippet uses it:

import textwrap

def wrap(text, width=20):
    """Wraps the text so every line is at most width characters long."""
    return textwrap.fill(text, width)
text = (
    "Python is a programming language that lets you work quickly "
    "and integrate systems more effectively."
)
print(wrap(text))

Usage

To use native browser playgrounds (e.g. JavaScript or Fetch), include the snippet.js script and add the codapi-snippet element next to the static code example. Use the browser engine:

<pre>console.log("hello")</pre>

<codapi-snippet engine="browser" sandbox="javascript" editor="basic">
</codapi-snippet>

<script src="https://unpkg.com/@antonz/codapi@0.12.0/dist/snippet.js"></script>

To use WASI-powered playgrounds (e.g. Python or SQLite), include two additional scripts and use the wasi engine:

<pre>print("hello")</pre>

<codapi-snippet engine="wasi" sandbox="python" editor="basic"></codapi-snippet>

<script src="https://unpkg.com/@antonz/runno@0.6.1/dist/runno.js"></script>
<script src="https://unpkg.com/@antonz/codapi@0.12.0/dist/engine/wasi.js"></script>
<script src="https://unpkg.com/@antonz/codapi@0.12.0/dist/snippet.js"></script>

To switch from in-browser to server-side playgrounds (which can run virtually any software), remove the engine attribute:

<pre>
fn main() {
    println!("Hello, World!");
}
</pre>

<codapi-snippet sandbox="rust" editor="basic"></codapi-snippet>

See the documentation15 for details.

Summary

WASI-powered sandboxes allow code snippets to run completely in-browser, with no server involved. They may take some time and traffic to initialize the runtime, but after that they run almost instantly.

As implemented in Codapi, they fit nicely into the overall architecture, providing access to features like templates and code cells. You can also easily switch from a browser-side to a server-side execution model.

Give them a try!

PlaygroundsSnippet widgetAbout Codapi

]]>
Go 1.22: Interactive release noteshttps://antonz.org/go-1-22/Wed, 03 Jan 2024 17:00:00 +0000https://antonz.org/go-1-22/Isolated loop variables, range over integers, math/rand v2 and enhanced routing.Based on the Go 1.22 release notes from the Go team (3-Clause BSD License), with many interactive examples added.

Go 1.22 is out, and you can try many of its features without leaving your browser. Read on and see!

Loop variablesRange over integersNew math/randomHTTP routingSlicesMinor stdlib changesOther changesSummary

No more sharing of loop variables

Previously, the variables declared by a "for" loop were created once and updated by each iteration. This led to common mistakes such as the loop-goroutine one:

// go 1.21
values := []int{1, 2, 3, 4, 5}
for _, val := range values {
    go func() {
        fmt.Printf("%d ", val)
    }()
}
5 5 5 5 5

In Go 1.22, each iteration of the loop creates new variables, to avoid accidental sharing bugs:

// go 1.22
values := []int{1, 2, 3, 4, 5}
for _, val := range values {
    go func() {
        fmt.Printf("%d ", val)
    }()
}
5 1 2 3 4

The change is backwards compatible: the new for-loop semantics only applies if the package being compiled is from a module that declares Go 1.22 or later in go.mod.

Range over integers

"For" loops may now range over integers:

for i := range 10 {
    fmt.Print(10 - i, " ")
}
fmt.Println()
fmt.Println("go1.22 has lift-off!")
10 9 8 7 6 5 4 3 2 1
go1.22 has lift-off!

See the spec for details.

Go 1.22 includes a preview of a language change that the Go team is considering for a future version of Go: range-over-function iterators. Building with GOEXPERIMENT=rangefunc enables this feature.

New math/rand/v2 package

Go 1.22 includes the first "v2" package in the standard library, math/rand/v2. The changes compared to math/rand are detailed in proposal #61716. The Go team plans to include an API migration tool in a future release, likely Go 1.23.

The most important changes are:

No Read method

The Read method, deprecated in math/rand, was not carried forward for math/rand/v2 (it remains available in math/rand). The vast majority of calls to Read should use crypto/rand's Read instead:

package main

import (
    "crypto/rand"
    "fmt"
)

func main() {
    b := make([]byte, 5)
    _, err := rand.Read(b)
    if err != nil {
        panic(err)
    }
    fmt.Printf("5 random bytes: %v\n", b)
}
5 random bytes: [245 181 23 109 149]

Otherwise a custom Read can be constructed using the Uint64 method:

package main

import (
    "fmt"
    "math/rand/v2"
)

func Read(p []byte) (n int, err error) {
    for i := 0; i < len(p); {
        val := rand.Uint64()
        for j := 0; j < 8 && i < len(p); j++ {
            p[i] = byte(val & 0xff)
            val >>= 8
            i++
        }
    }
    return len(p), nil
}

func main() {
    b := make([]byte, 5)
    Read(b)
    fmt.Printf("5 random bytes: %v\n", b)
}
5 random bytes: [135 25 55 202 33]

Generic N-function

The new generic function N is like Int64N or Uint64N but works for any integer type:

{
    // random integer
    var max int = 100
    n := rand.N(max)
    fmt.Println("integer n =", n)
}

{
    // random unsigned integer
    var max uint = 100
    n := rand.N(max)
    fmt.Println("unsigned int n =", n)
}
integer n = 55
unsigned int n = 96

Works for durations too (since time.Duration is based on int64):

// random duration
max := 100*time.Millisecond
n := rand.N(max)
fmt.Println("duration n =", n)
duration n = 78.949532ms

Fixed naming

Top-level functions and methods from math/rand:

Intn  Int31  Int31n  Int63  Int64n

are spelled more idiomatically in math/rand/v2:

IntN  Int32  Int32N  Int64  Int64N
fmt.Println("IntN   =", rand.IntN(100))
fmt.Println("Int32  =", rand.Int32())
fmt.Println("Int32N =", rand.Int32N(100))
fmt.Println("Int64  =", rand.Int64())
fmt.Println("Int64N =", rand.Int64N(100))
IntN   = 48
Int32  = 925068909
Int32N = 11
Int64  = 4225327687323893784
Int64N = 73

There are also new top-level functions and methods:

UintN  Uint32  Uint32N  Uint64  Uint64N
fmt.Println("UintN   =", rand.UintN(100))
fmt.Println("Uint32  =", rand.Uint32())
fmt.Println("Uint32N =", rand.Uint32N(100))
fmt.Println("Uint64  =", rand.Uint64())
fmt.Println("Uint64N =", rand.Uint64N(100))
UintN   = 46
Uint32  = 2549858040
Uint32N = 97
Uint64  = 3964182289933687247
Uint64N = 9

And more

The global generator accessed by top-level functions is unconditionally randomly seeded. Because the API guarantees no fixed sequence of results, optimizations like per-thread random generator states are now possible.

Many methods now use faster algorithms that were not possible to adopt in math/rand because they changed the output streams.

The Mitchell & Reeds LFSR generator provided by math/rand's Source has been replaced by two more modern pseudo-random generator sources: ChaCha8 and PCG. ChaCha8 is a new, cryptographically strong random number generator roughly similar to PCG in efficiency.

ChaCha8 is the algorithm used for the top-level functions in math/rand/v2. As of Go 1.22, math/rand's top-level functions (when not explicitly seeded) and the Go runtime also use ChaCha8 for randomness.

The Source interface now has a single Uint64 method; there is no Source64 interface.

Enhanced routing patterns

HTTP routing in the standard library is now more expressive. The patterns used by net/http.ServeMux have been enhanced to accept methods and wildcards.

Registering a handler with a method, like POST /items/create, restricts invocations of the handler to requests with the given method. A pattern with a method takes precedence over a matching pattern without one:

mux.HandleFunc("POST /items/create", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "POST item created")
})

mux.HandleFunc("/items/create", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "item created")
})

{
    // uses POST route
    resp, _ := http.Post(server.URL+"/items/create", "text/plain", nil)
    body, _ := io.ReadAll(resp.Body)
    fmt.Println("POST /items/create:", string(body))
    resp.Body.Close()
}

{
    // uses generic route
    resp, _ := http.Get(server.URL+"/items/create")
    body, _ := io.ReadAll(resp.Body)
    fmt.Println("GET /items/create:", string(body))
    resp.Body.Close()
}
POST /items/create: POST item created
GET /items/create: item created

As a special case, registering a handler with GET also registers it with HEAD.

Wildcards in patterns, like /items/{id}, match segments of the URL path. The actual segment value may be accessed by calling the Request.PathValue method:

mux.HandleFunc("/items/{id}", func(w http.ResponseWriter, r *http.Request) {
    id := r.PathValue("id")
    fmt.Fprintf(w, "Item ID = %s", id)
})

req, _ := http.NewRequest("GET", server.URL+"/items/12345", nil)
resp, _ := http.DefaultClient.Do(req)
body, _ := io.ReadAll(resp.Body)
fmt.Println("GET /items/12345:", string(body))
resp.Body.Close()
GET /items/{id}: Item ID: 12345

A wildcard ending in ..., like /files/{path...}, must occur at the end of a pattern and matches all the remaining segments:

mux.HandleFunc("/files/{path...}", func(w http.ResponseWriter, r *http.Request) {
    path := r.PathValue("path")
    fmt.Fprintf(w, "File path = %s", path)
})

req, _ := http.NewRequest("GET", server.URL+"/files/a/b/c", nil)
resp, _ := http.DefaultClient.Do(req)
body, _ := io.ReadAll(resp.Body)
fmt.Println("GET /files/a/b/c:", string(body))
resp.Body.Close()
GET /files/{path...}: File path: a/b/c

A pattern that ends in / matches all paths that have it as a prefix, as always. To match the exact pattern including the trailing slash, end it with {$}, as in /exact/match/{$}:

mux.HandleFunc("/exact/match/{$}", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "exact match")
})

mux.HandleFunc("/exact/match/", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprint(w, "prefix match")
})

{
    // exact match
    req, _ := http.NewRequest("GET", server.URL+"/exact/match/", nil)
    resp, _ := http.DefaultClient.Do(req)
    body, _ := io.ReadAll(resp.Body)
    fmt.Println("GET /exact/match/:", string(body))
    resp.Body.Close()
}

{
    // prefix match
    req, _ := http.NewRequest("GET", server.URL+"/exact/match/123", nil)
    resp, _ := http.DefaultClient.Do(req)
    body, _ := io.ReadAll(resp.Body)
    fmt.Println("GET /exact/match/123:", string(body))
    resp.Body.Close()
}
GET /exact/match/: exact match
GET /exact/match/123: prefix match

If two patterns overlap in the requests that they match, then the more specific pattern takes precedence. If neither is more specific, the patterns conflict. This rule generalizes the original precedence rules and maintains the property that the order in which patterns are registered does not matter.

This change breaks backwards compatibility in small ways, some obvious — patterns with "{" and "}" behave differently — and some less so — treatment of escaped paths has been improved. The change is controlled by a GODEBUG field named httpmuxgo121. Set httpmuxgo121=1 to restore the old behavior.

Slices

The new function Concat concatenates multiple slices:

s1 := []int{1, 2}
s2 := []int{3, 4}
s3 := []int{5, 6}
res := slices.Concat(s1, s2, s3)
fmt.Println(res)
[1 2 3 4 5 6]

Functions that shrink the size of a slice (Delete, DeleteFunc, Compact, CompactFunc, and Replace) now zero the elements between the new length and the old length (see proposal #63393 for the reasoning).

Old behavior (note the src value after Delete):

// go 1.21
src := []int{11, 12, 13, 14}
// delete #1 and #2
mod := slices.Delete(src, 1, 3)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
src: [11 14 13 14]
mod: [11 14]

New behavior:

// go 1.22
src := []int{11, 12, 13, 14}
// delete #1 and #2
mod := slices.Delete(src, 1, 3)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
src: [11 14 0 0]
mod: [11 14]

Compact example:

src := []int{11, 12, 12, 12, 15}
mod := slices.Compact(src)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
src: [11 12 15 0 0]
mod: [11 12 15]

And Replace one:

src := []int{11, 12, 13, 14}
// replace #1 and #2 with 99
mod := slices.Replace(src, 1, 3, 99)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
src: [11 99 14 0]
mod: [11 99 14]

Insert now always panics if the argument i is out of range. Previously it did not panic in this situation if there were no elements to be inserted:

// go 1.21
src := []string{"alice", "bob", "cindy"}
// we are not actually inserting anything,
// so don't panic
mod := slices.Insert(src, 4)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
src: [alice bob cindy]
mod: [alice bob cindy]

But now it panics:

// go 1.22
src := []string{"alice", "bob", "cindy"}
// we are not actually inserting anything,
// but it panics anyway because 4 is out of range
mod := slices.Insert(src, 4)
fmt.Println("src:", src)
fmt.Println("mod:", mod)
panic: runtime error: slice bounds out of range [4:3]

Minor stdlib changes

Please refer to the full release notes for details:

Other changes

Tools

Go command:

  • Commands in workspaces can now use a vendor directory containing the dependencies of the workspace.
  • go get is no longer supported outside of a module in the legacy GOPATH mode
  • go mod init no longer attempts to import module requirements from configuration files for other vendoring tools (such as Gopkg.lock).
  • go test -cover now prints coverage summaries for covered packages that do not have their own test files.

Trace:

  • The trace tool's web UI has been gently refreshed as part of the work to support the new tracer, resolving several issues and improving the readability of various sub-pages.

Vet:

Runtime

The runtime now keeps type-based garbage collection metadata nearer to each heap object, improving the CPU performance (latency or throughput) of Go programs by 1–3%.

This change also reduces the memory overhead of the majority Go programs by approximately 1% by deduplicating redundant metadata.

Compiler

Profile-guided Optimization (PGO) builds can now devirtualize a higher proportion of calls than previously possible. Most programs from a representative set of Go programs now see between 2 and 14% improvement from enabling PGO.

Summary

Go 1.22 finally fixes an unfortunate design decision about loop variables that bit thousands of developers while they were learning the language. It also adds some nice syntactic sugar for iterating over integers, a new shiny package for working with random numbers, and long-awaited pattern-based HTTP routing. And a ton of other improvements, of course.

All in all, a great release!

──

P.S. Interactive examples in this post are powered by codapi — an open source tool I'm building. Use it to embed live code snippets into your product docs, online course or blog.

]]>
Interactive code cellshttps://antonz.org/code-cells/Sat, 16 Dec 2023 09:00:00 +0000https://antonz.org/code-cells/Like Jupyter notebooks, but for any language.What would you say if Jupyter Notebooks could ➀ support any programming language, ➁ have no hidden state, and ➂ have a native look and feel for any article or blog post?

Well, I've just shipped something like that! With the latest codapi-js feature called code cells, you can create a series of interactive code snippets that depend on each other and execute them without leaving your browser.

Let's look at an example.

Code cells

Suppose we are writing an SQL tutorial using PostgreSQL. First, let's create an employees table:

create table employees (
  id integer primary key,
  name varchar(50),
  department varchar(10),
  salary integer
);

Then insert some employee data:

insert into employees
(id, name, department, salary)
values
(11, 'Diane', 'hr', 70),
(12, 'Bob', 'hr', 78),
(21, 'Emma', 'it', 84),
(22, 'Grace', 'it', 90),
(23, 'Henry', 'it', 104),
(31, 'Cindy', 'sales', 96),
(32, 'Dave', 'sales', 96),
(33, 'Alice', 'sales', 100);

Finally, rank the employees according to their salaries in each department:

select
  department as dep,
  dense_rank() over w as rank,
  name, salary
from employees
window w as (
  partition by department
  order by salary desc
)
order by department, rank;

As you can probably see, the insert snippet depends on the create snippet, while the select snippet depends on both create and insert, forming a dependency chain:

create ← insert ← select

This is how the snippets work:

  • When you run the insert snippet, it automatically creates the table before inserting the data.
  • When you run the select snippet, it automatically creates the table and inserts the records before selecting the data.

The dependencies do not have to be linear, they can form an (acyclic) graph. Suppose we want to describe the table schema:

\d employees;

The describe snippet does not need table data, so it can depend directly on the create snippet:

create ← insert ← select
   ↖
  describe

More complex dependency graphs are also possible, as long as there are no cycles:

create ← insert-1 ← select
       ↖ insert-2 ↙

You probably get the idea.

Implementation

Back to our example. Suppose we started with two static code examples — create and insert:

create table employees (
  id integer primary key,
  name varchar(50),
  department varchar(10),
  salary integer
);
insert into employees
(id, name, department, salary)
values
(11, 'Diane', 'hr', 70),
(12, 'Bob', 'hr', 78),
(21, 'Emma', 'it', 84);

First, we make them interactive by adding codapi-snippets.

➀ Create:

```sql
create table employees (
  id integer primary key,
  name varchar(50),
  department varchar(10),
  salary integer
);
```

<codapi-snippet id="create.sql" sandbox="postgres" editor="basic">
</codapi-snippet>

➁ Insert:

```sql
insert into employees
(id, name, department, salary)
values
(11, 'Diane', 'hr', 70),
(12, 'Bob', 'hr', 78),
(21, 'Emma', 'it', 84);
```

<codapi-snippet id="insert.sql" sandbox="postgres" editor="basic">
</codapi-snippet>

Then we make insert dependent on create by adding a depends-on attribute:

<codapi-snippet id="insert.sql" sandbox="postgres" editor="basic"
    depends-on="create.sql">
</codapi-snippet>

The rest is automatic!

Behind the scenes, codapi-snippet calls a codapi server (either a cloud or self-hosted instance) so it can run any programming language, database or software you've configured.

No hidden state

As you know, Jupyter notebooks are stateful — that is, they store the state of variables between cell invocations. This can be a good thing for prototyping, but not for documentation or explanation.

The last thing I want as a reader is code examples that fail or behave oddly because they were run out of order. Or because I didn't run some cells. Or because I changed a cell and didn't re-run it.

Codapi code cells, unlike Jupyter's, have no hidden state. Instead, they execute the whole chain of dependencies as needed to ensure that the reader gets a consistent result.

Summary

Codapi provides an easy way to create interactive examples in technical documentation. It gets even easier with code cells, which allow declarative dependencies between snippets of code.

Give it a try!

About CodapiCodapi serverSnippet widget

]]>