AppDividend
Latest Code Tutorials

What is Rune in Golang | Go Rune Example

0

Golang rune type is an alias for int32, and it is used to indicate than an integer represents the code point. ASCII defines 128 characters, identified by the code points 0–127. It covers English letters, Latin numbers, and a few other characters. Unicode, which is the superset of ASCII, defines the codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts (including the runic alphabet, but not Klingon) as well as multiple symbol sets.

History of ASCII

ASCII stands for American Standard Code for Information Interchange, which is a character encoding standard for electronic communication. ASCII codes represent the text in computers, telecommunications equipment, and other devices.

In the past years, we dealt with one character set, which was ASCII. It used 7 bits to represent 128 characters, including upper and lowercase English letters, digits, and a variety of punctuations and device-control characters. Due to this, a vast number of the population of the world is not able to use their writing system on the computer.

So, Unicode was invented to solve the problem. It is a superset of ASCII and contains all the characters present in the world’s writing system.

Unicode assigns each one a standard number called a Unicode code point, or in Go language, a rune. The rune type is an alias of int32.

Strings and UTF-8 encoding

However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)

Since Go source code itself is encoded as UTF-8, string literals will automatically get this encoding.

For example, in the string "Garçon" the character ç is encoded using two bytes, while the ASCII characters G, a, r, o, and n) only use one.

See the following code.

// hello.go

package main

import (
	"fmt"
)

func main() {

	fmt.Println([]byte("Garçon"))
	fmt.Println([]rune("Garçon"))
}

Output

go run hello.go
[71 97 114 195 167 111 110]
[71 97 114 231 111 110]

What is Rune in Golang

Rune literals are just 32-bit integer values (however, they are untyped constants, so their type can change). They represent the Unicode codepoints. For example, the rune literal ‘a’ is number 97.

Rune literal represents the rune constant where an integer value recognizes the Unicode code point.

In Go language, the rune expressed as one or more characters enclosed in single quotes like ‘g,’ ‘\t’, etc. In between single quotes, you have allowed a place any character except the newline and an unescaped single quote. Golang string is a sequence of bytes in Golang.

UTF-8 encodes all the Unicode in between 1 to 4 bytes, where 1 byte is used for ASCII and rest used for the rune. ASCII contains a total of 256 elements. In which 128 are characters, and 0-127 are identified as code points. Here code point refers to the element which represents a single value.

“Rune” means Unicode codepoint. (think of it as a character.) It is a term golang invented.

When you heard the word “rune,” you can think of it as any or all of the following:

  1. An integer. (possible values are from 0 to 2^32-1, but not all the values are valid Unicode codepoint.)
  2. A golang type, with keyword rune. It is an alias to the type int32
  3. A Unicode codepoint.
  4. A character.

Example of Rune

See the following code.

// hello.go

package main

import (
	"fmt"
	"reflect"
)

func main() {

	// Creating a rune
	runeK := 'K'
	runeb := 'b'
	runef := '\\'

	// Displaying rune and its type
	fmt.Printf("Rune 1: %c; Unicode: %U; Type: %s", runeK,
		runeK, reflect.TypeOf(runeK))

	fmt.Printf("\nRune 2: %c; Unicode: %U; Type: %s", runeb,
		runeb, reflect.TypeOf(runeb))

	fmt.Printf("\nRune 3: Unicode: %U; Type: %s", runef,
		reflect.TypeOf(runef))
}

Output

go run hello.go
Rune 1: K; Unicode: U+004B; Type: int32
Rune 2: b; Unicode: U+0062; Type: int32
Rune 3: Unicode: U+005C; Type: int32

Unicode Standard Notation for Codepoint

Unicode has the standard notation for codepoint, starting with U+, followed by its codepoint in hexadecimal. For example,

  1. space → U+20
  2. K → U+004B
  3. ♥ → U+
  4. 🤣 → U+1F923

Print Rune

Rune is codepoint; that is why it is an integer.

The following Printf formats work with an integer:

%c → The character as is.
%q → The rune syntax. e.g. ‘a’.
%U → Unicode notation. e.g., U+03B1.
%b → base 2
%o → base 8
%d → base 10
%x → base 16, with lower-case letters for a-f

As in the above example, we have printed the rune values.

Conclusion

Rune is the Type. It occupies 32bit and is meant to represent the Unicode CodePoint.

As an analogy, the English characters set encoded in the ‘ASCII’ has 128 code points. Thus can fit inside the byte (8bit).

From this (erroneous) assumption, C treated characters as ‘bytes’ char and ‘strings’ as a ‘sequence of characters’ char*.

But guess what, there are many other symbols invented by humans other than the ‘abcde…’ symbols. And there are so many that we need 32 bit to encode them.

In golang, a string is a sequence of bytes. However, since multiple bytes can represent the rune code-point, a string value can also contain runes. So, it can be converted to a []rune, or vice versa.

See also

Introduction to Golang

Installing Golang on MacOS

Identifiers in Golang

Golang Variables

Golang Constants

Leave A Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.