Unicode and Code Points
You can use ? before a character literal to see its code point:
iex> ?a
97
iex> ?ł
322
iex> "\u0061" === "a"
true
iex> 0x0061 = 97 = ?a
97
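The relationship between a code point, its hexadecimal form, and the \uXXXX escape can be checked for ł as well. This is a minimal sketch; the variable names (code_point, hex, char) are illustrative and not part of the original notes.

```elixir
# Code point of "ł" as an integer (322, as shown above).
code_point = ?ł

# The same number written in hexadecimal, as Unicode code charts list it.
hex = Integer.to_string(code_point, 16)

# Rebuild the one-character string from the integer with the utf8 modifier.
char = <<code_point::utf8>>

IO.puts(hex)
IO.puts(char == "\u0142")
```

Here hex is "142", so "\u0142" is the escape that yields ł, in the same way that "\u0061" yields a.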
UTF-8 and Encodings
Elixir uses UTF-8 to encode its strings, which means that code points are encoded as a series of 8-bit bytes.
String.length/1 counts graphemes, but byte_size/1 reveals the number of underlying raw bytes needed to store the string when using UTF-8. UTF-8 requires one byte to represent the characters h, e, and o, but two bytes to represent ł:
iex> string = "hełło"
iex> String.length(string)
5
iex> byte_size(string)
7
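To see where the two extra bytes come from, the string can be split into its raw bytes and into its graphemes. This is a sketch rather than part of the original notes; it assumes the Erlang function :binary.bin_to_list/1 and Elixir's String.graphemes/1, and the variable names are illustrative.

```elixir
string = "hełło"

# Raw UTF-8 bytes: each "ł" is stored as the two bytes 197 and 130.
bytes = :binary.bin_to_list(string)

# Graphemes are the units that String.length/1 counts.
graphemes = String.graphemes(string)

# Byte size of each grapheme, in order.
sizes = Enum.map(graphemes, &byte_size/1)

IO.inspect(bytes, charlists: :as_lists)
IO.inspect(graphemes)
IO.inspect(sizes)
```

The sizes list is [1, 1, 2, 2, 1], which sums to the 7 bytes reported by byte_size/1, while the list has the 5 elements counted by String.length/1.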
Charlist
A charlist is a list of integers where all the integers are valid code points:
iex> 'hełło'
[104, 101, 322, 322, 111]
iex> is_list 'hełło'
true
iex> 'hello'
'hello'
iex> List.first('hello')
104
iex> heartbeats_per_minute = [99, 97, 116]
'cat'
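The list above prints as 'cat' because every integer in it happens to be a printable code point, so IEx displays it as a charlist. One way to see the plain list, assuming the :charlists option of inspect/2 and IO.inspect/2, is sketched below; the variable as_list is illustrative.

```elixir
heartbeats_per_minute = [99, 97, 116]

# Default inspection infers a charlist when all integers are printable.
IO.inspect(heartbeats_per_minute)

# The :charlists option forces the plain list representation instead.
as_list = inspect(heartbeats_per_minute, charlists: :as_lists)
IO.puts(as_list)
```

With charlists: :as_lists the result is the string "[99, 97, 116]", regardless of whether the integers are printable.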
iex> to_charlist "hełło"
[104, 101, 322, 322, 111]
iex> to_string 'hełło'
"hełło"
iex> to_string :hello
"hello"
iex> to_string 1
"1"
iex> 'this ' <> 'fails'
** (ArgumentError) expected binary argument in <> operator but got: 'this '
(elixir) lib/kernel.ex:1821: Kernel.wrap_concatenation/3
(elixir) lib/kernel.ex:1808: Kernel.extract_concatenations/2
(elixir) expanding macro: Kernel.<>/2
iex:1: (file)
iex> 'this ' ++ 'works'
'this works'
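When a charlist and a string need to be joined, one option is to convert one side first so that both operands have the same type. This is a minimal sketch using to_string/1 and to_charlist/1 from the conversions shown earlier; the variable names are illustrative.

```elixir
charlist = 'this '
string = "works"

# Convert the charlist to a string, then join binaries with <>.
as_string = to_string(charlist) <> string

# Or convert the string to a charlist, then join lists with ++.
as_charlist = charlist ++ to_charlist(string)

IO.puts(as_string)
IO.puts(is_list(as_charlist))
```

The first result is a binary and the second is a list of code points, so the choice depends on which type the calling code expects.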
iex> "he" ++ "llo"
** (ArgumentError) argument error
:erlang.++("he", "llo")
iex> "he" <> "llo"
"hello"
References
Binaries, strings, and charlists: https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html