The Cheatsheet discuss Yaml Markups for daily use. YAML, short for YAML Ain’t Markup Language, is a human-readable data serialization format. Its simplicity and readability stand out, making it popular for configuration files and data exchange between systems.
YAML was designed to be easily readable by humans, focusing on simplicity and conciseness. It uses indentation and whitespace to define the data structure, similar to how Python uses indentation.
The Markup Language is widely used for configuration files in applications, such as specifying settings for software or defining deployment configurations in infrastructure-as-code tools. YAML is used for data exchange between different systems or programming languages. Jekyll is primarily using the language for defining general datafiles like the site config _config.yml
and the frontmatter of pages, posts, and other content files.
The frontmatter provides the data to Jekyll on processing and rendering the content of posts and pages. |
The following examples give a good overview of YAML documents' most common data structures. Many modern applications, like Jekyll, use YAML documents to provide control data.
Helpful Links
The documentation of YAML is a good source of information but is challenging to read. The tools given below can help to write the correct YAML document, or test your own YAML data codes:
YAML and JSON are related as data serialization formats, with YAML providing a more human-friendly and expressive syntax, while JSON focuses on simplicity and machine readability. They have overlapping use cases but exhibit some differences in syntax, data types, and readability. JavaScript has built-in support for working with JSON data, making it a popular choice for handling data exchange in web applications. |
Scalars
In YAML, a scalar is a basic data element representing a single value. Scalars can define strings, numbers, booleans, and null values. Here are the different types of YAML scalars:
-
Strings — strings enclosed in quotes or double quotes
-
Numbers — numeric values, including integers and floats
-
Booleans — boolean values, either true or false
-
Null — empty values
integer: 1
float: 1.234
string1: 'abc'
string2: "abc"
string3: abc
boolean: false
empty: null
{
"integer": 1,
"float": 1.234,
"string1": "abc",
"string2": "abc",
"string3": "abc",
"boolean": false,
"empty": null
}
Date Strings
Date strings are scalars of type string follow the ISO 8601 standard format. The most common representations of date in YAML are:
-
Date Only —
YYYY-MM-DD
, example: 2023-05-23 -
Date and Time —
YYYY-MM-DDTHH:MM:SSZ
, example: 2023-05-23T10:30:00Z
In the above format, T
separates the date and time components, HH
represents hours in 24-hour format, MM
represents minutes, SS
represents seconds, and Z
indicates that the time is in UTC (Coordinated Universal Time).
The time component can also include milliseconds, such as HH:MM:SS.sssZ (triple s ). |
YAML also supports alternative formats for date strings, such as:
-
Date and Time with Timezone Offset —
YYYY-MM-DDTHH:MM:SS±HH:MM
, example: 2023-05-23T10:30:00-04:00. In this format, the timezone offset is represented as±HH:MM
, indicating the time difference from UTC. -
Date and Time without Timezone —
YYYY-MM-DDTHH:MM:SS
, example: 2023-05-23T10:30:00
When no timezone is specified, the time is assumed to be in the local timezone. |
date1: 2015-04-05
date2: 2022-11-03 +100
{
"date1": "2015-04-05T00:00:00.000Z",
"date2": "2022-11-03 +100"
}
Multiline strings (HereDoc)
A HereDoc (short for here document) is a way to represent multi-line strings or blocks of text within YAML documents. It allows you to define a string value that spans multiple lines while preserving the line breaks and formatting.
YAML indicates a HereDoc using the pipe character (|
) followed by an optional indentation level. The HereDoc block starts on the next line and continues until the indentation is reduced to the level of the initial pipe character or until the end of the document.
message: |
This is a multi-line
string using a HereDoc.
It preserves line breaks and formatting.
In this example, the message field contains a multi-line string defined using a HereDoc. The pipe |
character indicates the start of the HereDoc, and the following lines are indented with two spaces. The resulting value of the message field will include the line breaks and indentation specified within the HereDoc block.
{
"message": "This is a multi-line\nstring using a HereDoc.\nIt preserves line breaks and formatting.\n"
}
You can also control the handling of leading and trailing white space within a HereDoc by using additional symbols:
-
|+
symbol preserves the line breaks and removes trailing white space. It trims any spaces or tabs at the end of each line. -
|-
symbol preserves the line breaks and removes any leading white space. It trims any spaces or tabs at the beginning of each line.
Here’s an example using the different HereDoc symbols:
message1: |-
This is a HereDoc with leading and trailing spaces.
This line has leading spaces.
This line has trailing spaces.
message2: |+
This is a HereDoc with trailing spaces trimmed.
This line has trailing spaces.
This line has leading spaces.
{
"message1": "This is a HereDoc with leading and trailing spaces.\n This line has leading spaces.\nThis line has trailing spaces. ",
"message2": "This is a HereDoc with trailing spaces trimmed.\nThis line has trailing spaces. \nThis line has leading spaces. \n",
}
In this example:
-
message1 uses
|-
to trim leading spaces -
message2 uses
|+
to trim trailing spaces
Using HereDocs, you can include long, formatted text blocks in your YAML documents without requiring manual line concatenation or escaping characters. There is a good online previewer for the different heredoc modes at YAML Multiline. |
Sequences (Arrays)
A sequence is a way to represent a collection of items. It allows you to define an ordered list of values, similar to an array or a list in other programming languages. Sequences in YAML are denoted by a dash followed by a space (` `), and each item in the sequence is placed on a new line and indented.
Simple sequence
In this example, the sequence is represented by the key fruits followed by a colon (:
). The items in the sequence apple, banana, and orange are indented under the key fruits using the dash (`- `) notation.
fruits:
- apple
- banana
- orange
or written like so:
fruits: [ apple, banana, orange ]
{
"fruits": [
"apple",
"banana",
"orange"
]
}
Sequence of sequences
In YAML, a sequence of sequences (Array of arrays) refers to a structure where a sequence contains other sequences as its elements. Each item in the outer sequence is itself a sequence. It allows you to create a nested collection within an collection.
Here’s an example of a YAML sequence of sequences:
list_of_lists:
- fruits: [ apple, banana, orange ]
- colors: [ red, blue, green ]
{
"list_of_lists": [
{
"fruits": [
"apple",
"banana",
"orange"
]
},
{
"colors": [
"red",
"blue",
"green"
]
}
]
}
You can nest sequences of sequences to represent more complex structures or hierarchical data. Nesting sequences allow you to organize and represent data in a structured manner within YAML documents.
Hash (Dictionary)
A hash is a data structure used to represent key-value pairs. It is also known as a mapping or dictionary in other programming languages. Hashes in YAML are denoted using indentation and a colon to separate the key and value.
Simple hash
name: John Doe
age: 30
email: johndoe@example.com
The hash represents a collection of related key-value pairs. In the example, name, age, and email are keys
, and John Doe+, *30, and johndoe@example.com are their corresponding values
.
{
"name": "John Doe",
"age": 30,
"email": "johndoe@example.com"
}
Named hash
person:
name: John Doe
age: 30
email: johndoe@example.com
{
"person": {
"name": "John Doe",
"age": 30,
"email": "johndoe@example.com"
}
}
Nested hash
Hashes can also be nested within other hashes (Hash of Hashes), allowing for more complex data structures. Here’s an example of a nested hash in YAML.
nested_hash:
hash1:
subsubkey1: 5
subsubkey2: 6
hash2:
somethingelse: Important!
{
"nested_hash": {
"hash1": {
"subsubkey1": 5,
"subsubkey2": 6
},
"hash2": {
"somethingelse": "Important!"
}
}
}
Hashes with JSON syntax (mixing is possible)
JSON data structure
|
Content References (Aliases)
In YAML, content references are a feature that allows you to reference and reuse data from one part of a YAML document in another part. They are indicated by an ampersand (&
) followed by an identifier, and then the same identifier preceded by an asterisk (*
) where the referenced content is to be used.
Content references in YAML provide a way to avoid duplicating data and promote reusability. They are particularly useful when you have complex data structures and want to refer to them multiple times within the same document. |
default_settings: &default_settings
install:
dir: /usr/local
owner: root
config:
enabled: false
run:
enabled: true
my_app_settings:
<<: *default_settings
install:
owner: my_user
group: my_group
{
"default_settings": {
"install": {
"dir": "/usr/local",
"owner": "root"
},
"config": {
"enabled": false
},
"run": {
"enabled": true
}
},
"my_app_settings": {
"install": {
"owner": "my_user",
"group": "my_group"
},
"config": {
"enabled": false
},
"run": {
"enabled": true
}
}
}