YAML (rhymes with camel) is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of agile programming languages. It is broadly useful for programming needs ranging from configuration files to Internet messaging to object persistence to data auditing.
There is no official YAML reference guide. The YAML website only offers the {yaml-spec}[YAML specification], which is a dense and thorny tome clearly aimed at implementers. I suspect this has greatly hampered YAML’s popularity. In the hopes of improving this situation, here is a very quick YAML overview that should describe the language almost entirely. Hopefully it’s useful whether or not you use Jekyll and J1.
Some parts of this document are taken from the YAML Specification, version 1.2, published on www.yaml.org. As this page should be seen as a summary, some parts of the original article have been removed, slightly modified, or shortened. For a full reference, please refer to the original document {yaml-spec}["YAML Ain’t Markup Language (YAML™) Version 1.2^", window="_blank"] or visit the official website of YAML at {yaml-home}["www.yaml.org^", window="_blank"] for more information.
What It’s All About
YAML Ain’t Markup Language (abbreviated YAML) is a data serialization language designed to be human-friendly and work well with modern programming languages for common everyday tasks. This specification is both an introduction to the YAML language and the concepts supporting it, and also a complete specification of the information needed to develop applications for processing YAML.
Open, interoperable and readily understandable tools have advanced computing immensely. YAML was designed from the start to be useful and friendly to people working with data. It uses Unicode printable characters, some of which provide structural information while the rest contain the data itself. YAML achieves a unique cleanness by minimizing the amount of structural characters and allowing the data to show itself in a natural and meaningful way. For example, indentation may be used for structure, colons separate key: value pairs, and dashes are used to create bullet lists.
There are endless flavors of data structures possible, but they can all be adequately represented with three basic primitives:
-
scalar, also known as a variable: a single value such as a string or a number
-
sequence, also known as an array or a list: an ordered list of values
-
mapping, also known as a hash or a dictionary: an unordered set of key: value pairs
YAML leverages these primitives, and adds a simple typing system and aliasing mechanism to form a complete language for serializing any native data structure. While most programming languages can use YAML for data serialization, YAML excels in working with those languages that are fundamentally built around the three basic primitives. These include the new wave of agile languages such as Perl, Python, PHP, Ruby, and JavaScript.
There are hundreds of different languages for programming, but only a handful of languages for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-language data sharing, object persistence, and debugging of complex data structures. When data is easy to view and understand, programming becomes a simpler task.
The design goals for YAML are, in decreasing priority:
-
YAML is easily readable by humans.
-
YAML data is portable between programming languages.
-
YAML matches the native data structures of agile languages.
-
YAML has a consistent model to support generic tools.
-
YAML supports one-pass processing.
-
YAML is expressive and extensible.
-
YAML is easy to implement and use.
Prior Art of YAML
YAML’s initial direction was set by the data serialization and markup language discussions among SML-DEV members. Later on, it directly incorporated experience from Ingy döt Net’s Perl module Data::Denter. Since then, YAML has matured through ideas and support from its user community.
YAML integrates and builds upon concepts described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), XML, SAX, SOAP, and JSON. The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with that standard. Further, borrowing from MIME (RFC2045), YAML’s top-level production is a stream of independent documents, ideal for message-based distributed processing systems.
YAML’s indentation-based scoping is similar to Python’s (without the ambiguities caused by tabs). Indented blocks facilitate easy inspection of the data’s structure. YAML’s literal style leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping. YAML also allows the use of traditional indicator-based scoping similar to JSON’s and Perl’s. Such flow content can be freely nested inside indented blocks.
YAML’s double-quoted style uses familiar C-style escape sequences. This enables ASCII encoding of non-printable or 8-bit (ISO 8859-1) characters such as “\x3B”. Non-printable 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as \u003B and \U0000003B.
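Python string literals happen to accept the same three escape forms, so they can stand in for a YAML parser in a quick illustration. This is only a sketch of what the escapes mean, not a demonstration of any YAML library; the variable names are made up for the example.

```python
# Python string literals share these C-style escapes with YAML's
# double-quoted style, so they are used here as a stand-in for a parser.
eight_bit = "\x3B"              # 8-bit escape: two hex digits
sixteen_bit = "\u003B"          # 16-bit escape: four hex digits
thirty_two_bit = "\U0000003B"   # 32-bit escape: eight hex digits

# All three spell the same character, U+003B SEMICOLON.
print(eight_bit, sixteen_bit, thirty_two_bit)
print(eight_bit == sixteen_bit == thirty_two_bit == ";")
```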
Motivated by HTML’s end-of-line normalization, YAML’s line folding employs an intuitive method of handling line breaks. A single line break is folded into a single space, while empty lines are interpreted as line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the scalar content.
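The folding rule is easier to see in code. The function below is a minimal sketch of the rule as described here, written in plain Python; `fold_lines` is a hypothetical helper, not part of any YAML library, and it deliberately ignores the finer points of real parsers (escapes in double-quoted strings, more-indented lines in folded block scalars).

```python
def fold_lines(text: str) -> str:
    """Fold line breaks: one break becomes a space, empty lines become newlines."""
    pieces = []
    empty_lines = 0  # empty lines seen since the last line with content
    for raw_line in text.split("\n"):
        line = raw_line.strip()
        if not line:
            empty_lines += 1
            continue
        if pieces:
            # A plain break folds to a space; each empty line yields one newline.
            pieces.append("\n" * empty_lines if empty_lines else " ")
        pieces.append(line)
        empty_lines = 0
    return "".join(pieces)


wrapped = "This is\nparagraph one.\n\nThis\nis paragraph\ntwo."
print(fold_lines(wrapped))
```

Leading and trailing empty lines are dropped in this sketch, which keeps it short; a real parser applies separate rules there.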
YAML’s core type system is based on the requirements of agile languages such as Perl, Python, and Ruby. YAML directly supports both collections (mappings, sequences) and scalars. Support for these common types enables programmers to use their language’s native data structures for YAML manipulation, instead of requiring a special document object model (DOM).
Like XML’s SOAP, YAML supports serializing a graph of native data structures through an aliasing mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to represent rich data structures required for modern distributed computing. YAML provides globally unique type names using a namespace mechanism inspired by Java’s DNS-based package naming convention and XML’s URI-based namespaces. In addition, YAML allows for private types specific to a single application.
YAML was designed to support incremental interfaces that include both input (“getNextEvent()”) and output (“sendNextEvent()”) one-pass interfaces. Together, these enable YAML to support the processing of large documents (e.g. transaction logs) or continuous streams (e.g. feeds from a production machine).
Relation to JSON
Both JSON and YAML aim to be human-readable data interchange formats. However, JSON and YAML have different priorities. JSON’s foremost design goal is simplicity and universality. Thus, JSON is trivial to generate and parse, at the cost of reduced human readability. It also uses a lowest common denominator information model, ensuring any JSON data can be easily processed by every modern programming environment.
In contrast, YAML’s foremost design goals are human readability and support for serializing arbitrary native data structures. Thus, YAML allows for extremely readable files, but is more complex to generate and parse. In addition, YAML ventures beyond the lowest common denominator data types, requiring more complex processing when crossing between different programming environments.
YAML can therefore be viewed as a natural superset of JSON, offering improved human readability and a more complete information model. This is also the case in practice; every JSON file is also a valid YAML file. This makes it easy to migrate from JSON to YAML if/when the additional features are required.
JSON’s RFC4627 requires that mapping keys merely “SHOULD” be unique, while YAML insists they “MUST” be. Technically, YAML therefore complies with the JSON spec, choosing to treat duplicates as an error. In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files.
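To see the difference between “SHOULD” and “MUST” in practice, here is a small sketch using Python’s standard `json` module. By default it accepts a duplicated key and silently keeps the last value; the `reject_duplicate_keys` hook is a hypothetical helper written for this example that enforces the stricter, YAML-like behavior.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises on a repeated key, as YAML requires."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate mapping key: {key!r}")
        seen[key] = value
    return seen


text = '{"a": 1, "a": 2}'

# Default behavior: the duplicate is accepted and the last value wins.
lenient = json.loads(text)

# With the hook, the same text is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient, strict_error)
```

Other JSON parsers may resolve duplicates differently, which is exactly why only files with unique keys are portable.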
It may be useful to define an intermediate format between YAML and JSON. Such a format would be trivial to parse (but not very human readable), like JSON. At the same time, it would allow for serializing arbitrary native data structures, like YAML. Such a format might also serve as YAML’s "canonical format". Defining such a “YSON” format (YSON is a Serialized Object Notation) can be done either by enhancing the JSON specification or by restricting the YAML specification. Such a definition is beyond the scope of this specification.
Relation to XML
YAML is primarily a data serialization language. XML was designed to be backwards compatible with the Standard Generalized Markup Language (SGML), which was designed to support structured documentation. XML therefore had many design constraints placed on it that YAML does not share. XML is a pioneer in many domains; YAML is the result of lessons learned from XML and other technologies.
Newcomers to YAML often search for its correlation to the eXtensible Markup Language (XML). Although the two languages may actually compete in several application domains, there is no direct correlation between them.
It should be mentioned that there are ongoing efforts to define standard XML/YAML mappings. This generally requires that a subset of each language be used. For more information on using both XML and YAML, please visit https://yaml.org.
Overall Structure and Design
As I see it, YAML has two primary goals: to support encoding any arbitrary data structure; and to be easily read and written by humans. If only the spec shared that last goal. Human-readability means that much of YAML’s syntax is optional, wherever it would be unambiguous and easier on a human. The trade-off is more complexity in parsers and emitters.
Here’s an example YAML document, configuration for some hypothetical application:
database:
    username: admin
    password: foobar  # TODO get prod passwords out of config
    socket: /var/tmp/database.sock
    options: {use_utf8: true}
memcached:
    host: 10.0.0.99
workers:
  - host: 10.0.0.101
    port: 2301
  - host: 10.0.0.102
    port: 2302
The same data written as JSON looks like this:
{
  "database": {
    "username": "admin",
    "password": "foobar",
    "socket": "/var/tmp/database.sock",
    "options": {
      "use_utf8": true
    }
  },
  "memcached": {
    "host": "10.0.0.99"
  },
  "workers": [
    {
      "host": "10.0.0.101",
      "port": 2301
    },
    {
      "host": "10.0.0.102",
      "port": 2302
    }
  ]
}
YAML often has more than one way to express the same data, leaving a human free to use whichever is most convenient. More convenient syntax tends to be more contextual or whitespace-sensitive. In the above document, you can see that indenting is enough to make a nested mapping. Integers and booleans are automatically distinguished from unquoted strings, as well.
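To make the native types concrete, the sketch below loads a shortened JSON rendering of the configuration with Python’s standard `json` module. It uses JSON only as a stand-in: the assumption is that a YAML parser would hand back the same structure for the YAML form, with the port as an integer, the flag as a boolean, and the host as a string.

```python
import json

# Shortened JSON rendering of the example configuration.
config = json.loads("""
{
  "database": {"username": "admin", "options": {"use_utf8": true}},
  "memcached": {"host": "10.0.0.99"},
  "workers": [{"host": "10.0.0.101", "port": 2301}]
}
""")

port = config["workers"][0]["port"]
use_utf8 = config["database"]["options"]["use_utf8"]
host = config["memcached"]["host"]

# Each value arrives as a native Python type, no conversion code needed.
print(type(port).__name__, type(use_utf8).__name__, type(host).__name__)
```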
General Syntax
YAML is designed around Unicode, not bytes, and its syntax assumes Unicode input. There is no syntactic mechanism for giving a character encoding; the parser is expected to recognize BOMs for UTF-8, UTF-16 and UTF-32 but otherwise a byte stream is assumed to be UTF-8.
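A minimal sketch of that decoding rule in Python, using only the BOM constants from the standard `codecs` module. `decode_stream` is a hypothetical helper that follows the description in this paragraph (check for a BOM, otherwise assume UTF-8); it is not taken from any YAML implementation.

```python
import codecs

# Longer BOMs are checked first: the UTF-32-LE BOM begins with the same
# two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_stream(data: bytes) -> str:
    """Decode a byte stream by its BOM, falling back to UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode("utf-8")


print(decode_stream(codecs.BOM_UTF16_LE + "a: 1".encode("utf-16-le")))
```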
As of 1.2, YAML is a strict superset of JSON. Any valid JSON can be parsed in the same structure with a YAML 1.2 parser.
The only vertical whitespace characters are U+000A (LINE FEED) and U+000D (CARRIAGE RETURN). The only horizontal whitespace characters are U+0009 (TAB) and U+0020 (SPACE). Other control characters are not allowed anywhere.
YAML operates on streams, which can contain multiple distinct structures, each parsed individually. Each structure is called a document.
A document begins with triple dashes (---) and ends with triple dots (...). Both are optional, though a ... can only be followed by directives or ---. You don’t see multiple documents very often, but it’s a very useful feature for sending intermittent chunks of data over a single network connection. With JSON you’d usually put each chunk on its own line and delimit with newlines; YAML has support built in.
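As a rough illustration of how a stream breaks into documents, here is a deliberately naive splitter in Python. `split_documents` is a hypothetical helper: it only recognizes markers that sit alone on a line, drops empty documents, and knows nothing about directives or content on the same line as `---`, all of which a real parser has to handle.

```python
def split_documents(stream: str) -> list:
    """Naively split a YAML stream into the text of its documents."""
    documents = []
    current = []
    for line in stream.splitlines():
        if line in ("---", "..."):
            # A marker closes whatever document was being collected.
            if current:
                documents.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        documents.append("\n".join(current))
    return documents


stream = "a: 1\n---\nb: 2\n...\n---\nc: 3\n"
print(split_documents(stream))
```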
Documents may be preceded by directives, in which case the --- is required to indicate the end of the directives. Directives are a % followed by an identifier and some parameters. This is how directives are distinguished from a bare document without ---.
There are only two directives at the moment: %YAML specifies the YAML version of the document, and %TAG is used for tag shorthand, described in yamlref-more-tags. Use of directives is, again, fairly uncommon.
Comments may appear anywhere. # begins a comment, and it runs until the end of the line. In most cases, comments are whitespace: they don’t affect indentation level, they can appear between any two tokens, and a comment on its own line is the same as a blank line. The few exceptions are not too surprising; for example, you can’t have a comment between the key and colon in key:.
A YAML document is a graph of values, called nodes. See more in yamlref-kinds.
Nodes may be prefixed with up to two properties: a tag and an anchor. Order doesn’t matter, and both are optional. Properties can be given to any value, regardless of kind or style.
Nodes
A YAML node represents a single native data structure. Nodes may contain other nodes. Each node has content of one of three kinds:
-
scalar
-
sequence
-
mapping
In addition, each node has a tag which serves to restrict the set of possible values the content can have.
The content of a scalar node is an opaque datum that can be presented as a series of zero or more Unicode characters.
The content of a sequence node is an ordered series of zero or more nodes. In particular, a sequence may contain the same node more than once. It could even contain itself (directly or indirectly).
The content of a mapping node is an unordered set of key:value pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key:value pairs, and a mapping could even contain itself as a key or a value (directly or indirectly).
When appropriate, it is convenient to consider sequences and mappings together, as collections. In this view, sequences are treated as mappings with integer keys starting at zero. Having a unified collections view for sequences and mappings is helpful both for theoretical analysis and for creating practical YAML tools and APIs. This strategy is also used by the JavaScript programming language.
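The unified view can be shown in a few lines of Python. This is just an illustration of the idea; `get_item` is a hypothetical helper, not an API from any YAML tool.

```python
sequence = ["one", "two", "three"]

# The same data viewed as a mapping with integer keys starting at zero.
as_mapping = dict(enumerate(sequence))


def get_item(collection, key):
    """Look up a key in either kind of collection the same way."""
    return collection[key]


print(as_mapping)
print(get_item(sequence, 1), get_item(as_mapping, 1))
```

A tool written against `get_item` does not need to care which kind of collection it was given.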
Tags
YAML represents type information of native data structures with a simple identifier, called a tag. Global tags are URIs and hence globally unique across all applications. The tag: URI scheme is recommended for all global YAML tags. In contrast, local tags are specific to a single application. Local tags start with an exclamation mark !, are not URIs and are not expected to be globally unique. YAML provides a TAG directive to make tag notation less verbose; it also offers easy migration from local to global tags. To ensure this, local tags are restricted to the URI character set and use URI character escaping.
YAML does not mandate any special relationship between different tags that begin with the same substring. Tags ending with URI fragments (containing a hash mark #) are no exception; tags that share the same base URI but differ in their fragment part are considered to be different, independent tags. By convention, fragments are used to identify different variants of a tag, while / is used to define nested tag namespace hierarchies.
However, this is merely a convention, and each tag may employ its own rules. For example, Perl tags may use a double colon :: to express namespace hierarchies, Java tags may use dots ., etc.
YAML tags are used to associate meta information with each node. In particular, each tag must specify the expected node kind (scalar, sequence, or mapping). Scalar tags must also provide a mechanism for converting formatted content to a canonical form for supporting equality testing. Furthermore, a tag may provide additional information such as the set of allowed content values for validation, a mechanism for tag resolution, or any other data that is applicable to all of the tag’s nodes.
Tags are prefixed with an exclamation mark ! and describe the type of a node. This allows for adding new types without having to extend the syntax or mingle type information with data. Omitting the tag leaves the type to the parser’s discretion; usually that means you’ll get arrays or lists, hashes or dictionaries, strings, numbers, and other simple data types.
You’ll probably only see tags in two forms:
-
!foo is a "local" tag, used for some custom type that’s specific to the document
-
!!bar is a built-in YAML type
Most of these are inferred from plain data (!!seq for sequences, !!int for numbers, and so on), but a few don’t have dedicated syntax and have to be given explicitly. For example, !!binary is used for representing arbitrary binary data encoded as base64. So !!binary aGVsbG8= would be parsed as the bytestring hello.
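You can check that payload yourself with Python’s standard `base64` module. This only shows the decoding step a loader would perform for a !!binary node; it does not involve a YAML parser.

```python
import base64

# The scalar content of the !!binary node from the example above.
payload = "aGVsbG8="

# Decoding the base64 text yields the raw bytes the node stands for.
decoded = base64.b64decode(payload)
print(decoded)
```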
There’s much more to tags, most of which is rarely used in practice. Read more on the {yaml-tag-repository}[Language-independent YAML tags, window="blank"].
Anchors
The other node property is the anchor, which is how YAML can store recursive data structures. Anchor names are prefixed with & and can’t contain whitespace, brackets, braces, or commas.
An alias node is an anchor name prefixed with an asterisk * and indicates that the node with that anchor name should occur in both places. (Alias nodes can’t have properties themselves; the properties of the anchored node are used.) For example, you might share configuration:
host1:
    &common-host
    os: linux
    arch: x86_64
host2: *common-host
Or serialize a list that contains itself:
&me [*me]
This is not a copy. The exact same value is reused.
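In Python terms, the two examples above correspond to ordinary object references. The sketch below builds the structures by hand to show what "the exact same value" means; it assumes a loader that preserves aliasing as shared objects, and the variable names are invented for the illustration.

```python
# host1 and host2 refer to one and the same dictionary.
common_host = {"os": "linux", "arch": "x86_64"}
hosts = {"host1": common_host, "host2": common_host}

# A change made through one name is visible through the other.
hosts["host1"]["os"] = "freebsd"
print(hosts["host2"]["os"])

# A list that contains itself, like &me [*me].
me = []
me.append(me)
print(me[0] is me)
```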
Anchor names act somewhat like variable assignments: at any point in the document, the parser only knows about the anchors it’s seen so far, and a second anchor with the same name takes precedence. This means that aliases cannot refer to anchors that appear later in the document.
Anchor names aren’t intended to carry information, which unfortunately means that most YAML parsers throw them away, and re-serializing a document will get you anchor names like ANCHOR1.
Data Structures
Data Structures come in one of three kinds, which reflect the general shape of the data. Scalars are individual values, sequences are ordered collections and mappings are unordered associations. Each can be written in either a whitespace-sensitive block style or a more compact and explicit flow style.
Block Style
YAML’s block styles employ indentation rather than indicators to denote structure. This results in a more human-readable (though less compact) notation. Block scalar styles, introduced by the greater-than sign > or the pipe |, need no escaping and add a newline (\n) to the end of the string.
Key: >
  this is my very very very
  long string
results in:
this is my very very very long string\n
Flow Style
YAML’s flow styles can be thought of as the natural extension of JSON to cover folding long content lines for readability, tagging nodes to control construction of native data structures, and using anchors and aliases to reuse constructed object instances.
Flow style uses explicit indicators rather than indentation to denote scope. The flow sequence is written as a comma-separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
- [name        , hr, avg  ]
- [Mark McGwire, 65, 0.278]
- [Sammy Sosa  , 63, 0.288]
Scalars
Most values in a YAML document will be plain scalars. They’re defined by exclusion: if it’s not anything else, it’s a plain scalar. Technically, they can only be flow style, so they’re really "plain flow scalar style" scalars.
Plain scalars are the most flexible kind of value, and may resolve to a variety of types from the YAML tag repository:
-
Integers become, well, integers (!!int). Leading 0, 0b, and 0x are recognized as octal, binary, and hexadecimal. _ is allowed, and ignored. Curiously, : is allowed and treated as a base 60 delimiter, so you can write a time as 1:59 and it’ll be loaded as the number of seconds, 119.
-
Floats become floats (!!float). Scientific notation using e is also recognized. As with integers, _ is ignored and : indicates base 60, though only the last component can have a fractional part. Positive infinity, negative infinity, and not-a-number are recognized with a leading dot: .inf, -.inf, and .nan.
-
true and false become booleans (!!bool). y, n, yes, no, on, and off are allowed as synonyms. Uppercase and title case are also recognized.
-
~ and null become nulls (!!null), which is None in Python. A completely empty value also becomes null.
-
ISO8601 dates are recognized (!!timestamp), with whitespace allowed between the date and time. The time is also optional, and defaults to midnight UTC.
-
= is a special value (!!value) used as a key in mappings. I’ve never seen it actually used, and the thing it does is nonsense in many languages anyway, so don’t worry about it. Just remember you can’t use = as a plain string.
-
<< is another special value (!!merge) used as a key in mappings. This one is actually kind of useful; it’s described below in yamlref-merge-keys.
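The base 60 rule for integers is the oddest one in that list, so here is a minimal Python sketch of it. `parse_base60` is a hypothetical helper that follows the description above (colon-separated components, underscores ignored); it is not the resolver of any particular YAML library, and it does no validation of the components.

```python
def parse_base60(text: str) -> int:
    """Parse an integer such as '1:59' using ':' as a base 60 delimiter."""
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")  # underscores are ignored
    total = 0
    for component in digits.split(":"):
        # Each further component shifts the running total by a factor of 60.
        total = total * 60 + int(component)
    return sign * total


print(parse_base60("1:59"))
```

This is why an unquoted time or a version-like value can quietly turn into a number; quoting the value avoids the surprise.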
The YAML spec has a notion of schemas, sets of types which are recognized. The recommended schema is "core", which doesn’t actually require |
Otherwise, it’s a string. Well. Probably. As part of tag resolution (see yamlref-more-tags), an application is allowed to parse plain scalars however it wants; you might add logic that parses 1..5 as a range type, or you might recognize keywords and replace them with special objects. But if you’re doing any of that, you’re hopefully aware of it.
Between the above parsing and conflicts with the rest of YAML’s syntax, for a plain scalar to be a string, it must meet these restrictions:
-
It must not be true, false, yes, no, y, n, on, off, null, or any of those words in uppercase or title case, which would all be parsed as booleans or nulls.
-
It must not be ~, which is null. If it’s a mapping key, it must not be = or <<, which are special key values.
-
It must not be something that looks like a number or timestamp. I wouldn’t bet on anything that consists exclusively of digits, dashes, underscores, and colons.
-
The first character must not be any of: [ ] { } , # & * ! | > ' " % @ `. All of these are YAML syntax for some other kind of construct.
-
If the first character is ?, :, or -, the next character must not be whitespace. Otherwise it’ll be parsed as a block mapping or sequence.
-
It must not contain " #" (a space followed by a hash) or ": " (a colon followed by a space), which would be parsed as a comment or a key. A hash not preceded by space or a colon not followed by space is fine.
-
If the string is inside a flow collection (i.e., inside […] or {…}), it must not contain any of [ ] { } , which would all be parsed as part of the collection syntax.
-
Leading and trailing whitespace are ignored.
-
If the string is broken across lines, then the newline and any adjacent whitespace are collapsed into a single space.
That actually leaves you fairly wide open; the biggest restriction is on the first character. You can have spaces, you can wrap across lines, you can include whatever (non-control) Unicode you want.
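The purely syntactic rules from that list can be turned into a small checker. The sketch below is a rough Python approximation, not a replacement for a real parser: `has_syntax_conflict` is a hypothetical helper, it treats a trailing colon as a conflict as a cautious assumption, and it deliberately skips the type-resolution rules (words like yes or null, and number-like text).

```python
# Characters listed above as forbidden in first position.
FORBIDDEN_FIRST = set("[]{},#&*!|>'\"%@`")
FLOW_INDICATORS = set("[]{},")


def has_syntax_conflict(text: str, in_flow: bool = False) -> bool:
    """Return True if `text` would not survive as a plain scalar string.

    Only syntax is checked; booleans, nulls and numbers are not handled.
    """
    if not text or text != text.strip():
        return True  # empty, or leading/trailing whitespace would be lost
    if text[0] in FORBIDDEN_FIRST:
        return True
    if text[0] in "?:-" and (len(text) == 1 or text[1].isspace()):
        return True  # would start a mapping key, a value, or a sequence entry
    if " #" in text or ": " in text or text.endswith(":"):
        return True  # would be read as a comment or a key
    if in_flow and any(ch in FLOW_INDICATORS for ch in text):
        return True
    return False


for candidate in ["foo:bar", "key: value", "#tag", "a#b", "- item"]:
    print(repr(candidate), has_syntax_conflict(candidate))
```

When the function returns True, the safe move is to quote the string.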
If you need explicit strings, you have some other options.
Strings
YAML has lots of ways to write explicit strings. Aside from plain scalars, there are two other flow scalar styles.
Single-quoted strings are surrounded by '. Single quotes may be escaped as '', but otherwise no escaping is done at all. You may wrap over multiple lines, but the newline and any surrounding whitespace becomes a single space. A line containing only whitespace becomes a newline.
Double-quoted strings are surrounded by ". Backslash escapes are recognized:
Sequence | Result |
---|---|
\0 | U+0000 NULL |
\a | U+0007 BELL |
\b | U+0008 BACKSPACE |
\t | U+0009 CHARACTER TABULATION |
\n | U+000A LINE FEED |
\v | U+000B LINE TABULATION |
\f | U+000C FORM FEED |
\r | U+000D CARRIAGE RETURN |
\e | U+001B ESCAPE |
\" | U+0022 QUOTATION MARK |
\/ | U+002F SOLIDUS |
\\ | U+005C REVERSE SOLIDUS |
\N | U+0085 NEXT LINE |
\_ | U+00A0 NO-BREAK SPACE |
\L | U+2028 LINE SEPARATOR |
\P | U+2029 PARAGRAPH SEPARATOR |
\xNN | Unicode character NN (two hex digits) |
\uNNNN | Unicode character NNNN (four hex digits) |
\UNNNNNNNN | Unicode character NNNNNNNN (eight hex digits) |
As usual, you may wrap a double-quoted string across multiple lines, but the newline and any surrounding whitespace becomes a single space. As with single-quoted strings, a line containing only whitespace becomes a newline. You can escape spaces and tabs to protect them from being thrown away. You can also escape a newline to preserve any trailing whitespace on that line, but throw away the newline and any leading whitespace on the next line.
These rules are weird, so here’s a contrived example:
"line \
one
line two\n\
\ \ line three\nline four\n
line five
"
Which becomes:
line one
line two
line three
line four
Right, well, I hope that clears that up.
There are also two block scalar styles, both consisting of a header followed by an indented block. The header is usually just a single character, indicating which block style to use.
| indicates literal style, which preserves all newlines in the indented block. > indicates folded style, which performs the same line folding as with quoted strings. Escaped characters are not recognized in either style. Indentation, the initial newline, and any leading blank lines are always ignored.
So to represent this string:
This is paragraph one.

This is paragraph two.
You could use either literal style:
|
    This is paragraph one.

    This is paragraph two.
Or folded style:
>
    This is
    paragraph one.


    This
    is paragraph
    two.
Obviously folded style is more useful if you have paragraphs with longer lines. Note that there are two blank lines between paragraphs in folded style; a single blank line would be parsed as a single newline.
The header has some other features, but I’ve never seen them used. It consists of up to three parts, with no intervening whitespace.
-
The character indicating which block style to use.
-
Optionally, the indentation level of the indented block, relative to its parent. You only need this if the first line of the block starts with a space, because the space would be interpreted as indentation.
-
Optionally, a "chomping" indicator. The default behavior is to include the final newline as part of the string, but ignore any subsequent empty lines. You can use - here to ignore the final newline as well, or use + to preserve all trailing whitespace verbatim.
You can put a comment on the same line as the header, but a comment on the next line would be interpreted as part of the indented block. You can also put a tag or an anchor before the header, as with any other node.
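The three chomping behaviors are easy to mix up, so here is a small Python sketch of them. `chomp` is a hypothetical helper that applies the rule to the already-extracted text of a block; it mirrors the description above and is not code from any YAML implementation.

```python
def chomp(block: str, indicator: str = "") -> str:
    """Apply a chomping indicator to the raw text of a block scalar."""
    content = block.rstrip("\n")
    if indicator == "-":
        return content  # strip: no final newline, no trailing empty lines
    if indicator == "+":
        return block    # keep: everything is preserved verbatim
    if indicator == "":
        # clip (the default): one final newline, trailing empty lines dropped
        return content + "\n" if content and block.endswith("\n") else content
    raise ValueError(f"unknown chomping indicator: {indicator!r}")


raw = "text\n\n\n"
print(repr(chomp(raw)), repr(chomp(raw, "-")), repr(chomp(raw, "+")))
```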
Collections
A collection is the generic term for a YAML data grouping. YAML has two types of collections:
-
sequences
-
mappings
Sequences
Sequences are ordered collections, with type !!seq. They’re pretty simple.
Flow style is a comma-delimited list in square brackets, just like JSON: [one, two, 3]. A trailing comma is allowed, and whitespace is generally ignored. The contents must also be written in flow style.
Block style is written like a bulleted list:
- one
- two
- 3
- a plain scalar that's
  wrapped across multiple lines
Indentation determines where each element ends, and where the entire sequence ends.
Other blocks may be nested without intervening newlines:
- - one one
  - one two
- - two one
  - two two
Mappings
A mapping is a YAML collection defined by key:value pairs. Mappings are unordered, er, mappings, with type !!map. The keys must be unique, but may be of any type. Also, they’re unordered.
Did I mention that mappings are unordered? The order of the keys in the document is irrelevant and arbitrary. If you need order, you need a sequence.
Flow style looks unsurprisingly like JSON: {x: 1, y: 2}. Again, a trailing comma is allowed, and whitespace doesn’t matter.
As a special case, inside a sequence, you can write a single-pair mapping without the braces. So [a: b, c: d, e: f] is a sequence containing three mappings. This is allowed in block sequences too, and is used for the ordered mapping type !!omap.
Block style is actually a little funny. The canonical form is a little surprising:
? x
: 1
? y
: 2
? introduces a key, and : introduces a value. You very rarely see this form, because the ? is optional as long as the key and colon are all on one line (to avoid ambiguity) and the key is no more than 1024 characters long (to avoid needing infinite lookahead).
So that’s more commonly written like this:
x: 1
y: 2
The explicit ? syntax is more useful for complex keys. For example, it’s the only way to use block styles in the key:
? >
    If a train leaves Denver at 5:00 PM traveling at 90 MPH, and another
    train leaves New York City at 10:00 PM traveling at 80 MPH, by how many
    minutes are you going to miss your connection?
: Depends whether we're on Daylight Saving Time or not.
Other than the syntactic restrictions, an implicit key isn’t special in any way and can also be of any type:
true: false
null: null
up: down
[0, 1]: [1, 0]
It’s fairly uncommon to see anything but strings as keys, though, since languages often don’t support it. Python can’t have lists and dicts as dict keys; Perl 5 and JavaScript only support string keys; and so on.
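The Python side of that limitation looks like this. The sketch only shows the language behavior itself: a list cannot be a dict key because it is unhashable, while a tuple can stand in for it. How a given YAML loader deals with such keys is not shown here.

```python
# A list as a dict key fails, because lists are not hashable.
try:
    bad = {[0, 1]: [1, 0]}
    list_key_error = None
except TypeError as exc:
    list_key_error = type(exc).__name__

# Booleans, None, strings and tuples all work as keys.
good = {True: False, None: None, "up": "down", (0, 1): [1, 0]}

print(list_key_error)
print(good[(0, 1)])
```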
Unlike sequences, you may not nest another block inside a block mapping on the same line.
This is invalid:
one: two: buckle my shoe
But this is fine:
- one: 1
  two: 2
- three: 3
  four: 4
You can also nest a block sequence without indenting it:
foods:
- burger
- fries
drinks:
- soda
- iced tea
One slight syntactic wrinkle: in either style, the colon must be followed by whitespace. foo:bar is a single string, remember. (For JSON’s sake, the whitespace can be omitted if the colon immediately follows a flow sequence, a flow mapping, or a quoted string.)
Advanced Syntax
More on Strings
YAML has lots of ways to write explicit strings. Aside from plain scalars, there are two other flow scalar styles.
Single-quoted strings are surrounded by single-quotes '
. Single-quotes may be escaped by double-quotes ''
, but otherwise no escaping is done at all. You may wrapover multiple lines, but the newline and any surrounding whitespace becomes a single space. A line containing only whitespace becomes a newline.
Double-quoted strings are surrounded by "
. Backslash escapes are recognized:
Sequence | Result |
---|---|
| U+0000 NULL |
| U+0007 BELL |
| U+0008 BACKSPACE |
| U+0009 CHARACTER TABULATION |
| U+000A LINE FEED |
| U+000B LINE TABULATION |
| U+000C FORM FEED |
| U+000D CARRIAGE RETURN |
| U+001B ESCAPE |
| U+0022 QUOTATION MARK |
| U+002F SOLIDUS |
| U+005C REVERSE SOLIDUS |
| U+0085 NEXT LINE |
| U+00A0 NO-BREAK SPACE |
| U+2028 LINE SEPARATOR |
| U+2029 PARAGRAPH SEPARATOR |
| Unicode character |
| Unicode character |
| Unicode character |
As usual, you may wrap a double-quoted
string across multiple lines, but the newline and any surrounding whitespace becomes a single space. As with single-quoted
strings, a line containing only whitespace becomes a newline. You can escape spaces and tabs to protect them from being thrown away. You can also escape a newline to preserve any trailing whitespace on that line, but throw away the newline and any leading whitespace on the next line.
These rules are weird, so here’s a contrived example:
"line \
one
line two\n\
\ \ line three\nline four\n
line five
"
Which becomes:
line one
line two
line three
line four
Right, well. Hopefully that clears that up.
Block scalar styles
There are also two block scalar styles, both consisting of a header followed by an indented block. The header is usually just a single character, indicating which block style to use.
A pipe (|) indicates the literal style, which preserves all newlines in the indented block.
A greater-than sign (>) indicates the folded style, which performs the same line folding as with quoted strings. Escaped characters are not recognized in either style. The indentation and the line break right after the header are never part of the string; leading blank lines, however, are kept as newlines.
So to represent this string:
This is paragraph one.

This is paragraph two.
You could use either literal style:
|
    This is paragraph one.

    This is paragraph two.
Or folded style:
>
    This is
    paragraph one.


    This
    is paragraph
    two.
Obviously folded style is more useful if you have paragraphs with longer lines. Note that there are two blank lines between paragraphs in folded style; a single blank line would be parsed as a single newline.
The header has some other features, but I’ve never seen them used. It consists of up to three parts, with no intervening whitespace.
- The character indicating which block style to use.
- Optionally, the indentation level of the indented block, relative to its parent. You only need this if the first line of the block starts with a space, because the space would be interpreted as indentation.
- Optionally, a "chomping" indicator. The default behavior is to include the final newline as part of the string, but ignore any subsequent empty lines. You can use - here to ignore the final newline as well, or use + to preserve all trailing blank lines verbatim.
You can put a comment on the same line as the header, but a comment on the next line would be interpreted as part of the indented block. You can also put a tag or an anchor before the header, as with any other node.
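To make the header parts concrete, here is a small Python sketch of how a block scalar could be decoded. Treat it as a simplification: read_block_scalar is a made-up name, the parent is assumed to sit at indentation zero, the header parts must appear in the order listed above, whitespace-only lines are treated as empty, and the special treatment that folded style gives to more-indented lines is left out.

```python
import re


def read_block_scalar(header, block):
    """Decode a block scalar from its header and its raw indented block."""
    match = re.fullmatch(r"([|>])([1-9]?)([-+]?)", header)
    if match is None:
        raise ValueError(f"unsupported block scalar header: {header!r}")
    style, indent, chomping = match.groups()

    lines = block.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # split() leaves an empty tail after the last newline
    if not lines:
        return ""

    if indent:
        width = int(indent)
    else:
        # Auto-detect from the first line that has any content.
        first = next((ln for ln in lines if ln.strip()), "")
        width = len(first) - len(first.lstrip(" "))

    # Strip the indentation; whitespace-only lines count as empty here.
    content = [ln[width:] if ln.strip() else "" for ln in lines]

    if style == "|":
        # Literal: every line break is kept.
        text = "".join(ln + "\n" for ln in content)
    else:
        # Folded: a break between two text lines becomes a space,
        # and each empty line becomes one newline.
        pieces = []
        previous_had_text = False
        for ln in content:
            if ln == "":
                pieces.append("\n")
            else:
                if previous_had_text:
                    pieces.append(" ")
                pieces.append(ln)
            previous_had_text = ln != ""
        text = "".join(pieces) + "\n"

    stripped = text.rstrip("\n")
    if chomping == "-":   # strip: no final newline at all
        return stripped
    if chomping == "+":   # keep: final newline and trailing blank lines
        return text
    return stripped + "\n" if stripped else ""  # clip (the default)


literal = read_block_scalar(
    "|", "    This is paragraph one.\n\n    This is paragraph two.\n")
folded = read_block_scalar(
    ">", "    This is\n    paragraph one.\n\n\n    This\n    is paragraph\n    two.\n")
print(literal == folded)  # True
```

Both calls produce the same two-paragraph string, which is the point of the literal and folded examples above.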
Block styles with chomping
You can control the handling of the final newline in the string, and of any trailing blank lines, by adding a block chomping indicator character to the header:
- >, | (clip, the default): keep the final line feed, remove the trailing blank lines.
- >-, |- (strip): remove the final line feed and the trailing blank lines, so both are excluded from the scalar’s content.
- >+, |+ (keep): keep the final line feed and the trailing blank lines.
The following examples show how the different scalar styles and the - chomping indicator treat line breaks, the final line break, and trailing empty lines.
- >
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- |
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- very "long"
  'string' with

  paragraph gap, \n and
  spaces.
- "very \"long\"
  'string' with

  paragraph gap, \n and
  spaces."
- 'very "long"
  ''string'' with

  paragraph gap, \n and
  spaces.'
- >-
  very "long"
  'string' with

  paragraph gap, \n and
  spaces.
Renders to JSON as:
[
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.\n",
  "very \"long\"\n'string' with\n\nparagraph gap, \\n and \nspaces.\n",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces.",
  "very \"long\" 'string' with\nparagraph gap, \\n and spaces."
]
More on Sequences
Merge Keys
Merge keys are written << and are of type !!merge. A merge key should have another mapping (or sequence of mappings) as its value. Each mapping is merged into the containing mapping, with any existing keys left alone. The actual << key is never shown to the application.
This is generally used in conjunction with anchors to share default values:
defaults: &DEFAULTS
    use-tls: true
    verify-host: true
host1:
    <<: *DEFAULTS
    hostname: example.com
host2:
    <<: *DEFAULTS
    hostname: example2.com
host3:
    <<: *DEFAULTS
    hostname: example3.com
    # we have a really, really good reason for doing this, really
    verify-host: false
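For the curious, the merge itself is simple enough to sketch in Python on already-parsed data. This assumes the document was loaded into plain dicts and lists with the << keys still present, which most real loaders would not hand you; apply_merge_keys is a hypothetical helper, not part of any YAML library.

```python
def apply_merge_keys(node):
    """Return a copy of node with every "<<" key merged into its mapping."""
    if isinstance(node, list):
        return [apply_merge_keys(item) for item in node]
    if not isinstance(node, dict):
        return node

    # Keys written directly in the mapping always win.
    result = {key: apply_merge_keys(value)
              for key, value in node.items() if key != "<<"}

    if "<<" in node:
        sources = node["<<"]
        if isinstance(sources, dict):
            sources = [sources]  # a single mapping is the common case
        for source in sources:
            for key, value in apply_merge_keys(source).items():
                # Existing keys are left alone; in a sequence of mappings,
                # earlier mappings therefore beat later ones.
                result.setdefault(key, value)
    return result


defaults = {"use-tls": True, "verify-host": True}
document = {
    "defaults": defaults,
    "host1": {"<<": defaults, "hostname": "example.com"},
    "host3": {"<<": defaults, "hostname": "example3.com",
              "verify-host": False},
}
merged = apply_merge_keys(document)
print(merged["host3"])
```

Here host3 keeps its own verify-host: false and only picks up use-tls from the defaults, while host1 gets both default values.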
More on Tags
!!str is actually an illusion.
Tag names are actually URIs, using UTF-8 percent-encoding. YAML suggests using the tag: scheme and your domain name to help keep tags globally unique; for example, the string tag is really tag:yaml.org,2002:str. (Domain names can change hands over time, hence the inclusion of a year.)
That’s quite a mouthful, and wouldn’t be recognized as a tag anyway, because tags have to start with !. So tags are written in shorthand with a prefix, like !foo!bar. The !foo! is a named tag handle that expands to a given prefix, kind of like XML namespacing. Named tag handles must be defined by a %TAG directive before the document:
%TAG !foo! tag:example.com,2015:app/
A tag of !foo!bar would then resolve to tag:example.com,2015:app/bar.
I’ve never seen %TAG used in practice. Instead, everyone uses the two special tag handles.
- The primary tag handle is !, which by default expands to !. So !bar just resolves to !bar, a local tag, specific to the document and not expected to be unique.
- The secondary tag handle is !!, which by default expands to tag:yaml.org,2002:, the prefix YAML uses for its own built-in types. So !!bar resolves to tag:yaml.org,2002:bar, and the tag for a string would more commonly be written as !!str. Defining new tags that use !! is impolite.
Both special handles can be reassigned with %TAG, just like any other handle. An important (and confusing) point here is that the resolved name determines whether or not a tag is local; how it’s written is irrelevant. You’re free to do this:
%TAG !foo! !foo-types/
Now !foo!bar is shorthand for !foo-types/bar, which is a local tag. You can also do the reverse:
%TAG ! tag:example.com,2015:legacy-types/
Which would make !bar a global tag! This is deliberate, as a quick way to convert an entire document from local tags to global tags.
You can reassign !!, too. But let’s not.
Tags can also be written verbatim as !<foo>, in which case foo is taken to be the resolved final name of the tag, ignoring %TAG and any other resolution mechanism. This is the only way to write a global tag without using %TAG, since tags must start with a !.
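Since the handle rules are easier to follow as code, here is a Python sketch of shorthand expansion. The names resolve_tag and is_local are invented for this example, the directives argument stands in for whatever %TAG lines precede the document, and percent-decoding of the suffix is skipped.

```python
import re

# The two special handles and their default prefixes.
DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}


def resolve_tag(tag, directives=None):
    """Expand a tag as written in a document into its resolved name."""
    handles = {**DEFAULT_HANDLES, **(directives or {})}
    if tag == "!":
        return "!"  # the bare non-specific tag is not a shorthand
    if tag.startswith("!<") and tag.endswith(">"):
        return tag[2:-1]  # verbatim: used as is, no %TAG lookup
    named = re.fullmatch(r"(![0-9A-Za-z-]*!)(.+)", tag)
    if named:
        # Secondary (!!) or named (!foo!) handle plus a suffix.
        handle, suffix = named.groups()
        if handle not in handles:
            raise ValueError(f"undefined tag handle: {handle}")
        return handles[handle] + suffix
    # Anything else uses the primary handle.
    return handles["!"] + tag[1:]


def is_local(resolved):
    """Only the resolved name decides whether a tag is local."""
    return resolved.startswith("!")


print(resolve_tag("!!str"))
# tag:yaml.org,2002:str
print(resolve_tag("!foo!bar", {"!foo!": "tag:example.com,2015:app/"}))
# tag:example.com,2015:app/bar
print(resolve_tag("!bar", {"!": "tag:example.com,2015:legacy-types/"}))
# tag:example.com,2015:legacy-types/bar
```

Passing {"!foo!": "!foo-types/"} instead gives !foo-types/bar, which is_local reports as local even though it was written with a named handle.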
Every node has a tag, whether it’s given one explicitly or not. Nodes without explicit tags are given one of two special non-specific tags: ! for quoted and folded scalars; or ? for sequences, mappings, and plain scalars.
The ? tag tells the application to do tag resolution. Technically, this means the application can do any kind of arbitrary inspection to figure out the type of the node. In practice, it just means that scalars are inspected to see whether they’re booleans, integers, floats, whatever else, or just strings.
The ! tag forces a node to be interpreted as a basic built-in type, based on its kind: !!str, !!seq, or !!map. You can explicitly give the ! tag to a node if you want, for example writing ! true or ! 133 to force parsing as strings. Or you could use quotes. Just saying.
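As a rough picture of what that inspection looks like for scalars, here is a Python sketch loosely modeled on the YAML 1.2 core schema. It is an assumption-laden subset: resolve_scalar is a made-up name, infinities and NaN are ignored, and real parsers (especially YAML 1.1 ones, with their yes/no booleans) recognize different sets of values.

```python
import re


def resolve_scalar(text, tag="?"):
    """Pick a Python value for a scalar carrying a non-specific tag."""
    if tag == "!":
        return text  # "!" on a scalar forces a plain string
    if text in ("", "~", "null", "Null", "NULL"):
        return None
    if text in ("true", "True", "TRUE"):
        return True
    if text in ("false", "False", "FALSE"):
        return False
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text)
    if re.fullmatch(r"0o[0-7]+", text):
        return int(text[2:], 8)
    if re.fullmatch(r"0x[0-9a-fA-F]+", text):
        return int(text[2:], 16)
    if re.fullmatch(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?",
                    text):
        return float(text)
    return text  # nothing matched, so it is just a string


print(resolve_scalar("133"))           # 133
print(resolve_scalar("133", tag="!"))  # 133, but as a string
print(resolve_scalar("true"))          # True
```

Quoted scalars get the ! tag automatically, which is why "133" in quotes and ! 133 both end up as strings.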