Tutorial
This tutorial introduces the basics of the Document Object Model (DOM) API.
As shown in Usage at a glance, JSON can be parsed into a DOM, which can then be queried and modified easily, and finally converted back to JSON.
[TOC]
Value & Document
Each JSON value is stored in a type called `Value`. A `Document`, representing the DOM, contains the root `Value` of the DOM tree. All public types and functions of RapidJSON are defined in the `rapidjson` namespace.
Query Value
In this section, we will use excerpts from `example/tutorial/tutorial.cpp`.
Assume we have JSON stored in a C string (`const char* json`):
~~~~~~~~~~js
{
    "hello": "world",
    "t": true,
    "f": false,
    "n": null,
    "i": 123,
    "pi": 3.1416,
    "a": [1, 2, 3, 4]
}
~~~~~~~~~~
Parse it into a `Document`:
~~~~~~~~~~cpp
#include "rapidjson/document.h"

using namespace rapidjson;

// ...
Document document;
document.Parse(json);
~~~~~~~~~~
The JSON is now parsed into `document` as a DOM tree.
Since the update to RFC 7159, the root of a conforming JSON document can be any JSON value. In the earlier RFC 4627, only objects or arrays were allowed as root values. In this case, the root is an object.
~~~~~~~~~~cpp
assert(document.IsObject());
~~~~~~~~~~
Let's query whether a `"hello"` member exists in the root object. Since a `Value` can contain different types of value, we may need to verify its type and use a suitable API to obtain the value. In this example, the `"hello"` member is associated with a JSON string.
~~~~~~~~~~cpp
assert(document.HasMember("hello"));
assert(document["hello"].IsString());
printf("hello = %s\n", document["hello"].GetString());
~~~~~~~~~~
hello = world
JSON true/false values are represented as `bool`.
~~~~~~~~~~cpp
assert(document["t"].IsBool());
printf("t = %s\n", document["t"].GetBool() ? "true" : "false");
~~~~~~~~~~
t = true
JSON null can be queried with `IsNull()`.
~~~~~~~~~~cpp
printf("n = %s\n", document["n"].IsNull() ? "null" : "?");
~~~~~~~~~~
n = null
JSON number type represents all numeric values. However, C++ needs more specific types for manipulation.
assert(document["i"].IsNumber());
// In this case, IsUint()/IsInt64()/IsUint64() also return true.
assert(document["i"].IsInt());
printf("i = %d\n", document["i"].GetInt());
// Alternative (int)document["i"]
assert(document["pi"].IsNumber());
assert(document["pi"].IsDouble());
printf("pi = %g\n", document["pi"].GetDouble());
i = 123
pi = 3.1416
A JSON array contains a number of elements.
~~~~~~~~~~cpp
// Using a reference for consecutive access is handy and faster.
const Value& a = document["a"];
assert(a.IsArray());
for (SizeType i = 0; i < a.Size(); i++) // Uses SizeType instead of size_t
    printf("a[%d] = %d\n", i, a[i].GetInt());
~~~~~~~~~~
a[0] = 1
a[1] = 2
a[2] = 3
a[3] = 4
Note that RapidJSON does not automatically convert values between JSON types. If a value is a string, it is invalid to call `GetInt()`, for example. In debug mode it will fail an assertion. In release mode, the behavior is undefined.
The following sections discuss the details of querying individual types.
Query Array
By default, `SizeType` is a typedef of `unsigned`. On most systems, an array is limited to storing up to 2^32-1 elements.
You may access the elements of an array by integer literal, for example, `a[0]`, `a[1]`, `a[2]`.
An array is similar to `std::vector`: instead of using indices, you may also use iterators to access all the elements.
~~~~~~~~~~cpp
for (Value::ConstValueIterator itr = a.Begin(); itr != a.End(); ++itr)
    printf("%d ", itr->GetInt());
~~~~~~~~~~
And other familiar query functions:
- `SizeType Capacity() const`
- `bool Empty() const`
Query Object
Similar to arrays, we can access all object members by iterator:
static const char* kTypeNames[] =
    { "Null", "False", "True", "Object", "Array", "String", "Number" };
for (Value::ConstMemberIterator itr = document.MemberBegin();
    itr != document.MemberEnd(); ++itr)
{
    printf("Type of member %s is %s\n",
        itr->name.GetString(), kTypeNames[itr->value.GetType()]);
}
Type of member hello is String
Type of member t is True
Type of member f is False
Type of member n is Null
Type of member i is Number
Type of member pi is Number
Type of member a is Array
Note that when `operator[](const char*)` cannot find the member, it will fail an assertion.
If we are unsure whether a member exists, we need to call `HasMember()` before calling `operator[](const char*)`. However, this incurs two lookups. A better way is to call `FindMember()`, which can check the existence of a member and obtain its value at once:
Value::ConstMemberIterator itr = document.FindMember("hello");
if (itr != document.MemberEnd())
    printf("%s\n", itr->value.GetString());
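The single-lookup idea can be illustrated without RapidJSON. The following standalone sketch models an object as a list of name/value pairs; `Member`, `MemberList` and `FindByName` are hypothetical names chosen for illustration, not the library's types. One linear search returns an iterator that answers both "does the member exist?" and "what is its value?".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for an object's members (not RapidJSON types).
typedef std::pair<std::string, int> Member;
typedef std::vector<Member> MemberList;

// Performs one linear search. Returns members.end() when the name is absent,
// otherwise an iterator to the matching member.
MemberList::const_iterator FindByName(const MemberList& members,
                                      const std::string& name) {
    MemberList::const_iterator it = members.begin();
    for (; it != members.end(); ++it) {
        if (it->first == name)
            break;
    }
    return it;
}
```

A caller compares the result with `end()` once and then reads `it->second`, instead of searching a second time to fetch the value.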
Querying Number
JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.
As C++ provides several integer and floating point number types, the DOM tries to handle these with the widest possible range and good performance.
When a Number is parsed, it is stored in the DOM as one of the following types:
Type | Description |
---|---|
`unsigned` | 32-bit unsigned integer |
`int` | 32-bit signed integer |
`uint64_t` | 64-bit unsigned integer |
`int64_t` | 64-bit signed integer |
`double` | 64-bit double precision floating point |
When querying a number, you can check whether the number can be obtained as the target type:
Checking | Obtaining |
---|---|
`bool IsNumber()` | N/A |
`bool IsUint()` | `unsigned GetUint()` |
`bool IsInt()` | `int GetInt()` |
`bool IsUint64()` | `uint64_t GetUint64()` |
`bool IsInt64()` | `int64_t GetInt64()` |
`bool IsDouble()` | `double GetDouble()` |
Note that an integer value may be obtained in various ways without conversion. For example, a value `x` containing 123 will make `x.IsInt() == x.IsUint() == x.IsInt64() == x.IsUint64() == true`. But a value `y` containing -3000000000 will only make `y.IsInt64() == true`.
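The range rules behind these checks can be sketched in plain C++. The helper below is hypothetical and is not part of RapidJSON; it only mirrors the idea that an integer "is" every fixed-width type whose range contains it, assuming 32-bit `int`/`unsigned` as in the table above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type: which integer types can hold a value unchanged.
struct IntegerFit {
    bool fitsInt;     // 32-bit signed
    bool fitsUint;    // 32-bit unsigned
    bool fitsInt64;   // 64-bit signed
    bool fitsUint64;  // 64-bit unsigned
};

// Classifies a signed 64-bit value by pure range comparison.
IntegerFit ClassifyInteger(std::int64_t v) {
    IntegerFit fit;
    fit.fitsInt    = v >= std::numeric_limits<std::int32_t>::min() &&
                     v <= std::numeric_limits<std::int32_t>::max();
    fit.fitsUint   = v >= 0 && v <= std::numeric_limits<std::uint32_t>::max();
    fit.fitsInt64  = true;    // every int64_t value fits in int64_t
    fit.fitsUint64 = v >= 0;  // non-negative values also fit in uint64_t
    return fit;
}
```

With this model, 123 satisfies all four checks, while -3000000000 satisfies only the 64-bit signed one.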
When obtaining the numeric values, `GetDouble()` will convert the internal integer representation to a `double`. Note that `int` and `unsigned` can be safely converted to `double`, but `int64_t` and `uint64_t` may lose precision (since the mantissa of an IEEE 754 `double` has only 52 explicit bits).
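The precision loss is easy to observe with a round trip. This is a minimal sketch in plain C++, assuming `double` is an IEEE 754 binary64 type; `SurvivesDoubleRoundTrip` is a hypothetical helper name, not a RapidJSON function.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if v is unchanged after converting to double and back.
// The input is restricted to |v| <= 2^62 so that the conversion back to
// int64_t is always well defined.
bool SurvivesDoubleRoundTrip(std::int64_t v) {
    const std::int64_t kLimit = std::int64_t(1) << 62;
    assert(v >= -kLimit && v <= kLimit);
    const double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d) == v;
}
```

Every 32-bit integer survives the round trip, and so does 2^53 (9007199254740992), but 2^53 + 1 does not, because it is not representable as a `double`.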
Query String
In addition to `GetString()`, the `Value` class also contains `GetStringLength()`. Here is why.
According to RFC 4627, JSON strings can contain the Unicode character `U+0000`, which must be escaped as `"\u0000"`. The problem is that C/C++ often uses null-terminated strings, which treat `'\0'` as the terminator symbol.
To conform to RFC 4627, RapidJSON supports strings containing `U+0000`. If you need to handle this, you can use the `GetStringLength()` API to obtain the correct length of the string.
For example, after parsing the following JSON to `Document d`:
{ "s" : "a\u0000b" }
The correct length of the value `"a\u0000b"` is 3. But `strlen()` returns 1.
`GetStringLength()` can also improve performance, as the user may often need to call `strlen()` for allocating a buffer.
Besides, `std::string` also supports a constructor:
string(const char* s, size_t count);
which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.
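The difference between a `strlen()`-based length and an explicit length can be checked with the standard library alone. In this sketch, `MakeString` and `LengthUpToFirstNull` are hypothetical helper names; in RapidJSON code the pointer and length would come from `GetString()` and `GetStringLength()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string from a buffer and an explicit length, so embedded
// '\0' characters are preserved.
std::string MakeString(const char* s, std::size_t length) {
    return std::string(s, length);
}

// Length as seen by strlen(), which stops at the first '\0'.
std::size_t LengthUpToFirstNull(const char* s) {
    return std::strlen(s);
}
```

For the three bytes `a`, `\0`, `b`, `LengthUpToFirstNull` reports 1, while `MakeString(raw, 3)` keeps all three characters.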
Comparing values
You can use `==` and `!=` to compare values. Two values are equal if and only if they have the same type and contents. You can also compare values with primitive types. Here is an example.
if (document["hello"] == document["n"]) /*...*/; // Compare values
if (document["hello"] == "world") /*...*/; // Compare value with literal string
if (document["i"] != 123) /*...*/; // Compare with integers
if (document["pi"] != 3.14) /*...*/; // Compare with double.
Arrays and objects compare their elements/members in order. They are equal if and only if their whole subtrees are equal.
Note that currently, if an object contains members with duplicated names, comparing it for equality with any object always yields `false`.
Create/Modify Values
There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using `Writer`.
Change Value Type
When creating a `Value` or `Document` by default constructor, its type is Null. To change its type, call `SetXXX()` or the assignment operator, for example:
Document d; // Null
d.SetObject();

Value v;    // Null
v.SetInt(10);
v = 10;     // Shortcut, same as above
Overloaded Constructors
There are also overloaded constructors for several types:
Value b(true);    // calls Value(bool)
Value i(-123);    // calls Value(int)
Value u(123u);    // calls Value(unsigned)
Value d(1.5);     // calls Value(double)
To create an empty object or array, you may use `SetObject()`/`SetArray()` after the default constructor, or use `Value(Type)` in one shot:
Value o(kObjectType);
Value a(kArrayType);
Move Semantics
A very special decision during the design of RapidJSON is that assignment of a value does not copy the source value to the destination value. Instead, the value from the source is moved to the destination. For example,
Value a(123);
Value b(456);
b = a;         // a becomes a Null value, b becomes number 123.
Why? What is the advantage of this semantics?
The simple answer is performance. For fixed-size JSON types (Number, True, False, Null), copying them is fast and easy. However, for variable-size JSON types (String, Array, Object), copying them incurs a lot of overhead, and this overhead often goes unnoticed, especially when we need to create a temporary object, copy it to another variable, and then destruct it.
For example, if normal copy semantics were used:
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    // ...
    o.AddMember("contacts", contacts, d.GetAllocator());  // deep clone contacts (may be with lots of allocations)
    // destruct contacts.
}
The object `o` needs to allocate a buffer of the same size as `contacts`, make a deep clone of it, and then finally `contacts` is destructed. This will incur a lot of unnecessary allocations/deallocations and memory copying.
There are solutions to prevent actually copying these data, such as reference counting and garbage collection (GC).
To make RapidJSON simple and fast, we chose to use move semantics for assignment. It is similar to `std::auto_ptr`, which transfers ownership during assignment. Move is much faster and simpler: it just destructs the original value, `memcpy()`s the source to the destination, and finally sets the source to Null type.
So, with move semantics, the above example becomes:
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    o.AddMember("contacts", contacts, d.GetAllocator());  // just memcpy() of contacts itself to the value of new member (16 bytes)
    // contacts became Null here. Its destruction is trivial.
}
This is called a move assignment operator in C++11. As RapidJSON supports C++03, it adopts move semantics using the assignment operator and all other modifying functions like `AddMember()` and `PushBack()`.
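The move-on-assignment behavior can be modeled in a few lines of C++03-compatible code. `TinyValue` below is a hypothetical toy class, not RapidJSON's `Value`; it only shows the shape of an assignment operator that takes over the source's state and leaves the source Null, in the spirit of `std::auto_ptr`.

```cpp
#include <cassert>

// Hypothetical model of move-on-assignment; not RapidJSON's Value.
class TinyValue {
public:
    TinyValue() : isNull_(true), number_(0) {}
    explicit TinyValue(int n) : isNull_(false), number_(n) {}

    // Takes a non-const reference because assignment modifies the source.
    TinyValue& operator=(TinyValue& rhs) {
        if (this != &rhs) {
            isNull_ = rhs.isNull_;  // take over the source's state
            number_ = rhs.number_;
            rhs.isNull_ = true;     // the source becomes Null
            rhs.number_ = 0;
        }
        return *this;
    }

    bool IsNull() const { return isNull_; }
    int GetInt() const { assert(!isNull_); return number_; }

private:
    TinyValue(const TinyValue&);  // copying disabled: declared, never defined
    bool isNull_;
    int number_;
};
```

After `b = a;` with `a` holding 123, `b.GetInt()` yields 123 and `a.IsNull()` is true. The real library moves whole subtrees the same way, which is why no deep copy is needed.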
Move semantics and temporary values
Sometimes, it is convenient to construct a Value in place, before passing it to one of the "moving" functions, like `PushBack()` or `AddMember()`. As temporary objects can't be converted to proper Value references, the convenience function `Move()` is available:
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();
// a.PushBack(Value(42), allocator);       // will not compile
a.PushBack(Value().SetInt(42), allocator); // fluent API
a.PushBack(Value(42).Move(), allocator);   // same as above
Create String
RapidJSON provides two strategies for storing strings.
- copy-string: allocates a buffer, and then copies the source data into it.
- const-string: simply stores a pointer to the string.
Copy-string is always safe because it owns a copy of the data. Const-string can be used for storing string literals, and for in-situ parsing, which is covered in the Document section.
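The trade-off between the two strategies can be sketched with two toy types. `CopiedString` and `ReferencedString` are hypothetical models written for this illustration, not RapidJSON classes: the first owns its bytes, the second only remembers where the caller's bytes live.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical model of copy-string: owns a private copy of the data.
struct CopiedString {
    std::string data;
    explicit CopiedString(const char* s) : data(s) {}
    const char* Get() const { return data.c_str(); }
};

// Hypothetical model of const-string: stores only the caller's pointer,
// so it is valid only while the caller's buffer stays alive and unchanged.
struct ReferencedString {
    const char* ptr;
    explicit ReferencedString(const char* s) : ptr(s) {}
    const char* Get() const { return ptr; }
};
```

If the source buffer is later overwritten, a `CopiedString` still returns the original text, while a `ReferencedString` reflects whatever the buffer now contains. That is why the const-string strategy suits literals and other data with a guaranteed lifetime.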
To make memory allocation customizable, RapidJSON requires the user to pass an instance of an allocator whenever an operation may require allocation. This design avoids storing an allocator (or Document) pointer per Value.
Therefore, when we assign a copy-string, we call this overloaded `SetString()` with an allocator:
Document document;
Value author;
char buffer[10];
int len = sprintf(buffer, "%s %s", "Milo", "Yip"); // dynamically created string.
author.SetString(buffer, len, document.GetAllocator());
memset(buffer, 0, sizeof(buffer));
// author.GetString() still contains "Milo Yip" after buffer is overwritten
In this example, we get the allocator from a `Document` instance. This is a common idiom when using RapidJSON. But you may use other instances of allocator.
Besides, the above `SetString()` requires the length. This can handle null characters within a string. There is another `SetString()` overload without the length parameter. It assumes the input is null-terminated and calls a `strlen()`-like function to obtain the length.
Finally, a string literal or a string with a safe life-cycle can use the const-string version of `SetString()`, which lacks the allocator parameter. For string literals (or constant character arrays), simply passing the literal as a parameter is safe and efficient:
Value s;
s.SetString("rapidjson");    // can contain null character, length derived at compile time
s = "rapidjson";             // shortcut, same as above
For a character pointer, RapidJSON requires marking it as safe before using it without copying. This can be achieved by using the `StringRef` function:
const char * cstr = getenv("USER");
size_t cstr_len = ...;                 // in case length is available
Value s;
// s.SetString(cstr);                  // will not compile
s.SetString(StringRef(cstr));          // ok, assume safe lifetime, null-terminated
s = StringRef(cstr);                   // shortcut, same as above
s.SetString(StringRef(cstr,cstr_len)); // faster, can contain null character
s = StringRef(cstr,cstr_len);          // shortcut, same as above
Modify Array
A Value with array type provides APIs similar to `std::vector`.
- `Clear()`
- `Reserve(SizeType, Allocator&)`
- `Value& PushBack(Value&, Allocator&)`
- `template <typename T> GenericValue& PushBack(T, Allocator&)`
- `Value& PopBack()`
- `ValueIterator Erase(ConstValueIterator pos)`
- `ValueIterator Erase(ConstValueIterator first, ConstValueIterator last)`
Note that `Reserve(...)` and `PushBack(...)` may allocate memory for the array elements, and therefore require an allocator.
Here is an example of `PushBack()`:
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();

for (int i = 5; i <= 10; i++)
    a.PushBack(i, allocator);   // allocator is needed for potential realloc().

// Fluent interface
a.PushBack("Lua", allocator).PushBack("Mio", allocator);
Unlike the STL, `PushBack()`/`PopBack()` return the array reference itself. This is called a fluent interface.
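A fluent interface needs nothing more than mutators that return `*this`. The sketch below uses a hypothetical `IntArray` class built on `std::vector`; it is not RapidJSON's array type and takes no allocator, but it shows why calls can be chained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: each mutator returns *this so calls can be chained.
class IntArray {
public:
    IntArray& PushBack(int v) {
        items_.push_back(v);
        return *this;
    }
    IntArray& PopBack() {
        assert(!items_.empty());
        items_.pop_back();
        return *this;
    }
    std::size_t Size() const { return items_.size(); }
    int operator[](std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }

private:
    std::vector<int> items_;
};
```

With this class, `a.PushBack(1).PushBack(2).PushBack(3).PopBack();` leaves `a` holding 1 and 2.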
If you want to add a non-constant string or a string without sufficient lifetime (see Create String) to the array, you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a temporary value in place:
// in-place Value parameter
contact.PushBack(Value("copy", document.GetAllocator()).Move(), // copy string
                 document.GetAllocator());

// explicit parameters
Value val("key", document.GetAllocator()); // copy string
contact.PushBack(val, document.GetAllocator());
Modify Object
An Object is a collection of key-value pairs (members). Each key must be a string value. To modify an object, either add or remove members. The following APIs are for adding members:
- `Value& AddMember(Value&, Value&, Allocator& allocator)`
- `Value& AddMember(StringRefType, Value&, Allocator&)`
- `template <typename T> Value& AddMember(StringRefType, T value, Allocator&)`
Here is an example.
Value contact(kObjectType);
contact.AddMember("name", "Milo", document.GetAllocator());
contact.AddMember("married", true, document.GetAllocator());
The name parameter with `StringRefType` is similar to the interface of the `SetString` function for string values. These overloads are used to avoid the need for copying the `name` string, as constant key names are very common in JSON objects.
If you need to create a name from a non-constant string or a string without sufficient lifetime (see Create String), you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a temporary value in place:
// in-place Value parameter
contact.AddMember(Value("copy", document.GetAllocator()).Move(), // copy string
                  Value().Move(),                                // null value
                  document.GetAllocator());

// explicit parameters
Value key("key", document.GetAllocator()); // copy string name
Value val(42);                             // some value
contact.AddMember(key, val, document.GetAllocator());
For removing members, there are several choices:
- `bool RemoveMember(const Ch* name)`: remove a member by searching for its name (linear time complexity).
- `bool RemoveMember(const Value& name)`: same as above but `name` is a Value.
- `MemberIterator RemoveMember(MemberIterator)`: remove a member by iterator (constant time complexity).
- `MemberIterator EraseMember(MemberIterator)`: similar to the above but it preserves the order of members (linear time complexity).
- `MemberIterator EraseMember(MemberIterator first, MemberIterator last)`: remove a range of members, preserving order (linear time complexity).
`MemberIterator RemoveMember(MemberIterator)` uses a "move-last" trick to achieve constant time complexity. Basically the member at the iterator is destructed, and then the last element is moved to that position. So the order of the remaining members is changed.
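The "move-last" trick and its order-preserving alternative can be compared on a plain `std::vector`. This is a standalone sketch with hypothetical function names; it models the idea on integers rather than on RapidJSON members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Constant-time removal: overwrite the slot with the last element, then
// drop the last element. The order of the remaining elements may change.
void RemoveByMoveLast(std::vector<int>& v, std::size_t index) {
    assert(index < v.size());
    v[index] = v.back();
    v.pop_back();
}

// Linear-time removal: later elements shift down, so order is preserved.
void EraseInOrder(std::vector<int>& v, std::size_t index) {
    assert(index < v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(index));
}
```

Removing index 0 from `{1, 2, 3, 4}` gives `{4, 2, 3}` with the first function and `{2, 3, 4}` with the second, which matches the difference between `RemoveMember(MemberIterator)` and `EraseMember(MemberIterator)` described above.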
Deep Copy Value
If we really need to copy a DOM tree, we can use two APIs for deep copy: the constructor with allocator, and `CopyFrom()`.
Document d;
Document::AllocatorType& a = d.GetAllocator();
Value v1("foo");
// Value v2(v1); // not allowed

Value v2(v1, a);                      // make a copy
assert(v1.IsString());                // v1 untouched
d.SetArray().PushBack(v1, a).PushBack(v2, a);
assert(v1.IsNull() && v2.IsNull());   // both moved to d

v2.CopyFrom(d, a);                    // copy whole document to v2
assert(d.IsArray() && d.Size() == 2); // d untouched
v1.SetObject().AddMember("array", v2, a);
d.PushBack(v1, a);
Swap Values
`Swap()` is also provided.
Value a(123);
Value b("Hello");
a.Swap(b);
assert(a.IsString());
assert(b.IsInt());
Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.
What’s next
This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:
- Streams are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
- Encoding defines which character encoding is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
- DOM's basics are already covered in this tutorial. Uncover more advanced features such as in situ parsing, other parsing options and advanced usages.
- SAX is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
- Performance shows some in-house and third-party benchmarks.
- Internals describes some internal designs and techniques of RapidJSON.
You may also refer to the FAQ, API documentation, examples and unit tests.