mirror of
https://github.com/boostorg/serialization.git
synced 2026-01-24 18:32:14 +00:00
466 lines
21 KiB
HTML
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
(C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com .
Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<title>Serialization - Special Considerations</title>
</head>
<body link="#0000ff" vlink="#800080">
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary="header">
<tr>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<td valign="top">
<h1 align="center">Serialization</h1>
<h2 align="center">Special Considerations</h2>
</td>
</tr>
</table>
<hr>
<dl class="page-index">
<dt><a href="#derivedpointers">Pointers to Objects of Derived Classes</a>
<dl class="page-index">
<dt><a href="#registration">Registration</a>
<dt><a href="#instantiation">Instantiation</a>
<dt><a href="#selectivetracking">Selective Tracking</a>
<dt><a href="#runtimecasting">Runtime Casting</a>
</dl>
<dt><a href="#objecttracking">Object Tracking</a>
<dt><a href="#classinfo">Class Information</a>
<dt><a href="#portability">Archive Portability</a>
<dl class="page-index">
<dt><a href="#numerics">Numerics</a>
<dt><a href="#traits">Traits</a>
</dl>
<dt><a href="exceptions.html">Archive Exceptions</a>
<dt><a href="exception_safety.html">Exception Safety</a>
</dl>
<h3><a name="derivedpointers">Pointers to Objects of Derived Classes</a></h3>
<h4><a name="registration">Registration</a></h4>
Consider the following:
<pre><code>
class base {
    ...
};
class derived_one : public base {
    ...
};
class derived_two : public base {
    ...
};
int main(){
    ...
    base *b;
    ar & b;
}
</code></pre>
When loading <code style="white-space: normal">b</code>, what kind of object should be created?
An object of class <code style="white-space: normal">derived_one</code>,
<code style="white-space: normal">derived_two</code>, or maybe <code style="white-space: normal">base</code>?
<p>
If this situation is not addressed by one of the methods described below,
an <a target="detail" href="exceptions.html#unregistered_class">
<code style="white-space: normal">unregistered_class</code></a> exception will be thrown when serialization is
invoked.
<p>Many times this situation is resolved automatically by the serialization
library.
<p>
The system "registers" each class in an archive the first time an object of that
class is serialized and assigns a sequential number to it. The next time an
object of that class is serialized in that same archive, this number is written
in the archive. So every class is identified uniquely within the archive.
When the archive is read back in, each new sequence number is re-associated with
the class being read. Note that this implies that "registration" has to occur
during both save and load so that the class-integer table built on load
is identical to the class-integer table built on save. In fact, the key to
the whole serialization system is that things are always saved and loaded in
the same sequence. This includes "registration".
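<p>
The sequential numbering described above can be modeled with a small table.
The following is only a toy sketch in standard C++, not the library's
implementation: the names <code style="white-space: normal">ClassTable</code>,
<code style="white-space: normal">id_for</code>, <code style="white-space: normal">name_for</code> and
<code style="white-space: normal">register_in_order</code> are hypothetical, and plain class-name
strings stand in for whatever type information the library actually uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model (an assumption for illustration): each class name receives a
// sequential id the first time it is seen and keeps that id afterwards.
class ClassTable {
public:
    // Returns the id for `name`, registering it on first use.
    std::size_t id_for(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::size_t id = names_.size();
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // Looks up the class name recorded for a previously assigned id.
    const std::string& name_for(std::size_t id) const { return names_.at(id); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<std::string> names_;
};

// Registers the given names in order and returns the ids that were assigned.
std::vector<std::size_t> register_in_order(ClassTable& table,
                                           const std::vector<std::string>& names) {
    std::vector<std::size_t> ids;
    for (const auto& n : names) ids.push_back(table.id_for(n));
    return ids;
}
```

In this model, a save-side table and a load-side table that see the class names
in the same order end up with identical contents, which is the property the
paragraph above relies on. Registering in a different order on load would
associate the stored numbers with the wrong classes.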
<p>
In many situations the problem never comes up. Consider:
<pre><code>
int main(){
    derived_one d1;
    derived_two d2;
    ...
    ar >> d1;
    ar >> d2;
    // A side effect of serialization of objects d1 and d2 is that
    // the classes derived_one and derived_two become known to the archive.
    // So subsequent serialization of those classes by base pointer works
    // without any special considerations.
    base *b;
    ar & b;
}
</code></pre>
Here, the problem doesn't present itself. When <code style="white-space: normal">b</code> is read it is
preceded by a unique (to the archive) class identifier which
has previously been related to class <code style="white-space: normal">derived_one</code> or
<code style="white-space: normal">derived_two</code>.
<p>
If a derived class hasn't been automatically "registered" as described
above, we have the option of registering it explicitly. All archives are
derived from a base class which implements the following template:
<pre><code>
template<class T>
register_type();
</code></pre>
So our problem could just as well be addressed by writing:
<pre><code>
int main(){
    ...
    ar.template register_type<derived_one>();
    ar.template register_type<derived_two>();
    base *b;
    ar & b;
}
</code></pre>
Note that if the serialization function is split between save and load, both
functions must include the registration. This is required to keep the save
and corresponding load in synchronization.
<p>
This will work but may be inconvenient. We don't always know which derived
classes we are going to serialize when we write the code to serialize through
a base class pointer. Every time a new derived class is written we have to
go back to all the places where the base class is serialized and update the
code.
<p>
So we have another method:
<pre><code>
#include <boost/serialization/export.hpp>
...
BOOST_CLASS_EXPORT_GUID(derived_one, "derived_one")
BOOST_CLASS_EXPORT_GUID(derived_two, "derived_two")

int main(){
    ...
    base *b;
    ar & b;
}
</code></pre>
The macro <code style="white-space: normal">BOOST_CLASS_EXPORT_GUID</code> associates a string literal
with a class. In the above example we've used a string rendering
of the class name. If an object of such an "exported" class is serialized
through a pointer and is otherwise unregistered, the "export" string is
included in the archive. When the archive
is later read, the string literal is used to find the class which
should be created by the serialization library.
This permits each class to be in a separate header file along with its
string identifier. There is no need to maintain a separate "pre-registration"
of derived classes that might be serialized. This method of
registration is referred to as "key export". More information on this
topic is found in the section Class Traits -
<a target="detail" href="traits.html#export">Export Key</a>.
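<p>
The idea of looking up a class by its export string can be illustrated with a
string-keyed factory table. This is a toy sketch in standard C++ under stated
assumptions, not how the library implements key export: the names
<code style="white-space: normal">export_table</code>, <code style="white-space: normal">export_key</code>
and <code style="white-space: normal">create_from_key</code> are hypothetical, and the
<code style="white-space: normal">name()</code> member exists only so the result can be inspected.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string name() const { return "base"; }
};
struct derived_one : base {
    std::string name() const override { return "derived_one"; }
};
struct derived_two : base {
    std::string name() const override { return "derived_two"; }
};

// Hypothetical registry: maps an export key to a factory for that class.
using Factory = std::function<std::unique_ptr<base>()>;

std::map<std::string, Factory>& export_table() {
    static std::map<std::string, Factory> table;
    return table;
}

// Associates `key` with class T, analogous in spirit to exporting a class.
template <class T>
void export_key(const std::string& key) {
    export_table()[key] = [] { return std::unique_ptr<base>(new T()); };
}

// On "load", the key read from the archive selects the class to create.
// An unknown key is reported with an exception in this sketch.
std::unique_ptr<base> create_from_key(const std::string& key) {
    auto it = export_table().find(key);
    if (it == export_table().end())
        throw std::runtime_error("unregistered class: " + key);
    return it->second();
}
```

Because each key is registered next to its class, the code that loads through a
<code style="white-space: normal">base</code> pointer does not need to list the derived classes itself.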
<p>
<h4><a name="instantiation">Instantiation</a></h4>
Registration by means of any of the above methods fulfills another role
whose importance might not be obvious. This system relies on templated
functions of the form <code style="white-space: normal">template<class Archive, class T></code>.
This means that serialization code must be instantiated for each
combination of archive and data type that is serialized in the program.
<p>
Derived classes serialized through polymorphic pointers may never be referred to
explicitly by the program, so normally code to serialize such classes
would never be instantiated. So in addition to including export key
strings in an archive, <code style="white-space: normal">BOOST_CLASS_EXPORT_GUID</code> explicitly
instantiates the class serialization code for all archive classes used
by the program.
<p>
In order to do this,
<a href="../../../boost/serialization/export.hpp" target="export_hpp">export.hpp</a>
includes meta programming code to build a <code style="white-space: normal">mpl::list</code>
of all the archive types used by the module by checking for definition of the header
inclusion guards.
Using this list,
<code style="white-space: normal">BOOST_CLASS_EXPORT_GUID</code> will explicitly instantiate serialization
code for all exported classes.
For this implementation to function, the header file
<a href="../../../boost/serialization/export.hpp" target="export_hpp">export.hpp</a>
has to come after all the archive header files. This is enforced
by code at the end of the header file
<a href="../../../boost/archive/basic_archive.hpp" target="basic_archive_hpp">basic_archive.hpp</a>,
which will trip a STATIC_ASSERT if this requirement is violated.

<h4><a name="selectivetracking">Selective Tracking</a></h4>
Whether or not an object is tracked is determined by its
<a target="detail" href="traits.html#tracking">object tracking trait</a>.
The default setting for user defined types is <code style="white-space: normal">track_selectively</code>.
That is, track objects if and only if they are serialized through pointers anywhere
in the program. Any objects that are "registered" by any of the above means are presumed
to be serialized through pointers somewhere in the program and therefore
would be tracked. In certain situations this could lead to an inefficiency.
Suppose we have a class module used by multiple programs. Because
some programs serialize polymorphic pointers to objects of this class, we
<a target="detail" href="traits.html#export">export</a> a class
identifier by specifying <code style="white-space: normal">BOOST_CLASS_EXPORT</code> in the
class header. When this module is included by another program,
objects of this class will always be tracked even though it
may not be necessary. This situation could be addressed by using
<a target="detail" href="traits.html#tracking"><code style="white-space: normal">track_never</code></a>
in those programs.
<p>
It could also occur that even though a program serializes through
a pointer, we are more concerned with efficiency than with avoiding
the possibility of creating duplicate objects. It could be
that we happen to know that there will be no duplicates. It could
also be that the creation of a few duplicates is benign and not
worth avoiding given the runtime cost of tracking duplicates.
Again, <a target="detail" href="traits.html#tracking"><code style="white-space: normal">track_never</code></a>
can be used.
<h4><a name="runtimecasting">Runtime Casting</a></h4>
In order to properly translate between base and derived pointers
at runtime, the system requires each base/derived pair be found
in a table. A side effect of serializing a base object with
<code style="white-space: normal">boost::serialization::base_object<Base>(Derived &)</code>
is to ensure that the base/derived pair is added to the table
before the <code style="white-space: normal">main</code> function is entered.
This is very convenient and results in a clean syntax. The only
problem arises when a derived class serialized
through a pointer has no need to invoke the serialization of
its base class. In such a case, there are two choices. The obvious
one is to invoke the base class serialization with <code style="white-space: normal">base_object</code>
and specify an empty function for the base class serialization.
The alternative is to "register" the Base/Derived relationship
explicitly by invoking the template
<code style="white-space: normal">void_cast_register<Derived, Base>();</code>.
Note that this usage of the term "register" is not related
to its usage in the previous section. Here is an example of how this is done:
<pre><code>
#include <sstream>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/void_cast.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/export.hpp>

class base {
    friend class boost::serialization::access;
    //...
    // only required when using method 1 below
    // no real serialization required - specify a vestigial one
    template<class Archive>
    void serialize(Archive & ar, const unsigned int file_version){}
};

class derived : public base {
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int file_version){
        // use one of the two methods below, not both
        // method 1 : invoke base class serialization
        ar & boost::serialization::base_object<base>(*this);
        // method 2 : explicitly register base/derived relationship
        boost::serialization::void_cast_register<derived, base>(
            static_cast<derived *>(NULL),
            static_cast<base *>(NULL)
        );
    }
};

BOOST_CLASS_EXPORT_GUID(derived, "derived")

int main(){
    //...
    std::stringstream ss;
    boost::archive::text_iarchive ar(ss);
    base *b;
    ar >> b;
}
</code></pre>

<h3><a name="objecttracking">Object Tracking</a></h3>
Depending on how the class is used and other factors, serialized objects
may be tracked by memory address. This prevents the same object from being
written to or read from an archive multiple times. This could cause problems in
programs where copies of different objects are serialized from the same address.
<pre><code>
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        A x = a[i];
        ar << x;
    }
}
</code></pre>
In this case, the data to be saved exists on the stack. Each iteration
of the loop updates the value on the stack. So although the data changes
each iteration, the address of the data doesn't. If <code style="white-space: normal">a</code> is an array of
objects being tracked by memory address, the library will skip storing
objects after the first as it will be assumed that objects at the same address
are really the same object.
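<p>
The effect can be reproduced with a toy archive that tracks addresses. This is
a sketch in standard C++ under stated assumptions, not the library's tracking
code: <code style="white-space: normal">TrackingArchive</code>,
<code style="white-space: normal">save_via_copy</code> and
<code style="white-space: normal">save_in_place</code> are hypothetical names, and the copy is held
in a variable declared outside the loop so that its address is guaranteed to
stay the same across iterations.

```cpp
#include <cstddef>
#include <set>
#include <vector>

struct A { int value; };

// Toy output archive (hypothetical): stores an object only if its address
// has not been seen before, mimicking tracking by memory address.
struct TrackingArchive {
    std::set<const void*> seen;
    std::vector<int> stored;

    void save(const A& obj) {
        if (seen.insert(&obj).second) stored.push_back(obj.value);
    }
};

// Copies each element into one reused variable before saving it.
// Returns how many objects were actually stored.
std::size_t save_via_copy(const std::vector<A>& a) {
    TrackingArchive ar;
    A x{0};
    for (std::size_t i = 0; i < a.size(); ++i) {
        x = a[i];    // new value, same address
        ar.save(x);  // treated as a duplicate after the first iteration
    }
    return ar.stored.size();
}

// Saves each element in place, so every address is distinct.
std::size_t save_in_place(const std::vector<A>& a) {
    TrackingArchive ar;
    for (const A& elem : a) ar.save(elem);
    return ar.stored.size();
}
```

With ten elements, the copying version stores only the first object while the
in-place version stores all ten, which is the silent data loss the text warns about.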
<p>
To help detect such cases, output archive operators expect to be passed
<code style="white-space: normal">const</code> reference arguments.
<p>
Given this, the above code will invoke a compile time assertion.
The obvious fix in this example is to use
<pre><code>
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        ar << a[i];
    }
}
</code></pre>
which will compile and run without problem.
<p>
The usage of <code style="white-space: normal">const</code> by the output archive operators
will ensure that the process of serialization doesn't
change the state of the objects being serialized. An attempt to do this
would constitute augmentation of the concept of saving of state with
some sort of non-obvious side effect. This would almost surely be a mistake
and a likely source of very subtle bugs. As described
<a target="detail" href="traits.html#tracking">above</a>,
addresses of objects serialized as pointers are stored in memory to
prevent saving/loading of duplicate objects. These stored addresses
can also be used to delete objects created during a loading process
that has been interrupted by throwing of an exception. By default, code
to implement this tracking is instantiated if and only if an object of the class
is serialized through a pointer. If it is known a priori that no pointer
values are duplicated, overhead associated with object tracking can
be eliminated by setting the object tracking class serialization trait
appropriately.
<p>
By definition, data types designated primitive by the
<a target="detail" href="traits.html#level">Implementation Level</a>
class serialization trait are never tracked. If it is desired to
track a shared primitive object through a pointer (e.g. a
<code style="white-space: normal">long</code> used as a reference count), it should be wrapped
in a class/struct so that it is an identifiable type.
The alternative of changing the implementation level of a <code style="white-space: normal">long</code>
would affect all <code style="white-space: normal">long</code>s serialized in the whole
program - probably not what one would intend.
<p>
It is possible that we may want to track addresses even though
the object is never serialized through a pointer. For example,
a virtual base class need be saved/loaded only once. By setting
this serialization trait to <code style="white-space: normal">track_always</code>, we can suppress
redundant save/load operations.
<pre><code>
BOOST_CLASS_TRACKING(my_virtual_base_class, boost::serialization::track_always)
</code></pre>
<h3><a name="classinfo">Class Information</a></h3>
By default, for each class serialized, class information is written to the archive.
This information includes version number, implementation level and tracking
behavior. This is necessary so that the archive can be correctly
deserialized even if a subsequent version of the program changes
some of the current trait values for a class. The space overhead for
this data is minimal. There is a little bit of runtime overhead
since each class has to be checked to see if it has already had its
class information included in the archive. In some cases, even this
might be considered too much. This extra overhead can be eliminated
by setting the
<a target="detail" href="traits.html#level">implementation level</a>
class trait to: <code style="white-space: normal">boost::serialization::object_serializable</code>.
<p>
<i>Turning off tracking and class information serialization will result
in pure template inline code that in principle could be optimised down
to a simple stream write/read.</i> Elimination of all serialization overhead
in this manner comes at a cost. Once archives are released to users, the
class serialization traits cannot be changed without invalidating the old
archives. Including the class information in the archive assures us
that they will be readable in the future even if the class definition
is revised. A lightweight structure such as a display pixel might be
declared in a header like this:

<pre><code>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/level.hpp>
#include <boost/serialization/tracking.hpp>

// a pixel is a lightweight struct which is used in great numbers.
struct pixel
{
    unsigned char red, green, blue;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */){
        // operator& works for both saving and loading
        ar & red & green & blue;
    }
};

// eliminate serialization overhead at the cost of
// never being able to increase the version.
BOOST_CLASS_IMPLEMENTATION(pixel, boost::serialization::object_serializable)

// eliminate object tracking (even if serialized through a pointer)
// at the risk of a programming error creating duplicate objects.
BOOST_CLASS_TRACKING(pixel, boost::serialization::track_never)
</code></pre>

<h3><a name="portability">Archive Portability</a></h3>
Several archive classes create their data in the form of text or a portable binary format.
It should be possible to save an archive of such a class on one platform and load it on another.
This is subject to a couple of conditions.
<h4><a name="numerics">Numerics</a></h4>
The architecture of the machine reading the archive must be able to hold the data
saved. For example, the gcc compiler reserves 4 bytes to store a variable of type
<code style="white-space: normal">wchar_t</code> while other compilers reserve only 2 bytes.
So it's possible that a value could be written that couldn't be represented by the loading program. This is a
fairly obvious situation and easily handled by using the numeric types in
<a target="cstding" href="../../../boost/cstdint.hpp"><boost/cstdint.hpp></a>.
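<p>
As a rough illustration of the representability problem, the following sketch
checks whether a stored value fits in a given fixed-width integer type. It uses
the standard <code style="white-space: normal"><cstdint></code> header as a stand-in for the Boost
header named above, and <code style="white-space: normal">fits_in</code> is a hypothetical helper, not
part of the library.

```cpp
#include <cstdint>
#include <limits>

// Returns true if `value` can be represented exactly by integer type T.
// Hypothetical helper for illustration. T must be narrower than 64 bits so
// that its limits can themselves be held in a std::int64_t.
template <class T>
bool fits_in(std::int64_t value) {
    static_assert(std::numeric_limits<T>::is_integer, "T must be an integer type");
    static_assert(sizeof(T) < sizeof(std::int64_t), "T must be narrower than 64 bits");
    return value >= static_cast<std::int64_t>(std::numeric_limits<T>::min())
        && value <= static_cast<std::int64_t>(std::numeric_limits<T>::max());
}
```

For example, the value <code style="white-space: normal">0x10000</code> can be held by a 4-byte
character type but not by a 2-byte one, so
<code style="white-space: normal">fits_in<std::uint16_t>(0x10000)</code> is false in this sketch.
Choosing a fixed-width type for the member on both platforms avoids the mismatch altogether.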

<h4><a name="traits">Traits</a></h4>
Another potential problem is illustrated by the following example:
<pre><code>
template<class T>
struct my_wrapper {
    template<class Archive>
    Archive & serialize ...
};

...

class my_class {
    wchar_t a;
    short unsigned b;
    template<class Archive>
    void serialize(Archive & ar, unsigned int version){
        ar & my_wrapper(a);
        ar & my_wrapper(b);
    }
};
</code></pre>
If <code style="white-space: normal">my_wrapper</code> uses default serialization
traits there could be a problem. With the default traits, each time a new type is
added to the archive, bookkeeping information is added. So in this example, the
archive would include such bookkeeping information for
<code style="white-space: normal">my_wrapper<wchar_t></code> and for
<code style="white-space: normal">my_wrapper<unsigned short></code>.
Or would it? What about compilers that treat
<code style="white-space: normal">wchar_t</code> as a
synonym for <code style="white-space: normal">unsigned short</code>?
In this case there is only one distinct type - not two. If archives are passed between
programs with compilers that differ in their treatment
of <code style="white-space: normal">wchar_t</code>, the load operation will fail
in a catastrophic way.
<p>
One remedy for this is to assign serialization traits to the template
<code style="white-space: normal">my_wrapper</code> such that class
information for instantiations of this template is never serialized. This
process is described <a target="detail" href="traits.html#templates">above</a> and
has been used for <a target="detail" href="wrappers.html#nvp"><strong>Name-Value Pairs</strong></a>.
Wrappers would typically be assigned such traits.
<p>
Another way to avoid this problem is to assign serialization traits
to all specializations of the template <code style="white-space: normal">my_wrapper</code>
for all primitive types so that class information is never saved. This is what has
been done for our implementation of serialization for STL collections.

<h3><a href="exceptions.html">Archive Exceptions</a></h3>
<h3><a href="exception_safety.html">Exception Safety</a></h3>

<hr>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
24 January, 2004
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<p><i>© Copyright <a href="http://www.rrsd.com">Robert Ramey</a>
2002-2004. All Rights Reserved.</i></p>
</body>
</html>