office-gobmx/unoidl
Stephan Bergmann 87bad231d6 fdo#65589: Treat legacy types.rdb w/o /UCR key as empty
...as they are encountered in the wild.

Change-Id: Iae46d3b4b4aca18a09680caabc0e343f8a534989
2013-07-09 19:48:21 +02:00

Support for the new UNOIDL types.rdb format

...that replaces the old types.rdb format based on modules [[store]] and
[[registry]].

Library_unoidl contains the unoidl::Manager and unoidl::Provider implementations
for both the new and the old types.rdb formats (unoidl::loadProvider tries both
implementations in turn for a given file, so the old format is still supported
transparently for now).

Executable_reg2unoidl is a helper tool to convert from the old to the new
types.rdb format.  It is currently used at build-time.  idlc still generates the
old format, and any new-format files (used at build-time only, or included in
installation sets in URE or program/types/ or as part of bundled extensions that
are created during the build and not merely included as pre-built .oxt files)
are explicitly generated via reg2unoidl.  The SDK is still designed to generate
old-format files exclusively (in particular, any non-bundled extensions will
only contain old-format files for now; that allows the new format to be modified
further without having to worry about compatibility with multiple versions of
that format).

== Specification of the new UNOIDL types.rdb format ==

The format uses byte-oriented, platform-independent, binary files.  Larger
quantities are stored LSB first, without alignment requirements.  Offsets are
32 bit, effectively limiting the overall file size to 4GB, but that is not
considered a limitation in practice (and avoids unnecessary bloat compared to
64 bit offsets).

Annotations can be added for (non-module) entities and for certain parts of such
entities (e.g., both for an interface type definition and for a direct method of
an interface type definition).  The idea is that they can be added for direct
parts that form a "many-to-one" relationship; there is a tradeoff between
generality of concept and size of representation, esp. for the C++
representation types in namespace unoidl.  Annotations consist of arbitrary
sequences of name/value strings.  Each name/value string is encoded as a single
UTF-8 string containing a name (an arbitrary sequence of Unicode code points not
containing U+003D EQUALS SIGN), optionally followed by U+003D EQUALS SIGN and a
value (an arbitrary sequence of Unicode code points).  The only annotation name
currently in use is "deprecated" (without a value).
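
The name/value encoding described above can be sketched as follows.  This is a
minimal illustration in Python, not code from the unoidl module; the function
name parse_annotation and the "since" annotation in the usage note are
hypothetical.

```python
def parse_annotation(raw):
    """Split one UTF-8 encoded name/value string into (name, value).

    The name ends at the first U+003D EQUALS SIGN; everything after it is the
    value.  The value is None when no EQUALS SIGN is present at all, which is
    the case for the "deprecated" annotation.
    """
    text = raw.decode("utf-8")
    name, separator, value = text.partition("=")
    return name, (value if separator else None)
```

With this sketch, parse_annotation(b"deprecated") yields ("deprecated", None),
while a hypothetical b"since=4.2" would yield ("since", "4.2").  A value may
itself contain further EQUALS SIGN characters, since only the name is
restricted.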

The following definitions are used throughout:

* UInt16: 2-byte value, LSB first
* UInt32: 4-byte value, LSB first
* UInt64: 8-byte value, LSB first
* Offset: UInt32 value, counting bytes from start of file
* NUL-Name: zero or more non-NUL US-ASCII bytes followed by a NUL byte
* Len-String: UInt32 number of characters, with 0x80000000 bit 0, followed by
   that many US-ASCII (for UNOIDL-related names) or UTF-8 (for annotations)
   bytes
* Idx-String: either an Offset (with 0x80000000 bit 1) of a Len-String, or a
   Len-String
* Annotations: UInt32 number N of annotations followed by N * Idx-String
* Entry: Offset of NUL-Name followed by Offset of payload
* Map: zero or more Entries
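
As an illustration of the definitions above, the following Python sketch reads
the basic building blocks from a bytes buffer.  All function names are
hypothetical, and the sketch assumes that the Len-String count is a count of
bytes (the two readings coincide for US-ASCII).  Each reader that consumes a
variable amount of data returns the decoded value together with the offset
just past it.

```python
import struct


def read_u32(buf, off):
    """Read a UInt32 (4 bytes, LSB first) at byte offset off."""
    return struct.unpack_from("<I", buf, off)[0]


def read_nul_name(buf, off):
    """Read a NUL-Name: US-ASCII bytes up to, but excluding, the NUL byte."""
    end = buf.index(b"\0", off)
    return buf[off:end].decode("ascii")


def read_len_string(buf, off, encoding="ascii"):
    """Read a Len-String; return (text, offset after the string)."""
    length = read_u32(buf, off)
    if length & 0x80000000:
        raise ValueError("Len-String length has the 0x80000000 bit set")
    start = off + 4
    return buf[start:start + length].decode(encoding), start + length


def read_idx_string(buf, off, encoding="ascii"):
    """Read an Idx-String; return (text, offset after the Idx-String)."""
    value = read_u32(buf, off)
    if value & 0x80000000:
        # Indirect form: the remaining 31 bits are the Offset of a Len-String.
        text, _ = read_len_string(buf, value & 0x7FFFFFFF, encoding)
        return text, off + 4
    # Direct form: the Idx-String is itself a Len-String.
    return read_len_string(buf, off, encoding)


def read_annotations(buf, off):
    """Read Annotations; return (list of UTF-8 strings, offset after them)."""
    count = read_u32(buf, off)
    off += 4
    result = []
    for _ in range(count):
        text, off = read_idx_string(buf, off, "utf-8")
        result.append(text)
    return result, off
```

The indirect form of Idx-String lets a writer store a frequently used name
once and refer to it by offset from every later use.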

The file starts with an 8 byte header, followed by information about the root
map (reg2unoidl generates files in a single depth-first pass, so the root map
itself is at the end of the file):

* 7 byte magic header "UNOIDL\xFF"
* version byte 0
* Offset of root Map
* UInt32 number of entries of root Map
...

Files generated by reg2unoidl follow that by a

  "\0** Created by LibreOffice " LIBO_VERSION_DOTTED " reg2unoidl **\0"

banner (cf. config_host/config_version.h.in), as a debugging aid.
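
A reader for the header and the Entries of a Map might look like the following
Python sketch.  It is an illustration only; the names parse_header and
read_map_entries are hypothetical, and rejecting any version other than 0 is
an assumption of the sketch.

```python
import struct

MAGIC = b"UNOIDL\xFF"


def parse_header(buf):
    """Return (version, offset of root Map, number of root Map entries)."""
    if buf[:7] != MAGIC:
        raise ValueError("missing UNOIDL magic header")
    version = buf[7]
    if version != 0:
        raise ValueError("unsupported version %d" % version)
    # The root Map information directly follows the 8 byte header.
    root_offset, root_count = struct.unpack_from("<II", buf, 8)
    return version, root_offset, root_count


def read_map_entries(buf, offset, count):
    """Return a list of (name, payload offset) for the Entries of a Map."""
    entries = []
    for i in range(count):
        # Each Entry is two Offsets: one of a NUL-Name, one of the payload.
        name_off, payload_off = struct.unpack_from("<II", buf, offset + 8 * i)
        end = buf.index(b"\0", name_off)
        entries.append((buf[name_off:end].decode("ascii"), payload_off))
    return entries
```

Passing the offset and count returned by parse_header to read_map_entries
lists the top-level names of a file; the payload offsets then lead to the
per-entry payloads.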

Layout of per-entry payload in the root or a module Map:

* kind byte:

** 0: module
*** followed by:
**** UInt32 number N1 of entries of Map
**** N1 * Entry

** otherwise:
*** 0x80 bit: 1 if published
*** 0x40 bit: 1 if annotated
*** 0x20 bit: flag (may only be 1 for certain kinds, see below)
*** remaining bits:

**** 1: enum type
***** followed by:
****** UInt32 number N1 of members
****** N1 * tuple of:
******* Idx-String
******* UInt32
******* if annotated: Annotations

**** 2: plain struct type (with base if flag is 1)
***** followed by:
****** if "with base": Idx-String
****** UInt32 number N1 of direct members
****** N1 * tuple of:
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 3: polymorphic struct type template
***** followed by:
****** UInt32 number N1 of type parameters
****** N1 * Idx-String
****** UInt32 number N2 of members
****** N2 * tuple of:
******* kind byte: 0x01 bit is 1 if parameterized type
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 4: exception type (with base if flag is 1)
***** followed by:
****** if "with base": Idx-String
****** UInt32 number N1 of direct members
****** N1 * tuple of:
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 5: interface type
***** followed by:
****** UInt32 number N1 of direct mandatory bases
****** N1 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N2 of direct optional bases
****** N2 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N3 of direct attributes
****** N3 * tuple of:
******* kind byte:
******** 0x02 bit: 1 if read-only
******** 0x01 bit: 1 if bound
******* Idx-String name
******* Idx-String type
******* UInt32 number N4 of get exceptions
******* N4 * Idx-String
******* UInt32 number N5 of set exceptions
******* N5 * Idx-String
******* if annotated: Annotations
****** UInt32 number N6 of direct methods
****** N6 * tuple of:
******* Idx-String name
******* Idx-String return type
******* UInt32 number N7 of parameters
******* N7 * tuple of:
******** direction byte: 0 for in, 1 for out, 2 for in-out
******** Idx-String name
******** Idx-String type
******* UInt32 number N8 of exceptions
******* N8 * Idx-String
******* if annotated: Annotations

**** 6: typedef
***** followed by:
****** Idx-String

**** 7: constant group
***** followed by:
****** UInt32 number N1 of entries of Map
****** N1 * Entry

**** 8: single-interface--based service (with default constructor if flag is 1)
***** followed by:
****** Idx-String
****** if not "with default constructor":
******* UInt32 number N1 of constructors
******* N1 * tuple of:
******** Idx-String
******** UInt32 number N2 of parameters
******** N2 * tuple of:
********* kind byte: 0x04 bit is 1 if rest parameter
********* Idx-String name
********* Idx-String type
******** UInt32 number N3 of exceptions
******** N3 * Idx-String
******** if annotated: Annotations

**** 9: accumulation-based service
***** followed by:
****** UInt32 number N1 of direct mandatory base services
****** N1 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N2 of direct optional base services
****** N2 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N3 of direct mandatory base interfaces
****** N3 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N4 of direct optional base interfaces
****** N4 * tuple of:
******* Idx-String
******* if annotated: Annotations
****** UInt32 number N5 of direct properties
****** N5 * tuple of:
******* UInt16 kind:
******** 0x0100 bit: 1 if optional
******** 0x0080 bit: 1 if removable
******** 0x0040 bit: 1 if maybedefault
******** 0x0020 bit: 1 if maybeambiguous
******** 0x0010 bit: 1 if readonly
******** 0x0008 bit: 1 if transient
******** 0x0004 bit: 1 if constrained
******** 0x0002 bit: 1 if bound
******** 0x0001 bit: 1 if maybevoid
******* Idx-String name
******* Idx-String type
******* if annotated: Annotations

**** 10: interface-based singleton
***** followed by:
****** Idx-String

**** 11: service-based singleton
***** followed by:
****** Idx-String

*** if annotated, followed by: Annotations
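
The bit layout of the kind byte, and of the UInt16 kind of a direct property,
can be decoded along the following lines.  This Python sketch is illustrative
only: the names are hypothetical, and treating a set flag bit on any kind other
than 2, 4, or 8 as an error is a choice of the sketch.

```python
# Entity kinds, keyed by the bits that remain after masking out 0x80, 0x40,
# and 0x20.
ENTITY_KINDS = {
    1: "enum type",
    2: "plain struct type",
    3: "polymorphic struct type template",
    4: "exception type",
    5: "interface type",
    6: "typedef",
    7: "constant group",
    8: "single-interface-based service",
    9: "accumulation-based service",
    10: "interface-based singleton",
    11: "service-based singleton",
}

# Kinds for which the 0x20 flag bit has a meaning: "with base" for 2 and 4,
# "with default constructor" for 8.
KINDS_WITH_FLAG = {2, 4, 8}

PROPERTY_BITS = [
    (0x0100, "optional"),
    (0x0080, "removable"),
    (0x0040, "maybedefault"),
    (0x0020, "maybeambiguous"),
    (0x0010, "readonly"),
    (0x0008, "transient"),
    (0x0004, "constrained"),
    (0x0002, "bound"),
    (0x0001, "maybevoid"),
]


def decode_entity_kind(kind_byte):
    """Return (kind name, published, annotated, flag) for a kind byte."""
    if kind_byte == 0:
        return ("module", False, False, False)
    code = kind_byte & 0x1F
    if code not in ENTITY_KINDS:
        raise ValueError("unknown entity kind %d" % code)
    flag = bool(kind_byte & 0x20)
    if flag and code not in KINDS_WITH_FLAG:
        raise ValueError("flag bit set for kind %d" % code)
    return (ENTITY_KINDS[code], bool(kind_byte & 0x80),
            bool(kind_byte & 0x40), flag)


def decode_property_kind(kind):
    """Return the names of the attribute bits set in a property's kind."""
    return [name for bit, name in PROPERTY_BITS if kind & bit]
```

For example, a kind byte of 0xC5 decodes to a published, annotated interface
type, and 0x22 decodes to a plain struct type with a base.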

Layout of per-entry payload in a constant group Map:

* kind byte:
** 0x80 bit: 1 if annotated
** remaining bits:

*** 0: BOOLEAN
**** followed by value byte, 0 represents false, 1 represents true

*** 1: BYTE
**** followed by value byte, representing values with two's complement

*** 2: SHORT
**** followed by UInt16 value, representing values with two's complement

*** 3: UNSIGNED SHORT
**** followed by UInt16 value

*** 4: LONG
**** followed by UInt32 value, representing values with two's complement

*** 5: UNSIGNED LONG
**** followed by UInt32 value

*** 6: HYPER
**** followed by UInt64 value, representing values with two's complement

*** 7: UNSIGNED HYPER
**** followed by UInt64 value

*** 8: FLOAT
**** followed by 4-byte value, representing values in ISO/IEC 60559 binary32
      format, LSB first

*** 9: DOUBLE
**** followed by 8-byte value, representing values in ISO/IEC 60559 binary64
      format, LSB first

* if annotated, followed by: Annotations
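
The constant payload maps naturally onto fixed-size little-endian reads.  The
following Python sketch decodes the kind byte and the value that follows it;
the names are hypothetical, rejecting BOOLEAN bytes other than 0 and 1 is an
assumption of the sketch, and trailing Annotations are not read.

```python
import struct

# Constant kind -> (type name, struct format).  The "<" prefix selects
# LSB-first byte order with standard sizes; the signed formats decode two's
# complement values.
CONSTANT_FORMATS = {
    0: ("BOOLEAN", "<B"),
    1: ("BYTE", "<b"),
    2: ("SHORT", "<h"),
    3: ("UNSIGNED SHORT", "<H"),
    4: ("LONG", "<i"),
    5: ("UNSIGNED LONG", "<I"),
    6: ("HYPER", "<q"),
    7: ("UNSIGNED HYPER", "<Q"),
    8: ("FLOAT", "<f"),
    9: ("DOUBLE", "<d"),
}


def read_constant(buf, off):
    """Return (type name, value, annotated, offset after the value)."""
    kind_byte = buf[off]
    annotated = bool(kind_byte & 0x80)
    code = kind_byte & 0x7F
    if code not in CONSTANT_FORMATS:
        raise ValueError("unknown constant kind %d" % code)
    name, fmt = CONSTANT_FORMATS[code]
    (value,) = struct.unpack_from(fmt, buf, off + 1)
    if name == "BOOLEAN":
        if value not in (0, 1):
            raise ValueError("bad BOOLEAN value %d" % value)
        value = bool(value)
    return name, value, annotated, off + 1 + struct.calcsize(fmt)
```

When the annotated result is true, the returned offset is where the
Annotations of the constant begin.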