office-gobmx/unoidl
Stephan Bergmann 741f88c9ec Some improvements to the UNOIDL SourceTreeProvider
Trying to do

$ instdir/sdk/bin/unoidl-read udkapi

(i.e., parsing the udkapi/com/sun/star/**.idl tree and printing it out): first
failed with

> Bad input <file:///.../udkapi/com/sun/star/beans/theIntrospection.idl>: source file defines more than one entity

because of its

> published interface XIntrospection;

forward declaration, so ignore such interface forward declarations; then failed
with

> Bad input <file:///.../udkapi/com/sun/star/modules.idl>: source file defines no entity

because "this file contains documentation of the modules", so ignore such
effectively empty files; then failed with

> Bad input <file:///.../udkapi/type_reference/udkapi.idl>: source file defines more than one entity

so instead do

$ mkdir TMP
$ cp -r udkapi/com TMP/
$ instdir/sdk/bin/unoidl-read TMP
$ rm -r TMP

which succeeded.

Change-Id: I465052539b474733e7945b312c996af27a034335
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/164094
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <stephan.bergmann@allotropia.de>
2024-02-28 16:01:38 +01:00

Support for UNOIDL Registry Formats

Library_unoidl contains the unoidl::Manager and unoidl::Provider implementations for the following registry formats:

  • The new UNOIDL binary types.rdb format.
  • The legacy binary types.rdb format (based on modules "store" and "registry").
  • A source-file format, reading (multiple) UNOIDL entity definitions directly from a single .idl source file.
  • A source-tree format, reading UNOIDL entity definitions directly from a tree of .idl source files rooted at a given directory. (Where an entity named foo.bar.Baz is expected in a file named foo/bar/Baz.idl within that tree.)

(While .idl files still contain #include directives for legacy idlc, the source-based formats ignore any preprocessing directives starting with # in the .idl files.) unoidl::Manager::addProvider transparently detects the registry format for a given URI and instantiates the corresponding provider implementation.
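
The source-tree naming convention described above (an entity named foo.bar.Baz living in foo/bar/Baz.idl) can be sketched as a small mapping function. This is only an illustration in Python; `entity_to_relative_path` is a hypothetical helper and not part of the unoidl library, whose actual lookup is implemented in C++.

```python
from pathlib import PurePosixPath


def entity_to_relative_path(entity_name: str) -> PurePosixPath:
    """Map a dotted UNOIDL entity name to its expected path inside a source tree.

    Hypothetical helper for illustration only.
    """
    parts = entity_name.split(".")
    # Reject empty names and empty segments such as "foo..Baz".
    if not all(parts):
        raise ValueError(f"malformed entity name: {entity_name!r}")
    # Each name segment becomes one path component; the last one gets ".idl".
    return PurePosixPath(*parts).with_suffix(".idl")
```

For example, `entity_to_relative_path("foo.bar.Baz")` yields the relative path `foo/bar/Baz.idl`, which would then be resolved against the root directory of the tree.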

Executable_unoidl-write is a helper tool to convert from any of the registry formats to the UNOIDL format. It is used at build-time to compile UNOIDL format .rdb files (that are used at build-time only, or included in installation sets in URE or program/types/ or as part of bundled extensions that are created during the build and not merely included as pre-built .oxt files) from source .idl files.

Executable_unoidl-read is a helper tool to convert from any of the registry formats to the source-file format. It can be used manually after a LibreOffice version update to create new reference registries for Executable_unoidl-check.

Executable_unoidl-check is a helper tool to check that one registry is backwards-compatible with another registry. It is used at build-time to detect inadvertent breakage of the udkapi and offapi APIs.

Specification of the New UNOIDL types.rdb Format

The format uses byte-oriented, platform-independent, binary files. Larger quantities are stored LSB first, without alignment requirements. Offsets are 32-bit, effectively limiting the overall file size to 4 GB, but that is not considered a limitation in practice (and avoids unnecessary bloat compared to 64-bit offsets).

Annotations can be added for (non-module) entities and certain parts of such entities (e.g., both for an interface type definition and for a direct method of an interface type definition; the idea is that they can be added for direct parts that form a "many-to-one" relationship; there is a tradeoff between generality of concept and size of representation, esp. for the C++ representation types in namespace unoidl) and consist of arbitrary sequences of name/value strings. Each name/value string is encoded as a single UTF-8 string containing a name (an arbitrary sequence of Unicode code points not containing U+003D EQUALS SIGN), optionally followed by U+003D EQUALS SIGN and a value (an arbitrary sequence of Unicode code points). The only annotation name currently in use is "deprecated" (without a value).
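
The name/value encoding described above can be illustrated with a short Python sketch. `parse_annotation` is a hypothetical helper, not an API of the unoidl library; it splits at the first U+003D EQUALS SIGN, because the name may not contain that character while the value may.

```python
from typing import Optional, Tuple


def parse_annotation(raw: bytes) -> Tuple[str, Optional[str]]:
    """Split one UTF-8 encoded annotation string into (name, value).

    Hypothetical helper for illustration only. The value is None when
    the string contains no U+003D EQUALS SIGN.
    """
    text = raw.decode("utf-8")
    # The name cannot contain "=", so the first "=" ends the name.
    name, sep, value = text.partition("=")
    return name, (value if sep else None)
```

With this sketch, `parse_annotation(b"deprecated")` gives `("deprecated", None)`, and a string such as `b"key=a=b"` gives `("key", "a=b")`.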

The following definitions are used throughout:

  • UInt16: 2-byte value, LSB first
  • UInt32: 4-byte value, LSB first
  • UInt64: 8-byte value, LSB first
  • Offset: UInt32 value, counting bytes from start of file
  • NUL-Name: zero or more non-NUL US-ASCII bytes followed by a NUL byte
  • Len-String: UInt32 number of characters, with 0x80000000 bit 0, followed by that many US-ASCII (for UNOIDL-related names) or UTF-8 (for annotations) bytes
  • Idx-String: either an Offset (with 0x80000000 bit 1) of a Len-String, or a Len-String
  • Annotations: UInt32 number N of annotations followed by N * Idx-String
  • Entry: Offset of NUL-Name followed by Offset of payload
  • Map: zero or more Entries
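
As a rough illustration of the definitions above, the following Python sketch reads a UInt32, a NUL-Name, a Len-String and an Idx-String from a byte buffer. All function names are hypothetical. Two points are assumptions rather than statements from this specification: the Len-String count is treated as a number of bytes, and the target offset of an indirect Idx-String is obtained by clearing the 0x80000000 bit.

```python
import struct
from typing import Tuple


def read_uint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a 4-byte LSB-first value; return (value, next position)."""
    return struct.unpack_from("<I", buf, pos)[0], pos + 4


def read_nul_name(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read non-NUL US-ASCII bytes up to the terminating NUL byte."""
    end = buf.index(b"\0", pos)
    return buf[pos:end].decode("ascii"), end + 1


def read_len_string(buf: bytes, pos: int, encoding: str = "ascii") -> Tuple[str, int]:
    """Read a Len-String: a UInt32 length (high bit clear), then the bytes."""
    length, pos = read_uint32(buf, pos)
    if length & 0x80000000:
        raise ValueError("Len-String length must have the 0x80000000 bit clear")
    # Assumption: the length counts bytes, not decoded characters.
    return buf[pos:pos + length].decode(encoding), pos + length


def read_idx_string(buf: bytes, pos: int, encoding: str = "ascii") -> Tuple[str, int]:
    """Read an Idx-String: an inline Len-String or an Offset to one."""
    word, after = read_uint32(buf, pos)
    if word & 0x80000000:
        # Assumption: clearing the high bit yields the Len-String offset.
        text, _ = read_len_string(buf, word & 0x7FFFFFFF, encoding)
        return text, after
    return read_len_string(buf, pos, encoding)
```

Each reader returns the decoded value together with the position of the next unread byte, so calls can be chained while walking a payload. Pass `encoding="utf-8"` when reading annotation strings.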

The file starts with an 8 byte header, followed by information about the root map (unoidl-write generates files in a single depth-first pass, so the root map itself is at the end of the file):

  • 7 byte magic header UNOIDL\xFF
  • version byte 0
  • Offset of root Map
  • UInt32 number of entries of root Map ...

Files generated by unoidl-write follow that by a

"\0** Created by LibreOffice " LIBO_VERSION_DOTTED " unoidl-write **\0"

banner (cf. config_host/config_version.h.in), as a debugging aid. (Old versions used reg2unoidl instead of unoidl-write in that banner.)
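
A minimal Python sketch of reading the header and the root map, assuming the layout listed above (7-byte magic, version byte, Offset of root Map, UInt32 entry count). The names `parse_header` and `read_map` are hypothetical and do not correspond to functions in the unoidl library.

```python
import struct
from typing import List, Tuple

MAGIC = b"UNOIDL\xff"


def parse_header(buf: bytes) -> Tuple[int, int]:
    """Validate the header; return (root map offset, root map entry count)."""
    if len(buf) < 16:
        raise ValueError("file too short for header and root map information")
    if buf[:7] != MAGIC:
        raise ValueError("bad magic header")
    if buf[7] != 0:
        raise ValueError(f"unsupported version {buf[7]}")
    # Two LSB-first UInt32 values follow the 8-byte header.
    root_offset, root_count = struct.unpack_from("<II", buf, 8)
    return root_offset, root_count


def read_map(buf: bytes, offset: int, count: int) -> List[Tuple[str, int]]:
    """Read `count` Entries; return a list of (name, payload offset)."""
    entries = []
    for i in range(count):
        # Each Entry is an Offset of a NUL-Name plus an Offset of the payload.
        name_off, payload_off = struct.unpack_from("<II", buf, offset + 8 * i)
        end = buf.index(b"\0", name_off)
        entries.append((buf[name_off:end].decode("ascii"), payload_off))
    return entries
```

Typical use would be `read_map(data, *parse_header(data))` on the full file contents, which lists the top-level names and where their payloads start.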

Layout of per-entry payload in the root or a module Map:

  • kind byte:

    • 0: module

      • followed by:
        • UInt32 number N1 of entries of Map
        • N1 * Entry
    • otherwise:

      • 0x80 bit: 1 if published

      • 0x40 bit: 1 if annotated

      • 0x20 bit: flag (may only be 1 for certain kinds, see below)

      • remaining bits:

        • 1: enum type

          • followed by:
            • UInt32 number N1 of members
            • N1 * tuple of:
              • Idx-String
              • UInt32
              • if annotated: Annotations
        • 2: plain struct type (with base if flag is 1)

          • followed by:
            • if "with base": Idx-String
            • UInt32 number N1 of direct members
            • N1 * tuple of:
              • Idx-String name
              • Idx-String type
              • if annotated: Annotations
        • 3: polymorphic struct type template

          • followed by:
            • UInt32 number N1 of type parameters
            • N1 * Idx-String
            • UInt32 number N2 of members
            • N2 * tuple of:
              • kind byte: 0x01 bit is 1 if parameterized type
              • Idx-String name
              • Idx-String type
              • if annotated: Annotations
        • 4: exception type (with base if flag is 1)

          • followed by:
            • if "with base": Idx-String
            • UInt32 number N1 of direct members
            • N1 * tuple of:
              • Idx-String name
              • Idx-String type
              • if annotated: Annotations
        • 5: interface type

          • followed by:
            • UInt32 number N1 of direct mandatory bases
            • N1 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N2 of direct optional bases
            • N2 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N3 of direct attributes
            • N3 * tuple of:
              • kind byte:
                • 0x02 bit: 1 if read-only
                • 0x01 bit: 1 if bound
              • Idx-String name
              • Idx-String type
              • UInt32 number N4 of get exceptions
              • N4 * Idx-String
              • UInt32 number N5 of set exceptions
              • N5 * Idx-String
              • if annotated: Annotations
            • UInt32 number N6 of direct methods
            • N6 * tuple of:
              • Idx-String name
              • Idx-String return type
              • UInt32 number N7 of parameters
              • N7 * tuple of:
                • direction byte: 0 for in, 1 for out, 2 for in-out
                • Idx-String name
                • Idx-String type
              • UInt32 number N8 of exceptions
              • N8 * Idx-String
              • if annotated: Annotations
        • 6: typedef

          • followed by:
            • Idx-String
        • 7: constant group

          • followed by:
            • UInt32 number N1 of entries of Map
            • N1 * Entry
        • 8: single-interface-based service (with default constructor if flag is 1)

          • followed by:
            • Idx-String
            • if not "with default constructor":
              • UInt32 number N1 of constructors
              • N1 * tuple of:
                • Idx-String
                • UInt32 number N2 of parameters
                • N2 * tuple of:
                  • kind byte: 0x04 bit is 1 if rest parameter
                  • Idx-String name
                  • Idx-String type
                • UInt32 number N3 of exceptions
                • N3 * Idx-String
                • if annotated: Annotations
        • 9: accumulation-based service

          • followed by:
            • UInt32 number N1 of direct mandatory base services
            • N1 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N2 of direct optional base services
            • N2 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N3 of direct mandatory base interfaces
            • N3 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N4 of direct optional base interfaces
            • N4 * tuple of:
              • Idx-String
              • if annotated: Annotations
            • UInt32 number N5 of direct properties
            • N5 * tuple of:
              • UInt16 kind:
                • 0x0100 bit: 1 if optional
                • 0x0080 bit: 1 if removable
                • 0x0040 bit: 1 if maybedefault
                • 0x0020 bit: 1 if maybeambiguous
                • 0x0010 bit: 1 if readonly
                • 0x0008 bit: 1 if transient
                • 0x0004 bit: 1 if constrained
                • 0x0002 bit: 1 if bound
                • 0x0001 bit: 1 if maybevoid
              • Idx-String name
              • Idx-String type
              • if annotated: Annotations
        • 10: interface-based singleton

          • followed by:
            • Idx-String
        • 11: service-based singleton

          • followed by:
            • Idx-String
      • if annotated, followed by: Annotations
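
The kind byte layout above can be decoded as in the following Python sketch. `decode_entity_kind` and `EntityKind` are hypothetical names. The validation is an interpretation of the list above: the flag bit is accepted only for kinds 2, 4 and 8, and any byte whose low five bits fall outside 1 to 11 is rejected unless the whole byte is 0 (module).

```python
from typing import NamedTuple

ENTITY_KINDS = {
    1: "enum type",
    2: "plain struct type",
    3: "polymorphic struct type template",
    4: "exception type",
    5: "interface type",
    6: "typedef",
    7: "constant group",
    8: "single-interface-based service",
    9: "accumulation-based service",
    10: "interface-based singleton",
    11: "service-based singleton",
}
# Kinds for which the 0x20 flag bit has a meaning ("with base" or
# "with default constructor").
KINDS_ALLOWING_FLAG = {2, 4, 8}


class EntityKind(NamedTuple):
    name: str
    published: bool
    annotated: bool
    flag: bool


def decode_entity_kind(byte: int) -> EntityKind:
    """Decode the kind byte of an entry in the root or a module Map."""
    if byte == 0:
        return EntityKind("module", False, False, False)
    kind = byte & 0x1F  # remaining bits below the three flag bits
    if kind not in ENTITY_KINDS:
        raise ValueError(f"unknown entity kind {kind}")
    flag = bool(byte & 0x20)
    if flag and kind not in KINDS_ALLOWING_FLAG:
        raise ValueError(f"flag bit not allowed for kind {kind}")
    return EntityKind(ENTITY_KINDS[kind], bool(byte & 0x80), bool(byte & 0x40), flag)
```

For instance, the byte 0xA2 (0x80 | 0x20 | 2) decodes to a published plain struct type with a base and without annotations.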

Layout of per-entry payload in a constant group Map:

  • kind byte:

    • 0x80 bit: 1 if annotated

    • remaining bits:

      • 0: BOOLEAN

        • followed by value byte, 0 represents false, 1 represents true
      • 1: BYTE

        • followed by value byte, representing values with two's complement
      • 2: SHORT

        • followed by UInt16 value, representing values with two's complement
      • 3: UNSIGNED SHORT

        • followed by UInt16 value
      • 4: LONG

        • followed by UInt32 value, representing values with two's complement
      • 5: UNSIGNED LONG

        • followed by UInt32 value
      • 6: HYPER

        • followed by UInt64 value, representing values with two's complement
      • 7: UNSIGNED HYPER

        • followed by UInt64 value
      • 8: FLOAT

        • followed by 4-byte value, representing values in ISO/IEC 60559 binary32 format, LSB first
      • 9: DOUBLE

        • followed by 8-byte value, representing values in ISO/IEC 60559 binary64 format, LSB first
  • if annotated, followed by: Annotations
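
As an illustration of the constant payload layout above, the following Python sketch decodes the kind byte and the value that follows it, using the standard `struct` module with explicit little-endian format codes. `decode_constant` is a hypothetical helper; reading any trailing Annotations is left to the caller.

```python
import struct
from typing import Tuple, Union

# Kind number -> (type name, little-endian struct format code).
CONSTANT_FORMATS = {
    0: ("BOOLEAN", "<B"),
    1: ("BYTE", "<b"),
    2: ("SHORT", "<h"),
    3: ("UNSIGNED SHORT", "<H"),
    4: ("LONG", "<i"),
    5: ("UNSIGNED LONG", "<I"),
    6: ("HYPER", "<q"),
    7: ("UNSIGNED HYPER", "<Q"),
    8: ("FLOAT", "<f"),
    9: ("DOUBLE", "<d"),
}


def decode_constant(buf: bytes, pos: int) -> Tuple[str, Union[bool, int, float], bool, int]:
    """Return (type name, value, annotated, next position)."""
    kind = buf[pos]
    annotated = bool(kind & 0x80)
    try:
        name, fmt = CONSTANT_FORMATS[kind & 0x7F]
    except KeyError:
        raise ValueError(f"unknown constant kind {kind & 0x7F}") from None
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    if name == "BOOLEAN":
        if value not in (0, 1):
            raise ValueError(f"bad boolean value {value}")
        value = bool(value)
    # If annotated, the Annotations start at the returned position.
    return name, value, annotated, pos + 1 + struct.calcsize(fmt)
```

The signed format codes handle the two's complement cases, so a SHORT stored as the bytes FF FF decodes to -1.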