office-gobmx/writerfilter
Mike Kaganski dcae6615ed tdf#158556: provide objects anchored to node as a hidden property
This introduces a hidden property of SwXParagraph, named
OOXMLImport_AnchoredShapes.

In a test on my system, I started the main process first and then launched
a second one with

  time soffice path/to/bugdoc.docx

so that only the time spent in import is measured. This gave the following
figures:

LibreOffice 7.5.0.3 (TDF build):
real    1m49.016s
user    0m0.000s
sys     0m0.000s

LibreOffice 7.6.0.3 (TDF build):
real    8m37.386s
user    0m0.000s
sys     0m0.000s

Current master (my no-debug build):
real    10m6.776s
user    0m0.000s
sys     0m0.000s

Current master with this patch (my no-debug build):
real    5m41.524s
user    0m0.000s
sys     0m0.015s

So it is still not as fast as it was in 7.5. The fix does not remove the
quadratic complexity; it only makes the iteration faster. If there were a
way to directly list the objects anchored to a given paragraph, rather
than iterating over all objects and checking their anchors, import would
be much faster, but that would be a rather large change.
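
To illustrate the two approaches contrasted above, here is a minimal sketch. It is not Writer code: `Shape`, `shapesAnchoredTo`, and `buildAnchorIndex` are hypothetical names, and paragraphs are assumed to be identified by a plain index. The first function is the "iterate over all objects, checking their anchors" pattern, which is quadratic when called once per paragraph. The second builds a per-paragraph list in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an anchored object; not a LibreOffice type.
struct Shape
{
    int id;
    std::size_t anchorParagraph; // index of the paragraph the shape is anchored to
};

// Per-paragraph scan: checks every shape for a single paragraph.
// Calling this once for each of P paragraphs costs O(P * S) overall.
std::vector<int> shapesAnchoredTo(std::size_t paragraph, const std::vector<Shape>& shapes)
{
    std::vector<int> result;
    for (const Shape& shape : shapes)
        if (shape.anchorParagraph == paragraph)
            result.push_back(shape.id);
    return result;
}

// One-pass index: buckets[i] lists the shapes anchored to paragraph i.
// Building it costs O(P + S); each later lookup is a direct access.
std::vector<std::vector<int>> buildAnchorIndex(std::size_t paragraphCount,
                                               const std::vector<Shape>& shapes)
{
    std::vector<std::vector<int>> buckets(paragraphCount);
    for (const Shape& shape : shapes)
    {
        assert(shape.anchorParagraph < paragraphCount);
        buckets[shape.anchorParagraph].push_back(shape.id);
    }
    return buckets;
}
```

Both functions return the same shapes for any given paragraph. In the real code base the hard part is keeping such an index up to date while the document changes, which is presumably why it would be a large change.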

Change-Id: Ie50515815e85fdce498d065185199c9b31d95794
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160813
Tested-by: Mike Kaganski <mike.kaganski@collabora.com>
Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
2023-12-15 10:29:14 +01:00
documentation
inc cid#1545577 COPY_INSTEAD_OF_MOVE 2023-12-11 14:06:54 +01:00
qa tdf#136472 adjust ooxml import to handle first header/footer 2023-12-01 08:26:38 +01:00
source tdf#158556: provide objects anchored to node as a hidden property 2023-12-15 10:29:14 +01:00
util
CppunitTest_writerfilter_dmapper.mk
CppunitTest_writerfilter_filters_test.mk
CppunitTest_writerfilter_misc.mk
CppunitTest_writerfilter_ooxml.mk
CppunitTest_writerfilter_rtftok.mk
CustomTarget_source.mk
IwyuFilter_writerfilter.yaml
Library_writerfilter.mk
Makefile
Module_writerfilter.mk
README.md

Import Filters for LibreOffice Writer

The writerfilter module contains the import filters for DOCX and RTF in Writer, built on Writer's UNO API.

  • Module contents

    • documentation: RNG schema for the OOXML tokenizer, etc.
    • inc: module-global headers (can be included by any file under source)
    • qa: cppunit tests
    • source: the filters themselves
    • util: UNO passive registration config
  • Source contents

    • dmapper: the domain mapper, hiding UNO from the tokenizers, used by DOCX and RTF import
      • In dbgutil builds, the incoming traffic of dmapper can be dumped into an XML file in /tmp. To enable this, start soffice with the SW_DEBUG_WRITERFILTER=1 environment variable set.
    • filter: the UNO filter service implementations, invoked by UNO and calling the dmapper + one of the tokenizers
    • ooxml: the DOCX tokenizer
    • rtftok: the RTF tokenizer
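
The layering described under "Source contents" (format-specific tokenizers feeding one shared mapper that hides the document API) can be sketched roughly as follows. This is a toy model under stated assumptions, not the module's actual interfaces: `Token`, `Mapper`, `RecordingMapper`, `tokenizeFormatA`, and `tokenizeFormatB` are all hypothetical names, and the "document API" is reduced to strings recorded in a vector.

```cpp
#include <string>
#include <vector>

// Hypothetical event a tokenizer hands to the mapper; not a writerfilter type.
struct Token
{
    std::string kind; // "text" or "paragraph-end" in this sketch
    std::string value;
};

// Hypothetical mapper interface: the only thing a tokenizer needs to know about.
class Mapper
{
public:
    virtual ~Mapper() = default;
    virtual void handle(const Token& token) = 0;
};

// Stand-in for the domain mapper. A real one would call the document API;
// this one only records what it would have done.
class RecordingMapper : public Mapper
{
public:
    void handle(const Token& token) override
    {
        if (token.kind == "text")
            calls.push_back("insertText(" + token.value + ")");
        else if (token.kind == "paragraph-end")
            calls.push_back("finishParagraph()");
    }

    std::vector<std::string> calls;
};

// Toy tokenizer A: the input is already split into paragraphs.
void tokenizeFormatA(const std::vector<std::string>& paragraphs, Mapper& mapper)
{
    for (const std::string& text : paragraphs)
    {
        mapper.handle({"text", text});
        mapper.handle({"paragraph-end", ""});
    }
}

// Toy tokenizer B: every paragraph in the input is terminated by '\n'.
void tokenizeFormatB(const std::string& input, Mapper& mapper)
{
    std::string current;
    for (char c : input)
    {
        if (c == '\n')
        {
            mapper.handle({"text", current});
            mapper.handle({"paragraph-end", ""});
            current.clear();
        }
        else
            current += c;
    }
}
```

Because both tokenizers only talk to `Mapper`, neither needs to know how the document is actually built, and equivalent input in either toy format produces the same sequence of recorded calls.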