office-gobmx/xmloff
László Németh 6e8819f29b tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type
Support XSL attribute "column" and CSS 4 attribute "spread",
stored in loext:hyphenation-keep-type, to give better control
over hyphenation-keep. E.g. with "spread", both parts of a hyphenated
word shall lie within a single spread, i.e. the word is not hyphenated
when the next page is not visible at the same time (e.g. the next
page is not the right page of a book).

– css::style::ParaHyphenationKeep is a boolean property now,
  importing hyphenation-keep = "page" as true.

– type of ParaHyphenationKeep, including the new non-ODF types
  is stored in the new ParagraphProperties::ParaHyphenationKeepType.

– default value of ParaHyphenationKeepType is COLUMN for
  interoperability.

– Add checkboxes to Text Flow -> Hyphenation Across in
  paragraph dialog:

  * Column (previously: Hyphenate across column and page)
  * Page
  * Spread

  – enabling/disabling them follows XSL/CSS 4/loext, i.e.
    possible combinations:

  * No Hyphenation across
    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "column")

  * Hyphenation across [x] Column
    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "page")

  * Hyphenation across [x] Column [x] Page
    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "spread")

  * Hyphenation across [x] Column [x] Page [x] Spread
    (hyphenation-keep = "auto")
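
The combinations above can be read as a small lookup from the two attribute values to the three checkboxes. The following is a minimal sketch of that mapping only; `HyphenationAcross` and `fromOdf` are hypothetical names made up for illustration and are not part of the LibreOffice code. Treating a missing or unrecognized keep-type as "column" is an assumption based on the stated default.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration only: these names are not LibreOffice API.
struct HyphenationAcross
{
    bool column; // [x] Column
    bool page;   // [x] Page
    bool spread; // [x] Spread
};

// Map the two attribute values to the dialog checkboxes, following the
// combinations listed in the commit message.
// Assumption: a missing or unknown keep-type is treated as "column".
inline HyphenationAcross fromOdf(const std::string& keep, const std::string& keepType)
{
    if (keep != "page")               // hyphenation-keep = "auto"
        return { true, true, true };
    if (keepType == "spread")         // across column and page, not across spread
        return { true, true, false };
    if (keepType == "page")           // across column only
        return { true, false, false };
    return { false, false, false };   // "column": no hyphenation across
}
```

For example, `fromOdf("page", "spread")` yields Column and Page checked and Spread unchecked, matching the third combination in the list.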

– Add ODF import/export

– Update DOCX import

– Add ODF unit tests

Note: the current implementation depends on widow settings: disabling
widow handling allows hyphenation across columns and pages, not only in
table cells.

Note: bPageEnd, which is set only by the RTF import but is not used, has
been renamed to bKeep. Depending on the RTF test results, it will likely
be necessary to disable the layout change, e.g. with
GetKeepType()=ParagraphHyphenationKeepType::AUTO, if PageEnd uses the
obsolete hyphenation rule, i.e. shifting only the hyphenated word to the
next page, not the full line.

More information:

– COLUMN (standard XSL value, defined in
  https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)

– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last,
  equivalent of hyphenation-keep, defined in
  https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

Follow-up to commit 9574a62add
"tdf#132599 cui offapi sw xmloff: implement hyphenate-keep" and
commit c8ee0e8f58
"tdf160518 DOCX: import hyphenation-keep to fix layout".

Change-Id: I3ac6d9e86d0ed1646f105de8607c0e8ebc534eaa
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/165954
Tested-by: László Németh <nemeth@numbertext.org>
Reviewed-by: László Németh <nemeth@numbertext.org>
2024-04-11 10:20:41 +02:00
documentation
dtd
inc tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type 2024-04-11 10:20:41 +02:00
qa
source tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type 2024-04-11 10:20:41 +02:00
util
CppunitTest_xmloff_draw.mk
CppunitTest_xmloff_style.mk
CppunitTest_xmloff_text.mk
CppunitTest_xmloff_uxmloff.mk
CustomTarget_generated.mk
IwyuFilter_xmloff.yaml
JunitTest_xmloff_unoapi.mk
Library_xo.mk
Library_xof.mk
Makefile
Module_xmloff.mk
Package_dtd.mk
README.md

ODF Import and Export Filter Logic

The main library "xo" contains the basic ODF import/export filter implementation for most applications. The document is accessed via its UNO API, which has the advantage that the same import/export code can be used for text in all applications (from/to Writer/EditEngine). The filter consumes and produces XML through the SAX UNO API interface (implemented in "sax"). Various parts of the ODF filters are also implemented in the applications themselves, for example in [git:sw/source/filter/xml].

There is a central list of all element or attribute names in [git:include/xmloff/xmltoken.hxx]. The main class of the import filter is SvXMLImport, and of the export filter SvXMLExport.

The Import filter maintains a stack of contexts for each element being read. There are many classes specific to particular elements, derived from SvXMLImportContext.
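
The context-stack idea can be sketched with a toy model. This is not the real SvXMLImportContext interface: every class and method name below (`ToyContext`, `ToyImport`, `createChild`, and so on) is hypothetical and chosen only to show how a start-element event pushes a child context, character data goes to the top of the stack, and an end-element event pops it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an import context; NOT the real SvXMLImportContext API.
class ToyContext
{
public:
    virtual ~ToyContext() = default;
    // Called for each child element; returns the context that handles it.
    // The default ignores the element and its content.
    virtual std::unique_ptr<ToyContext> createChild(const std::string& /*name*/)
    {
        return std::make_unique<ToyContext>();
    }
    virtual void characters(const std::string& /*text*/) {}
    virtual void endElement() {}
};

// Element-specific context: collects the text of one paragraph.
class ToyParagraphContext : public ToyContext
{
public:
    explicit ToyParagraphContext(std::vector<std::string>& rOut) : m_rOut(rOut) {}
    void characters(const std::string& text) override { m_aText += text; }
    void endElement() override { m_rOut.push_back(m_aText); }
private:
    std::vector<std::string>& m_rOut;
    std::string m_aText;
};

// Root context: decides which specific context handles each child.
class ToyBodyContext : public ToyContext
{
public:
    explicit ToyBodyContext(std::vector<std::string>& rOut) : m_rOut(rOut) {}
    std::unique_ptr<ToyContext> createChild(const std::string& name) override
    {
        if (name == "text:p")
            return std::make_unique<ToyParagraphContext>(m_rOut);
        return std::make_unique<ToyContext>();
    }
private:
    std::vector<std::string>& m_rOut;
};

// Toy importer: one stack entry per element currently being read.
class ToyImport
{
public:
    void startElement(const std::string& name)
    {
        if (m_aStack.empty())
            m_aStack.push_back(std::make_unique<ToyBodyContext>(m_aParagraphs));
        else
            m_aStack.push_back(m_aStack.back()->createChild(name));
    }
    void characters(const std::string& text)
    {
        if (!m_aStack.empty())
            m_aStack.back()->characters(text);
    }
    void endElement()
    {
        if (m_aStack.empty())
            return;
        m_aStack.back()->endElement();
        m_aStack.pop_back();
    }
    const std::vector<std::string>& paragraphs() const { return m_aParagraphs; }
    std::size_t depth() const { return m_aStack.size(); }
private:
    std::vector<std::string> m_aParagraphs;
    std::vector<std::unique_ptr<ToyContext>> m_aStack;
};
```

Feeding the events for a root element containing one `text:p` with the text "Hello" leaves a single collected paragraph and an empty stack. The parent context, not the importer, chooses the child class, which is what keeps element-specific logic local to each context.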

Note that for export several different versions of ODF are supported, with the default being the latest ODF version with "extensions", which means it may contain elements and attributes that are only in drafts of the specification or are not yet submitted for specification. Documents produced in the other (non-extended) ODF modes are supposed to be strictly conforming to the respective specification, i.e., only markup defined by the ODF specification is allowed.
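
As an illustration of the difference between extended and strict export, the sketch below writes an extension attribute only in the extended mode. `ToyOdfMode` and `exportHyphenationKeep` are hypothetical names, not the real export API; the attribute names follow the `fo:hyphenation-keep` and `loext:hyphenation-keep-type` attributes, and the fixed value "page" is a simplification.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the selected ODF export mode.
enum class ToyOdfMode { Strict, Extended };

using ToyAttribute = std::pair<std::string, std::string>;

// Returns the attributes a paragraph style would get in this toy model.
// The extension attribute is written only in extended mode, so that
// strict output contains only markup defined by the specification.
inline std::vector<ToyAttribute> exportHyphenationKeep(ToyOdfMode mode,
                                                       const std::string& keepType)
{
    std::vector<ToyAttribute> attrs;
    attrs.emplace_back("fo:hyphenation-keep", "page");
    if (mode == ToyOdfMode::Extended)
        attrs.emplace_back("loext:hyphenation-keep-type", keepType);
    return attrs;
}
```

In strict mode the finer-grained keep type is dropped on export in this sketch, which is the usual trade-off: conformance to the specification at the cost of information that only the extension namespace can carry.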

There is another library "xof" built from the source/transform directory, which is the filter for the OpenOffice.org XML format. This legacy format is a predecessor of ODF and was the default in OpenOffice.org 1.x versions, which did not support ODF. This filter works as a SAX transformation from/to ODF, i.e., when importing a document the transform library reads the SAX events from the file and generates SAX events that are then consumed by the ODF import filter.
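
The "SAX transformation" approach can be pictured as a handler that sits in front of another handler and rewrites events as they pass through. The sketch below is a toy model under that assumption: `ToySaxHandler`, `ToyTransformer` and `ToyRecorder` are hypothetical names, and the rename table is supplied by the caller rather than taken from the real legacy format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy event sink; NOT the real SAX UNO interface.
class ToySaxHandler
{
public:
    virtual ~ToySaxHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Stands in for the ODF import filter: records the events it receives.
class ToyRecorder : public ToySaxHandler
{
public:
    void startElement(const std::string& name) override { m_aEvents.push_back("<" + name + ">"); }
    void endElement(const std::string& name) override { m_aEvents.push_back("</" + name + ">"); }
    const std::vector<std::string>& events() const { return m_aEvents; }
private:
    std::vector<std::string> m_aEvents;
};

// Stands in for the transform library: translates legacy element names
// and forwards every event to the next handler in the chain.
class ToyTransformer : public ToySaxHandler
{
public:
    ToyTransformer(ToySaxHandler& rNext, std::map<std::string, std::string> aRenames)
        : m_rNext(rNext), m_aRenames(std::move(aRenames)) {}
    void startElement(const std::string& name) override { m_rNext.startElement(translate(name)); }
    void endElement(const std::string& name) override { m_rNext.endElement(translate(name)); }
private:
    // Names without an entry in the table pass through unchanged.
    std::string translate(const std::string& name) const
    {
        auto it = m_aRenames.find(name);
        return it == m_aRenames.end() ? name : it->second;
    }
    ToySaxHandler& m_rNext;
    std::map<std::string, std::string> m_aRenames;
};
```

Because the transformer implements the same handler interface it forwards to, the downstream filter does not need to know whether its events come directly from a parser or from a translation step.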

OpenOffice.org XML File Format

The "dtd" directory contains files that are most likely related to the OpenOffice.org XML format, but they are possibly outdated and obsolete.

Add New XML Tokens

When adding a new XML token, you need to add its entry in the following three files:

  • [git:include/xmloff/xmltoken.hxx]
  • [git:xmloff/source/core/xmltoken.cxx]
  • [git:xmloff/source/token/tokens.txt]
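
Why three files need to stay in step can be shown with a toy model of an enumerator list and a matching string table. This is a simplified sketch, not the actual contents of those files: the `Toy...` names are hypothetical, and the exact macros and layout used in the real sources should be checked there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the header side: one enumerator per token.
enum ToyTokenEnum
{
    TOY_HYPHENATION_KEEP,
    TOY_HYPHENATION_KEEP_TYPE, // newly added token
    TOY_TOKEN_END              // must stay last
};

// Sketch of the implementation side: the string for each enumerator,
// in exactly the same order as the enum above.
static const char* const aToyTokenList[] = {
    "hyphenation-keep",
    "hyphenation-keep-type", // newly added token
};

// Catches the case where an entry was added on one side only.
static_assert(sizeof(aToyTokenList) / sizeof(aToyTokenList[0]) == TOY_TOKEN_END,
              "token enum and token string table are out of sync");

// Look up the XML name for a token.
inline const char* getToyToken(ToyTokenEnum eToken)
{
    return aToyTokenList[eToken];
}
```

In this model, the third file would be a plain list containing the same token string, used for generating fast token lookups; that description is an assumption about its role rather than something stated in this README.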