diff --git a/xmerge/source/aportisdoc/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html b/xmerge/source/aportisdoc/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html new file mode 100644 index 000000000000..dcac3f421012 --- /dev/null +++ b/xmerge/source/aportisdoc/java/org/openoffice/xmerge/converter/xml/sxw/aportisdoc/package.html @@ -0,0 +1,263 @@ + + +
+Provides the tools for doing the conversion of StarWriter XML to +and from AportisDoc format.
+ +It follows the {@link org.openoffice.xmerge} framework for the conversion process.
+ +Since it converts to/from a Palm application format, these converters
+follow the
+PalmDB
stream format for writing out to the Palm sync client or
+reading in from the Palm sync client.
Note that PluginFactoryImpl
also provides a
+DocumentMerger
object, i.e. {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentMergerImpl DocumentMergerImpl}.
+This functionality was derived from its superclass
+{@link org.openoffice.xmerge.converter.xml.sxw.SxwPluginFactory
+SxwPluginFactory}.
The AportisDoc pdb format is widely used by different Palm applications, +e.g. QuickWord, AportisDoc Reader, MiniWrite, etc. Note that some +of these applications put tweaks into the format. The converters will only +support the default AportisDoc format, plus some very minor tweaks to accommodate +other applications.
+ +The text content of the format is plain text, i.e. there are no styles +or structures. There is no notion of lists, list items, paragraphs, +headings, etc. The format does have support for bookmarks.
+ +For most Doc applications, the default character encoding supported is +the extended ASCII character set, i.e. ISO-8859-1. StarWriter XML uses the +UTF-8 encoding scheme. Since UTF-8 covers more characters, +converting UTF-8 strings into extended ASCII can lose +character mappings.
+ +Using JAXP, XML files can be parsed and read in as Java Strings,
+which are in Unicode format, so there is no loss of character mapping from UTF-8
+to Java Strings. There is possible loss of character mapping in
+converting Java Strings to ASCII bytes. Java characters that
+cannot be represented in extended ASCII are converted into the ASCII
+character '?' (0x3F in hexadecimal) via the String.getBytes(encoding)
+API.
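The lossy step described above can be seen in a minimal sketch. The class and method names here (`Latin1LossDemo`, `toLatin1`) are hypothetical and not part of the converter; the sketch also uses the `Charset` overload of `String.getBytes` rather than the encoding-name overload mentioned in the text, because its replacement behavior is documented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of xmerge.
public class Latin1LossDemo {

    /** Encodes text as ISO-8859-1; unmappable characters are replaced by '?' (0x3F). */
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café €": the e-acute exists in ISO-8859-1, the euro sign does not.
        byte[] bytes = toLatin1("caf\u00e9 \u20ac");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "63 61 66 E9 20 3F"
    }
}
```

The euro sign comes out as 0x3F, so the original character cannot be recovered when the Doc file is converted back.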
The DocumentSerializerImpl
class implements the
+org.openoffice.xmerge.DocumentSerializer
.
+This class specifically provides the conversion process from a given
+SxwDocument
object to DOC formatted records, which are
+then passed back to the client via the ConvertData
object.
The following XML tags are handled. [Note that some may not be implemented yet.]
+Paragraphs <text:p> and Headings <text:h>
+ +Heading elements are classified the same as paragraph + elements since both have the same possible elements inside. + Their main difference is that they refer to different types + of style information, which is outside of their element tags. + Since there are no styles on the DOC format, headings should + be treated the same way a paragraph is converted.
+ +For paragraph elements, convert and transfer the essential text nodes, + i.e. those directly contained within paragraph + nodes. There are also a number of elements that + a paragraph element may contain. These are explained in their + own context.
+ +At the end of the paragraph, an EOL character is added by + the converter to provide a separation for each paragraph, + since the Doc format does not have a notion of a paragraph.
+White spaces <text:s> and Tabs <text:tab-stop>
+ +In SXW, normally 2 or more white-space characters are collapsed into + a single space character. In order to make sure that the document + content really contains those white-space characters, there are special + elements assigned to them.
+ +The space element specifies the number of spaces it represents. + Thus, converting it just means providing the specific number of spaces + that the element requires.
+ +There is also the tab-stop element. This is a bit tricky. In a + StarWriter document, tab-stops are specified by a column position. + A tab is not an exact number of spaces, but rather a specific column + positioning. Say, regular tab-stops are set at every 5th column. + At column 4, if I hit a tab, it goes to column 5. At column 1, hitting + a tab would put the cursor at column 5 as well. SmartDoc and AportisDoc + applications go by columns for the ASCII tab character. The only problem + is that in StarWriter, one could specify a different tab-stop, but not + in most of these Doc applications, at least I have not seen one. + The solution is simply to convert to the ASCII tab + character and not do anything for different tab-stop positioning.
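The column arithmetic in the tab-stop example above can be sketched as follows. `TabStopSketch`, `nextTabStop`, and the `interval` parameter are hypothetical names used only for illustration; the converter itself does no such calculation and simply emits an ASCII tab character.

```java
// Hypothetical illustration of why a tab is a column position, not a space count.
public class TabStopSketch {

    /**
     * Returns the column the cursor lands on after a tab, assuming regular
     * tab-stops at every multiple of {@code interval}.
     */
    static int nextTabStop(int column, int interval) {
        return (column / interval + 1) * interval;
    }

    public static void main(String[] args) {
        System.out.println(nextTabStop(4, 5)); // prints 5
        System.out.println(nextTabStop(1, 5)); // prints 5
    }
}
```

From column 4 the tab covers one column and from column 1 it covers four, which is why a tab cannot be replaced by a fixed number of spaces.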
+Line breaks <text:line-break>
+ +To represent line breaks, it is simplest to just put an ASCII LF + character. Note that the side effect of this is that an end of paragraph + also contains an ASCII LF character. Thus, for the DOC to SXW conversion, + line breaks are not distinguishable from specifying the end of a + paragraph.
+Text spans <text:span>
+ +Text spans contain text that has different style attributes + from the paragraph's. Text spans can be embedded within another + text span. Since it is purely for style tagging, we only need + to convert and transfer the text elements within these.
+Hyperlinks <text:a> + +
Convert and transfer the text portion.
+Bookmarks <text:bookmark> <text:bookmark-start> + <text:bookmark-end> [Not implemented yet]
+ +In SXW, bookmark elements are embedded inside paragraph elements. + Bookmarks can either mark a text position or a text range. <text:bookmark> + marks a position while the pair <text:bookmark-start> and + <text:bookmark-end>
marks a text range. The DOC format only + supports bookmarking a text position. Thus, for the conversion, + <text:bookmark> and <text:bookmark-start> will both mark + a text position. +Change Tracking <text:tracked-changes> + <text:change*> [Not implemented yet]
+ +Change tracking elements are not yet supported by the current + OpenOffice XML filters, so this will have to be watched. The text + within these elements has to be interpreted properly during the + conversion process.
+Lists <text:unordered-list> and + <text:ordered-list>
+ +A list can only contain one optional <text:list-header> + and one or more <text:list-item> elements.
+ +A <text:list-header> contains one or more paragraph + elements. Since there are no styles, the conversion process does not + do anything special for list headers; conversion of the paragraphs + within list headers is the same as explained above.
+ +A <text:list-item> may contain one or more paragraphs, + headings, lists, etc. Since the Doc format does not support any list + structure, there will not be any special handling for this element. + Conversion for elements within it shall be applied according to the + element type. Thus, lists with paragraphs within them will result in just + plain paragraphs. Sublists will not be identifiable. Paragraphs in + sublists will still appear.
+<text:section>
+ +I am not sure what this is yet, will need to investigate more on this.
+There may be other tags that will still need to be addressed for this conversion.
+ +Refer to {@link org.openoffice.xmerge.converter.xml.sxw.aportisdoc.DocumentSerializerImpl DocumentSerializerImpl}
+for implementation details. It uses the DocEncoder
class to do the encoding
+part.
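The tag handling described above can be illustrated with a minimal sketch. This is not the actual DocumentSerializerImpl code: `FlattenSketch`, `flatten`, and `toPlainText` are hypothetical names, the parser is assumed to be non-namespace-aware so that prefixed names such as `text:p` can be matched directly, and the space count is assumed to come from a `text:c` attribute that defaults to 1.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of SXW-to-plain-text flattening; not the xmerge implementation.
public class FlattenSketch {

    /** Appends the plain-text form of node's children to out. */
    static void flatten(Node node, StringBuilder out, boolean inParagraph) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Only text inside a paragraph or heading is transferred.
                if (inParagraph) out.append(child.getNodeValue());
                continue;
            }
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            Element element = (Element) child;
            switch (element.getNodeName()) {
                case "text:p":
                case "text:h": {
                    // Headings are treated like paragraphs; each ends with LF.
                    flatten(element, out, true);
                    out.append('\n');
                    break;
                }
                case "text:s": {
                    // Assumed: text:c holds the space count, defaulting to 1.
                    String c = element.getAttribute("text:c");
                    int count = c.isEmpty() ? 1 : Integer.parseInt(c);
                    for (int n = 0; n < count; n++) out.append(' ');
                    break;
                }
                case "text:tab-stop":
                    out.append('\t');
                    break;
                case "text:line-break":
                    out.append('\n');
                    break;
                default:
                    // Spans, hyperlinks, lists, list items: keep only the content.
                    flatten(element, out, inParagraph);
            }
        }
    }

    /** Parses an XML string and returns its flattened text. */
    static String toPlainText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            flatten(doc, out, false);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With this sketch, a heading followed by a list containing one paragraph yields two LF-terminated lines, and the list structure is no longer identifiable in the output.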
The DocumentDeserializerImpl
class implements the
+org.openoffice.xmerge.DocumentDeserializer
. It is
+passed the device document in the form of a ConvertData
object.
+It will then create a SxwDocument
object from the conversion of
+the DOC formatted records.
The text content of the Doc format will be transferred as text. Paragraph +elements will be formed based on the existence of an ASCII LF character. There +will be at least one paragraph element.
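The paragraph rule above can be sketched as follows. `ParagraphSplitSketch` and `splitParagraphs` are hypothetical names, not the DocumentDeserializerImpl code, and the sketch assumes that a trailing LF terminates the last paragraph rather than starting an additional empty one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of LF-based paragraph splitting; not the xmerge implementation.
public class ParagraphSplitSketch {

    /** Splits Doc text into paragraphs at each LF; always returns at least one. */
    static List<String> splitParagraphs(String docText) {
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        int lf;
        while ((lf = docText.indexOf('\n', start)) >= 0) {
            paragraphs.add(docText.substring(start, lf));
            start = lf + 1;
        }
        // Keep trailing text without a final LF, and guarantee one paragraph.
        if (start < docText.length() || paragraphs.isEmpty()) {
            paragraphs.add(docText.substring(start));
        }
        return paragraphs;
    }
}
```

Because line breaks and paragraph ends are both written as LF on the way out, every LF becomes a paragraph boundary on the way back in.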
+ +Bookmarks in the Doc format will be converted to the bookmark element +<text:bookmark> [Not implemented yet].
+ + +As mentioned above, the DocumentMerger
object produced by
+PluginFactoryImpl
is DocumentMergerImpl
.
+Refer to the javadocs for that package/class on its merging specifications.
+