office-gobmx/sal
Stephan Bergmann cef555e9a4 Make ExternalReferenceUriTranslator more robust against broken UTF-8
<https://lists.freedesktop.org/archives/libreoffice/2023-November/091151.html>
"CppunitTest_stoc_uriproc failed on Windows" reports that
translateToExternal("file:///abc/%feef") produces an empty string (indicating
failure) instead of "file:///abc/%FEef" (as expected in
stoc/test/uriproc/test_uriproc.cxx) when osl_getThreadTextEncoding() is Shift
JIS.

This was due to how the call to rtl::Uri::encode in
Translator::translateToExternal (in
stoc/source/uriproc/ExternalUriReferenceTranslator.cxx) behaved:  It internally
interpreted its input "%FE" as the single-byte Shift JIS character 0xFE.  Which
gets mapped to U+2122 as an extension (see "APPLE additions over SJIS, we
convert this like Apple, because I think, this gives better result, then [sic]
we take a replacement char" in sal/textenc/tcvtjp6.tab) in readUcs4, but which
in turn doesn't get mapped back to any Shift JIS character in writeEscapeChar.

Translator::translateToExternal is the only user of
rtl_UriEncodeStrictKeepEscapes, as introduced by
6ff5d3341d "INTEGRATION: CWS c07v013_SRC680
(1.4.40); FILE MERGED: 2007/06/21 13:00:56 sb 1.4.40.1: #b6550116# Made
XExternalUriReferenceTranslator.translateToExternal more robust when the input
URL contains spurious non--UTF-8 octets like %FE (which are now copied verbatim,
instead of signalling error)."

To make the claim true that such "spurious non--UTF-8 octets like %FE" are
always "copied verbatim", regardless of text encoding being used, repurpose
rtl_UriEncodeStrictKeepEscapes to always treat any escape sequences that are
present as (potentially broken) UTF-8.
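
As a rough illustration of the "copied verbatim" behavior described above, here is a minimal standalone sketch. It is not the actual rtl::Uri::encode implementation: the helper names `neverValidInUtf8` and `copyEscapesVerbatim` are hypothetical, and the sketch assumes that keeping an escape "verbatim" means preserving its octet value while normalizing the hex digits to uppercase, as the expected result "file:///abc/%FEef" suggests.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true for octets that can never occur in well-formed
// UTF-8 (0xC0, 0xC1, and 0xF5..0xFF), such as the 0xFE from the bug report.
bool neverValidInUtf8(unsigned char octet) {
    return octet == 0xC0 || octet == 0xC1 || octet >= 0xF5;
}

// Hypothetical sketch: keep every %XX escape as the same octet, only
// normalizing its hex digits to uppercase, and copy all other characters
// unchanged. No thread text encoding is consulted, so the result cannot
// depend on it.
std::string copyEscapesVerbatim(const std::string& uri) {
    std::string out;
    out.reserve(uri.size());
    for (std::size_t i = 0; i < uri.size(); ++i) {
        const bool isEscape =
            uri[i] == '%' && i + 2 < uri.size() &&
            std::isxdigit(static_cast<unsigned char>(uri[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(uri[i + 2]));
        if (isEscape) {
            out += '%';
            out += static_cast<char>(
                std::toupper(static_cast<unsigned char>(uri[i + 1])));
            out += static_cast<char>(
                std::toupper(static_cast<unsigned char>(uri[i + 2])));
            i += 2; // skip the two hex digits just consumed
        } else {
            out += uri[i];
        }
    }
    return out;
}
```

With this sketch, the input "file:///abc/%feef" keeps its stray 0xFE octet as "%FE" and leaves the trailing "ef" untouched, whatever the thread text encoding happens to be.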

Change-Id: I0fa0b14d3e3d44e4b5514e1b73c84c407a947ce9
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/158888
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2023-11-03 17:26:58 +01:00
android
cppunittester
emscripten
inc
osl
qa
rtl
test
textenc
util
CompilerTest_sal_rtl_oustring.mk
CppunitTest_Module_DLL.mk
CppunitTest_sal_comtools.mk
CppunitTest_sal_osl.mk
CppunitTest_sal_osl_security.mk
CppunitTest_sal_retry_if_failed.mk
CppunitTest_sal_rtl.mk
CppunitTest_sal_types.mk
Executable_cppunittester.mk
Executable_osl_process_child.mk
IwyuFilter_sal.yaml
Library_lo-bootstrap.mk
Library_sal.mk
Library_sal_textenc.mk
Makefile
Module_sal.mk
README.md

System Abstraction Layer (SAL)

The system abstraction layer, made up of rtl, osl, and sal.

rtl: platform-independent strings

osl: platform-specific functionality: threads, dynamic loading, processes, IPC, etc.

Exports only a C API, plus some inline methods that are a C++-only API.