office-gobmx/include/o3tl/char16_t2wchar_t.hxx
Mike Kaganski a5a49657dc tdf#158442: fix opening hybrid PDFs on Windows
Commit 046e954595 (Try to revert to use
of file_iterator from boost on Windows, 2023-10-31) had introduced a
problem that pdfparse::PDFReader::read couldn't create file_iterator
for files already opened with write access: mmap_file_iterator ctor
on Windows passed only FILE_SHARE_READ as the dwShareMode parameter of
CreateFileA, and that failed when the file was already opened
with GENERIC_WRITE in dwDesiredAccess, which happens when opening the
stream in TypeDetection::impl_detectTypeFlatAndDeep.

Fix this by patching boost's mmap_file_iterator constructor to use
FILE_SHARE_READ | FILE_SHARE_WRITE, like we do in osl_openFile.

But there was also a pre-existing problem: the char-based CreateFileA
API cannot open files whose names are not representable in the
current Windows codepage. Such hybrid PDF files would still fail to
create the file_iterator, and would open as PDF.

Fix that by further patching boost to have wstring-based constructors
for file_iterator and mmap_file_iterator on Windows, which would call
CreateFileW.

Change-Id: Ib190bc090636159ade390b3dd120957d06d7b89b
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/160218
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
2023-12-01 16:13:38 +01:00

48 lines
2.3 KiB
C++

/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4; fill-column: 100 -*- */
/*
* This file is part of the LibreOffice project.
*
* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/.
*/
#pragma once
#include <sal/config.h>
#include <string_view>
namespace o3tl
{
#if defined _WIN32
// Helpers for safe conversion between wchar_t and char16_t in MSVC
static_assert(sizeof(char16_t) == sizeof(wchar_t),
"These helper functions are only applicable to implementations with 16-bit wchar_t");
// While other implementations define wchar_t as a 32-bit integral type, and mostly use
// char-based UTF-8 string APIs, in MSVC wchar_t is (non-conformantly) 16-bit, and Unicode
// support is implemented by Unicode-aware WinAPI functions taking UTF-16 LE strings,
// and also by stdlib functions taking those strings.
//
// In LibreOffice, the internal string representation is also UTF-16 with system endianness
// (sal_Unicode, which is a typedef for char16_t); so it is an important implementation concept
// to treat internal strings as directly usable by WinAPI/stdlib functions and vice versa.
// It is also important to convert safely between the unrelated underlying C++ types
// used for MSVC and LibreOffice string storage, without a plain reinterpret_cast, which
// risks masking errors like casting char buffers to wchar_t/char16_t.
//
// Use these helpers for wchar_t (WSTR, WCHAR, OLESTR etc.) to char16_t (sal_Unicode) string
// conversions instead of reinterpret_cast in Windows-specific code.
inline wchar_t* toW(char16_t* p) { return reinterpret_cast<wchar_t*>(p); }
inline wchar_t const* toW(char16_t const* p) { return reinterpret_cast<wchar_t const*>(p); }
inline char16_t* toU(wchar_t* p) { return reinterpret_cast<char16_t*>(p); }
inline char16_t const* toU(wchar_t const* p) { return reinterpret_cast<char16_t const*>(p); }
inline std::u16string_view toU(std::wstring_view v) { return { toU(v.data()), v.size() }; }
inline std::wstring_view toW(std::u16string_view v) { return { toW(v.data()), v.size() }; }
#endif
}
/* vim:set shiftwidth=4 softtabstop=4 expandtab cinoptions=b1,g0,N-s cinkeys+=0=break: */
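
The comments in the header argue that typed helpers are safer than a plain reinterpret_cast because they cannot silently accept a char buffer. The following is a minimal sketch of that property, not part of the header itself: the namespace `sketch` and the traits `accepts_toW` / `accepts_toU` are hypothetical names introduced for illustration, and the two pointer overloads are local copies so that the example compiles outside Windows (where the real helpers in `o3tl` are not defined).

```cpp
#include <type_traits>
#include <utility>

namespace sketch
{
// Local copies of two pointer overloads from the header, reproduced only so that
// this example is self-contained; the real ones live in namespace o3tl.
inline wchar_t const* toW(char16_t const* p) { return reinterpret_cast<wchar_t const*>(p); }
inline char16_t const* toU(wchar_t const* p) { return reinterpret_cast<char16_t const*>(p); }

// Hypothetical detection traits: value is true if toW(T) / toU(T) is a valid call.
template <typename T, typename = void> struct accepts_toW : std::false_type
{
};
template <typename T>
struct accepts_toW<T, std::void_t<decltype(sketch::toW(std::declval<T>()))>> : std::true_type
{
};

template <typename T, typename = void> struct accepts_toU : std::false_type
{
};
template <typename T>
struct accepts_toU<T, std::void_t<decltype(sketch::toU(std::declval<T>()))>> : std::true_type
{
};

// The intended conversions are accepted...
static_assert(accepts_toW<char16_t const*>::value, "char16_t strings convert to wchar_t");
static_assert(accepts_toU<wchar_t const*>::value, "wchar_t strings convert to char16_t");

// ...while a narrow char buffer, or a pointer that already has the target type,
// is rejected at compile time. A bare reinterpret_cast would accept all of these.
static_assert(!accepts_toW<char const*>::value, "char buffers must not be accepted");
static_assert(!accepts_toU<char const*>::value, "char buffers must not be accepted");
static_assert(!accepts_toW<wchar_t const*>::value, "already wchar_t, wrong helper");
}
```

In Windows-specific code this means a call such as `CreateFileW(o3tl::toW(path), ...)` compiles only when `path` really is a char16_t string (here `path` stands for any hypothetical `char16_t const*`), whereas passing a `char const*` by mistake becomes a compile error rather than a runtime failure.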