office-gobmx/external/clucene/patches/heap-buffer-overflow.patch
Stephan Bergmann 92b7e0fd66 external/clucene: Avoid heap-buffer-overflow
...as seen during a --with-lang=ALL build with ASan on Linux:

> [XHC] nlpsolver ja
> =================================================================
> ==51396==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000ed00 at pc 0x7fe425640f53 bp 0x7ffd6a0cc900 sp 0x7ffd6a0cc8f8
> READ of size 4 at 0x62100000ed00 thread T0
>  #0 in lucene::analysis::cjk::CJKTokenizer::next(lucene::analysis::Token*) at workdir/UnpackedTarball/clucene/src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp:70:19
>  #1 in lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field*, lucene::analysis::Analyzer*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:901:32
>  #2 in lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:798:9
>  #3 in lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriterThreadState.cpp:557:24
>  #4 in lucene::index::DocumentsWriter::updateDocument(lucene::document::Document*, lucene::analysis::Analyzer*, lucene::index::Term*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:946:16
>  #5 in lucene::index::DocumentsWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/DocumentsWriter.cpp:930:10
>  #6 in lucene::index::IndexWriter::addDocument(lucene::document::Document*, lucene::analysis::Analyzer*) at workdir/UnpackedTarball/clucene/src/core/CLucene/index/IndexWriter.cpp:681:28
>  #7 in HelpIndexer::indexDocuments() at helpcompiler/source/HelpIndexer.cxx:66:20
>  #8 in main at helpcompiler/source/HelpIndexer_main.cxx:79:22
> 0x62100000ed00 is located 0 bytes to the right of 4096-byte region [0x62100000dd00,0x62100000ed00)
> allocated by thread T0 here:
>  #0 in realloc at /data/sbergman/github.com/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
>  #1 in lucene::util::StreamBuffer<wchar_t>::setSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:114:17
>  #2 in lucene::util::StreamBuffer<wchar_t>::makeSpace(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_streambuffer.h:150:5
>  #3 in lucene::util::BufferedStreamImpl<wchar_t>::setMinBufSize(int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/_bufferedstream.h:69:16
>  #4 in lucene::util::SimpleInputStreamReader::Internal::JStreamsBuffer::JStreamsBuffer(lucene::util::CLStream<signed char>*, int) at workdir/UnpackedTarball/clucene/src/core/CLucene/util/Reader.cpp:375:6
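
The addresses bracketing the region differ by 0x1000 bytes, and the faulting address equals the region's exclusive end. The following compile-time C++ sketch reproduces that arithmetic. It assumes `sizeof(wchar_t) == 4`, as on Linux, and `elementIndexOf` is a hypothetical helper that is not part of CLucene or ASan. It shows that the 4-byte read targets element 1024 of a 1024-element buffer, one past the last valid element:

```cpp
#include <cstdint>

// Hypothetical helper (not CLucene/ASan API): index of the element that a
// faulting address would address within a heap block of equally sized elements.
constexpr std::uint64_t elementIndexOf(std::uint64_t blockStart,
                                       std::uint64_t faultAddr,
                                       std::uint64_t elemSize) {
    return (faultAddr - blockStart) / elemSize;
}

// Addresses copied from the ASan report above.
constexpr std::uint64_t kBlockStart = 0x62100000dd00;
constexpr std::uint64_t kBlockEnd   = 0x62100000ed00;  // exclusive end of region
constexpr std::uint64_t kFaultAddr  = 0x62100000ed00;
// Assumption: sizeof(wchar_t) == 4 (true on Linux); matches "READ of size 4".
constexpr std::uint64_t kElemSize   = 4;

constexpr std::uint64_t kCapacity = (kBlockEnd - kBlockStart) / kElemSize;

static_assert(kBlockEnd - kBlockStart == 4096, "the 4096-byte region");
static_assert(kCapacity == 1024, "room for 1024 four-byte elements");
// The read hits index == capacity: a one-past-the-end access.
static_assert(elementIndexOf(kBlockStart, kFaultAddr, kElemSize) == kCapacity,
              "faulting read is one past the last valid element");
```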

Note that this is not a proper fix, which would need to detect surrogate
pairs split across buffer boundaries.  But, for one, the comment says
"however, gunichartables doesn't seem to classify any of the surrogates as
alpha, so they are skipped anyway", and, for another, the behavior until now
was to replace the high surrogate with something that was likely garbage and
to leave the low surrogate at the start of the next buffer (if any) alone, so
leaving both surrogates alone is likely no worse than before.
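
The proper fix mentioned above would hold on to a high surrogate that ends one buffer until the next buffer arrives. The following C++ sketch illustrates that idea under stated assumptions. The input is UTF-16 code units, as with a 16-bit `wchar_t` on Windows (on Linux `wchar_t` is 32 bits). `ChunkedUtf16Decoder` and its `feed`/`finish` methods are hypothetical and are not CLucene code. The decoder never reads beyond `data[len - 1]`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not CLucene code): decodes UTF-16 code units that arrive
// in chunks, carrying a trailing high surrogate over to the next chunk instead
// of peeking past the end of the current one.
class ChunkedUtf16Decoder {
public:
    // Appends the code points decoded from one chunk to `out`.
    void feed(const std::uint16_t* data, std::size_t len, std::vector<char32_t>& out) {
        assert(data != nullptr || len == 0);
        std::size_t i = 0;
        if (pendingHigh_ != 0 && len > 0) {
            if (isLow(data[0])) {
                out.push_back(combine(pendingHigh_, data[0]));  // reassemble split pair
                i = 1;
            } else {
                out.push_back(pendingHigh_);  // unpaired high surrogate: pass through
            }
            pendingHigh_ = 0;
        }
        while (i < len) {
            std::uint16_t c = data[i++];
            if (isHigh(c)) {
                if (i == len) {      // partner may be in the next chunk: defer,
                    pendingHigh_ = c;  // never read data[len]
                    break;
                }
                if (isLow(data[i])) {
                    out.push_back(combine(c, data[i++]));
                    continue;
                }
            }
            out.push_back(c);  // BMP character or unpaired surrogate
        }
    }

    // Flushes a high surrogate left over at the end of the whole input.
    void finish(std::vector<char32_t>& out) {
        if (pendingHigh_ != 0) {
            out.push_back(pendingHigh_);
            pendingHigh_ = 0;
        }
    }

private:
    static bool isHigh(std::uint16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
    static bool isLow(std::uint16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }
    static char32_t combine(std::uint16_t hi, std::uint16_t lo) {
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
    }

    std::uint16_t pendingHigh_ = 0;  // 0 means "none": 0 is never a surrogate
};
```

Feeding `{0x0041, 0xD83D}` and then `{0xDE00, 0x0042}` yields U+0041, U+1F600, U+0042. The split pair is reassembled instead of being completed by a read past the end of the first chunk.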

Change-Id: Ib6f6f1bc20ef8efe0418bf2e715783c8555068de
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/92792
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2020-04-23 20:36:26 +02:00


--- src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
+++ src/contribs-lib/CLucene/analysis/cjk/CJKAnalyzer.cpp
@@ -66,7 +66,7 @@
//ucs4(c variable). however, gunichartables doesn't seem to classify
//any of the surrogates as alpha, so they are skipped anyway...
//so for now we just convert to ucs4 so that we dont corrupt the input.
- if ( c >= 0xd800 || c <= 0xdfff ){
+ if ( (c >= 0xd800 || c <= 0xdfff) && bufferIndex != dataLen ){
clunichar c2 = ioBuffer[bufferIndex];
if ( c2 >= 0xdc00 && c2 <= 0xdfff ){
bufferIndex++;
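
The unchanged test `c >= 0xd800 || c <= 0xdfff` is true for every value of `c`: any value is either at least 0xd800, or below it and therefore at most 0xdfff. The lookahead read of `ioBuffer[bufferIndex]` is therefore reached for every character, not only for surrogates. Assuming `bufferIndex` already points one past `c` at this point, the last character of a full buffer triggers a read at index `dataLen`, just past the heap block. The added `bufferIndex != dataLen` guard prevents exactly that read.

The C++ sketch below isolates this logic. `originalTest`, `mayReadNext`, `isHighSurrogate` and `combineIfPair` are hypothetical names. `combineIfPair` uses a narrower high-surrogate range (0xd800 to 0xdbff), which is an assumption of this sketch and not something the patch adopts:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the CJKTokenizer locals in the diff:
// buf ~ ioBuffer, len ~ dataLen, idx ~ bufferIndex (assumed one past c).

// The test as it appears in the diff: true for every c.
bool originalTest(std::uint32_t c) { return c >= 0xd800 || c <= 0xdfff; }

// Mirrors the patched condition: only look ahead if a next element exists.
bool mayReadNext(std::uint32_t c, int idx, int len) {
    return originalTest(c) && idx != len;
}

// Assumption (not in the patch): a high-surrogate-only range check.
bool isHighSurrogate(std::uint32_t c) { return c >= 0xd800 && c <= 0xdbff; }

// Combines c with a following low surrogate if one is available; otherwise
// returns c unchanged. Never reads buf[len].
std::uint32_t combineIfPair(std::uint32_t c, const std::uint32_t* buf, int& idx, int len) {
    assert(idx <= len);
    if (isHighSurrogate(c) && idx != len) {
        std::uint32_t c2 = buf[idx];
        if (c2 >= 0xdc00 && c2 <= 0xdfff) {
            ++idx;  // consume the low surrogate
            return (((c & 0x3ff) << 10) | (c2 & 0x3ff)) + 0x10000;
        }
    }
    return c;
}
```

With `idx == len`, `combineIfPair` returns `c` unchanged and leaves `idx` alone. This matches the patch's choice to leave a surrogate at the end of a buffer as is.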