Merged
Changes from 12 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -39,6 +39,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - Bump `com.nimbusds:nimbus-jose-jwt` from 10.3 to 10.4.2 ([#19099](https://github.com/opensearch-project/OpenSearch/pull/19099), [#19101](https://github.com/opensearch-project/OpenSearch/pull/19101))
 - Bump netty from 4.1.121.Final to 4.1.124.Final ([#19103](https://github.com/opensearch-project/OpenSearch/pull/19103))
 - Bump google cloud storage from 1.113.1 to 2.55.0 ([#4547](https://github.com/opensearch-project/OpenSearch/pull/4547))
+- Bump tika from 2.9.2 to 3.2.2 ([#19125](https://github.com/opensearch-project/OpenSearch/pull/19125))
+- Bump commons-compress from 1.26.1 to 1.28.0 ([#19125](https://github.com/opensearch-project/OpenSearch/pull/19125))

 ### Deprecated

7 changes: 6 additions & 1 deletion distribution/tools/plugin-cli/build.gradle
@@ -81,5 +81,10 @@ thirdPartyAudit.ignoreMissingClasses(
   'org.tukaani.xz.XZOutputStream',
   'org.apache.commons.codec.digest.PureJavaCrc32C',
   'org.apache.commons.codec.digest.XXHash32',
-  'org.apache.commons.lang3.reflect.FieldUtils'
+  'org.apache.commons.lang3.reflect.FieldUtils',
+  'org.apache.commons.lang3.ArrayFill',
+  'org.apache.commons.lang3.ArrayUtils',
+  'org.apache.commons.lang3.StringUtils',
+  'org.apache.commons.lang3.SystemProperties',
+  'org.apache.commons.lang3.function.Suppliers'
 )
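
The entries above name commons-lang3 classes that the audited jars reference but that are not on this module's classpath. As a rough illustration only, the Java sketch below probes the runtime classpath for a list of class names. It is not how `thirdPartyAudit` itself works; `MissingClassProbe` and `findMissing` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which class names cannot be loaded from the current classpath. */
final class MissingClassProbe {

    /** Returns the names that fail to resolve. Classes are loaded without being initialized. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = MissingClassProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Class names taken from the ignoreMissingClasses list in the diff above.
        List<String> names = List.of(
            "java.lang.String",
            "org.apache.commons.lang3.ArrayFill",
            "org.apache.commons.lang3.function.Suppliers"
        );
        System.out.println("missing: " + findMissing(names));
    }
}
```

Running it with and without commons-lang3 on the classpath shows which of the listed names would resolve at runtime.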

This file was deleted.

@@ -0,0 +1 @@
e482f2c7a88dac3c497e96aa420b6a769f59c8d7
2 changes: 1 addition & 1 deletion gradle/libs.versions.toml
@@ -50,7 +50,7 @@ httpasyncclient = "4.1.5"
 commonslogging = "1.2"
 commonscodec = "1.18.0"
 commonslang = "3.18.0"
-commonscompress = "1.26.1"
+commonscompress = "1.28.0"
 commonsio = "2.16.0"
 # plugin dependencies
 aws = "2.30.31"
7 changes: 4 additions & 3 deletions plugins/ingest-attachment/build.gradle
@@ -38,8 +38,8 @@ opensearchplugin {
 }

 versions << [
-  'tika' : '2.9.2',
-  'pdfbox': '2.0.31',
+  'tika' : '3.2.2',
+  'pdfbox': '3.0.5',
   'poi' : '5.4.1',
   'mime4j': '0.8.11'
 ]
@@ -75,10 +75,11 @@ dependencies {

   // external parser libraries
   // HTML
-  api 'org.ccil.cowan.tagsoup:tagsoup:1.2.1'
+  api 'org.jsoup:jsoup:1.20.1'
   // Adobe PDF
   api "org.apache.pdfbox:pdfbox:${versions.pdfbox}"
   api "org.apache.pdfbox:fontbox:${versions.pdfbox}"
+  api "org.apache.pdfbox:pdfbox-io:${versions.pdfbox}"
   api "org.apache.pdfbox:jempbox:1.8.17"
   api "commons-logging:commons-logging:${versions.commonslogging}"
   // OpenOffice

This file was deleted.

@@ -0,0 +1 @@
e482f2c7a88dac3c497e96aa420b6a769f59c8d7

This file was deleted.

@@ -0,0 +1 @@
b4a068e1dba2b9832a108cdf6e9a3249680e3ce8
1 change: 1 addition & 0 deletions plugins/ingest-attachment/licenses/jsoup-1.20.1.jar.sha1
@@ -0,0 +1 @@
769377896610be1736f8d6d51fc52a6042d1ce82
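
Each `.sha1` file in `licenses/` contains a 40-character hex string, which appears to be the SHA-1 digest of the corresponding jar. A minimal Java sketch for checking a downloaded jar against such a file, assuming that interpretation, is shown below. `Sha1Check`, `sha1Hex`, and `matches` are hypothetical names, and the build's own verification may work differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares a file's SHA-1 digest with a recorded hex string. */
final class Sha1Check {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True when the digest of the bytes equals the recorded value, ignoring case and surrounding whitespace. */
    static boolean matches(byte[] data, String recordedHex) {
        return sha1Hex(data).equalsIgnoreCase(recordedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // Usage (paths are supplied by the caller): java Sha1Check.java <jar> <sha1-file>
        byte[] jar = Files.readAllBytes(Path.of(args[0]));
        String recorded = Files.readString(Path.of(args[1]));
        System.out.println(matches(jar, recorded) ? "checksum OK" : "checksum MISMATCH");
    }
}
```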
21 changes: 21 additions & 0 deletions plugins/ingest-attachment/licenses/jsoup-LICENSE.txt
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2009-2025 Jonathan Hedley <https://jsoup.org/>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

This file was deleted.

1 change: 1 addition & 0 deletions plugins/ingest-attachment/licenses/pdfbox-3.0.5.jar.sha1
@@ -0,0 +1 @@
c34109061c3a0d85d871d9edc469ac0682f81856
@@ -0,0 +1 @@
402151a8d1aa427ea879cc7160e9227e9f5088ba
@@ -1,3 +1,4 @@

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -199,3 +200,145 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

EXTERNAL COMPONENTS

Apache PDFBox includes a number of components with separate copyright notices
and license terms. Your use of these components is subject to the terms and
conditions of the following licenses.

Contributions made to the original PDFBox and FontBox projects:

Copyright (c) 2002-2007, www.pdfbox.org
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of pdfbox; nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

Adobe Font Metrics (AFM) for PDF Core 14 Fonts

This file and the 14 PostScript(R) AFM files it accompanies may be used,
copied, and distributed for any purpose and without charge, with or without
modification, provided that all copyright notices are retained; that the
AFM files are not distributed without this file; that all modifications
to this file or any of the AFM files are prominently noted in the modified
file(s); and that this paragraph is not modified. Adobe Systems has no
responsibility or obligation to support the use of the AFM files.

CMaps for PDF Fonts (http://opensource.adobe.com/wiki/display/cmap/Downloads)

Copyright 1990-2009 Adobe Systems Incorporated.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

Neither the name of Adobe Systems Incorporated nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.

PaDaF PDF/A preflight (http://sourceforge.net/projects/padaf)

Copyright 2010 Atos Worldline SAS

Licensed by Atos Worldline SAS under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
Atos Worldline SAS licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

OSXAdapter

Version: 2.0

Disclaimer: IMPORTANT: This Apple software is supplied to you by
Apple Inc. ("Apple") in consideration of your agreement to the
following terms, and your use, installation, modification or
redistribution of this Apple software constitutes acceptance of these
terms. If you do not agree with these terms, please do not use,
install, modify or redistribute this Apple software.

In consideration of your agreement to abide by the following terms, and
subject to these terms, Apple grants you a personal, non-exclusive
license, under Apple's copyrights in this original Apple software (the
"Apple Software"), to use, reproduce, modify and redistribute the Apple
Software, with or without modifications, in source and/or binary forms;
provided that if you redistribute the Apple Software in its entirety and
without modifications, you must retain this notice and the following
text and disclaimers in all such redistributions of the Apple Software.
Neither the name, trademarks, service marks or logos of Apple Inc.
may be used to endorse or promote products derived from the Apple
Software without specific prior written permission from Apple. Except
as expressly stated in this notice, no other rights or licenses, express
or implied, are granted by Apple herein, including but not limited to
any patent rights that may be infringed by your derivative works or by
other works in which the Apple Software may be incorporated.

The Apple Software is provided by Apple on an "AS IS" basis. APPLE
MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND
OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS.

IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION,
MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED
AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE),
STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

Copyright (C) 2003-2007 Apple, Inc., All Rights Reserved
22 changes: 22 additions & 0 deletions plugins/ingest-attachment/licenses/pdfbox-io-NOTICE.txt
@@ -0,0 +1,22 @@
Apache PDFBox
Copyright 2014 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).

Based on source code originally developed in the PDFBox and
FontBox projects.

Copyright (c) 2002-2007, www.pdfbox.org

Based on source code originally developed in the PaDaF project.
Copyright (c) 2010 Atos Worldline SAS

Includes the Adobe Glyph List
Copyright 1997, 1998, 2002, 2007, 2010 Adobe Systems Incorporated.

Includes the Zapf Dingbats Glyph List
Copyright 2002, 2010 Adobe Systems Incorporated.

Includes OSXAdapter
Copyright (C) 2003-2007 Apple, Inc., All Rights Reserved

This file was deleted.

This file was deleted.

@@ -0,0 +1 @@
f1f16ecac7a81e145051f906927ea6b58ce7e914

This file was deleted.

@@ -0,0 +1 @@
3ee2907773fe2aaa1013829e00cd62778d6a2ff9

This file was deleted.

@@ -0,0 +1 @@
fde21727740a39beead899c9ca6e642f92d86e3a

This file was deleted.

@@ -0,0 +1 @@
e6acd314da558703977a681661c215f3ef92dbbd

This file was deleted.

@@ -0,0 +1 @@
41ff68abccde91ab17d7b181eb7a5fccf16e8b5c

This file was deleted.

@@ -0,0 +1 @@
d4078f950ca55c5235cdfcad744235242f9edc05

This file was deleted.

@@ -0,0 +1 @@
a972d70ef0762b460c048c5e0e8a46c46bb170aa

This file was deleted.

@@ -0,0 +1 @@
a19be47ecca1a061349dc2d019ab6f2741ff1dee

This file was deleted.

@@ -0,0 +1 @@
9dd2f1c52ab2663600e82dae3a8003ce6ede372f

This file was deleted.

@@ -0,0 +1 @@
f1dfa02a2c672153013d44501e0c21d5682aa822

This file was deleted.

@@ -0,0 +1 @@
d46b71ea5697f575c3febfd7343e5d8b2c338bd5

This file was deleted.

@@ -0,0 +1 @@
c91fb85f5ee46e2c1f1e3399b04efb9d1ff85485
@@ -91,7 +91,7 @@ final class TikaImpl {
     /** subset of parsers for types we support */
     private static final Parser PARSERS[] = new Parser[] {
         // documents
-        new org.apache.tika.parser.html.HtmlParser(),
+        new org.apache.tika.parser.html.JSoupParser(),
         new org.apache.tika.parser.pdf.PDFParser(),
         new org.apache.tika.parser.txt.TXTParser(),
         new org.apache.tika.parser.microsoft.rtf.RTFParser(),
@@ -43,4 +43,7 @@ grant {
   permission java.lang.RuntimePermission "accessDeclaredMembers";
   // PDFBox checks for the existence of this class
   permission java.lang.RuntimePermission "accessClassInPackage.sun.java2d.cmm.kcms";
+  permission java.io.FilePermission "/System/Library/Fonts/-", "read";
+  permission java.io.FilePermission "/Library/Fonts/-", "read";
+  permission java.io.FilePermission "/usr/share/fonts/-", "read";
 };
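
The three new grants end in `/-`, which in `java.io.FilePermission` syntax covers every file below the directory, recursively, and only for the listed action (`read`). The Java sketch below demonstrates that matching rule with `FilePermission.implies`. It assumes a Unix-like path separator; `FontPermissionDemo` and `grantsRead` are hypothetical names, and the font file path is made up for illustration.

```java
import java.io.FilePermission;

/** Hypothetical demo of how a recursive "/-" FilePermission grant matches individual paths. */
final class FontPermissionDemo {

    /** True when a read grant on grantedPath covers a read of requestedFile. */
    static boolean grantsRead(String grantedPath, String requestedFile) {
        FilePermission granted = new FilePermission(grantedPath, "read");
        return granted.implies(new FilePermission(requestedFile, "read"));
    }

    public static void main(String[] args) {
        String grant = "/usr/share/fonts/-";
        // A file nested below the granted directory (illustrative path).
        System.out.println(grantsRead(grant, "/usr/share/fonts/truetype/example/Example.ttf"));
        // A file outside the granted directory.
        System.out.println(grantsRead(grant, "/etc/passwd"));
    }
}
```

The grant stays narrow: it allows reads under the system font directories and nothing else, and it does not imply write access.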