Releases: apache/orc
v1.8.0
Milestone
Changelog
New Feature and Notable Changes
- ORC-450 Support selecting list indices without materializing list items
- ORC-824 Add column statistics for List and Map
- ORC-1004 Java ORC writer supports the selection vector
- ORC-1075 Support reading ORC files with no column statistics
- ORC-1125 Support decoding decimals in RLE
- ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
- ORC-1138 Seek vs Read Optimization
- ORC-1172 Add row count limit config for one stripe
- ORC-1212 Upgrade protobuf-java to 3.17.3
- ORC-1220 Set min.hadoop.version to 2.7.3
- ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
- ORC-1256 Publish test-jar to maven central
- ORC-1260 Publish shaded-protobuf classifier artifacts
Improvement
- ORC-825 Use Empty Array For Collections toArray
- ORC-826 Do Not Use Collection Contains/Get
- ORC-828 Improve Fetch Data Set Process
- ORC-829 Optimize Serialization percentileBits
- ORC-831 Do Not Copy String When Flushing Dictionary
- ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
- ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
- ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
- ORC-836 StringGroupFromDoubleTreeReader Use Double toString
- ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
- ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
- ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
- ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
- ORC-842 Remove newKey from StringHashTableDictionary
- ORC-844 Improve hashCode Methods
- ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
- ORC-852 Allow DynamicByteArray to Return a ByteBuffer
- ORC-853 Optimize writeDouble Implementation
- ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
- ORC-865 Bump opencsv from 3.9 to 5.5.1
- ORC-883 Dependency Audit and QA
- ORC-897 Optimize loop termination condition in readerIsCompatible method
- ORC-935 Bump commons-csv from 1.8 to 1.9.0
- ORC-937 Replace deprecated method
- ORC-958 Convert command support overwrite option
- ORC-969 Evaluate SearchArguments using file and stripe level stats
- ORC-975 Avoid double counting closestFixedBits in percentileBits method
- ORC-982 Extract checkstyle to a single file, help newcomers check code style
- ORC-988 Bump opencsv from 5.5.1 to 5.5.2
- ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
- ORC-1005 Make the Java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 consistent
- ORC-1007 Fix a warning from the shade plugin
- ORC-1013 Renaming a parameter in constructors of TreeWriter's derived classes
- ORC-1014 Add details when we get IOExceptions from file system
- ORC-1020 Improve orc::RleDecoderV2::nextDirect
- ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
- ORC-1047 Handle quoted field names during string schema parsing
- ORC-1077 Remove commons-codec dependency and use java.util.Base64
- ORC-1099 Extend ReadIntent to support MAP and UNION type
- ORC-1101 Improve malformed STRUCT handling
- ORC-1122 Add buffer to decode the whole run in RleDecoderV2
- ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
- ORC-1149 Bump slf4j.version to 1.7.36
- ORC-1150 Improve RowReaderImpl::computeBatchSize()
- ORC-1152 Support encoding short decimals in RLEv2
- ORC-1156 Update opencsv to 5.6
- ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
- ORC-1169 Use Hadoop 3.3.2 on Java 17+
- ORC-1178 Use hadoop 3.3.3 on Java 17+
Bug
- ORC-845 Fix NPE in DynamicIntArray toString
- ORC-929 Fix NaN at orc-tools 'meta' command
- ORC-1129 The build of tool-test should depend on cpp tools
- ORC-1159 Crash when the last stripe is skipped
- ORC-1242 Bump threeten-extra to 1.7.1
Test
- ORC-860 Add dependabot
- ORC-864 Bump jackson.version from 2.12.2 to 2.12.4
- ORC-877 Bump junit-vintage-engine from 5.7.0 to 5.7.2
- ORC-888 Bump objenesis from 3.1 to 3.2
- ORC-905 Add an integration test for example
- ORC-917 Bump mockito-core from 3.7.0 to 3.11.2
- ORC-919 Spark bench objenesis should be the same as Spark.
- ORC-920 Use junit.version and mockito.version property and bump junit to 5.7.2
- ORC-925 Simplify assertions
- ORC-928 Bump checkstyle from 8.44 to 8.45.1
- ORC-932 Bump byte-buddy from 1.10.19 to 1.11.12
- ORC-934 Add integration tests for Java bench
- ORC-940 Use Hadoop 3.3.1 in bench module
- ORC-955 Add Javadoc generation GitHub Action job
- ORC-963 Build benchmark module always for integration testing
- ORC-966 Bump byte-buddy from 1.11.12 to 1.11.13
- ORC-967 Bump mockito.version from 3.11.2 to 3.12.1
- ORC-986 Bump mockito.version from 3.12.1 to 3.12.4
- ORC-987 Bump jackson.version from 2.12.4 to 2.12.5
- ORC-1001 Bump maven-enforcer-plugin to 3.0.0
- ORC-1019 Remove redundant jackson dependencies
- ORC-1022 Bump byte-buddy from 1.11.13 to 1.11.19
- ORC-1038 Bump mockito.version from 3.12.4 to 4.0.0
- ORC-1074 Bump byte-buddy from 1.11.19 to 1.12.6
- ORC-1079 Add Linux clang GitHub Action job
- ORC-1085 Bump auto-service from 1.0 to 1.0.1
- ORC-1089 Add test cases verifying writers with selected vector
- ORC-1104 Use Spark 3.2.1 in benchmark
- ORC-1107 Fix NPE at benchmark data schema loading
- ORC-1110 Bump mockito.version from 4.0.0 to 4.3.1
- ORC-1126 Bump byte-buddy from 1.12.6 to 1.12.8
- ORC-1139 Benchmark for Seek vs Read
- ORC-1141 Bump mockito.version from 4.3.1 to 4.4.0
- ORC-1145 Add Java 18 to GitHub Action CI.
- ORC-1153 Bump byte-buddy from 1.12.8 to 1.12.9
- ORC-1157 Update guava to 31.1-jre
- ORC-1168 Update byte-buddy to 1.12.10
- ORC-1177 Upgrade mockito.version to 4.5.1
- ORC-1179 Upgrade checkstyle to 10.2 on Java 11+
- ORC-1187 Use main instead of master in merge_orc_pr.py
- ORC-1194 Bump mockito.version to 4.6.0
- ORC-1195 Bump checkstyle to 10.3
- ORC-1196 Add spark benchmark integration tests to GHA
- ORC-1197 Bump mockito.version from 4.6.0 to 4.6.1
- ORC-1201 Remove Debian 9 from Docker Tests
- ORC-1203 Bump maven-enforcer-plugin to 3.1.0
- ORC-1206 Bump netty-all to 4.1.78.Final
- ORC-1207 Upgrade Spark to 3.3.0
- ORC-1208 Bump byte-buddy to 1.12.12
- ORC-1209 Bump checkstyle to 10.3.1
- ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
- ORC-1236 Bump checkstyle to 10.3.2
- ORC-1243 Bump byte-buddy to 1.12.13
- ORC-1253 Add Fedora 37 docker test
- ORC-1254 Add spotbugs check
Task
- ORC-868 Pin gson to 2.2.4
- ORC-869 Pin jmh 1.20
- ORC-872 Bump kryo-shaded from 3.0.3 to 4.0.2
- ORC-874 Bump zookeeper from 3.6.2 to 3.7.0
- ORC-884 Bump jettison from 1.1 to 1.4.1
- ORC-887 Remove ORC Twitter link from news page
- ORC-890 Pin minimum support Hadoop version to 2.2.0
- ORC-892 Pin scala-library to 2.12.10
- ORC-898 Bump threeten-extra from 1.5.0 to 1.7.0
- ORC-899 Archive Apache ORC 1.4.x in releases page
- ORC-900 Update doap_orc.rdf for Apache Projects page
- ORC-908 Use https instead of http for website links in pom.xml
- ORC-914 Pin maven-dependency-plugin to 3.1.2
- ORC-916 Bump annotations from 17.0.0 to 21.0.1
- ORC-918 Pin protobuf-java to 2.5.0
- ORC-923 Bump apache from 23 to 24
- ORC-946 Unified json library
- ORC-949 Add CustomImportOrder rule
- ORC-956 Bump annotations from 21.0.1 to 22.0.0
- ORC-977 Update webpages and TestVectorOrcFile.java to be more neutral
- ORC-1045 Bump commons-cli to 1.5
- ORC-1056 Bump annotations from 22.0.0 to 23.0.0
- ORC-1103 Use Maven 3.8.4
- ORC-1140 Documentation for Seek vs Read
- ORC-1158 Add notification settings to .asf.yaml
- ORC-1162 Fix Apache Project Website Checks Warning
- ORC-1165 Enable GitHub Action in branch-1.8
- ORC-1166 Enable snapshot publishing in branch-1.8
- ORC-1171 Skip build and test on docker and site updates
- ORC-1173 Pin jodd-core to 3.5.2
- ORC-1176 Upgrade maven-jar-plugin to 3.2.2
- ORC-1185 Add merge_orc_pr.py
- ORC-1210 Upgrade maven to 3.8.6
- ORC-1216 Pin org.jetbrains.annotations dependency to 17.0.0
- ORC-1211 Upgrade maven-assembly-plugin to 3.4.0
- ORC-1214 Bump maven-assembly-plugin to 3.4.1
- ORC-1217 Downgrade org.jetbrains.annotations to 17.0.0
- ORC-1223 Move DirectDecompressWrapper to org.apache.orc.impl
- ORC-1224 Move getDecompressor to HadoopShimsCurrent
- ORC-1226 Add a deprecation warning for Hadoop 2.7.2 and below
- ORC-1229 Move KeyProviderImpl to org.apache.orc.impl
- ORC-1230 Move encryption utility functions to HadoopShimsCurrent
- ORC-1246 Revamp ORC Website
- ORC-1247 Improve Apache ORC website and docs
- ORC-1249 Move site/_docs/releases.md to site/releases/index.md
- ORC-1255 Fix ORC website navbar highlight
- ORC-1257 Publish multi-architecture ORC-dev docker images
- ORC-1261 Rename shaded pattern com.google.protobuf25 to org.apache.orc.protobuf
- ORC-1263 Add decimal type to ORC Website
- ORC-1221 Move NullKeyProvider to org.apache.orc.impl
v1.7.6
Milestone
Changelog
Bug Fixes
- ORC-1204: ORC MapReduce writer to flush when long arrays
- ORC-1205: nextVector should invoke ensureSize when reusing vectors
- ORC-1215: Remove a wrong NotNull annotation on value of setAttribute
- ORC-1222: Upgrade tools.hadoop.version to 2.10.2
- ORC-1227: Use Constructor.newInstance instead of Class.newInstance
- ORC-1228: Fix setAttribute to handle null value
Tests
- ORC-932: Bump byte-buddy from 1.10.19 to 1.11.12 (#842)
- ORC-1169: Use Hadoop 3.3.2 on Java 17+ (#1113)
- ORC-1178: Use Hadoop 3.3.3 on Java 17+ (#1129)
- ORC-1193: Bump parquet.version to 1.12.3
- ORC-1207: Upgrade Spark to 3.3.0
- ORC-1210: Upgrade maven to 3.8.6
- ORC-1234: Upgrade objenesis to 3.2 in Spark benchmark
- ORC-1235: Bump avro.version to 1.11.1
- ORC-1240: Update site README to use apache/orc-dev DockerHub image
- ORC-1241: Use apache/orc-dev DockerHub repository in Docker tests
- ORC-1244: Upgrade byte-buddy to 1.12.13 in branch-1.7
- ORC-1245: Use Hadoop 3.3.4 on Java 17+ and benchmark
Documentation
- MINOR: Update DOAP with new releases (#1127)
- ORC-900: Update doap_orc.rdf for Apache Projects page (#806)
- ORC-1231: Update supported OS list in building.md
- ORC-1237: Remove a wrong image link to article-footer.png
- ORC-1238: Update DOAP with 1.7.5
Task
v1.7.5
Milestone
Changelog
Bug Fixes
- ORC-1151: [C++] Fix ColumnWriter for non-UTC Timestamp columns (#1088)
- ORC-1160: [C++] Fix seekToRow can't seek within selected row group (#1102)
- ORC-1133: [C++] Fix csv-import tool options
- ORC-1183: Upgrade gson to 2.9.0
- ORC-1186: Limit family in aarch64 profile
- ORC-1188: Fix ORC_PREFER_STATIC_ZLIB
Improvements
- ORC-1198: Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter
- ORC-1199: Use Google mirror of Maven Central as the primary
Tests
- ORC-1155: Add Ubuntu 22.04 to docker tests (#1093)
- ORC-1154: Bump hive.version from 3.1.2 to 3.1.3 (#1090)
- ORC-1161: Add MacOS 12 and remove MacOS 10
- ORC-1174: Add Ubuntu 22.04 to GitHub Action (#1128)
- ORC-1182: Use slf4j-simple instead of deprecated slf4j-log4j12
- ORC-1184: Use Hadoop 3.3.3 in benchmark module
- ORC-1189: Update README.md and help command message in benchmark module and .gitignore
- ORC-1190: Fix ORCWriterBenchMark dumpDir initialization
- ORC-1191: Updated TLC Taxi Benchmark Dataset
- ORC-1192: Use orc.zstd instead of orc.none (#1144)
- ORC-1196: Add Spark benchmark integration tests to GHA
- ORC-1201: Remove Debian 9 from Docker Tests
Documentation
- MINOR: Add ASF verification instruction link (#1134)
v1.6.14
Milestone
Changelog
Bug Fixes
- ORC-1121: Fix column conversion check bug which causes column filters don't work (#1055)
- ORC-1146: Float category missing check if the statistic sum is a finite value (#1078)
- ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values (#1082)
Tests
- ORC-1016: Use openssl@1.1 in GitHub Action MacOS CIs
- ORC-1113: Remove CentOS 8 from docker-based tests (#1040)
v1.7.4
Milestone
Changelog
Bug Fixes
- ORC-1120: Remove C++ library limitation about write version (#1054)
- ORC-1121: Fix column conversion check bug which causes column filters don't work (#1055)
- ORC-1127: [C++] add missing version of UNSTABLE-PRE-2.0 (#1064)
- ORC-1146: Float category missing check if the statistic sum is a finite value (#1078)
- ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values (#1082)
Improvements
- ORC-236: Support UNION type in Java Convert tool (#1025)
- ORC-1116: [C++] Fix csv-import tool when exporting long bytes (#1044)
- ORC-1123: Add estimationMemory method for writer (#1057)
Tests
- ORC-1145: Add Java 18 to GitHub Action CI (#1074)
- ORC-1118: Support Java 17 and ARM64 docker tests (#1047)
Documentation
v1.7.3
Milestone
Changelog
Bug Fixes
- ORC-1060: Reduce memory usage when vectorized reading dictionary string encoding columns (#971)
- ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail (#979)
- ORC-1067: [C++] Upgrade ZSTD to 1.5.1 (#981)
- ORC-1078: Row group end offset doesn't accommodate all the blocks (#996)
- ORC-1081: Fix heap-use-after-free in SearchArgumentBuilderImpl::end() (#998)
- ORC-1087: [C++] Handle unloaded seek positions when seeking in an uncompressed chunk (#1008)
- ORC-1092: [C++] Upgrade LZ4 to version 1.9.3 (#1012)
- ORC-1102: [C++] Upgrade ZSTD to 1.5.2 (#1026)
Improvements (orc-tools)
- ORC-1055: [C++] Add the timezone option for the csv-import tool (#975)
- ORC-1082: Improve FileDump and JsonFileDump to be robust on missing column statistics (#1003)
- ORC-1098: [C++] Support specifying type ids or column names in cpp tools (#1020)
Documentation
- ORC-1050: Update ORC site README.md and release process page (#963)
- ORC-1069: Update building.md (#982)
- ORC-1071: Update adopters page (#985)
- ORC-1091: Add Tests section at ORC develop page (#1011)
- ORC-1112: Add Using with Python web page (#1039)
- ORC-1114: Update Using with Python page with PyArrow 7.0.0 (#1042)
Task
- ORC-1070: Upgrade site docker image to use Ubuntu 20.04 (#983)
- ORC-1072: Add 'Stale' GitHub Action job (#986)
- ORC-1094: Enable GitHub issues tab (#1015)
- ORC-1095: Deprecate UnknownFormatException (#1016)
Tests
- ORC-875: Add GitHub Action job for Windows Server 2019 (#872)
- ORC-878: Bump auto-service from 1.0-rc7 to 1.0
- ORC-881: Bump slf4j.version from 1.7.30 to 1.7.32 (#786)
- ORC-989: Bump checkstyle from 8.45.1 to 9.0 (#899)
- ORC-993: Bump junit.version from 5.7.2 to 5.8.0 (#906)
- ORC-1018: Bump checkstyle from 9.0 to 9.0.1 (#927)
- ORC-1033: Bump junit.version from 5.8.0 to 5.8.1 (#938)
- ORC-1044: Bump reproducible-build-maven-plugin to 0.14 (#955)
- ORC-1048: Bump checkstyle from 9.0.1 to 9.1 (#960)
- ORC-1052: Bump avro.version from 1.10.2 to 1.11.0 (#965)
- ORC-1057: Bump junit.version from 5.8.1 to 5.8.2 (#969)
- ORC-1061: Bump checkstyle from 9.1 to 9.2 (#970)
- ORC-1066: Bump guava from 30.1.1-jre to 31.0.1-jre (#978)
- ORC-1068: [C++] Stabilize HAS_POST_2038 test (#980)
- ORC-1073: Remove appveyor.yml (#987)
- ORC-1076: Remove Travis PR Builder Link from README.md (#991)
- ORC-1079: Add Linux Clang 11 GitHub Action test coverage (#995)
- ORC-1080: Remove .travis.yml (#997)
- ORC-1084: Bump checkstyle from 9.2 to 9.2.1 (#1007)
- ORC-1086: Bump reproducible-build-maven-plugin from 0.14 to 0.15 (#1005)
- ORC-1090: Disable Clang 13.0-specific compilation warnings (#1017)
- ORC-1093: Remove debian8 specific code in run-one.sh (#1013)
- ORC-1096: Bump slf4j.version to 1.7.33 (#1019)
- ORC-1103: Use Maven 3.8.4 (#1029)
- ORC-1104: Use Spark 3.2.1 in benchmark (#1030)
- ORC-1105: fetch-data.sh should use zsh instead of bash (#1031)
- ORC-1106: Use transitive commons-lang3 dependency in bench module (#1032)
- ORC-1107: Fix NPE at benchmark data schema loading (#1033)
- ORC-1108: Use RawLocalFileSystem to skip checksum files during benchmark data generation (#1034)
- ORC-1109: Use zstd instead of none in the default compress option (#1035)
- ORC-1111: Bump build-helper-maven-plugin from 3.2.0 to 3.3.0 (#1038)
- ORC-1113: Remove CentOS 8 from docker-based tests (#1040)
- ORC-1115: Suppress Illegal reflective access warnings on Java9+ Tests (#1043)
v1.6.13
v1.7.2
Milestone
Changelog
Bug Fixes
- ORC-492: Avoid potential ArrayIndexOutOfBoundsException when getting WriterVersion (#961)
- ORC-1041: Use memcpy during LZO decompression (#958)
- ORC-1053: Fix time zone offset precision when convert tool converts LocalDateTime to Timestamp is not consistent with the internal default precision of ORC (#967)
- ORC-1059: Align findColumns behaviour between 1.6 and 1.7 release (#972)
Improvements (orc-tools)
- ORC-1012: Support specifying columns in orc-scan (#921)
- ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files. (#925)
- ORC-1023: Support writing bloom filters in ConvertTool (#933)
Tests
- ORC-915: Remove io.netty.netty from Spark benchmark (#822)
- ORC-938: Bump netty-all from 4.1.42.Final to 4.1.66.Final (#819)
- ORC-948: Add hive benchmark integration tests (#860)
- ORC-957: Bump netty-all from 4.1.66.Final to 4.1.67.Final (#870)
- ORC-1021: Add -fno-omit-frame-pointer in DEBUG and RELWITHDEBINFO builds (#932)
- ORC-1051: Update benchmark dependencies (#964)
v1.7.1
Milestone
Changelog
Bug Fixes
- ORC-879 - Flaky Test for TestJsonReader
- ORC-1008 - Overflow detection code is incorrect in IntegerColumnStatisticsImpl
- ORC-1009 - [C++] Missing string include causes build failure with MSVC++
- ORC-1015 - Update OrcFile.WriterOptions::memory javadoc
- ORC-1016 - Use openssl@1.1 in GitHub Action MacOS CIs
- ORC-1024 - BloomFilter hash computation is inconsistent between Java and C++ clients
- ORC-1029 - Could not load 'org.apache.orc.DataMask.Provider' when using orc encryption and spark executor with multi cores!
- ORC-1030 - Java Tools Recover File command does not accurately find OrcFile.MAGIC
- ORC-1034 - The search byte array algorithm is incorrectly implemented in FileDump.java
- ORC-1035 - backupDataPath may be incorrect in recoverFile
- ORC-1039 - Make FileDump.recoverFile handle side files only if they exist
Test
- ORC-1000 - Use Java 17 in GitHub Action
- ORC-1002 - Add java17 profile for Java17 unit testing
- ORC-1010 - Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
- ORC-1011 - Activate java17 profile automatically
- ORC-1032 - Bump parquet.version from 1.12.0 to 1.12.2
- ORC-1036 - Due to tzdata upgrade, the fixed download links in CI are often not working
- ORC-1037 - Bump spark.version from 3.1.2 to 3.2.0
- ORC-1040 - Add Debian 11 docker test
- ORC-1042 - Ignore unused-function C++ compile warning on CentOS 7
- ORC-1043 - Fix C++ conversion compilation error in CentOS 7
v1.7.0
New Feature
- [ORC-40] - [C++] Support building SearchArgument
- [ORC-577] - Allow row-level filtering
- [ORC-602] - Create adaptor for using FSDataInputStream for Java ORC reader
- [ORC-716] - Build and test on Java 17-EA
- [ORC-731] - Improve Java Tools
- [ORC-747] - Abstract Dictionary interface and refactoring
- [ORC-751] - [C++] Implement Predicate Pushdown for C++ Reader
- [ORC-765] - Added build option to compile libraries with position independent code
- [ORC-819] - Add GitHub labeler
Improvement
- [ORC-377] - [C++] Adding writing with snappy compression to orc c++ writing lib
- [ORC-480] - [C++] Deactivate WARN_FLAGS in release build
- [ORC-566] - Add docker file for building site
- [ORC-568] - Make the convert tool sort the old _col column names by number
- [ORC-574] - Performance: Use const references for string statistics min and max to avoid copy construction
- [ORC-588] - Static field or method should be directly referred by its class
- [ORC-595] - Optimize Decimal64 scale calculation
- [ORC-597] - Row-level filtering bench
- [ORC-606] - Optimize Timestamp parseNanos calculation
- [ORC-607] - Sync orc-benchmarks module to the others
- [ORC-608] - Fix DecimalBench reader options
- [ORC-609] - Upgrade aircompressor to 0.16
- [ORC-614] - Implement efficient seek() in decompression streams
- [ORC-615] - Refactor decompression streams into common base class
- [ORC-622] - Refactoring of TreeReader into TypeReader and BatchReader
- [ORC-638] - ORCMapredRecordWriter enlarge columnVector with factors when child array size is not large enough
- [ORC-639] - Improve zstd compression performance
- [ORC-646] - Add Ubuntu 20.04 docker file
- [ORC-651] - Use GitHub Pull Request Template
- [ORC-652] - Upgrade ZSTD to 1.4.5
- [ORC-655] - Update bench to use Spark 2.4.6
- [ORC-656] - Use gharchive.org instead of githubarchive.org
- [ORC-657] - Remove com.netflix.iceberg dependency in java/bench module
- [ORC-683] - PPD: Make Floating point NaN check more strict
- [ORC-684] - [C++] Make Floating point NaN check more strict
- [ORC-687] - Upgrade to JUnit5
- [ORC-688] - Allow CHAR, VARCHAR to be promoted to STRING
- [ORC-689] - Add GitHubAction job to publish snapshot
- [ORC-693] - Update credential according to INFRA setup
- [ORC-694] - Update docker files adding Java11 support
- [ORC-696] - Consistent TypeDescription handling for quoted field names
- [ORC-697] - Improve Scan tool to report where files are corrupted.
- [ORC-699] - Minor improvements to the scan tool
- [ORC-704] - Publish snapshots at only apache repo
- [ORC-710] - Update maven plugins
- [ORC-712] - Add USING IN SPARK to website
- [ORC-722] - Improve code quality using static analysis.
- [ORC-734] - Use org.apache.commons.lang3
- [ORC-736] - Upgrade Hive to 3.1.2
- [ORC-737] - Upgrade Spark to 3.1.0
- [ORC-744] - LazyIO of non-filter columns
- [ORC-745] - Migrate to travis-ci.com
- [ORC-748] - Add separate writer implementation for Trino
- [ORC-749] - Add checkstyle to -Panalyze
- [ORC-750] - Fix benchmark to pass checkstyle:check
- [ORC-757] - Add Hashtable implementation for dictionary
- [ORC-760] - Update spark to 3.1.1
- [ORC-761] - Replace MAINTAINER command with LABEL command in Dockerfile
- [ORC-766] - Generalize the docker scripts to handle build-args
- [ORC-767] - Add docker support for jdk 8 in debian 10
- [ORC-768] - Update commons-csv to 1.8
- [ORC-769] - Support ZSTD in ORC data benchmark
- [ORC-770] - Support ZSTD in Avro data benchmark
- [ORC-776] - Include source jars during publishing snapshot
- [ORC-777] - Make the vectorized row batch size configurable in MR record readers and writers
- [ORC-779] - Upgrade commons-cli to 1.4
- [ORC-780] - Add LZ4 Compression to the C++ Writer
- [ORC-791] - Upgrade guava test dependency to 30.1.1-jre
- [ORC-792] - Upgrade commons-lang to 3.12.0
- [ORC-796] - Upgrade apache parent pom version to the latest, 23
- [ORC-797] - Allow writers to get the stripe information
- [ORC-799] - Remove Ubuntu 16 docker test
- [ORC-800] - [ORC]if map.value is selected, map.key should be selected automatically to prevent segment fault.
- [ORC-801] - Clean up Logging
- [ORC-802] - Document Maven Version and mvnw
- [ORC-803] - MemoryManagerImpl Simplify removeWriter
- [ORC-806] - Upgrade to Apache POM 23
- [ORC-807] - Separate Jackson Versions in POM
- [ORC-808] - Update Spark to 3.1.2
- [ORC-812] - Simplify getClosestBufferSize in Writer
- [ORC-813] - Upgrade ZSTD to 1.5.0
- [ORC-818] - Build and test in Apple Silicon
- [ORC-821] - Use mvnw instead of mvn
- [ORC-823] - Upgrade maven-assembly-plugin to 3.3.0
- [ORC-848] - Recycle Internal Buffer in StringHashTableDictionary
- [ORC-849] - Core Benchmark Cleanup
- [ORC-893] - Remove junit-vintage-engine from shims module.
- [ORC-913] - Support data/format/compress options in Spark benchmark
- [ORC-921] - Add an encrypted example file
- [ORC-922] - Remove redundant conditional statements
- [ORC-927] - Extracting duplicate codes for RowFilterBenchmark
- [ORC-930] - Ignore unsupported JSON x ZSTD combination in bench
- [ORC-931] - Optimize RunLengthIntegerWriterV2 code for better readability
- [ORC-933] - extend the example with advanced reader
- [ORC-941] - Move MacOS 10.15 and 11.5 test from Travis to GitHub Action
- [ORC-943] - Add Intellij conf to support JIRA/PR autolinks
- [ORC-945] - Add OUTPUT_QUIET, ERROR_QUIET to suppress Java8 addopen error messages
- [ORC-970] - Reordering statements, improve readability in WriterImpl
- [ORC-976] - Optimize compute zigZagLiterals
- [ORC-984] - Save the software version that wrote each ORC file
Sub-task
- [ORC-599] - Bump guava version to 28.1-jre
- [ORC-663] - [C++] Support nanosecond in timestamp column statistics
- [ORC-713] - Add Java 15 test to github action
- [ORC-714] - Remove MRUnit dependency and its usage
- [ORC-715] - Add MapReduce test cases
- [ORC-718] - Enable Checkstyle plugin and FileTabCharacter rule.
- [ORC-719] - Enable UnusedImports.
- [ORC-720] - Run mvn checkstyle:check in GitHub action.
- [ORC-721] - Use org.junit.Assert instead of deprecated junit.framework.Assert.
- [ORC-723] - Upgrade Mockito to 3.7.0.
- [ORC-726] - Support Map type in orc-tools convert
- [ORC-727] - Update Java Tools documentation
- [ORC-728] - Support head command in Java Tools
- [ORC-733] - Upgrade Zookeeper from 3.4.x to 3.6.2
- [ORC-735] - ConvertTool should not fail at a single ORC file
- [ORC-738] - Add date type conversion support in Java Tools
- [ORC-741] - Schema Evolution missing column is not handled in the presence of filters
- [ORC-742] - LazyIO of non-filter columns in the presence of filters
- [ORC-743] - Conversion of SArg into Filters, to take advantage of LazyIO
- [ORC-754] - Code cleanup
- [ORC-755] - Introduce OrcFilterContext
- [ORC-758] - Avoid decompressing compressed streams if already decompressed
- [ORC-759] - StructBatchReader should always skip processing on the rootReader
- [ORC-778] - Add "NewlineAtEndOfFile" checkstyle rule
- [ORC-783] - Add a checkstyle rule to prevent trailing white spaces.
- [ORC-795] - Add "LineLength" rule to checkstyle
- [ORC-811] - Benchmarks for Filters
- [ORC-814] - Build and test Java module on Apple Silicon
- [ORC-815] - Build and test C++ module on CLang12
- [ORC-816] - Rename and enable aarch64 profile automatically
- [ORC-820] - Add Java 16 to GitHub Action
- [ORC-822] - Add Java 17-ea to GitHub Action
- [ORC-839] - Fix head command for batch reader
- [ORC-851] - Fix CNFE in ORC tools uber jar to include required classes.
- [ORC-857] - Add OuterTypeFilename/UpperEll/ArrayTypeStyle checkstyle rules.
- [ORC-858] - Add NoLineWrap/OneStatementPerLine/NeedBraces checkstyle rules
- [ORC-859] - Update maven-checkstyle-plugin to 3.1.2.
- [ORC-866] - Reduce LineLength from 125 to 120
- [ORC-867] - Upgrade hive-storage-api to 2.8.1
- [ORC-871] - orc-tools json-schema fails at empty json file with EOFException
- [ORC-882] - Remove hamcrest-core test dependency
- [ORC-886] - Add an integration test for ORC Java tools
- [ORC-889] - Remove orc-mapreduce build warnings due to overlapping resources
- [ORC-895] - Use snappy-java 1.1.8.4 in bench/core to support Apple Silicon
- [ORC-901] - Remove junit-vintage-engine from mapreduce/tools module
- [ORC-905] - Add an integration test for example
- [ORC-907] - Remove junit-vintage-engine from core module
- [ORC-909] - Remove commons-io 2.1 dependency
- [ORC-910] - Enforce maven-dependency-plugin
- [ORC-911] - Remove janino dependency in favor of Spark's transitive dependency
- [ORC-912] - Exclude Spark transitive avro/parquet dependency from Spark benchmark
- [ORC-917] - Bump mockito-core from 3.7.0 to 3.11.2
- [ORC-919] - Spark bench objenesis should be the same as Spark.
- [ORC-920] - Use junit.version and mockito.version property and bump junit to 5.7.2
- [ORC-924] - Add redundant modifier/modifier order checkstyle rules.
- [ORC-926] - Consolidate license header style in Java files.
- [ORC-928] - Bump checkstyle from 8.44 to 8.45.1
- [ORC-929] - Fix NaN at orc-tools 'meta' command
- [ORC-934] - Add integration tests for Java bench
- [ORC-939] - Remove threetenbp dependency
- [ORC-942] - Remove javax.xml.bind:jaxb-api dependency
- [ORC-944] - Add "RedundantImport" checkstyle rule
- [ORC-947] - Update coding guide to max line length 100 and enforce it.
- [ORC-950] - Bump aircompressor to 0.20
- [ORC-951] - Add since tag to org.apache.orc.Reader interface
- [ORC-952] - Add since tag to org.apache.orc.RecordReader interface
- [ORC-953] - Add since tag to org.apache.orc.Writer interface
interface - [ORC-959] - C++ reader crash in resolving nested List columns for SearchArgument
- [ORC-960] - Create SearchArgument using column ids
- [ORC-971] - LESS_THAN_EQUALS doesn't handle the case when min=max
- [ORC-973] - [C++] Pr...