Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Overhauled to Arrow Back-End and Better Memory Safety (#78)
* Fixed #70.
* Initial cleanup.
* Split into multiple files.
* Moved Arrow.jl to its own directory.
* Fixed method ambiguity in getmetadata.
* Initial implementation with arrow backend.
* Fixed errors; now works for basic bits types and strings.
* Now correctly implements datetime.
* Rewrote column constructors to be reasonable and sane.
* Dict encoding now working.
* Started writing sink.
* Reads now work with new version of Arrow.
* Continuing to work on sinks.
* Everything works except dictionary encoding, which is currently completely fucked on write side.
* Most column types now supported.
* Most functionality now properly implemented.
* Finally supports bools!
* Trying to fix DictEncoding but it's still fucked up.
* Finally completely fixed DictEncoding.
* Fixed unit testing.
* Removed old reference files.
* Added some materialize methods.
* Removed old comment about DictEncoding being fucked up.
* Removed old reference file fileio.jl.
* Removed spurious comment.
* Removed spurious comment.
* Tried to fix appveyor yaml.
* Tried to fix appveyor yaml.
* DictEncoding now works for non-Int32, cleaned up some things.
* Added a materialize method for a DataFrame.
* Updated for 0.7.
* Fixed scary metadata bug.
* Replaced uninitialized with undef.
* Fixed potential file validation bug.
* Updated for new Arrow locator interface.
* Cleaned up Source functions a bit.
* Started adding extra tests.
* Removed references to pre-0.6 in README.
* Added more unit tests.
* Removed explicit Arrow clone commands from travis and appveyor (now that it's registered).
* Small fixes.
* Fixed breaking test due to poor type inference on 0.6.
1 parent 03a19c4, commit a2558d1
Showing 101 changed files with 585 additions and 640 deletions.
@@ -1,6 +1,7 @@
```
julia 0.6
Arrow
FlatBuffers 0.3.0
CategoricalArrays 0.3.0
DataFrames 0.11.0
DataStreams 0.3.0
WeakRefStrings 0.4.0
Compat 0.63.0
```
This file was deleted.
@@ -0,0 +1,55 @@
```julia
# Older-version files use the raw length; current files use the padded length.
getoutputlength(version::Int32, x::Integer) = version < FEATHER_VERSION ? x : padding(x)

# Check the minimum length and the 4-byte magic number at both ends of the file.
function validatefile(filename::AbstractString, data::AbstractVector{UInt8})
    if length(data) < MIN_FILE_LENGTH
        throw(ArgumentError("'$filename' is not in feather format: total length of file: $(length(data))"))
    end
    header = data[1:4]
    footer = data[(end-3):end]
    if header ≠ FEATHER_MAGIC_BYTES || footer ≠ FEATHER_MAGIC_BYTES
        throw(ArgumentError(string("'$filename' is not in feather format: header = $header, ",
                                   "footer = $footer.")))
    end
end

# Memory-map (or fully read) the file, honoring the use_mmap keyword, then validate it.
function loadfile(filename::AbstractString; use_mmap::Bool=SHOULD_USE_MMAP)
    isfile(filename) || throw(ArgumentError("'$filename' is not a valid file."))
    data = use_mmap ? Mmap.mmap(filename) : read(filename)
    validatefile(filename, data)
    data
end

# The Int32 metadata length sits just before the trailing 4 magic bytes.
function metalength(data::AbstractVector{UInt8})
    read(IOBuffer(data[(length(data)-7):(length(data)-4)]), Int32)
end

# 1-based index of the first byte of the flatbuffer metadata.
function metaposition(data::AbstractVector{UInt8}, metalen::Integer=metalength(data))
    length(data) - (metalen+7)
end

# Flatbuffer root-table offset, stored in the first 4 bytes (one Int32) of the metadata.
function rootposition(data::AbstractVector{UInt8}, mpos::Integer=metaposition(data))
    read(IOBuffer(data[mpos:(mpos+3)]), Int32)
end

# Parse the CTable metadata, warning on files written by an older format version.
function getctable(data::AbstractVector{UInt8})
    metapos = metaposition(data)
    rootpos = rootposition(data, metapos)
    ctable = FlatBuffers.read(Metadata.CTable, data, metapos + rootpos - 1)
    if ctable.version < FEATHER_VERSION
        @warn("This feather file is old and may not be readable.")
    end
    ctable
end


# Build a DataStreams schema (column types, names, row count) from the metadata.
function Data.schema(ctable::Metadata.CTable)
    ncols = length(ctable.columns)
    header = Vector{String}(undef, ncols)
    types = Vector{Type}(undef, ncols)
    for (i, col) ∈ enumerate(ctable.columns)
        header[i] = col.name
        types[i] = juliatype(col)
    end
    Data.Schema(types, header, ctable.num_rows)
end
```
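The offset arithmetic in `metalength` and `metaposition` is easier to follow on a concrete buffer. The sketch below is a hypothetical illustration, not code from this commit: it assumes the footer layout implied by the functions above (`[magic][column data][metadata][Int32 metadata length][magic]`) and assumes the magic bytes are `FEA1`. `MAGIC`, `build_fake_file`, and `validate` are made-up names, and a short byte string stands in for the real flatbuffer metadata.

```julia
# Hypothetical sketch of the footer layout assumed by metalength/metaposition:
#   [magic (4 bytes)] [column data] [metadata (metalen bytes)] [metalen::Int32] [magic (4 bytes)]
const MAGIC = Vector{UInt8}(b"FEA1")  # assumed magic bytes

# Build a toy byte vector with the layout above; the length is written little-endian.
function build_fake_file(body::Vector{UInt8}, meta::Vector{UInt8})
    io = IOBuffer()
    write(io, MAGIC)
    write(io, body)
    write(io, meta)
    write(io, htol(Int32(length(meta))))
    write(io, MAGIC)
    take!(io)
end

# Same magic-number check as validatefile, minus the length check and error messages.
validate(data) = data[1:4] == MAGIC && data[(end-3):end] == MAGIC

# Same arithmetic as metalength/metaposition in the diff
# (ltoh is a no-op on little-endian hosts).
metalength(data) = ltoh(read(IOBuffer(data[(end-7):(end-4)]), Int32))
metaposition(data, metalen=metalength(data)) = length(data) - (metalen + 7)

data = build_fake_file(UInt8[1, 2, 3], Vector{UInt8}(b"META"))
mlen = metalength(data)
mpos = metaposition(data, mlen)
println(validate(data))                        # true
println(mlen)                                  # 4
println(String(data[mpos:(mpos + mlen - 1)]))  # META
```

The toy file is 19 bytes long: magic at 1:4, column data at 5:7, metadata at 8:11, the length at 12:15 (that is, `end-7:end-4`), and magic again at 16:19, so `metaposition` returns `19 - (4 + 7) = 8`, the first metadata byte.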