-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change dump of reinit_list to 8 bytes when size_t is 8 bytes #11363
Conversation
@@ -129,6 +129,18 @@ static jl_array_t *datatype_list=NULL; // (only used in MODE_SYSTEM_IMAGE) | |||
#define write_int8(s, n) write_uint8(s, n) | |||
#define read_int8(s) read_uint8(s) | |||
|
|||
static void write_int64(ios_t *s, int64_t i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use write_uint64
instead? (or at least implement this in a similar way?)
Yea, that was dumb. I looked for write_int64 but neglected to search for write_uint64. Also doesn't work; sysimage must have been prebuilt or something earlier (note how works on travis for 32 bit only), so I guess static void jl_load_sysimg_so() is the thing to look at? |
Well this feels more correct. At least this loads and runs after make clean. Will see about Travis. This would necessarily make sysimage different on 32 and 64 bit systems, maybe it's that way already? I'm not sure about this on a big endian system. |
The branches just add code noise. The values in the array are guaranteed by construction to fit in 32 bits. |
Yea, but this is annoying to look at. Maybe a double cast then?
|
yes, a double cast would be good to eliminate the warning |
This troubles me - where is the documentation in the source that reinit_list.items are guaranteed to only hold 32-bit values? I looked, and didn't see anything. If they only can hold 32-bit values, why are they declared as something else? |
From what I saw, the 2nd parameter is always a 0x1 (currently), so that will fit, but not sure about parameter 1. The underlying issue is this field is a void**, but the value being saved is not that. Maybe a C union to make it explicit? There would not be any loss of speed. I'm also worried about big-endian vs small endian machines. Casting something that is 8 bytes (size_t) to something that is 4 bytes (int) could be problematic especially when the type is void** |
Casting isn't a problem, because the truncation happens on the value like in a register... using a union is, |
@vtjnash Why the close? Casting like this might hide the problem, but there is still an issue that the code lacks at least documentation to show that the cast is even safe to do. |
The first parameter seems to be a counter so it should also be fine |
And the fact that it is a |
@yuyichao, that's just very bad C programming, sorry! Esp. when the underlying assumptions don't seem to be documented in the code. |
I agree that it deserves a comment in the source (the dump.c code is full of assumptions, it has grown quite organically), but I don't see how it is particularly bad C programming. C does not have a good way to implement polymorphic containers short of a macro nightmare, using void* to store integer seems reasonable to me. |
Macro nightmare? It really just needs a union, something any C programmer should be able to handle. |
It's this way because this structure is the default "simple list of pointers" used everywhere in the source. Using it to store small integers is fine (but could be documented) as far as I'm concerned. I'm not sure what you want here ? a |
Actually, I just looked at all the code that uses these arraylist functions... It would be pretty easy to fix this, have the arraylist functions take an element size, instead of just using void *, and the code would be a lot cleaner also, you'd simply define a struct for each the different things being stored via these functions... the structures would take less space also... |
I don't care too much but I think it wouldn't make anything clearer, would add code, and gain negligible amount of space. I'll let other people decide on that. |
It would be cleaner, certainly, and would make it clear exactly what, with what sizes, was being stored in these arrays, and allow the compiler to warn you if you try to store things that don't fit, instead of the silent truncation that will happen now if there are more than 2^31 items (the bottom bit is used, so it's 2^31 instead of 2^32). |
closed because the underlying issue is fixed. other parts of the code will assert first (like OOM) long before this counter overflows, since this is a counter for the number of live objects allocations. i also disagree that this idiom is uncommon. for example, it is a suggested use-case for a Gtk GList: |
I didn't say whether it was common or not, there's a lot of bad code out there... |
for the Gtk use case, it gives predictable semantics (mostly important for allocation) without needing to know the type of the list. C is not a type-safe language anyhow, so a static declaration isn't really possible (although Gtk often does try to use its own class system so that it can do some runtime checks).
that would likely be called "argument by example" https://en.wikipedia.org/wiki/Argument_by_example? Calling it "argument by authority" would be more accurate if I wrote "Alice*, who is a major contributor to the Gtk project, thinks this is good idiom" (where in this case, the statement lacks any information about whether the Gtk project uses this idiom, or if it is even written in C for that matter) https://en.wikipedia.org/wiki/Argument_from_authority [*Alice is a purely ficticious name] |
Not so sure about that. 2^32 is only 4 billion -- could it happen on this? |
each of those objects must be a minimum of 16 bytes (and likely many times that), so it would require at least 32 GB of sysimg data to hit the limit. it's not impossible, but the julia |
It's breaks at 2^31, not 2^32, and I've had code that had more than 2 billion objects, easily... (terabytes of memory, many many cores...) |
@vtjnash I'll shortly be able to show you what I mean, with a real PR that can clean up some of this... |
Sorry to interupt but are you chaning the definition of |
OK, each different array type would have it's own structure. For example: struct reinit_str {
int32_t pos; // Position shifted up by 1 bit from backref_table_numel, bottom bit is module flag
int32_t ind; // Currently always set to 1???
}; This makes it clear just what is being stored in the array (and by doing so, I've determined that instead of 16 bytes, only 4 bytes are needed for each entry (the second part is always set to 1, for some undocumented reason). For the finalizer list, it's just making 1 call instead of two, not a big difference probably, but it does make things clearer, and the structure is longer defined by the order of the push operations. |
About the finalizers, and the performance of running them, why isn't that a simple hash table? Should be simple enough to do... |
it was a hashtable before. the cost of hashing on finalizer attachment was measurably way too high, especially since manually finalizing is not used much, compared to the base use case of creating a gazillion BigFloats and killing them at the next young gen collection. As we already discussed somewhere, we could revamp this whole thing and make both efficient but short of that, the manual finalizing is not the priority use case. If you control the object it's easy enough to add a finalized flag and a finalize! method that the finalizer calls. The only problematic case is when you want to attach a finalizer to an object you don't control. |
I'm surprised that there was a problem, a few shifts and masks of the value for(int i = 0; i < finalizer_list.len; i+=2) {
if (o == (jl_value_t*)finalizer_list.items[i]) {
f = (jl_value_t*)finalizer_list.items[i+1];
if (i < finalizer_list.len - 2) {
finalizer_list.items[i] = finalizer_list.items[finalizer_list.len-2];
finalizer_list.items[i+1] = finalizer_list.items[finalizer_list.len-1];
i -= 2;
}
finalizer_list.len -= 2;
run_finalizer(o, f);
success = 1;
}
} |
@carnaval: This is not true, in my use cases, i.e. use cases dealing with objects created by C/C++.
Also I believe @quinnj has been using finalize for |
Come on, I told you I am aware that this is slow, I never said no one used it, I said that for now, since the BigFloat thing is much more perf sensitive, it has been given priority. If you want to revamp the finalizer system, be my guest I'd love this, but I won't have time to do it in the near future. Ref discussion starting around #8699 (comment) |
An easy hack around this slowness could be to have a hashtable for some objects when you ask explicitely for it, when you know you will be calling finalize() on something. |
@carnaval Please, this wasn't meant to be anything but constructive comments, on how this can be improved. I'm not trying to pressure anybody to change anything, if I see a need, as you've seen, I'll try to do it myself (I might ask for clarification on some points, if you are the expert on this part of julia). |
@carnaval I ran across this also in a couple places: int np = jl_gc_n_preserved_values();
value_t arg = julia_to_scm(expr);
value_t e = fl_applyn(1, symbol_value(symbol("jl-expand-to-thunk")), arg);
jl_value_t *result = scm_to_julia(e,0);
while (jl_gc_n_preserved_values() > np) {
jl_gc_unpreserve();
} where |
Sure, then again "much more efficient" is probably not very important here since the total overhead is a few instructions * number_of_macro_expanded. I would be very surprised if you could measure the difference. It may make sense to change the API for the preserve thing since it is only used here, I don't have any strong feeling. |
Yes, much more efficient in the micro sense of this particular function, not overall performance. |
I've also found a number of functions that are DLLEXPORTed, implying that they are part of an API exposed to at least the code in Base, but that are not used anywhere in julia code. DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void) |
most of those are directly used by |
Personally I'd prefer the |
Yes, that's why I missed them at first, I was looking for the That still leaves: DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void) that don't seem to need to be exported, and: DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void) that don't seem to follow the naming convention of |
we should certainly prefix everything we export from the dll, not doing that is asking for name conflicts for embedders |
Do you think it would break anything to make all of those functions follow the convention of |
Just remember to update the names (strings) in codegen as well. |
and in |
And as you are there, there's also |
As far as I can tell, only |
wait wait! This will totally break me, I'm frantically getting the list together of all of the symbols I use directly, ... .it's a long list. Will be about 10 minutes ... |
|
@waTeim Yes, any change of these names will be breaking. |
They're certainly exported from the windows dll. While breaking, these never should have been exported with non-prefixed names to begin with. I think there's even a comment that says so somewhere. This really needs to move to its own issue though, it has nothing to do with this PR. (@ScottPJones totally unrelated questions should probably go to the mailing list first.) |
that's fine, move it to it's own issue, tell us embedders what it is -- announce it to dev, give us time to react and determine a way to be compatible and we can manage. That will give me time to put my symbol list together. |
Sorry, @tkelman! I'll have a PR for this shortly, and people can play around with it... |
Not quite sure on this one. reinit_list.items is a void**, and in fact is a size_t. This is 8 bytes not 4 bytes on a 64 bit system. Just avoiding the compiler warning here and storing the full value. It does not appear to destabilize things; that's good
But is weird since I did not change ast.c fl_load_system_image. I note also this which makes me think the intention really is to save pos and a 1. Starting point maybe.