Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change dump of reinit_list to 8 bytes when size_t is 8 bytes #11363

Closed
wants to merge 2 commits into from

Conversation

waTeim
Copy link
Contributor

@waTeim waTeim commented May 20, 2015

Not quite sure on this one. reinit_list.items is a void**, and in fact is a size_t. This is 8 bytes not 4 bytes on a 64 bit system. Just avoiding the compiler warning here and storing the full value. It does not appear to destabilize things; that's good

But is weird since I did not change ast.c fl_load_system_image. I note also this which makes me think the intention really is to save pos and a 1. Starting point maybe.

   if (jl_typeof(v) == jl_idtable_type) {
        arraylist_push(&reinit_list, (void*)pos);
        arraylist_push(&reinit_list, (void*)1);
   }

@@ -129,6 +129,18 @@ static jl_array_t *datatype_list=NULL; // (only used in MODE_SYSTEM_IMAGE)
#define write_int8(s, n) write_uint8(s, n)
#define read_int8(s) read_uint8(s)

static void write_int64(ios_t *s, int64_t i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use write_uint64 instead? (or at least implement this in a similar way?)

@waTeim
Copy link
Contributor Author

waTeim commented May 20, 2015

Yea, that was dumb. I looked for write_int64 but neglected to search for write_uint64. Also doesn't work; sysimage must have been prebuilt or something earlier (note how works on travis for 32 bit only), so I guess static void jl_load_sysimg_so() is the thing to look at?

@waTeim
Copy link
Contributor Author

waTeim commented May 20, 2015

Well this feels more correct. At least this loads and runs after make clean. Will see about Travis. This would necessarily make sysimage different on 32 and 64 bit systems, maybe it's that way already? I'm not sure about this on a big endian system.

@vtjnash
Copy link
Member

vtjnash commented May 22, 2015

The branches just add code noise. The values in the array are guaranteed by construction to fit in 32 bits.

@waTeim
Copy link
Contributor Author

waTeim commented May 24, 2015

Yea, but this is annoying to look at. Maybe a double cast then?

dump.c: In function ‘jl_save_system_image’:
dump.c:1459:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
         write_int32(&f, (int)reinit_list.items[i]);
                     ^
dump.c:1460:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
         write_int32(&f, (int)reinit_list.items[i+1]);
                     ^

@vtjnash
Copy link
Member

vtjnash commented May 24, 2015

yes, a double cast would be good to eliminate the warning

@ScottPJones
Copy link
Contributor

This troubles me - where is the documentation in the source that reinit_list.items are guaranteed to only hold 32-bit values? I looked, and didn't see anything. If they only can hold 32-bit values, why are they declared as something else?

@waTeim
Copy link
Contributor Author

waTeim commented Jun 16, 2015

From what I saw, the 2nd parameter is always a 0x1 (currently), so that will fit, but not sure about parameter 1.

The underlying issue is this field is a void**, but the value being saved is not that. Maybe a C union to make it explicit? There would not be any loss of speed.

I'm also worried about big-endian vs small endian machines. Casting something that is 8 bytes (size_t) to something that is 4 bytes (int) could be problematic especially when the type is void**

@vtjnash vtjnash closed this Jun 16, 2015
@ScottPJones
Copy link
Contributor

Casting isn't a problem, because the truncation happens on the value like in a register... using a union is,
because there you'd need a union with int32_t val; int32_t pad; for little endian, and int32_t pad; int32_t val; for big endian... (but using a union would avoid the need for the cast...

@ScottPJones
Copy link
Contributor

@vtjnash Why the close? Casting like this might hide the problem, but there is still an issue that the code lacks at least documentation to show that the cast is even safe to do.

@yuyichao
Copy link
Contributor

The first parameter seems to be a counter so it should also be fine

@yuyichao
Copy link
Contributor

And the fact that it is a void* is just how the array data structure is defined in C in order to hold all possible element.

@ScottPJones
Copy link
Contributor

@yuyichao, that's just very bad C programming, sorry! Esp. when the underlying assumptions don't seem to be documented in the code.

@carnaval
Copy link
Contributor

I agree that it deserves a comment in the source (the dump.c code is full of assumptions, it has grown quite organically), but I don't see how it is particularly bad C programming. C does not have a good way to implement polymorphic containers short of a macro nightmare, using void* to store integer seems reasonable to me.

@ScottPJones
Copy link
Contributor

Macro nightmare? It really just needs a union, something any C programmer should be able to handle.
This sort of code leads to incredibly hard to find types of bugs... where only certain rare values may get truncated silently. This sort of punning was totally disallowed anywhere I've worked.

@carnaval
Copy link
Contributor

It's this way because this structure is the default "simple list of pointers" used everywhere in the source. Using it to store small integers is fine (but could be documented) as far as I'm concerned.

I'm not sure what you want here ? a union { void*; int; } in place of void* ? But then you also have to be sure you use the union member consistently so isn't this exactly the same problem ?

@ScottPJones
Copy link
Contributor

Actually, I just looked at all the code that uses these arraylist functions... It would be pretty easy to fix this, have the arraylist functions take an element size, instead of just using void *, and the code would be a lot cleaner also, you'd simply define a struct for each the different things being stored via these functions... the structures would take less space also...

@carnaval
Copy link
Contributor

I don't care too much but I think it wouldn't make anything clearer, would add code, and gain negligible amount of space. I'll let other people decide on that.

@ScottPJones
Copy link
Contributor

It would be cleaner, certainly, and would make it clear exactly what, with what sizes, was being stored in these arrays, and allow the compiler to warn you if you try to store things that don't fit, instead of the silent truncation that will happen now if there are more than 2^31 items (the bottom bit is used, so it's 2^31 instead of 2^32).
The space isn't negligible, not if these ever get large... in this reinit case, it will take up half the memory.

@vtjnash
Copy link
Member

vtjnash commented Jun 16, 2015

closed because the underlying issue is fixed. other parts of the code will assert first (like OOM) long before this counter overflows, since this is a counter for the number of live objects allocations.

i also disagree that this idiom is uncommon. for example, it is a suggested use-case for a Gtk GList:
https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html

@ScottPJones
Copy link
Contributor

I didn't say whether it was common or not, there's a lot of bad code out there...
If so, I'd say that the GtK code is not really great doing that either... and just because some other package does it, doesn't mean that it is necessarily good.
Didn't somebody call that "argument from authority"?

@vtjnash
Copy link
Member

vtjnash commented Jun 16, 2015

for the Gtk use case, it gives predictable semantics (mostly important for allocation) without needing to know the type of the list. C is not a type-safe language anyhow, so a static declaration isn't really possible (although Gtk often does try to use its own class system so that it can do some runtime checks).

Didn't somebody call that "argument from authority"?

that would likely be called "argument by example" https://en.wikipedia.org/wiki/Argument_by_example? Calling it "argument by authority" would be more accurate if I wrote "Alice*, who is a major contributor to the Gtk project, thinks this is good idiom" (where in this case, the statement lacks any information about whether the Gtk project uses this idiom, or if it is even written in C for that matter) https://en.wikipedia.org/wiki/Argument_from_authority [*Alice is a purely ficticious name]

@waTeim
Copy link
Contributor Author

waTeim commented Jun 16, 2015

Not so sure about that. 2^32 is only 4 billion -- could it happen on this?

@vtjnash
Copy link
Member

vtjnash commented Jun 16, 2015

each of those objects must be a minimum of 16 bytes (and likely many times that), so it would require at least 32 GB of sysimg data to hit the limit. it's not impossible, but the julia .ji serialization file format currently doesn't allow it to happen.

@ScottPJones
Copy link
Contributor

It's breaks at 2^31, not 2^32, and I've had code that had more than 2 billion objects, easily... (terabytes of memory, many many cores...)

@ScottPJones
Copy link
Contributor

@vtjnash I'll shortly be able to show you what I mean, with a real PR that can clean up some of this...
GtK still could have had something more flexible, and cleaner, than dumping integers into void *.

@yuyichao
Copy link
Contributor

Sorry to interupt but are you chaning the definition of arraylist_t or where is the reinit_tmp defined/stored?

@ScottPJones
Copy link
Contributor

OK, each different array type would have it's own structure. For example:

struct reinit_str {
    int32_t     pos; // Position shifted up by 1 bit from backref_table_numel, bottom bit is module flag
    int32_t     ind; // Currently always set to 1???
};

This makes it clear just what is being stored in the array (and by doing so, I've determined that instead of 16 bytes, only 4 bytes are needed for each entry (the second part is always set to 1, for some undocumented reason).
arraylist_t is only changed minimally (the addition of an elesz field, and I could even have functions that work exactly the same, saving void *, until everything has been changed to use the new arraylist_new (that takes an element size), arraylist_pop (that takes a void * to where to copy the return value), arraylist_push (passed a pointer to what gets saved, instead of the value to be saved).
The current implementation also seems like it might be liable to have leaks... when arraylist_new is called, it doesn't make sure that the previous, possibly 'malloc'ed array was freed...

For the finalizer list, it's just making 1 call instead of two, not a big difference probably, but it does make things clearer, and the structure is longer defined by the order of the push operations.

@ScottPJones
Copy link
Contributor

About the finalizers, and the performance of running them, why isn't that a simple hash table? Should be simple enough to do...

@carnaval
Copy link
Contributor

it was a hashtable before. the cost of hashing on finalizer attachment was measurably way too high, especially since manually finalizing is not used much, compared to the base use case of creating a gazillion BigFloats and killing them at the next young gen collection.

As we already discussed somewhere, we could revamp this whole thing and make both efficient but short of that, the manual finalizing is not the priority use case. If you control the object it's easy enough to add a finalized flag and a finalize! method that the finalizer calls. The only problematic case is when you want to attach a finalizer to an object you don't control.

@ScottPJones
Copy link
Contributor

I'm surprised that there was a problem, a few shifts and masks of the value o to make a "hash" value, would have sufficed to make this code in particular a lot faster.

    for(int i = 0; i < finalizer_list.len; i+=2) {
        if (o == (jl_value_t*)finalizer_list.items[i]) {
            f = (jl_value_t*)finalizer_list.items[i+1];
            if (i < finalizer_list.len - 2) {
                finalizer_list.items[i] = finalizer_list.items[finalizer_list.len-2];
                finalizer_list.items[i+1] = finalizer_list.items[finalizer_list.len-1];
                i -= 2;
            }
            finalizer_list.len -= 2;
            run_finalizer(o, f);
            success = 1;
        }
    }

@ScottPJones
Copy link
Contributor

@carnaval: This is not true, in my use cases, i.e. use cases dealing with objects created by C/C++.

especially since manually finalizing is not used much, compared to the base use case of creating a gazillion BigFloats and killing them at the next young gen collection.

Also I believe @quinnj has been using finalize for Mmap.array.

@carnaval
Copy link
Contributor

Come on, I told you I am aware that this is slow, I never said no one used it, I said that for now, since the BigFloat thing is much more perf sensitive, it has been given priority. If you want to revamp the finalizer system, be my guest I'd love this, but I won't have time to do it in the near future.

Ref discussion starting around #8699 (comment)

@carnaval
Copy link
Contributor

An easy hack around this slowness could be to have a hashtable for some objects when you ask explicitely for it, when you know you will be calling finalize() on something.

@ScottPJones
Copy link
Contributor

@carnaval Please, this wasn't meant to be anything but constructive comments, on how this can be improved. I'm not trying to pressure anybody to change anything, if I see a need, as you've seen, I'll try to do it myself (I might ask for clarification on some points, if you are the expert on this part of julia).
Also, remember! I think julia is absolutely wonderful, and the people who have created it, and those who are contributing to it, are geniuses.
I do understand about prioritizing, and simply not having enough hours in the day to get everything you want improved done...

@ScottPJones
Copy link
Contributor

@carnaval I ran across this also in a couple places:

    int np = jl_gc_n_preserved_values();
    value_t arg = julia_to_scm(expr);
    value_t e = fl_applyn(1, symbol_value(symbol("jl-expand-to-thunk")), arg);
    jl_value_t *result = scm_to_julia(e,0);
    while (jl_gc_n_preserved_values() > np) {
        jl_gc_unpreserve();
    }

where 'jl_gc_unpreserve just pops the top value on the preserve_list arraylist_t, and throws it away.
It seems like what is really needed here is a jl_gc_pop_back_to(np) function, that would be much more efficient, simply resetting the end of the end of the preserved list.

@carnaval
Copy link
Contributor

Sure, then again "much more efficient" is probably not very important here since the total overhead is a few instructions * number_of_macro_expanded. I would be very surprised if you could measure the difference. It may make sense to change the API for the preserve thing since it is only used here, I don't have any strong feeling.

@ScottPJones
Copy link
Contributor

Yes, much more efficient in the micro sense of this particular function, not overall performance.
I don't know enough about how frequently used these functions are, or how many things get popped
here, to tell what effect it might have on overall performance.
(You might have noticed I tend to have a little bit of OCD tendencies about performance! 😀)

@ScottPJones
Copy link
Contributor

I've also found a number of functions that are DLLEXPORTed, implying that they are part of an API exposed to at least the code in Base, but that are not used anywhere in julia code.
In the jl_gc_* functions, out of 23, only 10 are used in julia or registered packages.
Should all of these really be marked as DLLEXPORT?

DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void)

@vtjnash
Copy link
Member

vtjnash commented Jun 18, 2015

most of those are directly used by sys.so or JIT code (inserted by codegen), a few might be extraneous

@yuyichao
Copy link
Contributor

Personally I'd prefer the alloc family to be prefixed with jl_.

@ScottPJones
Copy link
Contributor

Yes, that's why I missed them at first, I was looking for the jl_ tag to find usage, but just looked for the DLLEXPORT in making the above list... (this is where a cross-reference tool, that worked across C, C++, and Julia code, would be nice!)

That still leaves:

DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void)

that don't seem to need to be exported, and:

DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void)

that don't seem to follow the naming convention of jl_gc_ (for the functions in this file).

@tkelman
Copy link
Contributor

tkelman commented Jun 18, 2015

we should certainly prefix everything we export from the dll, not doing that is asking for name conflicts for embedders

@ScottPJones
Copy link
Contributor

Do you think it would break anything to make all of those functions follow the convention of jl_gc_?
If not, I'll go ahead and change them in a PR...

@yuyichao
Copy link
Contributor

Just remember to update the names (strings) in codegen as well.

@tkelman
Copy link
Contributor

tkelman commented Jun 18, 2015

and in src/julia.expmap

@ScottPJones
Copy link
Contributor

@yuyichao Yes, of course (updating codegen) (I was already going through codegen.cpp, and found some more that are problematical)
@tkelman Thanks, I wasn't aware of src/julia.expmap

@yuyichao
Copy link
Contributor

And as you are there, there's also gc_wb and friends

@ScottPJones
Copy link
Contributor

As far as I can tell, only gc_wb_slow needs to have it's name changed (it's the only one exported).
I can go ahead and change all of them though, be more consistent...

@waTeim
Copy link
Contributor Author

waTeim commented Jun 18, 2015

wait wait! This will totally break me, I'm frantically getting the list together of all of the symbols I use directly, ... .it's a long list. Will be about 10 minutes ...

@yuyichao
Copy link
Contributor

gc_wb_slow is actually not used anywhere (last time I checked nowhere is emitting the slow path).

gc_wb is actually an API and is even in the embedding doc so it can cause name collision.

@yuyichao
Copy link
Contributor

@waTeim Yes, any change of these names will be breaking.

@tkelman
Copy link
Contributor

tkelman commented Jun 18, 2015

They're certainly exported from the windows dll.
image

While breaking, these never should have been exported with non-prefixed names to begin with. I think there's even a comment that says so somewhere.

This really needs to move to its own issue though, it has nothing to do with this PR. (@ScottPJones totally unrelated questions should probably go to the mailing list first.)

@waTeim
Copy link
Contributor Author

waTeim commented Jun 18, 2015

that's fine, move it to it's own issue, tell us embedders what it is -- announce it to dev, give us time to react and determine a way to be compatible and we can manage.

That will give me time to put my symbol list together.

@ScottPJones
Copy link
Contributor

Sorry, @tkelman! I'll have a PR for this shortly, and people can play around with it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants