Change dump of reinit_list to 8 bytes when size_t is 8 bytes #11363

waTeim · 2015-05-20T00:31:14Z

Not quite sure on this one. reinit_list.items is a void**, and in fact is a size_t. This is 8 bytes not 4 bytes on a 64 bit system. Just avoiding the compiler warning here and storing the full value. It does not appear to destabilize things; that's good

But is weird since I did not change ast.c fl_load_system_image. I note also this which makes me think the intention really is to save pos and a 1. Starting point maybe.

   if (jl_typeof(v) == jl_idtable_type) {
        arraylist_push(&reinit_list, (void*)pos);
        arraylist_push(&reinit_list, (void*)1);
   }

yuyichao · 2015-05-20T00:45:20Z

src/dump.c

@@ -129,6 +129,18 @@ static jl_array_t *datatype_list=NULL; // (only used in MODE_SYSTEM_IMAGE)
 #define write_int8(s, n) write_uint8(s, n)
 #define read_int8(s) read_uint8(s)

+static void write_int64(ios_t *s, int64_t i)


Use write_uint64 instead? (or at least implement this in a similar way?)

waTeim · 2015-05-20T01:20:39Z

Yea, that was dumb. I looked for write_int64 but neglected to search for write_uint64. Also doesn't work; sysimage must have been prebuilt or something earlier (note how works on travis for 32 bit only), so I guess static void jl_load_sysimg_so() is the thing to look at?

waTeim · 2015-05-20T02:16:31Z

Well this feels more correct. At least this loads and runs after make clean. Will see about Travis. This would necessarily make sysimage different on 32 and 64 bit systems, maybe it's that way already? I'm not sure about this on a big endian system.

vtjnash · 2015-05-22T17:00:47Z

The branches just add code noise. The values in the array are guaranteed by construction to fit in 32 bits.

waTeim · 2015-05-24T01:56:00Z

Yea, but this is annoying to look at. Maybe a double cast then?

dump.c: In function ‘jl_save_system_image’:
dump.c:1459:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
         write_int32(&f, (int)reinit_list.items[i]);
                     ^
dump.c:1460:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
         write_int32(&f, (int)reinit_list.items[i+1]);
                     ^

vtjnash · 2015-05-24T12:27:18Z

yes, a double cast would be good to eliminate the warning

ScottPJones · 2015-06-16T13:28:57Z

This troubles me - where is the documentation in the source that reinit_list.items are guaranteed to only hold 32-bit values? I looked, and didn't see anything. If they only can hold 32-bit values, why are they declared as something else?

waTeim · 2015-06-16T17:38:58Z

From what I saw, the 2nd parameter is always a 0x1 (currently), so that will fit, but not sure about parameter 1.

The underlying issue is this field is a void**, but the value being saved is not that. Maybe a C union to make it explicit? There would not be any loss of speed.

I'm also worried about big-endian vs small endian machines. Casting something that is 8 bytes (size_t) to something that is 4 bytes (int) could be problematic especially when the type is void**

ScottPJones · 2015-06-16T17:43:05Z

Casting isn't a problem, because the truncation happens on the value like in a register... using a union is,
because there you'd need a union with int32_t val; int32_t pad; for little endian, and int32_t pad; int32_t val; for big endian... (but using a union would avoid the need for the cast...

ScottPJones · 2015-06-16T17:45:03Z

@vtjnash Why the close? Casting like this might hide the problem, but there is still an issue that the code lacks at least documentation to show that the cast is even safe to do.

yuyichao · 2015-06-16T17:46:03Z

The first parameter seems to be a counter so it should also be fine

yuyichao · 2015-06-16T17:47:34Z

And the fact that it is a void* is just how the array data structure is defined in C in order to hold all possible element.

ScottPJones · 2015-06-16T17:49:59Z

@yuyichao, that's just very bad C programming, sorry! Esp. when the underlying assumptions don't seem to be documented in the code.

carnaval · 2015-06-16T17:53:51Z

I agree that it deserves a comment in the source (the dump.c code is full of assumptions, it has grown quite organically), but I don't see how it is particularly bad C programming. C does not have a good way to implement polymorphic containers short of a macro nightmare, using void* to store integer seems reasonable to me.

ScottPJones · 2015-06-16T17:59:36Z

Macro nightmare? It really just needs a union, something any C programmer should be able to handle.
This sort of code leads to incredibly hard to find types of bugs... where only certain rare values may get truncated silently. This sort of punning was totally disallowed anywhere I've worked.

carnaval · 2015-06-16T18:06:49Z

It's this way because this structure is the default "simple list of pointers" used everywhere in the source. Using it to store small integers is fine (but could be documented) as far as I'm concerned.

I'm not sure what you want here ? a union { void*; int; } in place of void* ? But then you also have to be sure you use the union member consistently so isn't this exactly the same problem ?

ScottPJones · 2015-06-16T18:15:19Z

Actually, I just looked at all the code that uses these arraylist functions... It would be pretty easy to fix this, have the arraylist functions take an element size, instead of just using void *, and the code would be a lot cleaner also, you'd simply define a struct for each the different things being stored via these functions... the structures would take less space also...

carnaval · 2015-06-16T18:20:47Z

I don't care too much but I think it wouldn't make anything clearer, would add code, and gain negligible amount of space. I'll let other people decide on that.

ScottPJones · 2015-06-16T18:28:52Z

It would be cleaner, certainly, and would make it clear exactly what, with what sizes, was being stored in these arrays, and allow the compiler to warn you if you try to store things that don't fit, instead of the silent truncation that will happen now if there are more than 2^31 items (the bottom bit is used, so it's 2^31 instead of 2^32).
The space isn't negligible, not if these ever get large... in this reinit case, it will take up half the memory.

vtjnash · 2015-06-16T18:29:52Z

closed because the underlying issue is fixed. other parts of the code will assert first (like OOM) long before this counter overflows, since this is a counter for the number of live objects allocations.

i also disagree that this idiom is uncommon. for example, it is a suggested use-case for a Gtk GList:
https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html

ScottPJones · 2015-06-16T18:32:11Z

I didn't say whether it was common or not, there's a lot of bad code out there...
If so, I'd say that the GtK code is not really great doing that either... and just because some other package does it, doesn't mean that it is necessarily good.
Didn't somebody call that "argument from authority"?

vtjnash · 2015-06-16T18:50:22Z

for the Gtk use case, it gives predictable semantics (mostly important for allocation) without needing to know the type of the list. C is not a type-safe language anyhow, so a static declaration isn't really possible (although Gtk often does try to use its own class system so that it can do some runtime checks).

Didn't somebody call that "argument from authority"?

that would likely be called "argument by example" https://en.wikipedia.org/wiki/Argument_by_example? Calling it "argument by authority" would be more accurate if I wrote "Alice*, who is a major contributor to the Gtk project, thinks this is good idiom" (where in this case, the statement lacks any information about whether the Gtk project uses this idiom, or if it is even written in C for that matter) https://en.wikipedia.org/wiki/Argument_from_authority [*Alice is a purely ficticious name]

waTeim · 2015-06-16T18:50:40Z

Not so sure about that. 2^32 is only 4 billion -- could it happen on this?

vtjnash · 2015-06-16T18:55:13Z

each of those objects must be a minimum of 16 bytes (and likely many times that), so it would require at least 32 GB of sysimg data to hit the limit. it's not impossible, but the julia .ji serialization file format currently doesn't allow it to happen.

ScottPJones · 2015-06-16T18:55:34Z

It's breaks at 2^31, not 2^32, and I've had code that had more than 2 billion objects, easily... (terabytes of memory, many many cores...)

ScottPJones · 2015-06-16T18:57:23Z

@vtjnash I'll shortly be able to show you what I mean, with a real PR that can clean up some of this...
GtK still could have had something more flexible, and cleaner, than dumping integers into void *.

yuyichao · 2015-06-16T19:40:57Z

Sorry to interupt but are you chaning the definition of arraylist_t or where is the reinit_tmp defined/stored?

ScottPJones · 2015-06-16T20:42:36Z

OK, each different array type would have it's own structure. For example:

struct reinit_str {
    int32_t     pos; // Position shifted up by 1 bit from backref_table_numel, bottom bit is module flag
    int32_t     ind; // Currently always set to 1???
};

This makes it clear just what is being stored in the array (and by doing so, I've determined that instead of 16 bytes, only 4 bytes are needed for each entry (the second part is always set to 1, for some undocumented reason).
arraylist_t is only changed minimally (the addition of an elesz field, and I could even have functions that work exactly the same, saving void *, until everything has been changed to use the new arraylist_new (that takes an element size), arraylist_pop (that takes a void * to where to copy the return value), arraylist_push (passed a pointer to what gets saved, instead of the value to be saved).
The current implementation also seems like it might be liable to have leaks... when arraylist_new is called, it doesn't make sure that the previous, possibly 'malloc'ed array was freed...

For the finalizer list, it's just making 1 call instead of two, not a big difference probably, but it does make things clearer, and the structure is longer defined by the order of the push operations.

ScottPJones · 2015-06-16T20:50:37Z

About the finalizers, and the performance of running them, why isn't that a simple hash table? Should be simple enough to do...

carnaval · 2015-06-16T20:55:21Z

it was a hashtable before. the cost of hashing on finalizer attachment was measurably way too high, especially since manually finalizing is not used much, compared to the base use case of creating a gazillion BigFloats and killing them at the next young gen collection.

As we already discussed somewhere, we could revamp this whole thing and make both efficient but short of that, the manual finalizing is not the priority use case. If you control the object it's easy enough to add a finalized flag and a finalize! method that the finalizer calls. The only problematic case is when you want to attach a finalizer to an object you don't control.

ScottPJones · 2015-06-16T21:01:55Z

I'm surprised that there was a problem, a few shifts and masks of the value o to make a "hash" value, would have sufficed to make this code in particular a lot faster.

    for(int i = 0; i < finalizer_list.len; i+=2) {
        if (o == (jl_value_t*)finalizer_list.items[i]) {
            f = (jl_value_t*)finalizer_list.items[i+1];
            if (i < finalizer_list.len - 2) {
                finalizer_list.items[i] = finalizer_list.items[finalizer_list.len-2];
                finalizer_list.items[i+1] = finalizer_list.items[finalizer_list.len-1];
                i -= 2;
            }
            finalizer_list.len -= 2;
            run_finalizer(o, f);
            success = 1;
        }
    }

ScottPJones · 2015-06-16T21:05:09Z

@carnaval: This is not true, in my use cases, i.e. use cases dealing with objects created by C/C++.

especially since manually finalizing is not used much, compared to the base use case of creating a gazillion BigFloats and killing them at the next young gen collection.

Also I believe @quinnj has been using finalize for Mmap.array.

carnaval · 2015-06-16T21:07:52Z

Come on, I told you I am aware that this is slow, I never said no one used it, I said that for now, since the BigFloat thing is much more perf sensitive, it has been given priority. If you want to revamp the finalizer system, be my guest I'd love this, but I won't have time to do it in the near future.

Ref discussion starting around #8699 (comment)

carnaval · 2015-06-16T21:09:51Z

An easy hack around this slowness could be to have a hashtable for some objects when you ask explicitely for it, when you know you will be calling finalize() on something.

ScottPJones · 2015-06-16T21:28:53Z

@carnaval Please, this wasn't meant to be anything but constructive comments, on how this can be improved. I'm not trying to pressure anybody to change anything, if I see a need, as you've seen, I'll try to do it myself (I might ask for clarification on some points, if you are the expert on this part of julia).
Also, remember! I think julia is absolutely wonderful, and the people who have created it, and those who are contributing to it, are geniuses.
I do understand about prioritizing, and simply not having enough hours in the day to get everything you want improved done...

ScottPJones · 2015-06-17T10:41:16Z

@carnaval I ran across this also in a couple places:

    int np = jl_gc_n_preserved_values();
    value_t arg = julia_to_scm(expr);
    value_t e = fl_applyn(1, symbol_value(symbol("jl-expand-to-thunk")), arg);
    jl_value_t *result = scm_to_julia(e,0);
    while (jl_gc_n_preserved_values() > np) {
        jl_gc_unpreserve();
    }

where 'jl_gc_unpreserve just pops the top value on the preserve_list arraylist_t, and throws it away.
It seems like what is really needed here is a jl_gc_pop_back_to(np) function, that would be much more efficient, simply resetting the end of the end of the preserved list.

carnaval · 2015-06-17T13:00:32Z

Sure, then again "much more efficient" is probably not very important here since the total overhead is a few instructions * number_of_macro_expanded. I would be very surprised if you could measure the difference. It may make sense to change the API for the preserve thing since it is only used here, I don't have any strong feeling.

ScottPJones · 2015-06-17T13:11:12Z

Yes, much more efficient in the micro sense of this particular function, not overall performance.
I don't know enough about how frequently used these functions are, or how many things get popped
here, to tell what effect it might have on overall performance.
(You might have noticed I tend to have a little bit of OCD tendencies about performance! 😀)

ScottPJones · 2015-06-18T00:07:46Z

I've also found a number of functions that are DLLEXPORTed, implying that they are part of an API exposed to at least the code in Base, but that are not used anywhere in julia code.
In the jl_gc_* functions, out of 23, only 10 are used in julia or registered packages.
Should all of these really be marked as DLLEXPORT?

DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void)

vtjnash · 2015-06-18T00:13:28Z

most of those are directly used by sys.so or JIT code (inserted by codegen), a few might be extraneous

yuyichao · 2015-06-18T00:35:20Z

Personally I'd prefer the alloc family to be prefixed with jl_.

ScottPJones · 2015-06-18T00:42:39Z

Yes, that's why I missed them at first, I was looking for the jl_ tag to find usage, but just looked for the DLLEXPORT in making the above list... (this is where a cross-reference tool, that worked across C, C++, and Julia code, would be nice!)

That still leaves:

DLLEXPORT int jl_gc_is_enabled(void)
DLLEXPORT void jl_gc_lookfor(jl_value_t *v)
DLLEXPORT void *jl_gc_managed_malloc(size_t sz)
DLLEXPORT void *jl_gc_managed_realloc(void *d, size_t sz, size_t oldsz, int isaligned, jl_value_t* owner)
DLLEXPORT int jl_gc_n_preserved_values(void)
DLLEXPORT void jl_gc_preserve(jl_value_t *v)
DLLEXPORT void jl_gc_unpreserve(void)

that don't seem to need to be exported, and:

DLLEXPORT void gc_queue_root(jl_value_t *ptr)
DLLEXPORT jl_value_t *allocobj(size_t sz)
DLLEXPORT jl_value_t *alloc_0w(void)
DLLEXPORT jl_value_t *alloc_1w(void)
DLLEXPORT jl_value_t *alloc_2w(void)
DLLEXPORT jl_value_t *alloc_3w(void)

that don't seem to follow the naming convention of jl_gc_ (for the functions in this file).

tkelman · 2015-06-18T01:08:35Z

we should certainly prefix everything we export from the dll, not doing that is asking for name conflicts for embedders

ScottPJones · 2015-06-18T01:11:04Z

Do you think it would break anything to make all of those functions follow the convention of jl_gc_?
If not, I'll go ahead and change them in a PR...

yuyichao · 2015-06-18T01:12:15Z

Just remember to update the names (strings) in codegen as well.

tkelman · 2015-06-18T01:16:33Z

and in src/julia.expmap

ScottPJones · 2015-06-18T01:21:28Z

@yuyichao Yes, of course (updating codegen) (I was already going through codegen.cpp, and found some more that are problematical)
@tkelman Thanks, I wasn't aware of src/julia.expmap

yuyichao · 2015-06-18T01:36:07Z

And as you are there, there's also gc_wb and friends

ScottPJones · 2015-06-18T02:15:45Z

As far as I can tell, only gc_wb_slow needs to have it's name changed (it's the only one exported).
I can go ahead and change all of them though, be more consistent...

waTeim · 2015-06-18T02:17:55Z

wait wait! This will totally break me, I'm frantically getting the list together of all of the symbols I use directly, ... .it's a long list. Will be about 10 minutes ...

yuyichao · 2015-06-18T02:20:12Z

gc_wb_slow is actually not used anywhere (last time I checked nowhere is emitting the slow path).

gc_wb is actually an API and is even in the embedding doc so it can cause name collision.

yuyichao · 2015-06-18T02:22:30Z

@waTeim Yes, any change of these names will be breaking.

tkelman · 2015-06-18T02:24:38Z

They're certainly exported from the windows dll.

While breaking, these never should have been exported with non-prefixed names to begin with. I think there's even a comment that says so somewhere.

This really needs to move to its own issue though, it has nothing to do with this PR. (@ScottPJones totally unrelated questions should probably go to the mailing list first.)

waTeim · 2015-06-18T02:28:18Z

that's fine, move it to it's own issue, tell us embedders what it is -- announce it to dev, give us time to react and determine a way to be compatible and we can manage.

That will give me time to put my symbol list together.

ScottPJones · 2015-06-18T02:39:54Z

Sorry, @tkelman! I'll have a PR for this shortly, and people can play around with it...

Change dump of reinit_list to 8 bytes when size_t is 8 bytes

c0317e1

yuyichao reviewed May 20, 2015
View reviewed changes

Also update load to expect 64bit quantities

354443f

yuyichao mentioned this pull request Jun 16, 2015

Silence ptr->int conversion warning in dump.c #11724

Merged

vtjnash closed this Jun 16, 2015

yuyichao mentioned this pull request Jun 18, 2015

Fix exported names from gc.c to have jl_gc_ prefix #11741

Merged

Change dump of reinit_list to 8 bytes when size_t is 8 bytes #11363

Change dump of reinit_list to 8 bytes when size_t is 8 bytes #11363

Conversation

waTeim commented May 20, 2015

yuyichao May 20, 2015

Choose a reason for hiding this comment

waTeim commented May 20, 2015

waTeim commented May 20, 2015

vtjnash commented May 22, 2015

waTeim commented May 24, 2015

vtjnash commented May 24, 2015

ScottPJones commented Jun 16, 2015

waTeim commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

yuyichao commented Jun 16, 2015

yuyichao commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

carnaval commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

carnaval commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

carnaval commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

vtjnash commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

vtjnash commented Jun 16, 2015

waTeim commented Jun 16, 2015

vtjnash commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

yuyichao commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

carnaval commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

carnaval commented Jun 16, 2015

carnaval commented Jun 16, 2015

ScottPJones commented Jun 16, 2015

ScottPJones commented Jun 17, 2015

carnaval commented Jun 17, 2015

ScottPJones commented Jun 17, 2015

ScottPJones commented Jun 18, 2015

vtjnash commented Jun 18, 2015

yuyichao commented Jun 18, 2015

ScottPJones commented Jun 18, 2015

tkelman commented Jun 18, 2015

ScottPJones commented Jun 18, 2015

yuyichao commented Jun 18, 2015

tkelman commented Jun 18, 2015

ScottPJones commented Jun 18, 2015

yuyichao commented Jun 18, 2015

ScottPJones commented Jun 18, 2015

waTeim commented Jun 18, 2015

yuyichao commented Jun 18, 2015

yuyichao commented Jun 18, 2015

tkelman commented Jun 18, 2015

waTeim commented Jun 18, 2015

ScottPJones commented Jun 18, 2015