-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement file embedding #414
Conversation
$EMBED:'filename','handle' and _EMBEDDED$("handle")
try again...
forget AddQuotes$(), rather make file.bas self-contained using CHR$(34)
as suggested by Matt...
ideally these should have been part of the 1st commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along with my comments, I'd be interested to see at least some basic tests of $EMBED
and _EMBEDDED$()
added, it should be pretty simple I think (embed a file, read it, and verify the contents are correct). Otherwise this may not get tested on Linux and Mac OS before the next release.
@@ -3548,6 +3635,60 @@ DO | |||
|
|||
END IF 'QB64 Metacommands | |||
|
|||
'Check that Embed-Handle is a literal string and follows conventions, | |||
'also check if such a handle was defined and mark it as used then. | |||
IF INSTR(a3u$, qb64prefix$ + "EMBEDDED$") > 0 THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in understanding that this code is only necessary so that we can exclude unnecessary $EMBED
s?
I'm personally tempted to say we just shouldn't do that, it seems like the string literal requirement will introduce a lot of confusion for little benefit. This would be the only function that works that way, and func__embedded()
still takes a string argument at runtime anyway so it's not really necessary to do this.
NEXT vc& | ||
'--- process remaining BYTEs --- | ||
IF cntB& > 0 THEN | ||
PRINT #dff%, "static const uint8_t "; handle$; "B[] = {" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't just output the whole file this way? It seems a lot simpler overall since you'd only have one array to deal with. Once it's compiled it doesn't really matter if it was originally one array or multiple.
Going along with that, if we only had one array then you could use qbs_new_txt()
with the single array and avoid needing to allocating a new buffer to copy the data into. In the non-compressed case it would mean the data could \be used as-is with no copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @mkilgore. We can just create a byte array and have the whole file inside it. We can use sizeof
to get the size of the array. Whether the data is compressed or not can be simply stored in a bool
inside the function that we are generating. Uncompressed data can be directly used without the need to do two memcpy
s.
qbs *GetArrayData_peLogo(qbs *handle)
{
if (!qbs_equal(handle, qbs_new_txt("peLogo"))) {return qbs_new_txt("");}
qbs *data = qbs_new(17394, 1);
void *buff = data -> chr;
memcpy(buff, &peLogoL0[1], peLogoL0[0] << 2);
buff += (peLogoL0[0] << 2);
memcpy(buff, &peLogoB[1], peLogoB[0]);
return func__inflate(data, 17438, 1);
}
END IF | ||
PRINT #dff%, "" | ||
IF packed% THEN | ||
PRINT #dff%, " return func__inflate(data, "; LTRIM$(STR$(LEN(filedata$))); ", 1);" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to qbs_free()
the data
string here, since inflate creates a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func__inflate()
returns a tmp
qbs
. Do we still need to qbs_free()
that?
qbs *ret = qbs_new(uncompsize, 1);
I am asking this because I am working on something and I see a lot of libqb
stuff creating strings this way and returning it.
The only time, it is not done this way is when the qbs
is static
inside the function and allocated only once during the lifetime of the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I'll be honest, I don't know for sure, there's some spooky stuff that goes on with the string allocation and cleanup.
My thinking was that the data
string that we pass to func__inflate()
is created earlier in this function using qbs_new()
, but that's not the string that gets returned - func__inflate()
calls qbs_new()
itself and returns a new string. The data
string is never used again after the func__inflate()
call and seems like it's a leak 🤷
FOR vc& = 0 TO cntV& | ||
IF vc& = cntV& THEN numL& = (cntL& MOD 8180): ELSE numL& = 8180 | ||
PRINT #dff%, "static const uint32_t "; handle$; "L"; LTRIM$(STR$(vc&)); "[] = {" | ||
PRINT #dff%, " "; LTRIM$(STR$(numL& * 8)); "," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the code would be easier to read if you stored the lengths in separate variables rather than in the arrays themselves. But also, you can use sizeof
to get the size of the array (in bytes) when defined this way, you don't actually need to store the length.
source/utilities/file.bas
Outdated
PRINT #dff%, " if (!qbs_equal(handle, qbs_new_txt("; AddQuotes$(handle$); "))) {return qbs_new_txt("; MKI$(&H2222); ");}" | ||
PRINT #dff%, "" | ||
PRINT #dff%, " qbs *data = qbs_new("; LTRIM$(STR$(fl&)); ", 1);" | ||
PRINT #dff%, " void *buff = data -> chr;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing, but this should really be a char *
. gcc lets you do pointer math on void *
for some reason but it's not ideal.
source/utilities/file.bas
Outdated
CLOSE #sff% | ||
'--- try to compress --- | ||
compdata$ = _DEFLATE$(filedata$) | ||
IF LEN(compdata$) < LEN(filedata$) THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to let the caller pick if compression is done - going along with some of my other comments, if the compression is skipped then you could get away with no copying of the data. That could be something that the developer explicitly wants to avoid the runtime and memory cost (though we probably want to discourage embedding files so large that it matters anyway...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the data is stored uncompressed, is there some way we could get a pointer and size to wherever it's embedded in memory? Then one could point a mem block to it with M = _MEM(pointer, size), and access that embedded data directly, without ever having to assign it to another actual variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That goes to my other comment about the multiple C arrays vs. a single one. If you store the entire thing in a single array uncompressed then you can create a STRING
out of it using qbs_new_txt()
and that does no copying. In that situation _EMBEDDED$()
would return a STRING
which is directly backed by the original data, and only if you modify the string will the data need to be copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to let the caller pick if compression is done - going along with some of my other comments, if the compression is skipped then you could get away with no copying of the data. That could be something that the developer explicitly wants to avoid the runtime and memory cost (though we probably want to discourage embedding files so large that it matters anyway...)
I agree. We should let the caller decide. From my observations even an MP3 file would compress if you pass it through _DEFLATE
but only by a few bytes. In those cases you end up decompressing the data twice which can contribute a significant performance hit.
OPEN "A", #eflFF, tmpdir$ + "embedded.cpp" | ||
'append the internal retrieval function for _EMBEDDED$() | ||
PRINT #eflFF, "qbs *func__embedded(qbs *handle) {" | ||
PRINT #eflFF, " qbs *result = qbs_new_txt("; MKI$(&H2222); ");" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should just call this on the return
line, otherwise I'm pretty sure you need to qbs_free()
it when it doesn't get used.
PRINT #eflFF, "" | ||
FOR i = 0 TO eflUB | ||
IF embedFileList$(eflFile, i) <> "" AND VAL(embedFileList$(eflUsed, i)) > 0 THEN | ||
PRINT #eflFF, " result = GetArrayData_"; embedFileList$(eflHand, i); "(handle);" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth producing a hash table to look-up the correct function rather than the loop.
@@ -3413,6 +3418,88 @@ DO | |||
RETURN | |||
END IF | |||
|
|||
IF LEFT$(a3u$, 6) = "$EMBED" THEN | |||
'redim/clear the $EMBED array on each new pass | |||
IF linenumber <= eflLastLine THEN eflLastLine = 0: REDIM SHARED embedFileList$(3, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clear this out earlier when all the other stuff gets cleared before compilation (Ex. maybe around line 1646, that's where I clear the midi stuff).
The user might build a program that uses $EMBED
and then build one that does not (or remove the line). When that happens, the embedFileList$
won't get reset on the next compile since there will be no $EMBED
lines to trigger this.
eflUB = UBOUND(embedFileList$, 2) | ||
FOR i = 0 TO eflUB | ||
IF embedFileList$(eflFile, i) = EmbedFile$ THEN | ||
a$ = "File '" + EmbedFile$ + "' was already embedded in line" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'm not sure we should ban this, I think you could reasonably want to use the same file for several embeds (Ex. You reuse the same image for different handles, so that you can easily change them separately in the future). I'd rather just allow this now and then in the future we could optimize it so that we only embed one copy of each file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you just need one embed, even in that case?
image1 = _LOADIMAGE(_Embedded(foo$), 32, "memory")
image2 = _LOADIMAGE(_Embedded(foo$), 32, "memory")
image3 = _LOADIMAGE(_Embedded(foo$), 32, "memory")
The embed is just the image file itself in memory, and just like if it was on the drive, you'd still need to _LoadImage it before it'd become a valid, working image handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is not a "can you" but rather "should you". If you have multiple embeds that serve completely different purposes then you don't want them to share a handle regardless of whether they use the same file - in the future you may change the file for one of the handles but not both, and you wouldn't want to have to then figure out which spots are supposed to use which handle.
IMO It's the same as saying "two CONST
s shouldn't have the same value, if they do then just use the same CONST
in both places". It's technically true, you don't need more than one CONST
with a particular value, but it misses the point that if one CONST
is named test_file
and another is named logo
they might temporarily (or always) have the same value but they don't serve the same purpose and combining them wouldn't be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's like having two CONSTs with the same value. I think it's more like putting the same font file inside your folder and then naming it cour(2).ttf, cout(3).ttf, cour(4).ttf.... At the end of the day, does it matter which of those files you use to load into a font handle? They're all the exact same font, and unless you have some plan to overwrite those files, there's absolutely no reason to have more than that one.
I honestly don't think there's any way to overwrite (and save) embedded data inside a program, with that program itself. Windows doesn't allow that type of virus-like behavior at all.
I'd think a better way to think of it would be like with DATA statements. Would you ever include the same 1000 lines of data, just to place them under a separate label? Or would you just make multiple labels for the same data?
label_1:
label_2:
label_3:
DATA "dog", "frog", "fish", "wish", "chicken"
...999 more lines of data
Then, if you really need to, you can always RESTORE label_1, RESTORE label_2, or RESTORE label_3, but you still wouldn't need to have that same 1000 lines repeated again and again in the same program.
At the most, I think allowing multiple labels for the same embed might not be so terrible. $EMBED:'foo.ext', 'handle1', 'handle2', 'handle3' But the idea of including the same file multiple times, just to have it listed under a separate label to read back from, seems like a bloat of exe size and a waste of an user's memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember, the handle is only in use really with _EMBEDDED, which will decompress the data and then assign it to wherever you want. For what you're talking about, all you'd need is:
a$ = _EMBEDDED$(foo$)
b$ = _EMBEDDED$(foo$)
Use a$ all you want in one spot. Use b$ all you want in a completely different spot. foo$ itself is embedded and wouldn't ever really change on you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're actually in agreement :P On the backend I agree, we should only keep one copy of the file embedded and serve it up for all the handles that use that file, but that's also an optimization we could do later-on if Rho doesn't want to do it here.
If we error on using the same file though, as it currently works, then introducing that optimization won't help people who already duplicated their files to get around the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to address the performance / memory costs.
Allow the user the choice of compressing the data or not.
source/utilities/file.bas
Outdated
CLOSE #sff% | ||
'--- try to compress --- | ||
compdata$ = _DEFLATE$(filedata$) | ||
IF LEN(compdata$) < LEN(filedata$) THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to let the caller pick if compression is done - going along with some of my other comments, if the compression is skipped then you could get away with no copying of the data. That could be something that the developer explicitly wants to avoid the runtime and memory cost (though we probably want to discourage embedding files so large that it matters anyway...)
I agree. We should let the caller decide. From my observations even an MP3 file would compress if you pass it through _DEFLATE
but only by a few bytes. In those cases you end up decompressing the data twice which can contribute a significant performance hit.
NEXT vc& | ||
'--- process remaining BYTEs --- | ||
IF cntB& > 0 THEN | ||
PRINT #dff%, "static const uint8_t "; handle$; "B[] = {" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @mkilgore. We can just create a byte array and have the whole file inside it. We can use sizeof
to get the size of the array. Whether the data is compressed or not can be simply stored in a bool
inside the function that we are generating. Uncompressed data can be directly used without the need to do two memcpy
s.
qbs *GetArrayData_peLogo(qbs *handle)
{
if (!qbs_equal(handle, qbs_new_txt("peLogo"))) {return qbs_new_txt("");}
qbs *data = qbs_new(17394, 1);
void *buff = data -> chr;
memcpy(buff, &peLogoL0[1], peLogoL0[0] << 2);
buff += (peLogoL0[0] << 2);
memcpy(buff, &peLogoB[1], peLogoB[0]);
return func__inflate(data, 17438, 1);
}
- As suggested by @mkilgore , moved the embed list array reset out of the $EMBED block - Imposed a 20% least ratio for compression - Moved the handle comparison into `func__embedded()` to avoid some unnecessary function calls
simple test case, which just embeds the `test.output` file and prints it
$EMBED:'filename','handle'
and_EMBEDDED$("handle")
Closing #383