
RDB$BLOB_UTIL system package. #281

Merged
merged 15 commits into from
Dec 16, 2022

Conversation

asfernandes
Member

No description provided.

@aafemt
Contributor

aafemt commented Aug 23, 2020

Wouldn't it be better for this package to be a little more compatible with Oracle in terms of names and usage...?


@sim1984 sim1984 left a comment


Firebird 5? What's stopping you from adding this feature to Firebird 4.0?

## Function `NEW`

`RDB$BLOB_UTIL.NEW` is used to create a new BLOB. It returns a handle (an integer bound to the transaction) that should be used with the other functions of the package.
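For illustration, here is a hypothetical PSQL sketch of obtaining a handle. It follows the draft signature `NEW(SEGMENTED, TEMP_STORAGE)` discussed in this PR, which may differ from what was finally shipped:

```sql
-- Hypothetical sketch; signatures follow the draft in this PR.
execute block returns (h integer)
as
begin
  -- Create a segmented blob in temporary storage; the returned
  -- handle is an integer bound to the current transaction.
  h = rdb$blob_util.new(true, true);
  suspend;
end
```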

Member

Do we really need such an artificial (for SQL) concept as a "handle" here? Every blob is represented by a blob ID, which actually is a handle. Passing a blob here and there inside PSQL (except assigning it to a table field) is just a matter of copying its ID; the contents are not touched. So tra_blob_util_map may just store blob IDs created/opened with the RDB$BLOB_UTIL package, and all package functions may declare inputs/outputs as just BLOB instead of an INTEGER handle. Do I miss anything?

Member Author

A blob id is used in the client with a handle. A handle in this context is an id plus the blb class inside the engine. A blb has information like the current position. RDB$BLOB_UTIL handles model this concept in PSQL.

A blob id for this would be very confusing. Many different variables would have the same id, so how could one have multiple parallel seeks/reads on the same blob id?

Also, a blob id is implicitly copied, depending on the blob charset, when it is passed as an argument.


Input parameter:
- `SEGMENTED` type `BOOLEAN NOT NULL`
- `TEMP_STORAGE` type `BOOLEAN NOT NULL`
Member

Perhaps we should be prepared for the tablespaces feature, so that blobs could be created in the explicitly specified (by name) tablespace which can be either "permanent" or "temporary". This requires more thinking.

Member Author

A new parameter with a default. Also, named arguments as in Oracle would be very interesting.

Member

The problem is that TEMP_STORAGE may conflict with the tablespace, e.g. TEMP_STORAGE = TRUE but TABLESPACE = MY_BLOB_SPACE. This looks error-prone.

Member Author

Do you know how the same problem is going to be resolved in regard to storage specified in BPB?

Member

I think the lower-level options (TEMP_STORAGE parameter or isc_bpb_storage_temp) should override the DDL-level default storage.


## Function `OPEN_BLOB`

`RDB$BLOB_UTIL.OPEN_BLOB` is used to open an existing BLOB for reading. It returns a handle that should be used with the other functions of the package.
Member

Please just "OPEN", not "OPEN_BLOB". We already have just "NEW" and the package is named RDB$BLOB_UTIL ;-)

Member Author

@asfernandes asfernandes May 15, 2021

I would use it if it's not a reserved word.

Member

OK, but let's be consistent then and rename NEW -> NEW_BLOB, to complement OPEN_BLOB, MAKE_BLOB and possible APPEND_BLOB ;-)

Input parameters:
- `HANDLE` type `INTEGER NOT NULL`
- `DATA` type `VARBINARY(32767) NOT NULL`

Member

I believe we need yet another routine -- something like APPEND_BLOB -- to concatenate a whole other blob if it's longer than 32KB. Here "BLOB" in the name again seems redundant ;-) so better naming ideas are welcome. Or we should find a way to make APPEND polymorphic in regard to its input.

Member Author

@asfernandes asfernandes May 15, 2021

Perhaps we could create a VARIANT type which could be used for system routine arguments - and also as a general data type.

System functions already can work in this way, but they do not have stored metadata.

Member

While the VARIANT type might be an interesting idea per se, it requires some serious thinking and discussions and it could be an overkill for this particular need if we need to release v5 really soon. So I'd be more happy with APPEND_TEXT (or APPEND_STRING if you wish) and separate APPEND_BLOB.


Return type: `INTEGER NOT NULL`.

## Procedure `APPEND`
Member

I would be more happy to see all routines defined as functions even if they don't return anything useful, just because of this typing difference:
execute procedure rdb$blob_util.append(...);
vs
rdb$blob_util.seek(...);

Member Author

@asfernandes asfernandes May 15, 2021

What should we return? True or 0?

Member

Integer NULL, maybe. On the other hand, we could add the SQL-standard CALL in addition to our legacy EXECUTE PROCEDURE, and the typing would be much easier ;-)

Member Author

CALL would be good to fix EXECUTE PROCEDURE, but since our procedures are not identical to SQL ones, I will open a discussion in devel.

If `LENGTH` is passed with a positive number, it returns a VARBINARY with that number as its maximum length.

If `LENGTH` is `NULL`, it returns just a segment of the BLOB, with a maximum length of 32765.
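Under the semantics above, a read loop might look like the following hypothetical PSQL sketch. The `READ(HANDLE, LENGTH)` signature follows this draft, and the assumption that `READ` returns `NULL` once the blob is exhausted is mine; it is not stated here:

```sql
-- Hypothetical sketch; the NULL-at-end behaviour of READ is an assumption.
execute block (b blob sub_type binary = ?) returns (total integer)
as
  declare h integer;
  declare chunk varbinary(100);
begin
  total = 0;
  h = rdb$blob_util.open_blob(:b);
  while (true) do
  begin
    chunk = rdb$blob_util.read(h, 100);  -- read up to 100 bytes per call
    if (chunk is null) then leave;       -- assumed end-of-blob signal
    total = total + octet_length(chunk);
  end
  suspend;
end
```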

Member

IIRC, blob segments may be up to 64KB in length. Will a longer-than-32KB segment be truncated to 32KB, with the next READ call returning the remaining part? Can there be any consequences if a segment is split into multiple parts? For example, one source segment will be written as two segments in the target blob. Blob filters may not be able to decode half-chunks properly (firstly it's about built-in transliteration filters -- think about splitting in the middle of a multi-byte character -- although perhaps they never deal with chunks longer than 32KB).

Member Author

We could increase max VARCHAR to 64KB - 2. But also, is there an impediment to having the max dsc_length of dtype_varying be 64KB?

As I understand it, there should not be many places just reading and incrementing dsc_length, so dtype_cstring / dtype_varying does not need to have the constant size added to dsc_length.

It would simplify various places that subtract and re-add that value.

Member

Well, the segment split problem remains anyway, if the LENGTH argument is not NULL (and less than the segment size). So perhaps we don't need to do anything special right now. Those who use filtered blobs should either prefer under-32KB segments or avoid using this package.

## Function `MAKE_BLOB`

`RDB$BLOB_UTIL.MAKE_BLOB` is used to create a BLOB from a BLOB handle created with `NEW`, after its content has been added with `APPEND`. After `MAKE_BLOB` is called, the handle is destroyed and should not be used with the other functions.
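Putting the pieces together, the NEW / APPEND / MAKE_BLOB lifecycle described here could be sketched as follows (hypothetical PSQL, based on the draft signatures in this PR; the final package shipped with different names):

```sql
-- Hypothetical sketch of the draft handle-based write path.
execute block returns (b blob sub_type binary)
as
  declare h integer;
begin
  h = rdb$blob_util.new(false, true);     -- stream blob, temporary storage
  execute procedure rdb$blob_util.append(h, x'DEADBEEF');
  b = rdb$blob_util.make_blob(h);         -- the handle is destroyed here
  suspend;
end
```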

Member

With parameters being BLOB rather than INTEGER handle, I'd just call it "CLOSE". And maybe think about "auto-close" scenarios in some cases.

IExternalContext* context, const AppendInput::Type* in, void*)
{
const auto tdbb = JRD_get_thread_data();
Attachment::SyncGuard guard(tdbb->getAttachment(), FB_FUNCTION);
Member

Unrelated to this PR: could it make sense (from the performance POV) to avoid EngineCheckout for system packages?

Member Author

It could be avoided for simple cases. Do you mean something different?

Member Author

It could be avoided for simple cases. Do you mean something different?

I was talking about avoiding SyncGuard for simple cases.

But yes, we'd better avoid EngineCheckout for system packages in general.

if (in->data.length > 0)
blob->BLB_put_data(tdbb, (const UCHAR*) in->data.str, in->data.length);
else if (in->data.length == 0 && !(blob->blb_flags & BLB_stream))
blob->BLB_put_segment(tdbb, (const UCHAR*) in->data.str, 0);
Member

While zero-length segments are allowed by the engine, does it make sense to copy them?

Member Author

I see you do not copy them when replicating, but if the user can explicitly do it and gbak preserves it, I think it should be preserved.

Member

OK, I don't mind.

@dyemanov
Member

I've added some comments with a hope to close the remaining questions. Naming consistency (whether "_BLOB" suffix in method names should be mandatory) also needs some agreement.

@asfernandes
Member Author

This package needs some adjustments after APPEND_BLOB, so that the same thing is not implemented in different ways:

  • Currently RDB$BLOB_UTIL.NEW returns a handle, which requires the use of RDB$BLOB_UTIL.APPEND and RDB$BLOB_UTIL.MAKE_BLOB.

RDB$BLOB_UTIL.NEW should be replaced by RDB$BLOB_UTIL.NEW_BLOB, which creates (only creates, does not append) a blob in a very similar way to what APPEND_BLOB does, but with the option to create it in temporary space and segmented/stream, as RDB$BLOB_UTIL.NEW was allowing.

  • RDB$BLOB_UTIL.APPEND and RDB$BLOB_UTIL.MAKE_BLOB should be removed, letting APPEND_BLOB and BLB_close_on_read do this work.

  • RDB$BLOB_UTIL.CANCEL should be adapted in relation to the changed RDB$BLOB_UTIL.NEW. It should be split into two functions: RDB$BLOB_UTIL.CLOSE (for opened handles) and RDB$BLOB_UTIL.CANCEL_BLOB (for BLB_close_on_read).

Please note that some functions have _BLOB suffixes and some do not. This is because some functions operate on blobs and some on handles.

Maybe it makes sense to add _HANDLE suffixes too. That would also help deal with reserved words; for example, CLOSE is a reserved word.

@sim1984

sim1984 commented Oct 21, 2022

I do not agree with these changes. Why try to cast the return result to the RDB$BLOB_APPEND variant? Let this package work with handles and in a different way than RDB$BLOB_APPEND.

@asfernandes
Member Author

I do not agree with these changes. Why try to cast the return result to the RDB$BLOB_APPEND variant? Let this package work with handles and in a different way than RDB$BLOB_APPEND.

Cast RDB$BLOB_APPEND? I said to remove RDB$BLOB_UTIL.APPEND. This job can be done by APPEND_BLOB, whether we like it or not.

So, with the exception of RDB$BLOB_UTIL.NEW_BLOB (which fulfills a job not done by APPEND_BLOB), the package will be mostly for reading.

@sim1984

sim1984 commented Oct 22, 2022

I don't mind deleting APPEND, but in this case it would be necessary to provide a BLOB_WRITE procedure for new blobs. Let this package be a kind of analogue of the blob API. In this case, you will not be able to abandon MAKE_BLOB. In addition, I am against changing the types of OPEN. Otherwise it will work, but here the fish was wrapped. And by the way, it would be nice to have a BLOB_INFO procedure, similar to the API, but returning all the information at once, such as the blob type, number of segments, length, and so on.

@asfernandes
Member Author

Why are you insisting on a write function if APPEND_BLOB can do it?

@cincuranet
Member

Maybe it makes sense to add _HANDLE suffixes too. That would also help deal with reserved words; for example, CLOSE is a reserved word.

That would make sense to me.

@asfernandes asfernandes requested a review from dyemanov December 12, 2022 01:18
@sim1984

sim1984 commented Dec 12, 2022

How about a BLOB_INFO procedure, which returns information about a BLOB:

  • blob type (segmented, streamed)
  • blob length
  • number of segments (if any)
  • blob placement (temporary, permanent)

@asfernandes
Member Author

How about the BLOB_INFO procedure, which returns information about a BLOB:

  • blob type (segmented, streamed)

What is the use case to get this info in PSQL?

  • blob length

There is already CHAR_LENGTH and OCTET_LENGTH.

  • number of segments (if any);

What is the use case to get this info in PSQL?

  • blob placement (temporary, permanent).

This may be good.

@sim1984

sim1984 commented Dec 12, 2022

What is the use case to get this info in PSQL?

This may be required for your own BLOB_UTIL package. BLOB navigation with SEEK is only possible for stream BLOBs.

About length, I meant the information that can be obtained through isc_blob_info. Although considering that it still doesn't work for long BLOBs > 2GB, it's probably not necessary.

In fact, I would like only this:

  • blob type (segmented, streamed)
  • blob placement (temporary, permanent).

@aafemt
Contributor

aafemt commented Dec 12, 2022

This may be good.

But also useless IMHO.

Implementation of SEEK for segmented BLOBs is not that complicated.

src/jrd/names.h
NAME("RDB$BLOB_UTIL_HANDLE", nam_butil_handle)
NAME("RDB$BLOB", nam_blob)
NAME("RDB$VARBINARY_MAX", nam_varbinary_max)
NAME("RDB$LONG_NUMBER", nam_long_number)
Member

Maybe name it RDB$HANDLE instead? For me, "long number" suggests something like "longer than usual", e.g. INT64, and also "NUMBER" does not necessarily mean INTEGER in the SQL world.

Member

Well, now I see that RDB$BLOB_UTIL_HANDLE is not used at all and RDB$LONG_NUMBER acts as both a handle and a mode/offset/length too. Is this correct?

Member Author

Correct for RDB$BLOB_UTIL_HANDLE.

For RDB$LONG_NUMBER, I want a generic domain, as I think it does not make sense to create one domain for OFFSET and another for LENGTH. For MODE a dedicated domain would make sense, but I'm just reusing it there too. I have now renamed it to RDB$INTEGER.
