Specifically, added the following:
* Combine the two document loaders into a single `AzureBlobStorageLoader`
class.
* Call out the `blob_parser` parameter more prominently as a future consideration. This
would be helpful if a customer did not necessarily want to use the
blob loader interfaces but still wanted more control over how to parse
blobs.
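The combined class described in the first bullet can be illustrated with a small, self-contained sketch. This is a hypothetical stand-in and not the package's implementation: an in-memory dictionary replaces the storage account, and `SketchBlobStorageLoader`, `FAKE_ACCOUNTS`, and the `account`/`container`/`blob` parameter names are assumptions made for illustration. It shows how one class can cover both the container case and the single-blob case while callers keep passing the same positional arguments the two separate classes would take.

```python
"""Hypothetical sketch: one loader class covering container and single-blob loading.

Assumptions (not the real langchain_azure_storage implementation):
- an in-memory dict stands in for an Azure Storage account;
- class, variable, and parameter names are illustrative only.
"""
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Document:
    # Minimal stand-in for a LangChain Document.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical storage layout: account -> container -> blob name -> raw bytes.
FAKE_ACCOUNTS: Dict[str, Dict[str, Dict[str, bytes]]] = {
    "myaccount": {"docs": {"a.txt": b"hello", "b.txt": b"world"}},
}


class SketchBlobStorageLoader:
    """Loads every blob in a container, or only the named blob if one is given."""

    def __init__(self, account: str, container: str, blob: Optional[str] = None) -> None:
        self.account = account
        self.container = container
        self.blob = blob

    def lazy_load(self) -> Iterator[Document]:
        blobs = FAKE_ACCOUNTS[self.account][self.container]
        # Container mode lists every blob; single-blob mode loads only the named one.
        names = sorted(blobs) if self.blob is None else [self.blob]
        for name in names:
            # Default parsing: decode the raw bytes as UTF-8 text.
            text = blobs[name].decode("utf-8")
            source = f"{self.account}/{self.container}/{name}"
            yield Document(page_content=text, metadata={"source": source})

    def load(self) -> List[Document]:
        return list(self.lazy_load())


# Same positional arguments as the two-class design would take.
container_docs = SketchBlobStorageLoader("myaccount", "docs").load()        # account + container
single_doc = SketchBlobStorageLoader("myaccount", "docs", "b.txt").load()  # account + container + blob
print([d.page_content for d in container_docs])  # ['hello', 'world']
print([d.page_content for d in single_doc])      # ['world']
```

Because the only difference between the two modes is whether a blob name is supplied, the branching lives in one place instead of being duplicated across two classes.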
@@ -297,11 +298,11 @@ for doc in loader.lazy_load():
The following shows how to load documents asynchronously. This is achieved by calling the `aload()` or `alazy_load()` methods on the document loader. For example:

```python
-from langchain_azure_storage.document_loaders import AzureBlobStorageContainerLoader
+from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
@@ -573,6 +577,27 @@ However, similar to why document loaders were chosen over blob loaders, blob par
over libraries like Unstructured and take away from the batteries-included value proposition that LangChain document
loaders provide.

+It's important to note that this decision does not prevent us from exposing a `blob_parser` parameter in the future.
+Specifically, this would be useful if we see customers who want more control over loading behavior but do not
+necessarily want to drop down to a blob loader interface.
+
+
+#### Exposing document loaders as two classes, `AzureBlobStorageFileLoader` and `AzureBlobStorageContainerLoader`, instead of a single `AzureBlobStorageLoader`
+Exposing the document loaders as these two classes would be beneficial in that they would match the existing community
+document loaders and reduce the number of changes needed to migrate. However, combining them into a single class
+has the following advantages:
+
+* It simplifies the getting-started experience. Customers will no longer have to decide which Azure Storage
+document loader class to use, as there will be only one document loader class to choose from.
+* It simplifies class names by removing the additional `File` and `Container` qualifiers, which could lead to
+misinterpretations of what the classes do.
+* It is easier to maintain, as there is only one class to maintain and less code will likely need to
+be duplicated.
+
+While this will introduce an additional step in migrating (i.e., changing class names), the impact is limited
+as customers will still provide the same positional parameters after changing class names
+(i.e., account + container for the container loader and account + container + blob for the file loader).
+


#### Alternatives to default parsing to UTF-8 text
The default parsing logic when no `loader_factory` is provided is to treat the blob content as UTF-8 text
@@ -638,10 +663,10 @@ customize how blobs are parsed to text. However, possible requested extension po
* Wanting the blob data to be passed using an in-memory representation rather than a file on disk

If we ever plan to extend the interface, we should strongly consider exposing blob loaders
-instead as discussed in the [alternatives considered](#exposing-a-blob_parser-parameter-instead-of-loader_factory)
+and/or a `blob_parser` parameter instead, as discussed in the [alternatives considered](#exposing-a-blob_parser-parameter-instead-of-loader_factory)
section above.

-If blob loaders do not suffice, we could consider expanding the `loader_factory` to:
+If neither blob loaders nor a `blob_parser` parameter suffices, we could consider expanding the `loader_factory` to:

* Inspect the signature of the callable provided to `loader_factory` and call the callable with
additional parameters if detected (e.g., detect if a `blob_properties` parameter is present and
@@ -666,7 +691,7 @@ Based on customer requests, in the future, we could consider exposing these prop
## Future work
Below are some possible future work ideas that could be considered after the initial implementation based on customer feedback:

-* Expose blob loader integrations for Azure Blob Storage (see [alternatives considered](#exposing-a-blob_parser-parameter-instead-of-loader_factory) section).
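The `loader_factory` extension idea from the alternatives section (inspecting the callable's signature and passing extra arguments such as `blob_properties` only when the callable declares them) could look roughly like the following. This is a minimal sketch under stated assumptions, not the package's API: it assumes the factory is called with a local file path, that `blob_properties` is passed as a plain dictionary, and `call_loader_factory`, `simple_factory`, and `props_factory` are hypothetical names. It uses only the standard-library `inspect.signature`.

```python
"""Hypothetical sketch of signature inspection for ``loader_factory``.

Assumptions (not the real langchain_azure_storage API): the factory receives a
local file path, and ``blob_properties`` is an optional extra passed as a dict.
"""
import inspect
from typing import Any, Callable, Dict


def call_loader_factory(
    loader_factory: Callable[..., Any],
    file_path: str,
    blob_properties: Dict[str, Any],
) -> Any:
    """Call ``loader_factory``, passing ``blob_properties`` only if it accepts it."""
    params = inspect.signature(loader_factory).parameters
    # Opt in either by naming the parameter or by accepting **kwargs.
    accepts_props = "blob_properties" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_props:
        return loader_factory(file_path, blob_properties=blob_properties)
    # Callables written against the path-only interface are called exactly as before.
    return loader_factory(file_path)


def simple_factory(file_path: str) -> str:
    # Hypothetical factory using only the file path.
    return f"simple:{file_path}"


def props_factory(file_path: str, blob_properties: Dict[str, Any]) -> str:
    # Hypothetical factory that opts in to the extra parameter.
    return f"props:{file_path}:{blob_properties['content_type']}"


props = {"content_type": "text/plain"}
print(call_loader_factory(simple_factory, "/tmp/a.txt", props))  # simple:/tmp/a.txt
print(call_loader_factory(props_factory, "/tmp/a.txt", props))   # props:/tmp/a.txt:text/plain
```

Treating `**kwargs` as an opt-in is a design choice of this sketch; the key property is that existing path-only factories keep working, so such an extension would stay backward compatible.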