samples: migrate code from googleapis/java-document-ai #7427

Merged
merged 153 commits into from
Nov 15, 2022
153 commits
a7cbcb1
chore: regenerate common templates (#26)
yoshi-automation Mar 18, 2020
6c981e0
chore(deps): update dependency com.google.cloud:libraries-bom to v4.3…
renovate-bot Mar 23, 2020
3280c9d
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Mar 25, 2020
e40e2f1
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Mar 30, 2020
c9ad32b
chore(deps): update dependency com.google.cloud:libraries-bom to v4.4…
renovate-bot Apr 1, 2020
5ec319e
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Apr 6, 2020
4110b21
chore(deps): update dependency com.google.cloud:libraries-bom to v4.4…
renovate-bot Apr 6, 2020
da92f8a
chore(deps): update dependency com.google.cloud:libraries-bom to v5 (…
renovate-bot Apr 14, 2020
646ef5d
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Apr 16, 2020
6d2ece5
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Apr 17, 2020
bd9f3d8
chore(deps): update dependency com.google.cloud:libraries-bom to v5.2…
renovate-bot Apr 24, 2020
d01ae96
chore(deps): update dependency com.google.cloud:libraries-bom to v5.3…
renovate-bot Apr 28, 2020
f99dde5
samples: add v1beta2 samples (#42)
munkhuushmgl Apr 29, 2020
a4b258a
chore(deps): update dependency com.google.cloud:libraries-bom to v5.4…
renovate-bot May 19, 2020
fbdb45a
chore(deps): update dependency com.google.cloud:libraries-bom to v5.5…
renovate-bot May 29, 2020
13248e2
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Jun 10, 2020
6c632aa
chore(deps): update dependency com.google.cloud:libraries-bom to v5.7…
renovate-bot Jun 10, 2020
5945124
chore(deps): update dependency com.google.cloud:libraries-bom to v6 (…
renovate-bot Jun 16, 2020
1e2da90
chore(deps): update dependency com.google.cloud:libraries-bom to v7 (…
renovate-bot Jun 17, 2020
403d15a
chore(deps): update dependency com.google.cloud:libraries-bom to v7.0…
renovate-bot Jun 22, 2020
2a4a17c
chore(deps): update dependency com.google.cloud:libraries-bom to v8 (…
renovate-bot Jun 26, 2020
3d28cac
chore(deps): update dependency com.google.cloud:libraries-bom to v8.1…
renovate-bot Jul 16, 2020
e9ff1ea
chore(deps): update dependency com.google.cloud:libraries-bom to v9 (…
renovate-bot Aug 14, 2020
fc510a3
chore(deps): update dependency com.google.cloud:libraries-bom to v9.1.0
renovate-bot Aug 17, 2020
35b46dc
samples: add presubmit lint check (#156)
yoshi-automation Aug 27, 2020
670b65c
chore(deps): update dependency com.google.cloud:libraries-bom to v10 …
renovate-bot Sep 22, 2020
c4b7487
chore(deps): update dependency com.google.cloud:libraries-bom to v11
renovate-bot Sep 24, 2020
2d18f56
chore(deps): update dependency com.google.cloud:libraries-bom to v11.…
renovate-bot Oct 1, 2020
0f646be
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Oct 2, 2020
978f6fa
chore(deps): update dependency com.google.cloud:libraries-bom to v12 …
renovate-bot Oct 6, 2020
3f8be86
test(deps): update dependency junit:junit to v4.13.1
renovate-bot Oct 12, 2020
b8b9fca
samples: new Doc AI samples for v1beta3 (#206)
munkhuushmgl Oct 15, 2020
8c0c4e0
chore(deps): update dependency com.google.cloud:libraries-bom to v12.…
renovate-bot Oct 15, 2020
beb77bf
chore(deps): update dependency com.google.cloud:libraries-bom to v13 …
renovate-bot Oct 20, 2020
106b019
chore(deps): update dependency com.google.cloud:libraries-bom to v13.…
renovate-bot Oct 21, 2020
119decf
test(deps): update dependency com.google.truth:truth to v1.1 (#226)
renovate-bot Oct 22, 2020
606d496
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 22, 2020
116907e
chore(deps): update dependency com.google.cloud:libraries-bom to v13.…
renovate-bot Oct 23, 2020
c4e996c
chore(deps): update dependency com.google.cloud:libraries-bom to v13.…
renovate-bot Oct 27, 2020
c947f39
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 30, 2020
58d439f
chore(deps): update dependency com.google.cloud:libraries-bom to v13.…
renovate-bot Oct 30, 2020
6b1f9e7
test: re-enable BatchParseTable test (#239)
twinkle-kadia Nov 2, 2020
67b65b6
chore(deps): update dependency com.google.cloud:libraries-bom to v14 …
renovate-bot Nov 4, 2020
724a19d
chore(deps): update dependency com.google.cloud:libraries-bom to v15 …
renovate-bot Nov 5, 2020
b265d29
chore(deps): update dependency com.google.cloud:libraries-bom to v15.…
renovate-bot Nov 12, 2020
d18c6ae
chore(deps): update dependency com.google.cloud:libraries-bom to v16 …
renovate-bot Nov 19, 2020
89407fb
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Nov 19, 2020
acba712
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Dec 14, 2020
eb77023
chore(deps): update dependency com.google.cloud:libraries-bom to v16.…
renovate-bot Dec 15, 2020
ed73f66
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jan 7, 2021
25b98c5
chore(deps): update dependency com.google.cloud:libraries-bom to v16.…
renovate-bot Jan 7, 2021
30d3b37
chore: release 0.3.8 (#301)
release-please[bot] Jan 7, 2021
78d153e
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jan 7, 2021
f7cf7c4
chore(deps): update dependency com.google.cloud:libraries-bom to v16.…
renovate-bot Jan 20, 2021
cee8e33
test(deps): update dependency com.google.truth:truth to v1.1.2 (#324)
renovate-bot Jan 25, 2021
1097ac5
chore(deps): update dependency com.google.cloud:libraries-bom to v16.…
renovate-bot Feb 10, 2021
9dce0cb
chore: added conditonal check to prevent indexOutOfBound Exception (#…
munkhuushmgl Feb 10, 2021
015a95b
test(deps): update dependency junit:junit to v4.13.2 (#346)
renovate-bot Feb 16, 2021
2d9ac02
chore(deps): update dependency com.google.cloud:libraries-bom to v17 …
renovate-bot Feb 24, 2021
8fefa9f
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Feb 24, 2021
9932e9a
chore(deps): update dependency com.google.cloud:libraries-bom to v18 …
renovate-bot Feb 25, 2021
eea8bcf
chore(deps): update dependency com.google.cloud:libraries-bom to v18.…
renovate-bot Mar 3, 2021
b2f89b2
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Mar 3, 2021
9a44dcf
chore(deps): update dependency com.google.cloud:libraries-bom to v19 …
renovate-bot Mar 4, 2021
c6bd6a2
chore(deps): update dependency com.google.cloud:libraries-bom to v19.…
renovate-bot Mar 17, 2021
6493c3e
chore(deps): update dependency com.google.cloud:libraries-bom to v19.…
renovate-bot Mar 19, 2021
481e048
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Mar 23, 2021
2f9fcde
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Apr 8, 2021
0d7bb4f
chore: increased timeout len (#402)
munkhuushmgl Apr 9, 2021
f751880
chore(deps): update dependency com.google.cloud:libraries-bom to v20 …
renovate-bot Apr 12, 2021
f1a7877
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 14, 2021
166b90e
feat: generate v1 client (#421)
Neenu1995 Apr 14, 2021
fd27c7b
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Apr 19, 2021
e8c209b
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 20, 2021
cfcaf66
samples: updates samples to v1 (#425)
telpirion Apr 20, 2021
e395f4f
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 21, 2021
ca49a2d
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Apr 29, 2021
e4b4955
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 30, 2021
e381b64
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot May 13, 2021
5e5cfe7
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot May 14, 2021
41660f3
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot May 17, 2021
f21ed90
chore: increased timeout again attemp #2 (#492)
munkhuushmgl May 20, 2021
602b73f
test(deps): update dependency com.google.truth:truth to v1.1.3 (#496)
renovate-bot May 26, 2021
581bb55
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot May 26, 2021
ffd4843
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jun 2, 2021
4ea60a0
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Jun 6, 2021
67ca50e
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Jun 7, 2021
33215ff
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jun 16, 2021
1242c4b
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jun 17, 2021
a994492
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Jun 23, 2021
297590c
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jul 7, 2021
7b46739
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Jul 9, 2021
4a1886e
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jul 21, 2021
d4c13c3
samples: fixes typo (#554)
telpirion Jul 23, 2021
2e25bf2
chore(deps): update dependency com.google.cloud:libraries-bom to v20.…
renovate-bot Jul 28, 2021
66fa163
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Aug 12, 2021
253bf38
chore(deps): update dependency com.google.cloud:libraries-bom to v21 …
renovate-bot Aug 17, 2021
8e72f41
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Aug 24, 2021
0b68fca
chore(deps): update dependency com.google.cloud:libraries-bom to v22 …
renovate-bot Aug 27, 2021
5397972
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 1, 2021
7816ff9
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 1, 2021
572f139
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 2, 2021
8df6b30
chore(deps): update dependency com.google.cloud:libraries-bom to v23 …
renovate-bot Sep 8, 2021
2f2223c
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 14, 2021
f0ff0cf
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 29, 2021
4b8608c
chore(deps): update dependency com.google.cloud:libraries-bom to v23.…
renovate-bot Oct 1, 2021
cc2fdb3
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 1, 2021
3272d17
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 1, 2021
41ae1af
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 6, 2021
3d07283
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 21, 2021
57fbba7
chore(deps): update dependency com.google.cloud:libraries-bom to v24 …
renovate-bot Oct 27, 2021
e397a20
docs(samples): add OCR, form, quality, splitter and specialized proce…
Nov 17, 2021
f1537ba
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Dec 6, 2021
4bbc7a7
chore(deps): update dependency com.google.cloud.samples:shared-config…
renovate-bot Dec 6, 2021
1b684e6
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Dec 28, 2021
f093a03
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Jan 7, 2022
526af70
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Jan 7, 2022
31e9564
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jan 17, 2022
3ed9d52
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Jan 18, 2022
112beab
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Feb 8, 2022
4fa2371
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Feb 8, 2022
3fe5ca3
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Feb 9, 2022
dec9782
chore(deps): update dependency com.google.cloud:libraries-bom to v24.…
renovate-bot Mar 3, 2022
1b0d9fa
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Mar 8, 2022
eadb8bd
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Mar 10, 2022
bc32780
chore(deps): update dependency com.google.cloud:libraries-bom to v25 …
renovate-bot Mar 14, 2022
7398435
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 1, 2022
fe3ee55
chore(deps): update dependency com.google.cloud:libraries-bom to v25.…
renovate-bot Apr 1, 2022
e92108f
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 4, 2022
e6e7af8
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 18, 2022
78daae0
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Apr 19, 2022
fa182e1
chore: Removed v1beta2 samples and tests (#793)
galz10 Apr 20, 2022
278b7c4
chore(deps): update dependency com.google.cloud:libraries-bom to v25.…
renovate-bot Apr 27, 2022
9ba7fee
chore(deps): update dependency com.google.cloud:libraries-bom to v25.…
renovate-bot May 16, 2022
e38ce59
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot May 25, 2022
d219fc4
chore(deps): update dependency com.google.cloud:libraries-bom to v25.…
renovate-bot Jun 6, 2022
b234f87
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jun 13, 2022
382b208
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Jun 24, 2022
6a33a1d
chore(deps): update dependency com.google.cloud:libraries-bom to v26 …
renovate-bot Jul 11, 2022
ce99b58
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Aug 2, 2022
35d0fbf
chore(deps): update dependency com.google.cloud:libraries-bom to v26.…
renovate-bot Aug 16, 2022
96e1b0c
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Aug 16, 2022
1242ed9
chore(deps): update dependency com.google.cloud:libraries-bom to v26.…
renovate-bot Aug 31, 2022
8dd5f6c
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 20, 2022
b65b264
chore(deps): update dependency com.google.cloud:libraries-bom to v26.…
renovate-bot Sep 20, 2022
80aea62
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 20, 2022
f3b7b6c
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 21, 2022
e701506
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Sep 28, 2022
92e45e7
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 6, 2022
c3908d3
chore(deps): update dependency com.google.cloud:libraries-bom to v26.…
renovate-bot Oct 7, 2022
d6cbfa2
chore(deps): update dependency com.google.cloud:google-cloud-document…
renovate-bot Oct 10, 2022
2d223cf
Merge remote-tracking branch 'migration/main' into java-document-ai-m…
Shabirmean Nov 15, 2022
d2ecb32
chore: post migration updates - groupId, artifact url, repo references
Shabirmean Nov 15, 2022
64 changes: 64 additions & 0 deletions document-ai/pom.xml
@@ -0,0 +1,64 @@
<?xml version='1.0' encoding='UTF-8'?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example.documentai</groupId>
<artifactId>documentai-snippets</artifactId>
<packaging>jar</packaging>
<name>Google Document AI Snippets</name>
<url>https://github.com/GoogleCloudPlatform/java-docs-samples/tree/main/document-ai</url>

<!--
The parent pom defines common style checks and testing strategies for our samples.
Removing or replacing it should not affect the execution of the samples in any way.
-->
<parent>
<groupId>com.google.cloud.samples</groupId>
<artifactId>shared-configuration</artifactId>
<version>1.2.0</version>
</parent>

<properties>
<maven.compiler.target>1.8</maven.compiler.target>
<maven.compiler.source>1.8</maven.compiler.source>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>


<!-- [START documentai_install_with_bom] -->
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
<version>26.1.3</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>

<dependencies>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-document-ai</artifactId>
<version>2.7.5</version>
</dependency>
<!-- [END documentai_install_with_bom] -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.truth</groupId>
<artifactId>truth</artifactId>
<version>1.1.3</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
Binary file not shown.
Binary file added document-ai/resources/handwritten_form.pdf
Binary file not shown.
Binary file added document-ai/resources/invoice.pdf
Binary file not shown.
Binary file added document-ai/resources/multi_document.pdf
Binary file not shown.
Binary file added document-ai/resources/us_driver_license.pdf
Binary file not shown.
178 changes: 178 additions & 0 deletions document-ai/src/main/java/documentai/v1/BatchProcessDocument.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
/*
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package documentai.v1;

// [START documentai_batch_process_document]

import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.paging.Page;
import com.google.cloud.documentai.v1.BatchDocumentsInputConfig;
import com.google.cloud.documentai.v1.BatchProcessMetadata;
import com.google.cloud.documentai.v1.BatchProcessRequest;
import com.google.cloud.documentai.v1.BatchProcessResponse;
import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentOutputConfig;
import com.google.cloud.documentai.v1.DocumentOutputConfig.GcsOutputConfig;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.GcsDocument;
import com.google.cloud.documentai.v1.GcsDocuments;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.protobuf.util.JsonFormat;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BatchProcessDocument {
public static void batchProcessDocument()
throws IOException, InterruptedException, TimeoutException, ExecutionException {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String location = "your-project-location"; // Format is "us" or "eu".
String processorId = "your-processor-id";
String outputGcsBucketName = "your-gcs-bucket-name";
String outputGcsPrefix = "PREFIX";
String inputGcsUri = "gs://your-gcs-bucket/path/to/input/file.pdf";
batchProcessDocument(
projectId, location, processorId, inputGcsUri, outputGcsBucketName, outputGcsPrefix);
}

public static void batchProcessDocument(
String projectId,
String location,
String processorId,
String gcsInputUri,
String gcsOutputBucketName,
String gcsOutputUriPrefix)
throws IOException, InterruptedException, TimeoutException, ExecutionException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create()) {
// The full resource name of the processor, e.g.:
// projects/project-id/locations/location/processors/processor-id
// You must create new processors in the Cloud Console first
String name =
String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

GcsDocument gcsDocument =
GcsDocument.newBuilder().setGcsUri(gcsInputUri).setMimeType("application/pdf").build();

GcsDocuments gcsDocuments = GcsDocuments.newBuilder().addDocuments(gcsDocument).build();

BatchDocumentsInputConfig inputConfig =
BatchDocumentsInputConfig.newBuilder().setGcsDocuments(gcsDocuments).build();

String fullGcsPath = String.format("gs://%s/%s/", gcsOutputBucketName, gcsOutputUriPrefix);
GcsOutputConfig gcsOutputConfig = GcsOutputConfig.newBuilder().setGcsUri(fullGcsPath).build();

DocumentOutputConfig documentOutputConfig =
DocumentOutputConfig.newBuilder().setGcsOutputConfig(gcsOutputConfig).build();

// Configure the batch process request.
BatchProcessRequest request =
BatchProcessRequest.newBuilder()
.setName(name)
.setInputDocuments(inputConfig)
.setDocumentOutputConfig(documentOutputConfig)
.build();

OperationFuture<BatchProcessResponse, BatchProcessMetadata> future =
client.batchProcessDocumentsAsync(request);

// Batch process document using a long-running operation.
// You can wait for now, or get results later.
// Note: first request to the service takes longer than subsequent
// requests.
System.out.println("Waiting for operation to complete...");
future.get(240, TimeUnit.SECONDS);

System.out.println("Document processing complete.");

Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
Bucket bucket = storage.get(gcsOutputBucketName);

// List all of the files in the Storage bucket.
Page<Blob> blobs = bucket.list(Storage.BlobListOption.prefix(gcsOutputUriPrefix + "/"));
int idx = 0;
for (Blob blob : blobs.iterateAll()) {
if (!blob.isDirectory()) {
System.out.printf("Fetched file #%d\n", ++idx);
// Read the results

// Download and store json data in a temp file.
File tempFile = File.createTempFile("file", ".json");
Blob fileInfo = storage.get(BlobId.of(gcsOutputBucketName, blob.getName()));
fileInfo.downloadTo(tempFile.toPath());

// Parse the JSON file into a Document, closing the reader when done.
Document.Builder builder = Document.newBuilder();
try (FileReader reader = new FileReader(tempFile)) {
JsonFormat.parser().merge(reader, builder);
}

Document document = builder.build();

// Get all of the document text as one big string.
String text = document.getText();

// Read the text recognition output from the processor
System.out.println("The document contains the following paragraphs:");
Document.Page page1 = document.getPages(0);
List<Document.Page.Paragraph> paragraphList = page1.getParagraphsList();
for (Document.Page.Paragraph paragraph : paragraphList) {
String paragraphText = getText(paragraph.getLayout().getTextAnchor(), text);
System.out.printf("Paragraph text:%s\n", paragraphText);
}

// Form parsing provides additional output about
// form-formatted PDFs. You must create a form
// processor in the Cloud Console to see full field details.
System.out.println("The following form key/value pairs were detected:");

for (Document.Page.FormField field : page1.getFormFieldsList()) {
String fieldName = getText(field.getFieldName().getTextAnchor(), text);
String fieldValue = getText(field.getFieldValue().getTextAnchor(), text);

System.out.println("Extracted form fields pair:");
System.out.printf("\t(%s, %s)\n", fieldName, fieldValue);
}

// Schedule the temp file for deletion when the JVM exits.
tempFile.deleteOnExit();
}
}
}
}

// Extract the text covered by the anchor's first text segment.
private static String getText(Document.TextAnchor textAnchor, String text) {
if (textAnchor.getTextSegmentsList().size() > 0) {
int startIdx = (int) textAnchor.getTextSegments(0).getStartIndex();
int endIdx = (int) textAnchor.getTextSegments(0).getEndIndex();
return text.substring(startIdx, endIdx);
}
return "[NO TEXT]";
}
}
// [END documentai_batch_process_document]
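The batch sample above blocks on `future.get(240, TimeUnit.SECONDS)`, which throws `TimeoutException` if the operation has not finished in time. The following is a minimal sketch of that bounded-wait pattern using only `java.util.concurrent`. `CompletableFuture` stands in for the client's `OperationFuture`, and the names `BoundedWaitSketch` and `waitFor` are hypothetical; they are not part of the Document AI library or of this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {
  // Hypothetical helper: waits up to timeoutSeconds for the future and
  // returns fallback if the time runs out. The sample in this PR lets
  // TimeoutException propagate instead; this is only one alternative.
  public static <T> T waitFor(Future<T> future, long timeoutSeconds, T fallback) {
    try {
      return future.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      // The operation may still be running; the caller can check again later.
      return fallback;
    } catch (InterruptedException e) {
      // Restore the interrupt flag before reporting the failure.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Operation failed", e.getCause());
    }
  }

  public static void main(String[] args) {
    Future<String> done = CompletableFuture.completedFuture("finished");
    Future<String> pending = new CompletableFuture<>(); // never completed here
    System.out.println(waitFor(done, 1, "timed out"));
    System.out.println(waitFor(pending, 1, "timed out"));
  }
}
```

Returning a fallback keeps the caller in control of whether to retry, poll again, or give up; whether that is preferable to propagating the exception depends on the application.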
113 changes: 113 additions & 0 deletions document-ai/src/main/java/documentai/v1/ProcessDocument.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
/*
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package documentai.v1;

// [START documentai_process_document]

import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.ProcessRequest;
import com.google.cloud.documentai.v1.ProcessResponse;
import com.google.cloud.documentai.v1.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class ProcessDocument {
public static void processDocument()
throws IOException, InterruptedException, ExecutionException, TimeoutException {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String location = "your-project-location"; // Format is "us" or "eu".
String processorId = "your-processor-id";
String filePath = "path/to/input/file.pdf";
processDocument(projectId, location, processorId, filePath);
}

public static void processDocument(
String projectId, String location, String processorId, String filePath)
throws IOException, InterruptedException, ExecutionException, TimeoutException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create()) {
// The full resource name of the processor, e.g.:
// projects/project-id/locations/location/processors/processor-id
// You must create new processors in the Cloud Console first
String name =
String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

// Read the file.
byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

// Wrap the file bytes in a ByteString for the request.
ByteString content = ByteString.copyFrom(imageFileData);

RawDocument document =
RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

// Configure the process request.
ProcessRequest request =
ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

// Recognizes text entities in the PDF document
ProcessResponse result = client.processDocument(request);
Document documentResponse = result.getDocument();

// Get all of the document text as one big string
String text = documentResponse.getText();

// Read the text recognition output from the processor
System.out.println("The document contains the following paragraphs:");
Document.Page firstPage = documentResponse.getPages(0);
List<Document.Page.Paragraph> paragraphs = firstPage.getParagraphsList();

for (Document.Page.Paragraph paragraph : paragraphs) {
String paragraphText = getText(paragraph.getLayout().getTextAnchor(), text);
System.out.printf("Paragraph text:\n%s\n", paragraphText);
}

// Form parsing provides additional output about
// form-formatted PDFs. You must create a form
// processor in the Cloud Console to see full field details.
System.out.println("The following form key/value pairs were detected:");

for (Document.Page.FormField field : firstPage.getFormFieldsList()) {
String fieldName = getText(field.getFieldName().getTextAnchor(), text);
String fieldValue = getText(field.getFieldValue().getTextAnchor(), text);

System.out.println("Extracted form fields pair:");
System.out.printf("\t(%s, %s)\n", fieldName, fieldValue);
}
}
}

// Extract the text covered by the anchor's first text segment.
private static String getText(Document.TextAnchor textAnchor, String text) {
if (textAnchor.getTextSegmentsList().size() > 0) {
int startIdx = (int) textAnchor.getTextSegments(0).getStartIndex();
int endIdx = (int) textAnchor.getTextSegments(0).getEndIndex();
return text.substring(startIdx, endIdx);
}
return "[NO TEXT]";
}
}
// [END documentai_process_document]
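The `getText` helper in both samples reads only `getTextSegments(0)`, so an anchor that spans several segments would be truncated to its first one. Below is a minimal sketch of joining every segment, assuming each segment is a half-open `[start, end)` range of offsets into the document text. The `Segment` class is a hypothetical stand-in for the library's text segment type so that the example runs with the standard library alone; `TextSegmentsSketch` and `joinSegments` are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.List;

public class TextSegmentsSketch {
  // Hypothetical stand-in for a text segment: [start, end) offsets into the text.
  public static final class Segment {
    final int start;
    final int end;

    public Segment(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  // Concatenates the slice of text covered by each segment, in list order.
  // Mirrors the samples' "[NO TEXT]" placeholder when there are no segments.
  public static String joinSegments(List<Segment> segments, String text) {
    if (segments.isEmpty()) {
      return "[NO TEXT]";
    }
    StringBuilder sb = new StringBuilder();
    for (Segment s : segments) {
      sb.append(text, s.start, s.end);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String text = "Name: Ada Lovelace";
    List<Segment> segments = Arrays.asList(new Segment(6, 9), new Segment(9, 18));
    System.out.println(joinSegments(segments, text)); // prints "Ada Lovelace"
  }
}
```

In the real samples the same loop would iterate over `textAnchor.getTextSegmentsList()` and cast the `long` indices to `int`, as the existing helper already does for the first segment.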