In our presentation, we talked about using different APIs, like IBM's Watson, Microsoft's Azure, and Google's Cloud Vision, to quickly add artificial intelligence to our applications. We want to introduce you to the Google Cloud Vision API, which lets you get really cool, invaluable insights from images.
The Google Cloud Vision API allows us to understand the contents of an image. You can use it for label detection, logo detection, explicit content detection, face detection, image attributes, web detection, and more. Today, we are going to use the Google Cloud Vision API to identify an uploaded image with label detection (tagging an image with attributes) and text recognition (OCR).
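To give you a taste before we set anything up, here's roughly what a label detection call looks like with Google's Node client library. This is just a sketch (the `./dog.jpg` path is a placeholder); we'll build the real thing step by step below.

```js
// Sketch: label detection with the @google-cloud/vision Node library.
// Assumes credentials are already configured (we'll set that up below).
const vision = require("@google-cloud/vision");
const client = new vision.ImageAnnotatorClient();

// labelDetection accepts a path to a local image file
client.labelDetection("./dog.jpg").then(results => {
  // Each label comes with a description and a confidence score
  results[0].labelAnnotations.forEach(label => {
    console.log(label.description, label.score);
  });
});
```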
Click here to obtain your free Google Cloud coupon (thanks Tim!). Google Cloud requires a billing account with a valid credit card, but with this coupon, we'll bypass that entirely.
Enter your Dartmouth email and name. Then go to your email to verify your account.
After verifying your email, you should receive another email with your coupon code. Click the link they provided. You might have to sign in to your Google account if you aren’t signed in already.
Now, create a new project in your Google account by clicking here.
Go to this link to see your billing accounts.
If you only have one billing account, move on to the ENABLE API section.
If you have multiple billing accounts, do the following to set up the project correctly:
- Go to the Google Cloud Platform Console, open the console's left side menu, and select **Billing**.
- If you have more than one billing account, you'll be prompted to select **Go to linked billing account** to manage the current project's billing.
- Under **Projects linked to this billing account**, locate the name of the project that you want to change billing for, and then click the menu next to it (the three dots). Our project is called `demo`, but feel free to call your project anything you want!
- Select **Change billing account**, then choose the desired destination billing account and click **Set account**.
Click here to enable the API for your project. The page should look like this:
Click **Select a project** at the top and choose your project name. Then click **Open**.
You should be at a page that looks like this.
If you don't have the sidebar on the left, click the three horizontal menu bars on the top left.
From this sidebar, click **APIs & Services** and then click **Dashboard**.
You should be taken to a page that looks like this.
Scroll to the bottom, where there is a list of APIs, and click **Cloud Vision API**.
If Cloud Vision is not enabled, you should be taken to a page that looks like this.
Click **Enable** at the top.
If Cloud Vision is already enabled, you'll just see a **Disable** option instead; in that case, there's nothing to do.
If you haven't already, fork this repo to get our starter code.
To use this client library, you must first authenticate. Complete these steps:
Go to the Create service account key page in the GCP Console: Service account key page
Make sure your project name is showing at the top and that it's the right project (ours is called `demo`).
If it doesn't work:
- Select **Credentials** from the left sidebar under **APIs & Services**.
- Select the **Create Credentials** dropdown menu.
- Click **Service Account Key**.
1. From the **Service account** drop-down list, select **New service account**.
2. Enter a name into the **Service account name** field.
3. From the **Role** drop-down list, select **Project > Owner**.
4. Click **Create**. A JSON file that contains your key will download to your computer. Save the file in your forked repo's root directory as `key.json`.
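For reference, a service account key file looks roughly like this (values redacted here; the exact set of fields may vary slightly):

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "<redacted>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "<your-service-account>@<your-project-id>.iam.gserviceaccount.com",
  "client_id": "<redacted>"
}
```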
Check the Node version you currently have installed using `node -v`. If you don't have version `9.8.0`, you need to roll back your version; if you don't, the demo won't work. Run the following in your terminal: `npm install --global node@9.8.0`

`cd` into the root directory of the repo and run `npm install` to install all of the package dependencies. Don't run `yarn`; running `yarn` will overwrite some of the packages needed for this demo to work. And again, make sure you are on version `9.8.0` of Node. Otherwise, running `npm install` will cause a lot of nasty errors.
This is going to be a mini React app that takes in an image and returns a label describing what is in the image (image recognition). Alongside the React app, we run an Express server that sends `POST` requests to the Google API server. As a reminder, a `POST` request asks a web server to accept the data enclosed in the body of the request message and do something with that data. In our case, this data is an image. For example, say we upload the following image:
The Google API server will return a JSON file that looks something like this:
```json
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0bt9lr",
          "description": "dog",
          "score": 0.97346616
        },
        {
          "mid": "/m/09686",
          "description": "vertebrate",
          "score": 0.85700572
        },
        {
          "mid": "/m/01pm38",
          "description": "clumber spaniel",
          "score": 0.84881884
        },
        {
          "mid": "/m/04rky",
          "description": "mammal",
          "score": 0.847575
        },
        {
          "mid": "/m/02wbgd",
          "description": "english cocker spaniel",
          "score": 0.75829375
        }
      ]
    }
  ]
}
```
This JSON contains the identification tags for the image we sent.
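On the client side, pulling the best guess out of a response shaped like this is just a matter of indexing into the JSON. For example (a sketch assuming a parsed `response` object like the one above):

```js
// Sketch: extracting the top label from the sample response above.
// Assumes `response` is the parsed JSON object.
const labels = response.responses[0].labelAnnotations;
const top = labels[0]; // labels are sorted by confidence score
console.log(`${top.description} (${(top.score * 100).toFixed(1)}%)`);
// => dog (97.3%)
```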
But wait, what's Express?! 😱 Sneak peek at the short assignment coming out today: Express is a server-side JavaScript framework that adds another dimension to the layers of web development we've already learned. It's a lightweight framework that gives developers extra, built-in web application features.
In our case, the Express server is used to facilitate the client-server interaction between our app and the Google Cloud Vision API.
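If you've never seen Express before, a minimal server is only a few lines. Here's a bare-bones sketch (not part of our starter code, just an illustration):

```js
// A minimal Express server, just to show the shape of the framework
const express = require("express");
const app = express();

// Respond to GET requests at the root route
app.get("/", (req, res) => {
  res.send("Hello from Express!");
});

// Start listening for incoming requests on port 3000
app.listen(3000);
```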
Without further ado, let's get started!
- Don't forget to run `npm install`! In the `src` directory, create a file titled `index.js` and copy and paste the following code into the file:
```js
import React from 'react';
import ReactDOM from 'react-dom';
import './index.css';
import App from './App';

ReactDOM.render(<App />, document.getElementById('root'));
```
You should be a React expert by now, so you'll know that this file takes in an `App` component and renders the HTML element with the `#root` id using that component. We could put all of the React code into this file and the app would run just fine; however, as we've learned, it's better to keep the `App` component in a separate file and render it from there because it keeps the app more organized.
- Have `.eslintrc` errors? Try running `npm i eslint-config-airbnb`.
- Also in the `src` directory, create an `App.js` file and copy and paste the following code into the file:
```js
import React, { Component } from "react";
import "./App.css";

class App extends Component {
  constructor(props) {
    super(props);
    this.state = {
      image: "",
      imageT: "",
      identification: "",
      identificationT: ""
    };
  }

  render() {
    return (
      <div>
        Google Visiony things will be happening here soon
      </div>
    );
  }
}

export default App;
```
No further explanation needed here; at this point, you should be able to create a new React component in your sleep 😴. Now, open your terminal and run `npm start`. A new tab should open in your browser with the app. Ooooh, magic 💥 (jk, it's actually an `npm` package called `react-scripts`).
- Now, make a new directory called `server` in the root directory and, in it, a new file called `server.js`. This is where our mini Express server will go. Copy and paste the following code into the file:
```js
const express = require("express");
const app = express();
const bodyParser = require("body-parser");
const multer = require("multer");
const path = require("path");

// Store uploads in memory so we can hand the raw bytes to the API
const storage = multer.memoryStorage();
const upload = multer({ storage: storage });

const vision = require("@google-cloud/vision");
const client = new vision.ImageAnnotatorClient({
  keyFilename: "../key.json",
  projectId: "YOUR-PROJECT-ID"
});

app.use(bodyParser.json());

// Label detection: identify what's in the uploaded image
app.post("/upload", upload.single("myImage"), function(req, res) {
  client
    .webDetection(req.file.buffer)
    .then(results => {
      const webDetection = results[0].webDetection;
      console.log(webDetection.webEntities[0].description);
      res.json(webDetection.webEntities[0].description);
    })
    .catch(err => {
      console.error("ERROR:", err);
    });
});

// Text detection (OCR): pull the text out of the uploaded image
app.post("/uploadText", upload.single("myImage"), function(req, res) {
  client
    .textDetection(req.file.buffer)
    .then(results => {
      const textDetection = results[0].fullTextAnnotation;
      console.log(textDetection.text);
      res.json(textDetection.text);
    })
    .catch(err => {
      console.error("ERROR:", err);
    });
});

app.listen(3001);
```
Woah, that's a lot of code! Make sure you have the 🔑 `key.json` file in your root directory and that you replace `YOUR-PROJECT-ID` with the project id of the project you created earlier. You can find it in your `key.json` file under the `project_id` field.
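A quick way to print it without opening the file is to run this from the repo root: `node -e "console.log(require('./key.json').project_id)"`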
Now, let's understand what this code is doing.
This code creates an Express server with two route handlers, one for `POST` requests to `/upload` and one for `POST` requests to `/uploadText`. Each `POST` request uploads something called `myImage`, which will contain the data of the image you upload in the app later. The handler sends that image to the Google Vision API, parses the response JSON, returns the result, and handles any errors when necessary. To start this server, we'll run `node server.js` later, and the server will listen on port `3001` for incoming requests. There are two `POST` handlers because the app supports two different features: label detection and OCR.
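By the way, once the server is running (we'll start it in a later step), you can test a route straight from the command line without touching React at all. For example, with `curl` (the `dog.jpg` path is a placeholder for any image on your machine):

```bash
curl -F "myImage=@dog.jpg" http://localhost:3001/upload
```

The `-F` flag sends a multipart form upload with a field named `myImage`, which is exactly what `upload.single("myImage")` expects.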
- Now, go back to your `App.js` file and replace the code in the return statement of your render function with this:
```jsx
<div>
  <header>
    <h1>Google Cloud Vision Image Identifier</h1>
  </header>
  <div className="uploadForm">
    <h3>Upload Image</h3>
    <input type="file" />
    <button className="button" onClick={this.postImg}>Click to identify</button>
  </div>
  <h3 style={{ marginTop: 50 }}>{this.state.identification}</h3>
  <div className="uploadForm">
    <h3 style={{ marginTop: 50 }}>Upload Text Image</h3>
    <input type="file" />
    <button className="button" onClick={this.postImgText}>Click to identify</button>
  </div>
  <h3 style={{ marginTop: 50, marginBottom: 20 }}>{this.state.identificationT}</h3>
</div>
```
These are just some basic HTML elements that will let you upload photos. Nothing too complicated. If the changes aren't showing, refresh your browser tab.
- Above the render function, add the following functions:
```js
postImg = () => {
  const formData = new FormData();
  formData.append("myImage", this.state.image);
  fetch("/upload", {
    method: "POST",
    body: formData
  })
    .then(response => {
      if (response.status === 200) {
        return response.json();
      } else {
        console.log(response);
        return { error: "there was an error with response" };
      }
    })
    .then(response => {
      if (response.error) {
        console.log(response);
      } else {
        this.setState({ identification: response });
      }
    });
};

postImgText = () => {
  const formData = new FormData();
  formData.append("myImage", this.state.imageT);
  fetch("/uploadText", {
    method: "POST",
    body: formData
  })
    .then(response => {
      if (response.status === 200) {
        return response.json();
      } else {
        console.log(response);
        return { error: "there was an error with response" };
      }
    })
    .then(response => {
      if (response.error) {
        console.log(response);
      } else {
        this.setState({ identificationT: response });
      }
    });
};
```
Uh oh, more HTTP requests 😰. These functions actually aren't that complicated when you take another look at them. Can you guess what they do? `postImg` and `postImgText` are presumably called on some action; they take the currently uploaded image from state, convert it to an HTTP-request-friendly format (`FormData`), and send it to the server in the body of a `POST` request. They then handle the response, either updating the `identification` state or logging the error to the console.
- Now let's go back to the render function. You'll notice that the calls to change the state of `image` and `imageT` aren't defined. Can you figure out how to add them in?
  - Hint 1: when should the state change be called?
  - Hint 2: set the state of `image` and `imageT` to `e.target.files[0]`.
If you can't figure it out, scroll to the bottom of the README for the answer 😅.
Great, now you should have all the code you'll need to run the app!
Run `npm start` again in the root directory of your repo, or refresh your browser tab. This is what you should see:
Now, open a second tab in your terminal and `cd` into the `server` directory. Once there, run `node server.js`. This will run the server concurrently with your web app. Remember, the server handles all of your requests to the Google Vision API and needs to be running in order for your images to be processed correctly.
Navigate back to the browser tab and click on the first button. Upload any image (if you need an image, feel free to use the image of the dog we included earlier; it's in the `Rimg` directory in the repo). Then, click **Click to identify**. Voilà, after a few seconds, a tag will appear below the input fields, effectively identifying your image! This is the output you should see in the terminal running the Express server:
The words that show up are the labels the Google Vision API is assigning to your image. Play around with different images and the other button that handles images with text (OCR) to get a feel for the API.
We've already added two buttons for you: Label Detection and OCR. Add another button for face recognition and get it to work for additional bonus points! (For hints, look at the `server.js` code again and check out the Google Vision API source code for reference.)
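If you're not sure where to begin, the server side will look a lot like the two routes you already have. Here's a rough sketch to get you going (the `/uploadFace` route name is our own choice, and you'd still need a matching button and post function in `App.js`):

```js
// Sketch of a possible face detection route, modeled on the existing
// routes in server.js. The /uploadFace route name is our own choice.
app.post("/uploadFace", upload.single("myImage"), function(req, res) {
  client
    .faceDetection(req.file.buffer)
    .then(results => {
      const faces = results[0].faceAnnotations;
      // Return something simple: how many faces were found and the
      // joy likelihood of the first one
      res.json({ count: faces.length, joy: faces[0].joyLikelihood });
    })
    .catch(err => {
      console.error("ERROR:", err);
    });
});
```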
- React recap
- Implementing Google Cloud Vision API
- Intro to APIs
- Intro to Express
- The URL to your forked repo
- A screenshot of the face recognition button working, if you attempted the extra credit
- This tutorial was based on this repo created by brianhallerweb on GitHub.
<input type="file" onChange={e => this.setState({ image: e.target.files[0] })} />
<input type="file" onChange={e => this.setState({ imageT: e.target.files[0] })} />