The Cloud Vision API lets you understand the content of an image by encapsulating powerful machine learning models in a simple REST API. In this lab, we will send images to the Vision API and see it detect objects, faces, and landmarks.
Self-paced environment setup
If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project:
Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.
Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources. Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see the "cleanup" section at the end of this document). New users of Google Cloud Platform are eligible for a $300 free trial.
Enable the Cloud Vision API
Click on the menu icon in the top left of the screen.
Select APIs & services from the drop-down and click on Dashboard.
Click on Enable APIs and services. Then, search for "vision" in the search box. Click on Google Cloud Vision API:
Click Enable to enable the Cloud Vision API:
Wait a few seconds for it to enable. You will see this once it's enabled:
Activate Cloud Shell
Google Cloud Shell is a command line environment running in the Cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git, and others) and offers a persistent 5GB home directory. We'll use Cloud Shell to create our request to the Vision API.
To get started with Cloud Shell, click on the "Activate Google Cloud Shell" >_ icon in the top right-hand corner of the header bar.
A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.
Create an API Key
Since we'll be using curl to send a request to the Vision API, we'll need to generate an API key to pass in our request URL. To create an API key, navigate to the Credentials section of APIs & services in your Cloud console. Click Create credentials and, in the drop-down menu, select API key. Next, copy the key you just generated.
Now that you have an API key, save it to an environment variable to avoid having to insert the value of your API key in each request. You can do this in Cloud Shell. Be sure to replace <your_api_key> with the key you just copied.
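For example, in Cloud Shell (the variable name API_KEY matches the ${API_KEY} placeholder used in the curl commands later in this lab):
export API_KEY=<your_api_key>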
Upload an Image to a Cloud Storage bucket
Creating a Cloud Storage bucket
There are two ways to send an image to the Vision API for image detection: by sending the API a base64-encoded image string, or by passing it the URL of a file stored in Google Cloud Storage. We'll be using a Cloud Storage URL, so we'll create a Google Cloud Storage bucket to store our images.
Navigate to the Storage browser in the Cloud console for your project:
Then click Create bucket. Give your bucket a unique name (such as your Project ID) and click Create.
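If you prefer working from the command line, you can also create the bucket from Cloud Shell with gsutil. A minimal sketch, using the placeholder bucket name my-bucket-name that appears in the rest of this lab (replace it with your own unique name):
gsutil mb gs://my-bucket-name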
Upload an image to your bucket
Right-click on the following image of donuts, then click Save image as and save it to your Downloads folder as donuts.png.
Navigate to the bucket you just created in the storage browser and click Upload files. Then select donuts.png.
You should see the file in your bucket:
Next, edit the permissions on the image to grant read access to allUsers, so the Vision API can read it (the file will appear as public).
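One way to do that without leaving Cloud Shell is with gsutil; a sketch, again assuming the placeholder bucket name my-bucket-name:
gsutil acl ch -u AllUsers:R gs://my-bucket-name/donuts.png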
Create your Vision API request
In your Cloud Shell environment, create a request.json file with the code below, making sure to replace my-bucket-name with the name of the Cloud Storage bucket you created. You can either create the file using one of your preferred command line editors (nano, vim, emacs) or use the built-in editor in Cloud Shell (New File under the File menu):
request.json
{
"requests": [
{
"image": {
"source": {
"gcsImageUri": "gs://my-bucket-name/donuts.png"
}
},
"features": [
{
"type": "LABEL_DETECTION",
"maxResults": 10
}
]
}
]
}
The first Cloud Vision API feature we'll explore
is label detection. This method will return a list of labels (words) of what's
in your image.
Call the Vision API's label detection method
We're now ready to call the Vision API with curl:
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}
Your response should look something like the
following:
{
"responses": [
{
"labelAnnotations": [
{
"mid": "/m/01dk8s",
"description": "Powdered sugar",
"score": 0.9861496,
"topicality": 0.9861496
},
{
"mid": "/m/01wydv",
"description": "Beignet",
"score": 0.9565117,
"topicality": 0.9565117
},
{
"mid": "/m/02wbm",
"description": "Food",
"score": 0.9424965,
"topicality": 0.9424965
},
{
"mid": "/m/0hnyx",
"description": "Pastry",
"score": 0.8173416,
"topicality": 0.8173416
},
...
]
}
]
}
The API was able to identify the specific type of donuts these are (beignets), cool! For each label the Vision API found, it returns a description with the name of the item. It also returns a score, a number from 0 to 1 indicating how confident it is that the description matches what's in the image. The mid value maps to the item's mid in Google's Knowledge Graph. You can use the mid when calling the Knowledge Graph API to get more information on the item.
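As a sketch, assuming the Knowledge Graph Search API is enabled for your project and your API key is allowed to call it, you could look up the Beignet entity from the response above like this:
curl -s "https://kgsearch.googleapis.com/v1/entities:search?ids=/m/01wydv&key=${API_KEY}"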
Web Detection with the Vision API
In addition to getting labels on what's in our image, the Vision API can also search the Internet for additional details on our image. Through the API's webDetection method, we get a lot of interesting data back:
- A list of entities found in our image, based on content from pages with similar images
- URLs of exact and partial matching images found across the web, along with the URLs of those pages
- URLs of similar images, like doing a reverse image search
To try out web detection, we'll use the same image of beignets from above, so all we need to change is one line in our request.json file (you can also venture out into the unknown and use an entirely different image). Under the features list, just change the type from LABEL_DETECTION to WEB_DETECTION. request.json should now look like this:
request.json
{
"requests": [
{
"image": {
"source": {
"gcsImageUri": "gs://my-bucket-name/donuts.png"
}
},
"features": [
{
"type": "WEB_DETECTION",
"maxResults": 10
}
]
}
]
}
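If you'd rather make that one-line change without opening an editor, a quick sketch (assuming request.json still contains the LABEL_DETECTION line from the previous step):
sed -i 's/LABEL_DETECTION/WEB_DETECTION/' request.json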
To send it to the Vision API, you can use the
same curl command as before (just press the up arrow in Cloud Shell):
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}
Let's dive into the response, starting with webEntities. Here are some of the entities this image returned:
"webEntities": [
{
"entityId": "/m/01hyh_",
"score": 0.8859,
"description": "Machine learning"
},
{
"entityId": "/m/0z5n",
"score": 0.4393,
"description": "Application programming interface"
},
{
"entityId": "/m/07kg1sq",
"score": 0.3508,
"description": "Encapsulation"
},
{
"entityId": "/m/02y_9m3",
"score": 0.325,
"description": "Cloud computing"
},
{
"entityId": "/m/01xzx",
"score": 0.2685,
"description": "Computer vision"
},
{
"entityId": "/m/06j0j4",
"score": 0.2481,
"description": "OpenCV"
},
{
"entityId": "/m/0105pbj4",
"score": 0.2401,
"description": "Google Cloud Platform"
},
...
]
This image has been reused in many presentations on our Cloud ML
APIs, which is why the API found the entities "Machine learning",
"Cloud computing", and "Google Cloud Platform".
If we inspect the URLs under fullMatchingImages, partialMatchingImages, and pagesWithMatchingImages, we'll notice that many of the URLs point to this codelab site (super meta!).
Let's say we want to find other images of beignets, but not the exact same images. That's where the visuallySimilarImages part of the API response comes in handy. Here are a few of the visually similar images it found:
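As a sketch, assuming jq (which comes preinstalled in Cloud Shell), you can list those URLs straight from the response:
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY} | jq '.responses[0].webDetection.visuallySimilarImages[].url'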
We can navigate to those URLs to see the similar images:
Cool! And now you probably really want a beignet
(sorry). This is similar to searching by an image on Google
Images:
But with Cloud Vision, we can access this
functionality with an easy to use REST API and integrate it into our
applications.
Face and Landmark Detection with the Vision API
Next we'll explore the face and landmark detection methods of the Vision API. The face detection method returns data on faces found in an image, including the emotions of the faces and their location in the image. Landmark detection can identify common (and obscure) landmarks - it returns the name of the landmark, its latitude and longitude coordinates, and the location where the landmark was identified in an image.
Upload a new image
To use these two new methods, let's upload a new image with faces and landmarks to our Cloud Storage bucket. Right-click on the following image, then click Save image as and save it to your Downloads folder as selfie.png.
Then upload it to your Cloud Storage bucket the same way you did in the previous step, making sure to edit its permissions and grant read access to allUsers (the file will appear as public).
Updating our request
Next, we'll update our request.json file to include the URL of the new image, and to use face and landmark detection instead of label detection. Be sure to replace my-bucket-name with the name of your Cloud Storage bucket:
request.json
{
"requests": [
{
"image": {
"source": {
"gcsImageUri": "gs://my-bucket-name/selfie.png"
}
},
"features": [
{
"type": "FACE_DETECTION"
},
{
"type": "LANDMARK_DETECTION"
}
]
}
]
}
Calling the Vision API and parsing the response
Now you're ready to call the Vision API using the same curl
command you used above:
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}
Let's take a look at the faceAnnotations object in our response
first. You'll notice the API returns an object for each face found in the image
- in this case, three. Here's a clipped version of our response:
"faceAnnotations": [
{
"boundingPoly": {
"vertices": [
{
"x": 669,
"y": 324
},
...
]
},
"fdBoundingPoly": {
...
},
"landmarks": [
{
"type": "LEFT_EYE",
"position": {
"x": 692.05646,
"y": 372.95868,
"z": -0.00025268539
}
},
...
],
"rollAngle": 0.21619819,
"panAngle": -23.027969,
"tiltAngle": -1.5531756,
"detectionConfidence": 0.72354823,
"landmarkingConfidence": 0.20047489,
"joyLikelihood": "POSSIBLE",
"sorrowLikelihood": "VERY_UNLIKELY",
"angerLikelihood": "VERY_UNLIKELY",
"surpriseLikelihood": "VERY_UNLIKELY",
"underExposedLikelihood": "VERY_UNLIKELY",
"blurredLikelihood": "VERY_UNLIKELY",
"headwearLikelihood": "VERY_LIKELY"
},
...
]
The boundingPoly gives us the x,y coordinates around the face in
the image. fdBoundingPoly is a smaller box than boundingPoly, focusing on the
skin part of the face. landmarks is an array of objects for each facial feature
(some you may not have even known about!). This tells us the type of landmark,
along with the 3D position of that feature (x,y,z coordinates) where the z
coordinate is the depth. The remaining values give us more details on the face,
including the likelihood of joy, sorrow, anger, and surprise. The object above
is for the person furthest back in the image - you can see he's making kind of
a silly face which explains the joyLikelihood of POSSIBLE.
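If you just want a quick summary per face, one option (a sketch, again assuming jq) is to filter the response on the command line - for example, pulling the joy likelihood and detection confidence for each face:
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY} | jq '.responses[0].faceAnnotations[] | {joy: .joyLikelihood, confidence: .detectionConfidence}'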
Next let's look at the landmarkAnnotations part of our response:
"landmarkAnnotations": [
{
"mid": "/m/0c7zy",
"description": "Petra",
"score": 0.5403372,
"boundingPoly": {
"vertices": [
{
"x": 153,
"y": 64
},
...
]
},
"locations": [
{
"latLng": {
"latitude": 30.323975,
"longitude": 35.449361
}
}
]
}
]
Here, the Vision API was able to tell that this picture was taken
in Petra - this is pretty impressive given the visual clues in this image are
minimal. The values in this response should look similar to the
labelAnnotations response above.
We get the mid of the landmark, its name (description), along with a confidence score. boundingPoly shows the region in the image where the landmark was identified. The locations key tells us the latitude and longitude coordinates of this landmark.
Explore other Vision API methods
We've looked at the Vision API's label, face, and landmark detection methods, but there are others we haven't explored. Dive into the docs to learn about these four (a sample request that combines several of them is sketched after the list):
Logo detection: identify common logos and their location in an image.
Safe search detection: determine whether or not an image contains explicit content. This is useful for any application with user-generated content. You can filter images based on different factors: racy, adult, medical, violent, and spoof content.
Text detection: run OCR to extract text from images. This method can identify the language of text present in an image and even works on handwriting.
Object localization: identify multiple objects and their locations in an image.
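As a starting point, here is a sketch of a request.json that tries several of these methods at once on the same donuts.png image (assuming it is still in your bucket) - feature types can be combined in a single call, just like the face and landmark request above:
request.json
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://my-bucket-name/donuts.png"
        }
      },
      "features": [
        { "type": "LOGO_DETECTION" },
        { "type": "SAFE_SEARCH_DETECTION" },
        { "type": "TEXT_DETECTION" },
        { "type": "OBJECT_LOCALIZATION" }
      ]
    }
  ]
}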