Working With the Video Intelligence API: Detect Labels in a Video

You can use the Video Intelligence API to detect labels in a video. Labels describe the video based on its visual content.

Copy the following code into your IPython session:

from google.cloud import videointelligence
from google.cloud.videointelligence import enums, types


def detect_labels(video_uri, mode, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.LABEL_DETECTION]
    config = types.LabelDetectionConfig(label_detection_mode=mode)
    context = types.VideoContext(
        segments=segments,
        label_detection_config=config,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    return operation.result()

Take a moment to study the code and see how it uses the annotate_video client library method with the LABEL_DETECTION feature to analyze a video and detect labels.
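
The annotate_video call returns a long-running operation, and result() blocks until processing completes. As a purely illustrative variation (assuming the same video_client, features, and context defined inside detect_labels), you could bound the wait with an explicit timeout:

# Illustrative variation: wait at most 2 minutes for the operation to finish,
# raising an exception if the video hasn't been processed by then.
operation = video_client.annotate_video(
    input_uri=video_uri,
    features=features,
    video_context=context,
)
result = operation.result(timeout=120)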

Call the function to analyze the first 37 seconds of the video:

video_uri = 'gs://cloudmleap/video/next/JaneGoodall.mp4'
mode = enums.LabelDetectionMode.SHOT_MODE
segment = types.VideoSegment()
segment.start_time_offset.FromSeconds(0)
segment.end_time_offset.FromSeconds(37)

response = detect_labels(video_uri, mode, [segment])
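
If you would rather analyze the whole video instead of just the first 37 seconds, you can omit the segment list, since the segments parameter of detect_labels defaults to None (processing the full video takes longer):

# Variation: no segment restriction, so the entire video is analyzed.
full_response = detect_labels(video_uri, mode)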

Wait a moment for the video to be processed:

Processing video "gs://cloudmleap/video/next/JaneGoodall.mp4"...

Add this function to print out the labels at the video level:

def print_video_labels(response):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].segment_label_annotations
    sort_by_first_segment_confidence(labels)

    print(f' Video labels: {len(labels)} '.center(80, '-'))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        for segment in label.segments:
            confidence = segment.confidence
            start_ms = segment.segment.start_time_offset.ToMilliseconds()
            end_ms = segment.segment.end_time_offset.ToMilliseconds()
            print(f'{confidence:4.0%}',
                  f'{start_ms:>7,}',
                  f'{end_ms:>7,}',
                  f'{label.entity.description}{categories}',
                  sep=' | ')


def sort_by_first_segment_confidence(labels):
    labels.sort(key=lambda label: label.segments[0].confidence, reverse=True)


def category_entities_to_str(category_entities):
    if not category_entities:
        return ''
    entities = ', '.join([e.description for e in category_entities])
    return f' ({entities})'

Call the function:

print_video_labels(response)

You should see something like this:

------------------------------- Video labels: 10 -------------------------------
 96% |       0 |  36,960 | nature
 74% |       0 |  36,960 | vegetation
 59% |       0 |  36,960 | tree (plant)
 56% |       0 |  36,960 | forest (geographical feature)
 49% |       0 |  36,960 | leaf (plant)
 43% |       0 |  36,960 | flora (plant)
 38% |       0 |  36,960 | nature reserve (geographical feature)
 38% |       0 |  36,960 | woodland (forest)
 35% |       0 |  36,960 | water resources (water)
 32% |       0 |  36,960 | sunlight (light)
 
Thanks to these video-level labels, you can understand that the beginning of the video is mostly about nature and vegetation.
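
If you only want the most salient topics, one possible helper (hypothetical, not part of this codelab) is to filter the video-level labels by a confidence threshold:

def top_video_labels(response, min_confidence=0.5):
    # Hypothetical helper: keep only video-level labels whose first segment
    # confidence is at least min_confidence, sorted by decreasing confidence.
    labels = response.annotation_results[0].segment_label_annotations
    confident = [l for l in labels if l.segments[0].confidence >= min_confidence]
    return sorted(confident, key=lambda l: l.segments[0].confidence, reverse=True)


for label in top_video_labels(response):
    print(f'{label.segments[0].confidence:4.0%} | {label.entity.description}')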

Add this function to print out the labels at the shot level:


def print_shot_labels(response):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].shot_label_annotations
    sort_by_first_segment_start_and_reversed_confidence(labels)

    print(f' Shot labels: {len(labels)} '.center(80, '-'))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        print(f'{label.entity.description}{categories}')
        for segment in label.segments:
            confidence = segment.confidence
            start_ms = segment.segment.start_time_offset.ToMilliseconds()
            end_ms = segment.segment.end_time_offset.ToMilliseconds()
            print(f'  {confidence:4.0%}',
                  f'{start_ms:>7,}',
                  f'{end_ms:>7,}',
                  sep=' | ')


def sort_by_first_segment_start_and_reversed_confidence(labels):
    def first_segment_start_and_reversed_confidence(label):
        first_segment = label.segments[0]
        return (+first_segment.segment.start_time_offset.ToMilliseconds(),
                -first_segment.confidence)
    labels.sort(key=first_segment_start_and_reversed_confidence)

Call the function:

print_shot_labels(response)

You should see something like this:

------------------------------- Shot labels: 29 --------------------------------
planet (astronomical object)
   83% |       0 |  12,880
earth (planet)
   53% |       0 |  12,880
water resources (water)
   43% |       0 |  12,880
aerial photography (photography)
   43% |       0 |  12,880
vegetation
   32% |       0 |  12,880
   92% |  12,920 |  21,680
   83% |  21,720 |  27,880
   77% |  27,920 |  31,800
   76% |  31,840 |  34,720
...
butterfly (insect, animal)
   84% |  34,760 |  36,960
...

Thanks to these shot-level labels, you can understand that the video starts with a shot of a planet (likely Earth), that there's a butterfly in the shot spanning 34,760 to 36,960 ms, and so on.
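
Building on the same structure, a hypothetical helper could list every shot in which a given entity was detected, for example to find the butterfly shot programmatically:

def find_entity_shots(response, description):
    # Hypothetical helper: return (start_ms, end_ms, confidence) tuples for
    # every shot whose label matches the given entity description.
    labels = response.annotation_results[0].shot_label_annotations
    shots = []
    for label in labels:
        if label.entity.description == description:
            for segment in label.segments:
                shots.append((
                    segment.segment.start_time_offset.ToMilliseconds(),
                    segment.segment.end_time_offset.ToMilliseconds(),
                    segment.confidence,
                ))
    return shots


# For this video, the output should include something like (34760, 36960, 0.84).
print(find_entity_shots(response, 'butterfly'))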

Note: You can also request label detection at the frame level with FRAME_MODE (or SHOT_AND_FRAME_MODE for both shot-level and frame-level labels).
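
For instance, the detect_labels function defined earlier can be reused for frame-level analysis; the frame results then appear under frame_label_annotations (a sketch, assuming the same video_uri and segment as above):

# Sketch: request frame-level labels for the same segment.
frame_mode = enums.LabelDetectionMode.FRAME_MODE
frame_response = detect_labels(video_uri, frame_mode, [segment])
frame_labels = frame_response.annotation_results[0].frame_label_annotations
print(f'{len(frame_labels)} frame-level labels detected')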


