Working With the Video Intelligence API: Detect Labels in a Video
You can use the Video Intelligence API to detect labels in a video. Labels describe the video based on its visual content.
Copy the following code into your IPython session:
from google.cloud import videointelligence
from google.cloud.videointelligence import enums, types


def detect_labels(video_uri, mode, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.LABEL_DETECTION]
    config = types.LabelDetectionConfig(label_detection_mode=mode)
    context = types.VideoContext(
        segments=segments,
        label_detection_config=config,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    return operation.result()
Take a moment to study the code and see how it uses the annotate_video client library method with the LABEL_DETECTION feature to analyze a video and detect labels.
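One detail worth noting: operation.result() blocks until the analysis completes. If you would rather cap the wait, the long-running operation returned by annotate_video accepts a timeout in seconds. Here is a minimal variant (the detect_labels_with_timeout name and the 300-second default are illustrative, not part of the codelab; it assumes the imports above):

import concurrent.futures

def detect_labels_with_timeout(video_uri, mode, segments=None, timeout=300):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.LABEL_DETECTION]
    config = types.LabelDetectionConfig(label_detection_mode=mode)
    context = types.VideoContext(
        segments=segments,
        label_detection_config=config,
    )
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    try:
        # result() polls the long-running operation; it raises
        # concurrent.futures.TimeoutError if processing exceeds `timeout` seconds
        return operation.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        print(f'Processing "{video_uri}" did not finish within {timeout}s')
        raise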
Call the function to analyze the first 37 seconds of the video:
video_uri = 'gs://cloudmleap/video/next/JaneGoodall.mp4'
mode = enums.LabelDetectionMode.SHOT_MODE
segment = types.VideoSegment()
segment.start_time_offset.FromSeconds(0)
segment.end_time_offset.FromSeconds(37)

response = detect_labels(video_uri, mode, [segment])

Wait a moment for the video to be processed:

Processing video "gs://cloudmleap/video/next/JaneGoodall.mp4"...

Add this function to print out the labels at the video level:

def print_video_labels(response):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].segment_label_annotations
    sort_by_first_segment_confidence(labels)

    print(f' Video labels: {len(labels)} '.center(80, '-'))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        for segment in label.segments:
            confidence = segment.confidence
            start_ms = segment.segment.start_time_offset.ToMilliseconds()
            end_ms = segment.segment.end_time_offset.ToMilliseconds()
            print(f'{confidence:4.0%}',
                  f'{start_ms:>7,}',
                  f'{end_ms:>7,}',
                  f'{label.entity.description}{categories}',
                  sep=' | ')


def sort_by_first_segment_confidence(labels):
    labels.sort(key=lambda label: label.segments[0].confidence, reverse=True)


def category_entities_to_str(category_entities):
    if not category_entities:
        return ''
    entities = ', '.join([e.description for e in category_entities])
    return f' ({entities})'
Call the function:
print_video_labels(response)
You should see something like this:
------------------------------- Video labels: 10 -------------------------------
96% | 0 | 36,960 | nature
74% | 0 | 36,960 | vegetation
59% | 0 | 36,960 | tree (plant)
56% | 0 | 36,960 | forest (geographical feature)
49% | 0 | 36,960 | leaf (plant)
43% | 0 | 36,960 | flora (plant)
38% | 0 | 36,960 | nature reserve (geographical feature)
38% | 0 | 36,960 | woodland (forest)
35% | 0 | 36,960 | water resources (water)
32% | 0 | 36,960 | sunlight (light)
Thanks to these video-level labels, you can understand that the beginning of the video is mostly about nature and vegetation.
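Beyond printing, you can consume these results programmatically. As a small sketch (reusing the response from above; the confident_video_labels helper and the 0.5 threshold are arbitrary choices, not part of the codelab), here is how you might keep only the video-level labels detected with at least 50% confidence:

def confident_video_labels(response, min_confidence=0.5):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].segment_label_annotations
    return [
        label.entity.description
        for label in labels
        if any(s.confidence >= min_confidence for s in label.segments)
    ]

print(confident_video_labels(response))
# With the sample output above: ['nature', 'vegetation', 'tree', 'forest']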
Add this function to print out the labels at the shot level:
def print_shot_labels(response):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].shot_label_annotations
    sort_by_first_segment_start_and_reversed_confidence(labels)

    print(f' Shot labels: {len(labels)} '.center(80, '-'))
    for label in labels:
        categories = category_entities_to_str(label.category_entities)
        print(f'{label.entity.description}{categories}')
        for segment in label.segments:
            confidence = segment.confidence
            start_ms = segment.segment.start_time_offset.ToMilliseconds()
            end_ms = segment.segment.end_time_offset.ToMilliseconds()
            print(f'  {confidence:4.0%}',
                  f'{start_ms:>7,}',
                  f'{end_ms:>7,}',
                  sep=' | ')


def sort_by_first_segment_start_and_reversed_confidence(labels):
    def first_segment_start_and_reversed_confidence(label):
        first_segment = label.segments[0]
        return (+first_segment.segment.start_time_offset.ToMilliseconds(),
                -first_segment.confidence)
    labels.sort(key=first_segment_start_and_reversed_confidence)
Call the function:
print_shot_labels(response)
You should see something like this:
------------------------------- Shot labels: 29 --------------------------------
planet (astronomical object)
83% | 0 | 12,880
earth (planet)
53% | 0 | 12,880
water resources (water)
43% | 0 | 12,880
aerial photography (photography)
43% | 0 | 12,880
vegetation
32% | 0 | 12,880
92% | 12,920 | 21,680
83% | 21,720 | 27,880
77% | 27,920 | 31,800
76% | 31,840 | 34,720
...
butterfly (insect, animal)
84% | 34,760 | 36,960
...
Thanks to these shot-level labels, you can understand that the video starts with a shot of a planet (likely Earth), that there's a butterfly in the 34,760..36,960 ms shot, and so on.
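You can also query these shot-level labels directly, for instance to locate every shot in which a given entity was detected. This is a sketch reusing the response from above (the shots_with_entity helper is illustrative, not part of the codelab):

def shots_with_entity(response, description):
    # First result only, as a single video is processed
    labels = response.annotation_results[0].shot_label_annotations
    for label in labels:
        if label.entity.description == description:
            # Return (start_ms, end_ms, confidence) for each matching shot
            return [
                (segment.segment.start_time_offset.ToMilliseconds(),
                 segment.segment.end_time_offset.ToMilliseconds(),
                 segment.confidence)
                for segment in label.segments
            ]
    return []

print(shots_with_entity(response, 'butterfly'))
# With the sample output above: one shot from 34,760 to 36,960 ms at ~84% confidence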
Note: You can also request label detection at the frame level with the FRAME_MODE mode (or SHOT_AND_FRAME_MODE for both).
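As a sketch of what frame mode looks like (reusing the detect_labels function and segment from above; in the v1 API, frame-level results land in frame_label_annotations, and each frame carries a single time_offset instead of a segment):

frame_mode = enums.LabelDetectionMode.FRAME_MODE
frame_response = detect_labels(video_uri, frame_mode, [segment])

# First result only, as a single video is processed
frame_labels = frame_response.annotation_results[0].frame_label_annotations
for label in frame_labels:
    for frame in label.frames:
        time_ms = frame.time_offset.ToMilliseconds()
        print(f'{frame.confidence:4.0%} | {time_ms:>7,} | {label.entity.description}')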