Book: Building an Integrated Computer-Vision System, by Chong Lip Phang

Building an Integrated Computer-Vision System

Detection, Segmentation, Keypoints, Tracking, Class & Part Hierarchies, Scene Graphs, 3D Primitives & Voxels, and Incremental Learning - Served Live to Mobile Browsers

Language: English
Binding: Paperback
Availability: Awaiting stock
Expected in stock: 30 June 2026

Book information

Language: English
Binding: Book - Paperback
Published: 2026
Pages: 380
EAN: 9798184414348
Enbook ID: 53026442
Weight: 459 g
Dimensions: 152 x 229 x 24 mm

Full description

Most computer-vision tutorials hand you a demo: a box drawn around a face, a label slapped on a cat, and then nothing. The moment you ask a second question - Is that car moving? What is it near? Where is it in 3D? - the demo falls silent.

This book builds a system instead of a demo. From a single camera frame, you'll produce a structured, queryable description of the entire scene - a world model you can interrogate like a small database, in which each object keeps its identity across frames and carries its parts, its properties, its place on the ground, and its relationships to everything else.
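
To make "interrogate like a small database" a little more concrete, here is a minimal sketch of what such a world model might look like. It is not taken from the book or its companion repository: the names `SceneObject`, `WorldModel`, `upsert`, `relate`, `query`, and `related` are all hypothetical, and the fields simply mirror the items the description lists (identity, parts, properties, ground position, relationships).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SceneObject:
    """One tracked object (hypothetical structure, for illustration only)."""
    track_id: int                                     # stable identity across frames
    label: str                                        # class name, e.g. "car"
    properties: Dict[str, str] = field(default_factory=dict)
    parts: List[str] = field(default_factory=list)
    ground_xy: Optional[Tuple[float, float]] = None   # position on the ground plane


@dataclass
class WorldModel:
    """Objects plus (subject, predicate, object) relations between them."""
    objects: Dict[int, SceneObject] = field(default_factory=dict)
    relations: Set[Tuple[int, str, int]] = field(default_factory=set)

    def upsert(self, obj: SceneObject) -> None:
        # Re-using a track_id overwrites the entry, so identity is preserved.
        self.objects[obj.track_id] = obj

    def relate(self, subject_id: int, predicate: str, object_id: int) -> None:
        self.relations.add((subject_id, predicate, object_id))

    def query(self, label: Optional[str] = None, **props: str) -> List[SceneObject]:
        # Return objects matching the label (if given) and every property filter.
        hits = []
        for obj in self.objects.values():
            if label is not None and obj.label != label:
                continue
            if all(obj.properties.get(k) == v for k, v in props.items()):
                hits.append(obj)
        return hits

    def related(self, subject_id: int, predicate: str) -> List[SceneObject]:
        # Follow relations of one kind outward from a given object.
        return [
            self.objects[o]
            for s, p, o in sorted(self.relations)
            if s == subject_id and p == predicate and o in self.objects
        ]


world = WorldModel()
world.upsert(SceneObject(1, "car", {"state": "moving"}, ["wheel", "door"], (2.0, 5.5)))
world.upsert(SceneObject(2, "person", {"state": "standing"}))
world.relate(1, "near", 2)

moving_cars = world.query(label="car", state="moving")   # "Is that car moving?"
neighbours = world.related(1, "near")                    # "What is it near?"
```

The two calls at the end correspond to the follow-up questions the description says a demo cannot answer; a real system would populate the model from detector and tracker output rather than by hand.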

Working in Python with today's open-source stack, you'll wire together more than a dozen capabilities into one coherent pipeline:

  • Detection, segmentation, keypoints, and tracking with the latest end-to-end, NMS-free YOLO26
  • Open-vocabulary detection with YOLOE - find anything you can name, no retraining, from a 4,585-class vocabulary
  • Class and part hierarchies, properties, states, and a queryable scene graph
  • Ground-plane estimation, geometric primitives, monocular depth, and voxel reconstruction to lift flat pixels into 3D - including a trainable 6-DoF pose head
  • Facial recognition, OCR, and gait as second-stage, box-confined capabilities
  • A two-phase cascade that stays broad at the top and sharp at the bottom
  • A browser-based annotation studio, custom-model training, incremental and active learning
  • A real FastAPI inference server, live video streaming, and the profiling, packaging, securing, and scaling needed to ship it
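
The "two-phase cascade" and "second-stage, box-confined capabilities" in the list above can be pictured roughly as follows. This is a sketch under stated assumptions, not the book's implementation: `run_cascade`, `crop`, and the stub detector and specialist are hypothetical stand-ins, and a frame is modelled as a plain list of pixel rows so the example runs without any vision library.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), assumed pixel coordinates
Frame = List[List[int]]           # toy frame: rows of pixel values


def crop(frame: Frame, box: Box) -> Frame:
    """Return only the pixels inside the box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def run_cascade(
    frame: Frame,
    detector: Callable[[Frame], List[dict]],
    specialists: Dict[str, Callable[[Frame], dict]],
) -> List[dict]:
    """Phase 1: a broad detector over the whole frame.
    Phase 2: a specialist, if one is registered for the label, over the crop only."""
    results = []
    for det in detector(frame):
        enriched = dict(det)
        specialist = specialists.get(det["label"])
        if specialist is not None:
            enriched["details"] = specialist(crop(frame, det["box"]))
        results.append(enriched)
    return results


# Hypothetical stubs standing in for real models.
def stub_detector(frame: Frame) -> List[dict]:
    return [
        {"label": "person", "box": (0, 0, 2, 2)},
        {"label": "car", "box": (1, 1, 4, 3)},
    ]


def stub_face_stage(patch: Frame) -> dict:
    # A real second stage might run face recognition; this only reports crop size.
    return {"patch_size": (len(patch), len(patch[0]))}


frame = [[0] * 4 for _ in range(4)]
results = run_cascade(frame, stub_detector, {"person": stub_face_stage})
```

Confining the second stage to the box keeps the expensive specialist off the rest of the frame, which is one plausible reading of "broad at the top and sharp at the bottom".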

Every chapter is grounded in real, runnable open-source code, not pseudocode - with a companion repository, original diagrams, and an architecture designed so you can add a new capability or teach a new class after deployment, without forgetting what the system already knew.

Who it's for: developers, ML engineers, robotics and AR builders, and serious learners who are done running other people's demos and want to construct a perception system of their own - one frame, one world model, fully under your control.