First, I want to thank the Barbados Lands and Surveys Department for this incredible real-world challenge, Zindi Africa for hosting the competition, and all the participants who made it such an exciting contest.
My approach combined Vision-Language Models (VLMs) with deep learning segmentation to extract both geometries and metadata from analog survey plans. The key innovation was using VLMs not just for OCR, but as reasoning engines to solve spatial alignment problems.
Geometry Alignment: The training data already contains the land parcels' shapes in geographic coordinates, but they need to be aligned to pixel space. I used Qwen3-VL-30B, providing it with both the full survey plan map and an image of the geo-coordinate polygon shape; the model intelligently found the corresponding pixel locations of the parcel boundaries.
Model: Unet++ with EfficientNet-B5 encoder
Novel Approach: Surveyor bias model - learned per-surveyor embeddings to capture individual naming conventions and geometry patterns
Inference: 8-way TTA with IoU-based ensemble selection (picked polygons with highest consistency across augmentations)
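To make the selection step concrete, here is a minimal sketch of 8-way TTA with IoU-based consistency selection (the dihedral augmentation set, the 0.5 threshold, and the `predict_fn` interface are illustrative assumptions, not the exact competition code):

```python
import numpy as np

def dihedral_transforms(img):
    """Generate the 8 dihedral variants (4 rotations x optional flip)."""
    variants = []
    for k in range(4):
        rot = np.rot90(img, k)
        variants.append((rot, (k, False)))
        variants.append((np.fliplr(rot), (k, True)))
    return variants

def invert_transform(mask, params):
    """Map a predicted mask back to the original orientation."""
    k, flipped = params
    if flipped:
        mask = np.fliplr(mask)
    return np.rot90(mask, -k)

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def tta_select(image, predict_fn, thresh=0.5):
    """8-way TTA: predict on each augmented view, undo the augmentation,
    then keep the mask with the highest mean IoU against the other seven."""
    masks = []
    for aug, params in dihedral_transforms(image):
        pred = predict_fn(aug) > thresh
        masks.append(invert_transform(pred, params))
    scores = [np.mean([iou(m, o) for j, o in enumerate(masks) if j != i])
              for i, m in enumerate(masks)]
    return masks[int(np.argmax(scores))]
```

The idea is that a polygon which survives all eight augmentations largely unchanged is more trustworthy than one that flickers between views, so we keep the single most consistent mask instead of averaging probabilities.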
Model: Fine-tuned Qwen3-VL-8B using Unsloth LoRA
Image Patchification: Split each image into 7 overlapping crops (1024×1024) to improve VLM focus
Automated Label Correction: Used VLM to audit and fix noisy training labels—the model takes in the raw labels and image, then returns corrected full names, addresses, and other metadata details before fine-tuning
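For the patchification step, a generic overlapping tiler looks roughly like this (the 1024×1024 crop size matches the write-up, but the exact 7-crop layout depends on each plan's resolution, so the stride and edge handling here are assumptions):

```python
import numpy as np

def overlapping_crops(image, size=1024, stride=512):
    """Split an image into overlapping square crops.
    Returns (crop, (y, x)) pairs so extracted text can be mapped back."""
    h, w = image.shape[:2]
    ys = list(range(0, max(h - size, 0) + 1, stride))
    xs = list(range(0, max(w - size, 0) + 1, stride))
    # make sure the right and bottom edges are always covered
    if ys[-1] != max(h - size, 0):
        ys.append(max(h - size, 0))
    if xs[-1] != max(w - size, 0):
        xs.append(max(w - size, 0))
    return [(image[y:y + size, x:x + size], (y, x)) for y in ys for x in xs]
```

Each crop carries its (y, x) offset, so anything the VLM reads from a crop can be mapped back to full-image coordinates.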
• 8-way TTA with IoU-based ensemble selection (not simple averaging)
• Patchifying images into 7 overlapping crops focused VLM attention on relevant regions
• Douglas-Peucker smoothing on extracted polygons reduced noise while preserving shape
• Mixed precision training (16-bit) for efficient 2048×2048 segmentation
• Surveyor bias embeddings captured domain-specific patterns (middle name conventions, preferred land mappings...)
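For anyone unfamiliar with Douglas-Peucker, here is a minimal pure-Python version of the classic recursive algorithm (an illustrative sketch; in practice a library routine such as Shapely's `simplify`, which implements the same algorithm, is the easier choice):

```python
def douglas_peucker(points, epsilon):
    """Simplify a polyline: keep the endpoints, recursively keep any
    point farther than `epsilon` from the chord, drop the rest."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # perpendicular distance of each interior point to the chord
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
             for x, y in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > epsilon:
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

For example, `douglas_peucker([(0, 0), (2, 0.05), (4, 0)], epsilon=0.1)` drops the nearly-collinear middle vertex, which is exactly how jitter from pixel-level mask tracing gets cleaned off the extracted parcel polygons.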
Git Code Repo - If you found this solution helpful, please consider giving it a ⭐ on GitHub!
Wow. Congratulations @Brainiac this is incredible. Thanks for sharing🙏
Thank you @21db
@Brainiac just released his result. Tears of Joy😭.
I have never in my life heard of Douglas-Peucker.
And for this technique:
How did you go about this? I am finding it confusing and really difficult to understand, because I am like "But how?" 😂😂
Honestly, Congratulations once again big man! Your name indeed depicts the win!
@CodeJoe Thank you! I really appreciate the congratulations! 😄
For the geometry alignment part - Qwen3-VL (and other vision-language models) can return pixel coordinates/bounding boxes of objects in images.
How It Works:
You provide two inputs to the model:
• The full cadastral survey plan image
• An image of the parcel's polygon rendered from its geographic coordinates
The model can then reason and visually match the shape from the geo coords to the corresponding shape on the cadastral map, extracting the exact pixel boundaries or corners of the parcel of land.
The geographic coordinates already define the exact land parcel we're interested in - just in a different coordinate system. By providing both the cadastral map and the geo shape visualization, Qwen can infer the exact pixel locations through visual correspondence.
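In practice the request can be as simple as one chat message carrying both images plus an instruction to return pixel coordinates. This sketch uses an OpenAI-compatible payload; the model name, file paths, and requested JSON output format are placeholders, and the exact prompt and serving stack will differ:

```python
import json

# Hypothetical prompt; the coordinate format you ask for is up to you.
prompt = (
    "Image 1 is a scanned cadastral survey plan. Image 2 shows the outline "
    "of one land parcel, drawn from its geographic coordinates. Find the "
    "matching parcel on the survey plan and return the pixel coordinates "
    'of its corner points as JSON: {"corners": [[x, y], ...]}.'
)

# Hypothetical request body in the OpenAI-compatible chat format.
payload = {
    "model": "Qwen3-VL-30B",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "file://plan.png"}},
            {"type": "image_url",
             "image_url": {"url": "file://parcel_outline.png"}},
            {"type": "text", "text": prompt},
        ],
    }],
}
body = json.dumps(payload)
```

The model's reply is then parsed for the corner list, which gives the pixel-space polygon without any manual labeling.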
Wow, so there wasn't any need to label. Just Wow. I am astonished!
Yeah, Qwen VLMs are quite powerful
Thank You SO MUCH for sharing! Truly grateful big man