Scaling Data Quality With Computer Vision For Spatial Data

Recent advances in natural language technology, such as generative capabilities for understanding and rendering natural language on demand, have become prominent in many modern artificial intelligence applications. Nevertheless, computer vision applications involving object detection, image recognition, and related capabilities are becoming more and more important to enterprises, and equally vital to the millions of consumers who rely every day on the billions of spatial mapping data points this technology supports on mobile devices.

Granted, the advanced machine learning algorithms that support this use case are implemented on the backend and are not directly accessed by digital map users. However, they promote data quality at scale, as evidenced by the Overture Maps Foundation, a provider of digital mapping data based on interoperable open standards, which added nearly 1 billion buildings to the rapidly growing, worldwide collection of buildings in its latest digital map dataset.

In December 2023, Overture increased the number of buildings mapped in its dataset to over 2 billion, driven in part by the addition of Google’s Open Buildings. According to Marc Prioleau, executive director of the Overture Maps Foundation, integrating building footprints of this magnitude across multiple sources “involved machine learning.”

Achieving this goal required deduplicating entities, a quintessential data quality issue. Powered by machine learning, Overture Maps took buildings detected from satellite imagery, disambiguated and deduplicated them, ranked the results, augmented them with existing building collections, and made the result publicly available through open standards.
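As a rough illustration of that workflow, and emphatically not Overture’s actual code, the following Python sketch uses trivial stubs for each stage; every function name, field name, and threshold is a hypothetical stand-in.

```python
# Hypothetical sketch of the workflow described above. Each function is a
# trivial stub standing in for machinery Overture has not published.

def detect_buildings(tiles):
    """Computer vision stage: pull candidate buildings out of imagery tiles."""
    return [building for tile in tiles for building in tile]

def deduplicate(buildings):
    """Collapse entities reported by multiple sources. Stub: treat records
    at (nearly) the same coordinates as one building, keeping the first."""
    seen = {}
    for b in buildings:
        seen.setdefault((round(b["lon"], 4), round(b["lat"], 4)), b)
    return list(seen.values())

def rank(buildings):
    """Order results by a per-source quality score (higher is better)."""
    return sorted(buildings, key=lambda b: b["quality"], reverse=True)

def build_release(satellite_tiles, existing_collections):
    detected = detect_buildings(satellite_tiles)
    merged = deduplicate(detected + existing_collections)
    return rank(merged)  # published downstream via open standards

tiles = [[{"lon": -122.41940, "lat": 37.77490, "quality": 0.9}]]
existing = [{"lon": -122.41941, "lat": 37.77492, "quality": 0.7}]
print(build_release(tiles, existing))  # one building survives deduplication
```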

Object detection and image recognition

A significant portion of the buildings in Overture Maps’ latest digital mapping dataset were identified through computer vision applied to satellite imagery. The buildings sourced from Google’s Open Buildings were detected and recognized with this technology. Another building supplier, Microsoft Building Footprints, took a similar approach. “Microsoft had all the satellite imagery,” Prioleau pointed out. “They applied artificial intelligence to it. Artificial intelligence looks at the pixels in the image and determines that’s a road, that’s a field, those pixels are buildings.”
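Prioleau’s pixel-by-pixel description maps onto semantic segmentation followed by footprint extraction. The sketch below is a toy stand-in, not Microsoft’s or Google’s pipeline: a stub brightness heuristic plays the role of a trained neural network, and SciPy’s connected-component labeling groups building pixels into candidate footprints.

```python
import numpy as np
from scipy import ndimage

# Illustrative land-cover classes; real pipelines learn many more.
ROAD, FIELD, BUILDING = 0, 1, 2

def classify_pixels(image: np.ndarray) -> np.ndarray:
    """Stand-in for a trained segmentation model that assigns each pixel
    a class. A toy brightness heuristic replaces the neural network here."""
    classes = np.full(image.shape, FIELD, dtype=np.int8)
    classes[image > 0.4] = ROAD
    classes[image > 0.7] = BUILDING
    return classes

def extract_footprints(classes: np.ndarray) -> list:
    """Group contiguous building pixels into candidate footprints
    (bounding boxes) via connected-component labeling."""
    labeled, _count = ndimage.label(classes == BUILDING)
    return ndimage.find_objects(labeled)

tile = np.random.rand(256, 256)  # placeholder for a satellite image tile
print(len(extract_footprints(classify_pixels(tile))), "candidate footprints")
```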

These machine learning applications must detect objects and recognize them as the sort of entities Prioleau enumerated: roads, fields, and buildings. Other data sources in Overture Maps’ latest dataset include building maps made available by governments and maps “crowdsourced” by individuals. “Machine learning and artificial intelligence automatically created the footprints of the buildings,” Prioleau said, referring to the buildings derived from satellite imagery owned by Microsoft and Google.

Deduplication and data currency

Implementing data quality across these and other sources is essential for several reasons. Most obviously, some of the buildings from these sources can be the same, so deduplication is required. In other instances, data, particularly data contributed by individuals mapping their neighborhoods, may simply be unreliable. Data currency is another factor, since buildings and objects may have changed since they were last mapped. “So what we did was take all these sources, merge them, and then deduplicate them,” Prioleau explained. “Because it turns out that you mapped buildings into the city’s database and Microsoft captured that as well. So we have to look at them and think about who we can trust the most.”

Computer vision is essential for determining which building entities overlap. “The building footprint looks like a small box,” Prioleau commented. “If a building is rectangular, it looks like a small square. So if you have such data in all four datasets, you’ll get different squares that overlap. It’s not an exact match, but the algorithm looks at that and identifies that all four representations of the building are the same building.”
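A standard way to score that kind of overlap, though the article does not name Overture’s exact algorithm, is intersection-over-union (IoU) on the footprints’ bounding boxes: 1.0 means identical squares, 0.0 means disjoint, and a threshold in between decides “same building.” A minimal sketch, with an illustrative threshold:

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned building footprint: (min_x, min_y, max_x, max_y)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union: 1.0 = identical footprints, 0.0 = disjoint."""
    ix = max(0.0, min(a.max_x, b.max_x) - max(a.min_x, b.min_x))
    iy = max(0.0, min(a.max_y, b.max_y) - max(a.min_y, b.min_y))
    inter = ix * iy
    area_a = (a.max_x - a.min_x) * (a.max_y - a.min_y)
    area_b = (b.max_x - b.min_x) * (b.max_y - b.min_y)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Slightly offset squares from two sources: overlapping but not exact.
footprint_a = Box(0.0, 0.0, 10.0, 10.0)
footprint_b = Box(0.5, 0.3, 10.4, 10.2)
MATCH_THRESHOLD = 0.5  # tuned per dataset in practice; illustrative here
print(iou(footprint_a, footprint_b) > MATCH_THRESHOLD)  # True -> same building
```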

Ranking and more

The deduplication step is subject to what Prioleau called probabilistic calculations that determine whether particular footprints depict the same building. In that case, or any other case in which different sources map the same building, Overture Maps must select the best or most accurate representation. This, too, is a matter of data quality. “We found that we trust crowdsourcing first, the government second, Google third, and Microsoft fourth,” Prioleau commented. “That’s what we prioritized. It’s just based on common indicators of data quality.”
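That priority order is straightforward to express in code. The sketch below hard-codes Prioleau’s stated trust ranking and keeps the most trusted record among a set of duplicates; the record shape and field names are hypothetical.

```python
# Source trust order per Prioleau's quote; lower index = more trusted.
TRUST_ORDER = ["crowdsourced", "government", "google", "microsoft"]

def pick_best(duplicates: list[dict]) -> dict:
    """Given footprint records judged to be the same building,
    keep the one from the most trusted source."""
    return min(duplicates, key=lambda rec: TRUST_ORDER.index(rec["source"]))

same_building = [
    {"source": "microsoft", "id": "m-17"},
    {"source": "government", "id": "g-04"},
]
print(pick_best(same_building)["id"])  # g-04: government outranks Microsoft
```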

Still, there remained the question of which individual buildings would actually be published via Overture Maps, which came down to ranking the redundant results. “Once we determine that these buildings are all the same building, we select the building that we determine is of the highest quality, highest rank,” Prioleau said. “Then we put them all together into one building and assign it a stable identifier.”
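The merge-and-assign step might look like the sketch below. The article does not detail Overture’s identifier scheme; deriving a deterministic UUID from rounded coordinates is purely an illustrative way to keep the identifier stable across pipeline runs, and all field names are assumptions.

```python
import uuid

def merge_duplicates(duplicates: list[dict]) -> dict:
    """Collapse records judged to be one building into a single entry with a
    stable identifier. Assumes the list is pre-sorted by source trust, as in
    the previous sketch; field names and the ID scheme are illustrative."""
    best = duplicates[0]
    # Deterministic ID from rounded coordinates stays stable across runs.
    key = f"building:{best['lon']:.5f}:{best['lat']:.5f}"
    return {
        **best,
        "stable_id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
        "merged_from": [rec["id"] for rec in duplicates],
    }

merged = merge_duplicates([
    {"id": "g-04", "source": "government", "lon": -122.41940, "lat": 37.77495},
    {"id": "m-17", "source": "microsoft",  "lon": -122.41942, "lat": 37.77493},
])
print(merged["stable_id"])  # same UUID every run for these coordinates
```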

Continuous development

These days, there’s no shortage of headlines detailing the great advances made in natural language technology. Nevertheless, computer vision remains a highly viable aspect of advanced machine learning for enterprises. Its usefulness for data quality is evident from Overture Maps’ use case, and the technology could similarly benefit other aspects of the ever-changing data ecosystem.

About the author

Jelani Harper is an editorial consultant serving the information technology market. He specializes in data-driven applications with a focus on semantic technologies, data governance, and analytics.

Sign up for the free insideBIGDATA Newsletter.

Join us on Twitter: https://twitter.com/InsideBigData1

Join us on LinkedIn: https://www.linkedin.com/company/insidebigdata/

Join us on Facebook: https://www.facebook.com/insideBIGDATANOW