A scalable data warehousing and querying system

VisionHub is the system that lets GroupM analyze how its agencies’ advertising campaigns perform in the real world, and helps it provide ROI figures to advertisers. By aggregating the information that different AdServers supply in different formats, and building a data warehouse and a querying system on top of it, we’re able to give the business the insight it needs quickly and easily. The business can take action faster than ever and stay on top of the market at all times.


Crafting the platform

GroupM had a production system with an aging architecture and a convoluted processing pipeline. An error in a single file could halt file ingestion for the whole system, and any modification was cumbersome and risky. The available way of querying the information was static and didn’t provide the required level of detail.
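
The failure mode described above, where one bad file stops everything, is commonly avoided by ingesting each file in isolation and setting failures aside. The following is a minimal sketch of that idea, not VisionHub's actual code: `IngestionReport`, `ingest_all`, and `demo_ingest` are hypothetical names, and the "parser" is a stand-in that simply rejects files whose name contains `corrupt`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class IngestionReport:
    """Outcome of one ingestion run (hypothetical structure)."""
    loaded: List[str] = field(default_factory=list)
    quarantined: Dict[str, str] = field(default_factory=dict)  # file name -> error


def ingest_all(files: Iterable[str], ingest_one: Callable[[str], None]) -> IngestionReport:
    """Ingest each file independently so a failure affects only that file."""
    report = IngestionReport()
    for name in files:
        try:
            ingest_one(name)
        except Exception as exc:  # broad on purpose: isolate any per-file failure
            report.quarantined[name] = str(exc)
        else:
            report.loaded.append(name)
    return report


def demo_ingest(name: str) -> None:
    """Stand-in for a real parser: rejects files marked as corrupt."""
    if "corrupt" in name:
        raise ValueError(f"cannot parse {name}")


report = ingest_all(["a.csv", "corrupt.csv", "b.csv"], demo_ingest)
print(report.loaded)       # files that were ingested
print(report.quarantined)  # files set aside, with the reason
```

In this sketch the failing file is recorded for later inspection while the remaining files continue to load.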

VisionHub leverages the power of the Azure platform: an Azure Worker role downloads files into Azure Blob Storage, where they are ingested into a Data Warehouse that also lives in Azure Blob Storage. All query operations are managed from an Azure Website UI, which relays those queries to one or more HDInsight clusters. This ensures the architecture can be easily scaled to suit the business needs at any time.

Project Insights

VisionHub uses HDInsight in two ways. Permanent clusters ingest data into the system as it becomes available and execute simple data extraction queries. A cluster-on-demand system lets a user run costly data processing queries that work with vast amounts of data.
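
The split between permanent and on-demand clusters can be pictured as a simple routing decision. The sketch below is an illustration under stated assumptions, not the real dispatcher: the source does not say how VisionHub decides where a query runs, so the size estimate (`estimated_input_gb`), the threshold value, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ClusterKind(Enum):
    PERMANENT = "permanent"  # always running: ingestion and simple extractions
    ON_DEMAND = "on_demand"  # created for a costly job, then released


@dataclass(frozen=True)
class QueryJob:
    name: str
    estimated_input_gb: float  # hypothetical estimate of the data the job reads


# Hypothetical cut-off; a real system would tune this or use other signals.
ON_DEMAND_THRESHOLD_GB = 500.0


def choose_cluster(job: QueryJob, threshold_gb: float = ON_DEMAND_THRESHOLD_GB) -> ClusterKind:
    """Send costly jobs to an on-demand cluster and everything else to a permanent one."""
    if job.estimated_input_gb > threshold_gb:
        return ClusterKind.ON_DEMAND
    return ClusterKind.PERMANENT


small = QueryJob("daily extract", estimated_input_gb=2.0)
large = QueryJob("multi-year summary", estimated_input_gb=40_000.0)
print(choose_cluster(small).value)
print(choose_cluster(large).value)
```

Keeping the decision in one small function makes the policy easy to change without touching the code that submits queries.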

Storing all the data in Azure Blob Storage lets VisionHub keep years of data at a very low cost, and also provides replication and easy access for ingestion and processing with HDInsight.


The data ingestion system can be easily modified to suit the changing needs of the business: new rows, schema changes, etc.


The current Data Warehouse holds more than 150 TB.


Users can run simple data extraction jobs, or massive queries that summarize the information from millions of rows into only a few.
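
To illustrate what "summarizing millions of rows into only a few" means, here is a small group-and-sum sketch in plain Python. It shows only the shape of the reduction; it is not the query language or engine VisionHub uses, and the column names (`campaign`, `impressions`, `clicks`) and sample values are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (campaign, impressions, clicks): hypothetical columns for illustration
Row = Tuple[str, int, int]


def summarize(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    """Collapse many detail rows into one total row per campaign."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for campaign, impressions, clicks in rows:
        totals[campaign]["impressions"] += impressions
        totals[campaign]["clicks"] += clicks
    return dict(totals)


rows = [
    ("spring", 1000, 12),
    ("spring", 500, 3),
    ("summer", 2000, 40),
]
summary = summarize(rows)
for campaign, total in summary.items():
    print(campaign, total["impressions"], total["clicks"])
```

The function accepts any iterable, so the same logic applies whether the input is three rows or a stream of millions; the output size depends only on the number of distinct campaigns.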


The UI allows the user to execute complex queries by selecting a few key values.