Technology and data integration

Building an enterprise data lake and pipeline

Context

Clear Channel Outdoor is one of the world’s leading Out-of-Home media owners. Across their large, diverse Out-of-Home portfolio of half a million sites in 22 countries throughout Europe, Asia, the USA and Latin America, they boost brands by connecting them with the people they want to reach, with media and ideas that enlighten, entertain, charm, challenge and influence. In the UK alone, they operate more than 35,000 sites nationwide, from Inverness in Scotland to Truro in Cornwall and in every major urban area in between.

 

Challenge

Clear Channel has ambitious targets and compliance standards: to make internal data available to the entire organisation to monitor digital campaign execution, enable the Sales and Operations teams to gain insight through data analytics, ensure full transparency and accountability for digital outdoor campaigns, and improve transparency of the brand’s performance to advertisers.

The organisation had typically implemented local, independent, often manual solutions to achieve business goals. This fragmented approach resulted in a large number of siloed data ponds. A new data strategy was therefore launched, based on an on-premises environment that aimed to centralise in-house data around a third-party product. Subsequently, after canvassing the different data-related needs across countries, departments and core services groups, the need for a new technology capability was identified.

The capability needed to provide visibility and timely access to campaign data and allow frictionless introduction of new datasets. Additionally, a new data governance and operating model was required to support the sharing of data across the organisation and with external parties – Buyers, Agencies, Specialists.

 

Solution

The Riverflex team’s solution was to create a data ecosystem – data lake and data pipeline – to provide the Sales and Operations teams access to near real-time campaign performance data. We envisioned a cloud-based platform that would deliver business-relevant insight leveraging multiple data types and scalable compute power. The platform includes: a data pipeline that ingests and normalises data into the central data lake repository, discovery through a data catalogue, data access methods that support disparate functional needs, and scalable cloud compute capability for applications and analytics. Azure was chosen as the cloud provider. Riverflex drove the process to build and support the Azure technology foundation, security implementation and continuous integration/continuous delivery (CI/CD) pipeline, and to create the application stack supporting the data lake and ETL ingestion pipeline.

The 10 key elements of the cloud infrastructure, data and compute platform are described below.

 

1. Cloud-based Account / Project Structure

An appropriate root account / sub-account / project structure was implemented to realise gains in productivity, innovation and cost reduction as workloads moved to the Azure cloud. Azure provides a variety of services and features that allow flexible control of cloud computing resources and of the Azure AD account(s) managing those resources. At the account level, these options are designed to support proper cost allocation, agility and security. Projects were mapped one-to-one to sub-accounts. A security relationship between sub-accounts was a key addition, used to assess the security of cloud-based deployments, centralise security monitoring and management, manage identity and access, and provide audit and compliance monitoring services.
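To illustrate the one-to-one project-to-sub-account mapping, here is a minimal Python sketch. The project names, subscription IDs and the idea of a dedicated security subscription are hypothetical placeholders, not Clear Channel’s actual structure.

```python
# Minimal sketch of a one-to-one mapping between projects and sub-accounts
# (Azure subscriptions). All names and IDs are hypothetical placeholders.

PROJECT_SUBSCRIPTIONS = {
    "data-lake-prod":    "00000000-0000-0000-0000-000000000001",
    "data-pipeline-dev": "00000000-0000-0000-0000-000000000002",
    "analytics-sandbox": "00000000-0000-0000-0000-000000000003",
}

# A dedicated security sub-account centralises monitoring, identity and audit.
SECURITY_SUBSCRIPTION = "00000000-0000-0000-0000-0000000000ff"


def subscription_for(project: str) -> str:
    """Resolve the single subscription that owns a given project."""
    return PROJECT_SUBSCRIPTIONS[project]


# Each project maps to exactly one subscription, and no subscription is shared.
assert len(set(PROJECT_SUBSCRIPTIONS.values())) == len(PROJECT_SUBSCRIPTIONS)
```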

 

2. Project-based Implementation with Infrastructure as Code (IaC)

Infrastructure as Code (IaC) was adopted as a method to provision and manage IT infrastructure through source code, rather than through standard operating procedures and manual processes. IaC helps the DevOps team automate infrastructure deployment in a repeatable, consistent manner. It also makes it easy to deploy standard infrastructure environments in other regions where the cloud provider operates, so they can be used for backup and disaster recovery.
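As a rough sketch of the idea – the specific IaC tooling and resource names used on the project are not shown here, and the details below are illustrative assumptions – infrastructure can be declared and repeated across regions from code using the Azure SDK for Python:

```python
# Sketch: provisioning a resource group from code with the Azure SDK for Python.
# Subscription ID, resource group names and regions are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
subscription_id = "00000000-0000-0000-0000-000000000000"

client = ResourceManagementClient(credential, subscription_id)

# Declaring the environment in code makes the deployment repeatable across
# regions, e.g. a secondary region for backup and disaster recovery.
for region in ("westeurope", "northeurope"):
    client.resource_groups.create_or_update(
        f"rg-datalake-{region}",
        {"location": region, "tags": {"project": "data-lake", "env": "prod"}},
    )
```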

 

3. Serverless Compute and Storage

By employing serverless cloud compute and storage, such as Blob and object stores, the organisation can build and run applications and services with elastic scalability and without managing physical hardware. In addition, most of the existing costs associated with managing servers and containers (operating system updates, maintenance updates, image snapshots, backups, restarts, etc.) disappeared.
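For illustration, a short sketch of landing raw data in Blob storage with the Azure SDK for Python – the storage account, container, blob path and payload below are hypothetical:

```python
# Sketch: landing a raw payload in Blob storage without managing any servers.
# Account URL, container and blob path are hypothetical placeholders.
import json
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://exampledatalake.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

payload = {"campaign_id": "CC-1234", "site": "london-kx-01", "plays": 412}

container = service.get_container_client("raw")
container.upload_blob(
    name="campaigns/2021/05/01/CC-1234.json",
    data=json.dumps(payload),
    overwrite=True,  # keeps re-runs of the ingestion job idempotent
)
```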

 

4. ETL and the Data Pipeline

The data pipeline acts as a utility – a standard suite of data tools that enables the DevOps team to automate the sourcing, processing and entitlement of data. Automating these processes allows new data sources to be added quickly; data is then extracted, transformed, combined, validated and loaded (ETL) into the cloud data lake for further use. The pipeline can process multiple data sources simultaneously.
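A minimal sketch of one extract-transform-load step as described above, using pandas. The file paths, column names and normalisation rules are hypothetical, not the project’s actual pipeline code:

```python
# Sketch of a single ETL step: extract a source file, normalise it, and load it
# into the lake as Parquet. Paths and column names are hypothetical placeholders.
import pandas as pd


def run_etl(source_path: str, lake_path: str) -> None:
    # Extract: read the raw export from a source system.
    raw = pd.read_csv(source_path)

    # Transform: normalise column names and types into the lake's conventions.
    df = raw.rename(columns=str.lower)
    df["played_at"] = pd.to_datetime(df["played_at"], utc=True)
    df = df.dropna(subset=["campaign_id"])  # basic validation

    # Load: write the normalised dataset to the central data lake.
    df.to_parquet(lake_path, index=False)


run_etl("exports/uk_campaign_plays.csv", "lake/curated/campaign_plays.parquet")
```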

 

5. Enterprise Data Lake

The introduction of an enterprise data lake provided a central data repository and access to analytics tools that maximise the value of the data. The enterprise data lake is a centralised repository that allows structured and unstructured data to be stored at any scale. Data can be stored as-is, without having to structure it first, and used to run different types of analytics – from dashboards and visualisations to big data processing, real-time analytics and machine learning – to guide better decisions.

 

6. Data Catalog and Discoverability

The introduction of a data catalogue provided a single searchable glossary of the data available to the organisation, including each dataset’s source, definition and entitlements. Built on top of the data lake, the catalogue allows users to find the data they need and use it in the tools they prefer, while ensuring information boundaries and data contracts are not violated.
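As a rough illustration of what a catalogue entry carries, here is a simplified Python sketch. The field names, datasets and entitlement model are assumptions made for illustration, not the actual catalogue schema:

```python
# Simplified sketch of a catalogue entry and entitlement-aware search.
# Field names, datasets and the entitlement model are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    name: str          # dataset name shown to users
    source: str        # originating system
    definition: str    # business definition / glossary text
    entitlements: set  # groups allowed to read the dataset


CATALOGUE = [
    CatalogueEntry("campaign_plays", "player telemetry",
                   "Plays per digital frame per campaign", {"sales", "ops"}),
    CatalogueEntry("site_inventory", "asset register",
                   "All UK sites and their attributes", {"ops"}),
]


def search(term: str, user_groups: set) -> list:
    """Return only the datasets the user is both searching for and entitled to."""
    return [e for e in CATALOGUE
            if term.lower() in (e.name + " " + e.definition).lower()
            and e.entitlements & user_groups]


print([e.name for e in search("campaign", {"sales"})])  # ['campaign_plays']
```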

 

7. Application Programming Interface

An Application Programming Interface (API) is a set of software routines that allow programs to interact. The use of an API at the organisation allowed data from the enterprise data lake to be accessed by the applications that rely on it. Additionally, the API gave end users (both internal and external) access to data for their individual analytics and modelling needs. External parties were able to integrate the APIs with their in-house sales systems to improve programmatic buying and the monitoring of campaign delivery.
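A brief sketch of how a consumer might call such an API – the endpoint URL, query parameters, response shape and token handling below are hypothetical, not the real API contract:

```python
# Sketch of an external consumer pulling campaign delivery data over an API.
# The endpoint URL, query parameters, auth scheme and fields are hypothetical.
import requests

BASE_URL = "https://api.example-dataplatform.com/v1"
TOKEN = "..."  # obtained from the platform's auth flow

response = requests.get(
    f"{BASE_URL}/campaigns/CC-1234/delivery",
    params={"from": "2021-05-01", "to": "2021-05-31"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()

for record in response.json()["records"]:
    print(record["site"], record["plays"], record["impressions"])
```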

 

8. Analyse and Visualise Data

Through self-service discovery of data resident in the enterprise data lake via the data catalogue, individuals can access data based on their entitlements. For low-technology use cases, end users can load datasets into Excel or tools such as Power BI; alternatively, the API allows data to be integrated with programs written in Python, Scala, Java, R, etc. For heavy big-data workloads, data engineers can use Databricks clusters and Azure Synapse to run analytics at scale.
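For the heavier workloads mentioned above, a short PySpark sketch of the kind of aggregation an engineer might run on a Databricks cluster – the lake path and column names are hypothetical placeholders:

```python
# Sketch: aggregating campaign delivery across the lake on a Spark cluster.
# The lake path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("campaign-delivery").getOrCreate()

plays = spark.read.parquet(
    "abfss://curated@exampledatalake.dfs.core.windows.net/campaign_plays/"
)

summary = (plays
           .filter(F.col("played_at") >= "2021-05-01")
           .groupBy("campaign_id")
           .agg(F.sum("plays").alias("total_plays"),
                F.countDistinct("site").alias("sites_reached")))

summary.show(20, truncate=False)
```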

 

9. Sandbox for Experimentation

To support innovation, the data lake includes a sandbox environment that provides the functionality of the enterprise data lake but allows one to easily introduce new datasets and technologies for experimentation.

 

10. Flexibility of Architecture

Unlike traditional technical approaches to data warehousing, which are inflexible in terms of data schemas, technical capabilities and tools, the cloud-based approach allows flexibility on all of these fronts. Fit-for-purpose cloud-based data tools and technologies can be incorporated with relative ease as needs are identified.

 

Impact

The creation of an enterprise data lake had substantial benefits. Historically, the organisation’s data was distributed across multiple business teams and systems. Providing access to this data required point-to-point solutions, and significant time was spent preparing and reconciling data by the many teams who used it. Additionally, teams were often unaware of data that already existed within the organisation. The enterprise data lake enabled the organisation’s Sales and Operations teams to capitalise on the value in data by bringing together internal and external datasets in a single place, in near real-time, and by eliminating redundant reconciliations through shared use of the same datasets. This value continues to grow as new datasets are added. Easier access to a broad set of data empowers users to innovate in the way the organisation sells and manages its digital campaign delivery and overall performance.

As a direct result of Riverflex’s work, Clear Channel has now been independently audited by the professional audit firm PwC. The audit covered Clear Channel’s delivery and reporting of selected UK digital campaigns over a three-month period, meeting advertisers’ requirement for transparency in digital media. It forms part of Clear Channel’s ongoing commitment to ensuring full transparency and accountability for their digital outdoor campaigns.

By leveraging the data catalogue, internal users can now easily discover, access and perform analytics on hundreds of datasets. After discovery, analysts can access data directly via the API in their own environment, or they can leverage sophisticated, scalable cloud-based analytics software, such as Spark-based services, to run complex algorithms against multiple datasets. This significantly accelerates the analytics process and increases its use in managing digital campaign performance.

With the implementation of the Continuous Integration/Continuous Delivery (CI/CD) approach to cloud development and deployment, the organisation can deliver new software features in hours or days instead of months. Smaller code changes are simpler (more atomic) and have fewer unintended consequences. Upgrades introduce smaller units of change and are less disruptive. The products improve rapidly through fast feature introduction and fast turn-around on feature changes. Greater end-user involvement and feedback during continuous development leads to substantial usability improvements. Users can add new requirements based on customers’ needs on a daily basis.

 
