Machine Learning Based Security for DC Comics

4 million log messages processed a day

10,000 suspicious requests identified

DC Entertainment is home to the iconic brands DC Comics, Vertigo, and MAD Magazine. Along with chronicling the adventures of Superman, Batman, Green Lantern, Wonder Woman, The Flash, and many more, DC houses the creative division charged with strategically integrating its content across Warner Bros. Entertainment and Time Warner.

Since our initial engagement in 2012 for a multi-site Drupal installation, DC Entertainment and its management team at Warner Bros. have relied on Metal Toad to develop and continually improve upon their entire web ecosystem.

 

Business Problem: Identifying Malicious Traffic

DC Entertainment serves up to 2 million requests per day, and during events such as San Diego Comic-Con that number can be up to four times higher. This extra traffic can degrade site performance. To make matters worse, the DC Entertainment site is routinely crawled by third parties probing for security vulnerabilities or hunting for leaks ahead of announcements. Separating malicious requests from legitimate traffic at this volume is a monumental task at the best of times, and an impossible one in anything close to real time.

 

Suspicious Activity Example

 

Technology Solution: ML Log Evaluation

Metal Toad started by building a data pipeline that replicated the manual process Metal Toad had been following for years. An Amazon SNS notification triggered an AWS Lambda function every time CloudFront stored a log file in S3. The Lambda function parsed the log and removed the unneeded columns.
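The parsing step inside such a Lambda function might look roughly like the sketch below. It assumes CloudFront's standard (W3C-style) access logs, which are gzip-compressed, name their tab-separated columns in a `#Fields:` header, and use field names such as `c-ip` and `cs-uri-stem`. The `KEEP` tuple and both function names are hypothetical, and the SNS/S3 plumbing is left out; this is a sketch, not Metal Toad's actual code.

```python
import gzip

# Hypothetical choice of columns to retain; the original column list is not stated.
KEEP = ("date", "time", "c-ip", "cs-method", "cs-uri-stem", "cs-uri-query", "sc-status")


def parse_cloudfront_log(text, keep=KEEP):
    """Parse a CloudFront standard log and keep only the `keep` columns.

    Column names are read from the '#Fields:' header, so positions are
    not hard-coded.
    """
    fields = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip '#Version' and any blank lines
        if fields is None:
            raise ValueError("log has no #Fields header")
        record = dict(zip(fields, line.split("\t")))
        rows.append({name: record.get(name) for name in keep})
    return rows


def parse_gzipped_log(data: bytes, keep=KEEP):
    """Decompress a gzip-compressed log file (as read from S3) and parse it."""
    return parse_cloudfront_log(gzip.decompress(data).decode("utf-8"), keep)
```

Reading the header rather than hard-coding positions keeps the parser working if the log format gains columns.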

Next, Metal Toad data scientists evaluated the available data, candidate algorithms, and the features the model would need. They quickly identified IP Insights, a built-in Amazon SageMaker algorithm, as a good fit. Several days of logs were parsed and split into training and test sets.
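IP Insights trains on header-less CSV data with an entity identifier in the first column and an IPv4 address in the second. The sketch below turns parsed log records into such pairs and splits them by day. Using the URL path (`cs-uri-stem`) as the entity is an assumption drawn from the "URL IPv4 pairs" description, holding out the most recent day is one reasonable split rather than the documented one, and the function names are hypothetical.

```python
import csv
import io
import ipaddress


def is_ipv4(value):
    """True if `value` parses as an IPv4 address."""
    try:
        return isinstance(ipaddress.ip_address(value), ipaddress.IPv4Address)
    except ValueError:
        return False


def split_by_day(records, test_days=1):
    """Split (URL path, client IP) pairs into train/test sets by log date.

    The most recent `test_days` days are held out for testing. Dates are
    'YYYY-MM-DD' strings, so lexical order is chronological. Non-IPv4
    clients are dropped because IP Insights scores IPv4 addresses.
    """
    if test_days < 1:
        raise ValueError("test_days must be at least 1")
    days = sorted({r["date"] for r in records})
    test_dates = set(days[-test_days:])
    train, test = [], []
    for r in records:
        if not is_ipv4(r["c-ip"]):
            continue
        pair = (r["cs-uri-stem"], r["c-ip"])
        (test if r["date"] in test_dates else train).append(pair)
    return train, test


def to_csv(pairs):
    """Serialize pairs as header-less two-column CSV: entity, IPv4 address."""
    buf = io.StringIO()
    csv.writer(buf).writerows(pairs)
    return buf.getvalue()
```

Holding out the newest day mimics the production situation, where the model scores traffic it has never seen.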

Once the model was trained, we deployed it to a SageMaker endpoint and updated the Lambda function to send each log file to the endpoint and store the results in DynamoDB for evaluation.
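A rough sketch of that scoring step follows. The endpoint call and the DynamoDB write are injected as plain callables (a hypothetical interface) so the logic runs without AWS. In a real Lambda, `invoke` could wrap boto3's `sagemaker-runtime` `invoke_endpoint(EndpointName=..., ContentType="text/csv", Body=...)` and `put_item` could wrap a DynamoDB `Table.put_item(Item=...)`. The response shape, a `predictions` list with one `dot_product` per input pair, is our understanding of IP Insights' JSON output and should be checked against the SageMaker documentation. The item keys `log_key` and `row` are hypothetical. Large log files would likely need to be sent in chunks to stay under the endpoint's request-size limit.

```python
import csv
import io
import json
from decimal import Decimal


def score_and_store(pairs, invoke, put_item, log_key):
    """Score (entity, ip) pairs on the endpoint and store one item per pair.

    invoke(body: str) -> str : sends CSV to the endpoint, returns its JSON body
    put_item(item: dict)     : writes one item to the results table
    Returns the number of items written.
    """
    buf = io.StringIO()
    csv.writer(buf).writerows(pairs)
    predictions = json.loads(invoke(buf.getvalue()))["predictions"]
    if len(predictions) != len(pairs):
        raise ValueError("endpoint returned a different number of scores")
    for i, ((entity, ip), pred) in enumerate(zip(pairs, predictions)):
        put_item({
            "log_key": log_key,  # hypothetical partition key: source log file
            "row": i,            # hypothetical sort key: position in the batch
            "entity": entity,
            "ip": ip,
            # boto3's DynamoDB layer rejects Python floats, so store a Decimal.
            "score": Decimal(str(pred["dot_product"])),
        })
    return len(predictions)
```

Injecting the two callables also makes the function easy to test with fakes, as a stand-in for the endpoint and table.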

 

 

IP Insights Algorithm
Learning to score URL IPv4 pairs
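The core idea behind IP Insights can be illustrated with a toy: each entity (here a URL) and each IP address is represented by a learned vector, and a pair's score is the dot product of the two. The sketch assumes, consistent with how IP Insights is trained to separate observed pairs from randomly generated ones, that typical pairs receive high scores and unusual pairs low ones. The embedding values, the sigmoid squashing, and the bottom-fraction flagging rule are illustrative assumptions; this is not SageMaker's implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def compatibility(entity_vecs, ip_vecs, entity, ip):
    """Toy score: dot product of the entity and IP embeddings, mapped to 0..1."""
    dot = sum(a * b for a, b in zip(entity_vecs[entity], ip_vecs[ip]))
    return sigmoid(dot)


def flag_lowest(scored_pairs, fraction=0.01):
    """Return the pairs whose scores fall in the bottom `fraction` (at least one)."""
    if not scored_pairs:
        return []
    ranked = sorted(scored_pairs, key=lambda item: item[1])
    k = max(1, int(len(ranked) * fraction))
    return [pair for pair, _ in ranked[:k]]
```

Because the sigmoid is monotonic, ranking by the squashed score and ranking by the raw dot product flag the same pairs.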

 

Impact: Quickly identified new threats

The ML log-monitoring solution quickly surfaced two groups of IP addresses for further evaluation.

The first group was obviously malicious from its query parameters alone, and could probably have been caught by better tuning of the WAF rules on CloudFront.

The second group was more interesting. The data scientist initially suspected these might be false positives, but deeper analysis of the IP addresses turned up a few gems, including a few white-hat scanning companies.

With this new data, Metal Toad could block problematic IP addresses and help keep DC Entertainment secure for another Comic-Con.

 
