
Decoupling Security Data with Snowflake

Written By
Patrick Davis

It’s 2023, and somehow we humans have still managed to avoid being overtaken by our technological creations. We have, however, created a monster of a different sort: an ever-growing mountain of trivial metadata and non-trivial personal information. In today’s digital landscape, we face a constant uphill battle against hidden enemies who pose real, tangible threats to that data and the resources around it. Security teams are pressed daily to find new ways to protect data from would-be intruders while bogged down by legacy systems and methodologies for security event analysis that cannot handle the sheer volume, velocity, or complexity of the security data they must ingest and process.

Our data systems are constantly bombarded with attempts to gain unauthorized access through brute force attacks, known but unpatched vulnerabilities, zero-day vulnerabilities, and simple misconfigurations made by application administrators (human error). The Open Worldwide Application Security Project (OWASP) places “Broken Access Control,” “Security Misconfiguration,” and “Security Logging and Monitoring Failures” within its Top 10 Web Application Security Risks. Logging and monitoring have been a common thread through many iterations of the Top 10 list, so it’s apparent that our legacy solutions are not keeping pace with the growing capabilities of our hidden enemies.

I know that sounds bleak, but there’s hope on the horizon. By building a security data lake on a data platform like Snowflake, you can use near-infinitely scalable compute and storage capacity to change the story. With Snowflake’s ecosystem, you can ingest security data in any format and store it together. Further, you can use its scripting and automation capabilities, with languages like Java and Python, to usher in a new era of security operations where the good guys are ahead for once. And finally, you can use the Snowflake platform for all your organization’s data, providing a rich source of context beyond what security events alone can provide.

The Data Challenge

In the past, data resided in on-premises data centers where cybersecurity teams were responsible for all aspects of protection and had every means of control, from physical to virtual access. In that world, SIEM and legacy cybersecurity tools SOAR’ed. But this is 2023, and the sources of security information are no longer “here at home” but everywhere. Today, there are logs from Tokyo, events from Paris, and accounting trails tracking logins in Virginia. In short, there’s so much security data from all of our AWS, GCP, and Azure environments, not to mention SaaS solutions like Microsoft 365 and Google Workspace, that the old licensing model and SIEM systems just don’t cut it.

You’re constantly trading off the data you need to analyze quickly against the data you can afford to retain. Forklifting a legacy system into a cloud environment doesn’t resolve that problem. Often you’re faced with higher-than-expected storage costs because you must provision more than you need, not to mention the compute costs associated with moving those systems into the cloud. To top it off, you’re generally limited to syslogs in one format or another and must rely on that same shared compute to normalize all the incoming logs. Between the inefficiency of the SIEM computing environment and the licensing cost, the current state of affairs is unsustainable in the long run.

Enter the Security Data Lake

You need a solution that can take logs in whatever format they arrive and store them alongside logs of every other format with no issues. A security data lake, like the one you can build in Snowflake, does exactly that. A security data lake is a centralized repository for ingesting and managing logging or other data sources relevant to an organization’s security posture. The data is normalized and transformed into a usable format when you’re ready to query it or when there’s some actionable intelligence, rather than at ingestion. The rapidly scalable nature of Snowflake’s platform allows for elastic compute, elastic object storage, and usage-based costs instead of wasteful spending on dedicated excess capacity. And instead of disparate sources of security information, it provides a single point of access to all of your security and contextual data. Oh yeah, now you have contextual data instead of just security event logs.

Contextual data is any data “that provides context to an event, person, or item.” Snowflake’s data lake architecture makes real-time and post-event analysis easier and AI-driven analysis possible. With contextual data alongside your security data, you can apply AI/ML to surface only the events that deserve attention. With sufficient training, a predictive AI model can even help you shift automated processes away from reactive measures toward predictive, proactive ones.
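
To make the "store it as it arrives, normalize it when you query" idea concrete, here is a minimal Python sketch. It does not use Snowflake at all: the in-memory list stands in for the lake, and every sample record, field name, and helper function is invented for illustration. The point is only that three differently shaped log formats can sit side by side and be mapped onto one common schema at read time.

```python
import json
import re
from datetime import datetime

# Raw events are kept exactly as they arrived, tagged only with their format.
# All records and field names here are hypothetical sample data.
RAW_EVENTS = [
    ("json", '{"eventTime": "2023-05-01T12:00:00Z", "user": "alice", '
             '"outcome": "Failure", "sourceIp": "203.0.113.7"}'),
    ("syslog", "2023-05-01T12:00:05Z host1 sshd[411]: "
               "Failed password for bob from 198.51.100.4 port 2200 ssh2"),
    ("kv", "ts=2023-05-01T12:01:00Z user=carol outcome=success src=192.0.2.10"),
]

# Matches only the failed-password lines of this invented syslog layout.
SSH_FAILED = re.compile(
    r"^(?P<ts>\S+) \S+ sshd\[\d+\]: "
    r"Failed password for (?P<user>\S+) from (?P<ip>\S+)"
)


def parse_ts(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize(fmt, payload):
    """Map one raw record onto a common schema; return None if unrecognized."""
    if fmt == "json":
        doc = json.loads(payload)
        return {"time": parse_ts(doc["eventTime"]), "user": doc["user"],
                "success": doc["outcome"] == "Success",
                "source_ip": doc["sourceIp"]}
    if fmt == "syslog":
        match = SSH_FAILED.match(payload)
        if match is None:
            return None
        return {"time": parse_ts(match["ts"]), "user": match["user"],
                "success": False, "source_ip": match["ip"]}
    if fmt == "kv":
        fields = dict(pair.split("=", 1) for pair in payload.split())
        return {"time": parse_ts(fields["ts"]), "user": fields["user"],
                "success": fields["outcome"] == "success",
                "source_ip": fields["src"]}
    return None


def failed_logins(raw_events):
    """Normalize lazily at query time and keep only failed logins."""
    for fmt, payload in raw_events:
        event = normalize(fmt, payload)
        if event is not None and not event["success"]:
            yield event


FAILED = list(failed_logins(RAW_EVENTS))
for event in FAILED:
    print(event["time"].isoformat(), event["user"], event["source_ip"])
```

Because nothing is transformed at ingestion, adding a new log source in this sketch means adding one more branch to `normalize`, not re-processing what is already stored.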

 

Data is stored as soon as it arrives, whether it is structured, semi-structured, or unstructured, and can then be processed quickly by scripts written in supported programming languages. With Snowflake’s native scripting capabilities, you can quickly and efficiently perform ETL operations on incoming data. By embracing a security data lake, your organization can break down the barriers between all of your security tools and between your security data and the rest of the organization’s data.

You can incorporate contextual data and leverage AI to detect patterns that otherwise would not be apparent in security events alone. Colocating this data unlocks the potential to have the whole picture of your security posture across the organization. With a security data lake built on Snowflake’s platform, you can improve operational efficiency, remove contextual and security data barriers, and realize cost savings. It’s time to leverage ALL your data to give you an edge against threats and bad actors.
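
As a rough illustration of why colocated contextual data matters, the Python sketch below joins login events against a small table of account context. Everything in it is hypothetical: the event fields, the `USER_CONTEXT` table (imagine it sourced from HR or identity data), and the two hand-written rules, which are a simple stand-in for the pattern detection an AI/ML model would do. Note that most of the flagged logins succeeded and would look unremarkable in the security events alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    user: str
    country: str
    success: bool


# Hypothetical contextual data that lives outside the security logs.
USER_CONTEXT = {
    "alice": {"status": "active", "usual_countries": {"US"}},
    "bob": {"status": "terminated", "usual_countries": {"US", "FR"}},
}

# Hypothetical security events.
EVENTS = [
    LoginEvent("alice", "US", True),
    LoginEvent("alice", "JP", True),
    LoginEvent("bob", "FR", True),
    LoginEvent("mallory", "US", False),
]


def reasons_to_review(event, context):
    """Return the reasons an event deserves a second look (empty if none)."""
    profile = context.get(event.user)
    if profile is None:
        return ["unknown account"]
    reasons = []
    if profile["status"] != "active":
        reasons.append("account is " + profile["status"])
    if event.country not in profile["usual_countries"]:
        reasons.append("unusual country " + event.country)
    return reasons


# Keep only the events that the contextual rules flag.
FLAGGED = []
for event in EVENTS:
    reasons = reasons_to_review(event, USER_CONTEXT)
    if reasons:
        FLAGGED.append((event, reasons))

for event, reasons in FLAGGED:
    print(event.user, event.country, "; ".join(reasons))
```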



