It’s no secret that the proliferation of artificial intelligence (AI) brings a lot of promise and opportunity to businesses. In fact, recent data show AI adoption has more than doubled since 2017, and that number will continue to grow.
For data teams, AI improves employee self-service, letting them tackle more initiatives without specialized knowledge or domain expertise, such as fluency in complex languages like SQL or Python. AI also has the potential to significantly impact data security efforts, especially when it comes to streamlining data discovery and data fusion.
Today’s organizations face two major data discovery challenges. First, data is highly diffuse and can often be found in far more places than traditional databases. Second, data classification is highly context-dependent. Teams therefore need to look across this vast landscape and identify not only what data is out there, but how that data is interrelated. AI helps execute complex rules in a repeatable manner, allowing data discovery to continue to scale as data sources grow.
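To make the idea of repeatable rules concrete, here is a minimal sketch of rule-based classification in Python. The patterns and labels are illustrative assumptions, not any vendor’s actual detectors:

```python
import re

# Illustrative detection rules; production systems use far richer detectors
# with confidence scoring and validation.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def classify_column(values):
    """Return the set of sensitivity labels detected in a sampled column."""
    labels = set()
    for value in values:
        for label, pattern in RULES.items():
            if pattern.search(str(value)):
                labels.add(label)
    return labels

# A sampled column from a newly discovered data source
sample = ["jane.doe@example.com", "555-867-5309"]
print(classify_column(sample))  # e.g. {'EMAIL', 'PHONE'}
```

Static rules like these only go so far, because the same value can be sensitive in one table and harmless in another; the point of applying AI is to run such context-aware logic consistently across thousands of sources.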
Identify patterns to prioritize threats
Data fusion, or linking information across different systems and classifications, is also a challenge for data security professionals. Finding threats often requires gathering information from a variety of systems, from identity management and cloud storage to event monitoring, VPNs, and access control. Teams must understand each schema and dialect, then synchronize and analyze all this information together, while compensating for differences in data quality and update frequency. Where can AI help here? It can support streamlined search, fusion, and analysis to enhance the entire data security process.
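As a minimal sketch of what that fusion step involves, the snippet below maps records from two hypothetical sources, a VPN log and a cloud-storage audit trail, onto one common schema; the field names are assumptions for illustration:

```python
from datetime import datetime, timezone

# Hypothetical raw events from two systems with different schemas and dialects.
vpn_event = {"user": "jdoe", "ts": "2023-09-14T08:02:11Z", "src_ip": "10.0.0.5"}
cloud_event = {"principal": "jdoe@example.com", "eventTime": 1694678531,
               "action": "GetObject"}

def normalize_vpn(event):
    """Map a VPN log record onto a common event schema."""
    return {
        "user": event["user"],
        "time": datetime.fromisoformat(event["ts"].replace("Z", "+00:00")),
        "source": "vpn",
        "detail": event["src_ip"],
    }

def normalize_cloud(event):
    """Map a cloud-storage audit record onto the same schema."""
    return {
        "user": event["principal"].split("@")[0],  # naive identity resolution
        "time": datetime.fromtimestamp(event["eventTime"], tz=timezone.utc),
        "source": "cloud_storage",
        "detail": event["action"],
    }

# Fuse both feeds into one timeline that analysts (or models) can reason over.
timeline = sorted([normalize_vpn(vpn_event), normalize_cloud(cloud_event)],
                  key=lambda e: e["time"])
for e in timeline:
    print(e["time"].isoformat(), e["user"], e["source"], e["detail"])
```

Real pipelines add identity resolution, deduplication, and quality checks on top of this; the value of AI is performing that mapping and correlation at scale rather than hand-writing a normalizer for every source.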
Although it brings many benefits, 71% of IT leaders say generative AI also creates new data security risks. To maximize the benefits of AI, it is important to treat data security as a foundational component of any AI or large language model (LLM) implementation. This is where the four “whats” and “hows” of data security come into play.
- “What” data will be used to train the AI model? Teams should start with the training data and identify what sensitive data might be used to develop the AI model. They then need to sanitize that data and redact or otherwise invalidate sensitive information (see the first sketch after this list); otherwise, they risk exposing sensitive data or spreading misinformation.
- “How” is the AI model trained? The “how” context that impacts data sensitivity is often overlooked. Data may seem innocuous until it is combined with other information: a zip code and a date of birth are each harmless on their own, but together they can nearly identify an individual. When an AI model is trained, these innocuous bits of information are implicitly combined and can reveal potentially sensitive details. As a result, teams must identify how data is combined within the model to reduce the sensitivity induced during training.
- “What” controls exist for the deployed AI model? Even after controlling the data and training the model, the model itself must be protected. The European Commission’s proposed AI Act, whose details are still being negotiated, would restrict how models can be used. Certain activities (such as law enforcement applications and social credit scoring) carry unacceptable AI risk, and other use cases, such as human resources functions, are considered high risk because of their potential impact on people’s rights, safety, and livelihoods. So when it comes to AI, you need to understand why someone would use a model and what security and access controls exist for it.
- “How” can you assess the veracity of the output? This last element is important: left unaddressed, it can have a negative impact on society through the spread of misinformation. AI can produce highly plausible-looking results. For example, when someone asks an LLM to summarize obscure science, it does a great job of producing a convincing summary, complete with citations that look legitimate. However, those sources often do not exist, referencing studies that were never conducted. Fortunately, access controls help establish the intended scope of a model and restrict activities that push the boundaries of that defined scope (see the second sketch after this list).
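On the first “what,” here is a minimal sketch of pre-training sanitization, reusing the same kind of pattern rules as above; the patterns and placeholder tokens are illustrative assumptions:

```python
import re

# Illustrative redaction rules applied before text enters a training corpus.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def sanitize(record: str) -> str:
    """Replace sensitive spans with placeholder tokens before training."""
    for pattern, token in REDACTIONS:
        record = pattern.sub(token, record)
    return record

print(sanitize("Contact jane.doe@example.com, SSN 123-45-6789."))
# -> "Contact [EMAIL], SSN [SSN]."
```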
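And on the final “how,” a deny-by-default sketch of scoping a deployed model; the caller names and purposes are hypothetical, not any product’s actual API:

```python
# Hypothetical allow-list mapping callers to the model purposes they may invoke.
ALLOWED_PURPOSES = {
    "support_team": {"summarize_ticket", "draft_reply"},
    "hr_team": set(),  # high-risk HR uses require explicit review, none by default
}

def authorize(caller: str, purpose: str) -> bool:
    """Deny by default; permit only purposes within the model's defined scope."""
    return purpose in ALLOWED_PURPOSES.get(caller, set())

for caller, purpose in [("support_team", "summarize_ticket"),
                        ("hr_team", "screen_resume")]:
    print(caller, purpose, "->", "allow" if authorize(caller, purpose) else "deny")
```

Deny-by-default scoping of this kind is what keeps a model from being quietly repurposed for uses its owners never reviewed.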
By prioritizing data security and access controls, organizations can safely harness the power of AI and LLM while protecting against potential risks and ensuring responsible use. These four considerations are interdependent and each serves as a critical step within the data security lifecycle of discovering, securing, and monitoring sensitive data.
Ultimately, this next generation of AI promises a net positive in terms of helping improve security and governance, but only if it’s built in from the beginning. Security teams need to consider these four “whats” and “hows” of data security in AI conversations from the start to maximize the benefits of the technology.
Joe Regensberger, Vice President of Research and Engineering, Immuta