6 April 2023
#Governance #Data
Ontologies, the new ally of your data
Data cataloging companies increasingly rely on methods such as ontologies, taxonomies, knowledge graphs, and thesauri to organize and make sense of the vast amounts of data they process. These methods provide a framework for building a common vocabulary and structure that can describe and link different data elements, and reveal patterns and insights in the data. In this article, we will first introduce these methods, then explore the relationships between them with examples of their use. Finally, we will highlight why these methods matter and explain how Dawizz benefits from them today and plans to benefit from them in the future.

Ontology: terms, relations, and concepts that are making the buzz in data cataloging

An ontology is a formal definition of a set of terms used to describe and represent a domain. It contains terms and the relationships between them, as well as property terms that describe the characteristics and attributes of concepts. A well-known example is the Gene Ontology (GO), used in biological research to describe genes and their functions. GO contains terms such as "cellular component", "molecular function" and "biological process", together with relationships between these terms such as "is_a" and "part_of".

Taxonomy, or how to bring order to the chaos of data

Taxonomy is the science of classification, used to organize concepts in a hierarchical structure. A taxonomy can be domain-specific or general, and it can classify a wide variety of things, including organisms, documents, or data elements. An example is the Dewey Decimal Classification System, used to classify books in libraries.
The Dewey Decimal Classification System contains broad categories such as "000 - Computer Science, Information and General Works", which are then divided into subcategories such as "020 - Library and Information Science".

Example: how to classify animals

The relationship between ontology, taxonomy, and thesauri can be understood through an example from biology. Suppose we are building a system to classify and organize different species of animals.

First, we create a taxonomy of animals, grouping them according to their physical characteristics and evolutionary relationships. For example, we can place mammals, birds, reptiles, fish, and insects in separate categories based on their distinguishing features. This taxonomy provides the basic structure for organizing the different species into a hierarchy.

Next, we create a thesaurus, which can be considered an extension of the taxonomy. The thesaurus allows more detailed descriptions of each species, including behavioral traits, habitats, and geographic distribution. For example, under the category of mammals, we can include subcategories such as carnivores, herbivores, and omnivores, each of which can be further subdivided into more specific groups such as primates and rodents. This lets us describe and categorize each animal species more accurately.

Finally, we use an ontology to formally define the concepts and relationships involved in classifying animal species. The ontology provides a standardized vocabulary and structure for describing these concepts and relationships, allowing a more precise and accurate representation of domain knowledge. For example, we can define the term "mammal" as a class with certain characteristics such as hair, milk production, and live birth, and we can define the relationships between different classes such as "carnivorous mammals" and "herbivorous mammals".
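The three layers described above can be sketched in a few lines of plain Python. This is a minimal, illustrative model (all names and data are hypothetical, not Dawizz's implementation): a taxonomy as "is_a" parent links, a thesaurus as descriptive attributes, and an ontology-style transitive reasoning rule over the hierarchy.

```python
# Taxonomy: each term points to its parent category ("is_a" links).
TAXONOMY = {
    "mammal": "animal",
    "bird": "animal",
    "carnivore": "mammal",
    "herbivore": "mammal",
    "primate": "mammal",
    "wolf": "carnivore",
    "gorilla": "primate",
}

# Thesaurus: extends the taxonomy with richer descriptions per term.
THESAURUS = {
    "wolf": {"habitat": "forest", "diet": "meat"},
    "gorilla": {"habitat": "tropical forest", "diet": "plants"},
}

# Ontology-style reasoning: "is_a" is transitive, so we can ask whether
# one concept is a kind of another by walking up the hierarchy.
def is_a(term: str, ancestor: str) -> bool:
    while term in TAXONOMY:
        term = TAXONOMY[term]
        if term == ancestor:
            return True
    return False
```

With this structure, `is_a("wolf", "mammal")` holds because a wolf is a carnivore and a carnivore is a mammal, which is exactly the kind of inference an ontology makes explicit.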
This allows us to reason more easily and accurately about the classification of animal species. By using taxonomies, thesauri, and ontologies, Dawizz benefits from better organization and classification of its data.

The key to efficient and secure data management

Ontologies can be a powerful tool for data and information management, especially in complex environments. By creating formal models of concepts and relationships, ontologies help organizations identify and organize data more efficiently, improve data analysis and decision making, and ensure compliance with legal and ethical standards. But did you know that ontologies can also be a key tool for improving data security and privacy?

The importance of ontologies in security:

- By creating ontologies, themes can be assigned to sources and servers, which highlights sensitive sources and provides insight into sensitive servers. By understanding the relationships between these sources and servers, appropriate action can be taken when necessary.
- Understanding the interrelationships and connections between entities in a data environment enables intelligent exploration and analysis. By using an ontology to map these connections, data analysts can gain insight into complex systems and discover new patterns and relationships.
- Ontologies help establish connections between entities and concepts, which can drive the generation of rules and policies for later verification. By creating rules based on ontological relationships, data management and analysis become more accurate and efficient.
- By using ontologies to map the data environment and establish taxonomies, organizations can verify their compliance with regulations such as the GDPR. This helps ensure that data management practices meet legal and ethical standards, reducing the risk of regulatory penalties and other legal issues.
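The idea of generating verifiable policies from relationships between sources and servers can be sketched as follows. This is a hypothetical example (the source names, themes, and rule are invented for illustration): sources carrying a sensitive theme must only reside on servers approved for sensitive data.

```python
# Hypothetical catalog fragment: each source has a theme and a host server.
SOURCES = {
    "hr_payroll.csv": {"theme": "personal-data", "server": "srv-eu-1"},
    "press_kit.pdf":  {"theme": "public",        "server": "srv-web"},
    "patients.db":    {"theme": "personal-data", "server": "srv-web"},
}

# Servers flagged (e.g. via the ontology's server concepts) as suitable
# for hosting sensitive data.
APPROVED_FOR_SENSITIVE = {"srv-eu-1"}

def policy_violations(sources: dict) -> list:
    """Return sources whose theme is sensitive but whose server is not approved."""
    return [
        name
        for name, meta in sources.items()
        if meta["theme"] == "personal-data"
        and meta["server"] not in APPROVED_FOR_SENSITIVE
    ]
```

Here the check flags `patients.db`: a sensitive source sitting on a server the ontology does not mark as approved, which is the kind of automated verification the relationships make possible.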
Ontologies are not just for data management and analysis: they can also play a critical role in improving data security and privacy. By using ontologies to classify and protect sensitive data, identify and mitigate security risks, and ensure compliance with legal and ethical standards, organizations can build more secure and resilient data environments. In short, ontologies provide a powerful tool for managing and analyzing data accurately and securely.

Dawizz and the future of data cataloging

Dawizz is an innovative company in the field of data management. Through the use of thesauri, Dawizz can verify the compliance of environments against general thesauri such as the GDPR, or against custom thesauri created and modified by customers. But Dawizz does not stop there. With its research and innovation team, Dawizz is developing a new approach to using ontologies. We know that ontologies can be both an administrator's best friend and worst enemy: they are difficult to maintain, and a small change can trigger an avalanche of problems. That is why our team is working on an approach based on automatic concept extraction, followed by the creation of taxonomies, thesauri, and finally ontologies. This approach will allow our customers to benefit from existing standards and ontologies, but also to automatically create ontologies adapted to their environments and specific needs.

With Dawizz, you can be sure that your data is managed efficiently and securely, according to the highest standards. We are always at the forefront of innovation, working on new solutions to simplify the management of your data. Contact us today to find out how we can help you achieve your data management goals.
25 January 2023
MyDataCatalogue trials by DAWIZZ
MyDataCatalogue in a few words

MyDataCatalogue is a complete platform for data discovery and global data cataloging (databases, APIs, and file volumes). The DAWIZZ trial lets you discover all the features of the platform, free for 15 days!

Why do we offer a MyDataCatalogue trial?

Dawizz has thought of those who want to engage, or have already engaged, in a data governance approach. You have a global cataloging and data discovery project, but you don't know how to differentiate between the offers on the market? That is why we offer you the opportunity to try our solution first, so you can form your own opinion. If you are interested in MyDataCatalogue but are not sure that our solution meets your data governance needs, you can request a 15-day trial. It gives you the chance to test the platform for free and see how it meets your expectations.

How is a MyDataCatalogue trial conducted?

- A launch meeting (45 minutes) to define the issues, the scope of the experiment, the deployment prerequisites for the probes, and the trial schedule.
- Provision by Dawizz of a pre-configured environment.
- Two hours of hands-on training to learn how to inventory your information assets and document and enrich the metadata and data records.
- A one-hour follow-up meeting at mid-term to share the analysis results and discuss the features used.
- A 30-minute evaluation to collect your impressions and, if appropriate, organize the next steps based on your needs.

Dedicated Dawizz assistance

Our project team will be available throughout these 15 days for the service opening, the deployment of the probes, and the analysis of the results. During the trial, you will also be able to discover new data governance features, such as data anonymization, data cleaning, and data quality tracking.
Also take advantage of online support

Finally, be aware that throughout the trial, direct access to the online support service (accessible from the platform) allows you to ask questions and track the handling of and responses to your requests at any time.
17 November 2021
The Data catalog serving cybersecurity
For the third consecutive year, Dawizz is mentioned in the French cybersecurity ecosystem radar. Data protection is one of our priorities, so we are proud to appear in the "Data Security" category. This year again, Wavestone and Le Hub BPI conducted qualitative interviews and identified and mapped the most promising startups and scale-ups in the sector. Our data cataloging solution, MyDataCatalogue, is included in one of the 7 categories presented in the radar, namely "Data":

- Data
- Network
- Cybercriminals
- Risk Management & Compliance
- Users & their Devices
- Applications
- New Technologies

Dawizz is indeed specialized in data governance (structured and unstructured data). As the publisher of the MyDataCatalogue software, it helps its users daily with their confidentiality and data security efforts and with system hygiene, and it helps prioritize system surveillance (in addition to SOC/SIEM) through a better understanding of the sensitivity of the data in the information system. MyDataCatalogue is a smart, multilingual solution based on recognition and classification algorithms that automates knowledge of the data (structured or unstructured) present and manipulated within the information system.

WHAT IS THE GOAL OF THE DATA CATALOG IN A CYBERSECURITY APPROACH?

We offer a solution that allows our customers to quickly obtain a complete overview of the data in their information system. Our probes automatically extract metadata from database applications, structured files (CSV, Excel, TXT), and unstructured files (Word, PDF, etc.). With the help of machine learning and our knowledge base, this metadata (data that characterizes the data) is automatically normalized and classified by our algorithms. The data is then published in MyDataCatalogue together with its metadata. Security standards such as those of ANSSI and CIS (Center for Internet Security), among others, strongly recommend the use of a SOC as part of a strengthened security policy.
SIEMs handle an ever-increasing amount of data, generating too many alarms, or even false positives, often too late. In addition, analyzing the logs from SIEMs is very time-consuming when assessing impacts, due to a lack of functional knowledge of the data manipulated within the applications. The complementarity of our automatic data mapping solution, with its algorithm-based recognition of sensitivity levels, with a SOC approach therefore seems obvious: it allows our customers to prioritize the handling of logs coming from the SIEM.

WHY IS DATA CATALOGING A DECISION AID FOR CISOs/CIOs?

In a context where the growth of data created, exchanged, and stored is exponential, a CISO, responsible for defining and ensuring the implementation of the security policy, must act in a preventive, analytical, and reactive manner. As Alain Bouillé (CISO of the Caisse des Dépôts) puts it, "The European GDPR regulation helps the company to identify personal data, but everything remains to be done for other digital data." To stay relevant, the Chief Information Security Officer (CISO) must analyze the company's information system data at an ever finer grain. Our data cataloging and mapping solution allows the CISOs of companies and public institutions to make strategic decisions with confidence, with a risk management approach focused on protection while optimizing performance. To facilitate data analysis by a CISO, dedicated cybersecurity matchers have been implemented in the solution. Measures have also been taken in the deployment of our probes: a dedicated administration interface for our crawlers simplifies auditing the information system across its various servers and workstations (shadow IT management).
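To make the matcher idea concrete, here is a simplified sketch of what a pattern-based classification step could look like. The patterns, names, and sensitivity labels below are illustrative assumptions, not Dawizz's actual matchers: regular expressions assign a sensitivity level to sampled values.

```python
import re

# Hypothetical matchers: (name, sensitivity level, pattern).
MATCHERS = [
    ("email",      "sensitive", re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")),
    ("fr_phone",   "sensitive", re.compile(r"^0\d{9}$")),
    ("ip_address", "internal",  re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")),
]

def classify(value: str):
    """Return (matcher_name, sensitivity) for the first matching pattern."""
    for name, level, pattern in MATCHERS:
        if pattern.match(value):
            return name, level
    return None, "unclassified"
```

Applied to sampled column values or file contents, such matchers are what let a catalog flag a field as sensitive before a human ever looks at it; real solutions combine pattern matching with machine-learning models and a knowledge base, as described above.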
Finally, an alert and notification service has been implemented in MyDataCatalogue, allowing a user or a third-party solution (such as a Security Operations Center) to subscribe and be alerted, for example, when new sensitive data is discovered in the information system.

WHAT CYBER RISKS DOES THE DATA CATALOG COVER?

Risk mapping is the central step of any information system security action plan. It aims to define all the actions needed to reach a residual risk level that can be accepted with full knowledge of the facts, at the right decision-making level. The main challenge for a CISO is to prioritize vulnerabilities correctly and not compromise their credibility by raising false alerts that waste their colleagues' time. To secure data effectively, before asking how to protect, one must ask: what needs to be protected? The answer necessarily involves a precise and exhaustive inventory of the data present in the information assets, together with their level of sensitivity.
10 February 2021
#Data Cleaning #Governance #Security
Why implement a Data Cleaning process?
The democratization of office tools and the need to collaborate and analyze on a daily basis have greatly contributed to the explosion of documents and of data stored in files: unstructured data now constitutes a significant part of the data estate. It is therefore necessary to integrate it into the mapping process. But before analyzing it, a data cleaning step is needed.

What is Data Cleaning?

Data Cleaning is a process that consists of cleaning data before analyzing it. The goal of Data Cleaning is to identify data that is outdated, incomplete, corrupted, or duplicated within an information system. This data is then removed from the data catalog so as not to alter or harm the accuracy of the stored data.

An exploding volume of data, in the cloud and elsewhere...

According to IDC studies, global data volumes are expected to reach 175 zettabytes (a zettabyte is equal to 1 billion terabytes!) by 2025, and in parallel, less than 0.5% of this data would be analyzed. Data is stored on dedicated office servers, in clouds, and on personal equipment (computers, external hard drives...). It is therefore necessary to clean one's information system regularly in order to:

- Facilitate compliance with regulations such as the GDPR (by reducing the number of sensitive sources)
- Minimize exposure to cyber risks
- Limit environmental impact (digital storage requires servers, data centers, and network equipment, whose ecological footprint is currently rather high)

Tips for implementing a data file cleaning process (data cleansing / Data Cleaning)

Every organization has sensitive data. It may concern its own activity (intellectual property, know-how, etc.) or its customers, constituents, or users (personal data, contracts, etc.).
- Raising awareness among teams of the risks associated with this data remains the first piece of advice for implementing a good data cleaning strategy, which could be described here as "computer hygiene". Every organization must be able to easily identify at-risk data.
- The first lever, detecting obsolete files, is generally the most radical and effective: how many files over 5 years old are really necessary for an organization to function?
- After this obsolete-file deletion step, actions can be prioritized by data sensitivity level. Classifying files according to their risk level allows you to prioritize, minimize analysis work, and finally order the cleaning actions.
- Put in place a security strategy and a security policy to limit access to sensitive files (security at the storage level, privilege management).
- Automating the process with dedicated tools guarantees that the process is really applied. Given the volumes involved, file classification and analysis cannot be carried out exhaustively and effectively by humans. It is also advisable to rely on "smart" solutions that implement algorithms and allow regular auditing and monitoring of the information assets.

Beyond raising awareness among employees, it is indeed necessary to centralize the identification of sensitive data and to define classification and file cleaning processes, in order to define targeted security measures that can cover backup, deletion, logging, access, and more.
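The first two levers above, detecting obsolete files and then ranking the remainder by sensitivity, can be sketched with the Python standard library alone. The 5-year cutoff and the `sensitivity_of` scoring hook are illustrative assumptions; a real tool would plug in its own classification.

```python
import os
import time

FIVE_YEARS = 5 * 365 * 24 * 3600  # cutoff in seconds, as suggested above

def scan(root: str, sensitivity_of=lambda path: 0, now=None):
    """Return (obsolete, to_review) lists of file paths under `root`.

    Files untouched for 5+ years are deletion candidates; the rest are
    sorted by descending sensitivity so cleaning effort is prioritized.
    """
    now = now or time.time()
    obsolete, recent = [], []
    for dirpath, _dirs, files in os.walk(root):
        for f in files:
            path = os.path.join(dirpath, f)
            age = now - os.path.getmtime(path)
            (obsolete if age >= FIVE_YEARS else recent).append(path)
    recent.sort(key=sensitivity_of, reverse=True)
    return obsolete, recent
```

Separating the radical step (obsolete files) from the prioritized step (sensitivity ranking) mirrors the order of the tips above: delete first what age alone condemns, then spend analysis effort only on what remains.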
19 November 2020
#Data Ethics
Where Do We Stand on Data Ethics?
Over the years, we have seen a real boom in digital technology, both at work and in our private lives. The volume of data created and exchanged continues to grow, and the way it is processed has become increasingly elaborate. The value of certain types of data continues to rise, which highlights their vulnerability even further. Data development is not without problems, and raises its own set of questions. This is the source of the term data ethics.

Legislation is changing

Current legislation provides a framework for using data, but it is struggling to adapt to a world that is reinvented every day. With the large-scale application of Artificial Intelligence, the IoT, and the increasing use of algorithms, things are changing constantly. How can you legislate in such a shifting context, with technology developing at a mind-boggling speed? Therein lies the whole problem. When the GDPR, the European data protection regulation, took effect in 2018, it provided a legal framework for the use of personal data. This was an initial step. However, a great number of companies and organizations are still battling with the reality and difficulties of implementing this legal text.

And where does digital society come into all this?

Digital society questions the best way to protect the private life of e-citizens and the freedom of internet users, and this despite some glaring contradictions... This is how the idea of data ethics formed. It is everyone's role to question the moral principles that should be applied to "data management", and particularly the way in which these are put into practice.

Could data compliance be the solution?

Adopting a data compliance approach might be the first way to address the challenging issue of data ethics: referring to the legal framework and current standards, and actively watching technology and its daily application.
One thing is certain: Today, an organization that invests in digital confidence and the prevention of cyber risks has a considerable competitive edge and will be preparing a brighter future.