Document Identifier in Document Central
Overview
The Document Identifier in Document Central uniquely identifies documents and helps prevent duplicates. It is generated automatically when a document is uploaded and is based on an SHA-512 hash of the Base64-encoded document. Because identical content always produces the same identifier, the system can efficiently determine whether a document already exists.
Generation of the Document Identifier
When a document is uploaded, its content is converted into a Base64 string, which is then hashed using the SHA-512 algorithm to create the identifier. This identifier is stored in the Document Entry table and enables the following functions:
- Ensuring that each document is uniquely identifiable.
- Detecting and preventing duplicate documents.
- Improving search and retrieval performance by allowing documents to be identified based on their content.
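The generation step described above can be sketched in a few lines of Python. This is only an illustration, not Document Central's actual implementation: the function name `document_identifier` is hypothetical, and rendering the digest as a lowercase hexadecimal string is an assumption, since the stored format is not specified here.

```python
import base64
import hashlib


def document_identifier(content: bytes) -> str:
    """Hypothetical helper: SHA-512 over the Base64-encoded content.

    Assumption: the digest is rendered as a lowercase hex string.
    """
    encoded = base64.b64encode(content)  # Base64 text as ASCII bytes
    return hashlib.sha512(encoded).hexdigest()


first = document_identifier(b"Invoice 1001")
second = document_identifier(b"Invoice 1001")
third = document_identifier(b"Invoice 1002")

print(first == second)  # identical content yields the same identifier
print(first == third)   # changed content yields a different identifier
```

Under this assumption the identifier is always 128 hexadecimal characters long, regardless of the size of the document.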
Use of the Document Identifier
Duplicate Prevention
When a new document is uploaded, the system checks whether the generated Document Identifier already exists in the Document Entry table. If it does, the user receives a list of all records in which the document is already present and can then decide whether to upload it anyway.
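The duplicate check can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the `DocumentStore` class is a hypothetical in-memory stand-in for the Document Entry table, the `allow_duplicate` flag models the user's decision to upload anyway, and the hex digest format is assumed.

```python
import base64
import hashlib
from collections import defaultdict


def document_identifier(content: bytes) -> str:
    # Assumption: SHA-512 over the Base64 text, rendered as hex.
    return hashlib.sha512(base64.b64encode(content)).hexdigest()


class DocumentStore:
    """Hypothetical in-memory stand-in for the Document Entry table."""

    def __init__(self):
        # Maps a Document Identifier to the records that hold the document.
        self._records_by_identifier = defaultdict(list)

    def find_duplicates(self, content: bytes) -> list:
        """Return all records in which this content is already present."""
        identifier = document_identifier(content)
        return list(self._records_by_identifier.get(identifier, []))

    def upload(self, record: str, content: bytes, allow_duplicate: bool = False) -> list:
        """Store the document unless it is a duplicate.

        Returns the list of existing records when the upload is held back,
        or an empty list when the document was stored.
        """
        duplicates = self.find_duplicates(content)
        if duplicates and not allow_duplicate:
            return duplicates
        self._records_by_identifier[document_identifier(content)].append(record)
        return []


store = DocumentStore()
first_attempt = store.upload("Sales Invoice 1001", b"scanned contract")
second_attempt = store.upload("Purchase Order 2002", b"scanned contract")
forced = store.upload("Purchase Order 2002", b"scanned contract", allow_duplicate=True)
```

Here `second_attempt` contains the record that already holds the document, and the third call models the user confirming the upload despite the warning.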
Display in Document Overviews
The Document Identifier can be displayed in the Document Overviews. A special action allows users to view and verify the hash value of a document.
Document Search
In the Document Search, documents can be searched for by their Document Identifier. To do this, open the advanced view, which allows filtering on this identifier. This makes it easier to find documents quickly based on their content, even if their metadata or filenames differ.
Benefits
- Increased Data Integrity: Helps ensure that each document is stored only once unless a duplicate is uploaded deliberately.
- Optimized Storage Usage: Avoiding unnecessary duplicates reduces storage consumption.
- Efficient Search Functionality: Faster document retrieval using the hash value.
- Enhanced Compliance: Provides a consistent method for verifying that a document's content is unchanged and unique.
By implementing the Document Identifier, Document Central offers a robust solution for identifying and managing documents, improving both efficiency and data quality.
Restrictions
Duplicate detection based on the Document Identifier does not work reliably for emails and their attachments. When emails are uploaded directly from a mail client such as Outlook, the metadata differs for each email, even if the content is the same. In addition, the metadata of the attachments varies each time they are extracted by Document Central. Because the hash is calculated over the entire file, including this metadata, each upload results in a different hash value, so identical emails and attachments are not recognized as duplicates.
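The effect can be illustrated with a short Python sketch. The `X-Export-Time` header below is purely hypothetical and only stands in for whatever metadata actually varies between exports; the hex digest format is likewise an assumption.

```python
import base64
import hashlib


def document_identifier(content: bytes) -> str:
    # Assumption: SHA-512 over the Base64 text, rendered as hex.
    return hashlib.sha512(base64.b64encode(content)).hexdigest()


body = b"Please find the signed contract attached."

# Hypothetical per-export metadata; real email files carry different fields.
export_a = b"X-Export-Time: 2024-01-01T10:00:00Z\r\n\r\n" + body
export_b = b"X-Export-Time: 2024-01-01T10:05:00Z\r\n\r\n" + body

same_body = export_a.endswith(body) and export_b.endswith(body)
identifiers_match = document_identifier(export_a) == document_identifier(export_b)

print(same_body)          # the visible content is identical
print(identifiers_match)  # but the identifiers differ
```

Even a single differing metadata byte changes the Base64 text and therefore the SHA-512 digest, which is why two exports of the same email are treated as different documents.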