Content Hashed Duplicate File Report
Summary
This report utilizes the new added file content hash feature of version 4.0. The 4.0 scan policy definition gives a new option to Generate file content hashes for All Files or Files uploaded since the last scan. This option prompts the AgentFS to generate a SHA256 hash of the file content and store the hash in the database where it can be compared against other files that match in content.
Details
Line 10 contains the paths to be reported against. Modify these paths appropriate to your environment.
The fullpath_hash is a hash of the full path string stored for quick comparison. In SQL you can reference single or multiple paths using the = or IN operand respectively.
The path_hash() function can be used to query inline the fullpath_hash for the desired path(s).
WITH root_path(fullpath, fullpath_hash, ns_left, ns_right, path_type, scan_id) AS (SELECT sd.fullpath, sd.fullpath_hash, sd.ns_left, sd.ns_right, sd.path_type, sd.scan_id FROM srs.scan_data AS sd INNER JOIN srs.scans AS s ON s.id = sd.scan_id WHERE (sd.fullpath_hash IN (srs.path_hash('\\srs-m1.sp.cctec.org\Shares\Atlanta\Employees'),srs.path_hash('\\srs-m1.sp.cctec.org\Shares2\mp'))) AND (sd.path_type = 2) AND (s.progress_status = 4) AND (s.scan_type = 1)), q(fullpath, size, create_time, modify_time, access_time, name, item_count, total_hash_size, content_hash) AS (SELECT sd.fullpath, sd.size, sd.create_time, sd.modify_time, sd.access_time, sd.name, COUNT(*) OVER (PARTITION BY sd.content_hash) AS item_count, Sum(sd.size) OVER (PARTITION BY sd.content_hash) AS total_hash_size, srs.bytes_to_hex_string(sd.content_hash) as content_hash FROM srs.scan_data AS sd INNER JOIN root_path AS rp ON (rp.scan_id = sd.scan_id) AND (rp.ns_left <= sd.ns_left) AND (rp.ns_right >= sd.ns_right) WHERE sd.path_type = 1) SELECT q.fullpath, q.item_count, srs.byte_string(q.size) AS size_string, srs.byte_string(CAST(q.total_hash_size AS BIGINT)) AS total_size_string, (q.total_hash_size - q.size) AS wasted_space, srs.byte_string(CAST((q.total_hash_size - q.size) AS BIGINT)) AS wasted_space_string, q.total_hash_size, q.size, q.content_hash FROM q WHERE (q.item_count >= 2) AND (q.size > 1) ORDER BY q.fullpath
Attachment | Size |
---|---|
![]() | 4.17 KB |
Attachment | Size |
---|---|
![]() | 291.53 KB |