In short: when I run os.walk() I get an accurate list of files, but when I try to get information about those files (last modified date, file size), or even just call open() on them, I get an error that the file cannot be found. This happens for only some files, roughly 0.2%, for reasons that are unclear.
Background
At work we have a server running Windows Server 2012 R2 (I know, I know...). We want to automate moving targeted shared folders to specific shared drives in Google Drive.
The first thing I wanted to do was get a list of files with their last modified dates and file sizes to be used later. The code I wrote worked fine on my laptop, which runs Windows 11, but when I pointed it at a few different shared folders on the server, it ran into the same issue repeatedly.
Troubleshooting
I don't think it's a code issue. I have revised my code several times to make it simpler and always end up with the same result: it works locally but fails to fully go through the shared folders.
My first thought was long path names (the 260-character MAX_PATH limit on older systems), but it was successfully finding files whose paths were longer than 300 characters.
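One way to test the long-path theory directly is to retry a failing path in its extended-length form (the `\\?\` prefix), which asks the Win32 API to skip the legacy path length limit. This is a minimal sketch, assuming the failing paths are absolute. `to_extended_unc` is a hypothetical helper name and is not part of do_test.py.

```python
def to_extended_unc(path: str) -> str:
    """Return the extended-length form of an absolute Windows path.

    Hypothetical helper for troubleshooting only.
    """
    # The \\?\ prefix disables slash translation, so normalize to backslashes first.
    p = path.replace("/", "\\")
    if p.startswith("\\\\?\\"):
        return p  # already in extended-length form
    if p.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + p[2:]
    # Drive-letter path: C:\... becomes \\?\C:\...
    return "\\\\?\\" + p
```

If `os.stat(to_extended_unc(f))` succeeds where `os.stat(f)` fails, then path length (or Win32 name normalization) is a likely suspect after all. If both fail, it probably is not.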
My next thought was that there might be a clear pattern to which kinds of files can't be found, but in a given folder it would find most of the PDFs and then fail to locate one or a few of the others. This is just an observed example; it is not specific to PDFs.
I've spent probably 6-8 hours total trying to troubleshoot and investigate this, but I'm pretty stumped at this point.
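To look for a pattern more systematically than by eye, the list of failed paths can be summarized by a few properties that commonly trip up the Win32 API. This is only a sketch: `describe_failures` is a hypothetical helper, the 260-character threshold is an assumption, and the checks (length, trailing space or dot in the file name, non-ASCII characters) are guesses at what might matter, not a confirmed cause.

```python
from collections import Counter

def describe_failures(paths, limit=260):
    """Count failed paths by a few suspicious properties (hypothetical helper)."""
    summary = Counter()
    for p in paths:
        # Last path component, regardless of which slash style is used
        name = p.replace("\\", "/").rsplit("/", 1)[-1]
        if len(p) >= limit:
            summary["at_or_over_limit"] += 1
        else:
            summary["under_limit"] += 1
        # Names ending in a space or dot are often unreachable through normal paths
        if name != name.rstrip(" ."):
            summary["trailing_space_or_dot"] += 1
        if not p.isascii():
            summary["non_ascii"] += 1
    return dict(summary)
```

Calling `describe_failures(files_not_found)` at the end of a run gives a quick count per category. If nearly every failure falls into a single bucket, that bucket is worth a closer look.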
Code
do_test.py - uses the hurry.filesize package for rough file sizes
import os
import datetime
from hurry.filesize import size
from pprint import pprint

# Test directory
src = "//[DC]/PATH/TO/FOLDER"

def simple_file_check(src_dir):
    total_bytes = 0
    total_files = 0
    total_folders = 0
    total_not_found = 0
    files_not_found = []
    for (root, dirs, files) in os.walk(src_dir):
        # just count files and folders for now
        total_files += len(files)
        total_folders += len(dirs)
        # Get full-path file names
        fnames = [os.path.join(root, f).replace("\\", "/") for f in files]
        # Get their sizes and sum it up
        fsizes = []
        for f in fnames:
            try:
                fsizes.append(os.stat(f).st_size)
            except OSError:
                # os.walk() listed the file, but os.stat() could not find it
                files_not_found.append(f)
        total_bytes += sum(fsizes)
    total_size = size(total_bytes)
    total_not_found += len(files_not_found)
    # Percentage of listed files that could not be stat'ed (guard against an empty tree)
    pct_missing = total_not_found / total_files * 100 if total_files else 0.0
    data = {
        "ttl-size": total_size,
        "ttl-files": total_files,
        "ttl-folders": total_folders,
        "ttl-not-found": total_not_found,
        "pct-missing": "{}%".format(pct_missing)
    }
    pprint(data)

def time_it_pls(func, *arg):
    # Print start time, run func, then print end time and elapsed time
    begin_dt = datetime.datetime.now()
    begin = str(begin_dt)[:19]
    print("beginning execution at: {}".format(begin))
    func(*arg)
    end_dt = datetime.datetime.now()
    end = str(end_dt)[:19]
    print("ending execution at: {}".format(end))
    print("time taken: {}".format(end_dt - begin_dt))

time_it_pls(simple_file_check, src)
result
beginning execution at: 2023-06-21 14:50:06
{'pct-missing': '0.19806269922322284%',
'ttl-files': 193878,
'ttl-folders': 18150,
'ttl-not-found': 384,
'ttl-size': '210G'}
ending execution at: 2023-06-21 14:51:11
time taken: 0:01:05.302772
specific error message without the Exception block
Traceback (most recent call last):
File "C:\it_scripts\do_test.py", line 53, in <module>
time_it_pls(simple_file_check, src)
File "C:\it_scripts\do_test.py", line 47, in time_it_pls
func(*arg)
File "C:\it_scripts\do_test.py", line 25, in simple_file_check
fsizes.append(os.stat(f).st_size)
^^^^^^^^^^
FileNotFoundError: [WinError 3] The system cannot find the path specified: '//DC/PATH/TO/FILE'
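Since the exception block above discards the error details, it may help to tally them instead. WinError 3 is "path not found" and WinError 2 is "file not found", so the split between the two can hint at whether a folder or the file itself is failing to resolve. This is a sketch only: `classify_stat_errors` is a hypothetical helper, and the `winerror` attribute is only populated on Windows.

```python
import os
from collections import Counter

def classify_stat_errors(paths):
    """Tally os.stat() failures by exception type, errno and winerror."""
    counts = Counter()
    for p in paths:
        try:
            os.stat(p)
        except OSError as e:
            # winerror only exists on Windows; fall back to None elsewhere
            key = (type(e).__name__, e.errno, getattr(e, "winerror", None))
            counts[key] += 1
    return counts
```

Passing in the full-path names collected inside the os.walk() loop shows whether all 384 failures share one error code or several.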
--edit--
I get a similar error when trying to use open() against just an individual file, like so in the interpreter:
>>> f = "//DC/PATH/TO/FILE" # actual path length is 267 characters long and copied from the exception in the previous example.
>>> d = open(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '//DC/PATH/TO/FILE'
--edit 2--
We are getting closer! Just listing the folder in PowerShell, I can see that the file exists, but if I run ls against the individual file I get an error. So it's not Python-specific, which hints at something weird on the Windows side.
Here's a much less redacted version of the output and error on the PS-side. Please understand that this does need to be redacted to an extent due to the sensitive nature of these files.
PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\"
Directory: \\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing
Letters\Rejections
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/31/2019 3:06 PM 13025 Samole closing Letter - No DV Simple assault.docx
-a---- 11/21/2018 3:10 PM 16232 Sample Closing Letter-Not a qualifying crime (Sp).dotx
-a---- 7/26/2018 11:32 AM 13581 Sample Closing Letter-RE PC does not qualify a indirect victim.dotx
-a---- 11/21/2018 3:14 PM 12908 Sample Closing Letter-RE U Cert Request Denied.dotx
-a---- 7/9/2018 7:25 PM 13500 Sample Closing Letter-Unqualifying crime.dotx
-a---- 7/26/2018 6:19 PM 12769 Sample Closing Ltr w Copy of File (Sp), Over Income.dotx
-a---- 7/26/2018 1:24 PM 16432 Sample Rejection Letter, unqualifying crime.dotx
PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx"
ls : Cannot find path '\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing
Letters\Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx' because it does not exist.
At line:1 char:1
+ ls "\\DC\#CONSOLIDATION ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (\\DC\...ect victim.dotx:String) [Get-ChildItem], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand