
Python can't find files that are listed by os.walk()


My problem, in short: os.walk() returns an accurate list of files, but when I try to get information about some of those files (last modified date, file size) or even just call open() on them, I get an error saying the file cannot be found. It affects roughly 0.2% of the files, for reasons that are unclear.

Background

At work we have a server running Windows Server 2012 R2 (I know, I know...). We want to automate moving selected shared folders to specific shared drives in Google Drive.

The first thing I wanted to do was get a list of files, with their last modified dates and file sizes, to use later. The code I wrote worked fine on my laptop, which runs Windows 11, but when I pointed it at a few different shared folders on the server, it ran into the same issue every time.
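For the inventory step itself, one possible variation (a sketch, not what the script below does) is to read size and modification time from `os.scandir()` entries instead of calling `os.stat()` on each joined path. The Python docs note that on Windows `DirEntry.stat()` only needs an extra system call for symbolic links, because the directory listing already carries that data. Whether this sidesteps the failures on this particular server is an untested assumption, and the function name `inventory` is hypothetical.

```python
import os
from datetime import datetime


def inventory(root):
    """Walk `root` with os.scandir and collect path/size/mtime records.

    Hypothetical helper. On Windows, DirEntry.stat() reuses data from the
    directory listing for regular files instead of re-opening each path by
    name; whether that avoids the errors seen on the server is an assumption.
    """
    records = []
    failures = []  # (path, exception) pairs for anything that could not be read
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)  # descend later
                        else:
                            st = entry.stat(follow_symlinks=False)
                            records.append({
                                "path": entry.path,
                                "size": st.st_size,
                                "mtime": datetime.fromtimestamp(st.st_mtime),
                            })
                    except OSError as exc:
                        failures.append((entry.path, exc))
        except OSError as exc:
            # The directory itself could not be listed
            failures.append((current, exc))
    return records, failures
```

Usage would be `records, failures = inventory("//DC/PATH/TO/FOLDER")`; comparing `len(failures)` with the 384 failures from the `os.stat()` version would show whether reading from the listing makes a difference.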

Troubleshooting

I don't think it's a code issue. I have simplified my code several times and always end up with the same result: it works locally but fails to process all of the files in the shared folders.

My first thought was that it might be due to long path names (the 260-character MAX_PATH limit on older systems), but it successfully handled some files whose paths were longer than 300 characters.
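If the long-path hypothesis is right, one commonly used workaround is the extended-length path prefix documented by Microsoft: a UNC path `\\server\share\...` becomes `\\?\UNC\server\share\...`, which tells the Win32 file APIs to skip the MAX_PATH check and path normalization. A minimal sketch, assuming the paths come from `os.walk()` as in the script below; the helper names are hypothetical. Because normalization is skipped, forward slashes are not translated, so the `.replace("\\","/")` step in the script would have to be dropped for these paths.

```python
MAX_PATH = 260  # classic Win32 limit, including the terminating NUL


def to_extended_unc(path):
    r"""Convert //server/share/... or \\server\share\... to
    \\?\UNC\server\share\... (hypothetical helper).

    The \\?\ prefix disables path normalization, so forward slashes are
    not converted by Windows; this function converts them itself.
    """
    p = path.replace("/", "\\")
    if p.startswith("\\\\?\\"):
        return p  # already an extended-length path
    if not p.startswith("\\\\"):
        raise ValueError("expected a UNC path starting with // or \\\\")
    return "\\\\?\\UNC\\" + p[2:]


def exceeds_max_path(path):
    # 259 usable characters plus the NUL terminator fill MAX_PATH
    return len(path) >= MAX_PATH
```

Inside the loop this would look like `os.stat(to_extended_unc(f)).st_size`; whether it resolves the failures on this server has not been tested.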

My next thought was that there might be a clear pattern in which files can't be found, but within a given folder it would find most of the PDFs and then fail on one or a few of the others. That is just one observed example; the problem is not specific to PDFs.
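To look for a pattern more systematically, the paths collected in `files_not_found` could be summarized by extension and by length. A minimal sketch; `summarize_failures` is a hypothetical helper, and 260 is the classic MAX_PATH value.

```python
import os
from collections import Counter

MAX_PATH = 260  # classic Win32 path limit, including the terminating NUL


def summarize_failures(failed_paths):
    """Group paths that failed os.stat() by extension and by length."""
    by_ext = Counter(
        os.path.splitext(p)[1].lower() or "<none>" for p in failed_paths
    )
    too_long = sum(1 for p in failed_paths if len(p) >= MAX_PATH)
    lengths = sorted(len(p) for p in failed_paths)
    return {
        "by_extension": dict(by_ext),
        "at_or_over_max_path": too_long,
        "shortest": lengths[0] if lengths else None,
        "longest": lengths[-1] if lengths else None,
    }
```

Note that this measures the UNC path as seen from the client; the corresponding path on the server's own disk can differ in length.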

I've spent probably 6-8 hours in total troubleshooting and investigating this, and I'm pretty stumped at this point.

Code

do_test.py - uses the hurry.filesize package for rough file sizes

import os
import datetime
from hurry.filesize import size
from pprint import pprint

# Test directory
src = "//[DC]/PATH/TO/FOLDER"

def simple_file_check(src_dir):
    total_bytes = 0
    total_files = 0
    total_folders = 0
    total_not_found = 0
    files_not_found = []

    for (root, dirs, files) in os.walk(src_dir):
        # just count files and folders for now
        total_files += len(files)
        total_folders += len(dirs)
        # Get full-path file names
        fnames = [os.path.join(root, f).replace("\\","/") for f in files]

        # Get their sizes and sum it up
        fsizes = []
        for f in fnames:
            try:
                fsizes.append(os.stat(f).st_size)
            except OSError:  # FileNotFoundError is a subclass of OSError
                files_not_found.append(f)
        total_bytes += sum(fsizes)

    total_size = size(total_bytes)
    total_not_found += len(files_not_found)
    # Files that failed os.stat() are still counted in total_files (os.walk listed them)
    pct_missing = total_not_found / total_files * 100 if total_files else 0.0

    data = {
        "ttl-size": total_size,
        "ttl-files": total_files,
        "ttl-folders": total_folders,
        "ttl-not-found": total_not_found,
        "pct-missing": "{}%".format(pct_missing)
    }
    pprint(data)

def time_it_pls(func, *arg):
    begin_dt = datetime.datetime.now()
    begin = str(begin_dt)[:19]
    print("beginning execution at: {}".format(begin))
    func(*arg)
    end_dt = datetime.datetime.now()
    end = str(end_dt)[:19]
    print("ending execution at: {}".format(end))
    print("time taken: {}".format(end_dt - begin_dt))

time_it_pls(simple_file_check, src)

result

beginning execution at: 2023-06-21 14:50:06
{'pct-missing': '0.19806269922322284%',
 'ttl-files': 193878,
 'ttl-folders': 18150,
 'ttl-not-found': 384,
 'ttl-size': '210G'}
ending execution at: 2023-06-21 14:51:11
time taken: 0:01:05.302772
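As a sanity check on the numbers above, the reported `pct-missing` value equals the not-found count divided by the total number of files returned by `os.walk()`:

$$
\text{pct-missing} = \frac{384}{193\,878} \times 100 \approx 0.198\%
$$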

Specific error message without the try/except block:

Traceback (most recent call last):
  File "C:\it_scripts\do_test.py", line 53, in <module>   
    time_it_pls(simple_file_check, src)
  File "C:\it_scripts\do_test.py", line 47, in time_it_pls
    func(*arg)
  File "C:\it_scripts\do_test.py", line 25, in simple_file_check
    fsizes.append(os.stat(f).st_size)
                  ^^^^^^^^^^
FileNotFoundError: [WinError 3] The system cannot find the path specified: '//DC/PATH/TO/FILE'

--edit--

I get a similar error when calling open() on a single file in the interpreter:

>>> f = "//DC/PATH/TO/FILE" # actual path length is 267 characters long and copied from the exception in the previous example.
>>> d = open(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '//DC/PATH/TO/FILE'
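The two tracebacks report different codes: `os.stat()` fails with WinError 3 (`ERROR_PATH_NOT_FOUND`), while `open()` reports errno 2 (`ENOENT`). A small probe that records both codes, plus the path length, for each failing file may make the pattern easier to see. A sketch; `probe` is a hypothetical helper, and the `winerror` attribute only exists on Windows, hence the `getattr` fallback.

```python
import os


def probe(path):
    """Try os.stat() on `path` and report what happened.

    `winerror` is only set on Windows (2 = ERROR_FILE_NOT_FOUND,
    3 = ERROR_PATH_NOT_FOUND); on other platforms it stays None.
    """
    result = {"path": path, "length": len(path), "ok": False,
              "errno": None, "winerror": None}
    try:
        os.stat(path)
        result["ok"] = True
    except OSError as exc:
        result["errno"] = exc.errno
        result["winerror"] = getattr(exc, "winerror", None)
    return result
```

Running `probe()` over every entry in `files_not_found` and tallying `winerror` alongside `length` would show whether all 384 failures share the same code.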

--edit 2--

We are getting closer! When I list the folder in PowerShell, I can see that the file exists, but running ls against that individual file gives an error. So it's not Python-specific, and it points to something odd on the Windows side.

Here's a much less redacted version of the PowerShell output and error. Please understand that some redaction is still necessary because of the sensitive nature of these files.

PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\Rejections\"


    Directory: \\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing
    Letters\Rejections


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         5/31/2019   3:06 PM          13025 Samole closing Letter - No DV Simple assault.docx
-a----        11/21/2018   3:10 PM          16232 Sample Closing Letter-Not a qualifying crime (Sp).dotx
-a----         7/26/2018  11:32 AM          13581 Sample Closing Letter-RE PC does not qualify a indirect victim.dotx
-a----        11/21/2018   3:14 PM          12908 Sample Closing Letter-RE U Cert Request Denied.dotx
-a----          7/9/2018   7:25 PM          13500 Sample Closing Letter-Unqualifying crime.dotx
-a----         7/26/2018   6:19 PM          12769 Sample Closing Ltr w Copy of File (Sp), Over Income.dotx
-a----         7/26/2018   1:24 PM          16432 Sample Rejection Letter, unqualifying crime.dotx


PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx"
ls : Cannot find path '\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing 
Letters\Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx' because it does not exist.
At line:1 char:1
+ ls "\\DC\#CONSOLIDATION  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\DC\...ect victim.dotx:String) [Get-ChildItem], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
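The PowerShell experiment (list the folder, then address one entry by its full path) can be reproduced in Python for every directory that `os.walk()` visits. A sketch; `listed_but_unreachable` is a hypothetical name. Note that `os.path.exists()` returns False on any `OSError`, so it flags every entry that cannot be reached by path, whatever the underlying reason.

```python
import os


def listed_but_unreachable(directory):
    """Return (name, full_path_length) for entries that os.listdir()
    reports in `directory` but os.path.exists() cannot find by full path.

    Mirrors the PowerShell check: list the folder, then address each
    entry individually.
    """
    missing = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if not os.path.exists(full):
            missing.append((name, len(full)))
    return missing
```

Printing `repr(name)` for each reported entry makes leading or trailing spaces and other unusual characters visible, and the length shows how close each full path is to MAX_PATH.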
Start with the obvious: does accessing the path locally work as expected?
@GregAskew It does not. It is odd that some paths longer than 260 characters could be accessed as expected, but Server 2012 R2 itself doesn't have the registry option for enabling long path names. I am currently trying to secure an upgrade to Server 2019 to see if that resolves the issue.