Yes, applications draw their own widgets and windows, but no more than that. In particular, they are not aware of what other applications are drawing on the screen. Each application tells the window manager which pixels it wants to draw, and the window manager is then responsible for combining those into a single display for the user, without requiring any direct communication between applications. See https://en.wikipedia.org/wiki/Compositing_window_manager
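To make the combining step concrete, here is a toy sketch in Python. It is not how X or Wayland are actually implemented; the `Surface` class and `composite` function are hypothetical names invented for illustration, and "pixels" are just characters. The point is only that each application hands over its own pixels and one central component decides what ends up on screen.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    """Hypothetical stand-in for one application's window contents."""
    x: int          # left edge on the screen
    y: int          # top edge on the screen
    pixels: list    # rows of single-character "pixels"


def composite(surfaces, width, height, background="."):
    """Paint surfaces back-to-front into one shared framebuffer."""
    framebuffer = [[background] * width for _ in range(height)]
    for surface in surfaces:  # later surfaces are stacked on top
        for dy, row in enumerate(surface.pixels):
            for dx, pixel in enumerate(row):
                px, py = surface.x + dx, surface.y + dy
                # Clip anything that falls outside the screen.
                if 0 <= px < width and 0 <= py < height:
                    framebuffer[py][px] = pixel
    return ["".join(row) for row in framebuffer]


# Two applications that know nothing about each other.
editor = Surface(0, 0, ["AAA", "AAA"])
terminal = Surface(2, 1, ["BBB", "BBB"])

screen = composite([editor, terminal], width=6, height=3)
for line in screen:
    print(line)
```

Neither surface has to know that the other exists; the overlap is resolved entirely inside `composite`.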
The other function of Wayland or X is to direct user input (keyboard input and mouse movement/clicks) to the correct application. Without such a central component, each application would have to receive all keypresses, mouse movements and clicks and then determine whether that input was directed to itself or to another application. While this could be handled in UI toolkits, it is clearly more efficient (and more secure, since applications never see input meant for others) to make the calculation once in a lower-level component.
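The input side can be sketched the same way. Again this is only an illustration under simplifying assumptions (rectangular windows, a plain stacking list, clicks only), and `Window` and `route_click` are hypothetical names, not part of any real X or Wayland API. The calculation is done once, centrally, and only the owning application is told about the click.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """Hypothetical record the central component keeps per window."""
    owner: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def route_click(windows, px: int, py: int) -> Optional[str]:
    """Return the owner of the topmost window under the pointer, if any."""
    # Windows are listed bottom-to-top, so search from the top of the stack.
    for window in reversed(windows):
        if window.contains(px, py):
            return window.owner
    return None


windows = [
    Window("editor", 0, 0, 100, 100),
    Window("terminal", 50, 50, 100, 100),  # stacked above the editor
]

print(route_click(windows, 10, 10))    # only the editor is here
print(route_click(windows, 60, 60))    # overlap: the topmost window wins
print(route_click(windows, 300, 300))  # empty desktop
```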
At a lower level, X and Wayland provide an abstracted API for programs to work against, rather than the low-level APIs of individual graphics cards.
As a bonus, the client-server architecture of X allows us to run graphical applications on a remote machine over SSH (X11 forwarding, e.g. ssh -X).