If the 'later' nodes do not depend on an output of, or on the successful completion of, the 'earlier' nodes, then the later nodes can run independently, in parallel with the earlier nodes.
However, whether the later nodes are wasting CPU time depends on how many nodes are already running in parallel in your data flow and on what else is running on the Analyze server at that time. The nodes in your data flow will 'compete' with the nodes in other users' data flows for the server's resources.
By default, the maximum number of nodes that can run in parallel in a data flow is four. The system administrator can increase this limit by setting a configuration property, as described in the Help documentation on setting the thread limits. The thread limit is per data flow - if four data flows were running in parallel on the system, then a maximum of 4 x 4 = 16 nodes would be running at any one time.
For optimal performance of the server, aim for a maximum of roughly four threads per CPU core. Above this figure the kernel's scheduler will spend a significant amount of time context switching between the threads, leading to reduced overall performance.
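As a rough illustration of that sizing rule, here is a small plain-Java sketch (the class and variable names are illustrative, not part of the product; the numbers are just the default per-flow limit and the ~4 threads per core guideline mentioned above) that estimates how many data flows could run concurrently before the server starts to be oversubscribed:

```java
// Rough sizing sketch, not a product API: estimates how many data flows can
// run concurrently before exceeding the ~4 threads per CPU core guideline,
// assuming each flow runs the default maximum of 4 nodes in parallel.
public class ConcurrencyEstimate {

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threadsPerCore = 4;   // rule-of-thumb ceiling from the guideline above
        int nodesPerFlow   = 4;   // default per-data-flow thread limit

        int threadBudget = cores * threadsPerCore;
        int maxFlows     = threadBudget / nodesPerFlow;

        System.out.printf("Cores: %d, thread budget: %d, "
                + "concurrent data flows before oversubscription: ~%d%n",
                cores, threadBudget, maxFlows);
    }
}
```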
You should aim to schedule jobs to smooth out the overall load on the server.
Using the newer nodes (Transform, Split, Sort, etc.) that are written in Java will also improve performance - especially when you have many nodes that are each dealing with a moderate amount of data. This is because they execute 'in-Container', which considerably reduces the start-up time of a node compared with running it in its own process.
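To illustrate why that start-up cost matters, below is a generic JVM sketch (not the Analyze engine's actual node-execution mechanism - the class and method names are purely illustrative). It compares a small unit of work done in-process with the cost of launching a separate JVM, which is a stand-in for the per-process start-up a node pays when it does not run in-Container; for nodes handling modest amounts of data, that start-up cost dominates:

```java
import java.util.concurrent.TimeUnit;

// Generic JVM illustration, not the product's implementation: running many
// small units of work in-process avoids paying process start-up cost for
// each one, which is where most of the time goes when the work itself is small.
public class StartupOverheadDemo {

    // Stand-in for a node doing a moderate amount of work in-process.
    static long inProcessWork() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // In-process: just a method call, no extra start-up cost.
        long t0 = System.nanoTime();
        inProcessWork();
        long inProcessMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);

        // Separate process: pay for process creation and JVM start-up before
        // any real work can begin ("java -version" used as a stand-in).
        long t1 = System.nanoTime();
        new ProcessBuilder("java", "-version")
                .redirectErrorStream(true)
                .start()
                .waitFor();
        long separateMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t1);

        System.out.printf("In-process call: ~%d ms, separate JVM launch: ~%d ms%n",
                inProcessMs, separateMs);
    }
}
```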