https://archive.ics.uci.edu/ml/datasets.php
https://www.nber.org/research/data?page=1&perPage=50
https://github.com/awesomedata/awesome-public-datasets
https://data.world/datasets/machine-learning
https://data.noaa.gov/datasetsearch/
https://www.usgs.gov/products/data
https://www.fema.gov/about/openfema/data-sets
etc...
And of course, don't ignore the data you can collect yourself one way or another. With a few cheap Arduino Nano or Raspberry Pi Pico boards and some sensors, you can build quite a variety of distributed data collection systems. Use solar panels for power in remote areas and 4G / cellular data networks for connectivity, and you can get data from all over the place. You can also use a cheap SDR "dongle" to pull down data from various weather satellites and other sources. And don't forget about the APIs / data export mechanisms for apps you might use, like Fitbit, Strava, MapMyRun, etc.
In a few applications of ML that I've worked with, there is no need for an outside dataset because the program generates its own data. For example, the data could come from a simulation of some process.
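To make the simulation idea concrete, here is a rough sketch of what that can look like. Everything in it is made up for illustration: the process (an object cooling toward room temperature), the function names simulate_cooling and make_dataset, and all the parameter ranges are my own assumptions, not anything from a real project. The point is only that the simulator produces both the inputs (noisy temperature readings) and the label (the cooling rate k), so no outside dataset is needed.

```python
import random

def simulate_cooling(t0, ambient, k, steps, dt=1.0, noise=0.1, rng=None):
    """Simulate an object cooling toward ambient temperature (simple Euler steps).

    Returns a list of noisy temperature readings, one per time step.
    Hypothetical toy process, chosen only for illustration.
    """
    rng = rng or random.Random()
    temp = t0
    readings = []
    for _ in range(steps):
        # Newton's law of cooling: rate of change is proportional to the
        # difference between the object's temperature and the ambient one.
        temp += -k * (temp - ambient) * dt
        # Add Gaussian noise to mimic an imperfect sensor.
        readings.append(temp + rng.gauss(0.0, noise))
    return readings

def make_dataset(n_samples, seed=0):
    """Build (readings, k) pairs: the readings are the features, k is the label."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    dataset = []
    for _ in range(n_samples):
        k = rng.uniform(0.01, 0.2)  # assumed range of cooling rates
        readings = simulate_cooling(t0=90.0, ambient=20.0, k=k, steps=20, rng=rng)
        dataset.append((readings, k))
    return dataset

data = make_dataset(1000)
```

From there you would feed the (readings, k) pairs to whatever model you like, and if you need more training data you just run the simulation again with a different seed.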