Wanted: open data suitable for a data science project!
-
Wanted: open data suitable for a data science project!
Every year, we ask our second-year bachelor students to do a project. We give them a (big) dataset (preferably in JSON or as a bunch of files to be combined), pose a research question, and ask them to perform data cleaning and exploration related to that question using #Rstats.
I've used most of the obvious choices, so I'm turning to Fedi to find new, interesting datasets. If you have no ideas, sharing this post helps too!
-
@JorisMeys Use #GBIF species occurrence point data (@gbif). Also spatial, and combinable with it: https://geodaten.bayern.de
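For anyone who wants to preview what the occurrence point data looks like before handing it to students, here is a minimal sketch in Python (the same idea carries over to R). It assumes the public GBIF occurrence search endpoint and its `results`, `decimalLatitude` and `decimalLongitude` fields; the helper names are made up for illustration, and the parameters should be checked against the GBIF API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public GBIF occurrence search API.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_url(scientific_name, country=None, limit=100, offset=0):
    """Build a search URL for georeferenced occurrences of one species."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",  # keep only records with coordinates
        "limit": limit,
        "offset": offset,
    }
    if country:
        params["country"] = country  # two-letter country code, e.g. "DE"
    return GBIF_OCCURRENCE_SEARCH + "?" + urllib.parse.urlencode(params)


def extract_points(response):
    """Turn one page of results into (species, year, lat, lon) tuples."""
    points = []
    for record in response.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is None or lon is None:
            continue  # skip records without usable coordinates
        points.append((record.get("species"), record.get("year"), lat, lon))
    return points


def fetch_page(url):
    """Download and decode one page of results (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as handle:
        return json.load(handle)
```

A call such as `extract_points(fetch_page(build_occurrence_url("Parus major", country="DE")))` would then give a small list of points to plot or to join with other spatial layers.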
-
@JorisMeys I've been playing with the BTO bird survey data lately: 1994-2024, several million survey counts, interesting structure, one or two data cleaning challenges, and possible research questions on species distribution change or climate change effects, for either single species or groups. https://zenodo.org/records/18598453
@JorisMeys This https://b-rowlingson.gitlab.io/n8workshop/Birds/birds.html is the sample analysis I did for a workshop. It's very much done for teaching purposes rather than scientific ones, but there's probably a lot of science that could come out of it.
-
@JorisMeys I always find interesting data at https://www.ipums.org/
-
@JorisMeys Oh, Pirate Bay is full of training data. If it's good enough (and legal enough) for Meta it should be good enough for the rest of us.
😄
-
@JorisMeys Not sure if that's obvious enough but ECMWF offers a number of weather-related datasets. https://www.ecmwf.int/en/forecasts/datasets
-
@JorisMeys have you heard of Jeffrey Epstein?
-
@JorisMeys the VAST challenge would be perfect for this! https://github.com/vast-challenge
Background: https://datastori.es/data-stories-24-vast-challenge/
-
@JorisMeys It might be an obvious choice, but the UK Data Service hosts many datasets. A minority of the datasets are open, and they tend to be pretty clean already, but the massive scale of some of them makes them interesting.
-
@JorisMeys For example, the 2019 World Risk Poll is a global survey of fears and attitudes towards risk. It has lots of demographic detail too so you can simulate the effect of different sampling strategies.
It's a 70 MB plaintext spreadsheet with 150,000 rows. It's large enough that on my last visit to the Apple Store, I could compare devices by manually timing how long it took to open the file. (It took 7 seconds on anything with an M4 chip; 5 seconds with an M5.)
https://datacatalogue.ukdataservice.ac.uk/studies/study/8739#details
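To make the point about sampling strategies concrete, here is a small simulation sketch in Python. It does not use the World Risk Poll itself: the regions, their population shares and the scores are invented for illustration. It compares how much the estimated mean varies under simple random sampling versus proportionally stratified sampling.

```python
import random
import statistics


def make_population(n=10_000, seed=1):
    """Synthetic (region, score) rows; regions differ in size and mean score."""
    rng = random.Random(seed)
    # Hypothetical regions: (share of population, mean score).
    regions = {"north": (0.7, 3.0), "south": (0.2, 5.0), "islands": (0.1, 8.0)}
    population = []
    for name, (share, mean) in regions.items():
        for _ in range(round(n * share)):
            population.append((name, rng.gauss(mean, 1.0)))
    return population


def simple_random_sample(population, size, rng):
    """Draw rows without regard to region."""
    return rng.sample(population, size)


def stratified_sample(population, size, rng):
    """Proportional allocation: each region's sample share matches its population share."""
    strata = {}
    for row in population:
        strata.setdefault(row[0], []).append(row)
    sample = []
    for rows in strata.values():
        k = round(size * len(rows) / len(population))
        sample.extend(rng.sample(rows, k))
    return sample


def spread_of_estimates(sampler, population, size=100, repeats=200, seed=2):
    """Standard deviation of the sample mean over repeated draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        scores = [score for _, score in sampler(population, size, rng)]
        means.append(statistics.fmean(scores))
    return statistics.stdev(means)
```

Comparing `spread_of_estimates(simple_random_sample, population)` with `spread_of_estimates(stratified_sample, population)` shows how stratifying on a variable related to the outcome tightens the estimate; with real survey data, the demographic columns would play the role of the made-up regions.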
-
@adenoz Thanks, I didn't know that one. Very interesting, and indeed the kind of data structure I am looking for. It's a bit far from their major (they are bio-engineering students), so I might opt for another dataset closer to that if I find one. But this one is definitely flagged and stored for future use.
@JorisMeys Students in bio-engineering might enjoy finding out facts about macromolecular structures from the PDB.
Both the RCSB PDB and PDBe portals offer APIs to query the meta-data about structures deposited in the PDB:
https://www.rcsb.org/docs/programmatic-access/web-apis-overview
https://www.ebi.ac.uk/pdbe/pdbe-rest-api
The very basics let you replicate the online dashboards on these portals, showing number of entries deposited or released per year. But the APIs give access to every piece of meta-data in there, so you can really ask sophisticated questions. I played a bit with it a while ago, see some examples here: https://guillawme.github.io/insights-from-the-pdb/
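As a rough illustration of the "entries deposited per year" idea, here is a Python sketch. It assumes the RCSB Data API serves one JSON document per entry at the URL pattern below and that the deposition date sits under `rcsb_accession_info.deposit_date`; both are assumptions to verify against the API documentation linked above, and the helper names are made up.

```python
import collections
import json
import urllib.request

# Assumed URL pattern of the RCSB Data API for a single entry.
RCSB_ENTRY_ENDPOINT = "https://data.rcsb.org/rest/v1/core/entry/"


def entry_url(pdb_id):
    """URL of the JSON record for one PDB entry, e.g. '4HHB'."""
    return RCSB_ENTRY_ENDPOINT + pdb_id.upper()


def fetch_entry(pdb_id):
    """Download one entry record (needs network access)."""
    with urllib.request.urlopen(entry_url(pdb_id), timeout=30) as handle:
        return json.load(handle)


def deposit_year(entry):
    """Year of deposition, or None if the assumed field is missing."""
    date = entry.get("rcsb_accession_info", {}).get("deposit_date")
    return int(date[:4]) if date else None


def entries_per_year(entries):
    """Count entries by deposition year, ignoring records without a date."""
    years = (deposit_year(entry) for entry in entries)
    return collections.Counter(year for year in years if year is not None)
```

Fetching entries one by one is only sensible for a handful of IDs; for anything larger, the search and bulk services described in the linked documentation are the better route.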
-
@JorisMeys @genenetwork How big is big? For biology/genetics the GeneNetwork repository (https://genenetwork.org) has a lot of whole-transcriptome datasets. Each is not huge but some interesting analyses could also be done by combining them in creative ways. I’ve used this resource a lot for bioinformatics training.
Most of the datasets can be downloaded as flat files, and the API returns JSON.
-
@JorisMeys As other folks are chiming in with their national data portals, here are 3,100 OGD datasets from Switzerland available in JSON format: https://opendata.swiss/en/dataset?q=&res_format=JSON&sort=max%28res_latest_issued%2C+res_latest_modified%29+desc
-
@JorisMeys You could use Crossref metadata. Point them to api.crossref.org and let them explore millions of records about different kinds of scholarly content.
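A minimal sketch of what querying that API could look like, in Python. It assumes the `/works` route with `query` and `rows` parameters and a response shaped as `message.items`, where each item carries `DOI`, `type` and a list-valued `title`. The function names are made up for illustration, and the optional `mailto` parameter is how Crossref asks polite users to identify themselves.

```python
import json
import urllib.parse
import urllib.request

# Assumed route for searching works in the Crossref REST API.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_url(query, rows=20, mailto=None):
    """Build a free-text search URL for Crossref works."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto  # contact address for polite API use
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def summarise_items(response):
    """Flatten the nested response into one small dict per work."""
    rows = []
    for item in response.get("message", {}).get("items", []):
        titles = item.get("title") or [""]  # title is a list and may be absent
        rows.append({
            "doi": item.get("DOI"),
            "type": item.get("type"),
            "title": titles[0],
        })
    return rows


def fetch_works(url):
    """Download and decode one page of results (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as handle:
        return json.load(handle)
```

The nested, sometimes-missing fields (list-valued titles, optional authors and dates) are what make these records a reasonable data cleaning exercise.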
-