Wanted: open data suitable for a data science project!
-
Wanted: open data suitable for a data science project!
Every year, we ask our second-year bachelor students to do a project. We give them a (big) dataset (preferably in JSON or as a bunch of files to combine), pose a research question, and ask them to perform data cleaning and exploration related to that question using #Rstats.
I've used most of the obvious choices, so I'm turning to Fedi to find new, interesting datasets. If you have no ideas, sharing helps too!
@JorisMeys OpenAlex may be too big? https://docs.openalex.org/download-all-data/openalex-snapshot
-
@JorisMeys You could use some animal tracking data, e.g. from movebank. We also have drone images of seals if you are interested.
-
@JorisMeys UK crash data is good: https://github.com/mszell/fyp2021
-
The chess server Lichess (@lichess) releases all chess games played on the server under CC0: 7,507,487,928 games so far, released in monthly batches.
-
@JorisMeys *mutters incoherently about Sankey diagrams*
-
@JorisMeys Have you offered them the Open Food Facts data already?
https://world.openfoodfacts.org/
https://prices.openfoodfacts.org/
Contributions to the data quality are welcome: https://wiki.openfoodfacts.org/Data_quality
-
@JorisMeys Here are some links I prepared earlier https://tomstafford.github.io/psy6422/module-project.html#finding-data
-
@JorisMeys I've been playing with the BTO bird survey data lately - 1994-2024, several million survey counts, interesting structure, one or two data cleaning challenges, possible research Q's over species distribution change, climate change effects, either over single species or groups. https://zenodo.org/records/18598453
-
@GeertAarts @JorisMeys there are also the LiveMouseTracker datasets: very detailed movement and interaction data of up to 4 lab mice, in SQLite format.
See here for the project's website: https://micecraft.org/lmt/
and here for an example dataset: https://micecraft.org/lmt/download/20180110_validation_4_ind_Experiment_6644_e.sqlite
(This project is cool for many other reasons, not least the IKEA-style build instructions and the fact that they repurpose Kinect cameras as high-res 3D video cameras for scientific work.)
-
@JorisMeys In quantum field theory, one has certain regular graphs, each of which evaluates to a real number through some very complicated integral. The task is to predict that number through correlations with properties of the graph (number of cycles, cuts, etc.). For almost 2 million graphs, these numbers are freely available. See last paragraph of https://paulbalduf.com/research/statistics-periods/
-
UK DEFRA has some air quality data: https://uk-air.defra.gov.uk/data/data-availability
https://sensor.community/en/ has a large air quality dataset.
There is also an archive of Scottish datasets:
https://opendata.scot/
Some of that may be useful.
-
UK Office for National Statistics have lots of data sets which might be relevant for you 🙂
-
@JorisMeys use #GBIF species occurrence point data (@gbif). Also spatial, and combinable: https://geodaten.bayern.de
-
@JorisMeys This https://b-rowlingson.gitlab.io/n8workshop/Birds/birds.html is the sample analysis of the BTO bird survey data I did for a workshop - it's very much done for teaching purposes rather than scientific ones, but there's probably a lot of science that could come out of it.
-
@JorisMeys I always find interesting data at https://www.ipums.org/
-
@JorisMeys Oh, Pirate Bay is full of training data. If it's good enough (and legal enough) for Meta it should be good enough for the rest of us.
😄
-
@JorisMeys Not sure if that's obvious enough but ECMWF offers a number of weather-related datasets. https://www.ecmwf.int/en/forecasts/datasets
-
@JorisMeys have you heard of Jeffrey Epstein?
-
@JorisMeys the VAST challenge would be perfect for this! https://github.com/vast-challenge
Background: https://datastori.es/data-stories-24-vast-challenge/