Big Data on The Salopian Scientific Collective

https://danielgreenwood.ch/tags/big-data/
Recent content in Big Data on The Salopian Scientific Collective
Daniel Greenwood
Fri, 12 Apr 2024 00:00:00 +0000

Efficiently handle slightly big data with Apache Arrow in R
https://danielgreenwood.ch/2024/04/12/efficiently-handle-slightly-big-data-with-apache-arrow-in-r/
Fri, 12 Apr 2024 00:00:00 +0000

In systems biology, we often need to work with slightly big data: not big enough to justify setting up a database or using a high-performance cluster, but still too big to work with comfortably in memory. We are talking about files in the 10 to 500 GB range, such as:

- Omics data like RNA-seq or proteomics
- Single-cell phenotype data from high-content microscopy
- Large public data repositories, like the Human Cell Atlas

The Arrow package for R lets us keep our data set on disk, dynamically loading only the rows and columns needed for our analysis.
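As a rough illustration of that idea, here is a minimal sketch, assuming the arrow and dplyr packages are installed. The toy table, its column names (gene, sample_id, n_reads) and the temporary directory are hypothetical stand-ins for a real on-disk data set, not taken from the original post.

```r
library(arrow)
library(dplyr)

# Hypothetical toy table standing in for a much larger omics data set
expr <- data.frame(
  gene      = rep(c("geneA", "geneB", "geneC"), times = 4),
  sample_id = rep(c("s1", "s2", "s3", "s4"), each = 3),
  n_reads   = c(10L, 0L, 250L, 12L, 3L, 180L, 8L, 1L, 300L, 15L, 0L, 220L),
  stringsAsFactors = FALSE
)

# Write it to disk as a Parquet data set, one directory per sample
dataset_dir <- file.path(tempdir(), "expr_dataset")
write_dataset(expr, dataset_dir, format = "parquet", partitioning = "sample_id")

# open_dataset() only reads file metadata; no rows are loaded yet
ds <- open_dataset(dataset_dir)

# The filter and column selection are evaluated by Arrow;
# only the matching rows and columns reach R when collect() is called
high_counts <- ds %>%
  filter(n_reads > 100) %>%
  select(gene, sample_id, n_reads) %>%
  collect()

print(high_counts)
```

With a real data set, the write step would typically be done once, and later sessions would start from open_dataset() on the existing directory.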