questions
• Once data grows beyond ~50 rows, it is challenging to manipulate “by hand” so we want to use tools that are:
  – scalable (grow with our data)
  – reproducible (we can repeat them)
  – accuracy-enabling (reduce human error)
be the most efficient way to store and query data
• Good format for collaborative data gathering
• In some disciplines, data is commonly stored in databases; useful to be able to access
data thinking
• select + filter
• split-apply-combine
  – using a “language” to manipulate data
  – what we learn today will be revisited in the R lesson tomorrow
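
As a rough illustration of the two ideas above, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name (surveys), its columns, and the data values are all hypothetical, invented only for this example. The first query does select + filter (choose columns, keep rows matching a condition); the second does split-apply-combine (split rows by species, apply an average to each group, combine the results into one table).

```python
import sqlite3

# Hypothetical example data: (species, plot, weight). Not from a real dataset.
rows = [
    ("DM", 1, 40.0),
    ("DM", 2, 44.0),
    ("PF", 1, 8.0),
    ("PF", 2, 6.0),
    ("DO", 1, 50.0),
]

# Build a throwaway in-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (species TEXT, plot INTEGER, weight REAL)")
conn.executemany("INSERT INTO surveys VALUES (?, ?, ?)", rows)

# select + filter: pick two columns, keep only rows with weight above 10.
heavy = conn.execute(
    "SELECT species, weight FROM surveys WHERE weight > 10 ORDER BY weight"
).fetchall()

# split-apply-combine: split by species, apply AVG to each group,
# combine into one row per species.
mean_by_species = conn.execute(
    "SELECT species, AVG(weight) FROM surveys GROUP BY species ORDER BY species"
).fetchall()

print(heavy)
print(mean_by_species)
conn.close()
```

The same two patterns exist in most data tools under different names, which is why they are worth learning as a way of thinking rather than as syntax for one tool.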