R Programming Coursera Assignment 1 Solution

I am taking the R Programming course from the Data Science Specialization offered by Johns Hopkins University on Coursera. This blog post is a set of personal notes in which I walk through my reasoning on the exercises.

Today I am trying to complete Part 1 of Assignment 1, “Air Pollution”. We are given a .zip file containing 332 *.csv files with monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the ID number of each monitor is contained in the file name. Here is my walkthrough.

Part 1 : pollutantmean()

Part 1 is about writing the function pollutantmean(directory, pollutant, id=1:332), which returns the mean of a specified pollutant across one or more CSV files (selected by id) in the specified directory.

The results should be:

My attempt:

There are two cases: id names a single monitor, or id names several consecutive monitors.

pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  # Case where id indicates 1 file
  if (length(files[id])==1){
    mean(read.csv(files[id])[,pollutant], na.rm=1)
  }
  # Case where id indicates many files in a row
  else {
    datas <- data.frame()
    for (i in 1:length(files[id])){
      datas <- rbind(datas, read.csv(files[i]))
    }
    mean(datas[,pollutant], na.rm=1)
  }
}

Results are:

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 0.8599547
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833

The first and third calls give the right answer, but the second one does not. The mistake is that the loop always runs over 1:length(files[id]) instead of the requested ids. That is why 1:10 returns the right answer, while 70:72 has length 3 and so actually returns the result for monitors 1:3. With the loop fixed, all the results are correct:

## Fixed loop
for (i in id){
  datas <- rbind(datas, read.csv(files[i]))
}

> pollutantmean("specdata", "sulfate", 1:10)
[1] 4.064128
> pollutantmean("specdata", "nitrate", 70:72)
[1] 1.706047
> pollutantmean("specdata", "nitrate", 23)
[1] 1.280833
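To see the indexing mistake without the data files, here is a minimal sketch. The files vector below is a made-up stand-in for the output of list.files(), and the zero-padded names (001.csv and so on) are an assumption for illustration.

```r
# Hypothetical stand-in for list.files("specdata"): 332 zero-padded names
files <- sprintf("%03d.csv", 1:332)
id <- 70:72

# Original loop index: 1:length(files[id]) is 1:3, so the first three files are read
buggy <- files[1:length(files[id])]

# Fixed loop index: the requested monitors themselves
fixed <- files[id]

print(buggy)  # "001.csv" "002.csv" "003.csv"
print(fixed)  # "070.csv" "071.csv" "072.csv"
```

With id = 1:10 both index vectors happen to be identical, which is why the first test did not reveal the problem.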

Next I want to check that the function works when the given IDs are not consecutive. I do the following:
– Read the list of monitor files into the files vector, then bind files 23 and 26 into the bind23_26 data frame (this appends the data from file 26 right after the data from file 23 in a single data.frame).
– Create a vector containing id=23 and id=26 and pass it to the pollutantmean() function.

> files <- list.files("specdata", full.names=1)
> bind23_26 <- read.csv(files[23])
> bind23_26 <- rbind(bind23_26, read.csv(files[26]))
> mean(bind23_26[,"nitrate"], na.rm=1)
[1] 4.169054
> v <- c(23,26)
> pollutantmean("specdata", "nitrate", v)
[1] 4.169054

Surprisingly, it works without any further change to the loop. I learned that a for loop can iterate over any vector, as in (i in c(1, 4, 5, …) ).
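As a small sketch of that behaviour, independent of the assignment data, the loop below simply records which values it visits. The names ids, visited and count are only for illustration.

```r
# A for loop visits each element of the vector it is given, in order
ids <- c(23, 26)          # the non-consecutive monitor ids used in the test above
visited <- numeric(0)
for (i in ids) {
  visited <- c(visited, i)
}
print(visited)            # 23 26

# A vector of length 1 is handled the same way: the body runs once
count <- 0
for (i in 23) {
  count <- count + 1
}
print(count)              # 1
```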

Next, I guess I should display the results to three decimal places (10^-3), just like the example, but the assignment asks not to round the values…

Finally, I can remove the special case where id is a single element, since a for loop can just as well iterate over a vector of length 1.
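One possible way to reconcile the two, sketched here as a suggestion rather than something the assignment asks for, is to format the value only for display and leave the returned number untouched. The values below are the ones obtained from the fixed function above.

```r
# Means returned by the fixed function in the three test calls above
results <- c(4.064128, 1.706047, 1.280833)

# sprintf() builds character strings with three decimals for display only
shown <- sprintf("%.3f", results)
print(shown)    # "4.064" "1.706" "1.281"

# The numeric values themselves are unchanged
print(results)  # 4.064128 1.706047 1.280833
```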

## pollutantmean.R
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  datas <- data.frame()
  for (i in id){
    datas <- rbind(datas, read.csv(files[i]))
  }
  mean(datas[,pollutant], na.rm=1)
}
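Calling rbind() inside the loop copies the growing data frame on every pass. A possible variant, sketched below, reads all requested files with lapply() and binds them once. The name pollutantmean2 and the tiny demo files are made up for illustration, and the sketch assumes that list.files() returns the monitors in id order (which holds when file names are zero-padded, such as 001.csv).

```r
# Hypothetical variant of pollutantmean(): read each file, then bind once.
# Assumption: list.files() returns files in monitor-id order.
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, full.names = TRUE)
  tables <- lapply(files[id], read.csv)   # one data frame per monitor
  datas <- do.call(rbind, tables)         # a single bind instead of one per pass
  mean(datas[[pollutant]], na.rm = TRUE)
}

# Tiny made-up data set in a temporary directory, so this runs without specdata
demo_dir <- file.path(tempdir(), "demo_specdata")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

print(pollutantmean2(demo_dir, "sulfate", 1:2))  # (1 + 3 + 5 + 7) / 4 = 4
print(pollutantmean2(demo_dir, "sulfate", 2))    # (5 + 7) / 2 = 6
```

For 332 small files the difference is unlikely to matter much, so the loop version above remains a perfectly reasonable answer.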

Part 2 : complete()
Part 3 : corr()
