How to Find Missing Values in R

R Find Missing Values (6 Examples for Data Frame, Column & Vector)

Let's face it:

Missing values are an issue in almost every raw data set!

If we don't handle our missing data in an appropriate way, our estimates are likely to be biased.

However, before we can deal with missingness, we need to identify in which rows and columns the missing values occur.

In the following, I will show you several examples of how to find missing values in R.

Example 1: One of the most common ways in R to find missing values in a vector

expl_vec1 <- c(4, 8, 12, NA, 99, -20, NA)   # Create your own example vector with NA's
is.na(expl_vec1)                            # The is.na() function returns a logical vector. The vector is TRUE in case
                                            # of a missing value and FALSE in case of an observed value
which(is.na(expl_vec1))                     # The which() function returns the positions with missing values in your vector.
                                            # In our case there are NA's at positions 4 & 7
### [1] 4 7
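If you only need to know whether a vector contains any NA at all, or which share of its values is missing, base R offers shortcuts. The following is a minimal sketch based on the vector of Example 1; the object names has_na and share_na are made up for illustration.

```r
# Recreate the example vector of Example 1 so that this snippet runs on its own
expl_vec1 <- c(4, 8, 12, NA, 99, -20, NA)

# anyNA() returns a single TRUE/FALSE instead of a whole logical vector
has_na <- anyNA(expl_vec1)

# The mean of a logical vector is the proportion of TRUE values,
# i.e. the share of missing values in the vector
share_na <- mean(is.na(expl_vec1))

has_na
share_na
```

anyNA() can be handy as a quick check before running the more detailed which(is.na()) search.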

You can find a more detailed explanation for this example in the following video:

Example 2: Find missing values in a column of a data frame

expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),                              # Numeric variable with one missing value
                         x2 = c(4, 1, NA, NA, 4),                             # Numeric variable with two missing values
                         x3 = c(1, 4, 2, 9, 6),                               # Numeric variable without any missing values
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA),  # Factor variable with
                                                                              # two missing values
                         stringsAsFactors = TRUE)                             # Needed since R 4.0.0 so that x4 is a factor
expl_data1                                                                    # This is what our data with missing values looks like


Table 1: Example Data Frame with Missing Values

which(is.na(expl_data1$x1))   # Same procedure as in Example 1, but this time with the column of a data frame;
                              # Missing value in x1 at position 1
which(is.na(expl_data1$x2))   # Variable x2 has missing values at positions 3 and 4
which(is.na(expl_data1$x3))   # The variable x3 in column 3 has no missing values
which(is.na(expl_data1$x4))   # Our factor variable x4 in column 4 has missing values at positions 3 and 5;
                              # The same procedure can be applied to factors

Example 3: Identify missing values in an R data frame

# As in Example 1, you can create an object with logical TRUE and FALSE values,
# indicating missing and observed values (for a data frame, is.na() returns a logical matrix)
is.na(expl_data1)
apply(is.na(expl_data1), 2, which)   # In order to get the positions of each column in your data set,
                                     # you can use the apply() function
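As a possible complement to apply(), the which() function also accepts the argument arr.ind = TRUE, which returns the row and column index of every NA in one matrix. The following is a minimal sketch that rebuilds the example data so that it runs on its own; the object name na_positions is made up for illustration.

```r
# Rebuild the example data of Example 2 so that this snippet is self-contained
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

# arr.ind = TRUE returns one line per missing value,
# with the row index in column "row" and the column index in column "col"
na_positions <- which(is.na(expl_data1), arr.ind = TRUE)
na_positions
```

Compared to the list returned by apply(), this layout may be easier to work with when you want to loop over individual missing cells.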

Example 4: Find missing values in a column of an R matrix

# Create matrix on the basis of the first 3 columns of our example data of Example 2
expl_matrix1 <- as.matrix(expl_data1[ , 1:3])
expl_matrix1
which(is.na(expl_matrix1[ , 1]))   # The $ operator is invalid for columns of matrices.
                                   # Therefore we have to select our matrix columns by square brackets
which(is.na(expl_matrix1[ , 2]))   # Besides the change from the $ operator to square brackets,
                                   # we can apply the same functions as in the other examples
which(is.na(expl_matrix1[ , 3]))   # Again, no missing values in x3

Example 5: Identify NA values in a matrix

# We can check the missing values of the whole matrix with the same procedure as in Example 3
apply(is.na(expl_matrix1), 2, which)

Example 6: Find missing values in R with the complete.cases() function

# An alternative to the is.na() function is the function complete.cases(),
# which searches for observed values instead of missing values
which(complete.cases(expl_vec1))            # Identify observed values (opposite result as in Example 1)
which(complete.cases(expl_vec1) == FALSE)   # Reproduce result of Example 1 by adding == FALSE
complete.cases(expl_data1)                  # If a data frame or matrix is checked by complete.cases(),
                                            # the function returns a logical vector indicating whether a row is complete
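The logical vector returned by complete.cases() can also be used to extract the incomplete rows themselves. The following is a minimal sketch, assuming the example data of Example 2; the object names are made up for illustration, and the ! operator is used as a shorter way of writing == FALSE.

```r
# Rebuild the example data of Example 2 so that this snippet is self-contained
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

# Row numbers with at least one missing value (! negates the logical vector)
incomplete_index <- which(!complete.cases(expl_data1))
incomplete_index

# Subset of the data frame that contains only the incomplete rows
incomplete_rows <- expl_data1[!complete.cases(expl_data1), ]
incomplete_rows
```

In this example only row 2 is fully observed, so the subset contains the rows 1, 3, 4, and 5.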

Video Example – Find Missing Values in a Real Data Set

The following video of my YouTube channel shows in a live example how to find NA, how to count NA, how to omit NA, and how to remove missing values.

Have a look at minute 1:05.

I'm showing here the same approach that I have explained in Example 1.

R – Count Missing Values per Row and Column

Besides the positioning of your missing data, the question might arise how to count missing values per row, by column, or in a single vector. Let's check how to do this based on our example data above:

# With the sum() and the is.na() functions you can find the number of missing values in your data
sum(is.na(expl_vec1))      # 2 missings in our vector
sum(is.na(expl_data1))     # The same method works for the whole data frame; 5 missings overall
sum(is.na(expl_matrix1))   # The procedure works also for matrices; The NA count is 3 in our case
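The sum() calls return one overall count. To get the counts per row and per column, one possible approach is to combine is.na() with rowSums() and colSums(). The following is a minimal sketch based on the example data of Example 2; the object names na_per_column and na_per_row are made up for illustration.

```r
# Rebuild the example data of Example 2 so that this snippet is self-contained
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

# Number of missing values in each column: x1 = 1, x2 = 2, x3 = 0, x4 = 2
na_per_column <- colSums(is.na(expl_data1))
na_per_column

# Number of missing values in each row: 1, 0, 2, 1, 1
na_per_row <- rowSums(is.na(expl_data1))
na_per_row
```

The same two calls work for matrices such as expl_matrix1, because is.na() returns a logical matrix in both cases.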

How to Handle Missing Data in R?

Once we have found and located missing values and their index positions in our data, the question arises how we should treat these not available values. Complete case data is needed for most data analyses in R!

The default method in the R programming language is listwise deletion, which deletes all rows with missing values in one or more columns.

Basic data manipulations can be done with the na.omit command or with the is.na R function.
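The following is a minimal sketch of these basic manipulations, assuming the example vector and data frame from above; the object names data_complete, vec_observed, and mean_observed are made up for illustration.

```r
# Rebuild the example data so that this snippet is self-contained
expl_vec1 <- c(4, 8, 12, NA, 99, -20, NA)
expl_data1 <- data.frame(x1 = c(NA, 7, 8, 9, 3),
                         x2 = c(4, 1, NA, NA, 4),
                         x3 = c(1, 4, 2, 9, 6),
                         x4 = c("Hello", "I am not NA", NA, "I love R", NA))

# Listwise deletion: na.omit() drops every row with at least one NA.
# Only row 2 of the example data is complete.
data_complete <- na.omit(expl_data1)
data_complete

# Keep only the observed values of a vector with is.na()
vec_observed <- expl_vec1[!is.na(expl_vec1)]
vec_observed

# Many summary functions can skip missing values via na.rm = TRUE
mean_observed <- mean(expl_vec1, na.rm = TRUE)
mean_observed
```

Note how much data listwise deletion can cost: four of the five rows are removed here, even though most of their values are observed.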

A more sophisticated approach – which is usually preferable to a complete case analysis – is the imputation of missing values.

Very simple imputation approaches would be mean imputation (mode imputation in case of categorical variables) or the replacement of NA's with 0.
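These simple approaches could be sketched in base R as follows. The numeric vectors are taken from the example data of Example 2. The categorical vector expl_cat is a hypothetical example (it is not part of the data above), because x4 has no repeated values and therefore no meaningful mode. All object names are made up for illustration.

```r
# Columns x1 and x2 of the example data of Example 2
x1 <- c(NA, 7, 8, 9, 3)
x2 <- c(4, 1, NA, NA, 4)

# Mean imputation: replace NA by the mean of the observed values (here: 3)
x2_mean_imp <- x2
x2_mean_imp[is.na(x2_mean_imp)] <- mean(x2, na.rm = TRUE)
x2_mean_imp

# Replacement of NA with 0
x1_zero_imp <- x1
x1_zero_imp[is.na(x1_zero_imp)] <- 0
x1_zero_imp

# Mode imputation for a hypothetical categorical vector:
# table() ignores NA by default, which.max() picks the most frequent category
expl_cat <- c("a", "b", "a", NA, "a", NA)
cat_mode <- names(which.max(table(expl_cat)))
cat_mode_imp <- expl_cat
cat_mode_imp[is.na(cat_mode_imp)] <- cat_mode
cat_mode_imp
```

All three replacements are deterministic, so every missing value of a variable receives the same imputed value.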

However, in order to create a more reasonable complete data set, missing data imputation usually replaces missing values with estimates that are based on statistical models (e.g. via regression imputation or predictive mean matching).
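To illustrate the idea, the following is a minimal sketch of a deterministic regression imputation in base R. The simulated data, the seed, the sample sizes, and all object names (sim_data, miss, fit, y_imp) are assumptions made for this illustration. It is not a full imputation workflow; in practice, dedicated imputation packages are typically used for this task.

```r
set.seed(123)                                   # Hypothetical seed for reproducibility
n <- 200
sim_data <- data.frame(x = rnorm(n, 10, 3))     # Fully observed predictor
sim_data$y <- sim_data$x + rnorm(n)             # Target variable correlated with x

miss <- sample(n, 40)                           # Randomly pick 40 rows
sim_data$y[miss] <- NA                          # Insert missing values into y

# Fit a linear model on the observed cases;
# under R's default na.action setting, lm() drops rows with NA
fit <- lm(y ~ x, data = sim_data)

# Replace each missing y by its prediction from the fitted model
y_imp <- sim_data$y
y_imp[is.na(y_imp)] <- predict(fit, newdata = sim_data[is.na(sim_data$y), ])

sum(is.na(y_imp))                               # No missing values left
```

The imputed values lie exactly on the regression line, which is why stochastic variants or predictive mean matching are often considered instead.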

Now It's Your Turn

So that is how I'm checking for missing values in my data sets.

Now I'd like to hear about your thoughts: What's your favorite approach?

Are you going to use the is.na function of Example 1? Or will you find NA's by searching for complete cases?

Let me know by leaving a comment below. I will answer every question!

Appendix

How to create the graphic of the header of this page

The header graphic shows a simple dotplot created with the R package ggplot2.

The dark blue values indicate observed values; the light blue values indicate missingness.

Since the missing values appear more often in the upper right part of the plot, they can not be considered as Missing Completely At Random anymore.

library("ggplot2")                                                           # Load ggplot2 (required for ggplot())
set.seed(8765)                                                               # Reproducibility
var1 <- rnorm(2000, 10, 3)                                                   # Normal distribution
var2 <- var1 + rnorm(2000)                                                   # Correlated normal distribution
range01 <- function(x){(x - min(x)) / (max(x) - min(x))}                     # Rescale probabilities of missingness to lie between 0 and 1
var2_miss <- rbinom(2000, 1, range01(var1^3)) == 1                           # Insert missing values for var2 in dependence of var1
data_ggplot_missings <- data.frame(var1, var2)                               # Store var1 and var2 in a data frame
colours <- rep(1, 2000)                                                      # Set colours
colours[var2_miss] <- 2
ggplot_missings <- ggplot(data_ggplot_missings, aes(x = var1, y = var2)) +   # Create ggplot
  geom_point(aes(col = colours, size = 1.1)) +
  theme(legend.position = "none")
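The claim that the missingness in the header graphic is not completely at random can also be checked numerically. The following is a minimal sketch that repeats the simulation of the missingness indicator without the plotting code and compares the mean of var1 between cases with and without a missing var2; the object names mean_var1_missing and mean_var1_observed are made up for illustration.

```r
set.seed(8765)                                             # Same seed as in the plot code
var1 <- rnorm(2000, 10, 3)                                 # Normal distribution
range01 <- function(x) (x - min(x)) / (max(x) - min(x))    # Rescale to the interval [0, 1]
var2_miss <- rbinom(2000, 1, range01(var1^3)) == 1         # Missingness indicator depending on var1

# If the missingness were completely at random, both means would be similar.
# Here, larger values of var1 have a higher probability of a missing var2.
mean_var1_missing  <- mean(var1[var2_miss])
mean_var1_observed <- mean(var1[!var2_miss])
c(missing = mean_var1_missing, observed = mean_var1_observed)
```

A clearly higher mean of var1 among the cases with missing var2 reflects the same pattern that the plot shows in its upper right part.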

Source: https://statisticsglobe.com/r-find-missing-values/

Posted by: engelhardtyourat.blogspot.com