by idoda
Last Updated July 13, 2019 01:26 AM

Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My dataframe, which I load from a CSV file using `read_csv`, has a column `comments`, which is empty most of the time.

The column `marked_results.comments` looks like this; all the rest of the column is NaN, so pandas loads empty entries as NaNs. So far so good:

```
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
....
```

Now I try to drop those entries, **only this works:**

`marked_results.comments.isnull()`

**All these don't work:**

- `marked_results.comments.dropna()` only gives the same column; nothing gets dropped. Confusing.
- `marked_results.comments == NaN` only gives a series of all `False`s. Nothing was NaN... confusing.
- Likewise `marked_results.comments == nan`.

I also tried:

```
comments_values = marked_results.comments.unique()
# array(['VP', 'TEST', nan], dtype=object)
# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!
```

You need to test for `NaN` with the `math.isnan()` function (or `numpy.isnan`). NaNs cannot be checked with the equality operator.

```
>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
```

Help Function ->

```
isnan(...)
isnan(x) -> bool
Check if float x is not a number (NaN).
```

You should use `isnull` and `notnull` to test for NaN (these are more robust with pandas dtypes than the numpy functions); see "values considered missing" in the docs.
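To make the "more robust" point concrete, here is a minimal sketch that assumes a column shaped like the one in the question (the `comments` Series below is rebuilt by hand for illustration, not loaded from the asker's CSV). It compares the equality test, `isnull`, and `math.isnan` on an object column that mixes strings and NaN:

```python
import math

import numpy as np
import pandas as pd

# Hand-built stand-in for marked_results.comments (illustrative data).
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

# Equality never matches NaN, because NaN compares unequal to everything,
# including itself.
eq_mask = comments == np.nan

# isnull works element-wise on object columns that mix strings and NaN.
null_mask = comments.isnull()

# math.isnan only accepts real numbers, so it raises on the string entries.
try:
    math.isnan(comments[0])
    isnan_failed = False
except TypeError:
    isnan_failed = True

print(eq_mask.any())       # no row compares equal to NaN
print(null_mask.tolist())  # only the last two rows are flagged as missing
print(isnan_failed)        # math.isnan could not handle the string "VP"
```

This is why `isnull`/`notnull` are the safer choice for a column like `comments`, whose dtype is `object` rather than float.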

Using the Series method `dropna` on a column won't affect the original dataframe, but it does return what you want:

```
In [11]: df
Out[11]:
comments
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
In [12]: df.comments.dropna()
Out[12]:
0 VP
1 VP
2 VP
3 TEST
Name: comments, dtype: object
```

The *DataFrame* method `dropna` has a `subset` argument (to drop rows which have NaNs in specific columns):

```
In [13]: df.dropna(subset=['comments'])
Out[13]:
comments
0 VP
1 VP
2 VP
3 TEST
In [14]: df = df.dropna(subset=['comments'])
```
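As an alternative sketch, the same rows can be selected with a boolean mask built from `notnull`. The `df` below is a hand-built stand-in for the frame shown above, and the names `kept` and `same` are only illustrative:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the dataframe in the answer (illustrative data).
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Keep only the rows whose comment is present; df itself is left unchanged.
kept = df[df["comments"].notnull()]

# The mask-based selection matches dropna(subset=...) on this data.
same = kept.equals(df.dropna(subset=["comments"]))

print(len(df), len(kept), same)
```

Neither form modifies `df` in place, so the result has to be assigned back (as in `In [14]` above) if the filtered frame should replace the original.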
