by RedM
Last Updated November 08, 2018 21:22 PM

I've got 2 geodataframes:

```
import geopandas as gpd
from shapely.geometry import Point
gpd1 = gpd.GeoDataFrame([['John',1,Point(1,1)],['Smith',1,Point(2,2)],['Soap',1,Point(0,2)]],columns=['Name','ID','geometry'])
gpd2 = gpd.GeoDataFrame([['Work',Point(0,1.1)],['Shops',Point(2.5,2)],['Home',Point(1,1.1)]],columns=['Place','geometry'])
```

and I want to find the name of the nearest point in gpd2 for each row in gpd1:

```
desired_output =
Name ID geometry Nearest
0 John 1 POINT (1 1) Home
1 Smith 1 POINT (2 2) Shops
2 Soap 1 POINT (0 2) Work
```
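The desired output is a plain brute-force nearest-neighbour search. Here is a minimal pure-Python sketch of that computation. It assumes the coordinates are copied by hand from the two sample GeoDataFrames into plain tuples. The names `people`, `places` and `nearest_place` are hypothetical.

```python
import math

# Coordinates copied from the sample gpd1 / gpd2 (hypothetical plain-tuple stand-ins)
people = {"John": (1, 1), "Smith": (2, 2), "Soap": (0, 2)}
places = {"Work": (0, 1.1), "Shops": (2.5, 2), "Home": (1, 1.1)}

def nearest_place(pt, candidates):
    """Return the key of the candidate point closest to pt (Euclidean distance)."""
    return min(candidates, key=lambda name: math.dist(pt, candidates[name]))

# For every person, scan all places and keep the closest one
nearest = {name: nearest_place(xy, places) for name, xy in people.items()}
print(nearest)
```

It compares every point against every candidate, so the cost grows as len(gpd1) × len(gpd2).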

I've been trying to get this working using a lambda function:

```
gpd1['Nearest'] = gpd1.apply(lambda row: min_dist(row.geometry,gpd2)['Place'] , axis=1)
```

with

```
def min_dist(point, gpd2):
    geoseries = some_function()
    return geoseries
```

Figured it out:

```
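def min_dist(point, gpd2):
    # distance from `point` to every geometry in gpd2 (stored in a new 'Dist' column)
    gpd2['Dist'] = gpd2.apply(lambda row: point.distance(row.geometry), axis=1)
    # the row of gpd2 with the smallest distance
    geoseries = gpd2.iloc[gpd2['Dist'].argmin()]
    return geoseries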
```

Of course some criticism is welcome. I'm not a fan of recalculating gpd2['Dist'] for every row of gpd1...
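One alternative avoids recomputing distances for every row of gpd1. It is a swapped-in technique, not the poster's method: compute the full distance matrix once with NumPy broadcasting and take the argmin of each row. This is a minimal sketch. It assumes the point coordinates have already been pulled into `(n, 2)` arrays, for example from `gpd1.geometry.x` / `gpd1.geometry.y`. The helper name `nearest_labels` is hypothetical.

```python
import numpy as np

def nearest_labels(a_xy, b_xy, b_labels):
    """For each row of a_xy (n, 2), return the label of, and distance to, the closest row of b_xy (m, 2)."""
    # (n, 1, 2) - (1, m, 2) -> (n, m, 2): all pairwise differences, computed once
    diff = a_xy[:, None, :] - b_xy[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])   # (n, m) Euclidean distances
    idx = dist.argmin(axis=1)                     # position of the nearest B point per A point
    return np.asarray(b_labels)[idx], dist[np.arange(len(a_xy)), idx]

a_xy = np.array([[1, 1], [2, 2], [0, 2]], dtype=float)        # gpd1 points
b_xy = np.array([[0, 1.1], [2.5, 2], [1, 1.1]], dtype=float)  # gpd2 points
labels, dists = nearest_labels(a_xy, b_xy, ["Work", "Shops", "Home"])
print(labels, dists)
```

The matrix holds len(gpd1) × len(gpd2) floats, so this suits moderate sizes. For large inputs, a spatial index is the better tool.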

You can directly use the Shapely function `nearest_points` (the geometries of a GeoSeries are Shapely geometries):

```
from shapely.ops import nearest_points
# unary union of the gpd2 geometries
pts3 = gpd2.geometry.unary_union
def near(point, pts=pts3):
    # find the nearest point and return the corresponding Place value
    nearest = gpd2.geometry == nearest_points(point, pts)[1]
    return gpd2[nearest].Place.iloc[0]
gpd1['Nearest'] = gpd1.apply(lambda row: near(row.geometry), axis=1)
gpd1
Name ID geometry Nearest
0 John 1 POINT (1 1) Home
1 Smith 1 POINT (2 2) Shops
2 Soap 1 POINT (0 2) Work
```

Explanation:

```
for i, row in gpd1.iterrows():
    # nearest_points returns (point on the first geometry, nearest point on the second)
    p1, p2 = nearest_points(row.geometry, pts3)
    print(p1, p2)
POINT (1 1) POINT (1 1.1)
POINT (2 2) POINT (2.5 2)
POINT (0 2) POINT (0 1.1)
```

If you have large dataframes, I've found that `scipy`'s cKDTree spatial index `.query` method returns very fast results for nearest-neighbor searches. Because it uses a spatial index, it's orders of magnitude faster than looping through the dataframe and then finding the minimum of all distances. It is also faster than using shapely's `nearest_points` with RTree (the spatial index method available via geopandas), because cKDTree lets you vectorize your search whereas the other method does not.
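A spatial index is faster because it lets the search skip most candidates. The sketch below illustrates that pruning idea. It is a minimal pure-Python 2-D k-d tree, a simplified stand-in for what cKDTree does in compiled code and not scipy's actual implementation. The functions `build` and `nearest` are hypothetical names.

```python
import math

def build(points, depth=0):
    """Recursively build a 2-D k-d tree from a list of (x, y, payload) tuples."""
    if not points:
        return None
    axis = depth % 2                      # alternate splitting on x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),
        "right": build(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return (distance, point) of the stored point closest to target (x, y)."""
    if node is None:
        return best
    p = node["point"]
    d = math.dist(target, p[:2])
    if best is None or d < best[0]:
        best = (d, p)
    axis = node["axis"]
    diff = target[axis] - p[axis]
    # Search the side of the splitting line that contains the target first
    near_side, far_side = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near_side, target, best)
    # Only visit the far side if the splitting line is closer than the current best
    if abs(diff) < best[0]:
        best = nearest(far_side, target, best)
    return best

tree = build([(0, 1.1, "Work"), (2.5, 2, "Shops"), (1, 1.1, "Home")])
for name, xy in [("John", (1, 1)), ("Smith", (2, 2)), ("Soap", (0, 2))]:
    dist, point = nearest(tree, xy)
    print(name, point[2], round(dist, 3))
```

The pruning test `abs(diff) < best[0]` means whole subtrees are ignored whenever the splitting line is already farther away than the best match. On well-spread data, that avoids most of the distance computations a full scan would need.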

Here is a helper function that returns the distance to, and the value of column `bcol` of, the nearest neighbor in `gpd2` for each point in `gpd1`. It assumes both GeoDataFrames have a `geometry` column of points.

```
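import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def ckdnearest(gdA, gdB, bcol):
    # (x, y) coordinate arrays for both sets of points
    nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)))
    nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)))
    # build the spatial index on gdB, then query the single nearest neighbor of every gdA point
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    # idx holds row positions in gdB, so select with iloc; keep distances as floats
    df = pd.DataFrame.from_dict({'distance': dist,
                                 bcol: gdB[bcol].iloc[idx].values})
    return df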
```

For your sample dataframes and desired result you'd run:

```
ckdnearest(gpd1, gpd2, 'Place')
```

It returns a dataframe with a `distance` column and a column holding the matched `Place` values, which you can insert back into `gpd1`.
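Here is a minimal sketch of inserting the result back. It assumes the helper returns one row per `gpd1` row, in the same order. Plain pandas DataFrames stand in for the GeoDataFrames so the snippet runs on its own. The `result` values are what the sample data yields (distances 0.1, 0.5, 0.9).

```python
import pandas as pd

# Stand-in for gpd1 (geometry column omitted so the snippet runs without geopandas)
gpd1 = pd.DataFrame({"Name": ["John", "Smith", "Soap"], "ID": [1, 1, 1]})
# Stand-in for the helper's output: one row per gpd1 row, in the same order
result = pd.DataFrame({"distance": [0.1, 0.5, 0.9], "Place": ["Home", "Shops", "Work"]})

# .to_numpy() drops result's index, so values are assigned by position rather than by label
gpd1["Nearest"] = result["Place"].to_numpy()
gpd1["distance"] = result["distance"].to_numpy()
print(gpd1)
```

Assigning by position sidesteps index alignment. That matters if `gpd1` has a non-default index, because the helper's output always has a fresh 0..n-1 index.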
