Closed
Description
Consider the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [179293473, '2016-06-01 00:00:03.549745', 'http://www.dr.dk/nyheder/', 39169523],
    [179293473, '2016-06-01 00:04:22.346018', 'https://www.information.dk/indland/2016/05/hvert-tredje-offer-naar-anmelde-voldtaegt-tide', 39125224],
    [179773461, '2016-06-01 22:13:16.588146', 'https://www.google.dk', 31658124],
    [179773461, '2016-06-01 22:14:04.059781', 'https://www.google.dk', 31658124],
    [179773461, '2016-06-01 22:16:37.230587', np.nan, 31658124],
    [179773461, '2016-06-01 22:23:09.847149', 'https://www.google.dk', 32718401],
    [179773461, '2016-06-01 22:23:55.158929', np.nan, 32718401],
    [179773461, '2016-06-01 22:27:00.857224', np.nan, 32718401],
], columns=['SessionID', 'PageTime', 'ReferrerURL', 'PageID'])
which looks like this:
SessionID | PageTime | ReferrerURL | PageID |
---|---|---|---|
179293473 | 2016-06-01 00:00:03.549745 | http://www.dr.dk/nyheder/ | 39169523 |
179293473 | 2016-06-01 00:04:22.346018 | https://www.information.dk/ | 39125224 |
179773461 | 2016-06-01 22:13:16.588146 | https://www.google.dk | 31658124 |
179773461 | 2016-06-01 22:14:04.059781 | https://www.google.dk | 31658124 |
179773461 | 2016-06-01 22:16:37.230587 | NaN | 31658124 |
179773461 | 2016-06-01 22:23:09.847149 | https://www.google.dk | 32718401 |
179773461 | 2016-06-01 22:23:55.158929 | NaN | 32718401 |
179773461 | 2016-06-01 22:27:00.857224 | NaN | 32718401 |
Run:
df.groupby('SessionID').nth(-1)
Out:
SessionID | PageID | PageTime | ReferrerURL |
---|---|---|---|
179293473 | 39125224 | 2016-06-01 00:04:22.346018 | https://www.information.dk/ |
179773461 | 32718401 | 2016-06-01 22:27:00.857224 | NaN |
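For comparison, here is a minimal sketch (not part of the original report) of how `nth(-1)` and `last()` handle the trailing `NaN` in `ReferrerURL`. It uses a three-row subset of the frame above. The variable names `nth_last` and `last_valid` are illustrative only. The index layout returned by `nth` can differ between pandas versions, so the sketch only inspects the `ReferrerURL` values.

```python
import numpy as np
import pandas as pd

# Three rows taken from the frame in the report.
df = pd.DataFrame([
    [179293473, '2016-06-01 00:00:03.549745', 'http://www.dr.dk/nyheder/', 39169523],
    [179773461, '2016-06-01 22:23:09.847149', 'https://www.google.dk', 32718401],
    [179773461, '2016-06-01 22:27:00.857224', np.nan, 32718401],
], columns=['SessionID', 'PageTime', 'ReferrerURL', 'PageID'])

grouped = df.groupby('SessionID')

# nth(-1) picks the last row of each group as it stands, NaN included.
nth_last = grouped.nth(-1)

# last() picks, per column, the last non-null value within each group.
last_valid = grouped.last()

print(nth_last['ReferrerURL'].isna().tolist())
print(last_valid.loc[179773461, 'ReferrerURL'])
```

If the goal is the last non-missing referrer per session, `last()` is the closer fit; `nth(-1)` is the choice when the final row should be kept exactly as recorded.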