   tips = tips.sort_values(['sex', 'total_bill'])
   tips.head()

String Processing
-----------------

Length
~~~~~~

SAS determines the length of a character string with the ``LENGTHN``
and ``LENGTHC`` functions. ``LENGTHN`` excludes trailing blanks and
``LENGTHC`` includes trailing blanks.

.. code-block:: sas

   data _null_;
   set tips;
   time_lengthn = LENGTHN(time);
   time_lengthc = LENGTHC(time);
   put time_lengthn= time_lengthc=;
   run;

Python determines the length of a character string with the built-in ``len``
function; pandas applies it element-wise through ``Series.str.len``. ``len``
counts trailing blanks, so it behaves like ``LENGTHC``. To exclude trailing
blanks, as ``LENGTHN`` does, strip them with ``Series.str.rstrip`` before
measuring.

.. code-block:: python

   tips['time'].str.len()
   tips['time'].str.rstrip().str.len()

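A self-contained sketch of the two measurements, assuming a small hypothetical ``tips`` frame in place of the real dataset (the ``'Dinner  '`` value with two trailing blanks is made up for illustration). It mirrors ``LENGTHC`` and ``LENGTHN`` only in spirit: SAS pads character variables to their declared length, and ``LENGTHC`` counts that padding.

```python
import pandas as pd

# Hypothetical stand-in for tips; "Dinner  " carries two trailing blanks.
tips = pd.DataFrame({"time": ["Dinner  ", "Lunch"]})

# Counts trailing blanks, analogous to LENGTHC.
with_blanks = tips["time"].str.len()

# Strips trailing whitespace first, analogous to LENGTHN.
# With no argument, rstrip removes all trailing whitespace (tabs, newlines too);
# pass " " to strip blanks only.
without_blanks = tips["time"].str.rstrip().str.len()

print(with_blanks.tolist())     # [8, 5]
print(without_blanks.tolist())  # [6, 5]
```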

Find
~~~~

SAS determines the position of a substring within a string with the
``FIND`` function. ``FIND`` takes the string defined by the first argument
and searches for the first position of the substring you supply as the
second argument. SAS positions start at 1, and ``FIND`` returns 0 if the
substring is not found. The related ``FINDW`` function searches for whole
words rather than arbitrary substrings.

.. code-block:: sas

   data _null_;
   set tips;
   position = FIND(sex, 'ale');
   put position=;
   run;

Python determines the position of a substring with the string method
``find``; pandas applies it element-wise through ``Series.str.find``.
``find`` returns the position of the first occurrence of the substring.
Keep in mind that Python indexes are zero-based and that ``find`` returns
-1 if it fails to find the substring. As in the SAS example, the search is
case-sensitive.

.. code-block:: python

   tips['sex'].str.find("ale")

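A self-contained sketch of the position arithmetic, with a hypothetical two-value Series standing in for ``tips['sex']``. Adding 1 to the pandas result reproduces the SAS convention, and it also maps a failed search (-1) to 0, which is the value ``FIND`` returns when nothing matches.

```python
import pandas as pd

# Hypothetical two-value stand-in for tips["sex"].
sex = pd.Series(["Female", "Male"])

zero_based = sex.str.find("ale")   # 0-based positions
no_match = sex.str.find("ALE")     # case-sensitive, so nothing matches
sas_style = zero_based + 1         # shift to 1-based positions, as SAS reports them

print(zero_based.tolist())         # [3, 1]
print(no_match.tolist())           # [-1, -1]
print(sas_style.tolist())          # [4, 2]
print((no_match + 1).tolist())     # [0, 0]
```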

Substring
~~~~~~~~~

SAS extracts a substring from a string based on its position
with the ``SUBSTR`` function. ``SUBSTR(string, position, length)`` returns
``length`` characters starting at ``position``, counting from 1.

.. code-block:: sas

   data _null_;
   set tips;
   first_letter = SUBSTR(sex, 1, 1);
   put first_letter=;
   run;

In Python, you can use ``[]`` slice notation to extract a substring by
position; in pandas, ``Series.str[start:stop]`` (equivalently
``Series.str.slice``) applies the slice element-wise. Keep in mind that
Python indexes are zero-based and that the stop index is exclusive, so
``str[0:1]`` corresponds to ``SUBSTR(sex, 1, 1)``.

.. code-block:: python

   tips['sex'].str[0:1]

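A minimal sketch of the index translation, assuming a hypothetical helper ``sas_substr`` (not part of pandas) that takes SAS-style 1-based arguments and converts them to a 0-based, stop-exclusive ``str.slice`` call. Note that Python slicing silently truncates when the range runs past the end of a string.

```python
import pandas as pd

sex = pd.Series(["Female", "Male"])


def sas_substr(s: pd.Series, position: int, length: int) -> pd.Series:
    """Hypothetical helper approximating SAS SUBSTR(string, position, length).

    SAS counts positions from 1 while Python counts from 0, and the stop
    index of a Python slice is exclusive, so the slice covers
    [position - 1, position - 1 + length).
    """
    start = position - 1
    return s.str.slice(start, start + length)


first_letter = sas_substr(sex, 1, 1)  # same result as sex.str[0:1]
middle = sas_substr(sex, 2, 3)

print(first_letter.tolist())  # ['F', 'M']
print(middle.tolist())        # ['ema', 'ale']
```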
437
+ Scan
438
+ ~~~~
439
+
440
+ The SAS ``SCAN `` function returns the nth word from a string.
441
+ The first argument is the string you want to parse and the
442
+ second argument specifies which word you want to extract.
443
+
444
+ .. code-block :: none
445
+
446
+ data firstlast;
447
+ input String $60.;
448
+ First_Name = scan(string, 1);
449
+ Last_Name = scan(string, -1);
450
+ datalines2;
451
+ John Smith;
452
+ Jane Cook;
453
+ ;;;
454
+ run;
455

In pandas, ``Series.str.split`` splits each string on a delimiter, and
``Series.str.rsplit`` does the same starting from the right. Taking the
first piece of ``split`` mirrors ``scan(string, 1)``, and taking the last
piece of ``rsplit`` mirrors ``scan(string, -1)``. Regular expressions
(for example, ``Series.str.extract``) offer more powerful approaches, but
this shows a simple one.

.. code-block:: python

   firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
   firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]
   # n=1 splits only at the rightmost blank; column 1 holds the last word
   firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", n=1, expand=True)[1]

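A self-contained sketch that also covers names with more than two words, using hypothetical sample data. Indexing the split lists with ``str[0]`` and ``str[-1]`` behaves like ``SCAN`` with counts 1 and -1, and the ``str.extract`` line shows one regular-expression alternative.

```python
import pandas as pd

# Hypothetical names; the three-word entry shows why the last word needs
# indexing from the right rather than a fixed column.
names = pd.Series(["John Smith", "Jane Cook", "Mary Ann Lee"])

# With no argument, str.split splits on runs of whitespace and returns lists.
words = names.str.split()
first = words.str[0]    # analogous to scan(string, 1)
last = words.str[-1]    # analogous to scan(string, -1)

# Regular-expression alternative: capture the final run of non-blank characters.
last_re = names.str.extract(r"(\S+)$", expand=False)

print(first.tolist())    # ['John', 'Jane', 'Mary']
print(last.tolist())     # ['Smith', 'Cook', 'Lee']
print(last_re.tolist())  # ['Smith', 'Cook', 'Lee']
```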

Upcase, Lowcase, and Propcase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SAS ``UPCASE``, ``LOWCASE``, and ``PROPCASE`` functions change
the case of the argument.

.. code-block:: sas

   data firstlast;
   input String $60.;
   string_up = UPCASE(string);
   string_low = LOWCASE(string);
   string_prop = PROPCASE(string);
   datalines;
   John Smith
   Jane Cook
   ;
   run;

The equivalent Python string methods are ``upper``, ``lower``, and ``title``;
pandas exposes them element-wise as ``Series.str.upper``,
``Series.str.lower``, and ``Series.str.title``.

.. code-block:: python

   firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
   firstlast['string_up'] = firstlast['String'].str.upper()
   firstlast['string_low'] = firstlast['String'].str.lower()
   firstlast['string_prop'] = firstlast['String'].str.title()

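A self-contained sketch with hypothetical mixed-case input, so that each conversion visibly changes the text (the sample names above are already in proper case, which hides what ``title`` does).

```python
import pandas as pd

# Hypothetical mixed-case input to make each conversion visible.
firstlast = pd.DataFrame({"String": ["jOHN smith", "JANE cook"]})
firstlast["string_up"] = firstlast["String"].str.upper()
firstlast["string_low"] = firstlast["String"].str.lower()
# str.title uppercases the first letter of each word and lowercases the rest.
firstlast["string_prop"] = firstlast["String"].str.title()

print(firstlast)
```

Python's ``title`` treats any non-letter as a word boundary, so ``"they're".title()`` returns ``"They'Re"``; results for names containing apostrophes may therefore differ from what ``PROPCASE`` produces.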

Merging
-------