Commit be34885

added string processing
1 parent e41fe7f commit be34885

1 file changed: +136 -0 lines changed

doc/source/comparison_with_sas.rst

Lines changed: 136 additions & 0 deletions
@@ -357,6 +357,142 @@ takes a list of columns to sort by.
 tips = tips.sort_values(['sex', 'total_bill'])
 tips.head()


String Processing
-----------------

Length
~~~~~~

SAS determines the length of a character string with the ``LENGTHN``
and ``LENGTHC`` functions. ``LENGTHN`` excludes trailing blanks and
``LENGTHC`` includes trailing blanks.

.. code-block:: none

 data _null_;
 set tips;
 put(LENGTHN(time));
 put(LENGTHC(time));
 run;

Python determines the length of a character string with the ``len`` function,
which pandas exposes for whole columns as ``Series.str.len``. ``len`` includes
trailing blanks. Chain ``rstrip`` before ``len`` to exclude trailing blanks.

.. code-block:: python

 tips['time'].str.len()
 tips['time'].str.rstrip().str.len()
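
The difference between the two calls only shows up when values actually carry
trailing blanks. The following is a minimal sketch on a hypothetical two-row
Series (not the real ``tips`` data), where the first value is padded with two
spaces.

```python
import pandas as pd

# Hypothetical stand-in for tips['time']; the first value has two trailing blanks.
time = pd.Series(["Dinner  ", "Lunch"])

# Like LENGTHC: trailing blanks are counted.
length_with_blanks = time.str.len()

# Like LENGTHN: trailing blanks are stripped before counting.
length_without_blanks = time.str.rstrip().str.len()

print(length_with_blanks.tolist())
print(length_without_blanks.tolist())
```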
387+
388+
389+
Find
390+
~~~~
391+
392+
SAS determines the position of a character in a string with the
393+
``FINDW`` function. ``FINDW`` takes the string defined by
394+
the first argument and searches for the first position of the substring
395+
you supply as the second argument.
396+
397+
.. code-block:: none
398+
399+
data _null_;
400+
set tips;
401+
put(FINDW(sex,'ALE'));
402+
run;
403+
404+
Python determines the position of a character in a string with the
405+
``find`` function. ``find`` searches for the first position of the
406+
substring. If the substring is found, the function returns its
407+
position. Keep in mind that Python indexes are zero-based and
408+
the function will return -1 if it fails to find the substring.
409+
410+
.. code-block:: none
411+
412+
tips['sex'].str.find("ALE")
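
As a minimal sketch of the zero-based and not-found behaviour, the example
below uses a hypothetical two-row Series in place of ``tips['sex']``. Because
``find`` is case-sensitive, an upper-case pattern does not match mixed-case
values such as ``"Male"``. Adding 1 to the result gives a one-based position
in which "not found" becomes 0.

```python
import pandas as pd

# Hypothetical stand-in for tips['sex'].
sex = pd.Series(["Male", "Female"])

# Zero-based position of the first match.
lower_pos = sex.str.find("ale")

# The search is case-sensitive, so this pattern is not found (-1).
upper_pos = sex.str.find("ALE")

# Shift to one-based positions; -1 (not found) becomes 0.
one_based_pos = lower_pos + 1

print(lower_pos.tolist(), upper_pos.tolist(), one_based_pos.tolist())
```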


Substring
~~~~~~~~~

SAS extracts a substring from a string based on its position
with the ``SUBSTR`` function.

.. code-block:: none

 data _null_;
 set tips;
 put(substr(sex,1,1));
 run;

In Python, you can use ``[]`` slice notation to extract a substring
from a string by position; on a column, apply it through ``.str``.
Keep in mind that Python indexes are zero-based and that the end
of a slice is exclusive.

.. code-block:: python

 tips['sex'].str[0:1]
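
To make the index arithmetic explicit, here is a small sketch. The helper
``sas_substr`` is a hypothetical name introduced only for illustration (it is
not part of pandas); it maps a one-based start and a length onto a zero-based,
end-exclusive Python slice.

```python
import pandas as pd

def sas_substr(series, start, length):
    """Hypothetical helper: SUBSTR-style (one-based start, length) slicing."""
    begin = start - 1              # one-based -> zero-based
    return series.str[begin:begin + length]

# Hypothetical stand-in for tips['sex'].
sex = pd.Series(["Male", "Female"])

first_char = sex.str[0:1]          # same as sas_substr(sex, 1, 1)
middle = sas_substr(sex, 2, 3)     # characters 2 through 4, one-based
last_two = sex.str[-2:]            # negative indexes count from the end

print(first_char.tolist(), middle.tolist(), last_two.tolist())
```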


Scan
~~~~

The SAS ``SCAN`` function returns the nth word from a string.
The first argument is the string you want to parse and the
second argument specifies which word you want to extract.
A negative second argument counts words from the end of the string.

.. code-block:: none

 data firstlast;
 input String $60.;
 First_Name = scan(string, 1);
 Last_Name = scan(string, -1);
 datalines;
 John Smith
 Jane Cook
 ;
 run;

Python extracts words from a string by splitting it on a delimiter
with ``split`` or ``rsplit`` and selecting a piece by position.
There are more powerful approaches, such as regular expressions,
but this shows a simple one.

.. code-block:: python

 firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
 firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]
 firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", n=1, expand=True)[1]
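
When strings contain a varying number of words, one alternative is to split
into lists and index the lists through ``.str``. This is a sketch of that
approach on hypothetical sample names, not the method used above; negative
indexes pick words from the end, much like a negative count in ``SCAN``.

```python
import pandas as pd

# Hypothetical sample data with a different number of words per row.
names = pd.Series(["John Smith", "Mary Ann Jones"])

# Split on whitespace; each element becomes a list of words.
words = names.str.split()

first_word = words.str[0]    # like scan(string, 1)
second_word = words.str[1]   # like scan(string, 2)
last_word = words.str[-1]    # like scan(string, -1)

print(first_word.tolist(), second_word.tolist(), last_word.tolist())
```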


Upcase, Lowcase, and Propcase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SAS ``UPCASE``, ``LOWCASE``, and ``PROPCASE`` functions change
the case of the argument.

.. code-block:: none

 data firstlast;
 input String $60.;
 string_up = UPCASE(string);
 string_low = LOWCASE(string);
 string_prop = PROPCASE(string);
 datalines;
 John Smith
 Jane Cook
 ;
 run;

The equivalent Python string methods are ``upper``, ``lower``, and ``title``.

.. code-block:: python

 firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})
 firstlast['string_up'] = firstlast['String'].str.upper()
 firstlast['string_low'] = firstlast['String'].str.lower()
 firstlast['string_prop'] = firstlast['String'].str.title()
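
The effect of each method is easier to see on mixed-case input. The sketch
below uses hypothetical sample values and prints the result of each
conversion.

```python
import pandas as pd

# Hypothetical mixed-case input.
names = pd.Series(["john SMITH", "jane cook"])

upper_case = names.str.upper()   # every letter upper-cased
lower_case = names.str.lower()   # every letter lower-cased
title_case = names.str.title()   # first letter of each word upper-cased

print(upper_case.tolist())
print(lower_case.tolist())
print(title_case.tolist())
```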


Merging
-------
