Physical File GREP (PFGREP): Fast IBM i Source Code Search
Our 2023 article on searching source physical file members using the QShell grep command showed grep’s potential. In practice, while we found QShell grep to be flexible, we also experienced slow performance and occasional errors.
Now, our own Calvin Buckley has built an improved grep command called pfgrep to search traditional IBM i source physical file members. Quick and reliable, pfgrep is also free and open source.
Installation
Detailed installation instructions are on the pfgrep GitHub page. The three basic installation methods are:
- Download the .rpm file from the GitHub site, then install via yum
- Seiden Group customers with access to our repositories can install directly from our repos using yum
- Git clone from GitHub to build from source
Using pfgrep
Example: To find any mention of php (or PHP, or pHP) in physical files in the ALAN library, I could launch a PASE shell (SSH Bash, QShell, CALL QP2TERM) and run this case-insensitive, recursive search:
pfgrep -i -r php /QSYS.LIB/ALAN.LIB/
On my IBM i system, the results include PHP references in CL, RPG, command source, and more, that I wouldn’t have found without such a powerful search tool.
Partial output:
/QSYS.LIB/ALAN.LIB/QCLSRC.FILE/CALLPHPCMD.MBR: CHGVAR VAR(&CMD) VALUE('/QOpenSys/pkgs/bin/php
/QSYS.LIB/ALAN.LIB/QCLSRC.FILE/PASE2MON.MBR: CALL PGM(QP2SHELL) PARM('/QOpenSys/pkgs/bin/php'
/QSYS.LIB/ALAN.LIB/QCMDSRC.FILE/PPHPPARSE.MBR: CMD PROMPT('PHP Log Parser')
/QSYS.LIB/ALAN.LIB/QRPGLESRC.FILE/DB2UTIL.MBR: // PHP API -- start 32bit
/QSYS.LIB/ALAN.LIB/QSOURCE.FILE/SCNXTSRN.MBR: // Program - SCNXTSRN Get Next System Ref. No. USED BY PHP!!!! *
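Because each result line has the shape "IFS path, colon, matched text", the output is easy to post-process in a script. The following Python sketch is only an illustration, not part of pfgrep: it assumes every result line looks like the partial output above (/QSYS.LIB/library.LIB/file.FILE/member.MBR: text), and the function names parse_result_line and members_with_matches are hypothetical.

```python
import re

# Assumed line shape, taken from the partial output in this article:
#   /QSYS.LIB/<library>.LIB/<file>.FILE/<member>.MBR: <matched text>
RESULT_LINE = re.compile(
    r"^/QSYS\.LIB/(?P<library>[^/]+)\.LIB"
    r"/(?P<file>[^/]+)\.FILE"
    r"/(?P<member>[^/]+)\.MBR:(?P<text>.*)$"
)


def parse_result_line(line):
    """Split one result line into library, file, member, and text.

    Returns None for lines that do not follow the assumed shape.
    """
    match = RESULT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["text"] = fields["text"].strip()
    return fields


def members_with_matches(lines):
    """Return the distinct (library, file, member) tuples, in first-seen order."""
    seen = []
    for line in lines:
        fields = parse_result_line(line)
        if fields is None:
            continue
        key = (fields["library"], fields["file"], fields["member"])
        if key not in seen:
            seen.append(key)
    return seen
```

To use it, capture the output of a search such as pfgrep -i -r php /QSYS.LIB/ALAN.LIB/ and pass the lines to members_with_matches to get a list of members worth opening.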
Performance
The primary motivation for developing pfgrep was to dramatically speed up code searches.
For example, QShell grep needed 26.963 seconds to search all the ILE C and C++ header files for the string ‘Qp0l’:
$ time qsh -c "/usr/bin/grep -R -q Qp0l /QIBM/include"
real 0m26.963s
user 0m0.038s
sys 0m0.004s
pfgrep on the same system took only 3.098 seconds, roughly 8.7 times faster:
$ time pfgrep -r -q Qp0l /QIBM/include
real 0m3.098s
user 0m1.733s
sys 0m0.003s
For more examples with PCRE searches, additional command options, and helper utilities pfzip, pfcat, and pfstat, see the pfgrep GitHub site.
Integration with VS Code for i
Work is being done to integrate pfgrep into Code for i so VS Code users can benefit from its speed and power.
Keep up with the latest in VS Code for i and open source
Come to our free Code for i Fridays meetings and consider a Developer Support contract (VS Code support, pfgrep, PHP, Node, Python, RPG, Git, more) to receive one-on-one mentoring from our team as well as access to Seiden Developer Council meetings.
I replied earlier, but it seems WordPress ate my comment. 🤬
I don’t have visibility into how IBM did it, but I suspect a lot of it comes down to how each tool reads files. pfgrep tries to do the I/O for reading the file all at once; it might do encoding conversion or the actual regex match in line-based chunks, but file I/O is the heaviest part because it involves going to disk and making a lot of round trips through syscalls. It’s quite possible qsh grep is just reading a line at a time when it does its work.
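To make the "round trips through syscalls" point concrete, here is a small Python sketch. It is not pfgrep's code and not how qsh grep is known to work: it only contrasts one large read with many small reads on an ordinary file and counts the read calls each approach needs. The 64-byte chunk size, the function names, and the Qp0lExample identifiers in the demo file are all assumptions made for the illustration.

```python
import os
import re
import tempfile


def search_one_big_read(path, pattern):
    """Read the file with as few read() calls as possible, then match in memory."""
    regex = re.compile(pattern)
    fd = os.open(path, os.O_RDONLY)
    reads = 0
    chunks = []
    try:
        size = os.fstat(fd).st_size
        while True:
            chunk = os.read(fd, max(size, 1))  # ask for the whole file at once
            reads += 1
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    lines = b"".join(chunks).split(b"\n")
    return [line for line in lines if regex.search(line)], reads


def search_small_reads(path, pattern, chunk_size=64):
    """Simulate line-at-a-time style I/O: many small read() calls."""
    regex = re.compile(pattern)
    fd = os.open(path, os.O_RDONLY)
    reads = 0
    hits = []
    pending = b""
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            reads += 1
            if not chunk:
                break
            pending += chunk
            # Everything before the last newline is a complete line.
            *complete, pending = pending.split(b"\n")
            hits.extend(line for line in complete if regex.search(line))
    finally:
        os.close(fd)
    if pending and regex.search(pending):
        hits.append(pending)
    return hits, reads


def demo():
    """Build a throwaway file and run both searches on it."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        for i in range(2000):
            if i % 100 == 0:
                tmp.write(b"int Qp0lExample%d(void);\n" % i)  # hypothetical name
            else:
                tmp.write(b"/* filler line %d */\n" % i)
        path = tmp.name
    try:
        big_hits, big_reads = search_one_big_read(path, rb"Qp0l")
        small_hits, small_reads = search_small_reads(path, rb"Qp0l")
    finally:
        os.unlink(path)
    return big_hits, big_reads, small_hits, small_reads
```

Both functions find the same lines; the difference is only in how many times each one goes back to the operating system for more data, which is the cost the comment above is pointing at.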
I have the same curiosity: where does this time difference come from?
Why is there such a difference? Under the covers, what is the difference in how these two commands work?