Including Python Code by Line Numbers
1. Intro
For quite a few problems I have a Python program that I want to keep in its own file, and a \(\LaTeX\) or org mode file that explains or includes part of that program. It is easy to include part of the code by using line numbers, like so:
\inputminted[firstline=12, lastline=23]{python}{python_file.py}
However, if I add code above line 12 or add some code between lines 12 and 23, or move the block of code altogether, I have to update the line numbers. As I don't like this type of work, I decided to write some code to solve this problem, for \(\LaTeX\) and orgmode files.
2. Plan given to ChatGPT
As a first step, I drafted a list of requirements that my python program had to satisfy. Then I gave this list to ChatGPT, and got a program that did not work completely, but 80% or so was ok. I was amazed! Here are my initial requirements.
- In a \(\LaTeX\) file, look up the lines that contain the string inputminted.
- Look up the tag that appears in such a line as a comment at the end of the line. For instance, tictoc is the comment at the end of this line: \inputminted[firstline=30, lastline=35]{python}{python_file.py} % tictoc
- Also look up the name of the python file after the \inputminted command, here python_file.py.
- Open the python file, and look up the line numbers of the code between the lines tagged with the comment # block tictoc.
- Update the firstline= and the lastline= in the \(\LaTeX\) file accordingly.
BTW, as it's easy to copy and move complete lines, I use the same string to demarcate the starting and terminating lines; in other words, I don't use separate comments such as # begin tictoc and # end tictoc.
It took a few additional rounds with ChatGPT, and some additional work of my own, but the final result works nicely for my goals. Once the version for \(\LaTeX\) worked, I updated it so that it also works with org mode files.
So, for a \(\LaTeX\) file, tag like this:
\inputminted[firstline=12, lastline=23]{python}{python_file.py} % tictoc
and for an orgmode file, like this:
#+INCLUDE: "python_file.py" src python:lines "84-110" ## tictoc
Note the intentional double ## to comment the tag.
3. The Code
3.1. The modules
import argparse
import glob
import re
3.2. Finding the tagged line numbers in the python file
This function looks up the line numbers. It skips the marker lines themselves and drops empty lines between the end of the python code and the terminating comment tag.
def find_block_lines(python_lines, marker):
    """Return the line numbers of a block of python code marked like this:

        # block marker
        import numpy as np
        # block marker

    Strip the lines that contain # block marker.
    Mind that inputminted starts counting at 1 while python starts at 0.
    Therefore we need to add +1 when returning the line numbers.
    """
    indices = [
        idx
        for idx, line in enumerate(python_lines)
        if f"# block {marker}" in line
    ]
    # Expect exactly one opening and one closing marker line.
    if len(indices) != 2:
        return -1, -1
    # remove the lines with # block marker string
    start_idx = indices[0] + 1
    end_idx = indices[1] - 1
    # remove trailing empty lines
    while end_idx > indices[0] and python_lines[end_idx].strip() == "":
        end_idx -= 1
    # Add one to convert python index to inputminted index
    return start_idx + 1, end_idx + 1
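To see what the function returns, here is a small check on an in-memory file. The sample source and its contents are made up for illustration, and the function body is repeated in condensed form only so that the snippet runs on its own.

```python
def find_block_lines(python_lines, marker):
    """Condensed copy of the function above, repeated to run standalone."""
    indices = [
        idx
        for idx, line in enumerate(python_lines)
        if f"# block {marker}" in line
    ]
    if len(indices) != 2:
        return -1, -1
    start_idx = indices[0] + 1
    end_idx = indices[1] - 1
    while end_idx > indices[0] and python_lines[end_idx].strip() == "":
        end_idx -= 1
    return start_idx + 1, end_idx + 1


# Hypothetical file contents; the 1-based file line number is on the right.
sample = [
    "import time\n",        # 1
    "\n",                   # 2
    "# block tictoc\n",     # 3
    "tic = time.time()\n",  # 4
    "toc = time.time()\n",  # 5
    "\n",                   # 6
    "# block tictoc\n",     # 7
]

first, last = find_block_lines(sample, "tictoc")
print(first, last)  # 4 5

# A tag that does not occur exactly twice yields the sentinel pair.
missing = find_block_lines(sample, "other")
print(missing)  # (-1, -1)
```

The empty line 6 is dropped, so the block to include runs from line 4 up to and including line 5.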
3.3. Updating a \(\LaTeX\) file
This function looks up the tags mentioned in the \(\LaTeX\) file. In the for loop, it looks up the line numbers of the tagged code in the python file, then updates the \(\LaTeX\) file.
def update_latex_file(latex_file):
    with open(latex_file, "r") as fp:
        latex_content = fp.read()
    # Capture the python file name and the tag of each tagged \inputminted line.
    pattern = (
        r"\\inputminted\[firstline=\d+, lastline=\d+\]"
        r"{python}{(.*?\.py)}\s*\% ([\w]+)"
    )
    matches = re.findall(
        pattern,
        latex_content,
    )
    for python_file, marker in matches:
        with open(python_file, "r") as py_file:
            lines = py_file.readlines()
        begin_index, end_index = find_block_lines(lines, marker)
        print(begin_index, end_index)
        if begin_index != -1 and end_index != -1:
            # Escape the file name so that e.g. the dot is matched literally.
            pattern = (
                fr'\\inputminted\[firstline=\d+, lastline=\d+\]{{python}}'
                fr'{{{re.escape(python_file)}}}\s*\%\s*{marker}'
            )
            # Plain % here: re.sub would copy \% into the output unchanged.
            replacement = (
                fr'\\inputminted[firstline={begin_index}, '
                fr'lastline={end_index}]{{python}}{{{python_file}}} % {marker}'
            )
            latex_content = re.sub(pattern, replacement, latex_content)
    with open(latex_file, "w") as fp:
        fp.write(latex_content)
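The substitution step can be tried on a string without touching any files. The sketch below is an illustration and not the author's code: the helper name set_line_range is hypothetical, it applies re.escape to the file name and the tag, and it passes a function as the replacement so that backslashes and the % sign are written out literally.

```python
import re


def set_line_range(latex_content, python_file, marker, first, last):
    r"""Rewrite firstline/lastline on the \inputminted line tagged `marker`.

    Hypothetical helper, for illustration only.
    """
    pattern = (
        r"\\inputminted\[firstline=\d+, lastline=\d+\]\{python\}"
        + r"\{" + re.escape(python_file) + r"\}\s*%\s*"
        + re.escape(marker) + r"\b"
    )
    replacement = (
        rf"\inputminted[firstline={first}, lastline={last}]"
        rf"{{python}}{{{python_file}}} % {marker}"
    )
    # A function replacement is inserted verbatim; no template escapes apply.
    return re.sub(pattern, lambda match: replacement, latex_content)


# Made-up LaTeX content with two tagged include lines.
sample = (
    "Some text.\n"
    "\\inputminted[firstline=12, lastline=23]{python}{python_file.py} % tictoc\n"
    "\\inputminted[firstline=30, lastline=35]{python}{python_file.py} % other\n"
)

updated = set_line_range(sample, "python_file.py", "tictoc", 4, 5)
print(updated)
```

Only the line tagged tictoc receives the new numbers; the line tagged other is left as it was.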
3.4. Updating an orgmode file
Updating the org file works similarly. However, in org mode I write the tag after two hashes, like ## tag. I noticed that org mode changes the % in the code for the \(\LaTeX\) files above.
def update_org_file(org_filename):
    with open(org_filename, "r") as fp:
        org_content = fp.read()
    # Capture the python file name and the tag of each tagged #+INCLUDE line.
    matches = re.findall(
        r'#\+INCLUDE: \"(.*?)\" src python:lines \"\d+-\d+\" ## ([\w]+)',
        org_content,
    )
    for python_file_name, marker in matches:
        with open(python_file_name, "r") as py_file:
            python_lines = py_file.readlines()
        begin_index, end_index = find_block_lines(python_lines, marker)
        print(begin_index, end_index)
        if begin_index != -1 and end_index != -1:
            # Escape the file name so that e.g. the dot is matched literally.
            pattern = (
                fr'#\+INCLUDE: "{re.escape(python_file_name)}" src '
                fr'python:lines "\d+-\d+" ## {marker}'
            )
            # Org excludes the upper bound of the lines range, hence the +1.
            replacement = (
                fr'#+INCLUDE: "{python_file_name}" src '
                fr'python:lines "{begin_index}-{end_index+1}" ## {marker}'
            )
            org_content = re.sub(pattern, replacement, org_content)
    with open(org_filename, "w") as fp:
        fp.write(org_content)
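The end_index+1 in the replacement above is needed because Org treats the upper bound of a lines range as exclusive, whereas lastline in \inputminted is inclusive. A minimal sketch of that conversion, with a hypothetical helper name org_lines_range:

```python
def org_lines_range(first, last):
    """Format an inclusive, 1-based line range for Org's lines option.

    Hypothetical helper: Org excludes the upper bound, so it is shifted
    by one to keep line `last` in the included block.
    """
    if first < 1 or last < first:
        raise ValueError(f"invalid line range: {first}-{last}")
    return f'"{first}-{last + 1}"'


print(org_lines_range(4, 5))  # "4-6"
```

So a block that spans lines 4 and 5 of the python file is written as "4-6" in the org file and as firstline=4, lastline=5 in the \(\LaTeX\) file.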
3.5. Last steps
I want to be able to call the function on multiple files at once.
def process_files(pattern):
    for file_path in glob.glob(pattern):
        if file_path.endswith('.org'):
            update_org_file(file_path)
        elif file_path.endswith('.tex'):
            update_latex_file(file_path)
        else:
            print(f"{file_path} is not an .org nor a .tex file")
The main block reads a file name or glob pattern as its argument and has the matching files updated.
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Update latex or org files based on Python markers."
    )
    parser.add_argument(
        "file_pattern",
        type=str,
        help="Provide a .tex or .org file, or *.tex/*.org",
    )
    args = parser.parse_args()
    process_files(args.file_pattern)
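Because file_pattern is a single argument, a pattern such as *.tex has to be quoted on the command line so that the shell does not expand it into several arguments. One possible variation, which is an assumption and not part of the script above, accepts any number of arguments and expands each of them with glob. The names build_parser and expand_patterns are hypothetical.

```python
import argparse
import glob


def build_parser():
    """Build a parser that accepts one or more file names or patterns."""
    parser = argparse.ArgumentParser(
        description="Update latex or org files based on Python markers."
    )
    parser.add_argument(
        "file_patterns",
        nargs="+",
        help="One or more .tex/.org files or glob patterns",
    )
    return parser


def expand_patterns(patterns):
    """Expand each pattern with glob; keep order and drop duplicates."""
    seen = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if path not in seen:
                seen.append(path)
    return seen


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(["notes.tex", "*.org"])
print(args.file_patterns)  # ['notes.tex', '*.org']
```

Each path returned by expand_patterns could then be handed to update_org_file or update_latex_file, depending on its extension, in the same way process_files does.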