<h1><a href="https://agateau.com/2014/mup-a-markup-previewer">MUP, a Markup Previewer</a></h1>
<p>Aurélien Gâteau, 2014-12-31</p>
<p>Following up on <a href="https://agateau.com/2014/lightweight-project-management/">my decision to promote more side-projects</a>, here is a new one: MUP.</p>
<p>MUP is a markup previewer. It supports multiple markup formats. You can use it to read markup text, but it is also useful when writing markup text to check how your work looks, thanks to its refresh-as-you-save feature.</p>
<p><img alt="MUP in action" src="https://agateau.com/hotlink/mup.png"/></p>
<h2>Features</h2>
<ul>
<li>Supports multiple markup formats, easy to extend</li>
<li>Automatically refreshes itself when the document is modified, tries to retain the position in the document after refreshing</li>
<li>Skips metadata headers, such as those used by static blog generators like Jekyll</li>
<li>Supports gzipped documents, useful to read documentation shipped with Debian packages</li>
</ul>
<h2>Supported Formats</h2>
<p>MUP supports Markdown and reStructuredText using Python modules.</p>
<p>It also supports other formats using external converters. External converters are command line tools which are invoked by MUP to convert input files. To be used as an external converter, a tool must accept markup on stdin and produce HTML on stdout.</p>
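<p>To illustrate that contract, here is a minimal sketch of a tool which could play the role of an external converter. The <code>to_html()</code> and <code>convert()</code> helpers and the blank-line-separated "markup" they handle are hypothetical, made up for this example: this is not one of the converters MUP ships with.</p>

```python
import html
import io
import sys


def to_html(text):
    """Turn blank-line-separated paragraphs into <p> elements (hypothetical markup)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def convert(infile=sys.stdin, outfile=sys.stdout):
    # The converter contract: markup comes in on stdin, HTML goes out on stdout
    outfile.write(to_html(infile.read()) + "\n")


# Demonstration with in-memory streams. Used as a command line tool, the
# script would simply call convert() to work on the real stdin and stdout.
out = io.StringIO()
convert(io.StringIO("Hello <world>\n\nSecond paragraph\n"), out)
print(out.getvalue(), end="")
```

<p>Keeping the conversion in a pure function and the stream handling in a thin wrapper makes such a tool easy to test without spawning a process.</p>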
<p>Right now, MUP supports the following converters:</p>
<ul>
<li>Markdown variants (you can never have too many Markdown parsers!):<ul>
<li>Pandoc Markdown</li>
<li>GitHub Flavored Markdown via Kramdown</li>
<li>CommonMark</li>
<li>Gruber original Markdown</li>
</ul>
</li>
<li>Man pages:<ul>
<li>Ronn</li>
<li>Groff</li>
</ul>
</li>
<li>Asciidoc</li>
</ul>
<p>Adding a new converter is only a matter of creating a YAML file to describe its command line and the files it works with.</p>
<h2>Sounds Familiar?</h2>
<p>You may have heard about this project as "mdview". It used to be named like this but I renamed it because a) it supports more than Markdown and b) there are at least a dozen projects named "mdview" on GitHub :)</p>
<h2>Want It?</h2>
<p>MUP is on GitHub: <a href="https://github.com/agateau/mup">https://github.com/agateau/mup</a>. <code>git clone</code> it and follow the instructions from the Install section of the README.</p>
<h1><a href="https://agateau.com/2015/some-news-from-mup">Some News From MUP</a></h1>
<p>2015-05-06</p>
<p>Time for a quick update on <a href="http://github.com/agateau/mup">MUP, the markup previewer</a>. Since I last wrote about it, it gained a few features: it was already capable of displaying man pages, but I added a simple wrapper to be able to open man pages just like the regular <code>man</code> command. You can now run <code>mupman grep</code> to learn all about <code>grep</code>.</p>
<p>Interestingly, I initially added support for man pages as a way to add yet another markup to MUP, and these days man pages are what I read most often with MUP, to the point where I created a <code>mm</code> shortcut to start it faster :)</p>
<h2>Search bar</h2>
<p>This prompted another feature. Pressing "Ctrl+F" or "/" brings up a simple and unobtrusive search bar at the bottom of the window:</p>
<p><img alt="The Search Bar" src="https://agateau.com/2015/some-news-from-mup/searchbar.png"/></p>
<h2>Fork</h2>
<p>Another change I made is to have MUP fork by default, no longer blocking the caller. I find this handy when I open a README or a man page as I can use a command while reading its documentation. It is also useful when editing a text in Vim: just type <code>:!mup %</code> to start MUP on the current file. One less character than the previous <code>:!mup % &amp;</code>, massive productivity improvement!</p>
<h2>More converters!</h2>
<p>Finally, I added two new Markdown converters, because you can never have enough Markdown converters. These converters are a bit unusual: they use the GitHub API. This means your text is sent over to GitHub and comes back as HTML. They are obviously slower than the other converters, but they are useful if you want to be sure your README.md will look as you expect on your project landing page, without having to do multiple commits and pushes to get it right.</p>
<p>You might wonder why I say <em>two</em> converters. It's because GitHub actually supports two flavors of Markdown: plain Markdown is used for READMEs, while GFM - GitHub Flavored Markdown - is used in issues and in other places. The difference between the two is that GFM takes line breaks into account.</p>
<p>I might actually drop the GFM converter at some point, it feels less useful than the plain Markdown one. We'll see.</p>
<h2>User interface refresh</h2>
<p>The toolbar got reworked as well: it now comes with back and forward buttons, and a menu button in the right corner to hide some less important actions such as Reload or Open with Editor.</p>
<p><img alt="The New Toolbar" src="https://agateau.com/2015/some-news-from-mup/toolbar.png"/></p>
<p>Implementing the history for the back and forward buttons was a bit tricky to get right. I tried to use QWebHistory but could not find a way to use it because MUP generates the HTML code to display (as opposed to pointing the QWebView to existing files), so I had to roll my own implementation. If you know how to use QWebHistory in this context, I'd be happy to hear from you.</p>
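<p>For readers curious about what such a hand-rolled history involves, here is a minimal sketch of a back/forward stack. The <code>History</code> class and its method names are hypothetical and written for this example only: this is not MUP's actual code, and it leaves out anything related to Qt.</p>

```python
class History:
    """Linear navigation history, similar to a browser's back/forward stack."""

    def __init__(self):
        self._items = []
        self._index = -1  # position of the current item, -1 while empty

    def push(self, item):
        # Visiting a new document discards any "forward" entries
        del self._items[self._index + 1:]
        self._items.append(item)
        self._index += 1

    def can_go_back(self):
        return self._index > 0

    def can_go_forward(self):
        return self._index < len(self._items) - 1

    def back(self):
        # Stays on the first item when there is nothing to go back to
        if self.can_go_back():
            self._index -= 1
        return self._items[self._index]

    def forward(self):
        # Stays on the last item when there is nothing to go forward to
        if self.can_go_forward():
            self._index += 1
        return self._items[self._index]


history = History()
for path in ("README.md", "INSTALL.md", "grep.1"):
    history.push(path)
print(history.back())
```

<p>The only subtle part is <code>push()</code>: opening a new document after going back must drop the entries which were ahead of the current position.</p>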
<h2>What's next?</h2>
<p>I am considering modernizing the application a bit. First migrating it to Python 3, then to Qt 5, we'll see how it goes.</p>
<h1><a href="https://agateau.com/2016/refreshing-mup">Refreshing MUP</a></h1>
<p>2016-06-05</p>
<p><a href="https://github.com/agateau/mup">MUP</a>, my markup previewer, was starting to show its age, being based on PyQt 4 and Python 2. I spent a bit of time last week to port it to PyQt 5 and Python 3.</p>
<p><a class="reference external image-reference" href="https://agateau.com/2016/refreshing-mup/screenshot.png"><img alt="MUP Screenshot" src="https://agateau.com/2016/refreshing-mup/thumb_screenshot.png"/></a></p>
<p>It looks more modern now, <a href="https://github.com/agateau/mup">give it a try</a>!</p>
<h1><a href="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr">Generating reports with Python, Markdown and entr</a></h1>
<p>2020-04-13</p>
<p>Let's say you need to parse and analyse some raw data, for example a log file, to generate a report.</p>
<p>An easy way to get started with this is to write some Python, Perl, Ruby or shell code to work on your file and print meaningful information about it.</p>
<p>To illustrate this article I am going to write a Python script to "analyze" the <code>/var/log/syslog</code> log file, whose entries look like this:</p>
<div class="codehilite"><pre><span/><code>Apr 12 11:53:42 dozer org.kde.KScreen[6995]: kscreen.xrandr: #011Primary: false
Apr 12 11:53:42 dozer org.kde.KScreen[6995]: kscreen.xrandr: Output 68 : connected = false , enabled = false
Apr 12 11:53:43 dozer org.kde.KScreen[6995]: kscreen.xrandr: Emitting configChanged()
Apr 12 11:53:47 dozer dbus-daemon[991]: [system] Activating service name='org.kde.powerdevil.backlighthelper' requested by ':1.416652' (uid=1001 pid=7186 comm="/usr/lib/x86_64-linux-gnu/libexec/org_kde_powerdev" label="unconfined") (usi
Apr 12 11:53:47 dozer org.kde.powerdevil.backlighthelper: QDBusArgument: read from a write-only object
Apr 12 11:53:47 dozer org.kde.powerdevil.backlighthelper: message repeated 2 times: [ QDBusArgument: read from a write-only object]
Apr 12 11:53:47 dozer dbus-daemon[991]: [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
Apr 12 11:54:04 dozer kernel: [716525.975149] usb 3-2: USB disconnect, device number 64
Apr 12 11:54:04 dozer kernel: [716525.975156] usb 3-2.1: USB disconnect, device number 65
Apr 12 11:54:04 dozer kernel: [716526.040164] usb 3-2.3: USB disconnect, device number 66
Apr 12 11:54:04 dozer upowerd[1672]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.1/3-2.1:1.0/0003:046D:C046.0025
Apr 12 11:54:04 dozer upowerd[1672]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.1/3-2.1:1.0
Apr 12 11:54:04 dozer upowerd[1672]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.1
Apr 12 11:54:04 dozer acpid: input device has been disconnected, fd 6
</code></pre></div>
<p>The script is going to take the log entries from stdin, iterate on the lines and produce a report on stdout.</p>
<h2>Parsing the log file</h2>
<p>The first thing to do is to parse the log file. Since we read the log from stdin, we can start with something like this:</p>
<div class="codehilite"><pre><span/><code><span class="kn">import</span> <span class="nn">sys</span>


<span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
    <span class="c1"># TODO: parse the line, for now skip every entry</span>
    <span class="k">return</span> <span class="kc">None</span>


<span class="k">def</span> <span class="nf">parse_log</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">readlines</span><span class="p">():</span>
        <span class="n">entry</span> <span class="o">=</span> <span class="n">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">entry</span><span class="p">:</span>
            <span class="k">yield</span> <span class="n">entry</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">parse_log</span><span class="p">():</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
<!-- break -->
<p>We are going to use regular expressions to parse the log lines, and store the fields in a <a href="https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple">named tuple</a>:</p>
<div class="codehilite"><pre><span/><code><span class="ch">#!/usr/bin/env python3</span>
<span class="kn">import</span> <span class="nn">re</span>

<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span>

<span class="n">LOG_RE</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(?P&lt;date&gt;\w+ \d+ \d+:\d+:\d+)"</span>
                    <span class="sa">r</span><span class="s2">" \w+"</span>  <span class="c1"># the hostname, we ignore it</span>
                    <span class="sa">r</span><span class="s2">" (?P&lt;app&gt;[-a-zA-Z.]+)"</span>
                    <span class="sa">r</span><span class="s2">" *[^:]*:"</span>  <span class="c1"># ignore any process identifier</span>
                    <span class="sa">r</span><span class="s2">" (?P&lt;message&gt;.*)"</span><span class="p">)</span>

<span class="n">Entry</span> <span class="o">=</span> <span class="n">namedtuple</span><span class="p">(</span><span class="s2">"Entry"</span><span class="p">,</span> <span class="p">(</span><span class="s2">"date"</span><span class="p">,</span> <span class="s2">"app"</span><span class="p">,</span> <span class="s2">"message"</span><span class="p">))</span>


<span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
    <span class="n">match</span> <span class="o">=</span> <span class="n">LOG_RE</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="n">Entry</span><span class="p">(</span><span class="n">date</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"date"</span><span class="p">),</span>
                 <span class="n">app</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"app"</span><span class="p">),</span>
                 <span class="n">message</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"message"</span><span class="p">))</span>
</code></pre></div>
<p>(Complete script: <a href="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/sysloganalyzer1.py">sysloganalyzer1.py</a>)</p>
<p>I like to split the regular expression across multiple lines so that it's easy to comment, and to use named groups (the <code>?P&lt;foo&gt;</code> things) to make the code extracting the group text from the matches easier to read.</p>
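<p>To check what the named groups capture, the regular expression can be tried on a single line. This small sketch repeats the expression so that it runs on its own, and feeds it one of the sample entries shown at the beginning of the article. It uses <code>groupdict()</code>, which returns all the named groups of a match as a dictionary:</p>

```python
import re

LOG_RE = re.compile(r"(?P<date>\w+ \d+ \d+:\d+:\d+)"
                    r" \w+"  # the hostname, ignored
                    r" (?P<app>[-a-zA-Z.]+)"
                    r" *[^:]*:"  # ignore any process identifier
                    r" (?P<message>.*)")

line = "Apr 12 11:54:04 dozer acpid: input device has been disconnected, fd 6"
match = LOG_RE.match(line)

# groupdict() maps each group name to the text it matched
fields = match.groupdict()
print(fields)
```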
<p>We can run this script with:</p>
<div class="codehilite"><pre><span/><code>$ ./sysloganalyzer1.py < /var/log/syslog
</code></pre></div>
<p>And get this output:</p>
<div class="codehilite"><pre><span/><code>Entry(date='Apr 13 11:33:45', app='org.kde.powerdevil.backlighthelper', message='message repeated 2 times: [ QDBusArgument: read from a write-only object]')
Entry(date='Apr 13 11:33:45', app='dbus-daemon', message="[system] Successfully activated service 'org.kde.powerdevil.backlighthelper'")
Entry(date='Apr 13 11:35:13', app='wpa', message='wlp1s0: WPA: Group rekeying completed with 00:0f:66:84:2a:da [GTK=CCMP]')
Entry(date='Apr 13 11:35:20', app='wpa', message='wlp1s0: WPA: Group rekeying completed with 00:0f:66:84:2a:da [GTK=CCMP]')
Entry(date='Apr 13 11:35:28', app='rtkit-daemon', message='Supervising 5 threads of 3 processes of 1 users.')
Entry(date='Apr 13 11:35:28', app='rtkit-daemon', message='Supervising 5 threads of 3 processes of 1 users.')
Entry(date='Apr 13 11:35:28', app='systemd-resolved', message='Using degraded feature set (UDP) for DNS server 192.168.8.1.')
Entry(date='Apr 13 11:43:05', app='systemd-timesyncd', message='Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).')
Entry(date='Apr 13 11:45:01', app='CRON', message='(perso) CMD (/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime backup-job >/dev/null)')
</code></pre></div>
<p>To iterate quickly on this kind of script, it's more efficient to use the <a href="https://eradman.com/entrproject/">entr</a> tool to automatically rerun the script when either the data or the script itself changes. This is how I actually ran the script:</p>
<div class="codehilite"><pre><span/><code>$ ls /var/log/syslog sysloganalyzer1.py | entr -c -s './sysloganalyzer1.py < /var/log/syslog'
</code></pre></div>
<p>Entr expects a list of files on stdin, and automatically reruns the command passed as argument if any of these files change. The <code>-c</code> option makes it clear the screen before running the command and the <code>-s</code> option makes it run the command through a shell. We need the shell here because our script expects data from stdin.</p>
<p>Note: be careful if you use <code>ls</code> to feed Entr: if you have aliased <code>ls</code> to be <code>ls --color</code>, Entr is going to fail, because it does not understand the ANSI escape codes used by <code>ls</code> to colorize the output! You can fix that by changing your alias to <code>ls --color=auto</code>.</p>
<h2>Parsing the date</h2>
<p>Having the date as a raw string makes it hard to work with it, so we are going to create a <code>datetime</code> object from it by changing <code>parse_line()</code> like this:</p>
<div class="codehilite"><pre><span/><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>

<span class="c1"># ...</span>

<span class="n">YEAR</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">year</span>

<span class="c1"># ...</span>

<span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
    <span class="n">match</span> <span class="o">=</span> <span class="n">LOG_RE</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">date_str</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"date"</span><span class="p">)</span>
    <span class="n">date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">date_str</span><span class="p">,</span> <span class="s2">"%b </span><span class="si">%d</span><span class="s2"> %H:%M:%S"</span><span class="p">)</span>
    <span class="n">date</span> <span class="o">=</span> <span class="n">date</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="n">year</span><span class="o">=</span><span class="n">YEAR</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">Entry</span><span class="p">(</span><span class="n">date</span><span class="o">=</span><span class="n">date</span><span class="p">,</span>
                 <span class="n">app</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"app"</span><span class="p">),</span>
                 <span class="n">message</span><span class="o">=</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s2">"message"</span><span class="p">))</span>
</code></pre></div>
<p>Note that we had to set the year manually, because the date format used by the log file does not set it...</p>
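<p>Using the current year is good enough for this article, but it gives a wrong result for entries logged in December when the script runs in January. Here is a possible way to handle that case. The <code>parse_syslog_date()</code> helper and its <code>now</code> parameter are hypothetical, they are not part of the scripts linked from this article:</p>

```python
from datetime import datetime


def parse_syslog_date(date_str, now):
    """Parse a year-less syslog date, assuming it is not in the future."""
    # Parse the year together with the rest, so that "Feb 29" is accepted
    # when the current year is a leap year
    date = datetime.strptime(f"{now.year} {date_str}", "%Y %b %d %H:%M:%S")
    if date > now:
        # A date later than "now" must have been logged the previous year
        date = date.replace(year=now.year - 1)
    return date


now = datetime(2021, 1, 2, 12, 0, 0)
print(parse_syslog_date("Dec 31 23:59:59", now))  # prints 2020-12-31 23:59:59
```

<p>Passing <code>now</code> as an argument instead of calling <code>datetime.now()</code> inside the function keeps the result reproducible, which is handy when testing.</p>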
<p>The output of our script now looks like this:</p>
<div class="codehilite"><pre><span/><code>Entry(date=datetime.datetime(2020, 4, 13, 13, 59, 50), app='dhclient', message='bound to 10.0.0.108 -- renewal in 36583 seconds.')
Entry(date=datetime.datetime(2020, 4, 13, 13, 59, 50), app='systemd', message='Starting resolvconf-pull-resolved.service...')
</code></pre></div>
<p>(Complete script: <a href="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/sysloganalyzer2.py">sysloganalyzer2.py</a>)</p>
<h2>Reporting</h2>
<p>We are now ready to work with this data to extract meaningful information.</p>
<p>In this example we are going to create a report to show:</p>
<ul>
<li>a list of the apps writing to the log file, sorted by the number of messages they logged</li>
<li>messages for the last 30 minutes, grouped by apps</li>
</ul>
<p>First let's modify the <code>main()</code> function to store all log entries in an <code>entries</code> list:</p>
<div class="codehilite"><pre><span/><code><span class="c1"># ...</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">entries</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">parse_log</span><span class="p">())</span>
    <span class="c1"># ...</span>
</code></pre></div>
<h3>Reporting message count per app</h3>
<p>To implement the app message counter we can use the <a href="https://docs.python.org/3/library/collections.html?highlight=counter#collections.Counter">Counter</a> class from the <code>collections</code> module. A basic version, without sorting the results, requires only this:</p>
<div class="codehilite"><pre><span/><code><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span><span class="p">,</span> <span class="n">Counter</span>

<span class="c1"># ...</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">entries</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">parse_log</span><span class="p">())</span>

    <span class="n">app_message_count</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">app</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">)</span>

    <span class="nb">print</span><span class="p">(</span><span class="s2">"Messages per app"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">app</span><span class="p">,</span> <span class="n">count</span> <span class="ow">in</span> <span class="n">app_message_count</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"- </span><span class="si">{</span><span class="n">app</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">count</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>Running this script, we get the following result:</p>
<div class="codehilite"><pre><span/><code>Messages per app
- colord: 1
- anacron: 12
- dbus-daemon: 83
- org.kde.powerdevil.backlighthelper: 66
- CRON: 43
- backintime: 136
- wpa: 116
- systemd-resolved: 60
- systemd: 147
- org.kde.KScreen: 763
- kernel: 406
- upowerd: 88
- acpid: 6
- NetworkManager: 358
- whoopsie: 48
- avahi-daemon: 106
- nm-dispatcher: 56
- systemd-sleep: 8
- bluetoothd: 16
- systemd-rfkill: 4
- ModemManager: 4
- dhclient: 25
- mtp-probe: 6
- rtkit-daemon: 54
- systemd-timesyncd: 4
- pulseaudio: 1
- org.kde.ActivityManager: 31
- org.kde.kpasswdserver: 39
- backintime-qt: 11
- crontab: 6
- snapd: 1
- fstrim: 2
- cron: 1
</code></pre></div>
<p>Let's sort by message count by changing our <code>for</code> loop to this:</p>
<div class="codehilite"><pre><span/><code>    <span class="k">for</span> <span class="n">app</span><span class="p">,</span> <span class="n">count</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">app_message_count</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="o">-</span><span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
</code></pre></div>
<p><code>app_message_count.items()</code> returns the entries of the counter as tuples, where the first element is the app name and the second element is the message count. The <code>sorted()</code> function turns them into a sorted list, and we can pass it a <code>key</code> argument so it knows how to sort. By passing it <code>lambda x: -x[1]</code> we tell it to use the second item of the tuple (the message count) and, thanks to the minus sign, to sort in descending order.</p>
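<p>As a side note, <code>Counter</code> also provides a <code>most_common()</code> method which returns the entries already sorted from the most to the least frequent, so it can replace the <code>sorted()</code> call. A small sketch, using made-up app names instead of the real log entries:</p>

```python
from collections import Counter

# Made-up sample data standing in for `x.app for x in entries`
apps = ["kernel", "systemd", "kernel", "wpa", "kernel", "systemd"]
app_message_count = Counter(apps)

# most_common() yields (app, count) tuples, highest count first
for app, count in app_message_count.most_common():
    print(f"- {app}: {count}")
```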
<p>Now the output looks like this:</p>
<div class="codehilite"><pre><span/><code>Messages per app
- org.kde.KScreen: 763
- kernel: 406
- NetworkManager: 358
- systemd: 147
- backintime: 136
- wpa: 116
- avahi-daemon: 106
- upowerd: 88
- dbus-daemon: 83
- org.kde.powerdevil.backlighthelper: 66
- systemd-resolved: 60
- nm-dispatcher: 56
- rtkit-daemon: 54
- whoopsie: 48
- CRON: 43
- org.kde.kpasswdserver: 39
- org.kde.ActivityManager: 31
- dhclient: 25
- bluetoothd: 16
- anacron: 12
- backintime-qt: 11
- systemd-sleep: 8
- acpid: 6
- mtp-probe: 6
- crontab: 6
- systemd-rfkill: 4
- ModemManager: 4
- systemd-timesyncd: 4
- fstrim: 2
- colord: 1
- pulseaudio: 1
- snapd: 1
- cron: 1
</code></pre></div>
<p>(Looks like KScreen is quite verbose...)</p>
<h3>Reporting messages logged for the last 30 minutes, grouped by apps</h3>
<p>This second part is a pretext to show you the <a href="https://docs.python.org/3/library/itertools.html?highlight=groupby#itertools.groupby">groupby()</a> function from the <code>itertools</code> module.</p>
<p>But first let's keep only the entries from the last 30 minutes:</p>
<div class="codehilite"><pre><span/><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>

<span class="c1">#...</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="c1">#...</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Latest entries"</span><span class="p">)</span>
    <span class="n">min_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">30</span><span class="p">)</span>
    <span class="n">latest_entries</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">entries</span> <span class="k">if</span> <span class="n">x</span><span class="o">.</span><span class="n">date</span> <span class="o">></span> <span class="n">min_date</span><span class="p">)</span>
</code></pre></div>
<p><code>groupby()</code> groups elements from an iterable according to a certain criterion. It expects the elements to be sorted by that criterion, so before calling <code>groupby()</code>, we must sort the entries by app. It's also a good idea to sort the list by date as a second criterion so that our entries are nicely ordered within their group. We do so by adding this line:</p>
<div class="codehilite"><pre><span/><code>    <span class="n">latest_entries</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">latest_entries</span><span class="p">,</span>
                            <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">app</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">date</span><span class="p">))</span>
</code></pre></div>
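<p>To see why sorting matters, here is a small standalone sketch with made-up app names: <code>groupby()</code> starts a new group every time the key changes, so without sorting the same app shows up in several groups.</p>

```python
from itertools import groupby

# Made-up app names, interleaved like entries in a real log
apps = ["cron", "wpa", "cron", "wpa"]

# Without sorting, each change of key starts a new group
unsorted_groups = [(app, len(list(group))) for app, group in groupby(apps)]

# After sorting, identical keys are adjacent and end up in the same group
sorted_groups = [(app, len(list(group))) for app, group in groupby(sorted(apps))]

print(unsorted_groups)
print(sorted_groups)
```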
<p>Now we are ready to call <code>groupby()</code>, which must first be imported with <code>from itertools import groupby</code>:</p>
<div class="codehilite"><pre><span/><code>    <span class="k">for</span> <span class="n">app</span><span class="p">,</span> <span class="n">entries</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="n">latest_entries</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">app</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"- </span><span class="si">{</span><span class="n">app</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"  </span><span class="si">{</span><span class="n">entry</span><span class="o">.</span><span class="n">date</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">entry</span><span class="o">.</span><span class="n">message</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>If we run this script, the output looks like this:</p>
<div class="codehilite"><pre><span/><code>Messages per app
- org.kde.KScreen: 763
- kernel: 406
- NetworkManager: 358
- systemd: 147
- backintime: 142
- wpa: 117
- avahi-daemon: 106
- upowerd: 88
- dbus-daemon: 85
- org.kde.powerdevil.backlighthelper: 68
- systemd-resolved: 60
- nm-dispatcher: 56
- rtkit-daemon: 54
- CRON: 48
- whoopsie: 48
- org.kde.kpasswdserver: 39
- org.kde.ActivityManager: 31
- dhclient: 25
- bluetoothd: 16
- anacron: 12
- backintime-qt: 11
- systemd-sleep: 8
- acpid: 6
- mtp-probe: 6
- crontab: 6
- systemd-rfkill: 4
- ModemManager: 4
- systemd-timesyncd: 4
- fstrim: 2
- colord: 1
- pulseaudio: 1
- snapd: 1
- cron: 1
Latest entries
- CRON
  2020-04-13 15:00:01, (perso) CMD (/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime backup-job >/dev/null)
  2020-04-13 15:00:01, (agateau) CMD (/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime backup-job >/dev/null)
  2020-04-13 15:15:01, (perso) CMD (/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime backup-job >/dev/null)
  2020-04-13 15:15:01, (agateau) CMD (/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime backup-job >/dev/null)
  2020-04-13 15:17:01, (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
- backintime
  2020-04-13 15:00:01, INFO: Profile "Profil principal" is not scheduled to run now.
  2020-04-13 15:00:01, INFO: Deferring backup while on battery
  2020-04-13 15:00:01, WARNING: Backup not performed
  2020-04-13 15:15:01, INFO: Profile "Profil principal" is not scheduled to run now.
  2020-04-13 15:15:02, INFO: Deferring backup while on battery
  2020-04-13 15:15:02, WARNING: Backup not performed
- dbus-daemon
  2020-04-13 15:00:50, [system] Activating service name='org.kde.powerdevil.backlighthelper' requested by ':1.416652' (uid=1001 pid=7186 comm="/usr/lib/x86_64-linux-gnu/libexec/org_kde_powerdev" label="unconfined") (using servicehelper)
  2020-04-13 15:00:50, [system] Successfully activated service 'org.kde.powerdevil.backlighthelper'
- org.kde.powerdevil.backlighthelper
  2020-04-13 15:00:50, QDBusArgument: read from a write-only object
  2020-04-13 15:00:50, message repeated 2 times: [ QDBusArgument: read from a write-only object]
- wpa
  2020-04-13 15:04:52, wlp1s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-6 noise=-83 txrate=48000
</code></pre></div>
<p>(Complete script: <a href="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/sysloganalyzer3.py">sysloganalyzer3.py</a>)</p>
<h2>Formatting the report</h2>
<p>Raw text is not super nice to read. A common alternative is to generate HTML, but generating HTML is painful: you need to wrap everything in tags. This is no fun, especially if you want to produce bullet lists. You also need to think about escaping characters. Python makes this less error-prone, but attempting to generate HTML from a shell script does not sound like a good idea...</p>
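<p>As an example of the escaping problem, here is a sketch of what printing a single log message as an HTML list item requires in Python. It uses <code>html.escape()</code> from the standard library, the message is one of the sample entries from the output above:</p>

```python
import html

message = "(root) CMD ( cd / && run-parts --report /etc/cron.hourly)"

# Without html.escape(), the "&&" would end up as invalid HTML
item = f"<li>{html.escape(message)}</li>"
print(item)
```

<p>And this only covers one list item: the surrounding <code>ul</code> tags and the section titles would need the same care.</p>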
<p>I find generating Markdown easier. Being line-based, Markdown is easy to generate from a script. The raw output is still readable, and once you are happy with it you can get a decent-looking report by piping the results to a Markdown formatter such as <a href="https://pandoc.org">Pandoc</a> or <a href="https://github.com/commonmark/cmark">CMark</a>.</p>
<p>Let's revisit our script to make it generate Markdown. There is actually not much to change. First we can prefix the string of our titles with <code>#</code> to turn them into section headers:</p>
<ul>
<li><code>print("Messages per app")</code> becomes <code>print("# Messages per app")</code></li>
<li><code>print("Latest entries")</code> becomes <code>print("# Latest entries")</code></li>
</ul>
<p>We also need to add an empty line between sections. A simple <code>print()</code> will do.</p>
<p>Next, we can turn the app titles in the "Latest entries" section into sub-sections by changing the <code>print(f"- {app}")</code> into <code>print(f"## {app}")</code>.</p>
<p>Finally we can turn the log entries into bullet list elements by replacing the indent with <code>-</code>.</p>
<p>Our new <code>main()</code> function looks like this:</p>
<div class="codehilite"><pre><span/><code><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">entries</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">parse_log</span><span class="p">())</span>
    <span class="n">app_message_count</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">app</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"# Messages per app"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">app</span><span class="p">,</span> <span class="n">count</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">app_message_count</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="o">-</span><span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"- </span><span class="si">{</span><span class="n">app</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">count</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">()</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"# Latest entries"</span><span class="p">)</span>
    <span class="n">min_date</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">30</span><span class="p">)</span>
    <span class="n">latest_entries</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">entries</span> <span class="k">if</span> <span class="n">x</span><span class="o">.</span><span class="n">date</span> <span class="o">></span> <span class="n">min_date</span><span class="p">)</span>
    <span class="n">latest_entries</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">latest_entries</span><span class="p">,</span>
                            <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">app</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">date</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">app</span><span class="p">,</span> <span class="n">entries</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="n">latest_entries</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">app</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">()</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"## </span><span class="si">{</span><span class="n">app</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"- </span><span class="si">{</span><span class="n">entry</span><span class="o">.</span><span class="n">date</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">entry</span><span class="o">.</span><span class="n">message</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>(Complete script: <a href="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/sysloganalyzer4.py">sysloganalyzer4.py</a>)</p>
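<p>If you want to try the Markdown layout without a real syslog, here is a self-contained sketch that follows the same steps on a few hard-coded entries. The <code>Entry</code> tuple and the sample data are hypothetical stand-ins for what <code>parse_log()</code> returns, the 30-minute filter is left out, and the report is returned as a string instead of being printed line by line.</p>

```python
from collections import Counter, namedtuple
from itertools import groupby

# Hypothetical stand-in for the entries produced by parse_log()
Entry = namedtuple("Entry", ["date", "app", "message"])


def render_report(entries):
    """Return a Markdown report: counts per app, then entries grouped by app."""
    lines = ["# Messages per app"]
    counts = Counter(e.app for e in entries)
    # Most talkative apps first
    for app, count in sorted(counts.items(), key=lambda x: -x[1]):
        lines.append(f"- {app}: {count}")
    lines.append("")
    lines.append("# Latest entries")
    # groupby() only groups consecutive items, so sort by app first
    ordered = sorted(entries, key=lambda e: (e.app, e.date))
    for app, group in groupby(ordered, key=lambda e: e.app):
        lines.append("")
        lines.append(f"## {app}")
        for entry in group:
            lines.append(f"- {entry.date}, {entry.message}")
    return "\n".join(lines)


# Made-up sample data
sample = [
    Entry("15:00:50", "dbus-daemon", "Activating service"),
    Entry("15:04:52", "wpa", "CTRL-EVENT-SIGNAL-CHANGE"),
    Entry("15:00:51", "dbus-daemon", "Successfully activated service"),
]
print(render_report(sample))
```

<p>Sorting before calling <code>groupby()</code> matters: without it, an app whose entries are not next to each other would show up as several sub-sections.</p>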
<p>If we pipe our script through <code>pandoc</code> (still using <code>entr</code>) like this:</p>
<div class="codehilite"><pre><span/><code>ls /var/log/syslog sysloganalyzer4.py | entr -c -s \
'./sysloganalyzer4.py < /var/log/syslog | pandoc --standalone -f markdown_github --toc > report.html'
</code></pre></div>
<p>Then we get a decent report like this:</p>
<p><img alt="Report" src="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/report.png"/></p>
<p>Not the most beautiful report ever generated, but readable by everyone. And you can prettify it by providing Pandoc a nicer template or a CSS file if necessary.</p>
<p>A note about the Pandoc options used here:</p>
<p><code>--standalone</code>: tells Pandoc to generate a complete HTML document. That is, a file starting with an <code><html></code> element, instead of a bare HTML fragment.</p>
<p><code>-f markdown_github</code>: tells Pandoc to use the "GitHub Flavored Markdown" dialect. I find this more appropriate for script output because it makes line breaks significant. If you use <code>-f markdown</code> instead, then the output of this code:</p>
<div class="codehilite"><pre><span/><code><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"foo"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"bar"</span><span class="p">)</span>
</code></pre></div>
<p>will appear as "foo bar" in the report, instead of being on two separate lines.</p>
<p>You can do the same with CMark using the <code>--hardbreaks</code> option.</p>
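<p>If you cannot change the formatter options, another approach is to make the script emit explicit hard line breaks. In Markdown, a line ending with two spaces is rendered with a line break after it. Here is a small sketch of this; the helper name <code>hard_break_lines</code> is made up for this example.</p>

```python
def hard_break_lines(lines):
    """Join lines so that each one ends with a Markdown hard line break.

    Two trailing spaces before the newline tell the Markdown formatter to
    keep the line break instead of merging the lines into one paragraph.
    """
    return "  \n".join(lines)


print(hard_break_lines(["foo", "bar"]))
```

<p>The drawback is that the trailing spaces are invisible in the raw output, which is why I prefer to enable hard breaks in the formatter.</p>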
<p><code>--toc</code>: generates a clickable table of contents from the section headers.</p>
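<p>If you would rather call Pandoc from Python than from a shell pipeline, the same options can be passed through <code>subprocess</code>. This is only a sketch: the function names <code>pandoc_command</code> and <code>markdown_to_html</code> are invented for this example, and <code>markdown_to_html</code> assumes <code>pandoc</code> is installed and in your <code>PATH</code>.</p>

```python
import shlex
import subprocess


def pandoc_command(input_format="markdown_github", toc=True):
    """Build the pandoc argument list with the options described above."""
    command = ["pandoc", "--standalone", "-f", input_format]
    if toc:
        command.append("--toc")
    return command


def markdown_to_html(markdown_text):
    """Convert Markdown to a complete HTML document (requires pandoc)."""
    result = subprocess.run(
        pandoc_command(),
        input=markdown_text,   # Markdown goes to pandoc's stdin
        capture_output=True,   # HTML comes back on stdout
        text=True,
        check=True,            # raise if pandoc reports an error
    )
    return result.stdout


# Show the command line without actually running pandoc
print(shlex.join(pandoc_command()))
```

<p>Building the argument list in its own function makes it easy to try another input format or to drop the table of contents.</p>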
<h2>Quick preview (shameless plug)</h2>
<p>This HTML output is nice, but having to reload the browser is annoying. The <a href="https://eradman.com/entrproject/">entr</a> page provides a script to reload the current tab of your browser, but what I like to do is use my markup previewer, <a href="https://github.com/agateau/mup">Mup</a>, to display the HTML. This way I can have an editor open on my script on the left side of the screen and Mup showing the report on the right side. Every time I change the script, <code>entr</code> regenerates the report, Mup notices the change and reloads it.</p>
<video width="1352" height="770" controls="">
<source src="https://agateau.com/2020/generating-reports-with-python-markdown-and-entr/mup-demo.mp4" type="video/mp4">
</source></video>
<h2>Takeaways</h2>
<ul>
<li>Python comes with quite a few handy classes to massage data and extract meaningful information: the <a href="https://docs.python.org/3/library/collections.html">collections</a> and <a href="https://docs.python.org/3/library/itertools.html">itertools</a> modules are worth studying;</li>
<li>Use <a href="https://eradman.com/entrproject/">entr</a> for fast iterations while working on your script;</li>
<li>Generating Markdown from a script is easy, and you can turn it into decent HTML using <a href="https://pandoc.org">Pandoc</a>, <a href="https://github.com/commonmark/cmark">CMark</a> or any other Markdown formatter;</li>
<li>Use an auto-updating tool like <a href="https://github.com/agateau/mup">Mup</a> to display the report while working on it.</li>
</ul>2020-04-13T18:59:31+02:00