Philippe Beliveau

Following Sylvain K.'s post on making your website lean, I went down the rabbit hole. Believe it or not, the size (like a regular file on your computer) of a web page has an impact. For this business, a 0.3 seconds difference in loading costs 8 million euro for customers in data usage (see presentation). Out of curiosity, I ran PageSpeed Insights from Google to find out my web page score for this blog. I scored a somewhat disappointing 71 on mobile and 83 on desktop (out of 100)... :


Mobile, before optimization


Desktop, before optimization

Most of the potential savings were in the form of optimizing the size and format of my posts pictures. Google recommends their own format called webp for images. As the extension of the file-type not so subtly suggest, it is optimized for the web. Thankfully, the superb scoop has Google command line application to convert images to webp:

    scoop install libwebp

Following the installation, I just converted my original images to webp using the standard ouput presets of cwebp:

    cwebp.exe input-image.png -o output-image.webp

And voilà! A beautiful reduction of 97% in size:

UPDATE: write.as has updated their service and now supports webp format. hooray! 2020-10-30 16:04:20 -0400

Unfortunately, when uploading to snap.as, the service included with write.as, the picture is converted to /png/ and the size balloons back to 1 800 KB...

So, while not entirely defeating the purpose, I would expect an image hosting service to support one of the main file type for the web. For now, I will resort to using my own CDN until I find a host or kindly ask snap.as to support this format. At least, the result speaks for itself:


Mobile, after optimization


Desktop, after optimization

Most of the savings left for mobile are out of my hands so to speak. I have limited control over the JavaScript used by write.as. And that's exactly how I want it. After all, the purpose of using a platform like write.as is to concentrate on the words rather than the delivery... Except when it's relatively easy to facilitate the delivery. ;)

Notice

This post is not in any way an endorsement for Google and the way they determine if your web page is fast enough.

img

Why Excel and spreadsheets are amazing

A couple of days ago I read an article from HBR about spreadsheets. The intro mentioned the place Microsoft Excel has in our lives. Specifically, it said that at any given time, 750 millions people use Excel in the world and that the average worker will spend 10% of their work life using it. I've worked at three major companies during my accounting studies. As you can guess, I was employed in the finance and/or reporting departments.

Now, all of those organizations have pricey ERPs (real pricey). Still, >99% of my time was spent working on Excel files to accomplish various tasks. I won't get into the subject of why employees and their supervisors don't use ERPs even when the license is already paid. Subject for another post.

However, I can tell you why they use spreadsheet software :

  • It's known. In fact, everyone know how to operate them. If you send a .xlsx file to someone, they know how to open it, how to manipulate it, etc. This may sound obvious, but believe me, it's hard to find something that you are certain every single employee and business understand and uses.
  • It's flexible. It can do anything you throw at it. It can crunch numbers, process documents, prepare presentations, act as a business intelligence tool (BI). You get the gist. Not a lot of tools are as flexible as a spreadsheet software.
  • It's easy. It as a simple graphic interface that everyone comprehend seamlessly. It's intuitive and it feels like something we've been using since the first time we ever came in contact with computers. (Excel has been a Microsoft product for more than 30 years).
  • It's cheap. Not dirt cheap, but close. Last time I verified, it was 11$ per month per user for the whole Office suite of software. So even your *mom-and-pop* small shop can afford it. If they can't, there is other viable options, like LibreOffice.
  • It's useful. Want to find out how much it costs per unit of *x*? Not a problem. Excel is perfect for those ad hoc reports that your boss wants today.

When it falls short

Sounds too good to be true? Here comes the catch: Everyone uses it. By everyone, I mean it. Old, young, no experience, a lot of it, etc. Do you see the problem now? No standard of working, millions of way to create the layout, and to manage those spreadsheets. This result in the utmost variety of quality of workbooks.

When spreadsheet workbooks are badly constructed, the outcome can be as benign as poor efficiency to outright financial disasters. Remember the *London Whale* scandal of 2012? The losses of JP Morgan were multiplied because they were copy-pasting between workbooks and the layouts were not straight forward, resulting in mistakes.

Some organizations recognized these dangers and try to correct them. A good chunk of my time as a financial analyst was to open workbooks used in critical parts of a company, take note of the damages, plan for repairs, or if the document is too flawed, make a brand-new one.

Indicators of a suboptimal workbook

When I open a spreadsheet I've never worked with, I look for a handful of elements to determine whether it's salvageable or not. Here are the indicators to find if you are dealing with a damaged workbook:

  • Workbook created a decade ago. Business needs change over time, it grows, it diversifies, etc. As a result, the spreadsheets used to follow and drive those processes need to be updated. In the software world, they call this a revision.
  • An increasing number of rotating employees had to use it over the years. When an increasing number of people worked on the same workbook without guidelines and/or without talking to each other, you run the risk of creating a monster: Every sheet looks completely different, you can't trace back the changes, and the number of links between sheets defy logic.
  • Level of importance to the business. I am a believer in making internal process as efficient as possible. When the operation of a spreadsheet is critical to the organization, the tolerance of inefficiency becomes way lower.
  • No documentation to support users of the Excel application. As an example, it requires another employee to take time out of his schedule to *teach* how to use it. Very little companies have documentation regarding the usage of their workbooks. As far as I'm concerned, documentation is as important as the workbook itself to debug, understand, and use efficiently the spreadsheet.
  • No one can explain in a single concise phrase how the workbook works and what it does. (No guarantee of result) A spreadsheet needs to have a specific goal, like determining the interest rates for 5 years bonds. If it does too many things, it has many breakage points instead of a single one.
  • A messy layout. This one is more complicated to explain but might be the most important indicator that a workbook is too far gone to be saved. Usually, you know 30 seconds into opening a spreadsheet and moving around if the layout works and makes sense. If you're scratching your head and asking yourself 'what the heck am I looking at?'; It's a messy layout.

Take note that the indicators are not to be use for an ad hoc spreadsheet. (i.e. report that will be use once.)

If you score 3 out of the 6 indicators, it might be time to retire this workbook. It's faster to start from zero with a fresh and solid base than doing the impossible. Making a good spreadsheet for years to come is not a big ordeal, and it does not require you to be an Excel MVP (yes, it's a real thing).

Just do the inverse of the indicators of a bad spreadsheet and you should be more than fine.

Resources

If you can, schedule time to read and learn on how to get better with Excel. To get you started, here are a few good reads to build robust workbooks in Excel :

Brief history

Creating coding conventions like PEP-8 for Python or the famous Linux coding style for the kernel development in C was never a priority with LaTeX. Why? Well, all the reasons to create a coding standard never were really present for LaTeX and the use of the language. Usually, a coding standard emerges when:

  1. Multiple people collaborate on a project. Cohesion and quality of the code can then become an issue without a standard.
  2. The code is audited or might be and thus code quality and readability become an important factor.
  3. A project might be used and refactored into other projects and compatibility becomes an important factor.
  4. The project must be supported for multiple years and a convention becomes important so that other people than the first creator can work without having to firstly understand the specific way a project was coded.

But for those familiar with LaTeX, rarely does a project in LaTeX is complex enough to create a standard of coding practices. Indeed, most TeX documents are initially created for personal use like a C.V. or school work, etc. Plus, only a few projects are maintained across multiple years and most of the time those projects are behind closed doors for private entities like a corporation generating monthly reports.

So for the past 32 years (since the creation of LaTeX), establishing a coding standard was not a pressing matter. Consequently, there is no coding style agreed upon by a large portion of LaTeX user.

No standard, is that a bad thing?

Yes.

If you ever had to work on a large project, like writing a book with more than yourself (multiples authors), you know what kind of hell it can be to maintain a certain level of readability when everyone writes their code differently. How many times do I have to review a text written like the author forgot to break lines :

\begin{document} this is a very long line that I wrote to show that
editing a mistake made in the middle of this line will be more complex to
address than someone breaking lines after a number of characters. some people
use \latex like a WYSIWYG and they tend to forget they can make the text much
more readable by ppressing "enter" on their keyboards from time to time.

This is only a small glimpse into the readability problems resulting from using different styles while working simultaneously on a document. Other obvious issues that occurs without style are the capacity to maintain and build on an existing project promptly.

Enters Didier Verna

At the Tex Users Group conference of 2011, Didier Vernais presented his vision of a coding standard for LaTeX. You can find the following paper freely available here: Towards LaTeX coding standards

Like myself, Didier had his fair share of scrapes. He goes as far as to qualify the present situation in the world of LaTeX as being close to anarchy. Browsing the GitHub repos containing tex files, I would say that it's a fair assertion. (not to say that languages having code styles, like Python, don't have to deal with the same problems)

I think Towards LaTeX coding standards brings a lot of simple and logical ideas that would increase the quality of LaTeX of documents. This is a summary of the proposed coding standard made by Didier. I would advise reading the whole document when you have time. Otherwise, this is to highlight elements of the paper so that this post can be serve as some sort of cheatsheet in writing robust and consistent LaTeX documents for the masses.

Notice: this only cover elements of the Didier paper on layout of documents (low level style) and not styles standards regarding higher abstraction levels like packages, conflict management, etc. Please refer to the whole document for those components of Didier style.

The Didier Coding Style (DCS)

Intended user:

  • You are writing documents in LaTeX in collaboration with other people; or/and
  • You want to make your LaTeX documents readable and robust; or/and
  • You are a perfectionist, like me.

I only address a small part of DCS because he treats mostly about coding style for LaTeX packages developers while I only want to cover the generic rules that should apply in regular documents.

Rule 1: Stay WYSIWG coherent

Essentially: Do not only think about what the resulting document will look like. You have to think about what the format of your code like it is itself the output. Some basic examples:

  • A blank line after a paragraph means the end of the paragraph:
This is my first paragraph.

This is a second paragraph.
  • When using '\\' or '\par', you should not write text following that statement. (meaning on the same line). In other words, jump a line when ending a paragraph.

  • Tabular environment should mimic the actual output layout even if it increases the amount of time to realize. Meaning, should you be able to read a table without producing the output.

Yes:

\begin{tabular}{ |p{3cm}||p{3cm}|p{3cm}|p{3cm}|  }
     \hline
     \multicolumn{4}{|c|}{Country List} \\
     \hline
     Country Name   & ISO ALPHA 2 Code & ISO ALPHA 3 Code & ISO numeric Code\\
     \hline
     American Samoa & AS  	     & ASM              & 016 \\
     Andorra        & AD               & AND              & 020     \\
     Angola         & AO               & AGO              & 024     \\
     \hline
\end{tabular}

No:

    \begin{tabular}{ |p{3cm}||p{3cm}|p{3cm}|p{3cm}|  }
     \hline
     \multicolumn{4}{|c|}{Country List} \\
     \hline
     Country Name&ISO ALPHA 2 Code&ISO ALPHA 3 Code&ISO numeric Code\\
     \hline
     American Samoa&   AS  & ASM&016\\
     Andorra& AD  & AND   &020\\
     Angola& AO  & AGO&024\\
     \hline
\end{tabular}

Rule 2: Spaces in math mode

This one is pretty straight forward: give some room to your text in math mode like you would do in regular text.

Yes:

$ f(x) = 2 + f(x-2) $

No:

$f(x)=2+f(x-2)$

Rule 3: One instruction per line

This is one instruction as one logical instruction per line. Not to be confused with simply a single instruction (logical vs. unique). Here is an example:

Yes:

\hskip.11em\@plus.33em\@minus.07em
\small

No:

\hskip.11em\@plus.33em\@minus.07em \small

Rule 4: Indent all forms of grouping

The most simple and yet the most important rule. Please indent when it's logical to do so. Didier has a great example of indentation :

Yes:

\def\@docinclude*1{%
    \clearpage
    \if@filesw\immediate\write\@mainaux{\string\@input{*1.aux}}\fi
    \@tempswatrue
    \if@partsw
        \@tempswafalse
        \edef\@tempb
        \@for\@tempa:=\@partlist\do{\ifx\@tempa\@tempb\@tempswatrue\fi}%
    \fi
    \if@tempswa
        \let\@auxout\@partaux
        \if@filesw
            \immediate\openout\@partaux #1.aux
            \immediate\write\@partaux{\relax}%
        \fi
    \fi

Please no…

\def\@docinclude*1{\clearpage
\if@filesw\immediate\write\@mainaux{\string\@input{*1.aux}}\fi
\@tempswatrue\if@partsw\@tempswafalse\edef\@tempb{*1}\@for
\@tempa:=\@partlist\do{\ifx\@tempa\@tempb\@tempswatrue\fi}\fi \if@tempswa
\let\@auxout\@partaux \if@filesw \immediate\openout\@partaux *1.aux
\immediate\write\@partaux{\relax}\fi % ... \fi :-(

Rule 5: Be modular

This rule is not treated in the formatting section in Didier paper. In fact, Didier mostly address the modularity aspect from a package developer standpoint. However, I would like to add this rule to the formatting section for one obvious reason: Working with 10 small files is easier and than working with one big file. Don't agree? Let's go back to rule 4 about indentation and let's say we are writing a book. It's easier to get a grasp of the code if this is the structure :

Yes:

+-- book
| +-- book.tex     |
| +-- chapters     |
| +-- chapter1.tex |
| +-- chapter2.tex |
| +-- ...          |

book.tex file

\begin{document}
\input{chapters/chapter1.tex}
\input{chapters/chapter2.tex}
...
\end{document}

No:

+-- book
| +-- book.tex |

book.tex file

\begin{document}
% 3000 lines later ...
\end{document}

Context

Access is a useful tool that bridges the gap between spreadsheet software like Excel, and more technical relational databases. The goal is to provide a simple, and nice user interface to interact with larger datasets than Excel could efficiently.

The benefit is clear: enjoy a more robust and faster way to handle data than Excel while having a smaller learning curve than traditional database solutions (i.e., MySQL, MariaDB, etc.).

One of the main shortcomings is the limited support for proper SQL queries. Much to my behest, it's not possible to store variables in a query. In my case, I was trying to query a dataset of patients for Ontario Case Costing to find missing nursing data.

For much of these faults, the common (and only?) recourse is to fall back on Visual Basic for Applications (VBA). Yes, the most beloved programming languages.

Extending Access VBA with DLL

VBA does not natively 'speak' to Access objects (tables, queries, etc.). To do so, you need to load DLLs the old-fashioned way in Visual Basic Editor (VBE): img

A library to manipulate Access objects is to use '[Recordset object library DAO]'. Without loading this library, you'll have to resort to SQL statements stored in string variables.

Error in loading DLL

This is nothing short of a testament to how unmaintained Access is in the various Office software: one of the key libraries in VBA does not even load. Anyhow, if you see this error when adding the reference 'Microsoft DAO 3.6 Object Library' there is a fix: img

Here is how you can address it:

  1. Navigate to your 'C:' drive and you should see the two following folders. The difference between the two folders is x86 is for 32 bits compiled software versus 64 bits. img
  2. Navigate to 'C:/Program Files (x86)/Common Files/Microsoft Shared/', copy the 'DAO' folder and paste it in the 64-bit folder at the same location: img
  3. Close and reopen Access and load the reference in the VBE.

You should now be able to declare recordset variables. Like so:

Private Sub GenerateMissingMinutes()

    ' Declare variables
    Dim strQuery        As String
    Dim strCommonKey    As String
    Dim objDatabase     As DAO.Database
    Dim objQuery        As DAO.Recordset
    Dim strEncounter    As String
    Dim intMinutes      As Integer
    Dim strDate         As String
    Dim strLastDate     As String
    Dim lgDate          As Long
    Dim lgLastDate      As Long
    Dim strLastEncount  As String
    Dim strFC           As String
    Dim strProduct      As String
    Dim strSource       As String
    Dim intVariance     As Integer
    Dim lgVariance      As Long

    ' Attribute variables
    strQuery = "Missing minutes"
    Set objDatabase = CurrentDb

    ' Open the query
    Set objQuery = objDatabase.OpenRecordset(strQuery)

    ' Loop and ram-load the query
    With objQuery
        .MoveFirst
        Do Until .EOF
            strEncounter = .Fields(0)
            strDate = .Fields(4)
            lgDate = CLng(strDate)
            intMinutes = .Fields(1)
            strFC = .Fields(2)
            strProduct = .Fields(3)
            If .Fields(5) <> "" Then
                strSource = .Fields(5)
            Else
                strSource = "null"
            End If

            If strEncounter = strLastEncount Then
                If lgDate <> lgLastDate And lgDate > lgLastDate Then
                    lgVariance = lgDate - lgLastDate
                    If lgVariance = 1 Then
                        .AddNew
                        .Fields(0) = strEncounter
                        .Fields(1) = intMinutes
                        .Fields(2) = strFC
                        .Fields(3) = strProduct
                        .Fields(4) = lgLastDate
                        .Fields(5) = strSource
                        .Update
                    End If
                End If
            End If

            strLastEncount = strEncounter
            strLastDate = strDate
            lgLastDate = CLng(strLastDate) + 1
            .MoveNext
        Loop
    End With

    Set objDatabase = Nothing
    Set objQuery = Nothing

End Sub

When will we finally see the Office suite on Linux? Sure, LibreOffice is good but Microsoft Office suite still is de facto applications used by big and small businesses. If we want to see any kind of push for the Linux desktop in the business environment, the best way to do so is to keep the same applications users are used to.

From my point of view, with my background and education in accounting and finances, I need to have Excel. I have probably more combined hours of usage for Excel than any other software. I would argue this is probably true for a lot of other accountants, financial analysts, etc out there. My level of know-how and productivity with a software I have spent so much time with is not something I can easily convert to another spreadsheet app.

That‘s why I need to have a computer capable of running Excel. Right now, it means using one of two operating systems: Windows, or MacOS. Both are proprietary operating systems (MacOS as some open parts arguably). Both are not widely used for servers. Both are nowhere as secure as the alternatives.

If there is any glimmer of hope, it‘s that Microsoft is listening (to money maybe, wink azure). Most of Azure servers are, surprise surprise, running Linux. Redmond has no other choice but to change it‘s ways and open some of it‘s tools to other platforms (Microsoft Teams), or just open as in open source the code (.NET). Hopefully, they continue to do so and maybe, just maybe, we can finally have a native running Excel in Linux.

Until then, I will resign myself to use Windows. That will allow me to use most of my tools (thanks to WSL) while having access to Excel.