
HTGEN

the Website compiler and optimizer

2017, June 21

© 2017-2019, Andéor, SAS

Abstract

Websites generally use PHP or .NET to handle the dynamic aspects of user interaction.

Website designers use these languages and frameworks to generate Web pages dynamically, even when a page does not contain any dynamic element. This causes server overloads.

We believe that Websites could be more cleanly implemented by separating the dynamic and the static parts.

We have implemented a tool, HTGEN, a Website compiler that resolves the static aspects at compile time; the dynamic parts are still handled by the framework in use, such as PHP.

HTGEN significantly reduces server load by serving mostly static HTML pages, which are cached well by the network infrastructure. In addition, HTGEN transparently minifies the generated HTML, JS and CSS files, and it can process the simplified Markdown syntax for writing Web pages.

The Website www.clari3d.com is entirely written using this exciting technology.

Traditional Web design

PHP include

Websites are often designed with PHP, either directly or through a CMS that itself uses PHP. Parts of a Web page are pulled in with the include instruction. This way, each time a client requests a page, PHP includes the parts, parses them and executes them:

<html>
  <head>
  </head>
  <body>
    <?php
      include 'header.php';
      include 'content.php';
      include 'footer.php';
    ?>
  </body>
</html>

Loading, parsing, analysing and executing PHP code takes a few milliseconds. That is fast enough to be imperceptible to the end user. But since each of the billions of Web pages in the world takes a few milliseconds on every rendering, billions of milliseconds are consumed for nothing.

The question is: why perform dynamic inclusion when the page is always the same?

Database server

Generally, a Website is associated with a database server. On the Internet, the most commonly used database server is MySQL.

SQL database servers are sophisticated pieces of software that listen on their input port for requests, analyse them, fetch the data from storage and format the result. The queries are written in SQL, an old-fashioned but powerful language that allows really complex queries. These servers are very well optimized, using query optimization, caching, etc.

These servers can store billions of records and access them very quickly with smart and efficient indexing algorithms. They are fault tolerant, meaning that all changes to the storage are transactional: they are either committed or cancelled. In addition, they allow replication of the data.

A CMS generally stores its content in a MySQL server. Each part of a document is stored in some tables and retrieved on rendering.

The question remains: why store content in a database when the content generally changes slowly?

There is a movement on the Internet called NoDB Website design that advocates removing the database server: the data are stored in the Web server file system. Note that this is possible only for small amounts of data; a database server cannot be avoided for millions of records.

HTML syntax

HTML is a relatively simple language, and it is easy to write a Web page directly in HTML. However, this can become tedious for long pages:

<html>
  <head>
  </head>
  <body>
   <h1>Main title</h1>
   <p>This is a paragraph</p>
   <p>And here, a new <strong>paragraph</strong></p>
   <ul>
    <li>And here, an item</li>
    <li>Followed by another item</li>
   </ul>
  </body>
</html>

This is why CMSs are very useful: they generally provide an online rich-text editor for changing the text directly on the site. The generated HTML code, however, can be far from optimized.

Formatting tables can be tedious in HTML:

<html>
  <head>
  </head>
  <body>
   <h1>Table example</h1>
   <table>
    <tr>
     <th>head 1</th>
     <th>head 2</th>
    </tr>
    <tr>
     <td>text 11</td>
     <td>text 12</td>
    </tr>
    <tr>
     <td>text 21</td>
     <td>text 22</td>
    </tr>
    <tr>
     <td>text 31</td>
     <td>text 32</td>
    </tr>
   </table>
  </body>
</html>

The pains

Common Website design suffers from several pains:

Our Web design

This Website is designed with these pains in mind: it uses statically compiled pages, NoDB storage, Ajax and an MVC controller.

PHP is still used for the dynamic parts of the Web pages, such as user login management, and it is used in conjunction with Ajax to update only some parts of the main page.

Static Website compiler HTgen

We have developed a tool to compile Websites. Of course, there are several Website generators, but they generally have a strong impact on the site organization; HTGEN is versatile and does not impose any structure on the Website.

Scheme

HTGEN is written in Scheme, a Lisp dialect. Scheme expressions can be included inside the HTML pages with the special tags <?scm ... ?>.

The embedded Scheme code is executed during the Website generation; in the generated Web pages, there is no trace of the Scheme code.

One of the most useful commands is @include "file": it inserts the content of the file in place of the include command. Of course, includes can be nested.
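
As a rough illustration of what resolving nested includes at compile time means, the following Python sketch expands includes recursively. It is not HTGEN's Scheme implementation: the `expand_includes` function, the in-memory `sources` dictionary and the file names are assumptions made for the example, and the pattern only recognizes the literal form `<?scm (@include "name") ?>`.

```python
import re

# Matches only the literal form <?scm (@include "name") ?>; real Scheme
# expressions are far more general than this simplified pattern.
INCLUDE = re.compile(r'<\?scm\s*\(@include\s+"([^"]+)"\)\s*\?>')

def expand_includes(name, sources, stack=()):
    """Return the text of `name` with every include replaced by the
    recursively expanded content of the included file."""
    if name in stack:
        raise ValueError("circular include: " + " -> ".join(stack + (name,)))
    text = sources[name]
    return INCLUDE.sub(
        lambda m: expand_includes(m.group(1), sources, stack + (name,)),
        text,
    )

# Hypothetical site sources, kept in memory for the example.
sources = {
    "index.html": '<body><?scm (@include "header.html") ?><p>Hi</p></body>',
    "header.html": '<header><?scm (@include "logo.html") ?></header>',
    "logo.html": "<img src='logo.png'>",
}

page = expand_includes("index.html", sources)
```

After expansion, `page` no longer contains any `<?scm ... ?>` tag, so the Web server has nothing left to resolve when the page is requested.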

The other very useful command is @files "search", which returns the list of matching files. With this command, it is possible to obtain a list of files and to process them.
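
The idea of listing matching files and processing them can be sketched in Python as follows. The `files` helper, the `*.md` pattern and the file names are hypothetical; how HTGEN interprets its search argument is not described here.

```python
from pathlib import Path
import tempfile

def files(root, pattern):
    """Return the sorted list of paths under `root` that match `pattern`."""
    return sorted(Path(root).glob(pattern))

# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    for name in ("news-2.md", "news-1.md", "about.html"):
        Path(root, name).write_text("# " + name, encoding="utf-8")

    # Collect the Markdown files and turn each file name into a list item.
    items = ["<li>" + p.stem + "</li>" for p in files(root, "*.md")]

menu = "<ul>" + "".join(items) + "</ul>"
```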

Our index.html file is no more complicated than:

<?scm
 (@set! 'top-page "_top.html")
 (@set! 'title "Home page of Clari3D")
 (@set! 'description "Clari3D is a nice 3D viewer")
 (@set! 'keywords "andéor, 3D, viewer, cad, step, stl, obj, wavefront")
 (@template "_template.html")
?>

<?scm
(for-each (lambda (name)
            (@include (string-append "_index-" name ".html")))
          '("top" "intro" "functions" "webgl" "products"
            "features" "customers" "company" "support"
            "sitemap"  "formats"   "video"   "news"
            "design"))
?>

The generated file does not contain any call to PHP; the Web server therefore sends it as is, without any processing, which is very fast.

Output optimization

We have added a command that generates image tags: it automatically produces a low-resolution image that is loaded first, while the image at the right size is loaded later.
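
One possible shape for such a generated tag is sketched below in Python. Everything specific in it is an assumption: the `image_tag` function, the `-low` file name suffix and the `data-src` attribute (which a small script would later copy into `src`) are illustrative conventions, not HTGEN's actual output, and the sketch does not produce the low-resolution file itself.

```python
from html import escape

def image_tag(src, alt, width, height, preview_suffix="-low"):
    """Build an <img> tag that points first at a low-resolution preview.

    The preview file name is derived from `src` (photo.jpg -> photo-low.jpg);
    the full-size URL is kept in a data attribute for a script to swap in.
    """
    stem, dot, ext = src.rpartition(".")
    preview = stem + preview_suffix + dot + ext if dot else src + preview_suffix
    attributes = [
        ("src", preview),
        ("data-src", src),
        ("alt", alt),
        ("width", str(width)),
        ("height", str(height)),
    ]
    rendered = " ".join(
        '{}="{}"'.format(key, escape(value, quote=True))
        for key, value in attributes
    )
    return "<img " + rendered + ">"

tag = image_tag("img/photo.jpg", "A 3D model", 640, 480)
```

Fixing the width and height in the tag lets the browser reserve the final space while only the preview is loaded.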

In addition, the generated HTML code is optimized in order to reduce its size and so, increase the loading speed.
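
To give an idea of what size optimization involves, here is a deliberately naive Python sketch that removes comments and collapses whitespace. The `minify_html` function is hypothetical and much cruder than a real minifier: it does not protect the content of `<pre>`, `<textarea>`, `<script>` or `<style>` elements, and removing whitespace between inline tags can change the rendering.

```python
import re

def minify_html(html):
    """Very naive minifier: drop comments and collapse whitespace."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    html = re.sub(r">\s+<", "><", html)   # whitespace between tags
    html = re.sub(r"\s+", " ", html)      # remaining runs of whitespace
    return html.strip()

source = """
<ul>
  <!-- menu -->
  <li>And here, an item</li>
  <li>Followed by   another item</li>
</ul>
"""
small = minify_html(source)
```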

Wait and scan

The compiler can be run in scan mode: in this mode, it watches the source directory for changes and recompiles only the files that need recompilation.

This way, if a new file is added to the Website, it is automatically visible.
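
A common way to decide which files need recompilation is to compare modification times, as in this Python sketch. The `needs_rebuild` and `stale_sources` names are hypothetical, and the sketch ignores dependencies between files (a page should also be rebuilt when a file it includes changes), which a real compiler has to track. A scan mode would call `stale_sources` repeatedly, or rely on file system notifications.

```python
import os
import tempfile

def needs_rebuild(source, target):
    """True when `target` is missing or older than `source`."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def stale_sources(src_dir, out_dir):
    """List the source file names whose compiled counterpart is out of date."""
    return sorted(
        name for name in os.listdir(src_dir)
        if needs_rebuild(os.path.join(src_dir, name),
                         os.path.join(out_dir, name))
    )

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for name in ("a.html", "b.html"):
        path = os.path.join(src, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write("<p>" + name + "</p>")
        os.utime(path, (1000, 1000))      # fixed, old modification time

    # a.html was compiled after its source last changed; b.html never was.
    compiled = os.path.join(out, "a.html")
    with open(compiled, "w", encoding="utf-8") as f:
        f.write("<p>a.html</p>")
    os.utime(compiled, (2000, 2000))

    todo = stale_sources(src, out)
```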

The News of this Website, for example, is a collection of Markdown files in a specific directory. The code that statically generates the corresponding Web page runs through these files and generates their HTML counterparts.

If a new file is added, the News part is recompiled, making the changes visible almost immediately, with the difference that this conversion process is executed only once per new file. Generate once, view plenty!

Markdown

Markdown is a text file format with two major features: it is human readable and it can be transformed into HTML easily. It is mainly used by open-source software makers for read-me files and documentation. However, as it is HTML compatible, it can advantageously replace HTML for simple Web pages.

In Markdown, the example given above is:

# Main title

This is a paragraph

And here, a new **paragraph**

* And here, an item
* Followed by another item

We have included a Markdown compiler in HTGEN that compiles plain Markdown pages into HTML, as well as pieces of Markdown text embedded inside HTML code with the <?md ... ?> tag. This capability to mix HTML and Markdown is really interesting.

And Markdown is especially powerful for tables:

#### Table example

| head 1  | head 2  |
|---------|---------|
| text 11 | text 12 |
| text 21 | text 22 |
| text 31 | text 32 |
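
The correspondence between such a pipe table and its HTML counterpart is mechanical, as this minimal Python sketch suggests. The `table_to_html` function is hypothetical and handles only the simple layout used here (one header row, one separator row, then data rows); it does not escape cell content or handle alignment markers.

```python
def table_to_html(markdown):
    """Convert a simple pipe table (header, separator, rows) to HTML."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.strip().splitlines()
    ]
    header, body = rows[0], rows[2:]   # rows[1] is the |---|---| separator
    html = ["<table>"]
    html.append("<tr>" + "".join("<th>" + c + "</th>" for c in header) + "</tr>")
    for row in body:
        html.append("<tr>" + "".join("<td>" + c + "</td>" for c in row) + "</tr>")
    html.append("</table>")
    return "".join(html)

table = """
| head 1  | head 2  |
|---------|---------|
| text 11 | text 12 |
| text 21 | text 22 |
"""
html = table_to_html(table)
```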

Embedded Markdown

A great feature of HTGEN is the ability to mix HTML or PHP and Markdown in the same file.

<div class="split">
  <div>
    <?md
     Here we are in the Markdown world

     | This | is    |
     | a    | table |

     and this is a new paragraph that can be
     written in several lines...
    ?>
  </div>
  <div>
    <?md
     Of course, as Markdown allows itself to include html
     <span class="italic">there is no limit!</span> to the
     integration!
    ?>
  </div>
</div>
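
Processing such a mixed file can be pictured as finding each `<?md ... ?>` block and replacing it with the converted text, leaving the surrounding HTML untouched. The Python sketch below assumes this reading; `render_embedded` is a hypothetical name, and `toy_convert` is only a stand-in that wraps paragraphs in `<p>` tags, not a Markdown compiler.

```python
import re
import textwrap

MD_BLOCK = re.compile(r"<\?md(.*?)\?>", re.DOTALL)

def render_embedded(page, convert):
    """Replace each <?md ... ?> block by convert(markdown_text)."""
    def replace(match):
        # Remove the common indentation inherited from the HTML nesting.
        return convert(textwrap.dedent(match.group(1)).strip())
    return MD_BLOCK.sub(replace, page)

def toy_convert(markdown):
    """Stand-in converter: wraps each blank-line-separated block in <p>."""
    blocks = re.split(r"\n\s*\n", markdown)
    return "".join("<p>" + " ".join(b.split()) + "</p>" for b in blocks)

page = """<div>
  <?md
   Here we are in the Markdown world

   and this is a new paragraph
  ?>
</div>"""
html = render_embedded(page, toy_convert)
```

Removing the common indentation before conversion matters because Markdown treats deeply indented text as a code block.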

Custom JavaScript framework causal.js

There are many JavaScript frameworks, either dedicated to a specific task or more general, such as jQuery, Angular, Dojo, etc.

They are nice, stable, ... and somewhat huge. This is particularly true if a Website needs more than one framework.

So, we have developed our own framework, Causal.js, which automates some tasks such as Ajax queries and has a nice set of user interface widgets. We designed Causal to be a small piece of versatile code that relies on the native OS Web widgets.

Ajax

Ajax is the way to query the Web server for a URL and to obtain the result of this query. Ajax allows some parts of the Web page to be updated dynamically without reloading the whole page. Causal.js offers a great Ajax interface that makes Ajax simpler to use than ever.

Light MVC controller mvc.php

MVC is the acronym for Model-View-Controller. It is a design pattern that is widely used in PHP programming; it separates the model (how to access the data), the view (how to render the Web page) and the controller (how to control the page rendering).

There are several MVC-based PHP frameworks on the market, such as Laravel or Symfony. The main pain with these frameworks is that they have a strong impact on how the Website is structured, and the Web designer is not free to structure the pages as they want.

We have developed a specific MVC controller that has no impact on the site structure and that is very light and versatile.

Light NoDB file system data_file.php

Database storage is the first thought of Website designers, as it is an easy way to implement a solution, and data can be queried with the powerful SQL language.

However, database storage has a drawback: it needs a database server! This server quickly becomes the bottleneck of the Website because all the data come from it. In addition, it forces the pages to be rendered dynamically.

When the amount of data remains small enough (fewer than 500k records), data can advantageously be stored in the file system. However, file storage loses record indexing (file names are indexed by the operating system, but not the internal attributes of a record).

We have designed a custom storage system with two drivers, one for file system storage and one for database storage (they can be swapped at any time because the API is identical).

This system is a relational database engine based on the file system. It supports fast indexing, 'select'-like queries and constraints. We are working on a transaction manager. It is efficient with up to 100,000 records.
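
The principle of indexing records that live in the file system can be illustrated with a toy Python class. This is a sketch under strong assumptions, not the engine described here: the `FileTable` class, the one-JSON-file-per-record layout and the in-memory index (which is not persisted) are all invented for the example.

```python
import json
import os
import tempfile

class FileTable:
    """Toy table: one JSON file per record, plus an index on one attribute."""

    def __init__(self, directory, indexed):
        self.directory = directory
        self.indexed = indexed
        self.index = {}          # attribute value -> set of record ids
        self.next_id = 1

    def _path(self, record_id):
        return os.path.join(self.directory, "%d.json" % record_id)

    def insert(self, values):
        record_id = self.next_id
        self.next_id += 1
        with open(self._path(record_id), "w", encoding="utf-8") as f:
            json.dump(values, f)
        self.index.setdefault(values[self.indexed], set()).add(record_id)
        return record_id

    def select(self, value):
        """Return the records whose indexed attribute equals `value`,
        opening only the matching files."""
        records = []
        for record_id in sorted(self.index.get(value, ())):
            with open(self._path(record_id), encoding="utf-8") as f:
                records.append(json.load(f))
        return records

with tempfile.TemporaryDirectory() as directory:
    users = FileTable(directory, indexed="country")
    users.insert({"name": "Ada", "country": "UK"})
    users.insert({"name": "Blaise", "country": "FR"})
    users.insert({"name": "Alan", "country": "UK"})
    british = [r["name"] for r in users.select("UK")]
```

The index is what avoids opening every record file for a query on an internal attribute, which is exactly what plain file storage lacks.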

More information

The API is defined with a PHP abstract class:

<?php
abstract class _Data {
  /*! debug flag */
  public $debug = false;

  /*! Check a table structure.
   * [attr-name => ['integer' | 'number' | 'string' |
   *                'primary' | 'index' |
   *                'uniq'    | 'not-null']]
   * @return boolean: the check status.
   */
  abstract protected function _check_structure (& $structure);


  // I N T E R F A C E

  /*! Create a table.
   * @param name: the table name,
   * @param structure: the table structure,
   * @return boolean: the status.
   */
  abstract public function create ($name, $structure);

  /*! Test table existence.
   * @param name: the table name,
   * @return boolean: the status.
   */
  abstract public function exists ($name);

  /*! Delete a table.
   * @param name: the table name,
   * @return boolean: the status.
   */
  abstract public function delete ($name);

  /*! Insert a new record in a table.
   * @param table: the table name,
   * @param values: the values,
   * @return record: a record.
   */
  abstract public function insert ($table, $values);

  /*! Remove some records.
   * @param $table: the table,
   * @param $records: the records to remove, as result of a select,
   * @return void.
   */
  abstract public function remove ($table, $records);

  /*! Update the given records of the table with the array of values.
   * @param table: the table,
   * @param records: the records, as a result of select,
   * @param values: the array of values to update,
   * @return void.
   */
  abstract public function update ($table, $records, $values);

  /*! Select the value of the matching records from the table where the
   * attributes equal the given values.
   * @param table the table,
   * @param attributes: array of attributes,
   * @param exprs: optional where expression
   * syntax:
   *   test  := [ '=' | '<>' | '<' | '>' | '<=' | '>=' | 'LIKE' ]
   *   exp   := [ 'ident' test 'value' ]
   *   op    := AND | OR
   *   where := exp
   * @return array or false.
   */
  abstract public function select ($table, $attributes, $exprs = false);
}
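
The where-expression syntax documented for select can be read as a triple of identifier, test and value. The Python sketch below evaluates one such triple against a record held as a dictionary. It is an interpretation, not the authors' code: the `matches` and `like` helpers are hypothetical, LIKE is assumed to follow the usual SQL wildcards, and the AND/OR combination of expressions is left out.

```python
import operator
import re

TESTS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}

def like(value, pattern):
    """SQL-style LIKE: % matches any run of characters, _ exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, str(value), re.DOTALL) is not None

def matches(record, exp):
    """Evaluate one [ident, test, value] expression against a record."""
    ident, test, value = exp
    if test == "LIKE":
        return like(record[ident], value)
    return TESTS[test](record[ident], value)

records = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": 41},
    {"name": "Blaise", "age": 39},
]
older = [r["name"] for r in records if matches(r, ["age", ">=", 39])]
a_names = [r["name"] for r in records if matches(r, ["name", "LIKE", "A%"])]
```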

All the other PHP scripts in our system that access data (mainly the model.php files in our MVC implementation) use this class. At the initialization stage, either the file driver or the database driver is chosen and instantiated. If we decide for some reason to switch from one driver to the other, all the code remains unchanged!
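
The benefit of a shared abstract API can be shown with a reduced Python analogue. The class and function names below (`Data`, `MemoryData`, `SqliteData`, `british_users`) are hypothetical; `MemoryData` merely stands in for the file driver, and the SQLite driver assumes a fixed `users` table and trusted table and column names.

```python
from abc import ABC, abstractmethod
import sqlite3

class Data(ABC):
    """Reduced counterpart of the abstract API: insert and select only."""
    @abstractmethod
    def insert(self, table, values): ...
    @abstractmethod
    def select(self, table, attribute, value): ...

class MemoryData(Data):
    """Stand-in for the file driver; keeps records in plain lists."""
    def __init__(self):
        self.tables = {}
    def insert(self, table, values):
        self.tables.setdefault(table, []).append(dict(values))
    def select(self, table, attribute, value):
        return [r for r in self.tables.get(table, []) if r[attribute] == value]

class SqliteData(Data):
    """Stand-in for the database driver, backed by an in-memory SQLite."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.row_factory = sqlite3.Row
        self.db.execute("CREATE TABLE users (name TEXT, country TEXT)")
    def insert(self, table, values):
        # Table and column names are trusted here; only values are bound.
        columns = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        self.db.execute(
            "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, marks),
            list(values.values()),
        )
    def select(self, table, attribute, value):
        rows = self.db.execute(
            "SELECT * FROM %s WHERE %s = ?" % (table, attribute), (value,)
        )
        return [dict(row) for row in rows]

def british_users(data):
    """Model code: written once against the abstract API."""
    data.insert("users", {"name": "Ada", "country": "UK"})
    data.insert("users", {"name": "Blaise", "country": "FR"})
    return [r["name"] for r in data.select("users", "country", "UK")]

results = [british_users(driver) for driver in (MemoryData(), SqliteData())]
```

The model function never mentions a concrete driver, so choosing one or the other is a single decision made at initialization.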

Conclusion

This exploratory work on Website compilation was very exciting because the results really change the life of Web designers. We have divided the loading time by four compared to the previous PHP-driven Website, even with the one-page design.

In addition, we have more control over the SEO aspects because all the generated pages can take advantage of a new compiler feature as soon as it is developed.

Markdown syntax has dramatically simplified content management, making it simple, readable and manageable. Tables are written almost naturally and the other formatting constructs are easy to use. The possibility of combining Markdown and HTML significantly enhances the power of expression.

The removal of the SQL engine has also reduced the loading time of the Website, eliminating the bottleneck of traditional Websites.

HTGEN is not available for download; however, Andéor does not rule out distributing it as an open-source package.

Links

This Website is generated by HTgen, a Website compiler designed by Andéor.
