ggplotnim overview
The ggplotnim package is really two packages in one: a dataframe library and the plotting library built on top of it.
It may or may not be split into two packages at some point.
The dataframe library was necessary to achieve a ggplot2-like syntax and to provide ways to manipulate the data in the desired fashion.
For successful usage three major things have to be known:
- the DataFrame and the procs working on it (see: formula.nim documentation!)
- the f{} macro to create FormulaNodes, which are needed for the DF procs
- the ggplot syntax
Creating a DF
Usage typically starts with one of the following cases:
- data already available in seq[T] or some Nim object from which such can be created
- some CSV / TSV like ascii file
- some binary file like HDF5
- some database
Note about the last two cases (binary files and databases): simple access (without manually reading into a seq[T]) is not yet supported for either. If there's demand for either of the two, please open an issue. It should be easy to add some helper procs.
From seq[T]
For the case of having the data as seq[T], we just use the seqsToDf proc to create a DF from it. There are two ways to use it. Assuming we have three sequences of possibly different types:
let s1 = @[22, 54, 34]
let s2: seq[float] = @[1.87, 1.75, 1.78]
let s3: seq[string] = @["Mike", "Laura", "Sue"]
we can either create a DF and let the library automatically deduce the column names:
let dfAutoNamed = seqsToDf(s1, s2, s3)
which will give us a DF with column names:
"s1", "s2", "s3"
that is, the identifier of each seq[T] is stringified. In many cases one might prefer a different name. In that case use the following syntax:
let df = seqsToDf({ "Age" : s1, "Height" : s2, "Name" : s3 })
which will then use the given strings for the column names.
From a CSV / TSV file
The second supported case is a CSV-like file. For these the library provides a generalized readCsv proc. Strictly speaking it can also read TSV (or any delimited ascii file) and allows skipping N lines at the beginning.
proc readCsv*(fname: string, sep = ',', header = "#", skipLines = 0): OrderedTable[string, seq[string]]
Note that the header argument is only used to remove the header delimiter from the first line in the file. The header is always the first line! Also note that it returns not a DF, but an OrderedTable of string values only. To get a DF from this, we have to call toDf on the result. Let's use it to read the provided mpg dataset:
let df = toDf(readCsv("../data/mpg.csv"))
toDf will try to determine the types of the columns. It first assumes an element might be an integer and starts with parseInt. If that fails, parseFloat is attempted. If that fails too, the element (and the rest of the column) is parsed as a string. This also means that at the moment it's not possible to parse bool values as VBool from an ascii file! The downsides of this approach are visible e.g. when reading the msleep dataset. Since it contains many missing values ("NA"), the columns which contain those will store the other float values as strings (this might be changed soon; at the moment every column is an "object column" in pandas terms anyway).
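The fallback order described above (integer, then float, then string) can be sketched with the standard library alone. This is a minimal illustration and not the implementation used by toDf; the names ColKind, guessKind and guessColumnKind are hypothetical.

```nim
import std/strutils

type
  ColKind = enum   # ordered from narrowest to widest
    ckInt, ckFloat, ckString

proc guessKind(s: string): ColKind =
  ## Try parseInt first, then parseFloat, and fall back to string.
  try:
    discard parseInt(s)
    return ckInt
  except ValueError:
    discard
  try:
    discard parseFloat(s)
    return ckFloat
  except ValueError:
    discard
  result = ckString

proc guessColumnKind(col: seq[string]): ColKind =
  ## A column takes the widest kind found among its elements.
  result = ckInt
  for el in col:
    result = max(result, guessKind(el))
    if result == ckString:
      break

echo guessColumnKind(@["1", "2", "3"])     # ckInt
echo guessColumnKind(@["1.5", "2", "3"])   # ckFloat
echo guessColumnKind(@["1.5", "NA", "3"])  # ckString
```

The last call shows the msleep situation: a single "NA" is enough to turn an otherwise numeric column into a string column.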
Manipulating a DF
Now we have a DF. What then?
First of all we can look at it. Echoing a DF calls the pretty proc. For the DF introduced above, this looks like:
echo df
gives for the mpg dataset:
Dataframe with 11 columns and 234 rows:
Idx  manufacturer  model            displ  ...  cyl  ...  drv  cty  hwy  fl  class
0    audi          a4               1.8    ...  4    ...  f    18   29   p   compact
1    audi          a4               1.8    ...  4    ...  f    21   29   p   compact
2    audi          a4               2      ...  4    ...  f    20   31   p   compact
3    audi          a4               2      ...  4    ...  f    21   30   p   compact
4    audi          a4               2.8    ...  6    ...  f    16   26   p   compact
5    audi          a4               2.8    ...  6    ...  f    18   26   p   compact
6    audi          a4               3.1    ...  6    ...  f    18   27   p   compact
7    audi          a4 quattro       1.8    ...  4    ...  "4"  18   26   p   compact
8    audi          a4 quattro       1.8    ...  4    ...  "4"  16   25   p   compact
9    audi          a4 quattro       2      ...  4    ...  "4"  20   28   p   compact
10   audi          a4 quattro       2      ...  4    ...  "4"  19   27   p   compact
11   audi          a4 quattro       2.8    ...  6    ...  "4"  15   25   p   compact
12   audi          a4 quattro       2.8    ...  6    ...  "4"  17   25   p   compact
13   audi          a4 quattro       3.1    ...  6    ...  "4"  17   25   p   compact
14   audi          a4 quattro       3.1    ...  6    ...  "4"  15   25   p   compact
15   audi          a6 quattro       2.8    ...  6    ...  "4"  15   24   p   midsize
16   audi          a6 quattro       3.1    ...  6    ...  "4"  17   25   p   midsize
17   audi          a6 quattro       4.2    ...  8    ...  "4"  16   23   p   midsize
18   chevrolet     c1500 suburb...  5.3    ...  8    ...  r    14   20   r   suv
19   chevrolet     c1500 suburb...  5.3    ...  8    ...  r    11   15   e   suv
(NOTE: I shortened the output for the docs here) Notice how in the drv column the 4WD entries are echoed as "4" instead of just 4. That is to highlight that those values are actually stored as VString.
By default only the first 20 entries will be shown. For more/less elements, call pretty directly:
echo df.pretty(100)
pretty also takes a precision argument. This is given to the string conversion for VFloat values to set the number of digits printed after the decimal point. However, it can also be used to change the width of the columns more generally. Note however the precision is added to a width of 6 by default. Also the column is at least as wide as the longest DF key.
Let's now check which cars in the dataset have the highest and lowest city fuel economy. For that we can simply arrange the dataframe according to the cty column and take the tail or head of the result.
echo df.arrange("cty").head(5)
results in:
Dataframe with 11 columns and 5 rows:
Idx  manufacturer  model            displ  ...  cyl  ...  drv  cty  hwy  fl  class
0    dodge         dakota picku...  4.7    ...  8    ...  "4"  9    12   e   pickup
1    dodge         durango 4wd      4.7    ...  8    ...  "4"  9    12   e   suv
2    dodge         ram 1500 pic...  4.7    ...  8    ...  "4"  9    12   e   pickup
3    dodge         ram 1500 pic...  4.7    ...  8    ...  "4"  9    12   e   pickup
4    jeep          grand cherok...  4.7    ...  8    ...  "4"  9    12   e   suv
and looking at the tail instead:
echo df.arrange("cty").tail(5)
will tell us that a new beetle is the most efficient car in the dataset:
Dataframe with 11 columns and 5 rows:
Idx  manufacturer  model            displ  ...  cyl  ...  drv  cty  hwy  fl  class
0    honda         civic            1.6    ...  4    ...  f    28   33   r   subcompact
1    toyota        corolla          1.8    ...  4    ...  f    28   37   r   compact
2    volkswagen    new beetle       1.9    ...  4    ...  f    29   41   d   subcompact
3    volkswagen    jetta            1.9    ...  4    ...  f    33   44   d   compact
4    volkswagen    new beetle       1.9    ...  4    ...  f    35   44   d   subcompact
(arrange also takes an order argument, using the stdlib's SortOrder enum).
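As a hedged sketch of the order argument: the snippet below assumes that the parameter is named order and that seqsToDf is available as in the overview above. The three rows are a small stand-in for the mpg dataset, with values copied from the outputs shown earlier.

```nim
import std/algorithm   # provides the SortOrder enum
import ggplotnim

# Small stand-in for the mpg dataset.
let small = seqsToDf({ "model" : @["a4", "civic", "corvette"],
                       "cty" : @[18, 28, 15] })

# Sort so that the highest city fuel economy comes first.
let sortedDf = small.arrange("cty", order = SortOrder.Descending)
echo sortedDf
```

With a descending order, head gives the most efficient cars directly, so no call to tail is needed.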
As another example here to showcase the usage of FormulaNodes, let's find some cars with an engine displacement of more than 5 L and which are 2 seaters (I wonder what car might show up…):
echo df.filter(f{"displ" > 5.0 and "class" == "2seater"})
Dataframe with 11 columns and 5 rows:
Idx  manufacturer  model            displ  ...  cyl  ...  drv  cty  hwy  fl  class
0    chevrolet     corvette         5.7    ...  8    ...  r    16   26   p   2seater
1    chevrolet     corvette         5.7    ...  8    ...  r    15   23   p   2seater
2    chevrolet     corvette         6.2    ...  8    ...  r    16   26   p   2seater
3    chevrolet     corvette         6.2    ...  8    ...  r    15   25   p   2seater
4    chevrolet     corvette         7      ...  8    ...  r    15   24   p   2seater
Surprise, surprise we found ourselves a bunch of corvettes!
Finally, let's make use of a formula, which takes an assignment. Let's say we want to convert the city fuel economy of the cars from MPG to L/100 km as is the standard in Germany. We'll do this with mutate. mutate will add an additional column to the dataframe. (well, if only it was clear whether the mpg given are US gallon or imperial gallon?)
let dfl100km = df.filter(f{"displ" > 5.0 and "class" == "2seater"})
  .mutate(f{"cty / L/100km" ~ 235 / "cty"})
echo dfl100km.pretty(5)
shows us:
Dataframe with 12 columns and 5 rows:
Idx  manufacturer  model     displ  ...  trans       ...  cty  ...  cty / L/100km
0    chevrolet     corvette  5.7    ...  manual(m6)  ...  16   ...  14.69
1    chevrolet     corvette  5.7    ...  auto(l4)    ...  15   ...  15.67
2    chevrolet     corvette  6.2    ...  manual(m6)  ...  16   ...  14.69
3    chevrolet     corvette  6.2    ...  auto(s6)    ...  15   ...  15.67
4    chevrolet     corvette  7      ...  manual(m6)  ...  15   ...  15.67
where I removed a couple of columns for better visibility.
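The factor 235 in the formula above is a rounded conversion constant. As a rough sketch of where it comes from, assuming US gallons (3.785411784 L) and statute miles (1.609344 km); mpgToL100km and the constants are hypothetical helper names and not part of ggplotnim:

```nim
const
  LitersPerUsGallon = 3.785411784
  LitersPerImpGallon = 4.54609
  KmPerMile = 1.609344

proc mpgToL100km(mpg: float; litersPerGallon = LitersPerUsGallon): float =
  ## Litres needed for 100 km, given miles per gallon.
  100.0 * litersPerGallon / (KmPerMile * mpg)

echo mpgToL100km(16.0)                      # about 14.7 (US gallon)
echo mpgToL100km(16.0, LitersPerImpGallon)  # about 17.7 (imperial gallon)
```

With imperial gallons the constant is roughly 282 instead of 235, which is why the gallon definition matters for the result.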
I used the chaining of filter and mutate above mainly to showcase that this works reliably. However, there's no magic happening to optimize any chaining!
When looking at the formula above note that as in ggplot2 the tilde ~ is used to indicate a dependency.
Finally, it should be mentioned that it's also possible to call procs inside formulas. Two kinds of procs are supported: either a proc takes a seq[T] and returns a T, or it takes a T and returns a T. These have to be lifted to work with PersistentVector[Value]. Helper templates to lift normal procs are provided. See formula.nim and check for lift<X><Y>Proc, where X is one of {Scalar, Vector} and Y is one of {Int, Float}.
ggplotnim continues to make the dataframe available, of course.
Procs
proc `+`(p: GgPlot; annot: Annotation): GgPlot {.raises: [], tags: [].}
- adds the given Annotation to the GgPlot object
proc `+`(p: GgPlot; d: Draw) {.raises: [ValueError, Exception, KeyError, FormulaMismatchError, AestheticError, TimeFormatParseError, TimeParseError, OSError, IOError, PixieError], tags: [RootEffect, TimeEffect, WriteDirEffect, ReadDirEffect, WriteIOEffect, ReadEnvEffect, ReadIOEffect, ExecIOEffect].}
proc `+`(p: GgPlot; dateScale: DateScale): GgPlot {.raises: [], tags: [].}
- Add the given DateScale to the plot, which means filling the optional dateScale field
proc `+`(p: GgPlot; facet: Facet): GgPlot {.raises: [], tags: [].}
- adds the given facet to the GgPlot object
proc `+`(p: GgPlot; geom: Geom): GgPlot {.raises: [], tags: [].}
- adds the given geometry to the GgPlot object
proc `+`(p: GgPlot; jsDraw: JsonDummyDraw) {.raises: [ValueError, Exception, KeyError, FormulaMismatchError, AestheticError, TimeFormatParseError, TimeParseError, IOError], tags: [RootEffect, TimeEffect, WriteIOEffect].}
- generate a JSON file from the given filename by replacing the file extension by .json and converting the Viewport to JSON after calling ggcreate. Used for CI.
proc `+`(p: GgPlot; ridges: Ridges): GgPlot {.raises: [], tags: [].}
- adds the given ridges to the GgPlot object
proc annotate(text: string; left = NaN; bottom = NaN; x = NaN; y = NaN; font = font(12.0); rotate = 0.0; backgroundColor = white): Annotation {.raises: [ValueError], tags: [].}
-
creates an annotation of text with a background backgroundColor (by default white) using the given font. Line breaks are supported. It is placed either at:
- (left, bottom), where these correspond to relative coordinates mapping out the plot area as (0.0, 1.0). NOTE: smaller and larger values than 0.0 and 1.0 are supported and will put the annotation outside the plot area.
- (x, y) where x and y are values in the scale of the data being plotted. This is useful if the annotation is to be placed relative to specific data points. NOTE: for a discrete axis data scale is not well defined, thus we fall back to relative scaling on that axis!
In principle you can mix and match left/x and bottom/y! If both are given the former will be prioritized.
NOTE: using rotate together with a background is currently broken.
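A short usage sketch of the two placement modes, based only on the annotate signature shown above. It assumes that seqsToDf is available as in the overview and that aes accepts named x and y arguments; the data and the output file name are made up for illustration.

```nim
import ggplotnim

# Hypothetical toy data.
let df = seqsToDf({ "x" : @[1.0, 2.0, 3.0], "y" : @[2.0, 4.0, 3.0] })

ggplot(df, aes(x = "x", y = "y")) +
  geom_point() +
  # Relative coordinates: fractions of the plot area.
  annotate("relative placement", left = 0.1, bottom = 0.9) +
  # Data coordinates: placed next to the data point (2.0, 4.0).
  annotate("data placement", x = 2.0, y = 4.0) +
  ggsave("annotated.png")
```

Giving left together with y (or x together with bottom) mixes both modes, as described above.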
func backgroundColor[C: PossibleColor](color: C = grey92): Theme
- Sets the background color of the plotting area to color.
proc canvasColor[C: PossibleColor](color: C): Theme
- sets the canvas color of the plot to the given color
proc drawAnnotations(view: var Viewport; p: GgPlot) {.raises: [ValueError, Exception], tags: [].}
- draws all annotations from p onto the mutable view view.
proc facet_wrap[T: FormulaNode | string](fns: varargs[T]; scales = "fixed"): Facet
proc facetMargin[T: Quantity | SomeNumber](margin: T; quantityKind = ukCentimeter): Theme
- Sets the margin around each subplot when using faceting. The value can either be given directly as a Quantity, in which case the user has control over absolute / relative quantities, or as a number. In the latter case the number is interpreted in centimeters!
func fillIds(aes: Aesthetics; gids: set[uint16]): Aesthetics {.raises: [], tags: [].}
proc generateRidge(view: Viewport; ridge: Ridges; p: GgPlot; filledScales: FilledScales; theme: Theme; hideLabels = false; hideTicks = false) {.raises: [KeyError, ValueError, Exception, TimeFormatParseError, TimeParseError], tags: [RootEffect, TimeEffect].}
proc geom_bar[C: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); alpha: A = missing(); position = "stack"; stat = "count"): Geom
proc geom_errorbar[C: PossibleColor; S: PossibleFloat; LT: PossibleLineType]( aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); lineType: LT = missing(); errorBarKind = ebLinesT; stat = "identity"; bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
-
NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
Possible LineTypes are:
ltNone, ltSolid, ltDashed, ltDotted, ltDotDash, ltLongDash, ltTwoDash
Possible ErrorBarKind are:
ebLines, ebLinesT
proc geom_freqpoly[C: PossibleColor; S: PossibleFloat; LT: PossibleLineType; FC: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); lineType: LT = missing(); fillColor: FC = missing(); alpha: A = missing(); bins = 30; binWidth = 0.0; breaks: seq[float] = @[]; position = "identity"; stat = "bin"; binPosition = "center"; binBy = "full"; density = false): Geom
proc geom_histogram[C: PossibleColor; FC: PossibleColor; LW: PossibleFloat; LT: PossibleLineType; A: PossibleFloat]( aes: Aesthetics = aes(); data = DataFrame(); binWidth = 0.0; bins = 30; breaks: seq[float] = @[]; color: C = missing(); fillColor: FC = missing(); alpha: A = missing(); position = "stack"; stat = "bin"; binPosition = "left"; binBy = "full"; density = false; lineWidth: LW = some(0.2); lineType: LT = some(ltSolid); hdKind: HistogramDrawingStyle = hdBars): Geom
proc geom_line[C: PossibleColor; S: PossibleFloat; LT: PossibleLineType; FC: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); lineType: LT = missing(); fillColor: FC = missing(); alpha: A = missing(); stat = "identity"; bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
proc geom_linerange[C: PossibleColor; S: PossibleFloat; LT: PossibleLineType]( aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); lineType: LT = missing(); stat = "identity"; bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
- NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
proc geom_point[C: PossibleColor; S: PossibleFloat; M: PossibleMarker; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); marker: M = missing(); stat = "identity"; bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false; alpha: A = missing()): Geom
- NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
proc geom_raster[C: PossibleColor; S: PossibleFloat; FC: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); fillColor: FC = missing(); alpha: A = missing(); size: S = missing(); stat = "identity"; bins = 30; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
- NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
proc geom_smooth[C: PossibleColor; S: PossibleFloat; LT: PossibleLineType; FC: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); lineType: LT = missing(); fillColor: FC = missing(); alpha: A = missing(); span = 0.7;
    smoother = "svg"; ## the smoothing method to use `svg`, `lm`, `poly`
    polyOrder = 5; ## polynomial order to use (no effect for `lm`)
    bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
-
Draws a smooth line that is a filtered version of the x and y data given as the aes of the plot (or to this geom).
Note: if the input data is considered to be discrete (either manually or automatically if it's an integer column), a ValueError will be raised at runtime as smoothing is incompatible with a discrete plot and thus would lead to an undesirable outcome.
proc geom_text[C: PossibleColor; S: PossibleFloat; M: PossibleMarker; A: PossibleFloat; F: PossibleFont](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); size: S = missing(); marker: M = missing(); alpha: A = missing(); font: F = missing(); alignKind = taCenter; stat = "identity"; bins = -1; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
- NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
proc geom_tile[C: PossibleColor; S: PossibleFloat; FC: PossibleColor; A: PossibleFloat](aes: Aesthetics = aes(); data = DataFrame(); color: C = missing(); fillColor: FC = missing(); alpha: A = missing(); size: S = missing(); stat = "identity"; bins = 30; binWidth = 0.0; breaks: seq[float] = @[]; binPosition = "none"; position = "identity"; binBy = "full"; density = false): Geom
- NOTE: When using a different position than identity, be careful reading the plot! If N classes are stacked and an intermediate class has no entries, it will be drawn on top of the previous value!
proc ggcreate[T: SomeNumber](p: GgPlot; width: T = 640.0; height: T = 480.0): PlotView
-
Applies all calculations to the GgPlot object required to draw the plot with the selected backend (either determined via the file type in ggsave or handed manually to ggplot) and returns a PlotView.
The PlotView contains the final Scales built from the GgPlot object and all its geoms plus the final ginger.Viewport which only has to be drawn to produce the plot.
This proc is useful to investigate the data structure that results before actually producing an output file or to combine multiple plots into a combined viewport.
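A minimal sketch of splitting plot creation from drawing, written against the ggcreate and ggdraw signatures in this listing. It assumes that bkCairo is reachable through import ggplotnim and that seqsToDf is available; the data and file name are hypothetical.

```nim
import ggplotnim

# Hypothetical toy data.
let df = seqsToDf({ "x" : @[1.0, 2.0, 3.0], "y" : @[2.0, 4.0, 3.0] })

# The backend is handed to ggplot because no ggsave call is involved here.
let plt = ggplot(df, aes(x = "x", y = "y"), backend = bkCairo) + geom_point()

# Build the final scales and viewport without writing anything yet.
let pv = ggcreate(plt, width = 640.0, height = 480.0)

# pv can be inspected at this point; drawing is a separate step.
ggdraw(pv, "scatter.png")
```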
proc ggdraw(plt: PlotView; fname: string; texOptions: TeXOptions = TeXOptions()) {.raises: [Exception, PixieError, ValueError, OSError, IOError], tags: [WriteIOEffect, RootEffect, ReadEnvEffect, ReadIOEffect, ReadDirEffect, ExecIOEffect, TimeEffect].}
- draws the viewport of the given PlotView and stores it in fname. It assumes that plt was created from a GgPlot object with ggcreate
proc ggdraw(view: Viewport; fname: string; texOptions: TeXOptions = TeXOptions()) {.raises: [Exception, PixieError, ValueError, OSError, IOError], tags: [WriteIOEffect, RootEffect, ReadEnvEffect, ReadIOEffect, ReadDirEffect, ExecIOEffect, TimeEffect].}
- draws the given viewport and stores it in fname. It assumes that the view was created as the field of a PlotView object from a GgPlot object with ggcreate
proc ggjson(fname: string; width = 640.0; height = 480.0; backend = bkCairo): JsonDummyDraw {.raises: [], tags: [].}
proc ggmulti(plts: openArray[GgPlot]; fname: string; width = 640; height = 480; widths: seq[int] = @[]; heights: seq[int] = @[]; useTeX = false; onlyTikZ = false; standalone = false; texTemplate = ""; caption = ""; label = ""; placement = "htbp") {.raises: [ValueError, Exception, KeyError, FormulaMismatchError, AestheticError, TimeFormatParseError, TimeParseError, PixieError, OSError, IOError], tags: [RootEffect, TimeEffect, WriteIOEffect, ReadEnvEffect, ReadIOEffect, ReadDirEffect, ExecIOEffect].}
-
Creates a simple multi plot in a grid. Currently no smart layouting. If widths and heights are given, each is expected to be a sequence with one entry per plot. The width/height at a given index is used for the corresponding plot.
For an explanation of the TeX arguments, see the ggsave docstring.
proc ggplot(data: DataFrame; aes: Aesthetics = aes(); backend = bkNone): GgPlot {.raises: [], tags: [].}
- Note: The backend argument is required when using ggcreate with a ggplot argument without ggsave. All string related placements require knowledge of a backend to compute absolute positions.
proc ggsave(fname: string; width = 640.0; height = 480.0; useTeX = false; onlyTikZ = false; standalone = false; texTemplate = ""; caption = ""; label = ""; placement = "htbp"; backend = bkNone): Draw {.raises: [], tags: [].}
-
Generates the plot and saves it as fname with the given width and height.
Possible file types:
- png
- svg
- tex
The backend argument can be used to override the logic that is normally used to determine the backend used to save the figures. Note that each backend supports only certain file types. Currently there are no safeguards for mismatching backends and file types! Available backends:
- bkCairo: default backend, supports png, pdf, svg, jpg
- bkTikZ: TikZ backend to generate native LaTeX files or PDFs from TeX
- bkPixie: pure Nim backend supporting png
- bkVega: Vega-Lite backend for interactive graphs in the browser
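Since there are no safeguards yet, a user-side check before calling ggsave may help. The following is only a sketch: the Backend enum is a hypothetical stand-in for the library's backend kinds, and the supported extensions are taken from the list above (the Vega-Lite backend is left out because its file types are not stated here).

```nim
import std/[os, strutils]

type
  Backend = enum   # hypothetical stand-in, not the library's own enum
    Cairo, TikZ, Pixie

proc supports(b: Backend; fname: string): bool =
  ## True if the file extension of fname is listed for backend b.
  let ext = fname.splitFile.ext.toLowerAscii
  case b
  of Cairo: ext in [".png", ".pdf", ".svg", ".jpg"]
  of TikZ: ext in [".tex", ".pdf"]
  of Pixie: ext == ".png"

echo supports(Pixie, "plot.png")  # true
echo supports(Pixie, "plot.pdf")  # false
echo supports(TikZ, "plot.tex")   # true
```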
If the output file is to be stored as a pdf, useTeX decides whether to create the file using Cairo or a local LaTeX installation (by default system xelatex if available, with pdflatex as the fallback). In case useTeX is true, standalone is always taken to be true (unless a texTemplate is given).
If the file type is tex, onlyTikZ determines whether to output only the TikZ code to a file or create a full TeX document (that can be directly compiled).
Further, standalone decides what kind of document is created in case onlyTikZ is false. standalone means it's meant as a TeX file that only contains the plot and produces a cropped plot upon compilation. If standalone is false the document is an article.
The priority of onlyTikZ, standalone and texTemplate is as follows:
- texTemplate: if given will be used regardless of the others
- onlyTikZ: higher precedence than standalone
- standalone: only chosen if above two are false / empty
Further, if a texTemplate is given, that template is used to embed the TikZ code. The template must contain a single $# for the location at which the TikZ code is to be embedded.
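The priority rules and the $# placeholder can be illustrated with the standard library alone. This is a sketch of the documented behavior and not the library's code: TexOutput and chooseTexOutput are hypothetical names, and filling the template with the % operator from std/strutils is an assumption about how such a placeholder is typically substituted.

```nim
import std/strutils

type
  TexOutput = enum
    tkTemplate, tkOnlyTikZ, tkStandalone, tkArticle

proc chooseTexOutput(texTemplate: string; onlyTikZ, standalone: bool): TexOutput =
  ## Documented priority: texTemplate, then onlyTikZ, then standalone.
  if texTemplate.len > 0: tkTemplate
  elif onlyTikZ: tkOnlyTikZ
  elif standalone: tkStandalone
  else: tkArticle

# A custom template with a single `$#` marking where the TikZ code goes.
const myTemplate = """
\documentclass{article}
\usepackage{tikz}
\begin{document}
$#
\end{document}
"""

let tikzCode = r"\begin{tikzpicture}\draw (0,0) -- (1,1);\end{tikzpicture}"

echo chooseTexOutput(myTemplate, onlyTikZ = true, standalone = true)  # tkTemplate
echo chooseTexOutput("", onlyTikZ = true, standalone = true)          # tkOnlyTikZ
echo chooseTexOutput("", onlyTikZ = false, standalone = false)        # tkArticle
echo myTemplate % [tikzCode]
```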
Finally, if a caption and / or label are given, the output will wrap the tikzpicture in a figure environment, with placement options placement.
The default TeX templates are found here: https://github.com/Vindaar/ginger/blob/master/src/ginger/backendTikZ.nim#L244-L274
The required TeX packages are thus: inputenc, geometry, unicode-math, amsmath, siunitx, tikz.
Note: unicode-math is incompatible with pdflatex!
Note 2: Placing text on the TikZ backend comes with some quirks:
1. Text placement may be slightly different than on the Cairo backend, as we currently use a hack to determine string widths / heights based on font size alone. ginger needs an overhaul to handle embedding of coordinates into viewports to keep string width / height information until the locations are written to the output file (so that we can make use of text size information straight from TeX).
2. Text is placed into TikZ node elements. These have some quirky behavior for more complex LaTeX constructs. E.g. it is not really possible to use an equation environment in them (leads to "Missing $ inserted" errors).
3. Because of the hacky string width / height determination, placing a non-transparent background for annotations leads to background rectangles that are too small. Keep the background color transparent for the time being.
4. Do not include line breaks \n in your annotations if you wish to let LaTeX handle line breaks for you. Any manual line break \n will be handled by ginger. Due to the string height hack, this can give somewhat ugly results.
proc ggsave(p: GgPlot; fname: string; width = 640.0; height = 480.0; texOptions: TeXOptions; backend: BackendKind = bkNone) {.raises: [ValueError, Exception, KeyError, FormulaMismatchError, AestheticError, TimeFormatParseError, TimeParseError, OSError, IOError, PixieError], tags: [RootEffect, TimeEffect, WriteDirEffect, ReadDirEffect, WriteIOEffect, ReadEnvEffect, ReadIOEffect, ExecIOEffect].}
-
This is the same as the ggsave proc that returns a Draw (documented above), for the use case of calling it directly on a GgPlot object with a TeXOptions object.
See the docstring there.
proc ggsave(p: GgPlot; fname: string; width = 640.0; height = 480.0; useTeX = false; onlyTikZ = false; standalone = false; texTemplate = ""; caption = ""; label = ""; placement = "htbp") {.raises: [ValueError, Exception, KeyError, FormulaMismatchError, AestheticError, TimeFormatParseError, TimeParseError, OSError, IOError, PixieError], tags: [RootEffect, TimeEffect, WriteDirEffect, ReadDirEffect, WriteIOEffect, ReadEnvEffect, ReadIOEffect, ExecIOEffect].}
-
This is the same as the ggsave proc that returns a Draw (documented above), for the use case of calling it directly on a GgPlot object with the individual TeX options.
See the docstring there.
proc ggvegatex(fname: string; width = 640.0; height = 480.0; caption = ""; label = ""; placement = "htbp"): VegaTeX {.raises: [], tags: [].}
-
Generates two versions of the given plot. The filename should not contain any extension. We will generate a .tex file and a .json file.
The TeX file is supposed to be inserted (using \input) into the LaTeX file.
The JSON file should be stored as a GitHub gist from which it can be imported into the Vega-Lite viewer.
NOTE: The width and height arguments don't have any purpose at this time.
func gridLineColor[C: PossibleColor](color: C = white): Theme {.deprecated: "Use the `gridLines` procedure to set the grid line color among other things.".}
- Sets the color of the grid lines.
func gridLines[C: PossibleColor](enable = true; width = Inf; color: C = white; onlyAxes = false): Theme
-
Adds major grid lines to a plot
If width != Inf will use the value, else default of 1pt.
The color may also be changed with this proc if multiple changes are to be made.
If onlyAxes is true and enable is false, we will only draw the actual axes and no grid lines.
proc hideLegend(): Theme {.raises: [], tags: [].}
- hides the legend, even if it would otherwise be required
proc legendOrder(idx: seq[int]): Theme {.raises: [], tags: [].}
- uses the ordering given by the indices idx to arrange the order of the legend entries. idx needs to have as many elements as there are legend entries. The default ordering is lexical ordering. Any custom ordering is a custom permutation of that. TODO: instead of this the legend creation needs to be refactored! This is an experimental, untested feature. A better solution that does not require the user to be keenly aware of the "correct" order of the legend is required. For the time being this is better than nothing though.
proc legendPosition(x = 0.0; y = 0.0): Theme {.raises: [], tags: [].}
- puts the legend at position (x, y) in relative coordinates of the plot viewport in range (0.0 .. 1.0)
proc margin[T: string | UnitKind](left, right, top, bottom = NaN; unit: T = ukCentimeter): Theme
-
Sets the margin around the actual plot. By default the given values are interpreted as cm. This can be changed using the unit argument, either directly as a UnitKind (ukCentimeter, ukInch, ukPoint) or as a string. Allowed string inputs:
- cm: margin quantity in centimeter
- in, inch: margin quantity in inch
- pt, point, px, pixel: margin quantity in points
- r, rel, relative: margin quantity as relative values
func minorGridLines(enable = true; width = Inf): Theme {.raises: [], tags: [].}
-
Adds minor grid lines to a plot (i.e. between the major grid lines of half width)
If width is != Inf will use the given width. Else will compute to half the width of the major lines.
proc orNoneScale[T: string | SomeNumber | FormulaNode](s: T; scKind: static ScaleKind; axKind = akX; hasDiscreteness = false): Option[Scale]
- returns either a some(Scale) of kind ScaleKind or none[Scale] if s is empty
proc prefer_columns(): Theme {.raises: [], tags: [].}
- Sets the preference in a facet to be num(cols) > num(rows)
proc prefer_rows(): Theme {.raises: [], tags: [].}
- Sets the preference in a facet to be num(rows) > num(cols)
proc scale_alpha_continuous(alphaRange = DefaultAlphaRange): Scale {.raises: [], tags: [].}
- Allows setting the alpha scale to continuous values and optionally setting the range of allowed values in alphaRange.
proc scale_alpha_discrete(alphaRange = DefaultAlphaRange): Scale {.raises: [], tags: [].}
- Allows setting the alpha scale to discrete values and optionally setting the range of allowed values in alphaRange.
proc scale_alpha_identity(col = ""): Scale {.raises: [], tags: [].}
- Given a column col, will treat the values inside the column as 'identity values'. I.e. the values will be used for the values of the associated aesthetic, even if it's a size or color scale. (in this case the alpha scale).
proc scale_color_continuous(name: string = ""; scale: ginger.Scale = (low: 0.0, high: 0.0)): Scale {.raises: [], tags: [].}
proc scale_color_gradient(scale: ColorScale | seq[uint32]; name: string = "custom"): Scale
-
Allows customizing the color gradient used for the color aesthetic.
Either call one of:
- viridis(), magma(), plasma(), inferno()
as an argument to the scale argument or hand your own custom color scale (as uint32 colors).
Construction of uint32 colors can be done "by hand": Assuming alpha, r, g, b are uint8 values:
let c = alpha.uint32 shl 24 or r.uint32 shl 16 or g.uint32 shl 8 or b.uint32
or by constructing it as a hex color directly:
let c = 0xaa_ff_00_ff'u32 # hex color "#FF00FF" with AA alpha
or finally by converting from a chroma or a stdlib/colors color.
Note: chroma does not have a type that stores uint32 colors, so the conversion must be done by hand. The stdlib / colors module uses int64 values for whatever reason. For those you can just convert them to uint32. Both of these options provide your typical "named CSS-like" colors.
The name argument is only used in case a seq[uint32] is given and is not particularly important.
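The "by hand" construction can be checked with a few lines of standard-library Nim. This is a sketch with a hypothetical helper name (toArgb). Each uint8 channel is widened to uint32 before shifting, because a shift on a uint8 yields a uint8 again and would lose the value.

```nim
import std/strutils

proc toArgb(alpha, r, g, b: uint8): uint32 =
  ## Packs the four channels as 0xAARRGGBB.
  (alpha.uint32 shl 24) or (r.uint32 shl 16) or (g.uint32 shl 8) or b.uint32

let c = toArgb(0xaa, 0xff, 0x00, 0xff)
doAssert c == 0xaa_ff_00_ff'u32   # same value as the hex literal form
echo c.toHex                      # AAFF00FF
```

A seq[uint32] built this way is the kind of value the scale argument accepts as a custom color scale.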
proc scale_color_identity(col = ""): Scale {.raises: [], tags: [].}
- Given a column col, will treat the values inside the column as 'identity values'. I.e. the values will be used for the values of the associated aesthetic, even if it's a size or color scale. (in this case the color scale).
proc scale_color_manual[T](values: Table[T, Color]): Scale
- allows setting custom colors by handing a table mapping the keys found in the color column to colors.
proc scale_fill_continuous(name: string = ""; scale: ginger.Scale = (low: 0.0, high: 0.0)): Scale {.raises: [], tags: [].}
-
Forces the fill scale to be continuous.
If a scale is given, the fill scale range will be drawn in the given range.
proc scale_fill_discrete(name: string = ""): Scale {.raises: [], tags: [].}
- Forces the fill scale to be discrete.
proc scale_fill_gradient(scale: ColorScale | seq[uint32]; name: string = "custom"): Scale
-
Allows to customize the color gradient used for the fill aesthetic.
Either call one of:
- viridis(), magma(), plasma(), inferno()
as an argument to the scale argument or hand your own custom color scale (as uint32 colors).
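Construction of uint32 colors can be done "by hand": Assuming alpha, r, g, b are uint8 values (each widened to uint32 before shifting, since a shift on a uint8 stays a uint8):
let c = alpha.uint32 shl 24 or r.uint32 shl 16 or g.uint32 shl 8 or b.uint32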
or by constructing it as a hex color directly:
let c = 0xaa_ff_00_ff'u32 # hex color "#FF00FF" with AA alpha
or finally by converting from a chroma or a stdlib/colors color.
Note: chroma does not have a type that stores uint32 colors, so the conversion must be done by hand. The stdlib / colors module uses int64 values for whatever reason. For those you can just convert them to uint32. Both of these options provide your typical "named CSS-like" colors.
The name argument is only used if a seq[uint32] is given and is not particularly important.
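As an illustration of converting a stdlib/colors color, the following sketch goes through extractRGB from std/colors and adds an alpha channel by hand. The helper toArgb is hypothetical (not part of ggplotnim) and assumes the 0xAARRGGBB layout described above.

```nim
import std/colors

# Hypothetical helper (not a ggplotnim proc): turn a stdlib Color into a
# 0xAARRGGBB uint32. The stdlib Color carries no alpha, so it is passed in.
proc toArgb(c: Color, alpha: uint8 = 0xff): uint32 =
  let (r, g, b) = extractRGB(c)  # each channel is in 0 .. 255
  (alpha.uint32 shl 24) or (r.uint32 shl 16) or (g.uint32 shl 8) or b.uint32

# A blue -> white -> red gradient built from named stdlib colors.
let fillColors = @[toArgb(colBlue), toArgb(colWhite), toArgb(colRed)]
```

The resulting seq[uint32] has the type that the scale argument of scale_fill_gradient accepts according to its signature.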
proc scale_fill_identity(col = ""): Scale {....raises: [], tags: [].}
- Given a column col, treats the values in that column as 'identity values', i.e. the values are used directly as the values of the associated aesthetic, even if it is a size or color scale (in this case the fill color scale).
proc scale_fill_manual[T](values: Table[T, Color]): Scale
- Allows setting custom fill colors by handing a table that maps the keys found in the fill column to colors.
proc scale_size_continuous(sizeRange = DefaultSizeRange): Scale {....raises: [], tags: [].}
- Sets the size scale to continuous values and optionally sets the range of allowed values via sizeRange.
proc scale_size_discrete(sizeRange = DefaultSizeRange): Scale {....raises: [], tags: [].}
- Sets the size scale to discrete values and optionally sets the range of allowed values via sizeRange.
proc scale_size_identity(col = ""): Scale {....raises: [], tags: [].}
- Given a column col, treats the values in that column as 'identity values', i.e. the values are used directly as the values of the associated aesthetic, even if it is a size or color scale (in this case the size scale).
proc scale_size_manual[T](values: Table[T, float]): Scale
- Allows setting custom sizes by handing a table that maps the keys found in the size column to sizes.
proc scale_x_continuous[P: PossibleSecondaryAxis; T: int | seq[SomeNumber]]( secAxis: P = missing(); name: string = ""; breaks: T = newSeq[float](); labels: proc (x: float): string = nil; trans: proc (x: float): float = nil; invTrans: proc (x: float): float = nil): Scale
-
Creates a continuous x axis with a possible secondary axis. labels allows to hand a procedure, which maps the values found on the x axis to the tick label that should be shown for it.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
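A label procedure of the shape expected by labels (proc (x: float): string) might look like the following sketch. The name percentLabel is hypothetical, and the call shown in the comment is only an assumption based on the signature above.

```nim
import std/strutils

# Hypothetical label proc: render a fraction such as 0.5 as "50.0 %".
proc percentLabel(x: float): string =
  formatFloat(x * 100.0, ffDecimal, 1) & " %"

# Assumed usage, following the signature above (not run here):
#   scale_x_continuous(labels = percentLabel, breaks = 5)
echo percentLabel(0.5)
```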
proc scale_x_date[T: seq[SomeNumber]](name: string = ""; breaks: T = newSeq[float](); isTimestamp = false; parseDate: proc (x: string): DateTime = nil; formatString: string = "yyyy-MM-dd"; dateSpacing: Duration = initDuration(days = 1); dateAlgo: DateTickAlgorithmKind = dtaFilter): DateScale
-
Creates a continuous x axis that generates labels according to the desired date time information.
If breaks is given as a sequence of unix timestamps, these will be used over those computed via dateSpacing.
isTimestamp means the corresponding x column of the input data is a unix timestamp, either as an integer or a floating value.
parseDate is required if the data is not a timestamp. It needs to handle the parsing of the stored string data in the x column to convert it to DateTime objects.
dateSpacing is the desired distance between each tick. It is used as a reference taking into account the given formatString. Of all possible ticks allowed by formatString those ticks are used that have the closest distance to dateSpacing, starting with the first tick in the date range that can be represented by formatString.
The dateAlgo argument is experimental and should not normally be required. It changes the algorithm that is used to determine sensible tick labels based on the given dateSpacing. In the case of dtaFilter (default) the parsed dates are computed for all elements in the date time column first, and the values are then filtered to leave those that match the dateSpacing. This works well for densely packed timestamps in a column and deals better with rounding in tick labels such as 52 weeks =~= 1 year. dateAlgo is overridden if breaks is given.
For sparser time data, use the dtaAddDuration algorithm, which simply determines the first suitable date based on the format string and adds the dateSpacing to each of these. The next matching date based on the formatString is used. This does not handle rounding of dates well (4 weeks =~= 1 month will produce mismatches at certain points for example), but should be more robust.
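The parseDate and dateSpacing arguments can be prepared with std/times alone. This is a minimal sketch: the proc name parseIsoDate and the "yyyy-MM-dd" input layout are assumptions for illustration, and the call in the comment is inferred from the signature above rather than tested.

```nim
import std/times

# Hypothetical parser for an x column holding strings such as "2021-03-15".
proc parseIsoDate(x: string): DateTime =
  parse(x, "yyyy-MM-dd", utc())

# Desired distance between ticks: four weeks, i.e. roughly one month.
let spacing = initDuration(weeks = 4)

# Assumed usage (not run here):
#   scale_x_date(parseDate = parseIsoDate,
#                formatString = "yyyy-MM",
#                dateSpacing = spacing)
let d = parseIsoDate("2021-03-15")
```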
proc scale_x_discrete[P: PossibleSecondaryAxis; T; U](name: string = ""; labels: OrderedTable[T, U]; secAxis: P = missing()): Scale
- creates a discrete x axis with a possible secondary axis.
proc scale_x_discrete[P: PossibleSecondaryAxis](name: string = ""; secAxis: P = missing(); labels: proc (x: Value): string = nil): Scale
- creates a discrete x axis with a possible secondary axis. labels allows to hand a procedure, which maps the values found on the x axis to the tick label that should be shown for it.
proc scale_x_log2[T: int | seq[SomeNumber]](breaks: T = newSeq[float]()): Scale
-
Sets the X scale of the plot to a log2 scale.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
proc scale_x_log10[T: int | seq[SomeNumber]](breaks: T = newSeq[float]()): Scale
-
Sets the X scale of the plot to a log10 scale.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
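Since explicit breaks are given in units of the data space, tick locations for a log10 axis are plain data values such as 1, 10 and 100. The following sketch builds one break per decade; decadeBreaks is a hypothetical helper, not part of ggplotnim.

```nim
import std/math

# Hypothetical helper: one break per power of ten, in data units.
proc decadeBreaks(lowExp, highExp: int): seq[float] =
  for e in lowExp .. highExp:
    result.add pow(10.0, e.float)

let breaks = decadeBreaks(0, 3)  # candidates for 10^0 through 10^3

# Assumed usage, following the signature above (not run here):
#   scale_x_log10(breaks = breaks)
```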
proc scale_x_reverse[P: PossibleSecondaryAxis](name: string = ""; secAxis: P = missing(); dcKind: DiscreteKind = dcContinuous): Scale
- creates a continuous x axis with a possible secondary axis, which is reversed
proc scale_y_continuous[P: PossibleSecondaryAxis; T: int | seq[SomeNumber]]( name: string = ""; breaks: T = newSeq[float](); secAxis: P = missing(); labels: proc (x: float): string = nil; trans: proc (x: float): float = nil; invTrans: proc (x: float): float = nil): Scale
-
Creates a continuous y axis with a possible secondary axis. labels allows to hand a procedure, which maps the values found on the y axis to the tick label that should be shown for it.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
proc scale_y_date[T: seq[SomeNumber]](name: string = ""; breaks: T = newSeq[float](); isTimestamp = false; parseDate: proc (x: string): DateTime = nil; formatString: string = "yyyy-MM-dd"; dateSpacing: Duration = initDuration(days = 1); dateAlgo: DateTickAlgorithmKind = dtaFilter): DateScale
-
Creates a continuous y axis that generates labels according to the desired date time information.
If breaks is given as a sequence of unix timestamps, these will be used over those computed via dateSpacing.
isTimestamp means the corresponding y column of the input data is a unix timestamp, either as an integer or a floating value.
parseDate is required if the data is not a timestamp. It needs to handle the parsing of the stored string data in the y column to convert it to DateTime objects.
dateSpacing is the desired distance between each tick. It is used as a reference taking into account the given formatString. Of all possible ticks allowed by formatString those ticks are used that have the closest distance to dateSpacing, starting with the first tick in the date range that can be represented by formatString.
The dateAlgo argument is experimental and should not normally be required. It changes the algorithm that is used to determine sensible tick labels based on the given dateSpacing. In the case of dtaFilter (default) the parsed dates are computed for all elements in the date time column first, and the values are then filtered to leave those that match the dateSpacing. This works well for densely packed timestamps in a column and deals better with rounding in tick labels such as 52 weeks =~= 1 year. dateAlgo is overridden if breaks is given.
For sparser time data, use the dtaAddDuration algorithm, which simply determines the first suitable date based on the format string and adds the dateSpacing to each of these. The next matching date based on the formatString is used. This does not handle rounding of dates well (4 weeks =~= 1 month will produce mismatches at certain points for example), but should be more robust.
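When the automatic tick placement is not wanted, breaks can be passed as unix timestamps. The sketch below computes one timestamp per month start with std/times; monthlyBreaks is a hypothetical helper, and the call in the comment is an assumption based on the signature above.

```nim
import std/times

# Hypothetical helper: unix timestamps (UTC) of the first day of each
# month in the given year, as floats.
proc monthlyBreaks(year: int): seq[float] =
  for m in Month:
    let dt = dateTime(year, m, 1, zone = utc())
    result.add dt.toTime.toUnix.float

let breaks2021 = monthlyBreaks(2021)

# Assumed usage for a y column of unix timestamps (not run here):
#   scale_y_date(breaks = breaks2021, isTimestamp = true,
#                formatString = "yyyy-MM")
```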
proc scale_y_discrete[P: PossibleSecondaryAxis; T; U](name: string = ""; labels: OrderedTable[T, U]; secAxis: P = missing()): Scale
- creates a discrete y axis with a possible secondary axis.
proc scale_y_discrete[P: PossibleSecondaryAxis](name: string = ""; secAxis: P = missing(); labels: proc (x: Value): string = nil): Scale
- creates a discrete y axis with a possible secondary axis. labels allows to hand a procedure, which maps the values found on the y axis to the tick label that should be shown for it.
proc scale_y_log2[T: int | seq[SomeNumber]](breaks: T = newSeq[float]()): Scale
-
Sets the Y scale of the plot to a log2 scale.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
proc scale_y_log10[T: int | seq[SomeNumber]](breaks: T = newSeq[float]()): Scale
-
Sets the Y scale of the plot to a log10 scale.
breaks allows to specify either the number of ticks desired (in case an integer is given) or the exact locations of the ticks given in units of the data space belonging to this axis.
Note that the exact number of desired ticks is usually not respected, rather a close number that yields "nice" tick labels is chosen.
proc scale_y_reverse[P: PossibleSecondaryAxis](name: string = ""; secAxis: P = missing(); dcKind: DiscreteKind = dcContinuous): Scale
- creates a continuous y axis with a possible secondary axis, which is reversed
func sec_axis(trans: FormulaNode = f{""}; transFn: ScaleTransform = nil; invTransFn: ScaleTransform = nil; name: string = ""): SecondaryAxis {. ...raises: [Exception], tags: [].}
- convenience proc to create a SecondaryAxis
proc theme_latex(): Theme {....raises: [], tags: [].}
- Returns a theme that is designed to produce figures that look nice in a LaTeX document without manual adjustment of figure sizes.
func theme_opaque(): Theme {....raises: [], tags: [].}
- returns the "opaque" theme. For the time being this only means the canvas of the plot is white, which is the default starting from version v0.4.0. For the old behavior add theme_transparent.
func theme_transparent(): Theme {....raises: [], tags: [].}
- returns the "transparent" theme. This is the default for plots before version v0.4.0.
func theme_void[C: PossibleColor](color: C = white): Theme
-
returns the "void" theme. This means:
- white background
- no grid lines
- no ticks
- no tick labels
- no labels
func xlim[T, U: SomeNumber](low: T; high: U; outsideRange = ""): Theme
-
Sets the limits of the plot range in data scale. This overrides the calculation of the data range, which by default is just (min(dataX), max(dataX)) while ignoring inf values. If the given range is smaller than the actual underlying data range, outsideRange decides how data outside the range is treated.
Supported values are "clip", "drop" and "none":
- "clip": clip all larger values (e.g. inf or all values larger than a user defined xlim) to limit + xMargin (see below).
- "drop": remove all values larger than range
- "none": leave as is. Might result in values outside the plot area. Also -inf values may be shown as large positive values. This is up to the drawing backend!
It defaults to "clip".
Be aware however that the given limit is still subject to calculation of sensible tick values. The algorithm tries to make the plot start and end at "nice" values (either 1/10 or 1/4 steps). Setting the limit to some arbitrary number may not result in the expected plot. If a limit is to be forced, combine this with xMargin! (Note: if for some reason you want more control over the precise limits, please open an issue).
NOTE: for a discrete axis the "data scale" is (0.0, 1.0). You can change it here, but it will probably result in an ugly plot!
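The difference between "clip" and "drop" can be illustrated on a plain sequence. This is only a sketch of the behavior described above for values above the upper limit, not ggplotnim's implementation. The helper names are hypothetical, and the margin that "clip" adds to the limit is left out.

```nim
import std/sequtils

# Hypothetical helpers, not ggplotnim procs.
proc clipAbove(xs: seq[float], limit: float): seq[float] =
  ## "clip": values above the limit (including inf) are moved onto the limit.
  xs.mapIt(min(it, limit))

proc dropAbove(xs: seq[float], limit: float): seq[float] =
  ## "drop": values above the limit are removed.
  xs.filterIt(it <= limit)

let data = @[1.0, 5.0, 12.0, Inf]
let clipped = clipAbove(data, 10.0)
let dropped = dropAbove(data, 10.0)
```

With "none" the data would be passed on unchanged, which is why points may then end up outside the plot area.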
proc xMargin[T: SomeNumber](margin: T; outsideRange = ""): Theme
-
Sets a margin on the plot data scale for the X axis relative to the full data range. margin = 0.05 extends the data range by 5 % of the difference of xlim.high - xlim.low (see xlim proc) on the left and right side. outsideRange determines the behavior of all points which lie outside the plot data range. If not set via xlim the plot data range is simply the full range of all x values, ignoring all inf values. Supported values are "clip", "drop" and "none":
- "clip": clip all larger values (e.g. inf or all values larger than a user defined xlim) to limit + xMargin.
- "drop": remove all values larger than range
- "none": leave as is. Might result in values outside the plot area. Also -inf values may be shown as large positive values. This is up to the drawing backend!
It defaults to "clip".
NOTE: negative margins are not supported at the moment! They would result in ticks and labels outside the plot area.
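The margin arithmetic described above can be written out as a small worked example. expandRange is a hypothetical helper that only mirrors the stated rule (extend by margin times the range width on both sides); it is not the library's code.

```nim
# Hypothetical helper: extend (low, high) on both sides by
# margin * (high - low), as described for xMargin.
proc expandRange(low, high, margin: float): tuple[low, high: float] =
  let delta = margin * (high - low)
  (low: low - delta, high: high + delta)

# A data range of 0 .. 10 with margin = 0.05 grows by 0.5 on each side.
let r = expandRange(0.0, 10.0, 0.05)
```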
func ylim[T, U: SomeNumber](low: T; high: U; outsideRange = ""): Theme
-
Sets the limits of the plot range in data scale. This overrides the calculation of the data range, which by default is just (min(dataY), max(dataY)) while ignoring inf values. If the given range is smaller than the actual underlying data range, outsideRange decides how data outside the range is treated.
Supported values are "clip", "drop" and "none":
- "clip": clip all larger values (e.g. inf or all values larger than a user defined ylim) to limit + yMargin (see below).
- "drop": remove all values larger than range
- "none": leave as is. Might result in values outside the plot area. Also -inf values may be shown as large positive values. This is up to the drawing backend!
It defaults to "clip".
Be aware however that the given limit is still subject to calculation of sensible tick values. The algorithm tries to make the plot start and end at "nice" values (either 1/10 or 1/4 steps). Setting the limit to some arbitrary number may not result in the expected plot. If a limit is to be forced, combine this with yMargin! (Note: if for some reason you want more control over the precise limits, please open an issue).
NOTE: for a discrete axis the "data scale" is (0.0, 1.0). You can change it here, but it will probably result in an ugly plot!
proc yMargin[T: SomeNumber](margin: T; outsideRange = ""): Theme
-
Sets a margin on the plot data scale for the Y axis relative to the full data range. margin = 0.05 extends the data range by 5 % of the difference of ylim.high - ylim.low (see ylim proc) on the top and bottom side. outsideRange determines the behavior of all points which lie outside the plot data range. If not set via ylim the plot data range is simply the full range of all y values, ignoring all inf values. Supported values are "clip", "drop" and "none":
- "clip": clip all larger values (e.g. inf or all values larger than a user defined ylim) to limit + yMargin.
- "drop": remove all values larger than range
- "none": leave as is. Might result in values outside the plot area. Also -inf values may be shown as large positive values. This is up to the drawing backend!
It defaults to "clip".
NOTE: negative margins are not supported at the moment! They would result in ticks and labels outside the plot area.
Macros
macro aes(args: varargs[untyped]): untyped
-
This macro parses the given arguments and returns an Aesthetics object based on the given input. The argument has to be an argument list, which can have elements of different forms.
- named / unnamed arguments:
- for named arguments, the name must be a valid field of the Aesthetics object
- unnamed arguments are supported. The macro assigns each one to the field at the same position in the field order of the Aesthetics object
In principle unnamed arguments can follow named ones, but it is better not to abuse that.
- Different types are supported for the values
- literals: string, int, float. Will be treated as constant FormulaNode values for the associated scale (useful for e.g. width = 0.5)
- formula nodes: formula nodes are simply assigned as the columns for the generated scale. Can refer to a column or a complicated expression.
- idents: raw idents are supported. If the identifier refers to something declared, the value of that is used. Else the identifier is treated as a string. Be careful with this feature!
Exports
-
Ridges, PossibleMarker, hash, ColorScale, Facet, PossibleColor, ==, orkNone, PossibleLineType, PlotView, orkDrop, hash, bpNone, hash, MainAddScales, $, stIdentity, orkClip, DateTickAlgorithmKind, Missing, Draw, ThemeMarginLayout, dkMapping, bpRight, Annotation, toOptColor, smLM, OutsideRangeKind, PrevValsCol, FilledScales, stSmooth, smSVG, DiscreteKind, BinByKind, HistogramDrawingStyle, BinPositionKind, SmoothMethodKind, StyleLabel, $, pkDodge, PossibleFloat, ScaleFreeKind, PossibleSecondaryAxis, GgPlot, DiscreteFormat, Theme, JsonDummyDraw, smPoly, CountCol, FilledGeom, hash, VegaDraw, bbFull, stBin, bpCenter, StatKind, hdOutline, DataKind, GeomKind, pkIdentity, ScaleKind, hdBars, PossibleFont, $, assignIdentityScalesGetStyle, ScaleValue, Scale, bpLeft, $, DateScale, SecondaryAxis, Aesthetics, sfFree, ScaleTransform, VegaTeX, $, missing, pkFill, pkStack, hash, PositionKind, SmoothValsCol, sfFreeX, toOptSecAxis, sfFixed, ContinuousFormat, dkSetting, sfFreeY, AestheticError, GgStyle, Geom, PossibleErrorBar, stCount, bbSubset, calcRowsColumns, unwrap, font, viridis, magma, inferno, plasma