annotate th_regex.c @ 789:d61d3eb29053 default tip

Bump copyright.
author Matti Hamalainen <ccr@tnsp.org>
date Fri, 08 Mar 2024 15:26:24 +0200
parents c17eadc60c3d
children
rev   line source
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1 /*
2 * Simple regular expression matching functionality
3 * Programmed and designed by Matti 'ccr' Hamalainen
726
29e44a58bc73 Bump copyrights.
Matti Hamalainen <ccr@tnsp.org>
parents: 722
diff changeset
4 * (C) Copyright 2020-2022 Tecnic Software productions (TNSP)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
5 *
6 * Please read file 'COPYING' for information on license and distribution.
7 */
8 #include "th_regex.h"
9
10
635
d191ded8a790 Improve the experimental regex matching debugging macros.
Matti Hamalainen <ccr@tnsp.org>
parents: 614
diff changeset
11 #ifdef TH_EXPERIMENTAL_REGEX_DEBUG
771
c17eadc60c3d Rename th_ioctx struct to th_ioctx_t, for consistency. Breaks API.
Matti Hamalainen <ccr@tnsp.org>
parents: 735
diff changeset
12 th_ioctx_t *th_dbg_fh = NULL;
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
13
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
14 # define DBG_RE_PRINT(...) do { \
15 if (th_dbg_fh != NULL) \
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
16 { \
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
17 th_regex_dump_indent(th_dbg_fh, level); \
18 thfprintf(th_dbg_fh, __VA_ARGS__); \
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
19 } \
20 } while (0)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
21 #else
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
22 # define DBG_RE_PRINT(...)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
23 #endif
24
25
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
26 /// @cond
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
27 enum
28 {
29 TH_RE_MATCH_ONCE,
30 TH_RE_MATCH_COUNT,
31 TH_RE_MATCH_ANCHOR_START,
32 TH_RE_MATCH_ANCHOR_END,
33 };
34
35
36 enum
37 {
38 TH_RE_TYPE_CHAR,
39 TH_RE_TYPE_STR,
40 TH_RE_TYPE_ANY_CHAR,
41 TH_RE_TYPE_LIST,
42 TH_RE_TYPE_LIST_REVERSE,
43 TH_RE_TYPE_SUBEXPR,
44 };
45
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
46
47 static const char *re_match_modes[] =
48 {
49 "ONCE",
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
50 "COUNT",
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
51 "ANCHOR START",
52 "ANCHOR END",
53 };
54
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
55
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
56 static const char *re_match_types[] =
57 {
58 "CHAR",
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
59 "STR",
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
60 "ANY",
61 "LIST",
62 "LIST REVERSE",
63 "SUBEXPR",
64 };
65
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
66
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
67 typedef struct
68 {
69 int type;
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
70 th_char_t start, end;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
71
72 size_t nchars;
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
73 th_char_t *chars;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
74 } th_regex_list_item_t;
75
76
77 typedef struct
78 {
79 size_t nitems, itemssize;
80 th_regex_list_item_t *items;
81 } th_regex_list_t;
82
83
84 typedef struct
85 {
86 int mode, type;
87 ssize_t repeatMin, repeatMax;
88
89 struct {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
90 th_char_t chr;
91 th_char_t *str;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
92 th_regex_list_t list;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
93
94 th_regex_t *expr;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
95 } match;
96 } th_regex_node_t;
97
98
99 typedef struct
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
100 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
101 const th_char_t *pattern;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
102 size_t offs;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
103
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
104 th_regex_t *data;
105
106 size_t nstack, stacksize;
107 th_regex_t **stack;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
108
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
109 th_char_t *buf;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
110 size_t bufSize, bufPos;
111
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
112 } th_regex_parse_ctx_t;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
113
114
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
115 struct th_regex_t
116 {
117 size_t nnodes, nodessize;
118 th_regex_node_t *nodes;
119 };
120
121 /// @endcond
122
123
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
124 static void th_regex_node_init(th_regex_node_t *node)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
125 {
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
126 memset(node, 0, sizeof(th_regex_node_t));
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
127 node->mode = TH_RE_MATCH_ONCE;
128 }
129
130
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
131 static int th_regex_strndup(th_char_t **pdst,
132 const th_char_t *src, const size_t len)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
133 {
134 if (pdst == NULL)
135 return THERR_NULLPTR;
136
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
137 if (UINTPTR_MAX / sizeof(th_char_t) < len + 1)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
138 return THERR_BOUNDS;
139
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
140 if ((*pdst = (th_char_t *)
141 th_malloc((len + 1) * sizeof(th_char_t))) == NULL)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
142 return THERR_MALLOC;
143
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
144 memcpy(*pdst, src, len * sizeof(th_char_t));
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
145 (*pdst)[len] = 0;
146
147 return THERR_OK;
148 }
149
150
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
151 static int th_regex_parse_ctx_get_prev_node(
152 th_regex_parse_ctx_t *ctx, th_regex_node_t **pnode)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
153 {
154 if (ctx->data != NULL && ctx->data->nnodes > 0)
155 {
156 *pnode = &ctx->data->nodes[ctx->data->nnodes - 1];
157 return THERR_OK;
158 }
159 else
160 return THERR_INVALID_DATA;
161 }
162
163
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
164 static int th_regex_parse_ctx_push(th_regex_parse_ctx_t *ctx)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
165 {
166 if (ctx->stack == NULL || ctx->nstack + 1 >= ctx->stacksize)
167 {
168 ctx->stacksize += 16;
169
170 if ((ctx->stack = th_realloc(ctx->stack,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
171 ctx->stacksize * sizeof(th_regex_node_t *))) == NULL)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
172 return THERR_MALLOC;
173 }
174
175 ctx->stack[ctx->nstack] = ctx->data;
176 ctx->nstack++;
177 ctx->data = NULL;
178
179 return THERR_OK;
180 }
181
182
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
183 static int th_regex_parse_ctx_pop(th_regex_parse_ctx_t *ctx, th_regex_t **data)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
184 {
185 if (ctx->nstack > 0)
186 {
187 *data = ctx->data;
188 ctx->nstack--;
189 ctx->data = ctx->stack[ctx->nstack];
190 return THERR_OK;
191 }
192 else
193 return THERR_INVALID_DATA;
194 }
195
196
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
197 static int th_regex_parse_ctx_node_commit(th_regex_parse_ctx_t *ctx, th_regex_node_t *node)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
198 {
705
dee28d507da7 Plug a memory leak.
Matti Hamalainen <ccr@tnsp.org>
parents: 669
diff changeset
199 th_regex_t *data;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
200
711
c91902120e79 Cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 705
diff changeset
201 if (ctx->data == NULL &&
202 (data = ctx->data = th_malloc0(sizeof(th_regex_t))) == NULL)
203 return THERR_MALLOC;
705
dee28d507da7 Plug a memory leak.
Matti Hamalainen <ccr@tnsp.org>
parents: 669
diff changeset
204 else
205 data = ctx->data;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
206
207 if (data->nodes == NULL || data->nnodes + 1 >= data->nodessize)
208 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
209 data->nodessize += 16;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
210 if ((data->nodes = th_realloc(data->nodes,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
211 data->nodessize * sizeof(th_regex_node_t))) == NULL)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
212 return THERR_MALLOC;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
213 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
214
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
215 memcpy(&data->nodes[data->nnodes], node, sizeof(th_regex_node_t));
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
216 data->nnodes++;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
217
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
218 return THERR_OK;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
219 }
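The commit function above grows its node array in fixed steps of 16 and assigns the result of `th_realloc` straight back to `data->nodes`. If `th_realloc` behaves like the standard `realloc` (an assumption; its definition is not shown here), a failed call would overwrite the only pointer to the old block. The following is a minimal sketch of the same growth policy with a temporary pointer. The names `int_vec_t` and `int_vec_push` are hypothetical and not part of th_regex.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints, mirroring the nodes/nnodes/nodessize
 * triple used in the listing. */
typedef struct
{
    int *items;
    size_t nitems, itemssize;
} int_vec_t;

/* Append one item, growing the storage by 16 slots when needed.
 * Returns 0 on success, -1 if memory could not be allocated. */
int int_vec_push(int_vec_t *vec, int item)
{
    assert(vec != NULL);

    if (vec->items == NULL || vec->nitems + 1 >= vec->itemssize)
    {
        size_t newsize = vec->itemssize + 16;
        int *tmp = realloc(vec->items, newsize * sizeof(int));
        if (tmp == NULL)
            return -1; // vec->items is untouched and still valid

        vec->items = tmp;
        vec->itemssize = newsize;
    }

    vec->items[vec->nitems++] = item;
    return 0;
}
```

Because the size field is only updated after a successful reallocation, a failed push leaves the vector in its previous, consistent state.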
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
220
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
221
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
222 static bool th_regex_find_next(const th_char_t *str,
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
223 const size_t start, size_t *offs,
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
224 const th_char_t delim)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
225 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
226 for (*offs = start; str[*offs] != 0; (*offs)++)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
227 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
228 if (str[*offs] == delim)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
229 return true;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
230 }
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
231 return false;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
232 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
233
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
234
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
235 static bool th_regex_parse_ssize_t(const th_char_t *str,
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
236 ssize_t *value)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
237 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
238 th_char_t ch;
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
239 bool neg;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
240
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
241 if (*str == '-')
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
242 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
243 str++;
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
244 neg = true;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
245 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
246 else
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
247 neg = false;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
248
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
249 // Accumulate decimal digits into *value (not reset here); any other character is an error
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
250 while ((ch = *str++))
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
251 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
252 if (ch >= '0' && ch <= '9')
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
253 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
254 *value *= 10;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
255 *value += ch - '0';
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
256 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
257 else
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
258 return false;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
259 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
260
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
261 if (neg)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
262 *value = -(*value);
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
263
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
264 return true;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
265 }
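The parser above accumulates digits into `*value` without resetting it first, and it accepts an empty string or a lone `-` as valid input. Below is a minimal standalone sketch of the same idea that resets the accumulator and rejects input without digits. `parse_signed_decimal` is a hypothetical name, plain `char` and `long` stand in for `th_char_t` and `ssize_t`, and overflow is deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant of the digit parser: an optional '-' followed by
 * one or more decimal digits. On failure *value is left unchanged. */
bool parse_signed_decimal(const char *str, long *value)
{
    assert(str != NULL && value != NULL);

    bool neg = false;
    long acc = 0;

    // Is the value negative?
    if (*str == '-')
    {
        neg = true;
        str++;
    }

    if (*str == 0)
        return false; // no digits at all

    for (; *str != 0; str++)
    {
        if (*str < '0' || *str > '9')
            return false; // non-digit character

        acc = acc * 10 + (*str - '0');
    }

    *value = neg ? -acc : acc;
    return true;
}
```

Writing to `*value` only on success means a caller can keep a default in place when parsing fails.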
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
266
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
267
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
268 static void th_regex_list_item_init(th_regex_list_item_t *item)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
269 {
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
270 memset(item, 0, sizeof(th_regex_list_item_t));
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
271 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
272
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
273
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
274 static int th_regex_list_add_item(th_regex_list_t *list, th_regex_list_item_t *item)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
275 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
276 if (list->items == NULL || list->nitems + 1 >= list->itemssize)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
277 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
278 list->itemssize += 16;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
279
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
280 if ((list->items = th_realloc(list->items,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
281 list->itemssize * sizeof(th_regex_list_item_t))) == NULL)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
282 return THERR_MALLOC;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
283 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
284
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
285 memcpy(&list->items[list->nitems], item, sizeof(th_regex_list_item_t));
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
286 list->nitems++;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
287
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
288 return THERR_OK;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
289 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
290
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
291
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
292 static void th_regex_list_free(th_regex_list_t *list)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
293 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
294 if (list != NULL)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
295 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
296 for (size_t n = 0; n < list->nitems; n++)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
297 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
298 th_free(list->items[n].chars);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
299 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
300 th_free(list->items);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
301 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
302 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
303
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
304
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
305 static int th_regex_parse_list(const th_char_t *str,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
306 const size_t slen, th_regex_list_t *list)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
307 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
308 th_char_t *tmp = NULL;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
309 th_regex_list_item_t item;
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
310 int res;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
311
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
312 if ((res = th_regex_strndup(&tmp, str, slen)) != THERR_OK)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
313 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
314
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
315 // Handle ranges like [A-Z]
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
316 for (size_t offs = 0; offs < slen; offs++)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
317 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
318 th_char_t
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
319 *prev = (offs > 0) ? tmp + offs - 1 : NULL,
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
320 *curr = tmp + offs,
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
321 *next = (offs + 1 < slen) ? tmp + offs + 1 : NULL;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
322
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
323 if (*curr == '-')
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
324 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
325 if (prev != NULL && next != NULL)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
326 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
327 // Range
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
328 th_regex_list_item_init(&item);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
329 item.type = 1; // 1 = character range (start..end)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
330 item.start = *prev;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
331 item.end = *next;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
332
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
333 if (item.start >= item.end)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
334 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
335 res = THERR_INVALID_DATA;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
336 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
337 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
338
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
339 *curr = *prev = *next = 0;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
340
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
341 if ((res = th_regex_list_add_item(list, &item)) != THERR_OK)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
342 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
343 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
344 else
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
345 if (next != NULL)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
346 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
347 res = THERR_INVALID_DATA;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
348 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
349 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
350 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
351 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
352
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
353 // Count the characters not consumed by ranges (range members were zeroed above)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
354 th_regex_list_item_init(&item);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
355 item.type = 0; // 0 = set of individual characters
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
356 item.nchars = 0;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
357
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
358 for (size_t offs = 0; offs < slen; offs++)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
359 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
360 th_char_t curr = tmp[offs];
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
361 if (curr != 0)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
362 item.nchars++;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
363 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
364
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
365 if (item.nchars > 0)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
366 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
367 if ((item.chars = th_malloc(sizeof(th_char_t) * item.nchars)) == NULL)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
368 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
369 res = THERR_MALLOC;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
370 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
371 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
372
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
373 for (size_t offs = 0, n = 0; offs < slen; offs++)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
374 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
375 th_char_t curr = tmp[offs];
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
376 if (curr != 0)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
377 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
378 item.chars[n] = curr;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
379 n++;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
380 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
381 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
382
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
383 if ((res = th_regex_list_add_item(list, &item)) != THERR_OK)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
384 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
385 th_free(item.chars);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
386 goto out;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
387 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
388 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
389
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
390 out:
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
391 th_free(tmp);
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
392 return res;
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
393 }
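The list parser above splits a bracket expression body into range items (`x-y`) and a set of leftover literal characters. As a rough illustration of that classification, here is a single-pass sketch that tests one character against such a body directly, without building a list. `class_contains` is a hypothetical helper, not part of th_regex.c, and it simplifies the rules: a `-` is treated as a range operator only between two characters, and a leading or trailing `-` is taken literally (the listing rejects a leading `-` that is followed by another character).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does the class body spec[0..len-1] (the text between
 * '[' and ']') contain ch?
 * Returns 1 on match, 0 on no match, -1 if a range is not ascending. */
int class_contains(const char *spec, size_t len, char ch)
{
    assert(spec != NULL);

    int found = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (i + 2 < len && spec[i + 1] == '-')
        {
            // Range "x-y": require x < y, like the listing's start >= end check
            if (spec[i] >= spec[i + 2])
                return -1;

            if (ch >= spec[i] && ch <= spec[i + 2])
                found = 1;

            i += 2; // skip the '-' and the range end
        }
        else
        if (spec[i] == ch)
            found = 1; // literal character
    }

    return found;
}
```

The scan continues after a match so that an invalid range later in the body is still reported.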
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
394
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
395
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
396 static int th_regex_parse_ctx_node_commit_strchr_do(th_regex_parse_ctx_t *ctx,
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
397 const th_char_t *buf, const size_t bufLen)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
398 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
399 th_regex_node_t node;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
400 th_regex_node_init(&node);
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
401
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
402 if (bufLen > 1)
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
403 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
404 int res;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
405 node.type = TH_RE_TYPE_STR;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
406 if ((res = th_regex_strndup(&node.match.str, buf, bufLen)) != THERR_OK)
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
407 return res;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
408 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
409 else
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
410 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
411 node.type = TH_RE_TYPE_CHAR;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
412 node.match.chr = buf[0];
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
413 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
414
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
415 return th_regex_parse_ctx_node_commit(ctx, &node);
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
416 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
417
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
418
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
419 static int th_regex_parse_ctx_node_commit_strchr(th_regex_parse_ctx_t *ctx, const bool split)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
420 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
421 int res = THERR_OK;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
422
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
423 if (ctx->bufPos > 0)
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
424 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
425 if (ctx->bufPos > 1 && split)
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
426 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
427 if ((res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf, ctx->bufPos - 1)) != THERR_OK)
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
428 return res;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
429
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
430 res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf + ctx->bufPos - 1, 1);
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
431 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
432 else
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
433 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
434 res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf, ctx->bufPos);
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
435 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
436 ctx->bufPos = 0;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
437 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
438
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
439 return res;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
440 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
441
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
442
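The split logic in th_regex_parse_ctx_node_commit_strchr() above can be illustrated in isolation. The following is a minimal sketch, not the library's code: `demo_nodes_t`, `demo_commit_run()` and `demo_commit_strchr()` are hypothetical stand-ins, and it assumes that th_regex_parse_ctx_node_commit_strchr_do() commits the given run of characters as a single node. When a quantifier such as `*` follows a buffered literal run like "abc", the run is committed as "ab" plus a separate "c", so that the quantifier applies only to the last character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser's node list: stores committed literal runs. */
typedef struct
{
    char runs[8][32];
    size_t count;
} demo_nodes_t;

/* Assumed behaviour of the *_strchr_do() helper: commit one run as one node. */
static int demo_commit_run(demo_nodes_t *nodes, const char *buf, const size_t len)
{
    assert(nodes != NULL && buf != NULL);

    if (nodes->count >= 8 || len >= 32)
        return -1;

    memcpy(nodes->runs[nodes->count], buf, len);
    nodes->runs[nodes->count][len] = 0;
    nodes->count++;
    return 0;
}

/* Mirrors the control flow of the function above, with the context fields
 * passed explicitly: flush the buffered characters, optionally splitting
 * off the last one into its own node. */
static int demo_commit_strchr(demo_nodes_t *nodes, const char *buf,
    size_t *bufPos, const bool split)
{
    int res = 0;

    if (*bufPos > 0)
    {
        if (*bufPos > 1 && split)
        {
            // Everything except the last character forms one node ...
            if ((res = demo_commit_run(nodes, buf, *bufPos - 1)) != 0)
                return res;

            // ... and the last character gets a node of its own
            res = demo_commit_run(nodes, buf + *bufPos - 1, 1);
        }
        else
        {
            res = demo_commit_run(nodes, buf, *bufPos);
        }
        *bufPos = 0;
    }

    return res;
}
```

With `split` set to false (as the anchor, bracket and parenthesis cases do), the whole buffer is committed as one node. A single buffered character is never split, since there is nothing to separate it from.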
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
443 /**
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
444 * Parse given regular expression @p pattern string into compiled/tokenized
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
445 * form as @c th_regex_t structures. Returns @c THERR_OK if successful,
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
446 * or other @c THERR_* return value if not. In either case, the @p pexpr
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
447 * may have been allocated and must be freed via th_regex_free().
657
253a341216b7 Doxygen fixes.
Matti Hamalainen <ccr@tnsp.org>
parents: 655
diff changeset
448 * @param[in,out] pexpr pointer to a pointer to the @c th_regex_t structure to be allocated
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
449 * @param[in] pattern regular expression pattern string
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
450 * @returns @c THERR_* return value indicating success or failure
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
451 */
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
452 int th_regex_compile(th_regex_t **pexpr, const th_char_t *pattern)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
453 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
454 int res = THERR_OK;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
455 th_regex_parse_ctx_t ctx;
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
456 th_regex_node_t node, *pnode;
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
457 th_char_t *tmp = NULL;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
458 size_t start;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
459
721
c834e1393eb0 Initialize regex parsing context before checking pointers.
Matti Hamalainen <ccr@tnsp.org>
parents: 712
diff changeset
460 memset(&ctx, 0, sizeof(ctx));
c834e1393eb0 Initialize regex parsing context before checking pointers.
Matti Hamalainen <ccr@tnsp.org>
parents: 712
diff changeset
461
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
462 // Check pointers
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
463 if (pexpr == NULL || pattern == NULL)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
464 {
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
465 res = THERR_NULLPTR;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
466 goto out;
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
467 }
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
468
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
469 // Initialize parsing context
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
470 ctx.pattern = pattern;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
471 ctx.bufSize = 256;
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
472
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
473 if ((ctx.buf = th_malloc(ctx.bufSize * sizeof(th_char_t))) == NULL)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
474 {
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
475 res = THERR_MALLOC;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
476 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
477 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
478
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
479 // Start parsing the pattern
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
480 for (; ctx.pattern[ctx.offs] != 0; ctx.offs++)
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
481 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
482 th_char_t cch = ctx.pattern[ctx.offs];
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
483
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
484 switch (cch)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
485 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
486 case '?':
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
487 case '*':
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
488 case '+':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
489 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, true)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
490 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
491
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
492 if ((res = th_regex_parse_ctx_get_prev_node(&ctx, &pnode)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
493 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
494
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
495 if (cch == '?')
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
496 {
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
497 // Previous token is optional (repeat 0-1 times) (non-greedy matching)
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
498 pnode->mode = TH_RE_MATCH_COUNT;
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
499 pnode->repeatMin = 0;
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
500 pnode->repeatMax = 1;
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
501 }
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
502 else
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
503 {
641
9a1ed82abefd Fix parsing of +? and *?.
Matti Hamalainen <ccr@tnsp.org>
parents: 640
diff changeset
504 // Check if previous was a count ("**", "*+", etc.)
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
505 if (pnode->mode == TH_RE_MATCH_COUNT)
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
506 {
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
507 res = THERR_INVALID_DATA;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
508 goto out;
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
509 }
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
510
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
511 pnode->mode = TH_RE_MATCH_COUNT;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
512
613
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
513 if (cch == '*')
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
514 {
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
515 // Previous token can repeat 0 or more times
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
516 pnode->repeatMin = 0;
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
517 pnode->repeatMax = -1;
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
518 }
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
519 else
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
520 {
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
521 // Previous token must repeat 1 or more times
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
522 pnode->repeatMin = 1;
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
523 pnode->repeatMax = -1;
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
524 }
2e3b81ae8c8a More work on regexes.
Matti Hamalainen <ccr@tnsp.org>
parents: 612
diff changeset
525 }
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
526 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
527
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
528 case '{':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
529 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, true)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
530 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
531
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
532 // {n} | {min,max}
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
533 start = ctx.offs + 1;
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
534 if (!th_regex_find_next(ctx.pattern, start, &ctx.offs, '}'))
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
535 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
536 // End not found
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
537 res = THERR_INVALID_DATA;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
538 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
539 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
540
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
541 th_free(tmp); tmp = NULL; // Avoid a dangling pointer if the next step fails
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
542
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
543 if ((res = th_regex_parse_ctx_get_prev_node(&ctx, &pnode)) != THERR_OK ||
611
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
544 (res = th_regex_strndup(&tmp, ctx.pattern + start,
d895b0fd6ad6 Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents: 610
diff changeset
545 ctx.offs - start)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
546 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
547
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
548 pnode->mode = TH_RE_MATCH_COUNT;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
549
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
550 if (th_regex_find_next(tmp, 0, &start, ','))
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
551 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
552 tmp[start] = 0;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
553 if (!th_regex_parse_ssize_t(tmp, &pnode->repeatMin) ||
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
554 !th_regex_parse_ssize_t(tmp + start + 1, &pnode->repeatMax))
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
555 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
556 res = THERR_INVALID_DATA;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
557 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
558 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
559 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
560 else
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
561 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
562 if (!th_regex_parse_ssize_t(tmp, &pnode->repeatMin))
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
563 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
564 res = THERR_INVALID_DATA;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
565 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
566 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
567 pnode->repeatMax = pnode->repeatMin;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
568 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
569
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
570 if (pnode->repeatMin < 0 || pnode->repeatMax < 1 ||
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
571 pnode->repeatMax < pnode->repeatMin)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
572 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
573 // Invalid repeat counts
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
574 res = THERR_INVALID_DATA;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
575 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
576 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
577 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
578
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
579 /*
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
580 case '|':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
581 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
582 goto out;
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
583
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
584 // Alt pattern .. how to handle these?
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
585 break;
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
586 */
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
587
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
588 case '(':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
589 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
590 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
591
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
592 // Start of subpattern
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
593 if ((res = th_regex_parse_ctx_push(&ctx)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
594 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
595 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
596
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
597 case ')':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
598 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
599 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
600
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
601 // End of subpattern
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
602 th_regex_node_init(&node);
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
603 node.type = TH_RE_TYPE_SUBEXPR;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
604
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
605 if ((res = th_regex_parse_ctx_pop(&ctx, &node.match.expr)) != THERR_OK ||
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
606 (res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
607 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
608 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
609
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
610 case '^':
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
611 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
612 goto out;
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
613
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
614 // Start of line anchor
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
615 th_regex_node_init(&node);
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
616 node.mode = TH_RE_MATCH_ANCHOR_START;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
617
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
618 if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
619 goto out;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
620 break;

            case '$':
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
                    goto out;

                // End of line anchor
                th_regex_node_init(&node);
                node.mode = TH_RE_MATCH_ANCHOR_END;

                if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
                    goto out;
                break;

            case '[':
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
                    goto out;

                // Start of char list
                start = ctx.offs + 1;
                if (!th_regex_find_next(ctx.pattern, start, &ctx.offs, ']') ||
                    ctx.offs == start)
                {
                    res = THERR_INVALID_DATA;
                    goto out;
                }

                th_regex_node_init(&node);
                if (ctx.pattern[start] == '^')
                {
                    node.type = TH_RE_TYPE_LIST_REVERSE;
                    start++;
                }
                else
                    node.type = TH_RE_TYPE_LIST;

                if ((res = th_regex_parse_list(ctx.pattern + start,
                    ctx.offs - start, &node.match.list)) != THERR_OK ||
                    (res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
                    goto out;
                break;

            case '.':
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
                    goto out;

                // Any single character matches
                th_regex_node_init(&node);
                node.type = TH_RE_TYPE_ANY_CHAR;

                if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
                    goto out;
                break;

            case '\\':
                // Literal escape
                ctx.offs++;
                if (ctx.pattern[ctx.offs] == 0)
                {
                    // End of pattern, error
                    res = THERR_INVALID_DATA;
                    goto out;
                }
                // fall-through

            default:
                // Given character must match
                if (ctx.bufPos < ctx.bufSize)
                    ctx.buf[ctx.bufPos++] = ctx.pattern[ctx.offs];
                else
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
                    goto out;
                break;
        }
    }

    // Commit last string/char if any
    if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, false)) != THERR_OK)
        goto out;

    // Create root node
    th_regex_node_init(&node);
    node.type = TH_RE_TYPE_SUBEXPR;
    node.match.expr = ctx.data;
    ctx.data = NULL;

    if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
        goto out;

out:
    *pexpr = ctx.data;

    // Free parse context
    for (size_t n = 0; n < ctx.nstack; n++)
    {
        if (ctx.stack[n] != ctx.data)
        {
            // th_regex_free() also releases the structure itself,
            // so a separate th_free() here would free it twice.
            th_regex_free(ctx.stack[n]);
        }
    }
    th_free(ctx.stack);

    th_free(tmp);
    th_free(ctx.buf);
    return res;
}
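
The bracket handling in the `'['` case above can be sketched in isolation. This is a minimal illustration, not the library's code: `bracket_span_t` and `scan_bracket()` are hypothetical names, and plain `strchr()` stands in for `th_regex_find_next()`, whose exact behaviour (for example with escaped `]`) is not shown in this file section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, not part of th_regex. */
typedef struct
{
    bool negated;   /* true if the list began with '^' */
    size_t start;   /* offset of the first list character */
    size_t len;     /* number of list characters before ']' */
} bracket_span_t;

/* Scan a bracket expression whose '[' sits at pattern[offs].
 * Returns false if the closing ']' is missing or the brackets
 * are empty, mirroring the THERR_INVALID_DATA path above. */
static bool scan_bracket(const char *pattern, size_t offs, bracket_span_t *out)
{
    assert(pattern != NULL && out != NULL && pattern[offs] == '[');

    size_t start = offs + 1;
    const char *end = strchr(pattern + start, ']');
    if (end == NULL || end == pattern + start)
        return false;

    /* A leading '^' selects the negated ("reverse") list type. */
    out->negated = (pattern[start] == '^');
    if (out->negated)
        start++;

    out->start = start;
    out->len = (size_t) (end - pattern) - start;
    return true;
}
```

As in the original, the emptiness check happens before the `^` is skipped, so a pattern such as `[^]` passes the scan with a zero-length list and is left for the list parser to judge.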


/**
 * Deallocate the given regular expression structure @p expr.
 * All associated data will be freed, though pointers may not
 * be NULLed.
 *
 * @param[in] expr structure to be deallocated
 */
void th_regex_free(th_regex_t *expr)
{
    if (expr != NULL)
    {
        for (size_t nnode = 0; nnode < expr->nnodes; nnode++)
        {
            th_regex_node_t *node = &expr->nodes[nnode];

            th_free(node->match.str);
            th_regex_free(node->match.expr);
            th_regex_list_free(&node->match.list);
        }

        th_free(expr->nodes);
        th_free(expr);
    }
}
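
The ownership pattern used by `th_regex_free()` (release each node's payload, recurse into sub-expressions, then release the node array and the structure itself) can be tried out with a small stand-alone sketch. All `demo_*` names below are hypothetical and use plain `calloc()`/`free()` instead of the th-libs allocators; the counter exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct demo_expr demo_expr_t;

typedef struct
{
    char *str;           /* owned literal string, or NULL */
    demo_expr_t *expr;   /* owned sub-expression, or NULL */
} demo_node_t;

struct demo_expr
{
    size_t nnodes;
    demo_node_t *nodes;
};

/* Counts released expressions; for demonstration only. */
static size_t demo_frees = 0;

/* Allocate an expression with nnodes zero-initialized nodes (nnodes > 0). */
static demo_expr_t *demo_expr_new(size_t nnodes)
{
    assert(nnodes > 0);

    demo_expr_t *expr = calloc(1, sizeof(*expr));
    if (expr == NULL)
        return NULL;

    expr->nodes = calloc(nnodes, sizeof(*expr->nodes));
    if (expr->nodes == NULL)
    {
        free(expr);
        return NULL;
    }

    expr->nnodes = nnodes;
    return expr;
}

/* Release an expression tree: children first, then the container. */
static void demo_expr_free(demo_expr_t *expr)
{
    if (expr == NULL)
        return;

    for (size_t n = 0; n < expr->nnodes; n++)
    {
        free(expr->nodes[n].str);
        demo_expr_free(expr->nodes[n].expr);
    }

    free(expr->nodes);
    free(expr);
    demo_frees++;
}
```

Because the function releases the structure it was given, callers must not free the same pointer again afterwards.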


static void th_regex_dump_indent(th_ioctx_t *fh, const int level)
{
    for (int indent = 0; indent < level; indent++)
        thfputs(" ", fh);
}


static void th_regex_dump_node(th_ioctx_t *fh, const th_regex_node_t *node)
{
    thfprintf(fh,
        "%s %s ",
        re_match_modes[node->mode],
        re_match_types[node->type]);

    if (node->mode == TH_RE_MATCH_COUNT)
    {
        thfprintf(fh, "min=%" PRId_SSIZE_T ", max=%" PRId_SSIZE_T " : ",
            node->repeatMin, node->repeatMax);
    }

    switch (node->type)
    {
        case TH_RE_TYPE_CHAR:
            thfprintf(fh, "'%c'", node->match.chr);
            break;

        case TH_RE_TYPE_STR:
            thfprintf(fh, "\"%s\"", node->match.str);
            break;

        case TH_RE_TYPE_ANY_CHAR:
            thfprintf(fh, ".");
            break;

        case TH_RE_TYPE_LIST:
        case TH_RE_TYPE_LIST_REVERSE:
            thfputs("[ ", fh);
            for (size_t n = 0; n < node->match.list.nitems; n++)
            {
                const th_regex_list_item_t *li = &node->match.list.items[n];
                if (li->type)
                {
                    thfprintf(fh, "'%c-%c' ", li->start, li->end);
                }
                else
                {
                    for (size_t i = 0; i < li->nchars; i++)
                        thfprintf(fh, "'%c' ", li->chars[i]);
                }
            }
            thfputs("]", fh);
            break;
    }
}


/**
 * Print out the contents of given regular expression structure @p expr
 * in "human-readable" format to specified @c th_ioctx_t context. Typically
 * useful for debugging purposes only.
 *
 * @param[in,out] fh @c th_ioctx_t handle to be used for output, must be writable.
 * @param[in] level starting whitespace indentation level
 * @param[in] expr regular expression structure to be "dumped"
 */
void th_regex_dump(th_ioctx_t *fh, const int level, const th_regex_t *expr)
{
    if (expr != NULL)
    {
        for (size_t nnode = 0; nnode < expr->nnodes; nnode++)
        {
            th_regex_node_t *node = &expr->nodes[nnode];

            th_regex_dump_indent(fh, level);

            thfprintf(fh,
                "[%" PRIu_SIZE_T "/%" PRIu_SIZE_T "] ",
                nnode + 1, expr->nnodes);

            th_regex_dump_node(fh, node);
            thfputs("\n", fh);

            if (node->type == TH_RE_TYPE_SUBEXPR)
                th_regex_dump(fh, level + 1, node->match.expr);
        }
    }
}
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
842
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
843
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
844 static bool th_regex_match_list(const th_regex_list_t *list, const th_char_t cch)
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
845 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
846 // Linear scan over all items. Could be optimized, e.g. by sorting item->chars and the ranges and using binary search.
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
847 for (size_t nitem = 0; nitem < list->nitems; nitem++)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
848 {
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
849 const th_regex_list_item_t *item = &list->items[nitem];
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
850
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
851 if (item->type == 0)
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
852 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
853 for (size_t n = 0; n < item->nchars; n++)
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
854 {
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
855 if (item->chars[n] == cch)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
856 return true;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
857 }
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
858 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
859 else
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
860 {
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
861 if (cch >= item->start && cch <= item->end)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
862 return true;
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
863 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
864 }
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
865
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
866 return false;
639
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
867 }
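The comment in th_regex_match_list() mentions sorting and binary search as a possible optimization. The following is only a sketch of that idea, not code from th_regex.c. It assumes a hypothetical simplified type, demo_range_t, in which a single character is stored as a range with start == end. The names demo_range_t, demo_ranges_normalize() and demo_ranges_match() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified item: an inclusive character range.
 * A single character is stored with start == end. */
typedef struct
{
    unsigned char start, end;
} demo_range_t;

/* qsort() comparator: order ranges by their start character */
static int demo_range_cmp(const void *pa, const void *pb)
{
    const demo_range_t *a = pa, *b = pb;
    return (a->start > b->start) - (a->start < b->start);
}

/* Sort the ranges and merge overlapping or adjacent ones in place.
 * Returns the number of ranges left after merging. */
static size_t demo_ranges_normalize(demo_range_t *r, const size_t n)
{
    if (n == 0)
        return 0;

    qsort(r, n, sizeof(*r), demo_range_cmp);

    size_t out = 0;
    for (size_t i = 1; i < n; i++)
    {
        // Operands are promoted to int, so "end + 1" can not wrap
        if (r[i].start <= r[out].end + 1)
        {
            if (r[i].end > r[out].end)
                r[out].end = r[i].end;
        }
        else
            r[++out] = r[i];
    }

    return out + 1;
}

/* Binary search over sorted, non-overlapping ranges */
static bool demo_ranges_match(const demo_range_t *r, const size_t n, const unsigned char ch)
{
    assert(r != NULL || n == 0);

    size_t lo = 0, hi = n;
    while (lo < hi)
    {
        const size_t mid = lo + (hi - lo) / 2;

        if (ch < r[mid].start)
            hi = mid;
        else
        if (ch > r[mid].end)
            lo = mid + 1;
        else
            return true;
    }

    return false;
}
```

The normalization would be done once, when the list is built, so each lookup costs O(log n) comparisons instead of a scan over every item. Whether this pays off for the short lists typical of bracket expressions is an open question.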
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
868
8c957ad9d4c3 Some more work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 638
diff changeset
869
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
870 static bool th_regex_match_expr(
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
871 const th_char_t *haystack,
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
872 size_t *offs,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
873 const th_regex_t *expr,
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
874 const size_t startnode,
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
875 const int flags,
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
876 const int level
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
877 );
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
878
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
879
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
880 static bool th_regex_match_one(
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
881 const th_char_t *haystack,
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
882 size_t *offs,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
883 const th_regex_node_t *node,
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
884 const int flags,
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
885 const int level
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
886 )
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
887 {
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
888 th_char_t cch;
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
889 bool res = false;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
890
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
891 switch (node->type)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
892 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
893 case TH_RE_TYPE_SUBEXPR:
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
894 res = th_regex_match_expr(haystack, offs, node->match.expr, 0, flags, level + 1);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
895 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
896
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
897 case TH_RE_TYPE_LIST:
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
898 case TH_RE_TYPE_LIST_REVERSE:
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
899 if ((cch = haystack[*offs]) == 0)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
900 res = false;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
901 else
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
902 {
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
903 res = th_regex_match_list(&node->match.list, cch);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
904
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
905 if (node->type == TH_RE_TYPE_LIST_REVERSE)
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
906 res = !res;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
907
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
908 (*offs)++;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
909 }
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
910 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
911
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
912 case TH_RE_TYPE_ANY_CHAR:
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
913 if ((cch = haystack[*offs]) == 0)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
914 res = false;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
915 else
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
916 {
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
917 res = true;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
918 (*offs)++;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
919 }
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
920 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
921
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
922 case TH_RE_TYPE_CHAR:
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
923 if ((cch = haystack[*offs]) == 0)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
924 res = false;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
925 else
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
926 {
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
927 res = (cch == node->match.chr);
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
928 (*offs)++;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
929 }
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
930 break;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
931
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
932 case TH_RE_TYPE_STR:
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
933 res = true;
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
934 for (th_char_t *str = node->match.str;
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
935 res && *str != 0;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
936 str++, (*offs)++)
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
937 {
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
938 if (haystack[*offs] != *str)
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
939 res = false;
645
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
940 }
b897995101b7 More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents: 643
diff changeset
941 break;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
942 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
943
638
c4bca120bfb0 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 635
diff changeset
944 return res;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
945 }
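In the TH_RE_TYPE_STR case above, the loop increment still runs after a mismatch, so *offs ends up one position past the mismatching character when the match fails. Callers in this file pass a scratch offset, so this is probably harmless. The sketch below shows a variant that leaves the offset untouched on failure. It is an illustration only: it uses plain char instead of th_char_t, and demo_match_literal() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Match the NUL-terminated literal 'lit' against 'haystack' at *offs.
 * Assumes *offs is not past the terminating NUL of 'haystack'.
 * On success *offs is advanced past the matched text and true is
 * returned. On failure *offs is left unchanged. */
static bool demo_match_literal(const char *haystack, size_t *offs, const char *lit)
{
    assert(haystack != NULL && offs != NULL && lit != NULL);

    size_t pos = *offs;

    for (; *lit != 0; lit++, pos++)
    {
        // This also fails at end of haystack, as 0 != *lit there
        if (haystack[pos] != *lit)
            return false;
    }

    *offs = pos;
    return true;
}
```

Working on a local copy of the offset and committing it only on success means the caller does not need to save and restore the offset around each attempt.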
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
946
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
947
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
948 static bool th_regex_match_count(
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
949 const th_char_t *haystack,
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
950 size_t *offs,
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
951 const th_regex_t *expr,
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
952 const th_regex_node_t *node,
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
953 size_t *nnode,
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
954 const int flags,
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
955 const int level
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
956 )
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
957 {
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
958 size_t toffs = *offs, last_offs = *offs;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
959 ssize_t count = 0;
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
960
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
961 do
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
962 {
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
963 // Attempt to match the repeated node once
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
964 size_t poffs = toffs;
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
965
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
966 if (th_regex_match_one(haystack, &poffs, node, flags, level))
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
967 {
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
968 // Matched, increase count of repeats
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
969 count++;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
970 //DBG_RE_PRINT("#%" PRId_SSIZE_T "\n", count);
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
971
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
972 // poffs now points one past the end of the matched text
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
973 }
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
974 else
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
975 {
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
976 // Did not match, get out if repeatMin > 0
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
977 if (node->repeatMin > 0)
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
978 break;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
979 }
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
980
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
981 // Attempt to match rest of the expression
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
982 size_t qoffs1 = poffs, qoffs2 = toffs;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
983 DBG_RE_PRINT("try rest '%s' :: '%s'\n", haystack + qoffs1, haystack + qoffs2);
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
984 if (th_regex_match_expr(haystack, &qoffs1, expr, *nnode + 1, flags, level + 1))
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
985 {
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
986 // Matched
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
987 toffs = last_offs = qoffs1;
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
988
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
989 DBG_RE_PRINT(" yes1: count=%" PRId_SSIZE_T " [%" PRId_SSIZE_T " .. %" PRId_SSIZE_T "]\n", count, node->repeatMin, node->repeatMax);
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
990
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
991 // Check min repeats and if we are "not greedy".
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
992 if (count >= node->repeatMin && node->repeatMax == 1)
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
993 break;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
994
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
995 // Check max repeats
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
996 if (node->repeatMax > 0 && count >= node->repeatMax)
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
997 break;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
998 }
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
999 else
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1000 if (node->repeatMin == 0 &&
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1001 th_regex_match_expr(haystack, &qoffs2, expr, *nnode + 1, flags, level + 1))
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1002 {
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1003 // Matched
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1004 toffs = last_offs = qoffs2;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1005
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1006 DBG_RE_PRINT(" yes2: count=%" PRId_SSIZE_T " [%" PRId_SSIZE_T " .. %" PRId_SSIZE_T "]\n", count, node->repeatMin, node->repeatMax);
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1007
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1008 // Check min repeats and if we are "not greedy".
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1009 if (count >= node->repeatMin && node->repeatMax == 1)
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1010 break;
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1011
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1012 // Check max repeats
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1013 if (node->repeatMax > 0 && count >= node->repeatMax)
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1014 break;
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1015
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1016 }
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1017 else
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1018 {
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1019 // Rest of expression did not match, try again
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1020 DBG_RE_PRINT(" no\n");
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1021 toffs = poffs;
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1022 }
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1023
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1024
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1025 } while (haystack[toffs] != 0);
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1026
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1027 // Check results
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
1028 bool res = count >= node->repeatMin ||
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1029 (node->repeatMax > 0 && count >= node->repeatMax);
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1030
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1031 if (res)
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1032 {
667
039aa00cbfbf Work on regex matcher.
Matti Hamalainen <ccr@tnsp.org>
parents: 666
diff changeset
1033 *offs = last_offs;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1034 *nnode = expr->nnodes;
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1035 }
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1036
666
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1037 DBG_RE_PRINT("RESULT: %s : offs=%" PRIu_SIZE_T "='%s'\n",
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1038 res ? "YES" : "NO",
e1d27caf0dbd More work on regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 664
diff changeset
1039 *offs, haystack + *offs);
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1040
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1041 return res;
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1042 }
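th_regex_match_count() alternates between matching the repeated node once more and trying the rest of the expression at the current position, and it remembers the last offset where the rest matched. The sketch below reduces that strategy to its simplest form: one repeated character followed by a literal. It is not the code above. demo_match_repeat() is a hypothetical name, plain char stands in for th_char_t, and the convention that max == 0 means "no upper bound" is an assumption modelled on the repeatMax > 0 checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Match 'ch' repeated min..max times (max == 0: unbounded, an assumed
 * convention), followed by the literal 'rest', starting at *offs.
 * The longest successful match wins. On success *offs is set to the
 * end of the match, on failure it is left unchanged. */
static bool demo_match_repeat(
    const char *haystack, size_t *offs,
    const char ch, const size_t min, const size_t max,
    const char *rest)
{
    assert(haystack != NULL && offs != NULL && rest != NULL);

    const size_t rest_len = strlen(rest);
    size_t pos = *offs, count = 0, best = 0;
    bool found = false;

    for (;;)
    {
        // Enough repeats so far? Then try the rest of the expression here.
        if (count >= min && strncmp(haystack + pos, rest, rest_len) == 0)
        {
            found = true;
            best = pos + rest_len;
        }

        // Stop at max repeats, at end of haystack or when 'ch' stops matching
        if ((max > 0 && count >= max) || haystack[pos] == 0 || haystack[pos] != ch)
            break;

        pos++;
        count++;
    }

    if (found)
        *offs = best;

    return found;
}
```

For example, matching 'a' with min = 1 and max = 0 followed by "b" against "aaab" from offset 0 succeeds with the offset moved to 4. Keeping the last successful offset rather than returning on the first one is what makes the repeat greedy in this sketch.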
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1043
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1044
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
1045 static bool th_regex_match_expr(
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
1046 const th_char_t *haystack,
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1047 size_t *offs,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1048 const th_regex_t *expr,
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1049 const size_t startnode,
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1050 const int flags,
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1051 const int level
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1052 )
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1053 {
735
31bc1ed07cf5 Renaming BOOL->bool and TRUE/FALSE to true/false, and using stdbool.h if available.
Matti Hamalainen <ccr@tnsp.org>
parents: 726
diff changeset
1054 bool res = true;
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1055 size_t soffs = *offs;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1056
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1057 for (size_t nnode = startnode; res && nnode < expr->nnodes; nnode++)
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1058 {
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1059 const th_regex_node_t *node = &expr->nodes[nnode];
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1060
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1061 #ifdef TH_EXPERIMENTAL_REGEX_DEBUG
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1062 if (th_dbg_fh != NULL)
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
1063 {
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1064 th_regex_dump_indent(th_dbg_fh, level);
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1065
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1066 thfprintf(th_dbg_fh,
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
1067 "[%" PRIu_SIZE_T "/%" PRIu_SIZE_T "] ",
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
1068 nnode + 1, expr->nnodes);
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1069
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1070 th_regex_dump_node(th_dbg_fh, node);
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1071
651
18fe45e61b2b Moar re-work.
Matti Hamalainen <ccr@tnsp.org>
parents: 649
diff changeset
1072 thfprintf(th_dbg_fh, " <-> \"%s\"\n",
648
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
1073 haystack + soffs);
91c43398c6fc Twiddle.
Matti Hamalainen <ccr@tnsp.org>
parents: 647
diff changeset
1074 }
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1075 #endif
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1076
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1077 switch (node->mode)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1078 {
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1079 case TH_RE_MATCH_ONCE:
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1080 res = th_regex_match_one(haystack, &soffs, node, flags, level);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1081 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1082
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
1083 case TH_RE_MATCH_COUNT:
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1084 res = th_regex_match_count(haystack, &soffs, expr, node, &nnode, flags, level);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1085 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1086
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1087 case TH_RE_MATCH_ANCHOR_START:
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
1088 res = (soffs == 0);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1089 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1090
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1091 case TH_RE_MATCH_ANCHOR_END:
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
1092 res = (haystack[soffs] == 0);
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1093 break;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1094 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1095 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1096
643
a2bf1ea05b05 Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 642
diff changeset
1097 if (res)
649
2c9260f5cf44 Tweedle.
Matti Hamalainen <ccr@tnsp.org>
parents: 648
diff changeset
1098 *offs = soffs;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1099
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1100 return res;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1101 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1102
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1103
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1104 /**
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1105 * Match the string @p haystack against the compiled regular
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1106 * expression @p expr. Results are returned through the optional
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1107 * @p pnmatches (number of matches) and/or @p pmatches (linked list of
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1108 * @c th_regex_match_t structures describing the matched regions). If @p pmatches
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1109 * is used, the resulting list should eventually be freed via th_regex_free_matches().
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1110 *
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1111 * @param[in] expr regular expression structure to be matched
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1112 * @param[in] haystack string to be matched against
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1113 * @param[out] pnmatches pointer to a variable that is set to the number of matches found, or @c NULL if the information is not desired
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1114 * @param[out] pmatches pointer to a pointer to a linked list of @c th_regex_match_t structures, or @c NULL if the information is not desired
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1115 * @param[in] maxmatches maximum number of matches to find before stopping, or @c 0 for no limit
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1116 * @param[in] flags additional flags, see @c TH_REF_*
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1117 */
664
c5aa9ada1051 s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents: 657
diff changeset
1118 int th_regex_match(const th_regex_t *expr, const th_char_t *haystack,
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1119 size_t *pnmatches, th_regex_match_t **pmatches, const size_t maxmatches,
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1120 const int flags)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1121 {
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1122 size_t nmatches = 0;
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1123 int level = 0;
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1124 (void) flags;
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1125
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1126 if (pnmatches != NULL)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1127 *pnmatches = 0;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1128
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1129 // Check given pattern and string
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1130 if (expr == NULL || haystack == NULL)
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1131 return THERR_NULLPTR;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1132
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1133 // Start matching
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1134 // XXX NOTE .. lots to think about and to take into account:
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1135 // - anchored and unanchored expressions
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1136 // - how to check if the expression has consumed all possibilities?
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1137 // ..
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1138 for (size_t soffs = 0; haystack[soffs] != 0; )
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1139 {
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1140 size_t coffs = soffs;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1141
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1142 if (th_regex_match_expr(haystack, &coffs, expr, 0, flags, level))
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1143 {
612
cc9ec51b4875 Add some comments and debug messages.
Matti Hamalainen <ccr@tnsp.org>
parents: 611
diff changeset
1144 // A match was found, increase count
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1145 nmatches++;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1146
612
cc9ec51b4875 Add some comments and debug messages.
Matti Hamalainen <ccr@tnsp.org>
parents: 611
diff changeset
1147 // Deliver to caller if required
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1148 if (pnmatches != NULL)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1149 *pnmatches = nmatches;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1150
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1151 if (pmatches != NULL)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1152 {
647
1e7e3f96632e And some more work.
Matti Hamalainen <ccr@tnsp.org>
parents: 645
diff changeset
1153 // Add the match region to the list
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1154 th_regex_match_t *match = th_malloc0(sizeof(th_regex_match_t));
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1155 if (match == NULL)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1156 return THERR_MALLOC;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1157
669
7493d4c9ff77 Add some regex flags, features to be implemented "some day".
Matti Hamalainen <ccr@tnsp.org>
parents: 667
diff changeset
1158 match->type = TH_RE_MATCH_EXPR;
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1159 match->start = soffs;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1160 match->len = coffs - soffs;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1161
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1162 th_llist_append_node((th_llist_t **) pmatches, (th_llist_t *) match);
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1163 }
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1164
612
cc9ec51b4875 Add some comments and debug messages.
Matti Hamalainen <ccr@tnsp.org>
parents: 611
diff changeset
1165 // Check match count limit, if set
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1166 if (maxmatches > 0 && nmatches >= maxmatches)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1167 break;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1168
612
cc9ec51b4875 Add some comments and debug messages.
Matti Hamalainen <ccr@tnsp.org>
parents: 611
diff changeset
1169 // If offset was not advanced, increase by one
cc9ec51b4875 Add some comments and debug messages.
Matti Hamalainen <ccr@tnsp.org>
parents: 611
diff changeset
1170 // otherwise use end of match offset as new start
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1171 if (soffs == coffs)
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1172 soffs++;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1173 else
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1174 soffs = coffs;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1175 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1176 else
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1177 {
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1178 soffs++;
605
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1179 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1180 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1181
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1182 return THERR_OK;
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1183 }
566e6ef41f9d Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1184
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1185
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1186 static void th_regex_free_match(th_regex_match_t *node)
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1187 {
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1188 (void) node;
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1189 // Nothing to do here at the moment
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1190 }
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1191
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1192
655
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1193 /**
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1194 * Deallocate the given set of @c th_regex_match_t
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1195 * linked list structures pointed to by @p matches.
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1196 * All associated data will be freed.
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1197 *
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1198 * @param[in] matches head of the match list to be deallocated
ae601363fdad Doxygenization.
Matti Hamalainen <ccr@tnsp.org>
parents: 652
diff changeset
1199 */
640
9e1f9e1d1487 Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents: 639
diff changeset
1200 void th_regex_free_matches(th_regex_match_t *matches)
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1201 {
712
838189b856f3 Fix various memory leaks in th_regex. Not that it is usable anyway yet.
Matti Hamalainen <ccr@tnsp.org>
parents: 711
diff changeset
1202 th_llist_free_func_data((th_llist_t *) matches,
838189b856f3 Fix various memory leaks in th_regex. Not that it is usable anyway yet.
Matti Hamalainen <ccr@tnsp.org>
parents: 711
diff changeset
1203 (void (*)(void *)) th_regex_free_match);
610
a0e8d9c6300b A bit more work on the regex stuff.
Matti Hamalainen <ccr@tnsp.org>
parents: 609
diff changeset
1204 }
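
The scanning strategy of th_regex_match() and the cleanup done by th_regex_free_matches() can be illustrated in isolation. The following is a minimal sketch and does not use the th_regex API: toy_match, toy_match_at(), toy_scan() and toy_free_matches() are hypothetical names, and the matcher is a plain literal comparison standing in for th_regex_match_expr(). It assumes that *pmatches is NULL on entry. What it shows is the offset handling: on a failed match the start offset advances by one, on a successful match it jumps to the end of the match, and a zero-length match still advances by one so the loop always terminates.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for th_regex_match_t: one matched region. */
typedef struct toy_match
{
    struct toy_match *next;
    size_t start, len;
} toy_match;

/* Toy matcher (an assumption, not the th_regex engine): succeeds when
 * the literal needle occurs at haystack + *offs, and advances *offs
 * past it only on success. */
static int toy_match_at(const char *haystack, size_t *offs, const char *needle)
{
    size_t n = strlen(needle);
    if (strncmp(haystack + *offs, needle, n) != 0)
        return 0;
    *offs += n;
    return 1;
}

/* Scan the haystack left to right, collecting non-overlapping matches.
 * Assumes *pmatches is NULL on entry when pmatches is given.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_scan(const char *haystack, const char *needle,
    size_t *pnmatches, toy_match **pmatches, size_t maxmatches)
{
    size_t nmatches = 0;
    toy_match **tail = pmatches;

    if (pnmatches != NULL)
        *pnmatches = 0;

    for (size_t soffs = 0; haystack[soffs] != 0; )
    {
        size_t coffs = soffs;

        if (!toy_match_at(haystack, &coffs, needle))
        {
            soffs++;    /* no match here, try the next offset */
            continue;
        }

        nmatches++;
        if (pnmatches != NULL)
            *pnmatches = nmatches;

        if (pmatches != NULL)
        {
            /* Append the matched region to the end of the list */
            toy_match *m = calloc(1, sizeof(*m));
            if (m == NULL)
                return -1;
            m->start = soffs;
            m->len = coffs - soffs;
            *tail = m;
            tail = &m->next;
        }

        /* Stop once the optional match limit is reached */
        if (maxmatches > 0 && nmatches >= maxmatches)
            break;

        /* A zero-length match must still move forward by one */
        soffs = (coffs == soffs) ? soffs + 1 : coffs;
    }
    return 0;
}

/* Free every node of the list, as th_regex_free_matches() does for
 * the real match list. */
static void toy_free_matches(toy_match *m)
{
    while (m != NULL)
    {
        toy_match *next = m->next;
        free(m);
        m = next;
    }
}
```

Under these assumptions, scanning "abxabab" for "ab" with no limit yields three regions starting at offsets 0, 3 and 5, each of length 2. As with the real function, the caller owns the returned list and releases it with the matching free function.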