annotate th_regex.c @ 721:c834e1393eb0
Initialize regex parsing context before checking pointers.
author | Matti Hamalainen <ccr@tnsp.org> |
---|---|
date | Sun, 13 Dec 2020 13:53:06 +0200 |
parents | 838189b856f3 |
children | 4ca6a3b30fe8 |
rev | line source |
---|---|
/*
 * Simple regular expression matching functionality
 * Programmed and designed by Matti 'ccr' Hamalainen
 * (C) Copyright 2020 Tecnic Software productions (TNSP)
 *
 * Please read file 'COPYING' for information on license and distribution.
 */
#include "th_regex.h"


#ifdef TH_EXPERIMENTAL_REGEX_DEBUG
th_ioctx *th_dbg_fh = NULL;

# define DBG_RE_PRINT(...) do { \
    if (th_dbg_fh != NULL) \
    { \
        th_regex_dump_indent(th_dbg_fh, level); \
        thfprintf(th_dbg_fh, __VA_ARGS__); \
    } \
} while (0)
#else
# define DBG_RE_PRINT(...)
#endif


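The `do { ... } while (0)` wrapper is what lets `DBG_RE_PRINT` behave like a single statement, for example inside an unbraced `if`/`else`. A minimal standalone sketch of that pattern follows. The names `dbg_fh`, `DBG_PRINT` and `count_repeats` are hypothetical, and plain `fprintf` stands in for the library's `thfprintf`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debug sink; NULL means debug output is switched off. */
FILE *dbg_fh = NULL;

/* Variadic macro wrapped in do/while(0) so it expands to one statement. */
#define DBG_PRINT(...) do { \
        if (dbg_fh != NULL) \
            fprintf(dbg_fh, __VA_ARGS__); \
    } while (0)

/* Counts '*' characters in a pattern, logging every character it visits. */
int count_repeats(const char *pattern)
{
    int count = 0;

    assert(pattern != NULL);

    for (size_t i = 0; pattern[i] != 0; i++)
    {
        /* Without the do/while(0) wrapper this unbraced if/else would not parse. */
        if (pattern[i] == '*')
            DBG_PRINT("repeat operator at %zu\n", i);
        else
            DBG_PRINT("literal '%c' at %zu\n", pattern[i], i);

        count += (pattern[i] == '*');
    }

    return count;
}
```

With `dbg_fh` left at `NULL` the macro costs one pointer comparison per call. Defining it as empty, as the `#else` branch above does, removes even that.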
/// @cond
enum
{
    TH_RE_MATCH_ONCE,
    TH_RE_MATCH_COUNT,
    TH_RE_MATCH_ANCHOR_START,
    TH_RE_MATCH_ANCHOR_END,
};


enum
{
    TH_RE_TYPE_CHAR,
    TH_RE_TYPE_STR,
    TH_RE_TYPE_ANY_CHAR,
    TH_RE_TYPE_LIST,
    TH_RE_TYPE_LIST_REVERSE,
    TH_RE_TYPE_SUBEXPR,
};


static const char *re_match_modes[] =
{
    "ONCE",
    "COUNT",
    "ANCHOR START",
    "ANCHOR END",
};


static const char *re_match_types[] =
{
    "CHAR",
    "STR",
    "ANY",
    "LIST",
    "LIST REVERSE",
    "SUBEXPR",
};


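The name tables are indexed directly by the enum values, so the two must stay in the same order and have the same length. One way to guard that, sketched here under assumptions, is a trailing count entry plus a compile-time check and a bounds-checked lookup. `MODE_*`, `mode_names` and `mode_name` are hypothetical names and not part of th-libs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the match-mode enum, with a trailing count entry. */
enum
{
    MODE_ONCE,
    MODE_COUNT,
    MODE_ANCHOR_START,
    MODE_ANCHOR_END,
    MODE_NUM_ENTRIES
};

static const char *mode_names[] =
{
    "ONCE",
    "COUNT",
    "ANCHOR START",
    "ANCHOR END",
};

/* Fails the build if the table and the enum drift apart (needs C11). */
static_assert(sizeof(mode_names) / sizeof(mode_names[0]) == MODE_NUM_ENTRIES,
    "mode_names must have one entry per mode");

/* Bounds-checked lookup; values outside the enum map to a placeholder. */
const char *mode_name(int mode)
{
    if (mode < 0 || mode >= MODE_NUM_ENTRIES)
        return "UNKNOWN";

    return mode_names[mode];
}
```

The lookup helper keeps a corrupted or uninitialized mode value from indexing past the end of the table when debug output is printed.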
typedef struct
{
    int type;
    th_char_t start, end;

    size_t nchars;
    th_char_t *chars;
} th_regex_list_item_t;


typedef struct
{
    size_t nitems, itemssize;
    th_regex_list_item_t *items;
} th_regex_list_t;


typedef struct
{
    int mode, type;
    ssize_t repeatMin, repeatMax;

    struct {
        th_char_t chr;
        th_char_t *str;
        th_regex_list_t list;

        th_regex_t *expr;
    } match;
} th_regex_node_t;


typedef struct
{
    const th_char_t *pattern;
    size_t offs;

    th_regex_t *data;

    size_t nstack, stacksize;
    th_regex_t **stack;

    th_char_t *buf;
    size_t bufSize, bufPos;

} th_regex_parse_ctx_t;


struct th_regex_t
{
    size_t nnodes, nodessize;
    th_regex_node_t *nodes;
};

/// @endcond


static void th_regex_node_init(th_regex_node_t *node)
{
    memset(node, 0, sizeof(th_regex_node_t));
    node->mode = TH_RE_MATCH_ONCE;
}


static int th_regex_strndup(th_char_t **pdst,
    const th_char_t *src, const size_t len)
{
    if (pdst == NULL)
        return THERR_NULLPTR;

    if (UINTPTR_MAX / sizeof(th_char_t) < len + 1)
        return THERR_BOUNDS;

    if ((*pdst = (th_char_t *)
        th_malloc((len + 1) * sizeof(th_char_t))) == NULL)
        return THERR_MALLOC;

    memcpy(*pdst, src, len * sizeof(th_char_t));
    (*pdst)[len] = 0;

    return THERR_OK;
}


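`th_regex_strndup` guards the allocation size before multiplying by the element size. The sketch below shows the same idea with the standard allocator, using a check that also holds when `len` is `SIZE_MAX`, where `len + 1` would wrap to zero. The type `elem_t`, the `DUP_*` codes and the function `dup_n` are hypothetical stand-ins, not th-libs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type and status codes standing in for th_char_t / THERR_*. */
typedef uint32_t elem_t;
enum { DUP_OK = 0, DUP_NULLPTR, DUP_BOUNDS, DUP_MALLOC };

/* Copies exactly len elements of src into a new zero-terminated buffer. */
int dup_n(elem_t **pdst, const elem_t *src, size_t len)
{
    elem_t *dst;

    /* Caller contract assumed here: src must be a valid pointer. */
    assert(src != NULL);

    if (pdst == NULL)
        return DUP_NULLPTR;

    /* Reject lengths where (len + 1) * sizeof(elem_t) would wrap around. */
    if (len >= SIZE_MAX / sizeof(elem_t))
        return DUP_BOUNDS;

    if ((dst = malloc((len + 1) * sizeof(elem_t))) == NULL)
        return DUP_MALLOC;

    memcpy(dst, src, len * sizeof(elem_t));
    dst[len] = 0;

    /* Publish the result only once it is fully built. */
    *pdst = dst;
    return DUP_OK;
}
```

Comparing `len` against `SIZE_MAX / sizeof(elem_t)` avoids computing `len + 1` before the check, so the guard itself cannot overflow.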
static int th_regex_parse_ctx_get_prev_node(
    th_regex_parse_ctx_t *ctx, th_regex_node_t **pnode)
{
    if (ctx->data != NULL && ctx->data->nnodes > 0)
    {
        *pnode = &ctx->data->nodes[ctx->data->nnodes - 1];
        return THERR_OK;
    }
    else
        return THERR_INVALID_DATA;
}


static int th_regex_parse_ctx_push(th_regex_parse_ctx_t *ctx)
{
    if (ctx->stack == NULL || ctx->nstack + 1 >= ctx->stacksize)
    {
        ctx->stacksize += 16;

        if ((ctx->stack = th_realloc(ctx->stack,
            ctx->stacksize * sizeof(th_regex_t *))) == NULL)
            return THERR_MALLOC;
    }

    ctx->stack[ctx->nstack] = ctx->data;
    ctx->nstack++;
    ctx->data = NULL;

    return THERR_OK;
}


static int th_regex_parse_ctx_pop(th_regex_parse_ctx_t *ctx, th_regex_t **data)
{
    if (ctx->nstack > 0)
    {
        *data = ctx->data;
        ctx->nstack--;
        ctx->data = ctx->stack[ctx->nstack];
        return THERR_OK;
    }
    else
        return THERR_INVALID_DATA;
}


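The push and pop helpers implement a save-and-restore scheme for nested expressions: push parks the expression being built and starts from an empty one, and pop hands the finished inner expression to the caller while restoring the outer one. The following is a minimal standalone sketch of that scheme, assuming the standard allocator. `expr_t`, `ctx_t`, `ctx_push` and `ctx_pop` are hypothetical names, and the temporary pointer around `realloc` is a variation, not what the code above does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the expression type and the parser context. */
typedef struct { int id; } expr_t;

typedef struct
{
    expr_t *cur;               /* expression currently being built */
    expr_t **stack;            /* saved outer expressions */
    size_t nstack, stacksize;
} ctx_t;

/* Saves the current expression and starts a fresh (empty) one. */
int ctx_push(ctx_t *ctx)
{
    assert(ctx != NULL);

    if (ctx->nstack >= ctx->stacksize)
    {
        size_t nsize = ctx->stacksize + 16;

        /* Keep the old pointer until realloc succeeds, so it is not lost. */
        expr_t **tmp = realloc(ctx->stack, nsize * sizeof(*tmp));
        if (tmp == NULL)
            return -1;

        ctx->stack = tmp;
        ctx->stacksize = nsize;
    }

    ctx->stack[ctx->nstack++] = ctx->cur;
    ctx->cur = NULL;
    return 0;
}

/* Returns the finished inner expression and restores the outer one. */
int ctx_pop(ctx_t *ctx, expr_t **inner)
{
    assert(ctx != NULL && inner != NULL);

    if (ctx->nstack == 0)
        return -1;

    *inner = ctx->cur;
    ctx->cur = ctx->stack[--ctx->nstack];
    return 0;
}
```

A parser built this way would presumably call the push on an opening parenthesis and the pop on the matching close, then attach the returned inner expression to a sub-expression node of the restored outer one.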
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
197 static int th_regex_parse_ctx_node_commit(th_regex_parse_ctx_t *ctx, th_regex_node_t *node) |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
198 { |
705 | 199 th_regex_t *data; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
200 |
711 | 201 if (ctx->data == NULL && |
202 (data = ctx->data = th_malloc0(sizeof(th_regex_t))) == NULL) | |
203 return THERR_MALLOC; | |
705 | 204 else |
205 data = ctx->data; | |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
206 |
605 | 207 if (data->nodes == NULL || data->nnodes + 1 >= data->nodessize) |
605 | 208 { |
605 | 209 data->nodessize += 16; |
605 | 210 if ((data->nodes = th_realloc(data->nodes, |
640 | 211 data->nodessize * sizeof(th_regex_node_t))) == NULL) |
605 | 212 return THERR_MALLOC; |
605 | 213 } |
605 | 214 |
640 | 215 memcpy(&data->nodes[data->nnodes], node, sizeof(th_regex_node_t)); |
605 | 216 data->nnodes++; |
605 | 217 |
605 | 218 return THERR_OK; |
605 | 219 } |
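
The append logic above grows the node array in fixed steps of 16 and assigns the result of `th_realloc()` straight back to `data->nodes`. The following is a minimal sketch of the same grow-then-append pattern using plain `realloc()`, assuming `th_realloc()` behaves like `realloc()` (this listing does not show it). The names `int_vec_t` and `int_vec_push` are hypothetical. Unlike the listing, the sketch keeps the old pointer until the reallocation has succeeded, so a failed allocation does not lose the existing buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array, standing in for the node array above. */
typedef struct
{
    int *items;
    size_t nitems, itemssize;
} int_vec_t;

/* Append one value, growing the capacity in steps of 16.
 * Returns 0 on success, -1 on allocation failure (vector left intact). */
static int int_vec_push(int_vec_t *vec, int value)
{
    assert(vec != NULL);

    if (vec->items == NULL || vec->nitems + 1 >= vec->itemssize)
    {
        size_t newsize = vec->itemssize + 16;
        int *tmp = realloc(vec->items, newsize * sizeof(int));
        if (tmp == NULL)
            return -1;

        /* Commit the new buffer only after realloc() succeeded */
        vec->items = tmp;
        vec->itemssize = newsize;
    }

    vec->items[vec->nitems++] = value;
    return 0;
}
```

As in the listing, the growth check uses `nitems + 1 >= itemssize`, so one slot of the buffer always stays unused.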
605 | 220 |
605 | 221 |
664 | 222 static BOOL th_regex_find_next(const th_char_t *str, |
605 | 223 const size_t start, size_t *offs, |
664 | 224 const th_char_t delim) |
605 | 225 { |
605 | 226 for (*offs = start; str[*offs] != 0; (*offs)++) |
605 | 227 { |
605 | 228 if (str[*offs] == delim) |
605 | 229 return TRUE; |
605 | 230 } |
605 | 231 return FALSE; |
605 | 232 } |
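
`th_regex_find_next()` above scans forward from `start` until it hits `delim` or the terminating NUL, leaving the stop position in `*offs`. A hedged usage sketch follows, with plain `char` standing in for `th_char_t`; `find_next` and `bracket_body` are hypothetical names, and locating the end of a bracket expression this way is an assumption about how the helper might be used, not something this part of the listing shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for th_regex_find_next() using plain char.
 * 'start' must not point past the terminating NUL of 'str'. */
static bool find_next(const char *str, size_t start, size_t *offs, char delim)
{
    assert(str != NULL && offs != NULL);

    for (*offs = start; str[*offs] != 0; (*offs)++)
    {
        if (str[*offs] == delim)
            return true;
    }
    return false;
}

/* Given the index of a '[' in 'pattern', report where the body of the
 * bracket expression starts and how long it is. Returns false when
 * there is no closing ']'. */
static bool bracket_body(const char *pattern, size_t open,
    size_t *body_start, size_t *body_len)
{
    size_t end;

    assert(pattern != NULL && pattern[open] == '[');

    if (!find_next(pattern, open + 1, &end, ']'))
        return false;

    *body_start = open + 1;
    *body_len = end - (open + 1);
    return true;
}
```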
605 | 233 |
605 | 234 |
664 | 235 static BOOL th_regex_parse_ssize_t(const th_char_t *str, |
605 | 236 ssize_t *value) |
605 | 237 { |
664 | 238 th_char_t ch; |
605 | 239 BOOL neg; |
605 | 240 |
605 | 241 if (*str == '-') |
605 | 242 { |
605 | 243 str++; |
605 | 244 neg = TRUE; |
605 | 245 } |
605 | 246 else |
605 | 247 neg = FALSE; |
605 | 248 |
605 | 249 // Parse the decimal digits; *value is accumulated in place and is not reset here |
605 | 250 while ((ch = *str++)) |
605 | 251 { |
605 | 252 if (ch >= '0' && ch <= '9') |
605 | 253 { |
605 | 254 *value *= 10; |
605 | 255 *value += ch - '0'; |
605 | 256 } |
605 | 257 else |
605 | 258 return FALSE; |
605 | 259 } |
605 | 260 |
605 | 261 if (neg) |
605 | 262 *value = -(*value); |
605 | 263 |
605 | 264 return TRUE; |
605 | 265 } |
605 | 266 |
605 | 267 |
640 | 268 static void th_regex_list_item_init(th_regex_list_item_t *item) |
639 | 269 { |
640 | 270 memset(item, 0, sizeof(th_regex_list_item_t)); |
639 | 271 } |
639 | 272 |
639 | 273 |
640 | 274 static int th_regex_list_add_item(th_regex_list_t *list, th_regex_list_item_t *item) |
639 | 275 { |
639 | 276 if (list->items == NULL || list->nitems + 1 >= list->itemssize) |
639 | 277 { |
639 | 278 list->itemssize += 16; |
639 | 279 |
639 | 280 if ((list->items = th_realloc(list->items, |
640 | 281 list->itemssize * sizeof(th_regex_list_item_t))) == NULL) |
639 | 282 return THERR_MALLOC; |
639 | 283 } |
639 | 284 |
640 | 285 memcpy(&list->items[list->nitems], item, sizeof(th_regex_list_item_t)); |
639 | 286 list->nitems++; |
639 | 287 |
639 | 288 return THERR_OK; |
639 | 289 } |
639 | 290 |
639 | 291 |
640 | 292 static void th_regex_list_free(th_regex_list_t *list) |
639 | 293 { |
639 | 294 if (list != NULL) |
639 | 295 { |
639 | 296 for (size_t n = 0; n < list->nitems; n++) |
639 | 297 { |
639 | 298 th_free(list->items[n].chars); |
639 | 299 } |
639 | 300 th_free(list->items); |
639 | 301 } |
639 | 302 } |
639 | 303 |
639 | 304 |
664 | 305 static int th_regex_parse_list(const th_char_t *str, |
640 | 306 const size_t slen, th_regex_list_t *list) |
639 | 307 { |
664 | 308 th_char_t *tmp = NULL; |
640 | 309 th_regex_list_item_t item; |
639 | 310 int res; |
639 | 311 |
639 | 312 if ((res = th_regex_strndup(&tmp, str, slen)) != THERR_OK) |
639 | 313 goto out; |
639 | 314 |
639 | 315 // Handle ranges like [A-Z] |
639 | 316 for (size_t offs = 0; offs < slen; offs++) |
639 | 317 { |
664 | 318 th_char_t |
639 | 319 *prev = (offs > 0) ? tmp + offs - 1 : NULL, |
639 | 320 *curr = tmp + offs, |
639 | 321 *next = (offs + 1 < slen) ? tmp + offs + 1 : NULL; |
639 | 322 |
639 | 323 if (*curr == '-') |
639 | 324 { |
639 | 325 if (prev != NULL && next != NULL) |
639 | 326 { |
639 | 327 // Range |
639 | 328 th_regex_list_item_init(&item); |
639 | 329 item.type = 1; |
639 | 330 item.start = *prev; |
639 | 331 item.end = *next; |
639 | 332 |
645 | 333 if (item.start >= item.end) |
639 | 334 { |
639 | 335 res = THERR_INVALID_DATA; |
639 | 336 goto out; |
639 | 337 } |
639 | 338 |
639 | 339 *curr = *prev = *next = 0; |
639 | 340 |
639 | 341 if ((res = th_regex_list_add_item(list, &item)) != THERR_OK) |
639 | 342 goto out; |
639 | 343 } |
639 | 344 else |
639 | 345 if (next != NULL) |
639 | 346 { |
639 | 347 res = THERR_INVALID_DATA; |
639 | 348 goto out; |
639 | 349 } |
639 | 350 } |
639 | 351 } |
639 | 352 |
639 | 353 // Count number of remaining characters |
639 | 354 th_regex_list_item_init(&item); |
639 | 355 item.type = 0; |
639 | 356 item.nchars = 0; |
639 | 357 |
639 | 358 for (size_t offs = 0; offs < slen; offs++) |
639 | 359 { |
664 | 360 th_char_t curr = tmp[offs]; |
639 | 361 if (curr != 0) |
639 | 362 item.nchars++; |
639 | 363 } |
639 | 364 |
639 | 365 if (item.nchars > 0) |
639 | 366 { |
664 | 367 if ((item.chars = th_malloc(sizeof(th_char_t) * item.nchars)) == NULL) |
639 | 368 { |
639 | 369 res = THERR_MALLOC; |
639 | 370 goto out; |
639 | 371 } |
639 | 372 |
639 | 373 for (size_t offs = 0, n = 0; offs < slen; offs++) |
639 | 374 { |
664 | 375 th_char_t curr = tmp[offs]; |
639 | 376 if (curr != 0) |
639 | 377 { |
639 | 378 item.chars[n] = curr; |
639 | 379 n++; |
639 | 380 } |
639 | 381 } |
639 | 382 |
639 | 383 if ((res = th_regex_list_add_item(list, &item)) != THERR_OK) |
639 | 384 { |
639 | 385 th_free(item.chars); |
639 | 386 goto out; |
639 | 387 } |
639 | 388 } |
639 | 389 |
639 | 390 out: |
639 | 391 th_free(tmp); |
639 | 392 return res; |
639 | 393 } |
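
`th_regex_parse_list()` above emits two kinds of list items: type 1 items hold a `start`/`end` character range, and a single type 0 item collects whatever literal characters are left once the ranges have been blanked out. How these items are matched is not shown in this part of the listing. The sketch below assumes inclusive ranges and a simple linear scan; `set_item_t` and `set_matches` are hypothetical names that only mirror the fields used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the list item: type 1 = range, type 0 = literals. */
typedef struct
{
    int type;
    char start, end;        /* used when type == 1 */
    const char *chars;      /* used when type == 0 */
    size_t nchars;
} set_item_t;

/* Return true if 'ch' is covered by any range or literal in the set. */
static bool set_matches(const set_item_t *items, size_t nitems, char ch)
{
    assert(items != NULL || nitems == 0);

    for (size_t i = 0; i < nitems; i++)
    {
        const set_item_t *item = &items[i];

        if (item->type == 1)
        {
            /* Range item: inclusive bounds (an assumption) */
            if (ch >= item->start && ch <= item->end)
                return true;
        }
        else
        {
            /* Literal item: check each stored character */
            for (size_t n = 0; n < item->nchars; n++)
            {
                if (item->chars[n] == ch)
                    return true;
            }
        }
    }

    return false;
}
```

Under these assumptions, a set written as `A-Z_0-9` would become two range items plus one literal item holding `_`.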
639 | 394 |
639 | 395 |
645 | 396 static int th_regex_parse_ctx_node_commit_strchr_do(th_regex_parse_ctx_t *ctx, |
664 | 397 const th_char_t *buf, const size_t bufLen) |
645 | 398 { |
645 | 399 th_regex_node_t node; |
645 | 400 th_regex_node_init(&node); |
645 | 401 |
645 | 402 if (bufLen > 1) |
645 | 403 { |
645 | 404 int res; |
645 | 405 node.type = TH_RE_TYPE_STR; |
645 | 406 if ((res = th_regex_strndup(&node.match.str, buf, bufLen)) != THERR_OK) |
645 | 407 return res; |
645 | 408 } |
645 | 409 else |
645 | 410 { |
645 | 411 node.type = TH_RE_TYPE_CHAR; |
645 | 412 node.match.chr = buf[0]; |
645 | 413 } |
645 | 414 |
645 | 415 return th_regex_parse_ctx_node_commit(ctx, &node); |
645 | 416 } |
645 | 417 |
645 | 418 |
645 | 419 static int th_regex_parse_ctx_node_commit_strchr(th_regex_parse_ctx_t *ctx, const BOOL split) |
645 | 420 { |
645 | 421 int res = THERR_OK; |
645 | 422 |
645 | 423 if (ctx->bufPos > 0) |
645 | 424 { |
645 | 425 if (ctx->bufPos > 1 && split) |
645 | 426 { |
645 | 427 if ((res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf, ctx->bufPos - 1)) != THERR_OK) |
645 | 428 return res; |
645 | 429 |
645 | 430 res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf + ctx->bufPos - 1, 1); |
645 | 431 } |
645 | 432 else |
645 | 433 { |
645 | 434 res = th_regex_parse_ctx_node_commit_strchr_do(ctx, ctx->buf, ctx->bufPos); |
645 | 435 } |
645 | 436 ctx->bufPos = 0; |
645 | 437 } |
645 | 438 |
645 | 439 return res; |
645 | 440 } |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
441 |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
442 |
655 | 443 /** |
444 * Parse the given regular expression @p pattern string into compiled/tokenized | |
445 * form as @c th_regex_t structures. Returns @c THERR_OK if successful, | |
446 * or another @c THERR_* value if not. In either case, @p pexpr | |
447 * may have been allocated and must be freed via th_regex_free(). | |
657 | 448 * @param[in,out] pexpr pointer to a pointer of @c th_regex_t structures to be allocated
655 | 449 * @param[in] pattern regular expression pattern string |
450 * @returns @c THERR_* return value indicating success or failure | |
451 */ | |
664
c5aa9ada1051
s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents:
657
diff
changeset
|
452 int th_regex_compile(th_regex_t **pexpr, const th_char_t *pattern) |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
453 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
454 int res = THERR_OK; |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
455 th_regex_parse_ctx_t ctx; |
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
456 th_regex_node_t node, *pnode; |
664
c5aa9ada1051
s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents:
657
diff
changeset
|
457 th_char_t *tmp = NULL; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
458 size_t start; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
459 |
721
c834e1393eb0
Initialize regex parsing context before checking pointers.
Matti Hamalainen <ccr@tnsp.org>
parents:
712
diff
changeset
|
460 memset(&ctx, 0, sizeof(ctx)); |
c834e1393eb0
Initialize regex parsing context before checking pointers.
Matti Hamalainen <ccr@tnsp.org>
parents:
712
diff
changeset
|
461 |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
462 // Check pointers |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
463 if (pexpr == NULL || pattern == NULL) |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
464 { |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
465 res = THERR_NULLPTR; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
466 goto out; |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
467 } |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
468 |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
469 // Initialize parsing context |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
470 ctx.pattern = pattern; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
471 ctx.bufSize = 256; |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
472 |
664
c5aa9ada1051
s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents:
657
diff
changeset
|
473 if ((ctx.buf = th_malloc(ctx.bufSize * sizeof(th_char_t))) == NULL) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
474 { |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
475 res = THERR_MALLOC; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
476 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
477 } |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
478 |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
479 // Start parsing the pattern |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
480 for (; ctx.pattern[ctx.offs] != 0; ctx.offs++) |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
481 { |
664
c5aa9ada1051
s/th_regex_char_t/th_char_t/g
Matti Hamalainen <ccr@tnsp.org>
parents:
657
diff
changeset
|
482 th_char_t cch = ctx.pattern[ctx.offs]; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
483 |
613 | 484 switch (cch) |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
485 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
486 case '?': |
613 | 487 case '*': |
488 case '+': | |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
489 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, TRUE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
490 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
491 |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
492 if ((res = th_regex_parse_ctx_get_prev_node(&ctx, &pnode)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
493 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
494 |
613 | 495 if (cch == '?') |
496 { | |
643 | 497 // Previous token is optional (repeat 0-1 times) (non-greedy matching) |
498 pnode->mode = TH_RE_MATCH_COUNT; | |
499 pnode->repeatMin = 0; | |
500 pnode->repeatMax = 1; | |
613 | 501 } |
502 else | |
503 { | |
641 | 504 // Check if previous was a count ("**", "*+", etc.) |
643 | 505 if (pnode->mode == TH_RE_MATCH_COUNT) |
613 | 506 { |
507 res = THERR_INVALID_DATA; | |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
508 goto out; |
613 | 509 } |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
510 |
643 | 511 pnode->mode = TH_RE_MATCH_COUNT; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
512 |
613 | 513 if (cch == '*') |
514 { | |
515 // Previous token can repeat 0 or more times | |
516 pnode->repeatMin = 0; | |
517 pnode->repeatMax = -1; | |
518 } | |
519 else | |
520 { | |
521 // Previous token must repeat 1 or more times | |
522 pnode->repeatMin = 1; | |
523 pnode->repeatMax = -1; | |
524 } | |
525 } | |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
526 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
527 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
528 case '{': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
529 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, TRUE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
530 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
531 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
532 // {n} | {min,max} |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
533 start = ctx.offs + 1; |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
534 if (!th_regex_find_next(ctx.pattern, start, &ctx.offs, '}')) |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
535 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
536 // End not found |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
537 res = THERR_INVALID_DATA; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
538 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
539 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
540 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
541 th_free(tmp); |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
542 |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
543 if ((res = th_regex_parse_ctx_get_prev_node(&ctx, &pnode)) != THERR_OK || |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
544 (res = th_regex_strndup(&tmp, ctx.pattern + start, |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
545 ctx.offs - start)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
546 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
547 |
643 | 548 pnode->mode = TH_RE_MATCH_COUNT; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
549 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
550 if (th_regex_find_next(tmp, 0, &start, ',')) |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
551 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
552 tmp[start] = 0; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
553 if (!th_regex_parse_ssize_t(tmp, &pnode->repeatMin) || |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
554 !th_regex_parse_ssize_t(tmp + start + 1, &pnode->repeatMax)) |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
555 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
556 res = THERR_INVALID_DATA; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
557 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
558 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
559 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
560 else |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
561 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
562 if (!th_regex_parse_ssize_t(tmp, &pnode->repeatMin)) |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
563 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
564 res = THERR_INVALID_DATA; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
565 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
566 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
567 pnode->repeatMax = pnode->repeatMin; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
568 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
569 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
570 if (pnode->repeatMin < 0 || pnode->repeatMax < 1 || |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
571 pnode->repeatMax < pnode->repeatMin) |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
572 { |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
573 // Invalid repeat counts |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
574 res = THERR_INVALID_DATA; |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
575 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
576 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
577 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
578 |
648 | 579 /* |
580 case '|': | |
581 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) | |
582 goto out; | |
583 | |
584 // Alt pattern .. how to handle these? | |
585 break; | |
586 */ | |
587 | |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
588 case '(': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
589 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
590 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
591 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
592 // Start of subpattern |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
593 if ((res = th_regex_parse_ctx_push(&ctx)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
594 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
595 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
596 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
597 case ')': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
598 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
599 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
600 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
601 // End of subpattern |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
602 th_regex_node_init(&node); |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
603 node.type = TH_RE_TYPE_SUBEXPR; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
604 |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
605 if ((res = th_regex_parse_ctx_pop(&ctx, &node.match.expr)) != THERR_OK || |
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
606 (res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
607 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
608 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
609 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
610 case '^': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
611 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
612 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
613 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
614 // Start of line anchor |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
615 th_regex_node_init(&node); |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
616 node.mode = TH_RE_MATCH_ANCHOR_START; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
617 |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
618 if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
619 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
620 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
621 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
622 case '$': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
623 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
624 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
625 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
626 // End of line anchor |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
627 th_regex_node_init(&node); |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
628 node.mode = TH_RE_MATCH_ANCHOR_END; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
629 |
640
9e1f9e1d1487
Aaand some more work. Still just a broken concept.
Matti Hamalainen <ccr@tnsp.org>
parents:
639
diff
changeset
|
630 if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK) |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
631 goto out; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
632 break; |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
633 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
634 case '[': |
645
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
635 if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK) |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
636 goto out; |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
637 |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
638 // Start of char list |
611
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
639 start = ctx.offs + 1; |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
640 if (!th_regex_find_next(ctx.pattern, start, &ctx.offs, ']') || |
d895b0fd6ad6
Combine code from th_regex_compile() to th_regex_compile_do().
Matti Hamalainen <ccr@tnsp.org>
parents:
610
diff
changeset
|
641 ctx.offs == start) |
                {
                    res = THERR_INVALID_DATA;
                    goto out;
                }

                th_regex_node_init(&node);
                if (ctx.pattern[start] == '^')
                {
                    node.type = TH_RE_TYPE_LIST_REVERSE;
                    start++;
                }
                else
                    node.type = TH_RE_TYPE_LIST;

                if ((res = th_regex_parse_list(ctx.pattern + start,
                    ctx.offs - start, &node.match.list)) != THERR_OK ||
                    (res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
                    goto out;
                break;

            case '.':
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK)
                    goto out;

                // Any single character matches
                th_regex_node_init(&node);
                node.type = TH_RE_TYPE_ANY_CHAR;

                if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
                    goto out;
                break;

            case '\\':
                // Literal escape
                ctx.offs++;
                if (ctx.pattern[ctx.offs] == 0)
                {
                    // End of pattern, error
                    res = THERR_INVALID_DATA;
                    goto out;
                }
                // fall-through

            default:
                // Given character must match
                if (ctx.bufPos < ctx.bufSize)
                    ctx.buf[ctx.bufPos++] = ctx.pattern[ctx.offs];
                else
                if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK)
                    goto out;
                break;
        }
    }

    // Commit last string/char if any
    if ((res = th_regex_parse_ctx_node_commit_strchr(&ctx, FALSE)) != THERR_OK)
        goto out;

    // Create root node
    th_regex_node_init(&node);
    node.type = TH_RE_TYPE_SUBEXPR;
    node.match.expr = ctx.data;
    ctx.data = NULL;

    if ((res = th_regex_parse_ctx_node_commit(&ctx, &node)) != THERR_OK)
        goto out;

out:
    *pexpr = ctx.data;

    // Free parse context
    for (size_t n = 0; n < ctx.nstack; n++)
    {
        if (ctx.stack[n] != ctx.data)
        {
            // th_regex_free() releases the structure itself as well,
            // so a separate th_free() here would free it twice
            th_regex_free(ctx.stack[n]);
        }
    }
    th_free(ctx.stack);

    th_free(tmp);
    th_free(ctx.buf);
    return res;
}


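The `'\\'` and `default` cases above buffer literal characters and treat a backslash as "take the next character as-is". The following is a minimal standalone sketch of that idea, not the library's code: `sk_collect_literals()` and the `SK_*` status codes are hypothetical names, and unlike the parser above it stops (instead of committing a node) when it meets `.` or `[` or when the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical status codes standing in for THERR_OK / THERR_INVALID_DATA.
enum { SK_OK = 0, SK_INVALID_DATA = 1 };

// Copy a run of literal characters from `pattern` into `buf`, resolving
// backslash escapes. Stops at '.', '[' or when `buf` is full, and stores
// the number of pattern characters consumed in `*consumed`.
static int sk_collect_literals(const char *pattern, char *buf,
    const size_t bufSize, size_t *consumed)
{
    size_t offs = 0, bufPos = 0;

    assert(pattern != NULL && buf != NULL && consumed != NULL && bufSize > 0);

    while (pattern[offs] != 0 && bufPos + 1 < bufSize)
    {
        if (pattern[offs] == '.' || pattern[offs] == '[')
            break;

        if (pattern[offs] == '\\')
        {
            // Literal escape: the next character is taken as-is
            offs++;
            if (pattern[offs] == 0)
            {
                // Pattern ended right after the backslash, error
                buf[bufPos] = 0;
                *consumed = offs;
                return SK_INVALID_DATA;
            }
        }

        buf[bufPos++] = pattern[offs++];
    }

    buf[bufPos] = 0;
    *consumed = offs;
    return SK_OK;
}
```

Called on the pattern `ab\.c.d`, the sketch stores `ab.c` in the buffer and reports five consumed characters; a pattern that ends in a lone backslash yields `SK_INVALID_DATA`.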
/**
 * Deallocate the given regular expression structure @p expr.
 * All associated data will be freed, though pointers may not
 * be NULLed.
 *
 * @param[in] expr structure to be deallocated
 */
void th_regex_free(th_regex_t *expr)
{
    if (expr != NULL)
    {
        for (size_t nnode = 0; nnode < expr->nnodes; nnode++)
        {
            th_regex_node_t *node = &expr->nodes[nnode];

            th_free(node->match.str);
            th_regex_free(node->match.expr);
            th_regex_list_free(&node->match.list);
        }

        th_free(expr->nodes);
        th_free(expr);
    }
}


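The recursive deallocation in `th_regex_free()` can be tried out in isolation with a small stand-in. This is only a sketch under assumptions: `sk_expr_t`, `sk_node_t`, `sk_expr_new()`, `sk_node_set_str()` and the `sk_freed_exprs` counter are hypothetical and use the standard allocator rather than the library's `th_free()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct sk_expr sk_expr_t;

// Hypothetical node: either member may be NULL when unused.
typedef struct
{
    char *str;          // owned string, or NULL
    sk_expr_t *expr;    // owned subexpression, or NULL
} sk_node_t;

struct sk_expr
{
    size_t nnodes;
    sk_node_t *nodes;
};

// Number of expression structures released so far (demonstration only).
static size_t sk_freed_exprs = 0;

// Allocate an expression with `nnodes` empty nodes, or return NULL.
static sk_expr_t *sk_expr_new(const size_t nnodes)
{
    sk_expr_t *expr;

    assert(nnodes > 0);
    if ((expr = malloc(sizeof(*expr))) == NULL)
        return NULL;

    if ((expr->nodes = malloc(nnodes * sizeof(sk_node_t))) == NULL)
    {
        free(expr);
        return NULL;
    }

    expr->nnodes = nnodes;
    for (size_t n = 0; n < nnodes; n++)
    {
        expr->nodes[n].str = NULL;
        expr->nodes[n].expr = NULL;
    }
    return expr;
}

// Store a heap-allocated copy of `str` in the node; returns 0 on success.
static int sk_node_set_str(sk_node_t *node, const char *str)
{
    const size_t len = strlen(str) + 1;
    if ((node->str = malloc(len)) == NULL)
        return -1;
    memcpy(node->str, str, len);
    return 0;
}

// Free the expression, its nodes and, recursively, any subexpressions.
static void sk_expr_free(sk_expr_t *expr)
{
    if (expr != NULL)
    {
        for (size_t n = 0; n < expr->nnodes; n++)
        {
            free(expr->nodes[n].str);           // free(NULL) is a no-op
            sk_expr_free(expr->nodes[n].expr);  // recursion stops at NULL
        }

        free(expr->nodes);
        free(expr);
        sk_freed_exprs++;
    }
}
```

Because the function releases the structure it was given, a caller must not free the same pointer again afterwards.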
static void th_regex_dump_indent(th_ioctx *fh, const int level)
{
    for (int indent = 0; indent < level; indent++)
        thfputs(" ", fh);
}


static void th_regex_dump_node(th_ioctx *fh, const th_regex_node_t *node)
{
    thfprintf(fh,
        "%s %s ",
        re_match_modes[node->mode],
        re_match_types[node->type]);

    if (node->mode == TH_RE_MATCH_COUNT)
    {
        thfprintf(fh, "min=%" PRId_SSIZE_T ", max=%" PRId_SSIZE_T " : ",
            node->repeatMin, node->repeatMax);
    }

    switch (node->type)
    {
        case TH_RE_TYPE_CHAR:
            thfprintf(fh, "'%c'", node->match.chr);
            break;

        case TH_RE_TYPE_STR:
            thfprintf(fh, "\"%s\"", node->match.str);
            break;

        case TH_RE_TYPE_ANY_CHAR:
            thfprintf(fh, ".");
            break;

        case TH_RE_TYPE_LIST:
        case TH_RE_TYPE_LIST_REVERSE:
            thfputs("[ ", fh);
            for (size_t n = 0; n < node->match.list.nitems; n++)
            {
                const th_regex_list_item_t *li = &node->match.list.items[n];
                if (li->type)
                {
                    thfprintf(fh, "'%c-%c' ", li->start, li->end);
                }
                else
                {
                    for (size_t i = 0; i < li->nchars; i++)
                        thfprintf(fh, "'%c' ", li->chars[i]);
                }
            }
            thfputs("]", fh);
            break;
    }
}


/**
 * Print out the contents of the given regular expression structure @p expr
 * in "human-readable" format to the specified @c th_ioctx context. Typically
 * useful for debugging purposes only.
 *
 * @param[in,out] fh th_ioctx handle to be used for output, must be writable.
 * @param[in] level starting whitespace indentation level
 * @param[in] expr regular expression structure to be "dumped"
 */
void th_regex_dump(th_ioctx *fh, const int level, const th_regex_t *expr)
{
    if (expr != NULL)
    {
        for (size_t nnode = 0; nnode < expr->nnodes; nnode++)
        {
            th_regex_node_t *node = &expr->nodes[nnode];

            th_regex_dump_indent(fh, level);

            thfprintf(fh,
                "[%" PRIu_SIZE_T "/%" PRIu_SIZE_T "] ",
                nnode + 1, expr->nnodes);

            th_regex_dump_node(fh, node);
            thfputs("\n", fh);

            if (node->type == TH_RE_TYPE_SUBEXPR)
                th_regex_dump(fh, level + 1, node->match.expr);
        }
    }
}


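The shape of the output produced by a recursive dump like `th_regex_dump()` is easier to see with a stand-in that writes to a string. The sketch below is an illustration only: `sk_dump()`, `sk_expr_t` and `sk_node_t` are hypothetical, the node is reduced to a printable label, and output goes to a caller-supplied buffer instead of a `th_ioctx`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct sk_expr sk_expr_t;

// Hypothetical node: a printable label plus an optional subexpression.
typedef struct
{
    const char *label;
    const sk_expr_t *sub;
} sk_node_t;

struct sk_expr
{
    size_t nnodes;
    const sk_node_t *nodes;
};

// Append one "[index/total] label" line per node to the NUL-terminated
// string in `out`, indenting by one space per nesting level and
// recursing into subexpressions.
static void sk_dump(char *out, const size_t outSize, const int level,
    const sk_expr_t *expr)
{
    assert(out != NULL && outSize > 0);

    if (expr != NULL)
    {
        for (size_t n = 0; n < expr->nnodes; n++)
        {
            const sk_node_t *node = &expr->nodes[n];
            size_t len = strlen(out);

            for (int i = 0; i < level && len + 1 < outSize; i++)
                out[len++] = ' ';
            out[len] = 0;

            // snprintf() truncates silently if the buffer is too small
            snprintf(out + len, outSize - len, "[%zu/%zu] %s\n",
                n + 1, expr->nnodes, node->label);

            if (node->sub != NULL)
                sk_dump(out, outSize, level + 1, node->sub);
        }
    }
}
```

Each nested expression restarts its own `[index/total]` numbering, so the nesting depth is visible only through the indentation.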
static BOOL th_regex_match_list(const th_regex_list_t *list, const th_char_t cch)
{
    // Could be optimized, perhaps .. sort match.chars, binary search etc?
    for (size_t nitem = 0; nitem < list->nitems; nitem++)
    {
        const th_regex_list_item_t *item = &list->items[nitem];

        if (item->type == 0)
        {
            for (size_t n = 0; n < item->nchars; n++)
            {
                if (item->chars[n] == cch)
                    return TRUE;
            }
        }
        else
        {
            if (cch >= item->start && cch <= item->end)
                return TRUE;
        }
    }

    return FALSE;
}


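The comment in `th_regex_match_list()` wonders about optimizing the linear scan. One alternative to the sorting and binary search it mentions is a precomputed bitmap, sketched below. This is a different technique from the one in the source and rests on the assumption that characters are 8 bits wide, which may not hold for `th_char_t`; all `sk_charset_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// One bit per possible 8-bit character value.
typedef struct
{
    unsigned char bits[32];
} sk_charset_t;

static void sk_charset_clear(sk_charset_t *set)
{
    assert(set != NULL);
    memset(set->bits, 0, sizeof(set->bits));
}

// Add a single character to the set.
static void sk_charset_add(sk_charset_t *set, const unsigned char ch)
{
    assert(set != NULL);
    set->bits[ch >> 3] |= (unsigned char) (1u << (ch & 7));
}

// Add the inclusive range start..end to the set.
static void sk_charset_add_range(sk_charset_t *set,
    const unsigned char start, const unsigned char end)
{
    // A wider loop counter avoids wrap-around when end == 255
    for (unsigned int ch = start; ch <= end; ch++)
        sk_charset_add(set, (unsigned char) ch);
}

// Constant-time membership test; `reverse` inverts the result in the
// same way a LIST_REVERSE node would.
static bool sk_charset_match(const sk_charset_t *set,
    const unsigned char ch, const bool reverse)
{
    assert(set != NULL);
    const bool found = ((set->bits[ch >> 3] >> (ch & 7)) & 1u) != 0;
    return reverse ? !found : found;
}
```

The trade-off is 32 bytes per list and a build step when the pattern is compiled, in exchange for a lookup that does not depend on how many items the list contains.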
static BOOL th_regex_match_expr(
    const th_char_t *haystack,
    size_t *offs,
    const th_regex_t *expr,
    const size_t startnode,
    const int flags,
    const int level
    );


static BOOL th_regex_match_one(
    const th_char_t *haystack,
    size_t *offs,
    const th_regex_node_t *node,
    const int flags,
    const int level
    )
{
    th_char_t cch;
    BOOL res = FALSE;

    switch (node->type)
    {
        case TH_RE_TYPE_SUBEXPR:
            res = th_regex_match_expr(haystack, offs, node->match.expr, 0, flags, level + 1);
            break;

        case TH_RE_TYPE_LIST:
        case TH_RE_TYPE_LIST_REVERSE:
            if ((cch = haystack[*offs]) == 0)
                res = FALSE;
            else
            {
                res = th_regex_match_list(&node->match.list, cch);

                if (node->type == TH_RE_TYPE_LIST_REVERSE)
                    res = !res;

                (*offs)++;
            }
            break;

        case TH_RE_TYPE_ANY_CHAR:
            if ((cch = haystack[*offs]) == 0)
                res = FALSE;
            else
            {
                res = TRUE;
                (*offs)++;
            }
            break;

        case TH_RE_TYPE_CHAR:
            if ((cch = haystack[*offs]) == 0)
                res = FALSE;
            else
            {
                res = (cch == node->match.chr);
                (*offs)++;
            }
            break;

        case TH_RE_TYPE_STR:
            res = TRUE;
            for (th_char_t *str = node->match.str;
                res && *str != 0;
                str++, (*offs)++)
            {
                if (haystack[*offs] != *str)
                    res = FALSE;
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
940 } |
b897995101b7
More fiddling and twiddling. Add parsing to string nodes instead of separate character nodes.
Matti Hamalainen <ccr@tnsp.org>
parents:
643
diff
changeset
|
941 break; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
942 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
943 |
638 | 944 return res; |
605
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
945 } |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
946 |
566e6ef41f9d
Initial commit of the highly experimental and unfinished regular expression
Matti Hamalainen <ccr@tnsp.org>
parents:
diff
changeset
|
947 |
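The string node case above compares the haystack against a literal one character at a time. As a minimal standalone sketch of the same idea (not code from th_regex.c; `match_literal` is a hypothetical helper and plain `char` stands in for `th_char_t`), the variant below works on a local position and writes the offset back only when the whole literal matched:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of th_regex.c: match the literal 'str'
 * at haystack[*offs]. The offset is advanced only on a full match. */
static bool match_literal(const char *haystack, size_t *offs, const char *str)
{
    size_t pos = *offs;

    for (; *str != 0; str++, pos++)
    {
        // A NUL in the haystack never equals a non-NUL literal character,
        // so this comparison also stops at the end of the haystack.
        if (haystack[pos] != *str)
            return false;
    }

    *offs = pos;
    return true;
}
```

Leaving the caller's offset untouched on failure means a caller can retry from the same position without first saving it.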
static BOOL th_regex_match_count(
    const th_char_t *haystack,
    size_t *offs,
    const th_regex_t *expr,
    const th_regex_node_t *node,
    size_t *nnode,
    const int flags,
    const int level
    )
{
    size_t toffs = *offs, last_offs = *offs;
    ssize_t count = 0;

    do
    {
        // Attempt to match the repeated node once
        size_t poffs = toffs;
        if (th_regex_match_one(haystack, &poffs, node, flags, level))
        {
            // Matched, increase count of repeats
            count++;
            //DBG_RE_PRINT("#%" PRId_SSIZE_T "\n", count);

            // poffs should now be at position + 1 from match
        }
        else
        {
            // Did not match, get out if repeatMin > 0
            if (node->repeatMin > 0)
                break;
        }

        // Attempt to match rest of the expression
        size_t qoffs1 = poffs, qoffs2 = toffs;
        DBG_RE_PRINT("try rest '%s' :: '%s'\n", haystack + qoffs1, haystack + qoffs2);
        if (th_regex_match_expr(haystack, &qoffs1, expr, *nnode + 1, flags, level + 1))
        {
            // Matched
            toffs = last_offs = qoffs1;

            DBG_RE_PRINT(" yes1: count=%" PRId_SSIZE_T " [%" PRId_SSIZE_T " .. %" PRId_SSIZE_T "]\n", count, node->repeatMin, node->repeatMax);

            // Check min repeats and if we are "not greedy".
            if (count >= node->repeatMin && node->repeatMax == 1)
                break;

            // Check max repeats
            if (node->repeatMax > 0 && count >= node->repeatMax)
                break;
        }
        else
        if (node->repeatMin == 0 &&
            th_regex_match_expr(haystack, &qoffs2, expr, *nnode + 1, flags, level + 1))
        {
            // Matched
            toffs = last_offs = qoffs2;

            DBG_RE_PRINT(" yes2: count=%" PRId_SSIZE_T " [%" PRId_SSIZE_T " .. %" PRId_SSIZE_T "]\n", count, node->repeatMin, node->repeatMax);

            // Check min repeats and if we are "not greedy".
            if (count >= node->repeatMin && node->repeatMax == 1)
                break;

            // Check max repeats
            if (node->repeatMax > 0 && count >= node->repeatMax)
                break;

        }
        else
        {
            // Rest of expression did not match, try again
            DBG_RE_PRINT(" no\n");
            toffs = poffs;
        }


    } while (haystack[toffs] != 0);

    // Check results
    BOOL res = count >= node->repeatMin ||
        (node->repeatMax > 0 && count >= node->repeatMax);

    if (res)
    {
        *offs = last_offs;
        *nnode = expr->nnodes;
    }

    DBG_RE_PRINT("RESULT: %s : offs=%" PRIu_SIZE_T "='%s'\n",
        res ? "YES" : "NO",
        *offs, haystack + *offs);

    return res;
}


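The repeat matcher above consumes one repeat at a time, tries the rest of the expression after each step, and remembers the last offset at which the rest matched. The following is a simplified standalone sketch of that strategy, not the th-libs implementation: it assumes the repeated node is a single non-NUL character and the "rest of the expression" is a plain literal. The name `match_repeat` and all of its parameters are hypothetical, and `max == 0` is taken to mean "no upper limit".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: match ch{min,max} followed by the literal 'rest'
 * at haystack[*offs]. Greedy: keeps the longest repeat count for which
 * the tail still matches. 'ch' must not be NUL. */
static bool match_repeat(const char *haystack, size_t *offs,
    char ch, size_t min, size_t max, const char *rest)
{
    size_t pos = *offs, count = 0, best = 0;
    const size_t restlen = strlen(rest);
    bool found = false;

    for (;;)
    {
        // With enough repeats consumed, test the tail at this position
        // and remember the most recent (longest) success.
        if (count >= min && strncmp(haystack + pos, rest, restlen) == 0)
        {
            best = pos + restlen;
            found = true;
        }

        // Stop at the repeat limit, or when the repeated character no
        // longer matches (this also covers the end of the haystack).
        if ((max > 0 && count >= max) || haystack[pos] != ch)
            break;

        pos++;
        count++;
    }

    if (found)
        *offs = best;

    return found;
}
```

For example, matching `a{1,}` followed by `b` against `"aaab"` from offset 0 succeeds and leaves the offset at 4, while `a{2,}` followed by `b` against `"ab"` fails and leaves the offset unchanged.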
static BOOL th_regex_match_expr(
    const th_char_t *haystack,
    size_t *offs,
    const th_regex_t *expr,
    const size_t startnode,
    const int flags,
    const int level
    )
{
    BOOL res = TRUE;
    size_t soffs = *offs;

    for (size_t nnode = startnode; res && nnode < expr->nnodes; nnode++)
    {
        const th_regex_node_t *node = &expr->nodes[nnode];

#ifdef TH_EXPERIMENTAL_REGEX_DEBUG
        if (th_dbg_fh != NULL)
        {
            th_regex_dump_indent(th_dbg_fh, level);

            thfprintf(th_dbg_fh,
                "[%" PRIu_SIZE_T "/%" PRIu_SIZE_T "] ",
                nnode + 1, expr->nnodes);

            th_regex_dump_node(th_dbg_fh, node);

            thfprintf(th_dbg_fh, " <-> \"%s\"\n",
                haystack + soffs);
        }
#endif

        switch (node->mode)
        {
            case TH_RE_MATCH_ONCE:
                res = th_regex_match_one(haystack, &soffs, node, flags, level);
                break;

            case TH_RE_MATCH_COUNT:
                res = th_regex_match_count(haystack, &soffs, expr, node, &nnode, flags, level);
                break;

            case TH_RE_MATCH_ANCHOR_START:
                res = (soffs == 0);
                break;

            case TH_RE_MATCH_ANCHOR_END:
                res = (haystack[soffs] == 0);
                break;
        }
    }

    if (res)
        *offs = soffs;

    return res;
}


/**
 * Match the string @p haystack against the compiled regular expression
 * @p expr. Results are returned through the optional output arguments:
 * @p pnmatches receives the number of matches, and @p pmatches receives
 * a linked list of @c th_regex_match_t structures describing the matched
 * regions. If @p pmatches is used, the resulting linked list should
 * eventually be freed via th_regex_free_matches().
 *
 * @param[in] expr regular expression structure to be matched
 * @param[in] haystack string to be matched against
 * @param[out] pnmatches pointer to variable to be set to number of found matches, or @c NULL if the information is not desired
 * @param[out] pmatches pointer to a pointer of @c th_regex_match_t structures, or @c NULL if the information is not desired
 * @param[in] maxmatches maximum number of matches until bailing out, or @c 0 if no limit
 * @param[in] flags additional flags, see @c TH_REF_*
 * @return @c THERR_OK on success, @c THERR_NULLPTR if @p expr or @p haystack is @c NULL, or @c THERR_MALLOC if allocating a match structure fails
 */
int th_regex_match(const th_regex_t *expr, const th_char_t *haystack,
    size_t *pnmatches, th_regex_match_t **pmatches, const size_t maxmatches,
    const int flags)
{
    size_t nmatches = 0;
    int level = 0;
    (void) flags;

    if (pnmatches != NULL)
        *pnmatches = 0;

    // Check given pattern and string
    if (expr == NULL || haystack == NULL)
        return THERR_NULLPTR;

    // Start matching
    // XXX NOTE .. lots to think about and to take into account:
    // - anchored and unanchored expressions
    // - how to check if the expression has consumed all possibilities?
    // ..
    for (size_t soffs = 0; haystack[soffs] != 0; )
    {
        size_t coffs = soffs;

        if (th_regex_match_expr(haystack, &coffs, expr, 0, flags, level))
        {
            // A match was found, increase count
            nmatches++;

            // Deliver to caller if required
            if (pnmatches != NULL)
                *pnmatches = nmatches;

            if (pmatches != NULL)
            {
                // Add the match region to the list
                th_regex_match_t *match = th_malloc0(sizeof(th_regex_match_t));
                if (match == NULL)
                    return THERR_MALLOC;

                match->type = TH_RE_MATCH_EXPR;
                match->start = soffs;
                match->len = coffs - soffs;

                th_llist_append_node((th_llist_t **) pmatches, (th_llist_t *) match);
            }

            // Check match count limit, if set
            if (maxmatches > 0 && nmatches >= maxmatches)
                break;

            // If offset was not advanced, increase by one
            // otherwise use end of match offset as new start
            if (soffs == coffs)
                soffs++;
            else
                soffs = coffs;
        }
        else
        {
            soffs++;
        }
    }

    return THERR_OK;
}


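The outer loop of th_regex_match() scans the haystack for non-overlapping matches and steps forward by one character whenever a match attempt fails or consumes nothing. A standalone sketch of that scanning pattern follows. The names `match_fn`, `count_matches` and `match_prefix` are hypothetical stand-ins, not part of th-libs; the callback takes the place of th_regex_match_expr().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher callback: on success, advances *offs past the match. */
typedef bool (*match_fn)(const char *haystack, size_t *offs, const void *ctx);

/* Count non-overlapping matches; maxmatches == 0 means no limit. */
static size_t count_matches(const char *haystack, match_fn fn,
    const void *ctx, size_t maxmatches)
{
    size_t nmatches = 0;

    for (size_t soffs = 0; haystack[soffs] != 0; )
    {
        size_t coffs = soffs;

        if (fn(haystack, &coffs, ctx))
        {
            nmatches++;

            if (maxmatches > 0 && nmatches >= maxmatches)
                break;

            // An empty match must still move the scan forward,
            // otherwise the loop would never terminate.
            soffs = (coffs == soffs) ? soffs + 1 : coffs;
        }
        else
            soffs++;
    }

    return nmatches;
}

/* Example matcher: the literal string passed in ctx. */
static bool match_prefix(const char *haystack, size_t *offs, const void *ctx)
{
    const char *lit = ctx;
    const size_t len = strlen(lit);

    if (strncmp(haystack + *offs, lit, len) != 0)
        return false;

    *offs += len;
    return true;
}
```

With this sketch, `count_matches("abcabcab", match_prefix, "abc", 0)` counts two matches, and an empty literal matches once per haystack character because of the forced one-character advance.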
static void th_regex_free_match(th_regex_match_t *node)
{
    (void) node;
    // Nothing to do here at the moment
}


/**
 * Deallocate the linked list of @c th_regex_match_t
 * structures pointed to by @p matches.
 * All associated data will be freed.
 *
 * @param[in] matches head of the list to be deallocated
 */
void th_regex_free_matches(th_regex_match_t *matches)
{
    th_llist_free_func_data((th_llist_t *) matches,
        (void (*)(void *)) th_regex_free_match);
}
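The match results are handed to the caller as a linked list that must later be released as a whole. A minimal standalone sketch of that ownership pattern is shown below. It assumes a simple singly linked list; the type `match_t` and the helpers `match_append` and `match_free_all` are hypothetical and do not reflect the actual `th_llist_t` layout or API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical match record; the real th_regex_match_t embeds th_llist_t. */
typedef struct match_t
{
    struct match_t *next;
    size_t start, len;
} match_t;

/* Append a zero-initialized record to the end of *list.
 * Returns the new node, or NULL if allocation failed. */
static match_t *match_append(match_t **list, size_t start, size_t len)
{
    match_t *node = calloc(1, sizeof(*node));
    if (node == NULL)
        return NULL;

    node->start = start;
    node->len = len;

    // Walk to the tail link and attach the new node there
    match_t **tail = list;
    while (*tail != NULL)
        tail = &(*tail)->next;

    *tail = node;
    return node;
}

/* Free every node in the list and return how many were freed. */
static size_t match_free_all(match_t *list)
{
    size_t n = 0;

    while (list != NULL)
    {
        match_t *next = list->next;
        free(list);
        list = next;
        n++;
    }

    return n;
}
```

A caller would start from a `NULL` list pointer, append one record per match, and call the free function exactly once when the results are no longer needed.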